Research Engineer, Large Language Model (LLM) Pretraining
Job Description
Key Responsibilities
Data Engineering for Pretraining
• Process large-scale Vietnamese and multilingual datasets.
• Build and maintain scalable pipelines for text collection, cleaning, deduplication, filtering, and quality scoring.
• Develop automated dataset validation and quality assurance tools.
• Implement tokenization workflows, corpus sharding, mixture sampling, and dataset balancing.
Model Training & Optimization
• Conduct model fine-tuning, instruction tuning, and alignment if needed.
• Run full-scale LLM experiments and troubleshoot training issues.
• Support distributed training of LLMs using DeepSpeed, Megatron-LM, FSDP, or similar.
• Optimize throughput, memory efficiency, and multi-node GPU performance.
Infrastructure & Engineering
• Work with multi-GPU/multi-node clusters using Slurm, Docker/Singularity.
• Develop reusable tools for logging, checkpointing, and evaluations.
• Maintain experiment tracking pipelines.
Evaluation & Benchmarking
• Prepare and maintain Vietnamese and multilingual benchmark suites.
• Analyze results to guide improvements.
• Implement automated evaluation pipelines.
Last updated: 2025-11-16 02:15:02
CÔNG TY CỔ PHẦN VINSMART FUTURE