AI Engineer vs ML Engineer: khác biệt, skill set, khi nào cần từng vai trò?

Machine Learning Engineer (truyền thống):
- Build/train model từ data: feature engineering, train classical ML (XGBoost, scikit) hoặc deep learning (PyTorch, TF).
- Responsible cho pipeline MLOps: data ingestion, feature store, model registry, training pipeline, serving.
- Cần kiến thức sâu: linear algebra, calculus, statistics, optimization, architecture deep learning.
- Workflow: vấn đề business → data collection → feature engineering → model selection → train/eval → deploy → monitor drift → retrain.
- Artifact chính: model weights + training pipeline.

AI Engineer (post-LLM, emerged 2023+):
- Build app dùng pre-trained foundation model (LLM, VLM, diffusion) từ provider hoặc open-source.
- Không train model from scratch; focus vào prompt engineering, RAG, fine-tuning (PEFT), agents, eval, system integration.
- Cần: strong software engineering + practical knowledge về LLM behavior, prompt craft, evaluation, cost/latency optimization, guardrails.
- Workflow: business problem → prompt + RAG design → eval → deploy → monitor quality/cost → iterate prompt.
- Artifact chính: prompt + retrieval pipeline + agent orchestration, model là commodity.

Overlap: cả hai dùng python/pytorch, cần hiểu metrics và deployment. Nhiều MLE chuyển thành AIE.

Khác biệt cốt lõi:

	ML Engineer	AI Engineer
Model origin	Train from scratch / domain	Use pre-trained foundation
Data need	Large labeled dataset	Small eval set + RAG corpus
Core skill	Math, ML theory, MLOps	Prompt, RAG, system design, LLM ops
Debugging	Feature, training dynamics	Prompt, retrieval, hallucination
Cost	Training compute	Inference tokens
Failure mode	Model accuracy drop, drift	Hallucination, jailbreak, cost spike
Typical stack	PyTorch, Kubeflow, MLflow	LangChain/LlamaIndex, vector DB, LLM API

Khi cần MLE:
- Task cần model chuyên biệt chưa có pre-trained (fraud, recommendation, forecasting, CTR).
- Dataset proprietary lớn, cần custom model.
- Regulated domain cần interpretability (logistic regression > black box).
- Edge/embedded cần model nhỏ custom.
- Classical tasks: time series, tabular, CV ngành hẹp.

Khi cần AIE:
- Task NLP/text generation/chatbot/Q&A/summarization → RAG + LLM.
- Code generation/review, doc processing, search.
- Agent tự động hoá workflow.
- Multi-modal (VLM, ảnh → text).
- Fast prototyping business feature AI — time to market quan trọng.

Org thực tế:

Start-up / mid-size product: 80-90% nhu cầu giờ là AIE (dùng API). Thuê MLE khi cần custom model.
Tech giant / research lab: cần cả hai, phân tầng. Research scientist train foundation → ML engineer productionize → AI engineer build feature trên đó.
Team role thường gộp ở công ty nhỏ — "ML Engineer" làm cả hai, gọi tên "AI/ML Engineer" phổ biến.

Skill path để transition ML → AI engineer (phổ biến 2024-2025):
1. Hiểu transformer architecture ở mức đủ (không cần train from scratch).
2. Thành thạo prompt engineering, few-shot, CoT.
3. Build RAG end-to-end (chunking, embedding, vector DB, reranker).
4. Fine-tune với PEFT (LoRA).
5. Eval framework (RAGAS, LLM-judge).
6. LLMOps (LiteLLM, Langfuse, cost/latency opt).
7. Agent pattern (ReAct, tool use, MCP).
8. System design AI app end-to-end.

Thách thức ở từng vai trò:
- MLE: model drift, retraining cost, distribution shift.
- AIE: prompt brittleness, hallucination, provider dependency, cost unpredictability, jailbreak.

Xu hướng 2025+: ranh giới mờ dần. Nhiều AIE fine-tune model (LoRA); nhiều MLE phải serve LLM. Công ty tuyển "AI Engineer" thường mong cả hai.

Machine Learning Engineer (traditional):
- Builds/trains models from data: feature engineering, training classical ML (XGBoost, scikit) or deep learning (PyTorch, TF).
- Owns the MLOps pipeline: data ingestion, feature store, model registry, training pipeline, serving.
- Needs deep knowledge: linear algebra, calculus, statistics, optimization, deep learning architectures.
- Workflow: business problem → data collection → feature engineering → model selection → train/eval → deploy → drift monitoring → retrain.
- Main artifact: model weights + training pipeline.

AI Engineer (post-LLM, emerged 2023+):
- Builds apps using pre-trained foundation models (LLM, VLM, diffusion) from providers or open source.
- Doesn't train models from scratch; focuses on prompt engineering, RAG, PEFT fine-tuning, agents, evaluation, system integration.
- Needs: strong software engineering + practical LLM behavior knowledge, prompt craft, evaluation, cost/latency optimization, guardrails.
- Workflow: business problem → prompt + RAG design → eval → deploy → monitor quality/cost → iterate prompt.
- Main artifact: prompt + retrieval pipeline + agent orchestration; the model is a commodity.

Overlap: both use Python/PyTorch and need deployment metrics knowledge. Many MLEs transition into AIE.

Core differences:

	ML Engineer	AI Engineer
Model origin	Train from scratch / domain	Use pre-trained foundation
Data needs	Large labeled dataset	Small eval set + RAG corpus
Core skills	Math, ML theory, MLOps	Prompt, RAG, system design, LLM ops
Debugging	Features, training dynamics	Prompt, retrieval, hallucination
Cost	Training compute	Inference tokens
Failure modes	Accuracy drop, drift	Hallucination, jailbreak, cost spikes
Typical stack	PyTorch, Kubeflow, MLflow	LangChain/LlamaIndex, vector DB, LLM API

When to hire MLE:
- Task requires a custom model with no pre-trained equivalent (fraud, recommendations, forecasting, CTR).
- Large proprietary dataset needing a custom model.
- Regulated domain requiring interpretability (logistic regression > black box).
- Edge/embedded needing tiny custom models.
- Classic tasks: time series, tabular, narrow-domain CV.

When to hire AIE:
- NLP / text gen / chatbot / Q&A / summarization → RAG + LLM.
- Code generation/review, doc processing, search.
- Agents automating workflows.
- Multi-modal (VLM, image → text).
- Rapid prototyping of AI features — time-to-market matters.

Real-world org patterns:

Startup / mid-sized product: 80–90% of needs today are AIE (API-based). Hire MLE only when custom models are required.
Tech giant / research lab: needs both, layered. Research scientists train foundations → MLEs productionize → AIEs build features on top.
Role often merged at small companies — "ML Engineer" does both, hence the popular "AI/ML Engineer" title.

Skill path to transition ML → AI engineer (common in 2024–2025):
1. Understand transformer architecture enough (don't need to train from scratch).
2. Master prompt engineering, few-shot, CoT.
3. Build end-to-end RAG (chunking, embedding, vector DB, reranker).
4. PEFT fine-tuning (LoRA).
5. Eval frameworks (RAGAS, LLM judge).
6. LLMOps (LiteLLM, Langfuse, cost/latency optimization).
7. Agent patterns (ReAct, tool use, MCP).
8. End-to-end AI app system design.

Role-specific challenges:
- MLE: model drift, retraining cost, distribution shift.
- AIE: prompt brittleness, hallucination, provider dependency, unpredictable cost, jailbreaks.

2025+ trend: the line is blurring. Many AIEs fine-tune (LoRA); many MLEs serve LLMs. "AI Engineer" job postings typically expect both.

Xem toàn bộ AI Engineering cùng filter theo level & chủ đề con.

Mở danh sách AI Engineering