MLOps manages the classic ML lifecycle (train → deploy → monitor) with in-house models, feature pipelines, and model registries. LLMOps builds on MLOps but must handle five LLM-specific issues:
1. Model ownership: LLMs are usually third-party APIs (OpenAI, Anthropic), so the concerns shift to rate limits, p99 latency, version deprecation, and data privacy; self-hosting instead requires a GPU serving stack (vLLM, TGI).
2. The prompt is the key artifact: it needs versioning, A/B testing, and rollback; tools include PromptLayer, Langfuse, and Braintrust.
3. Harder evaluation: output is free-form text, so there is no single accuracy number → combine automated metrics (BLEU, BERTScore), LLM-as-judge, human eval, and a golden-dataset regression suite.
4. Token-based cost + non-determinism: pricing is per input/output token; with temperature > 0 the same prompt can yield different outputs, so tests need multiple samples (multi-seed) and a statistical comparison rather than a single exact-match check. Track cost per request and per user ($/request, $/user).
5. New observability + safety layer: trace the full chain (agent steps, RAG retrieval, tool calls) with LangSmith, Langfuse, or Arize Phoenix; add guardrails (PII redaction, prompt-injection defense, content filtering) that classic MLOps does not include.
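
Point 4 can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not a reference implementation: the per-token prices are made-up placeholders, `fake_llm` is a hypothetical stand-in for a sampled model call, and the Wilson score interval is just one reasonable choice for the "statistical" part of a multi-sample test.

```python
from __future__ import annotations

import math
import random

# Hypothetical prices in USD per 1M tokens; real prices vary by provider and model.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD, billed separately for input and output tokens."""
    total = input_tokens * PRICE_PER_M_INPUT + output_tokens * PRICE_PER_M_OUTPUT
    return total / 1_000_000


def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z = 1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = passes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def fake_llm(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model sampled at temperature > 0."""
    return "4" if rng.random() < 0.9 else "four-ish"


def pass_rate(prompt: str, expected: str, n_samples: int, seed: int = 0):
    """Sample the same prompt n_samples times and report passes plus an interval."""
    rng = random.Random(seed)
    passes = sum(fake_llm(prompt, rng) == expected for _ in range(n_samples))
    return passes, wilson_interval(passes, n_samples)
```

A regression suite built this way would assert on the lower bound of the interval (for example, "lower bound stays above an agreed threshold") instead of requiring every sample to match, and would sum `request_cost` per request ID and per user ID to get the $/request and $/user figures.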