Agent memory: short-term vs long-term khác nhau thế nào?

Agent cần nhiều loại memory với vai trò khác nhau:

1. Short-term / Working memory — conversation history trong session hiện tại, lưu trực tiếp trong context window. Giới hạn bởi context size; khi quá dài phải xử lý:
- Sliding window: giữ N turn gần nhất.
- Summarization: LLM tóm tắt history cũ thành 1 đoạn.
- Token truncation: cắt theo token count.

2. Long-term memory — persist qua nhiều session, lưu ngoài (DB/vector store). Hai dạng:
- Semantic memory: kiến thức, sự kiện ("user tên là An, thích Python"). Lưu dưới dạng key-value hoặc vector.
- Episodic memory: các sự kiện/conversation cụ thể đã xảy ra. Retrieve bằng vector search khi relevant.

3. Procedural memory — cách thực hiện task (system prompts, learned workflows). Thường là prompt templates versioned.

Implementation patterns:
- Mem0, Zep, Letta (MemGPT) — framework quản lý memory tiered.
- Extract → Store → Retrieve pipeline: sau mỗi turn, LLM extract fact quan trọng → lưu vector DB với metadata (user_id, timestamp, type); turn sau retrieve top-K dựa trên query hiện tại.
- Reflection: định kỳ LLM review history, consolidate/deduplicate, forget thông tin cũ không còn đúng.

Challenges: staleness (memory outdated), contradiction (thông tin mâu thuẫn), privacy (PII trong memory), scale (per-user memory cost). Production cần TTL, versioning, explicit "forget" endpoint cho GDPR.

Agents need multiple memory types with different roles:

1. Short-term / Working memory — conversation history in the current session, lives in the context window. Bounded by context size; when too long, handle via:
- Sliding window: keep the last N turns.
- Summarization: LLM compresses old history into a paragraph.
- Token truncation: cut by token count.

2. Long-term memory — persists across sessions, stored externally (DB/vector store). Two forms:
- Semantic memory: knowledge/facts ("user's name is An, likes Python"). Stored as key-value or vectors.
- Episodic memory: specific past events/conversations. Retrieved via vector search when relevant.

3. Procedural memory — how to perform tasks (system prompts, learned workflows). Usually versioned prompt templates.

Implementation patterns:
- Mem0, Zep, Letta (MemGPT) — frameworks for tiered memory.
- Extract → Store → Retrieve pipeline: after each turn, the LLM extracts important facts → stores in vector DB with metadata (user_id, timestamp, type); next turn retrieves top-K based on current query.
- Reflection: periodically the LLM reviews history, consolidates/deduplicates, forgets outdated info.

Challenges: staleness, contradictions, privacy (PII in memory), scale (per-user cost). Production needs TTL, versioning, explicit "forget" endpoint for GDPR.

Xem toàn bộ AI Engineering cùng filter theo level & chủ đề con.

Mở danh sách AI Engineering