Many RAG use cases combine similarity search with metadata filters: "find documents for product X only, in Vietnamese, from after 2024". Three core strategies:
1. Post-filtering
- Flow: ANN search top-K over the full index → filter by metadata afterward.
- Pros: simple, index-agnostic.
- Cons: if the filter rejects most candidates → the filtered top-K may be empty or too small. The workaround is to oversample K heavily (e.g. fetch K=500 to keep 10) → slower, more memory.
- Use when the filter drops < 20% of the corpus.
2. Pre-filtering
- Flow: query the metadata index first → vector subset → run ANN only over that subset.
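- Pros: exact with respect to the filter; no matching document is missed.
- Cons: hurts ANN performance. HNSW traversal relies on graph connectivity; restricting the graph to the filtered subset sparsifies it → some neighborhoods become unreachable → recall drops. For very small subsets (< 1% of the corpus), engines typically fall back to brute force.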
- Use when the filter drops > 80% of the corpus.
3. Filtered HNSW / hybrid (state-of-the-art)
- Integrates the filter INTO ANN traversal: while walking the graph, only nodes that pass the filter are considered as results; if enough neighbors pass → return them, if not → expand the search further.
- Keeps high recall + correct filtering, avoiding pre/post edge cases.
- Qdrant calls this payload-based filtering; Weaviate calls it filtered vector search. pgvector (ivfflat/hnsw) applies WHERE filters after the index scan; from v0.8.0, iterative index scans keep scanning until enough rows pass the filter.
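The shortfall of post-filtering versus pre-filtering can be sketched on a toy corpus. This is a minimal illustration, not any engine's implementation: an exact scan stands in for the ANN index, vectors are random 2-D points, and the names `top_k`, `post_filter_search`, and `pre_filter_search` are hypothetical.

```python
import math
import random

def top_k(query, items, k):
    """Exact nearest neighbours by Euclidean distance (stand-in for an ANN index)."""
    return sorted(items, key=lambda it: math.dist(query, it["vec"]))[:k]

def post_filter_search(query, corpus, predicate, k, oversample=1):
    # Search the whole corpus first, then drop candidates that fail the filter.
    candidates = top_k(query, corpus, k * oversample)
    return [it for it in candidates if predicate(it)][:k]

def pre_filter_search(query, corpus, predicate, k):
    # Narrow to the matching subset first, then search only inside it.
    subset = [it for it in corpus if predicate(it)]
    return top_k(query, subset, k)

random.seed(0)
# 1000 documents, of which 20 (2%) are tagged "vi".
corpus = [
    {"id": i,
     "lang": "vi" if i % 50 == 0 else "en",
     "vec": (random.random(), random.random())}
    for i in range(1000)
]

def is_vi(item):
    return item["lang"] == "vi"

query = (0.5, 0.5)
post = post_filter_search(query, corpus, is_vi, k=10)
post_big = post_filter_search(query, corpus, is_vi, k=10, oversample=100)
pre = pre_filter_search(query, corpus, is_vi, k=10)
print(len(post), len(post_big), len(pre))
```

With a 2% pass rate, `post` can only contain the matching documents that happen to land in the global top 10, so it usually comes back short. `post_big` recovers the same 10 documents as `pre`, but only by pulling the entire corpus as candidates.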
Supporting techniques:
- Metadata index: a separate index (B-tree, bitmap, inverted) on frequently filtered fields (tenant_id, date, category) → fast pre-filter.
- Hybrid index / Partitioning: split vectors into multiple collections/namespaces along a common filter (one per tenant, per language, per year). Query the right collection and skip the runtime filter entirely.
- Pre-filter selectivity estimation: the DB estimates the % of vectors that pass the filter → picks a strategy dynamically (low pass rate → pre, high pass rate → post, in between → filtered HNSW).
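A rough sketch of selectivity-based strategy selection follows. The thresholds are taken from the rules of thumb in these notes (under 1% passing → brute force, under 20% → pre-filter, over 80% → post-filter); the function names are hypothetical, and multiplying per-field pass rates assumes the fields are statistically independent, which real data often violates.

```python
import math

def estimate_pass_rate(field_pass_rates):
    """Estimated fraction of vectors passing an AND of filters.

    Assumption: fields are independent, so pass rates multiply.
    """
    return math.prod(field_pass_rates)

def choose_strategy(pass_rate, brute_force_below=0.01,
                    pre_below=0.20, post_above=0.80):
    """Pick a search strategy from the estimated fraction of vectors that pass."""
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    if pass_rate < brute_force_below:
        return "brute_force_subset"   # subset is tiny: scan it exactly
    if pass_rate < pre_below:
        return "pre_filter"           # filter removes more than 80%
    if pass_rate > post_above:
        return "post_filter"          # filter removes less than 20%
    return "filtered_hnsw"            # middle ground

print(choose_strategy(0.9))                               # post_filter
print(choose_strategy(0.5))                               # filtered_hnsw
print(choose_strategy(0.1))                               # pre_filter
print(choose_strategy(estimate_pass_rate([0.1, 0.05])))   # brute_force_subset
```

The cut-offs are tunable defaults, not fixed constants; an engine would calibrate them against its own index and hardware.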
Real-world performance tips:
1. Low cardinality (tenant_id with 100 tenants) → partition by tenant, no runtime filter.
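2. High cardinality + highly selective (user_id across 10M users, each query targets one user) → filtered HNSW + metadata index.
3. Range filter (date > X) → watch out: selectivity changes from query to query; use filtered HNSW.
4. Composite filter (AND of several fields) → selectivities multiply (if the fields are roughly independent); the pass rate can get very low → fall back to brute force over the subset.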
5. Always measure recall after adding filters; recall can drop 10–30% with the wrong config.
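One way to measure recall under a filter is to compare the engine's results against an exact brute-force search over the filtered subset. The sketch below assumes both searches return ranked lists of document ids; `recall_at_k` and `mean_recall` are hypothetical helper names, and the search callables are placeholders for whatever the system under test exposes.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate search also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    return len(truth & set(approx_ids[:k])) / len(truth)

def mean_recall(queries, approx_search, exact_search, k):
    """Average recall@k over a set of queries.

    approx_search / exact_search: callables mapping a query to ranked ids.
    """
    scores = [recall_at_k(approx_search(q), exact_search(q), k) for q in queries]
    return sum(scores) / len(scores)

# The approximate result misses one of the four true neighbours.
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))  # 0.75
```

Running this before and after adding a filter, with the same queries, shows whether a configuration change is what caused recall to drop.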
Multi-tenancy is the canonical use case: Pinecone namespaces, Qdrant collections/shards, Weaviate multi-tenancy. Making each tenant its own indexing unit → filtering comes for free, and isolation is better (no cross-tenant data mixing).
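The partition-per-tenant idea can be sketched as a dictionary of independent indexes. This is an illustration only: the class `TenantPartitionedIndex` is hypothetical, and an exact scan stands in for each tenant's ANN index; it does not reflect how any of the named products implement namespaces or shards.

```python
import math
from collections import defaultdict

class TenantPartitionedIndex:
    """One independent vector partition per tenant (exact scan as ANN stand-in)."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def add(self, tenant_id, doc_id, vec):
        self._partitions[tenant_id].append((doc_id, vec))

    def search(self, tenant_id, query, k):
        # Only this tenant's partition is touched, so no runtime filter is
        # needed and other tenants' vectors can never appear in the results.
        items = self._partitions.get(tenant_id, [])
        ranked = sorted(items, key=lambda it: math.dist(query, it[1]))
        return [doc_id for doc_id, _ in ranked[:k]]

index = TenantPartitionedIndex()
index.add("acme", "a1", (0.0, 0.0))
index.add("acme", "a2", (1.0, 1.0))
index.add("globex", "g1", (0.0, 0.1))

print(index.search("acme", (0.0, 0.0), k=2))    # ['a1', 'a2']
print(index.search("globex", (0.0, 0.0), k=5))  # ['g1']
```

The trade-off is that a query spanning several tenants has to fan out over multiple partitions and merge the results.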