Requirements: upload/download/sync files, sharing with others, version history, ~1B users, ~10 exabytes of storage.
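As a rough sanity check on these numbers, the sketch below works out the average storage per user and the approximate number of chunks to track. It assumes decimal exabytes and a 4 MiB chunk size (the low end of the range discussed in these notes); both are assumptions for illustration, not figures from a real deployment.

```python
# Back-of-envelope sizing from the stated requirements.
USERS = 1_000_000_000        # ~1B users
TOTAL_BYTES = 10 * 10**18    # ~10 exabytes, assuming decimal EB
CHUNK_BYTES = 4 * 1024**2    # assumed chunk size: 4 MiB

# Average storage footprint per user.
avg_bytes_per_user = TOTAL_BYTES // USERS
avg_gb_per_user = avg_bytes_per_user / 10**9

# Rough number of chunk records, ignoring deduplication.
total_chunks = TOTAL_BYTES // CHUNK_BYTES

print(f"{avg_gb_per_user:.0f} GB per user on average")
print(f"about {total_chunks:.2e} chunks at 4 MiB each")
```

This gives roughly 10 GB per user and on the order of 10^12 chunk records before deduplication.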
- Chunking: split each file into chunks of 4–8 MB; hash each chunk with SHA-256 to detect duplicates (deduplication) and to support delta sync (upload only the chunks that changed).
- Upload flow: Client chunker → compute hash of each chunk → send chunk hashes to the server → server responds with which chunks are missing → client uploads missing chunks to Blob Storage (S3) → server writes metadata to the database.
- Metadata DB: stores file tree structure, ownership, permissions, and version history — use an RDBMS (MySQL) for ACID guarantees and complex queries.
- Blob Storage: S3-compatible object storage for raw file chunks.
- Deduplication: if two users upload the same file, its chunks hash to the same values, so only one physical copy is stored and each user's metadata references it, which saves significant storage.
- Sync service: when a file changes on device A → upload delta chunks → notify other devices via WebSocket/long polling → devices download the changed chunks.
- Conflict resolution: last-write-wins, or keep both versions by creating a conflict copy, as Dropbox does.
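The chunking, upload flow, and deduplication steps above can be sketched as follows. This is a minimal in-memory illustration, not a production design: `InMemoryServer`, `missing`, `put_chunk`, `commit`, and `upload` are hypothetical names, fixed-size chunking is assumed, and a plain dictionary stands in for both the blob store and the metadata DB.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4 * 1024 * 1024  # assumed: 4 MiB, low end of the 4-8 MB range


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split file bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hash(chunk: bytes) -> str:
    """Content address of a chunk: its SHA-256 hex digest."""
    return hashlib.sha256(chunk).hexdigest()


@dataclass
class InMemoryServer:
    """Hypothetical stand-in for the metadata service plus blob storage."""
    blobs: dict = field(default_factory=dict)  # chunk hash -> chunk bytes
    files: dict = field(default_factory=dict)  # (user, path) -> list of versions

    def missing(self, hashes: list[str]) -> list[str]:
        """Return the distinct hashes the blob store does not hold yet."""
        return [h for h in dict.fromkeys(hashes) if h not in self.blobs]

    def put_chunk(self, digest: str, chunk: bytes) -> None:
        """Store a chunk after verifying it matches its claimed hash."""
        if chunk_hash(chunk) != digest:
            raise ValueError("chunk does not match its hash")
        self.blobs[digest] = chunk

    def commit(self, user: str, path: str, hashes: list[str]) -> None:
        """Record a new file version as an ordered list of chunk hashes."""
        self.files.setdefault((user, path), []).append(hashes)


def upload(server: InMemoryServer, user: str, path: str,
           data: bytes, size: int = CHUNK_SIZE) -> int:
    """Run the upload flow; return how many chunks were actually sent."""
    chunks = split_chunks(data, size)
    hashes = [chunk_hash(c) for c in chunks]
    by_hash = dict(zip(hashes, chunks))
    needed = server.missing(hashes)        # server says which chunks it lacks
    for digest in needed:
        server.put_chunk(digest, by_hash[digest])
    server.commit(user, path, hashes)      # metadata written after the chunks
    return len(needed)
```

With a tiny chunk size for demonstration, `upload(server, "alice", "/a.txt", b"aaaabbbbcccc", size=4)` sends three chunks, a second user uploading the same bytes sends none, and changing only the middle four bytes sends one. Those three cases correspond to a first upload, cross-user deduplication, and delta sync.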
Bandwidth optimization: client-side deduplication and delta sync can reduce uploaded data by up to 90%. Permission model: owner, editor, and viewer roles, plus sharing links with an expiry time. A CDN serves downloads of popular files. File metadata search is powered by Elasticsearch.
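One possible shape for the permission model and expiring sharing links is sketched below. The role ordering, the mapping of actions to minimum roles, and the names `Role`, `can`, `ShareLink`, and `create_link` are all assumptions made for illustration; the notes only specify the three roles and that links expire.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered so that a higher value implies more privileges."""
    VIEWER = 1
    EDITOR = 2
    OWNER = 3


# Assumed mapping: the minimum role required for each action.
REQUIRED_ROLE = {"read": Role.VIEWER, "write": Role.EDITOR, "share": Role.OWNER}


def can(role: Role, action: str) -> bool:
    """Check whether a role is allowed to perform an action."""
    return role >= REQUIRED_ROLE[action]


@dataclass(frozen=True)
class ShareLink:
    token: str            # unguessable identifier embedded in the link
    file_id: str
    role: Role            # access level granted to whoever holds the link
    expires_at: datetime

    def is_valid(self, now: datetime) -> bool:
        """A link is usable only strictly before its expiry time."""
        return now < self.expires_at


def create_link(file_id: str, role: Role, ttl: timedelta,
                now: datetime) -> ShareLink:
    """Create a sharing link that expires `ttl` after `now`."""
    return ShareLink(secrets.token_urlsafe(16), file_id, role, now + ttl)


now = datetime.now(timezone.utc)
link = create_link("file-123", Role.VIEWER, timedelta(days=7), now)
```

Passing `now` explicitly keeps expiry checks deterministic and easy to test; a real service would also need a way to revoke a link before it expires.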