Data Partitioning splits a large table or dataset into smaller pieces to improve performance and manageability. Horizontal Partitioning (called sharding when the partitions live on separate servers) splits rows: for example, users with IDs 1 to 1,000,000 go to partition 1 and IDs 1,000,001 to 2,000,000 go to partition 2; every partition has the same schema. Vertical Partitioning splits columns: for example, rarely read blob/text columns move to a separate table so that hot data is co-located on disk, which improves cache efficiency. Horizontal partitioning strategies: Range Partitioning (by date or ID range; good for time-series data and makes old data easy to archive); Hash Partitioning (by a hash of the partition key; spreads rows evenly and reduces hot spots, but range scans on the key then touch every partition); List Partitioning (by enumerated values, e.g., country); Composite Partitioning (combining strategies, e.g., range by date, then hash by ID within each range). PostgreSQL Table Partitioning (declarative partitioning) is a built-in solution: declare a partition key and create the partitions, and PostgreSQL routes inserts to the matching partition and prunes partitions that cannot match a query's filter on that key.
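The three basic routing strategies can be sketched in a few lines of Python. This is only an illustration of the idea, not how PostgreSQL implements routing: the bounds, the partition names such as p_asia, the default partition, and the choice of CRC32 as the hash are all assumptions made for the example.

```python
import bisect
import zlib

# Hypothetical inclusive upper bounds for range partitions on user ID:
# partition 0 holds 1..1,000,000, partition 1 holds 1,000,001..2,000,000, etc.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

# Hypothetical list-partition mapping from country code to partition name.
LIST_PARTITIONS = {"VN": "p_asia", "JP": "p_asia", "DE": "p_europe", "FR": "p_europe"}


def range_partition(user_id: int) -> int:
    """Return the index of the first range whose upper bound is >= user_id."""
    idx = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if user_id < 1 or idx == len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"no range partition covers id {user_id}")
    return idx


def hash_partition(key: str, num_partitions: int = 4) -> int:
    """Map a key to a partition using a stable hash (CRC32, for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def list_partition(country: str) -> str:
    """Look up the partition by enumerated value, falling back to a default."""
    return LIST_PARTITIONS.get(country, "p_default")
```

CRC32 is used instead of Python's built-in hash() because string hashes from hash() are randomized per process by default, so the same key could land in a different partition after a restart. A composite strategy would simply chain two of these functions, for example range_partition on the ID followed by hash_partition within that range.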
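Benefits: partition pruning reduces the amount of data scanned; queries can run in parallel across partitions; old data is easy to archive or remove, because dropping a whole partition discards its storage in one step and is typically much faster than a row-by-row DELETE (in PostgreSQL this is DROP TABLE on the partition or ALTER TABLE ... DETACH PARTITION; MySQL uses ALTER TABLE ... DROP PARTITION). Key difference from sharding: partitioning typically operates within a single database instance, while sharding spreads data across multiple servers.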
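To make pruning and partition drops concrete, here is a toy in-memory model of a table range-partitioned by month. The class and method names are hypothetical, and a real database prunes in its query planner and executor rather than in application code; the sketch only shows why a filter on the partition key lets whole partitions be skipped, and why removing a partition does not need to visit individual rows.

```python
from collections import defaultdict
from datetime import date


class MonthlyPartitionedTable:
    """Toy table whose rows are grouped into one partition per (year, month)."""

    def __init__(self):
        # (year, month) -> list of (day, payload) rows
        self.partitions = defaultdict(list)

    def insert(self, day: date, payload: str) -> None:
        # Route the row to its partition based on the partition key (the date).
        self.partitions[(day.year, day.month)].append((day, payload))

    def query(self, start: date, end: date):
        """Return (rows, partitions_scanned) for start <= day <= end."""
        lo, hi = (start.year, start.month), (end.year, end.month)
        # Pruning: only partitions whose month overlaps the filter are scanned.
        scanned = [k for k in sorted(self.partitions) if lo <= k <= hi]
        rows = [r for k in scanned for r in self.partitions[k] if start <= r[0] <= end]
        return rows, scanned

    def drop_partition(self, year: int, month: int) -> int:
        """Discard a whole partition at once; return how many rows it held."""
        return len(self.partitions.pop((year, month), []))
```

With rows inserted for January, February, and March 2024, a query for February scans only the (2024, 2) partition, and drop_partition(2024, 1) removes all of January without touching the other months.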