Time-Series Database (TSDB) là database được tối ưu đặc biệt cho dữ liệu có timestamp – metrics, sensor readings, financial ticks, logs, IoT data.
- Đặc điểm: write-heavy (liên tục insert new data points), dữ liệu cũ ít được query, aggregation theo time windows (avg, sum, min/max trong khoảng thời gian), data retention policies (tự động xóa data cũ).
- Vì sao RDBMS không tốt cho time-series: index B-Tree không hiệu quả cho sequential time-based writes, phải manually partition và archive old data.
- TSDB tối ưu qua: columnar storage (compress cùng metric đến 10-100x), time-based partitioning built-in, downsampling (giảm resolution của old data), optimized aggregation functions.
- InfluxDB: phổ biến nhất, có InfluxQL/Flux query language, built-in retention policies.
- TimescaleDB: extension của PostgreSQL, dùng được SQL quen thuộc, hypertables tự động partition.
- Prometheus + Grafana: stack tiêu chuẩn cho infrastructure monitoring.
- Cassandra: cũng thường dùng cho time-series ở scale rất lớn (IoT).
- Use cases: application metrics (Datadog, New Relic dùng TSDB), stock prices, server monitoring, IoT sensor data.
A Time-Series Database (TSDB) is a database optimized specifically for timestamped data — metrics, sensor readings, financial ticks, logs, and IoT data.
- Key characteristics: write-heavy (continuously inserting new data points), old data is rarely queried, aggregation by time windows (avg, sum, min/max over time ranges), and data retention policies (automatic deletion of old data).
- Why RDBMS falls short for time-series: B-Tree indexes are inefficient for sequential time-based writes, and old data must be manually partitioned and archived.
- TSDBs optimize via: columnar storage (compresses the same metric up to 10–100x), built-in time-based partitioning, downsampling (reducing resolution of old data), and optimized aggregation functions.
- InfluxDB: the most popular, with InfluxQL/Flux query language and built-in retention policies.
- TimescaleDB: a PostgreSQL extension using familiar SQL, with hypertables that auto-partition.
- Prometheus + Grafana: the standard stack for infrastructure monitoring.
- Cassandra: also commonly used for time-series at very large scale (IoT).
- Use cases: application metrics (Datadog, New Relic use TSDBs), stock prices, server monitoring, and IoT sensor data.