Time-Series Storage: Design Choices That Shape Cost and Performance

"Normalizing series identity into a separate metadata table and referencing it by a compact ID reduces time-series storage by about forty-two percent in our experiment. Instead of repeating dimension strings like device name, region, and location on every row, each row carries only a small integer key and the full dimension strings are stored once per unique series."
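A minimal sketch of that layout, using Python's built-in sqlite3 module as a stand-in for whatever database the article used. The table and column names (`series`, `samples`, `device`, `region`, `location`) are assumptions made for illustration, and the sketch does not attempt to reproduce the forty-two percent figure.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a metadata table and a narrow sample table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE series (
            id       INTEGER PRIMARY KEY,
            device   TEXT NOT NULL,
            region   TEXT NOT NULL,
            location TEXT NOT NULL,
            UNIQUE (device, region, location)
        );
        CREATE TABLE samples (
            series_id INTEGER NOT NULL REFERENCES series(id),
            ts        INTEGER NOT NULL,
            value     REAL NOT NULL
        );
        """
    )
    return conn


def series_id(conn: sqlite3.Connection, device: str, region: str, location: str) -> int:
    """Return the compact ID for a dimension combination, creating it on first use."""
    conn.execute(
        "INSERT OR IGNORE INTO series (device, region, location) VALUES (?, ?, ?)",
        (device, region, location),
    )
    row = conn.execute(
        "SELECT id FROM series WHERE device = ? AND region = ? AND location = ?",
        (device, region, location),
    ).fetchone()
    return row[0]


def record(conn: sqlite3.Connection, device: str, region: str, location: str,
           ts: int, value: float) -> None:
    """Store one measurement; the row carries only the integer key, not the strings."""
    sid = series_id(conn, device, region, location)
    conn.execute(
        "INSERT INTO samples (series_id, ts, value) VALUES (?, ?, ?)",
        (sid, ts, value),
    )
```

Each dimension string is written once in `series`, however many rows land in `samples`.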

"High-cardinality fields like request IDs and session tokens should be kept out of series identity. When the number of unique dimension combinations approaches the number of rows, normalization gains collapse and both storage and indexing costs grow linearly."
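One way to see the collapse is to measure how many distinct series identities a batch of rows produces relative to its row count. The sketch below is illustrative only: the field names and the synthetic rows are assumptions, not data from the article.

```python
def identity_ratio(rows: list[dict], identity_fields: list[str]) -> float:
    """Return unique identity combinations divided by row count (0.0 for no rows).

    A ratio near 0 means many rows share each series, so normalization pays off.
    A ratio near 1 means almost every row is its own series.
    """
    if not rows:
        return 0.0
    unique = {tuple(row[field] for field in identity_fields) for row in rows}
    return len(unique) / len(rows)


# Synthetic rows: 4 devices, one region, and a request ID that is unique per row.
rows = [
    {"device": f"dev-{i % 4}", "region": "eu-west", "request_id": f"req-{i}"}
    for i in range(1000)
]

low = identity_ratio(rows, ["device", "region"])
high = identity_ratio(rows, ["device", "region", "request_id"])
```

With `request_id` in the identity, every row becomes a new series and the metadata table grows as fast as the data itself.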

"Storing series dimensions as flexible JSON (e.g., PostgreSQL jsonb) with targeted indexes avoids schema migrations as tags evolve, but requires a deliberate indexing policy to prevent index sprawl and type drift."
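What such a policy might look like is sketched below, under stated assumptions: an allowlist of keys that get an index, and a guard that rejects a value whose type differs from the type first seen for its key. The names `INDEXED_KEYS`, `check_dimensions`, and `index_ddl`, and the `series`/`dims` table and column, are hypothetical. The generated statements use PostgreSQL's expression-index syntax over the `->>` operator and are only built as strings here, not executed.

```python
# Hypothetical policy: only these dimension keys are allowed to have an index.
INDEXED_KEYS = {"region", "device"}


def check_dimensions(dims: dict, known_types: dict) -> dict:
    """Reject type drift: each key must keep the type it had when first seen."""
    for key, value in dims.items():
        expected = known_types.setdefault(key, type(value))
        if type(value) is not expected:
            raise TypeError(
                f"dimension {key!r} was {expected.__name__}, got {type(value).__name__}"
            )
    return dims


def index_ddl(table: str, column: str, indexed_keys: set[str]) -> list[str]:
    """Build one expression-index statement per allowlisted key, nothing more."""
    return [
        f"CREATE INDEX idx_{table}_{key} ON {table} (({column}->>'{key}'));"
        for key in sorted(indexed_keys)
    ]
```

Keeping the allowlist small is what prevents index sprawl: new tags can appear in the JSON freely, but an index is created only when a key is added to the policy on purpose.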

"Time partitioning allows O(1) data expiration and partition pruning, but creates a write hotspot on the current window. Adding a second axis (series identity) distributes writes and narrows read scans. Downsampling from five-second to one-hour resolution reduces row count by 720 times, retaining full resolution only for the window where it matters and serving older queries from pre-aggregated rollups."
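The 720 figure follows from 3600 seconds per hour divided by a 5-second sample interval. The sketch below checks that arithmetic on synthetic data and shows one possible two-axis partition key. The daily window, the bucket count of 8, and the choice of count, mean, min, and max as rollup columns are assumptions for illustration, not details from the article.

```python
from collections import defaultdict

RAW_STEP_S = 5         # raw resolution: one sample every five seconds
ROLLUP_STEP_S = 3600   # rollup resolution: one row per hour
DAY_S = 86_400         # assumed partition window: one day
SERIES_BUCKETS = 8     # assumed number of series buckets per window


def partition_key(ts: int, series_id: int) -> tuple[int, int]:
    """Two-axis key: the day supports pruning and expiry, the bucket spreads writes."""
    return ts // DAY_S, series_id % SERIES_BUCKETS


def rollup_hourly(samples: list[tuple[int, float]]) -> dict[int, tuple[int, float, float, float]]:
    """Aggregate (ts, value) pairs into {hour_start: (count, mean, min, max)}."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % ROLLUP_STEP_S].append(value)
    return {
        hour: (len(values), sum(values) / len(values), min(values), max(values))
        for hour, values in sorted(buckets.items())
    }


# Two hours of synthetic five-second samples for a single series.
raw = [(ts, float(ts % 60)) for ts in range(0, 2 * ROLLUP_STEP_S, RAW_STEP_S)]
hourly = rollup_hourly(raw)
reduction = len(raw) // len(hourly)
```

Here 1440 raw rows become 2 rollup rows, a reduction factor of 720. Expiring a whole day then means dropping the partitions whose first key component is that day, rather than deleting rows one by one.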

Time-series data records measurements over time rather than only current state. Storage design choices such as row layout, compression, and partitioning strongly affect cost and query performance. Normalizing series identity into a separate metadata table and referencing it by compact IDs reduced storage by about 42% in the article's experiment, because dimension strings are stored once per unique series instead of on every row. High-cardinality fields like request IDs and session tokens should be excluded from series identity because normalization benefits collapse when the number of unique dimension combinations approaches the row count. Flexible JSON storage for series dimensions can avoid schema migrations but requires a deliberate indexing policy to prevent index sprawl and type drift. Time partitioning enables O(1) expiration and partition pruning but creates a write hotspot on the current window, which can be mitigated by adding a second partitioning axis on series identity. Downsampling from five-second to one-hour resolution reduces row count by 720x, keeping full resolution only for the window where it matters and serving older queries from pre-aggregated rollups.

#time-series-storage #schema-normalization #partitioning #indexing #downsampling

Read at InfoQ


