Columnar Data Storage: A Deep-Dive into Parquet, Delta Lake, Columnstore Indexes, and More!
Edward Pollack
Analytic data storage on the Microsoft data platform has evolved greatly over the years. From the early days of PowerPivot and SQL Server Analysis Services to the advent of columnstore indexes and the eventual adoption of the Parquet format as the de facto storage standard for analytic data in Azure, a lot has happened in the past fifteen years.
This session is a deep-dive into how columnstore technologies work, including:
* Overview and effectiveness of columnstore storage formats
* Encoding and compression algorithms
* Columnstore index implementation in SQL Server
* Parquet file format
* Delta Parquet file format
* VertiPaq (row order)/V-Order/Z-Order optimization/Liquid Clustering
* Delta Lake implementation in Microsoft Fabric
* Other formats!
Understanding how analytic data is stored makes it possible to optimize queries and to make better decisions when architecting data structures. These improvements can decrease data size, speed up analytics performance, and reduce computational overhead, thereby reducing hosting costs.
These technologies will continue to evolve as data grows larger and organizational needs become more complex. Working effectively with these data storage formats will allow for fast querying of large amounts of data, both now and in the future.
