Column-stores gained popularity as a promising physical
design alternative. Each attribute of a relation is physically stored as a separate column
allowing queries to
load only the required attributes.
The overhead incurred is
on-the-fly tuple reconstruction for multi-attribute queries.
Each tuple reconstruction
is a join of two columns based on tuple IDs, making it a
significant cost component. The ultimate physical design is to have multiple presorted copies of each base table
such that tuples are already appropriately organized in multiple different orders across the various columns. This requires
the ability to predict the workload, idle time to prepare, and infrequent updates.
In this paper, we propose a novel design, partial sideways cracking, that
minimizes the tuple reconstruction cost in a self-organizing way.
It achieves performance similar to using presorted data,
but without requiring the heavy initial presorting step itself.
Instead, it handles
dynamic, unpredictable workloads with no idle time and frequent updates.
Auxiliary dynamic data structures, called cracker maps,
provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction.
A map is continuously physically reorganized as an integral part of query evaluation,
providing faster and reduced data access for future queries.
To enable flexible and self-organizing behavior in storage-limited environments,
maps are materialized only partially
as demanded by the workload. Each map is a collection of
separate chunks that are individually reorganized, dropped or recreated as needed.
We implemented partial sideways cracking in an open-source column-store.
A detailed experimental analysis
demonstrates that it brings significant performance benefits
for multi-attribute queries.