Publications :: Search

Self-organizing Tuple Reconstruction in Column-stores

Show publication

On this page you see the details of the selected publication.

    Publication properties
    Title: Self-organizing Tuple Reconstruction in Column-stores
    Rating: (3)
    Discussion: 0 comments
    Date: 2009
    Publication type: Conference paper
    No. First name Last name Show
    1. Stratos Idreos
    2. Martin Kersten
    3. Stefan Manegold
    Download (by DOI): 10.1145/1559845.1559878
    BibTeX: conf/sigmod/IdreosKM09
    DBLP: db/conf/sigmod/sigmod2009.html#IdreosKM09

    The following keywords have been assigned to this publication so far. If you have logged in, you can tag this publication with additional keywords.

    1. MonetDB

    If you log in you can tag this publication with additional keywords

    A publication can refer to another publication (outgoing references) or it can be referred to by other publications (incoming references).

    Incoming References
    No incoming references have been assigned to this publication yet.
    Outgoing References
    No outgoing references have been assigned to this publication yet.

    If you log in you can add references to other publications

    A publication can be assigned to a conference, a journal or a school.

    Conference Track
    Conference Name: ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009 2009
    Track Name: Research


    Column-stores gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates.

    In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, providing faster and reduced data access for future queries. To enable flexible and self-organizing behavior in storage-limited environments, maps are materialized only partially as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries.