Type-Based Categorization of Relational Attributes

    Publication properties
    Title: Type-Based Categorization of Relational Attributes
    Date: 2009
    Publication type: Conference paper
    Authors:
    No. First name Last name
    1. Babak Ahmadi
    2. Marios Hadjieleftheriou
    3. Thomas Seidl
    4. Divesh Srivastava
    5. Suresh Venkatasubramanian
    Download (by DOI): 10.1145/1516360.1516372
    BibTeX: conf/edbt/AhmadiHSSV09
    DBLP: db/conf/edbt/edbt2009.html#AhmadiHSSV09

    Conference Track
    Conference Name: EDBT 2009, 12th International Conference on Extending Database Technology, Saint Petersburg, Russia, March 24-26, 2009
    Track Name: Research
    URL: http://www.edbt.org/Proceedings/2009-StPetersburg/edbt/sessions/research.html

    Abstract
           In this work we concentrate on categorization of relational
           attributes based on their data type. Assuming that attribute
           type/characteristics are unknown or unidentifiable, we analyze
           and compare a variety of type-based signatures for classifying
           the attributes based on the semantic type of the data contained
           therein (e.g., router identifiers, social security numbers,
           email addresses). The signatures can subsequently be used for
           other applications as well, like clustering and index
           optimization/compression. This application is useful in cases
           where very large data collections that are generated in a
           distributed, ungoverned fashion end up having unknown,
           incomplete, inconsistent or very complex schemata and
           schema-level metadata. We concentrate on heuristically generating
           type-based attribute signatures based on both local and global
           computation approaches. We show experimentally that by
           decomposing data into q-grams and then considering signatures
           based on q-gram distributions, we achieve very good
           classification accuracy under the assumption that a large sample
           of the data is available for building the signatures. Then, we
           turn our attention to cases where a very small sample of the
           data is available, and hence accurately capturing the q-gram
           distribution of a given data type is almost impossible. We
           propose techniques based on dimensionality reduction and
           soft-clustering that exploit correlations between attributes to
           improve classification accuracy.
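
The q-gram signature idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`qgrams`, `signature`, `cosine`, `classify`), the use of cosine similarity, and the nearest-signature decision rule are all assumptions chosen for illustration. The sketch only shows how a column's values might be decomposed into q-grams, summarized as a normalized frequency distribution, and compared against signatures built from labeled columns.

```python
import math
from collections import Counter


def qgrams(value, q=3):
    """Return the overlapping q-grams of a string (no padding)."""
    return [value[i:i + q] for i in range(len(value) - q + 1)]


def signature(values, q=3):
    """Build a normalized q-gram frequency distribution for a column.

    Returns an empty dict if no value is long enough to yield a q-gram.
    """
    counts = Counter(g for v in values for g in qgrams(v, q))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse distributions (assumed metric)."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(unknown_values, labeled_signatures, q=3):
    """Assign the label whose signature is most similar (hypothetical rule)."""
    sig = signature(unknown_values, q)
    return max(labeled_signatures,
               key=lambda label: cosine(sig, labeled_signatures[label]))


if __name__ == "__main__":
    # Toy labeled columns; the values are made up for illustration.
    labeled = {
        "email": signature(["alice@example.com", "bob@example.org"], q=2),
        "ssn": signature(["123-45-6789", "987-65-4321"], q=2),
    }
    print(classify(["carol@example.net"], labeled, q=2))
```

With only a handful of values per column, such a distribution is a poor estimate of the true q-gram distribution, which is the small-sample difficulty the abstract goes on to address with dimensionality reduction and soft clustering.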