Publications
Search

Publications :: Search

Incremental Maintenance of Length Normalized Indexes for Approximate String Matching

Show publication

On this page you see the details of the selected publication.

    Publication properties
    Title: Incremental Maintenance of Length Normalized Indexes for Approximate String Matching
    Rating: (5)
    Discussion: 0 comments
    Date: 2009
    Publication type: Conference paper
    Authors:
    No. First name Last name Show
    1. Marios Hadjieleftheriou
    2. Nick Koudas
    3. Divesh Srivastava
    Download (by DOI): 10.1145/1559845.1559891
    BibTeX: conf/sigmod/HadjieleftheriouKS09
    DBLP: db/conf/sigmod/sigmod2009.html#HadjieleftheriouKS09
    Bookmark:

    The following keywords have been assigned to this publication so far. If you have logged in, you can tag this publication with additional keywords.

    Keywords
    No keywords have been assigned to this publication yet.

    If you log in you can tag this publication with additional keywords

    A publication can refer to another publication (outgoing references) or it can be referred to by other publications (incoming references).

    Incoming References
    No incoming references have been assigned to this publication yet.
    Outgoing References
    No outgoing references have been assigned to this publication yet.

    If you log in you can add references to other publications

    A publication can be assigned to a conference, a journal or a school.

    Conference Track
    Conference Name: ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009 2009
    Track Name: Research
    URL: http://www.sigmod09.org/

    Abstract

    Approximate string matching is a problem that has received a lot of attention recently. Existing work on information retrieval has concentrated on a variety of similarity measures (TF/IDF, BM25, HMM, etc.) specifically tailored for document retrieval purposes. As new applications that depend on retrieving short strings are becoming popular (e.g., local search engines like YellowPages.com, Yahoo!Local, and Google Maps) new indexing methods are needed, tailored for short strings. For that purpose, a number of indexing techniques and related algorithms have been proposed based on length normalized similarity measures. A common denominator of indexes for length normalized measures is that maintaining the underlying structures in the presence of incremental updates is inefficient, mainly due to data dependent, precomputed weights associated with each distinct token or string. Incorporating updates usually is accomplished by rebuilding the indexes at regular time intervals. In this paper we present a framework that advocates lazy update propagation with the following key feature: Efficient, incremental updates that immediately reflect the new data in the indexes in a way that gives strict guarantees on the quality of subsequent query answers. More specifically, our techniques guarantee against false negatives and limit the number of false positives produced. We implement a fully working prototype and illustrate that the proposed ideas work really well in practice for real datasets.