Information Retrieval on Online Documents

Article

Publication date

03.02.2010

Pubid

12.01.2010

Reference

Conference article. Workshop on Knowledge Representation & Retrieval. Swiss AI Society. Zurich, 12.01.2010.

Authors

 

Information retrieval (IR) is a key technology for knowledge management because it provides access to large corpora of unstructured data. Various approaches have been proposed for designing IR systems that extract information from documents quickly and consistently. In this paper, we propose a three-phase architecture that we call the document refinement system (DRS). In the first phase, the system classifies the document as belonging to a specific domain. In the second phase, it scans the document and extracts the sentences relevant to templates that correspond to different aspects of that domain. In the third phase, it fills the templates through syntactic and semantic analysis of the extracted sentences. We report a series of experiments indicating that the system achieves high recall and precision.
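The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how classification, sentence selection, or template filling are carried out, so this sketch assumes keyword overlap for the domain classifier, a trigger word for sentence selection, and a regular expression in place of syntactic and semantic analysis. The names `DOMAIN_KEYWORDS`, `TEMPLATES`, `classify`, `extract_relevant`, `fill_templates`, and `refine`, as well as the example domains and slots, are all hypothetical.

```python
import re

# Hypothetical domain vocabularies (assumption: the paper's classifier is not specified).
DOMAIN_KEYWORDS = {
    "finance": {"shares", "acquired", "profit", "merger"},
    "sports": {"match", "goal", "league", "scored"},
}

# Hypothetical templates per domain. Each template has a trigger word used to
# select sentences and a regex whose named groups are the template slots.
TEMPLATES = {
    "finance": {
        "acquisition": {
            "trigger": "acquired",
            "pattern": re.compile(r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+)"),
        },
    },
    "sports": {
        "goal": {
            "trigger": "scored",
            "pattern": re.compile(r"(?P<player>[A-Z]\w+) scored"),
        },
    },
}


def classify(document):
    """Phase 1: pick the domain whose keywords overlap most with the document."""
    tokens = set(re.findall(r"[a-z]+", document.lower()))
    scores = {domain: len(tokens & words) for domain, words in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_relevant(document, domain):
    """Phase 2: collect, per template, the sentences containing its trigger word."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    relevant = {}
    for name, template in TEMPLATES[domain].items():
        relevant[name] = [s for s in sentences if template["trigger"] in s.lower()]
    return relevant


def fill_templates(relevant, domain):
    """Phase 3: fill template slots from the selected sentences.

    A regex stands in for the syntactic and semantic analysis named in the abstract.
    """
    records = []
    for name, sentences in relevant.items():
        pattern = TEMPLATES[domain][name]["pattern"]
        for sentence in sentences:
            match = pattern.search(sentence)
            if match:
                records.append({"template": name, **match.groupdict()})
    return records


def refine(document):
    """Run the three phases in order and return the domain and filled templates."""
    domain = classify(document)
    if domain is None:
        return {"domain": None, "records": []}
    relevant = extract_relevant(document, domain)
    return {"domain": domain, "records": fill_templates(relevant, domain)}
```

Calling `refine("Acme acquired Globex for a large sum. The weather was mild.")` would classify the text as the hypothetical "finance" domain, keep only the first sentence for the "acquisition" template, and fill its `buyer` and `target` slots. Keeping each phase in its own function mirrors the staged architecture: only sentences that pass the cheaper selection step reach the more expensive analysis step.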