Information Retrieval on Online Documents


Information retrieval (IR) is a key technology for knowledge management, providing access to large corpora of unstructured data. Various approaches have been proposed for designing an IR system that extracts information from documents quickly and consistently. In this paper, we propose a three-phase architecture, which we call the document refinement system (DRS). The system first classifies a document as belonging to a specific domain. It then scans the document and extracts the sentences relevant to templates that correspond to several aspects of that domain. Finally, it fills the templates through syntactic and semantic analysis of these sentences. We conduct a series of experiments to show that the system achieves high recall and precision.
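
The three phases described above can be illustrated with a minimal Python sketch. It is not the DRS implementation: the abstract does not specify the classifier, the templates, or the analysis methods, so everything here is an assumption made for illustration. Keyword overlap stands in for domain classification, keyword matching for sentence extraction, and a regular expression for the syntactic and semantic analysis. The names `DOMAIN_KEYWORDS`, `TEMPLATES`, `classify_domain`, `extract_relevant`, `fill_templates`, and `refine` are all hypothetical.

```python
import re

# Hypothetical domain vocabularies; the paper does not describe its classifier.
DOMAIN_KEYWORDS = {
    "finance": {"shares", "profit", "revenue", "acquired"},
    "medicine": {"patient", "dose", "treatment", "symptoms"},
}

# Hypothetical templates: each is a regex whose named groups are the slots.
# A regex is only a stand-in for real syntactic and semantic analysis.
TEMPLATES = {
    "finance": {
        "acquisition": re.compile(
            r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+) "
            r"for (?P<price>\$[\d.]+ (?:million|billion))"
        ),
    },
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_domain(text):
    """Phase 1: pick the domain whose keyword set overlaps the text most."""
    tokens = set(tokenize(text))
    scores = {d: len(tokens & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_relevant(text, domain):
    """Phase 2: keep sentences that mention at least one domain keyword."""
    keywords = DOMAIN_KEYWORDS[domain]
    return [s for s in split_sentences(text) if keywords & set(tokenize(s))]


def fill_templates(sentences, domain):
    """Phase 3: fill template slots from the sentences that match a pattern."""
    filled = []
    for sentence in sentences:
        for name, pattern in TEMPLATES.get(domain, {}).items():
            match = pattern.search(sentence)
            if match:
                filled.append({"template": name, **match.groupdict()})
    return filled


def refine(text):
    """Run the three phases in order; return (domain, filled templates)."""
    domain = classify_domain(text)
    if domain is None:
        return None, []
    sentences = extract_relevant(text, domain)
    return domain, fill_templates(sentences, domain)


if __name__ == "__main__":
    doc = ("Acme acquired Globex for $2.5 billion. "
           "The weather was mild. Revenue rose sharply.")
    print(refine(doc))
```

In this sketch each phase narrows the input for the next: classification selects which templates apply, extraction discards sentences that mention no domain keyword, and only the remaining sentences are analysed for slot values.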

Conference article. Workshop on Knowledge Representation & Retrieval. Swiss AI Society. 12.01.2010 Zurich