InfoRadar@UPRM
InfoRadar™ was designed by the Programming Systems Research Group of the
Laboratory for Computer Science at the Massachusetts Institute of Technology,
and is now being updated by the Advanced Data Management Research Group of the
PRECISE Project at the University of Puerto Rico at Mayagüez.
InfoRadarML: A Multi-Lingual Information Discovery Tool Exploiting Automatic
Document Categorization
The central hypothesis of this work is that retrieval effectiveness over
multilingual document collections can be improved by simultaneously providing
the search engine with human-translated multilingual queries, each identified
with its source language. InfoRadarML enhances InfoRadar by adding support for
multilingual queries and document collections.
Project Leader:
Dr. Bienvenido Velez-Rivera
Students: Jairo E. Valiente-Fernandez (graduate)
Lissete Toledo (undergraduate)
Description:
InfoRadar, an information retrieval system developed at the MIT Laboratory for Computer
Science, supports a novel user interaction model based on Visual Query Hierarchies.
Visual query hierarchies help users quickly focus on their particular information needs.
Formulating precise and effective queries in information retrieval systems has always been
a difficult task, even for experienced users. Several factors contribute to this problem.
The task of formulating an effective query requires the user to predict which terms appear
in documents relevant to the information need. This often requires extensive knowledge
about the document collection that is difficult to obtain in large document corpora.
In addition, users want to avoid retrieving irrelevant documents due to a query that
is under-specified or contains ambiguous terms. As a result, users of an information
retrieval system with a large corpus are often faced with the task of manually sifting
through very large and often inappropriate result sets.
This research proposes a possible solution to the critical problem of
retrieving and analyzing multilingual information resources on the University
of Puerto Rico website system, using InfoRadarML. The proposed prototype takes
into account the requirements and constraints of each supported language
(English and Spanish).
InfoRadarML enhances InfoRadar, our previous research prototype, by adding
support for multilingual queries and document collections. The main purpose of
the new enhancements is to allow users to formulate and process queries
containing terms in multiple languages. These queries are sent to the
InfoRadarML search engine, where they are matched against a collection of
documents that is also in multiple languages. The search engine then returns an
integrated result set containing all matching documents in the selected
languages. To fulfill this goal, we have focused our client-side implementation
effort along two lines: providing users with ways to type special characters
into their queries, and allowing users to submit multilingual queries to the
search engine. On the server side, InfoRadarML has been enhanced with a
multilingual indexing module capable of automatically tagging documents with
their source language and, based on this tag, conducting feature extraction
using language-specific algorithms. The search engine proper has also been
enhanced with multilingual query processing capabilities.
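The server-side flow described above (tag each document with its source
language, extract features with a language-specific routine, then match
language-tagged queries and merge the results) can be illustrated with a
minimal sketch. It is not the InfoRadarML implementation: the stopword lists,
the stopword-count language detector, and the names detect_language,
extract_features, and MultilingualIndex are all assumptions made for
illustration only.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword lists (illustration only).
STOPWORDS = {
    "en": {"the", "of", "and", "in", "to", "is", "for", "a"},
    "es": {"el", "la", "de", "y", "en", "los", "las", "para", "es", "un", "una"},
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def detect_language(text):
    """Guess the language by counting stopword hits per language.

    A stand-in for a real language identifier; ties go to the first
    language listed in STOPWORDS.
    """
    tokens = tokenize(text)
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


def extract_features(text, lang):
    """Language-specific feature extraction, here only stopword removal."""
    return [t for t in tokenize(text) if t not in STOPWORDS[lang]]


class MultilingualIndex:
    """Inverted index whose postings are keyed by (language, term)."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.doc_lang = {}

    def add(self, doc_id, text):
        """Tag the document with its language, then index its features."""
        lang = detect_language(text)
        self.doc_lang[doc_id] = lang
        for term in extract_features(text, lang):
            self.postings[(lang, term)].add(doc_id)
        return lang

    def search(self, query_by_lang):
        """Match each language-tagged query and merge into one result set.

        query_by_lang maps a language code to the query text in that language.
        """
        results = set()
        for lang, query in query_by_lang.items():
            for term in extract_features(query, lang):
                results |= self.postings.get((lang, term), set())
        return sorted(results)


index = MultilingualIndex()
index.add("d1", "The library of the university is open to the public")
index.add("d2", "La biblioteca de la universidad es para los estudiantes")
hits = index.search({"en": "library", "es": "biblioteca"})
```

Keying postings by (language, term) keeps a term that is spelled the same in
two languages from matching documents in the wrong language, which is one
plausible reason for identifying each query with its source language.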
Experimental Plan:
- Implement an InfoRadar site that indexes all website data at UPRM
- Develop multilingual support
- Make InfoRadar the official search engine for the UPRM web site
- Conduct usability study
- Analyze real user feedback
- Incorporate feedback into an improved design