Setu
Setu [Cross Lingual Information Retrieval/Document Access Across Languages]
The web is a critical and vast source of information in today's world. However, most of this information is in English, which is understood by less than 5% of the Indian population. Search engines, the primary means of finding information on the web, do not support querying foreign-language documents in Indian languages. DAAL (Document Access Across Languages) proposes to bridge this digital language divide by enabling a person to query the web for documents and obtain the results in an Indian language such as Hindi. It combines cross-lingual information retrieval, machine translation, transliteration and related techniques, and is built on top of existing search engines.
Setu is a realization of DAAL. It provides a user-friendly interface for entering a query in an Indian language (e.g. Hindi). The query is translated into English and sent to the search engine. The retrieved results are then passed to our machine translation system, MaTra, which translates them into the Indian language, and the translated results are displayed to the user. When the user clicks on a result to visit that page, the contents of the page are likewise translated using MaTra and displayed in the Indian language.
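The flow described above (translate the query, search in English, translate the results back) can be illustrated with a minimal Python sketch. It is not Setu's actual code: the names `SearchResult`, `cross_lingual_search` and the three `toy_*` functions are hypothetical, and the toy functions merely stand in for the query translator, the underlying search engine and MaTra, whose real interfaces are not described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchResult:
    title: str
    snippet: str
    url: str


def cross_lingual_search(
    query: str,
    to_english: Callable[[str], str],
    search: Callable[[str], List[SearchResult]],
    to_user_language: Callable[[str], str],
) -> List[SearchResult]:
    """Translate the query, search in English, translate the results back."""
    english_query = to_english(query)   # step 1: query translation
    hits = search(english_query)        # step 2: existing search engine
    # Step 3: translate title and snippet; the URL is left untouched.
    return [
        SearchResult(to_user_language(h.title), to_user_language(h.snippet), h.url)
        for h in hits
    ]


# Toy stand-ins (assumptions for illustration only).
def toy_to_english(query: str) -> str:
    # "mausam" is romanized Hindi for "weather".
    return {"mausam": "weather"}.get(query, query)


def toy_search(query: str) -> List[SearchResult]:
    return [SearchResult(f"{query} report", f"Latest {query} updates",
                         "http://example.com/1")]


def toy_to_hindi(text: str) -> str:
    # A real system would call a machine translation engine here.
    return "[hi] " + text


results = cross_lingual_search("mausam", toy_to_english, toy_search, toy_to_hindi)
for r in results:
    print(r.title, "|", r.snippet, "|", r.url)
```

Passing the three stages in as functions keeps the sketch independent of any particular search engine or translation system; translating a full page when the user clicks a result would reuse the same `to_user_language` step on the page contents.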
Clustering of Web Based Results (clustering support for Setu) is an offshoot of Setu. It adds a clustering facility to Setu by grouping the search results returned by the search engine. Because search engines return a huge number of results, grouping makes them easier to navigate, and it can also help identify the major discriminating senses or attributes of the query. Clustering search results differs from document clustering: the snippets (the short descriptions of each result returned by the search engine) are not necessarily complete sentences, and the information available in them is very limited.
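To make the idea of grouping short snippets concrete, here is a small Python sketch. It does not reproduce the method used in the project, which is not described here; it swaps in a simple single-pass clustering over Jaccard similarity of word sets. The function names, the stopword list and the `threshold` value are all assumptions chosen for illustration.

```python
import re
from typing import List, Set

# Small illustrative stopword list (assumption, not from the project).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "is", "with"}


def tokens(snippet: str, ignore: Set[str]) -> Set[str]:
    """Lowercased word set of a snippet, minus stopwords and ignored terms."""
    words = re.findall(r"[a-z0-9]+", snippet.lower())
    return {w for w in words if w not in STOPWORDS and w not in ignore}


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_snippets(snippets: List[str], query: str,
                     threshold: float = 0.2) -> List[List[int]]:
    """Single-pass clustering: each snippet joins the most similar cluster
    whose similarity reaches the threshold, otherwise it starts a new one."""
    # Query words occur in almost every snippet, so they do not discriminate.
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    clusters = []  # each cluster: {"terms": set of words, "members": indices}
    for i, snippet in enumerate(snippets):
        t = tokens(snippet, query_terms)
        best, best_sim = None, threshold
        for c in clusters:
            sim = jaccard(t, c["terms"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"terms": set(t), "members": [i]})
        else:
            best["terms"] |= t
            best["members"].append(i)
    return [c["members"] for c in clusters]


snippets = [
    "Jaguar cars luxury sedan price",
    "Jaguar big cat habitat rainforest",
    "New Jaguar luxury cars review",
    "Jaguar cat diet and habitat",
]
groups = cluster_snippets(snippets, query="jaguar")
print(groups)  # → [[0, 2], [1, 3]]
```

In this toy example the two groups correspond to the two senses of the query (the car and the animal). Real snippets are noisier and shorter on shared vocabulary, which is exactly the difficulty noted above.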
Staff members associated with project Setu: Dr. M Sasikumar, Chandrashekhar, Prashant More, Alok Dadhekar, Prakash Pimpale, Deepali Nemade, Mayank Madhav, Sarvesh Nikumbh, Aparna Mukherjee