The Department of Electronics and Information Technology (DeitY) has launched an Internet search engine, Sandhan, which lets users search for tourism-related information across websites in five languages: Bengali, Hindi, Marathi, Tamil and Telugu.
How does it work?
Users enter a query in one of the five languages. Sandhan then retrieves a set of relevant documents in the chosen language from crawled tourism-domain data and presents them to the user as a list ordered by relevance.
– Users can submit a query using either the InScript keyboard or the phonetic keyboard. With InScript, users can type on a physical keyboard in that layout or use an onscreen keyboard.
– Sandhan processes the query according to its language and returns results only in the chosen language.
– A snippet generated for each retrieved document helps the user understand the context in which the query terms appear in that document.
– Many Indian-language web pages use custom fonts, which makes their content difficult to search. Sandhan uses a font transcoder that converts text in these custom fonts to Unicode for processing.
– A summary is produced for each retrieved document, giving users an idea of its overall content without opening it.
– An additional URL-based semantic search facility is provided for Tamil.
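The retrieval steps described above (restrict to the chosen language, order by relevance, show a snippet around the query terms) can be illustrated with a small Python sketch. This is not Sandhan's actual code, which the article does not describe: the `Document` class, the language codes `"hi"` and `"ta"`, the example URLs and the term-count scoring are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Document:
    url: str        # hypothetical example URL
    language: str   # assumed language tag, e.g. "hi" for Hindi
    text: str       # page text, assumed to be already in Unicode


def score(query_terms, doc):
    """Toy relevance score: how often the query terms occur in the text."""
    words = doc.text.split()
    return sum(words.count(term) for term in query_terms)


def snippet(query_terms, doc, window=3):
    """Return a few words of context around the first matching query term."""
    words = doc.text.split()
    for i, word in enumerate(words):
        if word in query_terms:
            start = max(0, i - window)
            return " ".join(words[start:i + window + 1])
    return ""


def search(query, language, corpus):
    """Return (url, snippet) pairs in the chosen language, best match first."""
    terms = query.split()
    candidates = [d for d in corpus if d.language == language]
    scored = [(score(terms, d), d) for d in candidates]
    hits = [(s, d) for s, d in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [(d.url, snippet(terms, d)) for _, d in hits]


corpus = [
    Document("http://example.org/a", "hi", "ताजमहल आगरा में है और ताजमहल बहुत सुंदर है"),
    Document("http://example.org/b", "hi", "आगरा का किला ताजमहल के पास है"),
    Document("http://example.org/c", "ta", "மதுரை மீனாட்சி கோயில்"),
    Document("http://example.org/d", "hi", "जयपुर गुलाबी शहर है"),
]

results = search("ताजमहल", "hi", corpus)
for url, context in results:
    print(url, "|", context)
```

A production engine would replace the raw term count with a proper ranking function and would tokenize Indic text more carefully than whitespace splitting does; the sketch only shows how the language filter, the ordering and the snippet fit together.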
Sandhan was developed over a period of six years by 120 researchers from 12 institutions, including IIT Bombay, CDAC Noida, IIT Kharagpur, CDAC Pune and the Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar. The project is led by Dr. Pushpak Bhattacharya under the Technology Development for Indian Languages (TDIL) programme of DeitY.