Please use this identifier to cite or link to this item: http://dspace.library.iitb.ac.in/jspui/handle/10054/12718

Title: Synset Based Multilingual Dictionary: Insights, Applications and Challenges
Authors: MOHANTY, RK
BHATTACHARYYA, P
KALELE, S
PANDEY, P
SHARMA, A
KOPRA, M
Issue Date: 2007
Publisher: UNIV SZEGED, DEPT INFORMATICS
Citation: GWC 2008: FOURTH GLOBAL WORDNET CONFERENCE, PROCEEDINGS, 321-333
Abstract: In this paper, we report our effort at the standardization, design and partial implementation of a multilingual dictionary in the context of three large-scale projects, viz., (i) Cross Lingual Information Retrieval, (ii) English to Indian Language Machine Translation, and (iii) Indian Language to Indian Language Machine Translation. These projects are large scale because each involves 8-10 partners spread across the length and breadth of India, with a great amount of language diversity. The dictionary is based not on words but on WordNet synsets, i.e., concepts. An identical dictionary architecture is used for all three projects, where source to target language transfer is initiated by concept to concept mapping. The whole dictionary can be looked upon as an M × N matrix, where M is the number of synsets (rows) and N is the number of languages (columns). This architecture maps the lexeme(s) of one language that stand for a concept to the lexeme(s) of other languages that stand for the same concept. In actual usage, a preliminary word sense disambiguation (WSD) step identifies the correct row for a word, and then a lexical choice procedure identifies the correct target word from the corresponding synset. Currently the multilingual dictionary is being developed for 11 languages: English, Hindi, Bengali, Marathi, Punjabi, Urdu, Tamil, Kannada, Telugu, Malayalam and Oriya. Our work with this framework has made us aware of many benefits of this multilingual concept-based scheme over language pair-wise dictionaries. The pivot synsets, with which all other languages link, come from Hindi. Interesting insights emerge and challenges are faced in dealing with linguistic and cultural diversities. Economy of representation is achieved on many fronts and at many levels.
We have been eminently assisted by our long-standing experience in building the WordNets of two major languages of India, viz., Hindi and Marathi, which rank 5th (approximately 500 million) and 14th (approximately 70 million) respectively in the world in terms of the number of people speaking these languages.
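
The matrix view described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class and method names, the toy glosses, and the romanized lexemes are all assumptions made for illustration. The gloss-overlap step is only a stand-in for a real WSD component, and taking the first lexeme is a placeholder for the lexical choice procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class SynsetRow:
    """One row of the M x N matrix: a concept and its lexemes per language."""
    synset_id: int
    gloss: str
    # language name -> lexemes expressing this concept (hypothetical layout)
    lexemes: Dict[str, List[str]] = field(default_factory=dict)


class SynsetDictionary:
    """Toy concept-based dictionary: rows are synsets, columns are languages."""

    def __init__(self, languages: Sequence[str]) -> None:
        self.languages = list(languages)
        self.rows: Dict[int, SynsetRow] = {}

    def add(self, row: SynsetRow) -> None:
        unknown = set(row.lexemes) - set(self.languages)
        if unknown:
            raise ValueError(f"unknown languages: {sorted(unknown)}")
        self.rows[row.synset_id] = row

    def candidate_rows(self, word: str, lang: str) -> List[SynsetRow]:
        """All rows whose cell for `lang` contains `word`."""
        return [r for r in self.rows.values() if word in r.lexemes.get(lang, [])]

    def translate(self, word: str, src: str, tgt: str,
                  context: Sequence[str] = ()) -> Optional[str]:
        candidates = self.candidate_rows(word, src)
        if not candidates:
            return None
        # Stand-in for WSD (assumption): pick the row whose gloss shares
        # the most words with the context; ties go to the first row added.
        ctx = {w.lower() for w in context}
        row = max(candidates,
                  key=lambda r: len(ctx & set(r.gloss.lower().split())))
        # Placeholder for lexical choice (assumption): take the first lexeme.
        targets = row.lexemes.get(tgt, [])
        return targets[0] if targets else None


# Toy data, invented for illustration only.
demo = SynsetDictionary(["english", "hindi"])
demo.add(SynsetRow(1, "financial institution that accepts deposits",
                   {"english": ["bank"], "hindi": ["baink"]}))
demo.add(SynsetRow(2, "sloping land beside a river",
                   {"english": ["bank", "shore"], "hindi": ["kinaaraa", "tat"]}))

print(demo.translate("bank", "english", "hindi", context=["river", "water"]))
print(demo.translate("bank", "english", "hindi", context=["money", "deposits"]))
```

Because every language links to the same row, adding a language means filling one more column rather than building a separate dictionary for each language pair.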
URI: http://dspace.library.iitb.ac.in/xmlui/handle/10054/12718
http://hdl.handle.net/10054/12718
Appears in Collections: Article

