Framework for Improving English to Hindi Rule-Based Translation System
Keywords:
Natural Language Processing (NLP), Machine Translation (MT), Rule Based Machine Translation (RBMT), Morphological Analyzer (MA), Lexical Resource (LR)
Abstract
At present, the information captured through morpheme-, lexeme-, or word-based morphological analysis of individual words, or of words in phrases, is not sufficient for Natural Language Processing (NLP) systems, because words can carry different meanings individually and in groups. Since some phrases are well structured, a sentence-level morphological analyzer can provide an effective knowledge base for NLP. This paper presents work on phrase-level and word-level morphological analyzers for the English-Hindi language pair in the vibrant tourism domain. The proposed approach identifies unique sentence structures capable of representing the complete targeted corpus. First, the available corpus is analyzed for sentence structure using existing and newly developed software tools, which provide information such as the occurrence of each “group of words”. These groups of words are then classified into grammatical categories, their behavior in a rule-based machine translation system is studied, the divergence between human and machine interpretation is identified, and suitable rules are formulated to reduce that divergence. The knowledge captured in this way can serve as a knowledge base for NLP systems.
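The first two steps of this approach, counting occurrences of word groups and collapsing sentences into unique structures, might be sketched as follows. This is a minimal illustration, not the tooling used in the paper: the toy corpus, the hand-assigned coarse part-of-speech tags, and the function names `word_groups` and `structure` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy corpus (not from the paper): each sentence is a list of
# (word, coarse part-of-speech tag) pairs, tagged by hand for illustration.
CORPUS = [
    [("the", "DET"), ("taj", "PROPN"), ("mahal", "PROPN"), ("is", "AUX"), ("beautiful", "ADJ")],
    [("the", "DET"), ("red", "PROPN"), ("fort", "PROPN"), ("is", "AUX"), ("old", "ADJ")],
    [("tourists", "NOUN"), ("visit", "VERB"), ("the", "DET"), ("taj", "PROPN"), ("mahal", "PROPN")],
]


def word_groups(sentence, n):
    """Return every run of n consecutive words in the sentence as a tuple."""
    words = [word for word, _ in sentence]
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def structure(sentence):
    """Reduce a sentence to its sequence of tags, ignoring the words themselves."""
    return tuple(tag for _, tag in sentence)


# How often does each two-word group occur across the corpus?
group_counts = Counter(g for s in CORPUS for g in word_groups(s, 2))

# How many sentences share each tag-sequence structure?
structure_counts = Counter(structure(s) for s in CORPUS)

if __name__ == "__main__":
    for group, count in group_counts.most_common(3):
        print(" ".join(group), count)
    print(len(structure_counts), "unique structures for", len(CORPUS), "sentences")
```

In this sketch, sentences that differ only in vocabulary collapse to the same structure, which is the sense in which a small set of unique structures could represent a larger corpus. A real system would need a proper tagger or chunker and a richer notion of structure than a flat tag sequence.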
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.