Framework for Improving English to Hindi Rule-Based Translation System

Authors

  • Seema Shukla, JSS Academy of Technical Education

Keywords:

Natural Language Processing (NLP), Machine Translation (MT), Rule-Based Machine Translation (RBMT), Morphological Analyzer (MA), Lexical Resource (LR)

Abstract

At present, the information captured through morpheme-, lexeme-, or word-based morphological analysis of a word, or of the words in a phrase, is not sufficient for Natural Language Processing (NLP) systems, because words can carry different meanings individually and in groups. Since some phrases are well structured, a sentence-level morphological analyzer provides an effective knowledge base for NLP. This paper presents work on phrase-level and word-level morphological analyzers for the English-Hindi language pair in the vibrant tourism domain. The proposed approach identifies unique sentence structures capable of representing the complete targeted corpus. First, the available corpus is used to analyze sentence structures with the help of existing and purpose-built software tools, which provide necessary information such as the occurrence of "groups of words". These groups of words are then classified into grammatical categories, their behavior in a rule-based machine translation system is studied, the divergence between human and machine interpretation is identified, and suitable rules are derived to reduce that divergence. The intelligence captured in this way can serve as a knowledge base for NLP systems.
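
The first step described above, finding how often each "group of words" occurs in a corpus, can be illustrated with a minimal Python sketch. This is not the paper's tooling: the function names, the regular-expression tokenizer, and the three sample tourism sentences are all assumptions made for illustration, and a group of words is simplified here to a contiguous run of a fixed number of words.

```python
from collections import Counter
import re


def word_groups(sentence, size):
    """Return every contiguous group of `size` words in a sentence, lowercased."""
    # Hypothetical tokenizer: keeps runs of letters and apostrophes only.
    words = re.findall(r"[a-z']+", sentence.lower())
    return [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]


def count_word_groups(corpus, size=2):
    """Count occurrences of each word group across all sentences in the corpus."""
    counts = Counter()
    for sentence in corpus:
        counts.update(word_groups(sentence, size))
    return counts


# Illustrative sentences only; they are not taken from the paper's corpus.
corpus = [
    "The Taj Mahal is in Agra.",
    "The Taj Mahal attracts many tourists.",
    "Many tourists visit Agra in winter.",
]

counts = count_word_groups(corpus, size=2)
for group, frequency in counts.most_common(3):
    print(" ".join(group), frequency)
```

Frequently recurring groups found this way would be the candidates for grammatical classification and for rule writing in the later steps; a real system would likely need a proper tokenizer and phrase chunker rather than fixed-length windows.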

Author Biography

Seema Shukla
JSS Academy of Technical Education, Noida, India
seemashukla@gmail.com

Published

2021-12-14

How to Cite

Seema Shukla. (2021). Framework for Improving English to Hindi Rule-Based Translation System. Linguistics International Journal, 15(2), 70–95. Retrieved from https://connect.academics.education/index.php/lij/article/view/155

Section

Articles