Cyrillic Mongolian-to-traditional Mongolian conversion method based on the transformer

Authors

Muhan Na, Feilong Bao, Weihua Wang, Guanglai Gao, Uuganbaatar Dulamragchaa

DOI:

https://doi.org/10.5564/jimdt.v6i1.3599

Keywords:

Neural Network, Self-Attention, Mongolian translation

Abstract

Cyrillic Mongolian and Traditional Mongolian are used primarily in Mongolia and China. Converting Cyrillic Mongolian to Traditional Mongolian (C2T) plays a vital role in facilitating communication between Mongolian speakers in the two countries and holds significant importance for their scientific, economic, and cultural exchanges. Mongolian words are formed from stems and suffixes, which yields a very large vocabulary and, consequently, many out-of-vocabulary (OOV) words. The conversion of OOV words cannot be handled effectively by rules and dictionaries alone. Hence, this paper presents a Transformer-based approach for Cyrillic Mongolian to Traditional Mongolian conversion. Experimental results demonstrate a 5.72% reduction in word error rate (WER) compared to the joint-sequence approach.
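To make the described approach concrete, the sketch below shows how a character-level Transformer encoder-decoder of the kind outlined in the abstract could be set up for C2T conversion with teacher forcing. It is an illustrative assumption, not the authors' implementation: the class name C2TTransformer, all hyperparameters, and the toy character IDs are placeholders.

```python
# Minimal, illustrative sketch (not the paper's implementation) of a
# character-level Transformer encoder-decoder for Cyrillic-to-Traditional
# Mongolian (C2T) conversion. Vocabulary sizes, hyperparameters, and the
# toy character IDs below are assumptions for readability.
import torch
import torch.nn as nn

PAD, BOS, EOS = 0, 1, 2

class C2TTransformer(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4,
                 num_layers=3, dim_ff=512, dropout=0.1):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model, padding_idx=PAD)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model, padding_idx=PAD)
        self.pos = nn.Embedding(512, d_model)  # learned positional embeddings
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def embed(self, ids, table):
        pos = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        return table(ids) + self.pos(pos)

    def forward(self, src, tgt):
        # Causal mask: each target position attends only to its left context.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.embed(src, self.src_emb),
                             self.embed(tgt, self.tgt_emb),
                             tgt_mask=tgt_mask,
                             src_key_padding_mask=(src == PAD),
                             tgt_key_padding_mask=(tgt == PAD))
        return self.out(h)

# Toy usage: one (Cyrillic, Traditional) character-ID pair, teacher forcing.
model = C2TTransformer(src_vocab=100, tgt_vocab=100)
src = torch.tensor([[5, 8, 9, 3, EOS]])   # Cyrillic Mongolian characters
tgt = torch.tensor([[BOS, 7, 6, 4, EOS]]) # Traditional Mongolian characters
logits = model(src, tgt[:, :-1])          # predict the next target character
loss = nn.CrossEntropyLoss(ignore_index=PAD)(
    logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
loss.backward()
```

In practice such a model would be trained on parallel Cyrillic/Traditional Mongolian word pairs and decoded with beam search; WER would then be computed as the edit distance between the predicted and reference Traditional Mongolian sequences divided by the reference length.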


Author Biographies

Muhan Na, College of Computer Science, Inner Mongolia University, Hohhot 010021, China

Inner Mongolia Key Laboratory of Mongolian Information Processing Technology, Inner Mongolia University, Hohhot 010021, China

National Local Joint Engineering Research Center of Intelligent Information Processing Technology for Mongolian, Inner Mongolia University, Hohhot 010021, China

Feilong Bao, College of Computer Science, Inner Mongolia University, Hohhot 010021, China

Inner Mongolia Key Laboratory of Mongolian Information Processing Technology, Inner Mongolia University, Hohhot 010021, China

National Local Joint Engineering Research Center of Intelligent Information Processing Technology for Mongolian, Inner Mongolia University, Hohhot 010021, China

Weihua Wang, College of Computer Science, Inner Mongolia University, Hohhot 010021, China

Inner Mongolia Key Laboratory of Mongolian Information Processing Technology, Inner Mongolia University, Hohhot 010021, China

National Local Joint Engineering Research Center of Intelligent Information Processing Technology for Mongolian, Inner Mongolia University, Hohhot 010021, China

Guanglai Gao, College of Computer Science, Inner Mongolia University, Hohhot 010021, China

Inner Mongolia Key Laboratory of Mongolian Information Processing Technology, Inner Mongolia University, Hohhot 010021, China

National Local Joint Engineering Research Center of Intelligent Information Processing Technology for Mongolian, Inner Mongolia University, Hohhot 010021, China

References

[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

[2] Feilong Bao, Guanglai Gao, Hongwei Wang, and Min Lu. Combining of rules and statistics for Cyrillic Mongolian to Traditional Mongolian conversion. 31(3):156, 2017.

[3] Feilong Bao, Guanglai Gao, Xueliang Yan, and Hongxi Wei. Research on conversion approach between Traditional Mongolian and Cyrillic Mongolian. Computer Engineering and Applications, pages 206–211, 2014.

[4] Maximilian Bisani and Hermann Ney. Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 50(5):434–451, 2008, https://doi.org/10.1016/j.specom.2008.01.002

[5] Chaoluomeng. Modern Mongolian. Inner Mongolia People’s Publishing House, Hohhot, 2009.

[6] Chinggaltai. A grammar of the Mongolian language. Inner Mongolia People’s Publishing House, Hohhot, 1991.

[7] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

[8] She Chuma. A comparative study of Mongolian and Cyrillic orthography. Inner Mongolia Education Press, Hohhot, 2010.

[9] Uuganbaatar Dulamragchaa, Sodoo Chadraabai, Byambasuren Ivanov, and Munkhbayar Baatarkhuu. Mongolian language morphology and its database structure. In 2017 International Conference on Green Informatics (ICGI), pages 282–285. IEEE, 2017, https://doi.org/10.1109/ICGI.2017.56

[10] Galasamponsige. Cyrillic Mongolian Learning Book. Inner Mongolia Education Press, Hohhot, 2006.

[11] Rui Liu, Feilong Bao, Guanglai Gao, Hui Zhang, and Yonghe Wang. Improving Mongolian phrase break prediction by using syllable and morphological embeddings with BiLSTM model. In Interspeech, pages 57–61, 2018.

[12] Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, and Haizhou Li. Modeling prosodic phrasing with multi-task learning in Tacotron-based TTS. IEEE Signal Processing Letters, 27:1470–1474, 2020, https://doi.org/10.1109/LSP.2020.3016564

[13] Rui Liu, Berrak Sisman, Feilong Bao, Jichen Yang, Guanglai Gao, and Haizhou Li. Exploiting morphological and phonological features to improve prosodic phrasing for Mongolian speech synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:274–285, 2020, https://doi.org/10.1109/TASLP.2020.3040523

[14] Rui Liu, Berrak Sisman, Jingdong Li, Feilong Bao, Guanglai Gao, and Haizhou Li. Teacher-student training for robust Tacotron-based TTS. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6274–6278. IEEE, 2020.

[15] Min Lu, Feilong Bao, and Guanglai Gao. Language model for Mongolian polyphone proofreading. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, pages 461–471. Springer, 2017, https://doi.org/10.1007/978-3-319-69005-6_38

[16] Martin Popel and Ondřej Bojar. Training tips for the transformer model. arXiv preprint arXiv:1804.00247, 2018, https://doi.org/10.2478/pralin-2018-0002

[17] Kanishka Rao, Fuchun Peng, Hasim Sak, and Francoise Beaufays. Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4225–4229. IEEE, 2015.

[18] Bayar Saihan. Mongolian Dictionary (Cyrillic and Traditional Mongolian Contrastive Dictionary). Suoyongbu Printing Press, Hohhot, 2011.

[19] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27, 2014.

[20] Uuganbaatar.D. Research on Cyrillic and Mongolian script’s morphology and conversion system. PhD thesis, Inner Mongolia University, 2014.

[21] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

[22] Yonghe Wang, Feilong Bao, Hui Zhang, and Guanglai Gao. Joint alignment learning-attention based model for grapheme-to-phoneme conversion. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7785–7792. IEEE, 2021, https://doi.org/10.1109/ICASSP39728.2021.9413679


[23] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, et al. Tacotron: Towards end-to-end speech synthesis. arXiv preprint arXiv:1703.10135, 2017, https://doi.org/10.21437/Interspeech.2017-1452

[24] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.

[25] Zhizhong Zhang. New Mongolian Chinese Dictionary. Commercial Press, Beijing, 2011.

Published

2024-12-27

How to Cite

Na, M., Bao, F., Wang, W., Gao, G., & Dulamragchaa, U. (2024). Cyrillic Mongolian-to-traditional Mongolian conversion method based on the transformer. Journal of Institute of Mathematics and Digital Technology, 6(1), 120–129. https://doi.org/10.5564/jimdt.v6i1.3599

Issue

Section

Articles