Dzongkha Next Words Prediction Using Bidirectional LSTM

Authors

  • Karma Wangchuk, College of Science and Technology (CST), Royal University of Bhutan.
  • Tandin Wangchuk, College of Science and Technology (CST), Royal University of Bhutan.
  • Tenzin Namgyel, Dzongkha Development Commission of Bhutan.

DOI:

https://doi.org/10.17102/bjrd.rub.se2.038

Keywords:

Dzongkha word prediction, Machine Learning, Recurrent Neural Network, Long Short-Term Memory, Bidirectional LSTM

Abstract

The Dzongkha Development Commission of Bhutan (DDC) is working to computerize Dzongkha, but doing so poses numerous challenges. Current technological support for Dzongkha is limited to printing, typing, and storage. Typing a single Dzongkha word requires several keystrokes, which makes typing Dzongkha tedious. This paper studies next-word prediction for Dzongkha, with the aim of further reducing keystrokes and making Dzongkha typing much faster. The dataset, curated by DDC, spans different genres and consists of 10,000 sentences with 4,820 unique words. From it, 52,150 sequences were generated using an N-gram method, and the text was then vectorized using embedding techniques. Several RNN-based models were evaluated for Dzongkha next-word prediction. A model with two Bi-LSTM layers and 512 hidden-layer neurons gave the best accuracy, 73.89%, with a loss of 1.0722.
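The abstract does not spell out how the 52,150 sequences were built. One common reading of an "N-gram method" for next-word prediction is to turn each sentence into its growing prefixes, left-pad them to a common length, and treat the last token of each prefix as the word to predict. The sketch below illustrates that reading under stated assumptions: sentences are already segmented into words (Dzongkha does not mark word boundaries with spaces), ID 0 is reserved for padding, and the function names `build_vocab`, `ngram_sequences`, and `pad_and_split` are hypothetical, not taken from the paper. The toy English tokens stand in for segmented Dzongkha words.

```python
from typing import Dict, List, Tuple


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign each unique word an integer ID in order of first appearance.
    Assumption: ID 0 is reserved for padding, so IDs start at 1."""
    vocab: Dict[str, int] = {}
    for words in sentences:
        for w in words:
            if w not in vocab:
                vocab[w] = len(vocab) + 1
    return vocab


def ngram_sequences(sentences: List[List[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """For a sentence w1..wn, emit the prefixes [w1, w2], [w1, w2, w3], ..., [w1..wn].
    A sentence of n words therefore yields n - 1 sequences."""
    seqs: List[List[int]] = []
    for words in sentences:
        ids = [vocab[w] for w in words]
        for end in range(2, len(ids) + 1):
            seqs.append(ids[:end])
    return seqs


def pad_and_split(seqs: List[List[int]]) -> Tuple[List[List[int]], List[int], int]:
    """Left-pad every sequence with 0 to the longest length, then split each
    row into (context words, next-word target)."""
    max_len = max(len(s) for s in seqs)
    padded = [[0] * (max_len - len(s)) + s for s in seqs]
    X = [row[:-1] for row in padded]  # context: all but the last position
    y = [row[-1] for row in padded]   # target: the word to predict
    return X, y, max_len


# Toy usage with placeholder tokens (hypothetical data, not the DDC corpus).
corpus = [["we", "go", "home"], ["we", "eat"]]
vocab = build_vocab(corpus)
X, y, max_len = pad_and_split(ngram_sequences(corpus, vocab))
print(X)  # [[0, 1], [1, 2], [0, 1]]
print(y)  # [2, 3, 4]
```

The resulting context rows and targets are the kind of input an embedding layer followed by stacked bidirectional LSTMs would consume. In Keras, for example, stacking two `Bidirectional(LSTM(512))` layers requires `return_sequences=True` on the first one so the second receives a full sequence; which framework and exact configuration the authors used is not stated on this page.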

Author Biographies

Karma Wangchuk, College of Science and Technology (CST), Royal University of Bhutan.

Karma Wangchuk is an Associate Lecturer in the Information Technology Department, College of Science and Technology (CST), Royal University of Bhutan. He received his Bachelor of Engineering in Information Technology from the College of Science and Technology, Rinchending, Phuentsholing, Royal University of Bhutan, and his Master of Engineering in Computer Engineering from Naresuan University, Phitsanulok, Thailand. His research interests include machine learning, computer vision, IoT, natural language processing, and big data. His past publications include work on next-syllable prediction for Dzongkha and on Bhutanese Sign Language recognition. He is also one of the recipients of the AURG2021-2022 research grant. His team is currently working on a CST-DDC project titled “English to Dzongkha Translation Using Neural Machine Translation”.

Tandin Wangchuk, College of Science and Technology (CST), Royal University of Bhutan.

Tandin Wangchuk is a lecturer at the College of Science and Technology (CST). He currently heads the Information Technology Department and, as Head of Department (HoD) and Programme Leader, oversees the smooth operation of the Bachelor of Engineering in Information Technology programme. He holds a Master of Information Technology (MIT) in Network Computing from the University of Canberra, Australia (2012), and a Bachelor of Computer Science from the University of New Brunswick, Canada (2007). He has worked at CST since 2008, holding various positions and responsibilities. He is an ardent advocate of programming and of motivating students to take a keen interest in technology.

Tenzin Namgyel, Dzongkha Development Commission of Bhutan

Mr. Tenzin Namgyel graduated with a B.Tech in Computer Science and Engineering from the National Institute of Technology, Warangal, India. He currently works as a Deputy Chief ICT Officer at the Dzongkha Development Commission, Thimphu, Bhutan. He is involved in the digitization of Dzongkha and its promotion through information technology. He also researches automation in language processing to provide easy and equal access to knowledge, information, and services for all sections of society, including marginalized groups, in the digital era.


Published

03-02-2023

How to Cite

Wangchuk, K., Wangchuk, T., & Namgyel, T. (2023). Dzongkha Next Words Prediction Using Bidirectional LSTM. Bhutan Journal of Research and Development, (2). https://doi.org/10.17102/bjrd.rub.se2.038


Section

Articles