MANILA – The Korean government is funding a language project that will build corpus data to train artificial intelligence (AI) to accurately translate Korean into Philippine Tagalog.
The project, which will also build corpora for seven other foreign languages, is being funded by the National Institute of the Korean Language (NIKL).
Aside from helping develop a Korean-Filipino automatic translation tool, the corpus project aims to bridge gaps in Korean-Filipino language research.
“English cannot substitute Filipino because Filipinos speak Philippine languages. We chose Tagalog because there’s already ample data about English language so we are looking at different languages, which are also more relevant to Korean society,” University of the Philippines Korea Research Center (UP KRC) Director and head of the project’s review team Kyungmin Bae said in an interview on Tuesday night.
A corpus is a collection of authentic language data that can be used to train AI or translation engines, which in turn allow users to translate sentences from one language to another.
This means that the larger the amount of data collected, the higher the quality of translation.
While Philippine Tagalog is spoken by millions of people across the world, the language has relatively little translation corpus data, making high-quality automatic translation difficult to achieve.
The first phase of the Korean-Foreign Language Parallel Corpus Project began in 2021 and resulted in a parallel corpus of eight million words translated from Korean into Philippine Tagalog, Vietnamese, Indonesian, Thai, Indian Hindi, Cambodian, Russian and Uzbek.
The second phase expanded the scale of the data to build a parallel corpus of 10 million words, while phase three, running from May to December 2023, is building 11.04 million words.
The corpus data from phase one is now accessible to the public for free (https://corpus.korean.go.kr/request/reausetMain.do?lang=en) and has already been used to produce two research papers at the University of the Philippines, both published in KCI (Korea Citation Index) journals.
The program is in line with the Korean government’s aim to further strengthen trade, cultural and people-to-people relations with the Philippines as the two states mark 75 years of diplomatic relations in 2024.
In the long term, Kyung Hee University Professor and project convenor Jung Hee Lee sees the initiative as helping narrow the cultural distance while increasing economic engagement between the Philippines and Korea.
Aside from the academe, the program expects industries to benefit from accurate translations, which would allow them to create manuals for their businesses. This is a timely development, as the number of South Korean firms doing business in the Philippines is expected to increase as the two states finalize a bilateral trade deal.
Lee, together with Dr. Ji Yeon Jeon of Kangwon National University, the Tagalog researcher for the NIKL Korean-Foreign Language Parallel Corpus Project, visited the Philippines from Sept. 3 to 6 to assess the program’s progress and meet with Filipino counterparts.
They also gave a lecture at the UP Department of Linguistics, paid courtesy calls at the Komisyon sa Wikang Filipino, the UP Office of the Chancellor, and the UP Sentro ng Wikang Filipino, and met with linguists from various fields to introduce the program and explore ways to cooperate in further developing the Philippine Tagalog corpus.
Supervising the project from the Philippine side are UP Department of Linguistics faculty Dr. Aldrin Lee and Kyungmin Bae. (PNA)