Research

Cross-Lingual Transfer for Distantly Supervised and Low-resources Indonesian NER

20 Jan 2019

Abstract

Manually annotated corpora for low-resource languages are usually either small (gold) or large but distantly supervised (silver). Inspired by recent progress in applying pre-trained language models (LMs) to many Natural Language Processing (NLP) tasks, we propose fine-tuning a language model pre-trained on a high-resource language to a low-resource language to improve performance in both scenarios. Our experiments demonstrate significant improvements when fine-tuning the pre-trained LM in the cross-lingual transfer scenario on the small gold corpus, and competitive results on the large silver corpus compared to supervised cross-lingual transfer, which is useful when no parallel annotation for the same task is available to begin with.
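To make the transfer recipe concrete, here is a minimal sketch in PyTorch, assuming a character-level bidirectional LSTM LM as the shared encoder. All class names, dimensions, and the checkpoint path are hypothetical illustrations, not the paper's actual implementation.

```python
# A minimal sketch of the transfer recipe described above. The model
# classes and the checkpoint file are hypothetical, for illustration only.
import torch
import torch.nn as nn

class CharBiLM(nn.Module):
    """Character-level bidirectional LM used as a shared feature extractor."""
    def __init__(self, n_chars=256, char_dim=50, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden, batch_first=True,
                              bidirectional=True)

    def forward(self, char_ids):                    # (batch, seq)
        out, _ = self.bilstm(self.embed(char_ids))  # (batch, seq, 2*hidden)
        return out

class NERTagger(nn.Module):
    """NER classification head on top of the (transferred) LM encoder."""
    def __init__(self, lm, n_tags, hidden=256):
        super().__init__()
        self.lm = lm
        self.proj = nn.Linear(2 * hidden, n_tags)

    def forward(self, char_ids):
        return self.proj(self.lm(char_ids))         # (batch, seq, n_tags)

# 1) Load an LM pre-trained on a high-resource language (e.g. English).
lm = CharBiLM()
# lm.load_state_dict(torch.load("english_char_lm.pt"))  # hypothetical checkpoint

# 2) Fine-tune the same weights on unlabeled Indonesian text with the LM
#    objective, then 3) train the NER head on the gold or silver corpus.
tagger = NERTagger(lm, n_tags=9)  # e.g. BIO tags for PER/LOC/ORG plus O
optimizer = torch.optim.Adam(tagger.parameters(), lr=1e-3)
```

The key design point is that steps 1 and 2 share the same encoder weights, so the NER head in step 3 starts from representations already adapted to Indonesian.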

We compare our proposed cross-lingual transfer method using a pre-trained LM against other transfer sources, such as a monolingual LM and Part-of-Speech (POS) tagging, on the downstream NER task for both the large silver and small gold datasets, exploiting the character-level input of a bidirectional language model.
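The character-level input mentioned above can be illustrated with a toy encoder, assuming UTF-8 bytes as the character vocabulary (an assumption for this sketch, not the paper's exact preprocessing). It reuses the `CharBiLM`/`NERTagger` classes from the sketch above and tags each character for simplicity, whereas a full model would pool characters into word-level representations.

```python
# Toy illustration of character-level input; byte vocabulary and padding
# scheme are assumptions made for this sketch.
def encode_chars(sentence, max_len=64):
    """Map a raw sentence to a padded tensor of character (byte) IDs."""
    ids = list(sentence.encode("utf-8"))[:max_len]
    ids += [0] * (max_len - len(ids))      # zero-pad to fixed length
    return torch.tensor([ids])             # shape: (1, max_len)

char_ids = encode_chars("Joko Widodo lahir di Surakarta.")
tag_logits = tagger(char_ids)              # (1, max_len, n_tags)
pred = tag_logits.argmax(-1)               # predicted tag id per character
```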

Download Full Paper here

South Quarter Building, Tower C, level 10
Jl. R.A Kartini Kav 8, South Jakarta, 12430
(+62)21 50982692 | business@kata.ai

© 2025 Kata.ai | All rights reserved.
