Kata.ai
Publication year: 2018

Toward a Standardized and More Accurate Indonesian Part-of-Speech Tagging

Written by:
Kemal Kurniawan, Alham Fikri Aji

Abstract

Previous work in Indonesian part-of-speech (POS) tagging is hard to compare because it has not been evaluated on a common dataset. Furthermore, despite the success of neural network models for English POS tagging, they are rarely explored for Indonesian. In this paper, we explore various techniques for Indonesian POS tagging, including rule-based, CRF, and neural network-based models. We evaluate our models on the IDN Tagged Corpus. A recurrent neural network achieves a new state-of-the-art F1 score of 97.47. To provide a standard for future work, we publicly release the dataset split that we used.


