Introducing the Asian Language Treebank (ALT)

YK Thu, WP Pa, M Utiyama, A Finch… - Proceedings of the …, 2016 - aclanthology.org
Proceedings of the Tenth International Conference on Language …, 2016
Abstract
This paper introduces the ALT project initiated by the Advanced Speech Translation Research and Development Promotion Center (ASTREC), NICT, Kyoto, Japan. The aim of this project is to accelerate NLP research for Asian languages such as Indonesian, Japanese, Khmer, Lao, Malay, Myanmar, Philippine, Thai and Vietnamese. The original resource for this project was English articles that were randomly selected from Wikinews. The project has so far created a corpus for Myanmar and will extend in scope to include other languages in the near future. A 20,000-sentence corpus of Myanmar that has been manually translated from an English corpus has been word segmented, word aligned, part-of-speech tagged and constituency parsed by human annotators. In this paper, we present the implementation steps for creating the treebank in detail, including a description of the ALT web-based treebanking tool. Moreover, we report statistics on the annotation quality of the Myanmar treebank created so far.