Malay named entity recognition: a review
Farid Morsidi, Sulaiman Sarkawi, Suliana Sulaiman, Rohaizah Abdul Wahid
The Named Entity Recognition (NER) field has been thriving for more than 15
years. NER can be defined as the process of recognizing named entities, such
as the names of persons, organizations, locations, times, and quantities. NER
research generally emphasizes the extraction and classification of mentions of
rigid designators. These range from proper names to biological species,
temporal expressions, and so on. Besides information extraction, NER has been
used in many areas, ranging from inquiries to morphological syntax. However,
most of this work has been confined to limited domains and textual genres,
such as news articles and web pages. Techniques used to process English text
cannot be applied as-is to Malay terminology, because each language has its
own morphology. Finding co-references and aliases in a text reduces to the
same problem: finding all occurrences of an entity in a document. This paper
presents approaches that have been applied to NER in Malay, or in work
partially related to it, to detect proper nouns in Malay documents. It also
discusses research aimed at producing high-quality training data for Malay
corpora through suitable NER algorithms and methods, and highlights the key
points needed to improve current NER studies.
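The claim that finding aliases reduces to finding all occurrences of an entity can be illustrated with a minimal sketch. It assumes a small hand-built alias table; the `ALIASES` dictionary and the `find_mentions` function are hypothetical and are not taken from any of the reviewed systems. It covers only name aliases (not pronoun co-reference) and uses exact, word-boundary string matching.

```python
import re

# Hypothetical alias table: canonical entity -> surface forms that refer to it.
ALIASES = {
    "Universiti Pendidikan Sultan Idris": ["Universiti Pendidikan Sultan Idris", "UPSI"],
}


def find_mentions(text, aliases):
    """Return (canonical, surface, start, end) for every alias occurrence.

    Longer surface forms are tried first so that a full name is preferred
    over a shorter alias that might overlap with it.
    """
    # Map each surface form back to its canonical entity.
    surface_to_entity = {
        surface: entity for entity, forms in aliases.items() for surface in forms
    }
    # Longest first: regex alternation takes the first alternative that matches.
    forms = sorted(surface_to_entity, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b")
    return [
        (surface_to_entity[m.group(0)], m.group(0), m.start(), m.end())
        for m in pattern.finditer(text)
    ]


text = "Pelajar UPSI belajar di Universiti Pendidikan Sultan Idris."
for mention in find_mentions(text, ALIASES):
    print(mention)
```

Both the abbreviation and the full name resolve to the same canonical entity. Because matching is exact, this sketch would miss surface forms altered by affixation, which relates to the abstract's point that English-oriented techniques do not carry over directly to Malay.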
Affiliation (all authors): Universiti Pendidikan Sultan Idris, Malaysia