
Projecting named entity tags from a resource rich language to a resource poor language

Zamin, Norshuhani and Oxley, Alan and Abu Bakar, Zainab (2012) Projecting named entity tags from a resource rich language to a resource poor language. Journal of Information and Communication Technology, 11. pp. 121-146. ISSN 2180-3862

Abstract

Named Entities (NE) are the prominent entities appearing in textual documents. Automatic classification of NE in a textual corpus is a vital process in Information Extraction and Information Retrieval research. Named Entity Recognition (NER) is the identification of words in text that correspond to a pre-defined taxonomy such as person, organization, location, date, time, etc. This article focuses on the person (PER), organization (ORG) and location (LOC) entities for a Malay journalistic corpus of terrorism. A projection algorithm, using the Dice Coefficient function and a bigram scoring method with domain-specific rules, is proposed to map the NE information from the English corpus to the Malay corpus of terrorism. The English corpus is the translated version of the Malay corpus; hence, the two corpora are treated as parallel corpora. The method computes the string similarity between the English words and the lexemes available in a pre-built lexicon in order to approximate the best NE mapping. The algorithm was evaluated on our own tagged terrorism corpus, where it achieved satisfactory results in terms of precision, recall, and F-measure. An evaluation of the selected open source NER tool for English is also presented.
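
The string-similarity step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common character-bigram form of the Dice coefficient, 2 * |X ∩ Y| / (|X| + |Y|), where X and Y are the bigrams of the two words; the abstract does not give the exact formulation, the domain-specific rules, or the lexicon format used in the article. The function names, the sample lexicon, and the 0.5 threshold below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def char_bigrams(word):
    """Return the adjacent character pairs of a lowercased word."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def dice_coefficient(a, b):
    """Dice coefficient over character bigrams: 2 * |X & Y| / (|X| + |Y|)."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        # Words shorter than two characters have no bigrams;
        # fall back to a case-insensitive exact match.
        return 1.0 if a.lower() == b.lower() else 0.0
    # Multiset intersection, so a repeated bigram counts once per occurrence.
    overlap = sum((Counter(x) & Counter(y)).values())
    return 2.0 * overlap / (len(x) + len(y))


def best_match(word, lexicon, threshold=0.5):
    """Return (lexeme, tag, score) for the most similar lexeme, or None.

    `lexicon` is a hypothetical dict mapping a lexeme to its NE tag;
    `threshold` is an assumed cut-off, not a value from the article.
    """
    best = None
    for lexeme, tag in lexicon.items():
        score = dice_coefficient(word, lexeme)
        if score >= threshold and (best is None or score > best[2]):
            best = (lexeme, tag, score)
    return best


# Hypothetical lexicon entries, for illustration only.
lexicon = {"Malaysia": "LOC", "Jakarta": "LOC", "Interpol": "ORG"}
match = best_match("Malaysian", lexicon)
```

Here `match` pairs the English token "Malaysian" with the lexicon entry "Malaysia" and its LOC tag, since the two words share seven of their bigrams. A word with no sufficiently similar lexeme returns `None`, which is the point at which additional rules would have to decide the mapping.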

Item Type: Article
Uncontrolled Keywords: Named entity recognition, information projection, bitext alignment, resource poor language, unsupervised learning, Malay terrorism corpus
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: UNSPECIFIED
Depositing User: Mrs. Norazmilah Yaakub
Date Deposited: 06 May 2018 23:42
Last Modified: 06 May 2018 23:42
URI: https://repo.uum.edu.my/id/eprint/24088
