A Syntactic-based Sentence Validation Technique for Malay Text Summarizer

Alias, Suraya and Sainin, Mohd Shamrie and Mohammad, Siti Khaotijah (2021) A Syntactic-based Sentence Validation Technique for Malay Text Summarizer. Journal of Information and Communication Technology, 20 (03). pp. 329-352. ISSN 2180-3862

PDF - Published Version
Available under License Attribution 4.0 International (CC BY 4.0).

Abstract

In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to a summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and remove what is unnecessary without sacrificing the sentence's grammar. Malay Natural Language Processing (NLP) tools are still under development, and few are openly accessible. A further issue is the lack of a benchmark dataset in the Malay language for evaluating the quality of summaries and validating the compressed sentences produced by a summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences that refers to the Malay Grammar Pattern. In this work, we propose a newly derived set of Syntactic Rules, based on the main Malay Word Classes, to validate a Malay sentence that has undergone the SC procedure. We experimented on a Malay dataset of 100 news articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced an average F-measure of 0.5826 and an average Recall of 0.5925 at the optimum compression rate, a Confidence (Conf) value of 0.5. Furthermore, a manual evaluation by a group of Malay experts gave the compressed summary sentences a grammaticality score of 4.11 and a readability score of 4.12 out of 5. These results indicate that the proposed technique reliably validates Malay sentences while producing promising summary content and readability.
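
For readers unfamiliar with the ROUGE figures quoted above, the following is a minimal sketch of how an n-gram Recall, Precision and F-measure can be computed for one candidate summary against one reference summary. It is an illustration only, not the authors' evaluation pipeline: the abstract does not state which ROUGE variant was used, so the sketch assumes plain ROUGE-N with lowercased whitespace tokenization, and the function names `ngrams` and `rouge_n` as well as the two Malay example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N style scores for one candidate/reference pair.

    Assumption: tokens are lowercased and split on whitespace. A real
    evaluation would use a proper tokenizer and possibly stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())

    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f_measure": f_measure}


# Hypothetical example sentences, not taken from the paper's dataset.
reference = "banjir melanda beberapa kampung di kelantan semalam"
compressed = "banjir melanda kampung di kelantan"
scores = rouge_n(compressed, reference, n=1)
print(scores)
```

In this example the compressed sentence keeps five of the seven reference words, so Recall is 5/7 while Precision is 1.0. The averages reported in the abstract would correspond to scores of this kind aggregated over all evaluated summaries.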

Item Type: Article
Additional Information: Printed by UUM Press
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: College of Arts and Sciences
Depositing User: Mrs Nurin Jazlina Hamid
Date Deposited: 31 Jul 2022 07:57
Last Modified: 17 May 2023 15:08
URI: https://repo.uum.edu.my/id/eprint/28778
