Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function

Kamaruddin, Siti Sakira and Hamdan, Abdul Razak and Abu Bakar, Azuraliza and Mat Nor, Fauzias (2012) Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function. Intelligent Data Analysis, 16 (3). pp. 487-511. ISSN 1088-467X

PDF (571kB), available under License Creative Commons Attribution.

Abstract

The rapid increase in the amount of textual data has brought forward a growing research interest towards mining text to detect deviations. Specialized methods for specific domains have emerged to satisfy various needs in discovering rare patterns in text. This paper focuses on a graph-based approach for text representation and presents a novel error tolerance dissimilarity algorithm for deviation detection. We resolve two non-trivial problems, i.e. semantic representation of text and the complexity of graph matching. We employ conceptual graphs interchange format (CGIF) – a knowledge representation formalism to capture the structure and semantics of sentences. We propose a novel error tolerance dissimilarity algorithm to detect deviations in the CGIFs. We evaluate our method in the context of analyzing real world financial statements for identifying deviating performance indicators. We show that our method performs better when compared with two related text based graph similarity measuring methods. Our proposed method has managed to identify deviating sentences and it strongly correlates with expert judgments. Furthermore, it offers error tolerance matching of CGIFs and retains a linear complexity with the increasing number of CGIFs.

Item Type: Article
Uncontrolled Keywords: Conceptual graph interchange format; deviation detection text; outliers text mining; deviation based outlier mining method; error tolerance; dissimilarity function.
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: College of Arts and Sciences
Depositing User: Dr. Siti Sakira Kamaruddin
Date Deposited: 02 Sep 2015 01:17
Last Modified: 02 Sep 2015 01:17
URI: https://repo.uum.edu.my/id/eprint/15356
