UUM Repository | Universiti Utara Malaysian Institutional Repository

Dissimilarity algorithm on conceptual graphs to mine text outliers


Kamaruddin, Siti Sakira and Hamdan, Abdul Razak and Abu Bakar, Azuraliza and Mat Nor, Fauzias (2009) Dissimilarity algorithm on conceptual graphs to mine text outliers. In: 2nd Conference on Data Mining and Optimization, 2009 (DMO '09), 27-28 October 2009, Selangor, Malaysia.

Abstract

Graphical text representation methods such as Conceptual Graphs (CGs) attempt to capture the structure and semantics of documents. As such, they are the preferred text representation approach for a wide range of problems in natural language processing, information retrieval and text mining. In a number of these applications, it is necessary to measure the dissimilarity (or similarity) between knowledge represented in the CGs. In this paper, we present a dissimilarity algorithm to detect outliers from a collection of text represented in Conceptual Graph Interchange Format (CGIF). To avoid the NP-complete problem of graph matching, we introduce the use of a standard CG in the dissimilarity computation. We evaluate our method in the context of analyzing real-world financial statements to identify outlying performance indicators. For evaluation purposes, we compare the proposed dissimilarity function with a Dice coefficient similarity function used in related previous work. Experimental results indicate that our method outperforms the existing method and correlates better with human judgements. In comparison to other text outlier detection methods, this approach captures the semantics of documents through the use of CGs and makes it convenient to detect outliers through a simple dissimilarity function. Furthermore, our proposed algorithm retains linear complexity as the number of CGs increases.
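
The idea of scoring each document against a single standard graph, rather than matching every pair of graphs, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the proposed dissimilarity function, so the sketch uses one minus the Dice coefficient (the baseline measure named above) as a stand-in. The representation of a CG as a set of (concept, relation, concept) triples, the names `dice_dissimilarity` and `find_outliers`, the threshold, and the sample data are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a conceptual graph is simplified to a set of
# (concept, relation, concept) triples. Real CGIF graphs are richer.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def dice_dissimilarity(graph: Graph, standard: Graph) -> float:
    """Return 1 - Dice coefficient of two triple sets (0 = identical, 1 = disjoint)."""
    total = len(graph) + len(standard)
    if total == 0:
        return 0.0  # two empty graphs are treated as identical
    shared = len(graph & standard)
    return 1.0 - (2.0 * shared) / total


def find_outliers(graphs: Dict[str, Graph], standard: Graph, threshold: float) -> List[str]:
    """Flag graphs whose dissimilarity to the standard graph exceeds the threshold.

    Each graph is compared once against the fixed standard graph, so the
    number of comparisons grows linearly with the number of graphs.
    """
    return [name for name, g in graphs.items()
            if dice_dissimilarity(g, standard) > threshold]


# Hypothetical sample data, not taken from the paper.
standard: Graph = frozenset({
    ("Company", "reports", "Profit"),
    ("Profit", "attr", "Increase"),
    ("Company", "pays", "Dividend"),
})
graphs: Dict[str, Graph] = {
    "doc_a": standard,
    "doc_b": frozenset({
        ("Company", "reports", "Loss"),
        ("Company", "pays", "Dividend"),
    }),
}

outliers = find_outliers(graphs, standard, threshold=0.5)
```

With this sample data, `doc_a` matches the standard graph exactly, while `doc_b` shares only one of five triples in total and is flagged. Comparing against one reference graph sidesteps pairwise graph matching, which is the source of the linear growth in comparisons.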

Item Type: Conference or Workshop Item (Paper)
Additional Information: Print ISBN: 978-1-4244-4944-6
Uncontrolled Keywords: Conceptual graphs, outlier detection, text outliers, dissimilarity algorithm, text mining.
Subjects: Q Science > QA Mathematics
Divisions: College of Arts and Sciences
Depositing User: Dr. Siti Sakira Kamaruddin
Date Deposited: 01 Sep 2015 08:56
Last Modified: 01 Sep 2015 08:56
URI: http://repo.uum.edu.my/id/eprint/15346
