Exploring Canonical Data Model for Text Clustering (S/O 12828)

Kamaruddin, Siti Sakira and Yusof, Yuhanis. Exploring Canonical Data Model for Text Clustering (S/O 12828). Project Report. UUM. (Submitted)

Abstract

The growth of the web and other text repositories has produced an abundance of text data, and there is a pressing need for improved mechanisms to represent and retrieve it effectively. This paper advocates constructing canonical data models that map the contents of multiple documents into a few general models representing the corpus. However, constructing a canonical data model for text requires non-trivial text mining techniques before the actual construction process can begin. Furthermore, constructing canonical data models for all terms in a set of documents would be costly and would not reduce the sparsity problem associated with text document processing. To solve this problem, we propose a two-tier dimensionality reduction step that adopts commonly used feature extraction and feature selection methods. The reduced features are then used to construct a canonical data model. Such a model can serve as a general reference model for text comparison in a wide variety of text mining tasks, such as text clustering, text classification, text summarization and text deviation detection. Experimental results reveal that the proposed approach produces better results than methods that do not use a canonical data model.
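
The two-tier reduction described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the extraction or selection methods, so stop-word filtering and a document-frequency threshold are used here as common stand-ins, and the mean vector of the reduced documents stands in for the canonical data model, whose actual form the abstract does not specify. All function names, the stop-word list and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the report does not name one.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}


def extract_features(text):
    """Tier 1 (assumed): lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def select_features(docs_tokens, min_df=2):
    """Tier 2 (assumed): keep terms found in at least min_df documents."""
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return sorted(t for t, n in doc_freq.items() if n >= min_df)


def to_vector(tokens, vocab):
    """Term-count vector restricted to the selected vocabulary."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]


def canonical_model(vectors):
    """Stand-in for the canonical model: mean of the reduced vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up sample corpus for illustration only.
docs = [
    "Text clustering groups similar text documents.",
    "Clustering of documents reduces search effort.",
    "Text documents are sparse and high dimensional.",
]
tokens = [extract_features(d) for d in docs]
vocab = select_features(tokens, min_df=2)
vectors = [to_vector(t, vocab) for t in tokens]
model = canonical_model(vectors)
similarities = [cosine(v, model) for v in vectors]
```

Here each document is compared against the single reference vector `model` rather than against every other document, which is the sense in which a canonical model might act as a reference for text comparison.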

Item Type:	Monograph (Project Report)
Additional Information:	ERGS grant (GERAN ERGS)
Subjects:	T Technology > T Technology (General)
Divisions:	Research and Innovation Management Centre (RIMC)
Depositing User:	Mdm. Sarkina Mat Saad @ Shaari
Date Deposited:	18 Nov 2024 08:46
Last Modified:	18 Nov 2024 08:46
URI:	https://repo.uum.edu.my/id/eprint/31505
