A comparative study between rough and decision tree classifiers

Mohamad Mohsin, Mohamad Farhan (2008) A comparative study between rough and decision tree classifiers. Project Report. Universiti Utara Malaysia. (Unpublished)


Official URL: http://lintas.uum.edu.my:8080/elmu/index.jsp?modul...

Abstract

Rule-based classification (RBC) systems have been widely used in many real-world applications because their rules are easy to interpret. An RBC system mines a collection of rules from the knowledge hidden in a dataset in order to accurately map new cases to a decision class. In the real world, the number of attributes in a dataset can be very large because database technology is capable of storing a great deal of information. Consequently, a large dataset may contain thousands of relationships, and it is likely to provide more knowledge because the interrelationships between data give more description. Furthermore, such a dataset is also likely to produce a large number of rules, some of which are unnecessary or redundant. Theoretically, a good set of knowledge should provide good accuracy when dealing with new cases. Besides accuracy, a good rule set must also have a minimal number of rules, and each rule should be as short as possible. Often, a rule set contains fewer rules, but those rules usually have more conditions. An ideal model should produce fewer, shorter rules and classify new data with good accuracy. Consequently, high-quality, compact knowledge provides managers with a good decision model. For this reason, the search for an appropriate data mining approach that can provide quality knowledge is important. The rough classifier (RC) and the decision tree classifier (DTC) are both categorized as RBC. The purpose of this study is to investigate the capability of RC and DTC to generate quality knowledge that leads to good accuracy. To that end, the two classifiers are compared on four measurements: classification accuracy, number of rules, rule length, and rule coverage. Five datasets from the UCI Machine Learning Repository, namely United States Congressional Voting Records, Credit Approval, Wisconsin Diagnostic Breast Cancer, Pima Indians Diabetes Database, and Vehicle Silhouettes, were chosen as experimental data. All datasets were mined using the RC toolkit ROSETTA, while the C4.5 algorithm in the WEKA application was chosen as the DTC rule generator. The experimental results indicated that both classifiers produced good classification results and generated quality rules in different respects: higher accuracy, fewer rules, shorter rules, and higher coverage. In terms of accuracy, RC obtained higher accuracy on average, while DTC generated a significantly lower number of rules than RC. In terms of rule length, RC produced more compact, shorter rules than DTC, although the difference in length was not significant. Meanwhile, RC had better coverage than DTC. The final conclusion can be stated as follows: "If the user is interested in a variety of rule patterns with good accuracy and the number of rules is not important, RC is the best solution, whereas if the user is looking for fewer rules, DTC might be the best choice."
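
The four measurements named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the report, and it does not use ROSETTA or WEKA. The `Rule` class, the first-match classification strategy, the toy attributes `a` and `b`, and the class labels are all hypothetical. The abstract does not define rule length or coverage, so the sketch assumes that length is the number of conditions in a rule and that coverage is the fraction of cases matched by at least one rule.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """An IF-THEN rule: every condition must hold for the rule to fire."""
    conditions: dict  # attribute name -> required value (hypothetical format)
    decision: str     # decision class predicted when the rule fires

    def matches(self, case):
        return all(case.get(attr) == value
                   for attr, value in self.conditions.items())


def classify(rules, case, default=None):
    """Return the decision of the first matching rule (assumed strategy)."""
    for rule in rules:
        if rule.matches(case):
            return rule.decision
    return default


def evaluate(rules, cases, labels):
    """Compute the four measurements under the assumptions stated above."""
    n = len(cases)
    correct = sum(classify(rules, c) == y for c, y in zip(cases, labels))
    covered = sum(any(r.matches(c) for r in rules) for c in cases)
    return {
        "accuracy": correct / n,
        "num_rules": len(rules),
        "mean_rule_length": sum(len(r.conditions) for r in rules) / len(rules),
        "coverage": covered / n,
    }


# Toy data, invented purely for illustration.
rules = [
    Rule({"a": "y"}, "pos"),
    Rule({"a": "n", "b": "y"}, "neg"),
]
cases = [
    {"a": "y", "b": "n"},
    {"a": "n", "b": "y"},
    {"a": "n", "b": "n"},  # matched by no rule
    {"a": "y", "b": "y"},
]
labels = ["pos", "neg", "neg", "neg"]

metrics = evaluate(rules, cases, labels)
print(metrics)
```

Under these assumptions, a rule set can score well on one measurement and poorly on another, which is the kind of trade-off the study reports between RC and DTC.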

Item Type:	Monograph (Project Report)
Additional Information:	Kod S/O: 11723
Subjects:	H Social Sciences >
Divisions:	College of Arts and Sciences
Depositing User:	Mrs. Norazmilah Yaakub
Date Deposited:	19 Jun 2013 04:39
Last Modified:	08 Jul 2014 01:16
URI:	https://repo.uum.edu.my/id/eprint/7807
