New smoothed location models integrated with PCA and two types of MCA for handling large number of mixed continuous and binary variables

Hamid, Hashibah and P.A.H., Ngu and Mohd Alipiah, Fathilah (2018) New smoothed location models integrated with PCA and two types of MCA for handling large number of mixed continuous and binary variables. Pertanika Journal of Social Sciences & Humanities, 26 (1). pp. 247-260. ISSN 0128-7702

Abstract

The issue of classifying objects into groups when the measured variables in an experiment are mixed has attracted the attention of statisticians. The Smoothed Location Model (SLM) appears to be a popular classification method for handling data that contain both continuous and binary variables simultaneously. However, SLM becomes infeasible when the number of binary variables is large, because numerous empty cells occur. Therefore, this study aims to construct new SLMs by integrating SLM with two variable extraction techniques, Principal Component Analysis (PCA) and two types of Multiple Correspondence Analysis (MCA), in order to reduce the large number of mixed variables, primarily the binary ones. The performance of the newly constructed models, namely SLM+PCA+Indicator MCA and SLM+PCA+Burt MCA, is examined based on misclassification rate. Results from simulation studies for a sample size of n=60 show that the SLM+PCA+Indicator MCA model provides perfect classification when the number of binary variables (b) is 5 or 10. For b=20, the SLM+PCA+Indicator MCA model produces misclassification rates of 0.3833, 0.6667 and 0.3221 for n=60, n=120 and n=180, respectively. Meanwhile, the SLM+PCA+Burt MCA model provides perfect classification when the number of binary variables is 5, 10, 15 or 20, and yields a small misclassification rate of 0.0167 when b=25. Investigations into a real dataset demonstrate that both newly constructed models yield low misclassification rates, 0.3066 and 0.2336 respectively, with the SLM+PCA+Burt MCA model performing best among all the classification methods compared. The findings reveal that the two new models, which integrate SLM with two variable extraction techniques, can be good alternative methods for classification when handling mixed variable problems, mainly when dealing with a large number of binary variables.
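
As a rough illustration of the empty-cell problem and of the two MCA inputs named in the abstract, the sketch below counts the 2^b cells that b binary variables define and builds an indicator matrix and the corresponding Burt matrix. It is not the authors' implementation: the function names (`cell_counts`, `indicator_matrix`, `burt_matrix`) are hypothetical, the toy data are made up, and the sketch treats the sample as a single group rather than reproducing the paper's simulation design.

```python
import numpy as np


def cell_counts(n_samples, n_binary):
    """Return (total cells, minimum number of empty cells).

    b binary variables define 2**b cells; n observations can occupy
    at most n of them, so at least 2**b - n cells stay empty.
    """
    total = 2 ** n_binary
    return total, max(total - n_samples, 0)


def indicator_matrix(X):
    """One pair of 0/1 columns per binary variable: [x == 0, x == 1]."""
    X = np.asarray(X, dtype=int)
    pairs = [np.column_stack((1 - X[:, j], X[:, j])) for j in range(X.shape[1])]
    return np.hstack(pairs)


def burt_matrix(Z):
    """Burt matrix: all pairwise cross-tabulations of the categories, Z'Z."""
    return Z.T @ Z


if __name__ == "__main__":
    # Cell growth for the sample size n=60 mentioned in the abstract.
    for b in (5, 10, 20):
        total, empty = cell_counts(60, b)
        print(f"b={b}: {total} cells, at least {empty} empty")

    # Toy data (made up): 4 objects, 2 binary variables.
    X = [[0, 1], [1, 1], [1, 0], [0, 1]]
    Z = indicator_matrix(X)
    print(Z)
    print(burt_matrix(Z))
```

Indicator MCA operates on a matrix like `Z`, whereas Burt MCA operates on the symmetric matrix `Z.T @ Z`; either way the extracted components replace the original binary variables, which keeps the number of cells manageable.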

Item Type: Article
Uncontrolled Keywords: Classification, large mixed variables, Multiple Correspondence Analysis (MCA), Principal Component Analysis (PCA), Smoothed Location Model (SLM)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: School of Quantitative Sciences
Depositing User: Mrs. Norazmilah Yaakub
Date Deposited: 18 Jul 2018 05:58
Last Modified: 18 Jul 2018 05:58
URI: https://repo.uum.edu.my/id/eprint/24407
