
Filtering spam mail in non-segmented languages using hybrid approach: the integration of stopword removal, n-gram extraction and classification techniques

Khumsong, Ployphailin and Chumwatana, Todsanai and Augsirikul, Supanit (2016) Filtering spam mail in non-segmented languages using hybrid approach: the integration of stopword removal, n-gram extraction and classification techniques. In: Knowledge Management International Conference (KMICe) 2016, 29 – 30 August 2016, Chiang Mai, Thailand.


Abstract

Junk mail, or spam, is regarded as a major problem in today’s world. Spam can lead to cybercrime that affects individuals and organizations alike. Many people and businesses seek spam prevention techniques to protect their data and computer systems. Spam mails normally contain advertisements for products or services and can also carry viruses, malware, spyware and so forth. Many people think that spam mails do not cause any damage. In fact, spam increases management costs and causes resources to be used ineffectively. Therefore, verifying and filtering spam mails needs to be taken into consideration. The objective of this paper is to introduce a hybrid approach that combines three techniques (stop-word removal, n-gram extraction and data classification) to filter spam emails and simplify system development. Because the technique is language independent, the proposed hybrid approach can be applied widely across different languages. To examine the approach, the CSDMC2010 spam mail corpus, comprising 198 common emails, 202 spam mails, and 10 selected emails, was used in the experimental study. The results showed that the proposed technique could determine whether an email is spam with 93.2% accuracy. Hence, this hybrid approach could benefit users and organizations by decreasing computer risk.
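
The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which stop-word list, n-gram length, or classifier was used, so the `STOPWORDS` set, the choice of character trigrams (`n=3`), the multinomial naive Bayes classifier, and the toy training messages below are all assumptions made for illustration. Stop-words are removed here by whitespace tokens for an English demo; a non-segmented language would presumably need a different matching strategy.

```python
import math
from collections import Counter

# Hypothetical stop-word list (assumption; the abstract names no list).
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "for", "your", "at"}

def remove_stopwords(text):
    """Lowercase the text and drop whitespace-delimited stop-words."""
    return " ".join(w for w in text.lower().split() if w not in STOPWORDS)

def char_ngrams(text, n=3):
    """Return overlapping character n-grams, ignoring spaces."""
    s = text.replace(" ", "")
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def features(text, n=3):
    """Stage 1 (stop-word removal) followed by stage 2 (n-gram extraction)."""
    return char_ngrams(remove_stopwords(text), n)

class NaiveBayes:
    """Stage 3: multinomial naive Bayes with add-one smoothing (assumed classifier)."""

    def __init__(self):
        self.counts = {}        # label -> Counter of n-gram frequencies
        self.docs = Counter()   # label -> number of training messages
        self.vocab = set()      # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = features(text)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = features(text)
        total_docs = sum(self.docs.values())
        best, best_score = None, -math.inf
        for label, counter in self.counts.items():
            total = sum(counter.values())
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / total_docs)
            for g in grams:
                score += math.log((counter[g] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best

# Toy training data, invented for this sketch (not from CSDMC2010).
train_texts = [
    "win free money now",
    "free prize claim your money",
    "cheap pills free offer",
    "meeting agenda for monday",
    "project report attached",
    "lunch at noon tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("free money prize"))
print(model.predict("project meeting monday"))
```

Character n-grams are used in this sketch because they do not depend on word boundaries, which is one plausible reading of why the abstract calls the approach language independent.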

Item Type: Conference or Workshop Item (Paper)
Additional Information: ISBN: 978-967-0910-19-2 Organized by: College of Arts and Sciences, Universiti Utara Malaysia
Uncontrolled Keywords: Spam mail, N-gram extraction, Classification, Stop-word removal, Non-segmented languages
Subjects: Q Science > QA Mathematics
Divisions: School of Computing
Depositing User: Mrs. Norazmilah Yaakub
Date Deposited: 30 Nov 2016 08:17
Last Modified: 30 Nov 2016 08:17
URI: https://repo.uum.edu.my/id/eprint/20125
