Framework for a semantic data transformation in solving data quality issues in big data

Onyeabor, Grace and Ta'a, Azman (2017) Framework for a semantic data transformation in solving data quality issues in big data. In: Sintok International Conference on Social Science and Management (SICONSEM 2017), 5 December 2017, Adya Hotel, Langkawi Island, Kedah, Malaysia.

Abstract

Purpose - Today, organizations and companies generate a tremendous amount of data. At the same time, an enormous amount of data is received and acquired from various sources and stored, which brings us to the era of Big Data (BD). BD is a term used to describe massive datasets of diverse formats created at very high speed, the management of which is nearly impossible with traditional database management systems (Kanchi et al., 2015). With the dawn of BD, Data Quality (DQ) has become imperative. Volume, velocity and variety, the initial 3Vs of BD, are usually used to describe its main properties. But to extract value (another V property) and make BD effective and efficient for organizational decision making, the significance of a further V, veracity, is gradually coming to light. Veracity directly concerns inconsistency and DQ issues. Today, veracity is the biggest challenge in data analysis when compared to other aspects such as volume and velocity. Trusting the acquired data goes a long way in implementing decisions from an automated decision-making system, and veracity helps to validate the data acquired (Agarwal, Ravikumar, & Saha, 2016). DQ represents an important issue in every business. To be successful, companies need high-quality data on inventory, supplies, customers, vendors and other vital enterprise information in order to run their data analysis applications efficiently (e.g. decision support systems, data mining, customer relationship management) and produce accurate results (McAfee & Brynjolfsson, 2012). During the transformation of a huge volume of data, there may be data mismatches, miscalculations and/or loss of useful data that lead to an unsuccessful data transformation (Tesfagiorgish & JunYi, 2015), which in turn leads to poor data quality.
The addition of external data, particularly RDF data, raises further challenges for data transformation compared with the traditional transformation process. For example, one drawback of using BD in the business analysis process is that the data is almost schema-less, and RDF data has a poor or complex schema. Traditional data transformation tools cannot process such inconsistent and heterogeneous data because they do not support semantic-aware data, they are entirely schema-dependent, and they do not focus on expressive semantic relationships to integrate data from different sources. Thus, BD requires more powerful tools to transform data semantically. While research in this area has so far offered different frameworks, to the best of the researchers' knowledge, not much research has been done on the transformation of DQ in BD. What has been done has not gone beyond cleansing incoming data in general (Merino et al., 2016). The proposed framework presents a method for the analysis of DQ using BD from various domains, applying semantic technologies in the ETL transformation stage to create a semantic model that enables quality in the data.
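
As a rough illustration of what a semantic step in the ETL transformation stage might look like, the Python sketch below maps records from two sources with different field names onto one shared vocabulary of subject-predicate-object triples, then runs two simple quality checks (missing and conflicting values). It is not the authors' framework: the `ex:` prefix, the field names, the sample records, and the functions `to_triples` and `quality_report` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical shared vocabulary: source-specific field names are mapped
# to common predicates (the "ex:" prefix is an assumed placeholder namespace).
FIELD_TO_PREDICATE = {
    "cust_name": "ex:name",
    "fullName": "ex:name",
    "email": "ex:email",
    "mail": "ex:email",
}
# Assumed quality rule: every subject should carry these predicates.
REQUIRED = {"ex:name", "ex:email"}


def to_triples(record, id_field):
    """Transform one source record into (subject, predicate, object) triples."""
    subject = f"ex:customer/{record[id_field]}"
    triples = []
    for field, value in record.items():
        predicate = FIELD_TO_PREDICATE.get(field)
        # Unmapped fields and empty values produce no triple.
        if predicate is not None and value not in (None, ""):
            triples.append((subject, predicate, str(value).strip()))
    return triples


def quality_report(triples):
    """Flag subjects with missing required predicates or conflicting values."""
    values = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        values[s][p].add(o)
    issues = []
    for s in sorted(values):
        for p in sorted(REQUIRED - set(values[s])):
            issues.append((s, p, "missing"))
        for p in sorted(values[s]):
            if len(values[s][p]) > 1:
                issues.append((s, p, "conflict"))
    return issues


# Two hypothetical sources describing overlapping customers with different schemas.
crm = [{"id": 1, "cust_name": "Aminah", "email": "aminah@example.org"},
       {"id": 2, "cust_name": "Wei", "email": ""}]
web = [{"uid": 1, "fullName": "Aminah", "mail": "a.binti@example.org"}]

triples = [t for r in crm for t in to_triples(r, "id")]
triples += [t for r in web for t in to_triples(r, "uid")]
issues = quality_report(triples)
print(issues)
```

Because both sources end up under the same predicates, the checks can compare values across schemas, which a purely schema-dependent transformation would not do without a hand-written join. A fuller treatment would presumably use an ontology and an RDF store in place of the plain dictionary used here.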

Item Type: Conference or Workshop Item (Paper)
Additional Information: eISBN 978-967-2064-65-7 Organized by: UUM PRESS Universiti Utara Malaysia
Uncontrolled Keywords: Semantic Transformation, Data Quality, Big Data, ETL
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: School of Computing
Depositing User: Mrs. Norazmilah Yaakub
Date Deposited: 30 Jul 2018 01:06
Last Modified: 03 Nov 2020 07:36
URI: https://repo.uum.edu.my/id/eprint/24488
