Wednesday, July 15, 2009

Data Preprocessing – Normalization

Further to introduction, in this article I am going to discuss “Data Preprocessing” an important step in the knowledge discovery process, can be even considered as a fundamental building block of data mining. People who come from data warehousing background may already be familiar with the term ETL ( Stands for Extraction,Transformation and Loading). Any data mining or data warehousing effort's success is dependent on how good the ETL is performed. DP ( I am going to refer Data preprocessing as DP henceforth) is a part of ETL, its nothing but transforming the data. To be more precise modifying the source data in to a different format which
(i) enables data mining algorithms to be applied easily
(ii) improves the effectiveness and the performance of the mining algorithms
(iii) represents the data in easily understandable format for both humans and machines
(iv) supports faster data retrieval from databases
(v) makes the data suitable for a specific analysis to be performed.