Data Preprocessing – Normalization – Real-World Example
In this article, we are going to see normalization in action in a popular web application. Readers who are not familiar with normalization should refer to my previous post.
We all know how well Google exploits available technology to give us innovative products. Google Insights for Search is one such product. The application's concept is based almost entirely on normalization. Let us see what it allows us to do. Suppose I want to find out who was the more popular tennis player in 2009: Serena Williams or Venus Williams? Insights lets me answer this question based on the web traffic (news articles, searches) for these two keywords.
Following on from the introduction, in this article I am going to discuss “Data Preprocessing”, an important step in the knowledge discovery process that can even be considered a fundamental building block of data mining. People who come from a data warehousing background may already be familiar with the term ETL (Extraction, Transformation and Loading). The success of any data mining or data warehousing effort depends on how well the ETL is performed. DP (I will refer to data preprocessing as DP from here on) is a part of ETL; it is nothing but transforming the data. To be more precise, it means modifying the source data into a different format which
(i) enables data mining algorithms to be applied easily
(ii) improves the effectiveness and the performance of the mining algorithms
(iii) represents the data in an easily understandable format for both humans and machines
(iv) supports faster data retrieval from databases
(v) makes the data suitable for a specific analysis to be performed.
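To make the idea of "modifying the source data into a different format" concrete, here is a minimal sketch of one common DP transformation, min-max normalization, which rescales every attribute to the same range so that no attribute dominates simply because its raw numbers are larger. The function name min_max_normalize and the age and income figures are made up for illustration; this is just one possible way to write it, not the method of any particular tool.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale;
        # mapping everything to new_min is an arbitrary choice made for this sketch.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical source data: two attributes on very different scales.
ages = [20, 30, 40, 60]
incomes = [20000, 50000, 80000, 120000]

# After normalization both attributes lie in the range [0, 1].
normalized_ages = min_max_normalize(ages)
normalized_incomes = min_max_normalize(incomes)

print(normalized_ages)
print(normalized_incomes)
```

Once both attributes are on the same scale, a distance-based mining algorithm treats a difference in age and a difference in income comparably, which relates to points (i) and (ii) above.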