Frustrated with multiple records of the same customer that differ only by a typo, an abbreviation, or a different way of writing the same address?
Duplicate customer records can be very tricky. They arise from abbreviated addresses, typos, and the many possible ways of writing the same name or address.
For example, both of these addresses refer to the same place:
- John Street 23
- John st. 23
Similarly, in the example below both names refer to the same person, but a typo and an abbreviation stop computers from easily identifying that they are in fact the same person:
- Alphan Majar
- Alp. Major
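To make the problem concrete, here is a minimal Python sketch of why an exact comparison misses these pairs while a fuzzy score still rates them as close. It uses the standard library's difflib purely for illustration; the similarity helper is a hypothetical name, and this is not necessarily the measure Deduper itself uses.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


pairs = [
    ("John Street 23", "John st. 23"),
    ("Alphan Majar", "Alp. Major"),
]

for left, right in pairs:
    # The exact comparison is False for both pairs,
    # while the fuzzy score stays fairly high.
    print(left == right, round(similarity(left, right), 2))
```

Where to draw the line between "same" and "different" is a judgment call, which is part of what makes duplicates hard to catch automatically.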
Even with powerful computers, it is difficult to identify these duplicates. We have developed a simple tool to address this problem.
Try Deduper!
Deduper is a simple command-line tool for merging duplicates in customer records. It is based on advanced string matching techniques and clustering. The technique is called blocked nearest neighbor clustering, and the tool further optimizes this general technique for the problem of merging customer records.
Deduper is a wrapper around the simile-vicino library. The open source tool Google Refine uses this library, and how this clustering works can be read in more detail from this page.
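For readers curious about the general idea, the following is a rough Python sketch of blocked nearest neighbor clustering. It is not Deduper's or the library's actual code. The assumptions are ours: records are grouped into blocks by shared 3-character substrings, only records that share a block are compared, Levenshtein edit distance is the distance measure, and records within a radius of 4 edits are merged into one cluster. All function names and parameter values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def build_blocks(keys, ngram=3):
    """Map each n-gram to the indices of the keys that contain it."""
    index = defaultdict(set)
    for i, key in enumerate(keys):
        if not key:
            continue  # nothing to block on
        for k in range(max(1, len(key) - ngram + 1)):
            index[key[k:k + ngram]].add(i)
    return index


def cluster(records, radius=4, ngram=3):
    """Group records whose normalized forms are within `radius` edits.

    Only pairs that share at least one block are compared, which avoids
    comparing every record against every other record.
    """
    keys = [normalize(r) for r in records]
    parent = list(range(len(records)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    seen = set()
    for members in build_blocks(keys, ngram).values():
        for i, j in combinations(sorted(members), 2):
            if (i, j) in seen:
                continue
            seen.add((i, j))
            if levenshtein(keys[i], keys[j]) <= radius:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, record in enumerate(records):
        groups[find(i)].append(record)
    # Keep only groups that actually contain duplicates.
    return [g for g in groups.values() if len(g) > 1]


records = ["John Street 23", "John st. 23", "Alphan Majar", "Alp. Major", "Baker Road 7"]
for group in cluster(records):
    print(group)
```

The block size and radius trade speed against recall: smaller blocks compare more pairs, and a larger radius merges more aggressively, so both would need tuning on real customer data.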
Give it a try. We will be happy to hear how it helped you.
Deduper can be downloaded from the link: http://sourceforge.net/projects/deduper/