Find Duplicate Text

Duplicate records cause numerous problems for a business and waste a lot of effort. For example, a client may want to find similar records based on a few columns in order to eliminate duplicates, which reduces processing and speeds up decisions. Generally, fixing duplicate records is a manual process that is both tedious and costly. Unless all the details are identical, it is hard to say whether two records are duplicates. Typically, most potential duplicates are false positives. Database queries for duplicates will not catch spelling mistakes, typos, a few changed values, or rephrasing. This is where artificial intelligence (AI) steps in. We can create and train a machine learning algorithm that uses a matching score to find duplicate records. Once trained, the model predicts whether or not records are duplicates. An AI model can be built and trained based on customer requirements; here I will focus on Python, Amazon Elastic Sear...
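To make the idea of a matching score concrete, here is a minimal sketch in plain Python. It is not the trained model described above: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The record fields, the helper names `field_score` and `matching_score`, and the `THRESHOLD` value are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def field_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two values, ignoring case and extra spaces."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def matching_score(rec_a: dict, rec_b: dict, columns: list) -> float:
    """Average the per-column similarity over the columns chosen for comparison."""
    scores = [field_score(str(rec_a[c]), str(rec_b[c])) for c in columns]
    return sum(scores) / len(scores)


# Hypothetical records: a and b differ only by a typo and letter case.
a = {"name": "Jonathan Smith", "city": "New York"}
b = {"name": "Jonathon Smith", "city": "new york"}
c = {"name": "Maria Garcia", "city": "Madrid"}

columns = ["name", "city"]
THRESHOLD = 0.85  # illustrative cut-off, not a recommended value

score_ab = matching_score(a, b, columns)
score_ac = matching_score(a, c, columns)

# An exact comparison (what a plain equality query does) misses the pair.
print("exact match a/b:", a == b)
print("likely duplicate a/b:", score_ab >= THRESHOLD)
print("likely duplicate a/c:", score_ac >= THRESHOLD)
```

In a real pipeline, scores like these would more likely serve as input features for a trained classifier than as a fixed threshold, since a single cut-off tends to produce the false positives mentioned above.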