Fuzzy Matching and Data Cleansing

You certainly do not look forward to manually matching two lists of physicians or two lists of facilities that don't share a common ID. The eye can easily tell that "Dr. Jean-Patrick Tsang" and "J. P. Tsang, PhD" are the same person, that "4709 Golf Road" and "4709 East Golf, Suite 803" are the same street address, and that "Skockie, Ilinnois 60076" has a few typos and should be spelled "Skokie, Illinois 60076". But the eye gets strained well before a thousand matches. The other problem with the eye is that it may not catch the fact that 60066 is a bogus zip code or that the only valid zip codes for Skokie are 60076 and 60077.

At Bayser, we developed a Fuzzy Matcher tool based on Artificial Intelligence principles. To automate the matching process, the Fuzzy Matcher combines fuzzy logic and abstraction techniques on one hand with a host of dictionaries on the other, covering people's names, facility names, states, cities, zip codes, and area codes. The Fuzzy Matcher matches databases that don't have common IDs, spots errors, suggests fixes, and eliminates fuzzy duplicates.
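
To make the idea concrete, here is a minimal sketch of the two ingredients described above: abstracting a name down to a comparable key, and checking values against dictionaries. It is not the Fuzzy Matcher itself. It substitutes the string-similarity ratio from Python's standard difflib module for fuzzy logic, and the function names, the title list, and the tiny dictionaries are all hypothetical, built only from the examples in this article.

```python
import difflib
import re

# Hypothetical mini-dictionaries; a real tool would load far larger ones.
# The Skokie zip codes are the ones quoted in the article.
CITY_ZIPS = {"Skokie": {"60076", "60077"}}
STATES = ["Illinois", "Indiana", "Iowa"]
TITLES = {"dr", "phd", "md"}  # assumed list of titles to ignore


def name_key(name):
    """Abstract a person's name to (surname, first initial).

    Titles, punctuation, and case are ignored, and the last remaining
    token is assumed to be the surname.
    """
    tokens = [t for t in re.findall(r"[a-z]+", name.lower()) if t not in TITLES]
    return tokens[-1], tokens[0][0]


def closest(word, vocabulary, cutoff=0.7):
    """Return the dictionary entry most similar to word, or None.

    Similarity is difflib's SequenceMatcher ratio, used here as a
    stand-in for a fuzzy-logic score.
    """
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def zip_is_valid(city, zip_code):
    """Check a zip code against the dictionary entry for a city."""
    return zip_code in CITY_ZIPS.get(city, set())


if __name__ == "__main__":
    same = name_key("Dr. Jean-Patrick Tsang") == name_key("J. P. Tsang, PhD")
    print("Same person?", same)
    print("City fix:", closest("Skockie", list(CITY_ZIPS)))
    print("State fix:", closest("Ilinnois", STATES))
    print("60066 valid for Skokie?", zip_is_valid("Skokie", "60066"))
```

A key as coarse as surname plus first initial will produce false matches on a large physician list, so in practice it would serve only to shortlist candidates that are then confirmed on other fields such as address or zip code.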

Over the years, we have become experts at identifying and assessing data sources in addition to performing analyses. Our inspiration: great chefs are experts not only at cooking but also in the ingredients that go into the dishes they prepare. The newest toy in the chest is a cool Fuzzy Matcher that we developed leveraging AI. We wrote an article on the subject that appeared in the October 2002 issue of the Journal of Data Warehousing. If you are interested, we’ll be glad to send you an e-copy. In a nutshell, the Fuzzy Matcher allows us to match a large number of physician records with great accuracy in a short amount of time.

Some people say God is in the details. Others say it is the devil. What's certain is that it's hell to work your way up from poor or bad data.