Jean-Patrick Tsang, PhD & MBA (INSEAD)
Tel: (847) 920-1000
Email: bayser@bayser.com

Igor Rudychev, PhD
Tel: (847) 679-8278
Email: igor@bayser.com

Fuzzy Matching and Data Cleansing

You certainly do not look forward to manually matching two lists of physicians or two lists of facilities that don't have a common ID. While the eye can easily tell that "Dr. Jean-Patrick Tsang" and "J. P. Tsang, PhD" are the same person, that "4709 Golf Road" and "4709 East Golf, Suite 803" are the same street address, or that "Skockie, Ilinnois 60076" has a few typos and should be spelled "Skokie, Illinois 60076", the eye gets strained well before a thousand matches. The other problem with the eye is that it may not catch the fact that 60066 is a bogus zip code or that the only valid zip codes for Skokie are 60076 and 60077.
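To make the city, state, and zip checks above concrete, here is a minimal Python sketch. It is an illustration only, not Bayser's implementation: the `CITY_ZIPS` and `STATES` tables, the function names, and the 0.7 similarity cutoff are all assumptions, and a real reference dictionary would be far larger. It relies on `difflib.get_close_matches` from the Python standard library to find the nearest dictionary spelling.

```python
import difflib

# Hypothetical reference dictionaries. The Skokie zip codes come from the
# text above; everything else is a placeholder for a much larger table.
CITY_ZIPS = {"Skokie": {"60076", "60077"}}
STATES = ["Illinois", "Indiana", "Iowa"]


def closest(word, vocabulary, cutoff=0.7):
    """Return the dictionary entry most similar to word, or None if none is close."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def check_city_state_zip(text):
    """Parse 'City, State Zip' and return (city, state, zip_code, zip_is_valid)."""
    city_part, rest = text.split(",", 1)
    state_part, zip_code = rest.strip().rsplit(" ", 1)
    city = closest(city_part.strip(), list(CITY_ZIPS))
    state = closest(state_part.strip(), STATES)
    # The zip code is accepted only if it is listed for the corrected city.
    zip_ok = city is not None and zip_code in CITY_ZIPS[city]
    return city, state, zip_code, zip_ok


corrected = check_city_state_zip("Skockie, Ilinnois 60076")
suspect = check_city_state_zip("Skokie, Illinois 60066")
```

Under these assumptions, `corrected` carries the dictionary spellings "Skokie" and "Illinois" with the zip code flagged as valid, while `suspect` is flagged because 60066 is not among the zip codes listed for Skokie.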

At Bayser, we developed a Fuzzy Matcher tool based on Artificial Intelligence principles. To automate the matching process, the Fuzzy Matcher uses fuzzy logic and abstraction techniques on the one hand, and a host of dictionaries pertaining to people's names, facility names, states, cities, zip codes, and area codes on the other. The Fuzzy Matcher matches databases that don't have common IDs, spots errors, suggests fixes, and eliminates fuzzy duplicates.
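As a rough illustration of what abstracting a name might look like, the Python sketch below reduces each name to a last name plus initials before comparing. It is not the Fuzzy Matcher itself: the `TITLES` set, the function names, and the rule that the last remaining token is the surname are simplifying assumptions that would misfire on some real names (for example, a surname that happens to be in `TITLES`).

```python
import re

# Hypothetical, far-from-complete list of titles and degrees to ignore.
TITLES = {"dr", "md", "do", "phd", "mba"}


def name_key(raw):
    """Reduce a name to (last_name, initials), dropping titles and punctuation."""
    tokens = [t for t in re.split(r"[^a-z]+", raw.lower()) if t and t not in TITLES]
    if not tokens:
        return None
    *given, last = tokens  # assumption: the surname is the last remaining token
    return last, "".join(t[0] for t in given)


def same_person(a, b):
    """Treat two names as a match if surnames agree and initials are compatible."""
    ka, kb = name_key(a), name_key(b)
    if ka is None or kb is None or ka[0] != kb[0]:
        return False
    ia, ib = ka[1], kb[1]
    # One set of initials may be a shortened form of the other ("j" vs "jp").
    return ia.startswith(ib) or ib.startswith(ia)


match = same_person("Dr. Jean-Patrick Tsang", "J. P. Tsang, PhD")
no_match = same_person("Dr. Jean-Patrick Tsang", "Igor Rudychev, PhD")
```

Both example names reduce to the key `("tsang", "jp")`, so they are treated as the same person, while the second pair differs on surname. A production matcher would combine a key like this with similarity scoring and name dictionaries rather than rely on exact surname equality.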

If you are consolidating multiple data sources to set up a target list, to assess the physician/facility overlap, or for whatever reason, you now have a time- and cost-efficient alternative: the Fuzzy Matcher. Not only will you save time and money, but the results will also be far more accurate. You don't even have to learn how to use our Fuzzy Matcher. Just email your files to us and you'll get the results back by email along with a cleansing/matching report.

Whether you need to compute the ROI of an opt-in DTC campaign, deliver accurate physician data to your reps through the SFA system, or update your physician segmentation/targeting analysis to increase market share penetration or curb erosion, it all boils down to the same thing. You need good data. Not stale. Not spotty. Not flaky. Simply, good data! As a matter of fact, how can you expect to elicit relevant and actionable insights otherwise?

Bayser can help you in many ways. Here are some examples.

  1. Provide up-to-date physician profile databases: address, specialty, phone number, hospital affiliation, data agent IDs, etc.

  2. Build bridge files across several IDs (e.g. ME#, DEA, License ID, UPIN). This may also involve identifying prescribers (MDs, DOs, NPs, PAs, NMs, Pods, etc.) that for some reason have missing IDs.

  3. Scrub and match physician files. This can be done on an ongoing basis.

Over the years, we have become experts at identifying and assessing data sources in addition to performing analyses. Our inspiration: great chefs are experts not only at cooking but also on the ingredients that go into the dishes they prepare. We have in our tool chest a slew of databases and pointers to great data sources that help us address a multitude of issues for our clients. The newest toy in the chest is a cool Fuzzy Matcher that we developed leveraging AI. We just wrote an article on the subject that appeared in the October 2002 issue of the Journal of Data Warehousing. If you are interested, we’ll be glad to send you an e-copy. In a nutshell, the Fuzzy Matcher allows us to match a large number of physician records with great accuracy in a short amount of time.

Some people say God is in the details. Others say it is the devil. What is certain is that it’s hell to work your way up from poor or bad data. Give us a call at (847) 679-8400 or email us at bayser@bayser.com for a free consultation. We’d love to hear from you.
