Dr. Bob Lindner is the Chief Science and Technology Officer at Veda, a company addressing provider directory data challenges.
It’s no surprise to anyone who works with data: it’s messy. In every industry and every business, data anomalies and errors can distort the story the data tells. If we have any hope of improving data practices and making collected data truly actionable, we first have to acknowledge its limitations and then explore modern solutions for improving it.
Bad Data Is The Norm
With the new federal administration exploring cost-cutting measures and releasing data nearly daily, one example caught my eye: a graph of Social Security disbursements by age, with data suggesting that 210-year-olds are receiving Social Security entitlements. As a data scientist who has worked with healthcare data for over 10 years, I wasn’t shocked by this graph.
I recently came across one dermatologist listed as practicing at 20 different variations of one address; imagine the extra legwork a patient needs just to figure out where the appointment actually is. Or consider two providers with exactly the same name, one a veterinarian on the West Coast and the other a physician in New York. There is state licensing information for both of them, but the only one with a federal National Provider Identifier (NPI) is the veterinarian. These are complex data problems that occur every day.
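To make the address problem concrete, here is a minimal Python sketch of how several spellings of one address can be collapsed into a single canonical form. It is an illustration only, not a description of any production system: the sample addresses, the `ABBREVIATIONS` map and the `normalize_address` function are all hypothetical, and treating `#` as a suite marker is an assumption that will not hold for every address.

```python
import re

# Hypothetical abbreviation map for illustration; a real system would need
# a far more complete list of street suffixes and unit designators.
ABBREVIATIONS = {
    "north": "n",
    "street": "st",
    "avenue": "ave",
    "suite": "ste",
}


def normalize_address(raw: str) -> str:
    """Reduce an address string to a simplified canonical form."""
    text = raw.lower()
    # Assumption: "#" introduces a suite number.
    text = text.replace("#", " ste ")
    # Replace punctuation with spaces so "St.," and "St" compare equal.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Map long forms to abbreviations and collapse repeated whitespace.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


# Made-up variations of one address, similar in spirit to the case above.
variants = [
    "123 North Main Street, Suite 200",
    "123 N. Main St., Ste 200",
    "123 N MAIN ST #200",
    "123 n main st ste. 200",
]

canonical = {normalize_address(v) for v in variants}
print(canonical)
```

Rule-based cleanup like this only handles the easy cases. Typos, missing suite numbers and genuinely different locations at similar addresses still need other signals to resolve, and string cleanup does nothing for the second problem above, where two different people share one name.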
Data engineers know that much of the data in every industry is collected manually, which often introduces errors that are quickly propagated and magnified throughout downstream processes. In fact, most data systems in the modern economy, all around the globe, rely on shockingly out-of-date practices. With a spotlight on data issues right now, it’s important to dig deeper and examine data processes if we want to modernize databases and make data functional.