Supervised vs. Unsupervised Machine Learning: Using The Right Tool For The Job
It’s Veda’s philosophy that new technology tools are not meant to wholly replace human engagement. We believe that technology should help elevate humanity: when people can focus on meaningful work, they achieve their highest value. Technology should help people help people.
As a Data Science as a Service (DSaaS) company, Veda leverages scientific principles within our unique AI and machine learning systems to perpetually clean, correct, and monitor evolving provider and facility data. A lot of companies claim to offer accurate provider data, but none are as committed to using science to solve deeply entrenched provider data problems.
At Veda, we use AI systems that include natural language processing (NLP), supervised learning, and unsupervised learning components. Each can be applied to a wide array of payer data challenges, depending on which tool is right for the job. No matter what, we start by understanding the problem, not by applying a method.
The Roles of Supervised and Unsupervised Learning
Veda utilizes supervised learning because it doesn’t require “perfect” sources of data: it can make use of the good parts of any data source and learn to ignore the errors. With supervised learning, a data scientist trains the model on labeled examples and guides it through healthcare nuances and industry-specific language so that the resulting model is a good one.
We’ve measured individual sources of data for years, including attestation, and haven’t yet found any that is 90% accurate or better. Supervised learning is highly accurate with the data we have access to today. The benefit is the highest accuracy, achieved flexibly and without depending on a single data source.
We also use unsupervised learning for offline data exploration and research, both to learn more about a dataset and to help design better machine learning features for supervised learning systems. That’s because unsupervised learning separates big collections of data into groups on its own, which makes it good at picking out patterns in the data.
Some will claim that unsupervised learning alone is superior to supervised learning because of the lack of human intervention. However, many common clustering algorithms, such as k-means, require the user to specify upfront how many groups the data should be separated into. So no matter what data is being grouped, the number of groups has to be fixed ahead of time, and the data is sorted into exactly that many groups regardless of whether the groups match up well with the data. This requires the user to have upfront knowledge of exactly what labels and groups they need.
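To make the point about choosing the number of groups concrete, here is a minimal sketch of k-means clustering in plain Python. It is only an illustration: the function name `kmeans`, the sample numbers, and the simple way the starting centers are chosen are all assumptions made for this example, not a description of Veda’s systems.

```python
def kmeans(points, k, iterations=20):
    """Toy 1-D k-means: the caller must choose k before seeing any result."""
    # Illustrative initialization (an assumption): spread the starting
    # centers evenly across the sorted data.
    ordered = sorted(points)
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        # Assign each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Hypothetical data with three clumps of values.
data = [1.0, 1.2, 0.8, 4.0, 4.1, 3.9, 9.0, 9.2, 8.8]

three = kmeans(data, k=3)  # one group per clump
two = kmeans(data, k=2)    # still "works": two clumps are merged into one group
```

Asking for two groups does not raise an error or a warning. The algorithm returns exactly as many groups as it was told to, whether or not that number suits the data.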
But the biggest pitfall of unsupervised learning is that there’s no labeled training data, which means there’s no actual measurement of how well it’s working and placing the items into the correct groups. And with no way of knowing how well it’s working, it’s impossible to depend on unsupervised learning as a primary method for accuracy.
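The measurement gap can be sketched in a few lines. Accuracy is a comparison against known correct answers, so it can only be computed when labels exist. The records, label names, and the `accuracy` helper below are hypothetical and exist only to illustrate the idea.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Supervised setting: hypothetical held-out records with known correct answers.
true_labels = ["valid", "invalid", "valid", "valid"]
model_output = ["valid", "invalid", "invalid", "valid"]
score = accuracy(model_output, true_labels)  # 3 of 4 predictions match

# Unsupervised setting: only arbitrary cluster ids come back. There is no
# answer key to pass as `actual`, so the same score cannot be computed.
cluster_ids = [0, 1, 1, 0]
```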
The Right Tool For the Job
Supervised and unsupervised learning are tools, and just as you wouldn’t remodel your kitchen using only a saw, you shouldn’t use only one kind of machine learning model.
Veda’s technology and approach to data challenges are fundamentally different from those of other provider data technology companies. We focus on fully automating both the static information about a provider and the more challenging temporal information: data that changes at varied rates over time, like practice address, phone, and group affiliation. Our patented systems do not require manual outreach to providers; rather, they rely on data created by providers throughout their established workflows. This increases data accuracy by reducing human error while also decreasing provider abrasion. Validating millions of temporal data elements in real time requires Veda’s full automation system and could not be done manually.
Above all, we believe that AI and machine learning are the best ways to solve the provider data quality problem because:
- These techniques do not require us to know ahead of time how accurate our sources of data are; the machine figures out how to tell good data from bad
- They make the most of imperfect and changing data
- They do not require provider participation; we use data that providers already create in their day-to-day workflows, so there is no need to persuade them to take additional action
- They work: we have scientifically tested attestation along with “source of truth” modeling, and Veda’s approach has the highest measured accuracy of any approach in the industry
Read more about Veda’s approach to AI and data science with Dr. Bob Lindner’s blog post, Artificial Intelligence, ChatGPT, and the Relationship Between Humans and Machines.