Perfecting Provider Directory AI Modeling

Q&A with Bob Lindner on why sustainably-fed AI models are the path forward

As an AI company that trains models on our proprietary data, we took note of the New York Times article, “When A.I.’s Output Is a Threat to A.I. Itself.” Illustrating exactly what happens when you make a copy of a copy, the article lays out the problems that arise when AI-created inputs generate AI-created outputs, which become the next round of inputs, and the cycle repeats…and repeats.

Veda focuses on having the right sources and the right training data to solve provider data challenges. A data processing system is only as good as the data it’s trained on; if the training data becomes stale, or is itself a copy of a copy, inaccurate outputs are likely to result.

We asked Veda’s Chief Science & Technology Officer, Bob Lindner, PhD, for his thoughts on AI-model training, AI inputs, and what happens if you rely too heavily on one source.


Veda doesn’t use payers’ directories as inputs to its AI and data-training models. Why not?


At Veda, we use what we call “sustainably-fed models,” meaning hundreds of thousands of input sources feed our provider directory models. However, there is one kind of source we don’t use: payer-provided directories.

Provider directories are built by health plans that spend millions of dollars of effort to make them. By lifting that data directly into Veda’s AI models, we would become permanently dependent on that ongoing spending from the payers.

We aim to build accurate provider directories that allow payers to stop those expensive administrative efforts. A system that depends on payer-collected data isn’t useful in the long term, because that data will eventually go away.


What happens if you ignore this problem and rely on a single input source or on AI-created inputs?


The models will begin ingesting data that was generated by other models, and you will see exactly the kind of quality decay the New York Times article describes.


Instead, we use sustainably sourced inputs that won’t be contaminated or otherwise affected by our models’ outputs.

Beyond the data integrity problems, if you use payers’ directories to power directory cleaning for other payers, you are effectively lifting the hard work of payer 1 and using it to help payer 2, potentially running afoul of data-sharing agreements. This is another risk of cavalier machine learning applications: unauthorized use of the data powering them.
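To make that quality decay concrete, here is a minimal sketch (not Veda’s code) of a model that is repeatedly refit only to its own previous outputs. The distribution, sample sizes, and generation count are invented purely for illustration.

```python
# Minimal "copy of a copy" sketch: each generation is fit only to samples
# produced by the previous generation's model, with no fresh primary-source
# data mixed back in. All numbers here are illustrative.
import numpy as np

rng = np.random.default_rng(42)

# Generation 0: "real" data, a stand-in for records gathered from primary sources.
data = rng.normal(loc=100.0, scale=15.0, size=1_000)

for generation in range(1, 51):
    # "Train" a model on the current inputs: here, just fit a Gaussian.
    mu, sigma = data.mean(), data.std()
    # The next generation's training data is nothing but this model's output.
    data = rng.normal(loc=mu, scale=sigma, size=25)
    if generation % 10 == 0:
        print(f"generation {generation:2d}: mean={mu:6.2f}  std={sigma:5.2f}")

# With no fresh data entering the loop, each refit compounds sampling error:
# the estimated mean drifts and, over enough generations, the spread tends to
# collapse, so the synthetic data loses the variety present in the original.
```

Because nothing outside the loop ever corrects it, the only durable fix is the one described above: keep feeding the models independent, primary sources rather than their own (or another model’s) outputs.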

Can you give us an analogy to describe how problematic this really is?


Imagine we make chocolate, and we tell Hershey they should just sell our chocolate because it’s way better than their own. We tell them, “You could save a lot of money by not making it yourselves anymore.”

However, we make our chocolate by buying a ton of Hershey’s chocolate, remelting it with some new ingredients, and casting it into a different shape.

In the beginning, everything is fine. Hershey loves the new bar, and they’re saving money because we’re doing the manufacturing. Eventually, they turn off their own production. But with that production gone, we can’t make our chocolate either. The model falls apart, and in the end no one has any chocolate. A real recipe for disaster.

