
A YOUTUBE SERIES: TRANSFORMING HEALTHCARE DATA
Episode 4: SQL vs NoSQL on FHIR with Don Rucker and Ricky Sahu
Transcription
Dr. Don Rucker:
Welcome. We have a video today, we’re talking about one of the interesting things with using FHIR, which is how do you represent the FHIR data in a database? Because that’s really how FHIR analytics are going to be powered.
This is very interesting for a lot of reasons. One of them is it exposes the historic tension that we’ve had over the last 20 or 30 years, maybe a little bit longer, on data representation.
So do we represent data hierarchically, which was the norm up until relational databases. Do we represent it relationally? Or now in this modern big data world, people are actually doing both. So they’re doing no SQL, which is a hierarchical representation. And SQL, which is a relational representation.
Here today to talk about this is Ricky Sahu, the founder of 1upHealth, who’s thought a lot about how to represent FHIR data. And is going to talk about some of the choices that he’s made in building out the 1up architecture. And just generally thinking about this question.
Ricky, what has been going through your thinking over the years here on this type of data representation?
Ricky Sahu:
Yeah, so obviously five years in now, I’ve been thinking a lot about FHIR and FHIR analytics. And coming off our last conversation on FHIR at scale, this is diving in deeper into the analytics space, into how to represent data in FHIR.
And what we see in the industry is that there’s not just one type of analysis that you want to do, or one type of application that you want to build. Many different organizations want to build many different use cases on top of FHIR.
And those span use cases for one patient at a time, they span population level use cases as well. And FHIR has traditionally been designed for that one patient or one resource type use case.
But what we see is that the broader industry, especially when you’re thinking about payers and providers or pharma, at scale they really want to interact with populations worth of data.
And this becomes relevant when you’re looking at quality measures, it becomes relevant when you’re looking at anything around value based care, or population health. As well as more emerging trends around AI and machine learning.
So our approach, which we discussed during our FHIR at scale talk, is to use many different representations of the data that’s coming in as FHIR. So that it’s optimized for different query access patterns.
And this whole world of SQL versus no SQL, benefited maybe from having data both in SQL based access, as well as no SQL based access.
And where we’ve really landed with 1up is providing the best of both worlds, where you can have data stored at scale using no SQL, but have interfaces into that data using ANSI SQL.
So all the tools and services that most organizations like health plans, providers, pharma, app developers, are used to, when they want to run analytics, group buys, joins, aggregations, which would typically take days or sometimes weeks to run using traditional FHIR APIs, can happen in seconds using SQL interfaces on top of no SQL data.
Dr. Don Rucker:
Okay. So that’s a very interesting modern combination of architectures to really get the best of both worlds. Pretty interesting.
And would you say that the underlying principle is that you get the resources in as atomic raw FHIR? Is that part of it? Or how do you see that?
Ricky Sahu:
Yes. So the benefit of FHIR is that it’s creating this canonical data format, across really all of healthcare, is this one data model to rule them all. And what we are doing is we’re using that structure and feeding it into multiple data representations.
So for analytics specifically, one of the optimal data structures is columnar data. And we use Parquet files for that. So Parquet is this open source columnar data format, which you can feed things like bulk FHIR into, which thanks to you exists and is widely being adopted in the industry.
Dr. Don Rucker:
Can you, just for folks who may not be up to speed on columnar versus row, the classic table architectures, can you just give a little quick intro to the difference between a columnar and a more classic relational table store?
Ricky Sahu:
Yeah. So it really comes down to, and this will be fun for the geeks joining this call.
Dr. Don Rucker:
Yeah.
Ricky Sahu:
This talk. Is it really comes down to the bits on the disk. With columnar data, you’re basically storing bits on the disk together when they’re columns. So all of the first names are going to be stored together. All the last names are going to be stored together. All of the IDs of the patient, those are going to be stored together, et cetera.
With row based architectures, you’re storing one patient resource together, then another patient resource together, then the third patient resource together, et cetera.
And when you’re doing analytics, SQL queries on FHIR data, SQL on FHIR, you are really interacting with these columns across different data, different resource types, or within a resource.
And when the computer or database seeks for that information, because all the bits on the disk are stored together, it can quickly access those and run those calculations. So it’s orders of magnitudes faster than a traditional SQL based database, which would be row based architecture. And that’s really the power that you get from these columnar structures.
Dr. Don Rucker:
So if you are doing a comparison, let’s say you’re a payer, planner, or provider, and you’re doing comparisons, want to rank order. People by their highest blood pressure, or hemoglobin A1C, or creatinine. The ability to compare everyone is vastly better with a columnar store, right? Much faster. That’s the basic gist of it?
Ricky Sahu:
Yeah. If you want to do, “Show me the average A1C across a million patients,” you don’t have to look at a million patients and get all their information together and then average it. You just look at all the A1Cs, and all of that information is together. Which is why, yeah, again, orders of magnitude faster.
Dr. Don Rucker:
One of the things that’s been a big national area of interest for many years and many reasons is quality measurement. How do you compare providers? How do you pay for providers?
How do you see quality comparisons being enhanced or differentiated by choices in database architecture?
Ricky Sahu:
Yeah. So with quality measures, you really are doing these classical analytics, where you want to take all the attributes of some column and interact with it, and do joins and aggregations amongst other columns. So it’s directly benefited from this sort of structure.
And with things like HEDIS, NCQA, Stars, all of that, there are very complicated measures which are touching dozens of attributes in the FHIR resources.
So with traditional, if you were to just interact with this data via FHIR, which is again originally billed for maybe one patient access at a time or one resource interaction at a time, it would literally take days to run these queries. Or hours at minimum.
With the columnar, SQL on FHIR based mechanisms, you can get these results in seconds doing all the calculations of these joins and aggregations.
And one of the ways of exploring quality measures that’s being talked a lot about today is CQL, which allows for a little bit of a higher representation on top of FHIR to run these measures. But many of the current implementations of CQL are built on top of the existing FHIR APIs, which end up making them take, again, hours to compute.
So we’re hoping that the columnar representations as no SQL with a layer of SQL on FHIR, will really, in our testing, it comes to run these measures in seconds. And taking that approach and making it more available to the wider industry, will really benefit from more and faster iteration on these quality measures.
Dr. Don Rucker:
So that’s obviously a fairly cutting edge type of concept to, as report, quality measurement to these modern tools and platforms.
Another area where I think data representation has proven to be absolutely essential to performance is in artificial intelligence applications. Essentially all of the modern commonly used deep learning neural net applications really rely on, and I’m probably butchering the use of the term, regularization of data, to minimize bias. Bias of all types, not just the bias we think of as social determinants of health. But every computational bias.
How do you look at the data representations in making the data more uniform? So if you’re feeding that into layers of a neural network, that you get a more consistent result out at the back end when you run your optimizations.
Ricky Sahu:
Yeah. Well, I’ll give you a quick anecdote here. We’ve started to get more into machine learning and AI algorithms. And our first iteration was to run feature engineering using FHIR data via these FHIR APIs, which obviously we love.
But when we did this feature extraction and feature engineering on a large population, it took about a week to do all this work. And then one of our team members, he built the same pipeline, but on SQL on FHIR using these columnar representations. And that literally reduced it by orders of magnitude into a couple hours. Which is phenomenal, especially when you’re thinking about iterating rapidly on discovering new relationships between this data.
So this sort of technology is not only applicable for the healthcare analytics today. It’s really going to enable the healthcare analytics of the future, which is largely going to be based on AI and machine learning models. And how you quickly and rapidly integrate the data that’s being represented almost universally as FHIR, for these forward looking analytics.
Dr. Don Rucker:
Can you make a comparison of, and this is maybe a false dichotomy, between SQL on FHIR and FHIR on SQL?
Ricky Sahu:
Yeah. So the way that we think about it, and these words do matter, right?
Dr. Don Rucker:
Yeah.
Ricky Sahu:
The reason why what we do is SQL on FHIR, is because we’re natively storing the data as FHIR. That is the canonical representation of all the information that we have at 1up. We are starting to do things with custom resource types, but inherently it’s the same idea.
So now that you have this data stored as FHIR, the primary way, again, that we’re seeing organizations interact with data at scale, is through SQL. They have dozens of BI tools associated with it. They have hundreds of reports and analytics that are driven off of SQL.
So we want to make FHIR data available through SQL. And what we’ve done is we’ve enabled these ANSI SQL interfaces on top of that canonically stored FHIR data.
Some other organizations, what their approach is, FHIR on SQL. Where they store the data as SQL or in Postgres or MySQL databases, which require a bespoke normalized table structure. And then enable FHIR resources to be presented on top of, emerge from that table structure.
And the issue there, or at least from our perspective, is that those traditional SQL databases don’t scale as well as the no SQL databases. So you end up hitting performance issues after a hundred thousand patients or something like that.
And the other issue is that because there is variation in how you structure these tables, like do all the addresses go into a separate table or all the addresses along with patients and providers and organizations, et cetera. Those sort of bespoke decisions have to be made in the FHIR on SQL world.
Whereas in the SQL on FHIR world, it’s represented as FHIR. And everyone knows that, “Hey, this is the way that you would access this attribute in this resource.” There’s no question about how that data is represented in that [inaudible 00:15:29].
Dr. Don Rucker:
So you could presumably anticipate some real performance issues, depending on the nature of your query, if it is dependent on specific choices and table architecture underlying-
Ricky Sahu:
Exactly.
Dr. Don Rucker:
That presumably is axiomatic, right?
Ricky Sahu:
Yeah.
Dr. Don Rucker:
[inaudible 00:15:47] sort of math.
Ricky Sahu:
That and the lack of the columnar format, is definitely the case.
Dr. Don Rucker:
Are there other considerations in the SQL on FHIR versus FHIR and SQL thing that you’ve run across, or other examples in terms of correlating patients, patient matching, analyzing over populations? Are there other things that you’ve run across there?
Ricky Sahu:
Yeah. So this is something that we discussed in our FHIR at scale talk too, which is, we’re not just storing our data as one representation. There’s the columnar, SQL on FHIR representation. There’s also an inverted index. There’s also the key value store.
And typically what we see in other solutions is there’s just one format. And that limits the level of access that you have to the data and the performance for those various query access patterns.
So with things like patient matching, what’s interesting is that an inverted index is actually more optimized for identifying similar patients at different organizations or within your own data.
So for that sort of use case, we wouldn’t typically use SQL on FHIR. But for other use cases like quality measures or calculating total cost of care, things like that, the SQL on FHIR columnar representations are more beneficial.
Dr. Don Rucker:
So let’s take a totally different use case. You’re an app developer, so you’re not a payer or a provider trying to optimize over an entire population of patients. But you’re an app developer using the FHIR, the JSON resources to power some part of the patient experience in your app. What are some of the database considerations there that you’d want to think about on the same data, potentially?
Ricky Sahu:
Yeah. It really comes down to what you’re building your application for, but I think the biggest consideration to make is that today, building apps in healthcare is no longer just about getting the data from the providers or the payers.
Traditionally, the two people in a garage applications started with, “Hey, I need to get access to this patient’s data from this provider.”
But as you’re thinking about how you interact with that data, what’s really important is how the data is stored and the fundamental structure of that information.
So today you might be saying, “Okay, I have an app that’s meant for one patient at a time. It’s showing up on their phone. And the FHIR APIs are sufficient.”
But as you get into more enterprise focused use cases, these payers or providers or pharma organizations that you’re ultimately selling to, even the patients, to benefit them in aggregate, you’ll want to analyze the data between these patients or among the patient population.
And for those use cases, you will want this SQL on FHIR, this columnar data store. And you’re not going to want to have to build that all yourself, because it is a lot of work.
Dr. Don Rucker:
So obviously the individual APIs are under the Cures Act, well, under HIPAA’s right of access. So the OAuth 2.0, patient, if you will, mediated. So patient controlled access to data and their electronic medical records.
We’ve both been involved in various ways in the rollout of the bulk FHIR standard, which is really ultimately either within providers or between payers and providers under HIPAA’s treatment payment operations, privacy provisions. So those are signed contracts, they’re batch mode, as opposed to real time API. Totally different considerations, both from the legal point of view, who the two parties are using that data, the HIPAA rights to it. How does that play into the choice of architecture, if you want to do bulk on FHIR? Which for folks, depending on when you watch this video, that both the individual right of access and the bulk FHIR APIs to electronic medical records that are certified by ONC, will be available December 31st, 2022. So under the Cures Act rule making. But how do you look at bulk FHIR and native data representations?
Ricky Sahu:
Yeah. Well, first of all, thank you, doc, for making that possible. For those of you that don’t know, Don was the driving force behind this bulk data initiative.
And really it’s brought FHIR into the enterprise world in a much more meaningful way than what was possible before through just the APIs.
And what we’re seeing with bulk FHIR, which is effectively all the resources stored for any single resource type being sent to some other party, whether it’s a payer sending it to a provider or these new regs, providers being able to send to other organizations, whoever they may be. Other providers, payers, apps.
Dr. Don Rucker:
Or maybe even payers to payer, which is another allowed use case under… Actually treatment payment and operations simultaneously.
Ricky Sahu:
Yep, exactly. So what that is doing is it’s enabling interoperability at a much higher scale than just patient access.
And what we are seeing is that, for example with payer to payer, yes, patients can authorize access to their data. But with populations at scale that might be transitioning between one plan and another plan, there are direct incentives that health plans have may have for things like Medicare Advantage, where you are paid based off of knowing the current condition set of a patient.
And if you are transitioning from care, from one health plan to another, without knowing that, you are going to have fewer ICDs documented and have a lower HCC score and have a lower reimbursement.
So with things like bulk FHIR, what that’s enabling is providers to send their data to payers, payers to get that data from other payers. And fill out the picture at scale without having to ask every single patient for their medical record from a former plan or provider.
And it’s really speeding up, increasing the liquidity of data in healthcare, which is going to lead to better patient care, better reimbursements for plans, and better reimbursements for providers that are in value based care contracts.
Dr. Don Rucker:
Right. So you’re going to essentially be able to close gaps in care at scale.
Ricky Sahu:
Yeah. And one of these organizations that really have spearheaded bulk FHIRs, CMS with Medicare itself, the BCDA Medicare program has-
Dr. Don Rucker:
So that’s the claims data for, let’s say ACOs, for example.
Ricky Sahu:
Yeah. So all the claims data that an ACO has available to them from Medicare can be sent from Medicare to that ACO organization using bulk FHIR. That’s live today, we have customers working with that today.
And what that does is, it does a few things. One is downstream, all this population level data cannot of course be fed into the SQL on FHIR interfaces.
And two is that, because this is built on your technology from Medicare, it is coming in much more real time. Think a couple weeks, rather than 12 weeks delayed, from when the data service was.
And with ACOs, they’re also benefiting from more timely access to data because some of these measures are based on how quickly can you react or interact with the patient.
Dr. Don Rucker:
Yeah, so if somebody’s been discharged and if you find that out sooner, you can get them to their primary care doc or do an intervention, as opposed to find out a bit on the next admission to the hospital or the next ER visit. [inaudible 00:24:55] was an ER doc, people should go to the ER on a regular basis, no just kidding.
So we’ve had, I think, exposed a lot of things. Any concluding thoughts on just thinking about data representations and FHIR and how our audience might think about that as they go about their work?
Ricky Sahu:
Yeah, fundamentally I think FHIR and the data storage of FHIR is extremely fundamental. And it will increasingly drive both the technical performance and the performance from a medical and healthcare perspective, of organizations. And many organizations today, health plans, providers, even app developers-
Dr. Don Rucker:
As claims and clinical come together, right? Because that’s what we’re underlying talking about here as well, though we haven’t actually mentioned it today.
Ricky Sahu:
Yeah, yeah, exactly. Both of those datasets coming together in this structure is going to drive a lot of downstream applications.
Many organizations today think about it as, “Hey, I just need to check a box and get my data in FHIR and make these APIs available. And after that I’m done.”
But what we’re increasingly seeing is that organizations are thinking of a FHIR first strategy, where all downstream applications and analytics are being built on top of FHIR data.
And when that’s the case, really the future of your data strategy rests on this fundamental choice that you’re making of where your data sits, what that platform is, how you’re connected to the outside world.
And it is critical to make the decision where it’s as performant as it can be. Because these downstream effects will compound on one another. And a gap in what you choose today is going to lead to a lack of ability to express analytics or use cases in the future.
Dr. Don Rucker:
Great. Well, hopefully we’ve given folks a way to think about really a very modern question that everybody who is dealing with big data has to face, which is how do you represent that data? How do you have as much flexibility on use cases, business models, as you can maintain, yet leverage this new sea of information? So thanks for watching.
Ricky Sahu:
Thank you.
In this episode, Dr. Rucker sits in conversation again with Ricky Sahu, Founder of 1upHealth, to discuss the importance of SQL on FHIR®. Together they help viewers understand FHIR data resources, the difference between data architectures, and more!
At 1upHealth, we’re the healthcare industry’s most complete FHIR® data platform. Along with our managed platform, best-in-class tools, and FHIR® APIs, we provide scalable, serverless, and secure solutions to support your business programs, workflows, and analytics.