Data Ingestion Guide
Learn how 1upHealth ingests source system data and transforms it into FHIR.

Overview
This page describes 1upHealth's data expectations when ingesting data from external systems and transforming it into FHIR.

Data Dictionary
Every file must be associated with a Data Dictionary. The Data Dictionary should describe all of the columns included in every file. It must exactly match the input file layout and include information about each input field, including its data type. We cannot begin FHIR mapping until we have a Data Dictionary.
Note: If a file has been generated and validated according to a DIMA file extract profile, a data dictionary is not required.
Example:
Filename | Column Name | Column Number | Data Type (string, date, integer, float) | Description | Original Data Source (optional)
file1.csv | first_name | 1 | string | the first name of the individual | internal sql database xyz
file1.csv | last_name | 2 | string | the last name of the individual | internal sql database xyz
file1.csv | member_id | 3 | integer | the id of the member / patient | internal sql database xyz
file2.csv | eob_id | 1 | string | the internal ID of the EOB | vendor X's flat file extracts
file2.csv | eob_date | 2 | date | the paid date of the EOB | vendor Y's flat file extracts
The Data Dictionary should also document decisions on special situations, nested arrays, etc. See the Appendix for another example.
Any changes to the input files must be reflected in an updated Data Dictionary.
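You can check a file against its Data Dictionary automatically before sending it. The Python sketch below does two things. It compares a delimited file's header row with the dictionary's column names and order. It then tries to parse each value as the documented data type. The `DictionaryEntry` class and `check_file` function are hypothetical names used for illustration. The sketch also assumes dates are in ISO 8601 (`YYYY-MM-DD`) format, so adjust the parsers to match the formats your Data Dictionary actually specifies.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

# Parsers for the data types used in the Data Dictionary example.
# Assumption: dates are ISO 8601 (YYYY-MM-DD); change this if your dictionary says otherwise.
PARSERS = {
    "string": str,
    "integer": int,
    "float": float,
    "date": date.fromisoformat,
}


@dataclass
class DictionaryEntry:
    """One row of a Data Dictionary (hypothetical structure for illustration)."""
    filename: str
    column_name: str
    column_number: int  # 1-based position of the column in the file
    data_type: str      # expected to be one of the keys in PARSERS


def check_file(filename, text, entries, delimiter=","):
    """Return a list of problems found when comparing a file to its Data Dictionary entries."""
    expected = sorted(
        (e for e in entries if e.filename == filename),
        key=lambda e: e.column_number,
    )
    expected_names = [e.column_name for e in expected]
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    if header != expected_names:
        # The layout must match exactly, so stop rather than guess at column positions.
        return [f"{filename}: header {header} does not match dictionary {expected_names}"]

    problems = []
    for row_number, row in enumerate(reader, start=1):  # data rows, not counting the header
        if len(row) != len(expected):
            problems.append(f"{filename} data row {row_number}: expected {len(expected)} values, got {len(row)}")
            continue
        for entry, value in zip(expected, row):
            parser = PARSERS.get(entry.data_type)
            if parser is None:
                problems.append(f"{filename}: unknown data type {entry.data_type!r} for {entry.column_name}")
                continue
            try:
                parser(value)
            except ValueError:
                problems.append(
                    f"{filename} data row {row_number}: {entry.column_name}={value!r} is not a valid {entry.data_type}"
                )
    return problems
```

Because the header must match exactly, a reordered or renamed column is reported once and the type checks are skipped. Those checks would be meaningless against the wrong layout.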

Data Identification
All data extracts MUST include a unique identifier per row. Ideally, this identifier should be a globally unique UUID/GUID (universally unique identifier) or equivalent.
Patient demographic data intended to be used to generate CARIN-BB Patient resources MUST include a globally unique identifier which is unique to the individual (heartbeat).
Example: Consider a patient who has enrolled in two health plans. Representing this scenario in FHIR requires three resources: one Patient resource and two Coverage resources. The Patient resource holds demographic information about the individual (e.g., first name, last name, address, date of birth). The Coverage resources hold data about the two separate plans the person enrolled in over their lifetime (e.g., enrollment dates, coverage type). In the data extracts, the patient file must contain an identifier unique to the individual. The Coverage extract must link each coverage back to the patient using that same identifier.
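The distinction above, between one identifier per row and one identifier per individual, can be checked before delivery. This Python sketch assigns a random UUID (version 4) to each row. It then confirms that every Coverage row links back to a patient in the patient file. The column names `row_id`, `patient_id`, and `plan_name` and both helper functions are hypothetical, so use whatever names your Data Dictionary documents.

```python
import uuid


def assign_row_ids(rows, id_column="row_id"):
    """Return copies of the rows, each with a new random UUID (version 4) as its row identifier."""
    return [{**row, id_column: str(uuid.uuid4())} for row in rows]


def orphan_coverage_rows(patients, coverages, key="patient_id"):
    """Return Coverage rows whose individual-level identifier does not appear in the patient file."""
    known_patients = {patient[key] for patient in patients}
    return [coverage for coverage in coverages if coverage[key] not in known_patients]


# The scenario from the example: one individual enrolled in two plans (sample values only).
patients = assign_row_ids([{"patient_id": "P-001", "first_name": "Ana", "last_name": "Diaz"}])
coverages = assign_row_ids([
    {"patient_id": "P-001", "plan_name": "Plan A"},
    {"patient_id": "P-001", "plan_name": "Plan B"},
])
```

Here `patient_id` plays the role of the individual-level identifier shared across files, while `row_id` is unique to each row in each extract.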

File Format Guidelines
Our preferred file formats are flat comma-separated (CSV) or pipe-delimited files, or flat NDJSON (newline-delimited JSON).
All files must contain column headers.
Please do NOT provide EDI or Excel formats.
Please ensure all special characters are escaped with backslashes, e.g., Robert \"Bob\" Jones.
Please do not enclose values in quotes.
Please compress files. We accept zip and tar.gz file formats.
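The Python sketch below applies these guidelines in three steps:

- It writes a pipe-delimited file with a header row.
- It backslash-escapes special characters instead of enclosing values in quotes.
- It packages the result as a tar.gz archive.

It uses only the standard library `csv` and `tarfile` modules. The function names and the in-memory approach are illustrative choices, not a required toolchain.

```python
import csv
import io
import tarfile


def to_pipe_delimited(header, rows):
    """Render rows as pipe-delimited text with a header row and no quote-enclosed values."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter="|",
        quoting=csv.QUOTE_NONE,  # never wrap values in quotes
        escapechar="\\",         # instead, backslash-escape delimiters and double quotes
        lineterminator="\n",
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def tar_gz_bytes(member_name, text):
    """Package one text file into an in-memory tar.gz archive and return its bytes."""
    data = text.encode("utf-8")
    info = tarfile.TarInfo(name=member_name)
    info.size = len(data)
    archive = io.BytesIO()
    with tarfile.open(fileobj=archive, mode="w:gz") as tar:
        tar.addfile(info, io.BytesIO(data))
    return archive.getvalue()


content = to_pipe_delimited(["member_id", "full_name"], [["M12345", 'Robert "Bob" Jones']])
# content is:
# member_id|full_name
# M12345|Robert \"Bob\" Jones
archive_bytes = tar_gz_bytes("file1.csv", content)
```

Writing `archive_bytes` to disk yields the compressed file to deliver. The same `to_pipe_delimited` output could also be placed in a zip archive with the standard `zipfile` module.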

File Names
Our preferred naming convention is: <FHIR resource name>_<FHIR extension name if applicable><date-time stamp>_Full/Update.csv
Example:

Production files must include PROD in the filename, while test/staging files must include STAGE, DEV, or TEST in the filename.
Files that combine real patient PHI with synthetic data cannot be ingested into any environment.
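The naming convention can be applied programmatically. The Python sketch below is one interpretation, and it rests on three assumptions:

- Every part is joined with an underscore, although the template shows none between the extension name and the date-time stamp.
- The date-time stamp uses the `YYYYMMDDHHMMSS` format.
- The environment tag is appended as the last segment.

None of these choices is specified here, so please confirm them with your solution architect. `build_filename` is a hypothetical helper.

```python
from datetime import datetime

LOAD_TYPES = {"Full", "Update"}
# PROD for production files; STAGE, DEV, or TEST for non-production files.
ENVIRONMENT_TAGS = {"PROD", "STAGE", "DEV", "TEST"}


def build_filename(resource, stamp, load_type, environment, extension=None):
    """Build a filename such as Patient_20240102030405_Full_PROD.csv (assumed layout)."""
    if load_type not in LOAD_TYPES:
        raise ValueError(f"load_type must be one of {sorted(LOAD_TYPES)}")
    if environment not in ENVIRONMENT_TAGS:
        raise ValueError(f"environment must be one of {sorted(ENVIRONMENT_TAGS)}")
    parts = [resource]
    if extension:  # only when a FHIR extension name applies
        parts.append(extension)
    # Assumption: timestamp format YYYYMMDDHHMMSS and environment tag as the final segment.
    parts += [stamp.strftime("%Y%m%d%H%M%S"), load_type, environment]
    return "_".join(parts) + ".csv"


print(build_filename("Patient", datetime(2024, 1, 2, 3, 4, 5), "Full", "PROD"))
# Patient_20240102030405_Full_PROD.csv
```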

Headers
If a file is delivered in CSV or pipe-delimited format, every file MUST include a header row with column names. Column names cannot contain special characters or spaces.
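A quick pre-delivery check for header rows might look like the Python sketch below. It treats "special characters" as anything other than letters, digits, and underscores, because the Data Dictionary example uses names like `first_name` and `member_id`. That reading is an assumption, so adjust the pattern if your agreement with 1upHealth defines the allowed set differently. `invalid_column_names` is a hypothetical helper.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are allowed in column names.
VALID_COLUMN_NAME = re.compile(r"[A-Za-z0-9_]+")


def invalid_column_names(header_line, delimiter=","):
    """Return the column names in a header row that contain spaces or special characters (or are empty)."""
    names = header_line.rstrip("\r\n").split(delimiter)
    return [name for name in names if not VALID_COLUMN_NAME.fullmatch(name)]


print(invalid_column_names("first_name|dob (mm/dd)|member id", delimiter="|"))
# ['dob (mm/dd)', 'member id']
```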

Nested Data
Nested arrays are common in FHIR, e.g., one member with multiple addresses. You have several options for handling these situations in the files you send to 1upHealth.

Option 1: Add a fixed number of columns. If there is a known number of data elements for a specific field, such as an address, these may be added as a static number of additional columns.
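For example, if each member has at most two addresses, the nested list can be spread across a fixed set of columns. The Python sketch below shows one way to do this. The `address_<n>_<field>` column names, the field list, and the limit of two are hypothetical, and the real values should be recorded in your Data Dictionary.

```python
# Hypothetical address fields and limit; document the real ones in the Data Dictionary.
ADDRESS_FIELDS = ("line", "city", "state", "postal_code")
MAX_ADDRESSES = 2


def flatten_addresses(member, max_addresses=MAX_ADDRESSES):
    """Turn a member's list of addresses into a fixed set of address_<n>_<field> columns."""
    row = {key: value for key, value in member.items() if key != "addresses"}
    addresses = member.get("addresses", [])
    if len(addresses) > max_addresses:
        raise ValueError(f"member has {len(addresses)} addresses; only {max_addresses} are supported")
    for index in range(max_addresses):
        # Unused address slots are left empty so every row has the same columns.
        address = addresses[index] if index < len(addresses) else {}
        for field in ADDRESS_FIELDS:
            row[f"address_{index + 1}_{field}"] = address.get(field, "")
    return row


member = {
    "member_id": "M12345",
    "addresses": [{"line": "1 Main St", "city": "Boston", "state": "MA", "postal_code": "02110"}],
}
row = flatten_addresses(member)
```

The sketch raises an error when a member exceeds the limit instead of silently dropping addresses. If the limit is regularly exceeded, one of the other options below may fit better.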
Option 2: Create an NDJSON file extract in which one-to-many relationships are represented as arrays.
For example, consider a claim: a claim from a single encounter can have none, a few, or, in rare cases, hundreds of associated procedures, diagnoses, and other line items.
When represented in a fully denormalized way, these data may be structured as one JSON object per claim, with each one-to-many relationship listed as a JSON array on the parent object.
An NDJSON file is a series of JSON objects delimited by newlines. The example below expands a single claim object across multiple lines for readability, but in the file itself each claim occupies one line. The arrays and line items are abbreviated.
{
  "claim_identifier": "000991242",
  "patient_id": "M12345",
  "billable_period_start_date": "1900-01-01",
  "billable_period_end_date": "1900-02-01",
  "diagnoses": [
    {
      "icd_code": "Q61",
      "type": "admitting"
    },
    {
      "icd_code": "S37.0",
      "type": "principal"
    }
  ],
  "line_items": [
    {
      "cpt_code": "99232",
      "revenue_code_nubc": "0301",
      "service_period_start": "1900-01-01",
      "detail_line_deductible_amount": 24.67,
      "detail_line_noncovered_amount": 104.20
    },
    {
      "cpt_code": "99231",
      "revenue_code_nubc": "0305",
      "service_period_start": "1900-01-01",
      "detail_line_deductible_amount": 65.43,
      "detail_line_noncovered_amount": 199.43
    }
  ]
}
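In the delivered file, each claim object like the one above is serialized onto a single line. The Python sketch below uses the standard library `json` module to write and read NDJSON, and the function names and sample records are illustrative. `json.dumps` escapes newline characters inside string values, so each record stays on one line. Identifiers with leading zeros, such as `claim_identifier`, must be sent as JSON strings, because JSON numbers cannot have leading zeros.

```python
import json


def to_ndjson(records):
    """Serialize records as NDJSON: one compact JSON object per line."""
    # json.dumps escapes any newline characters inside values, so each record stays on one line.
    return "".join(json.dumps(record, separators=(",", ":")) + "\n" for record in records)


def from_ndjson(text):
    """Parse NDJSON text back into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


claims = [
    {
        "claim_identifier": "000991242",  # a string, so the leading zeros survive
        "patient_id": "M12345",
        "diagnoses": [{"icd_code": "Q61", "type": "admitting"}, {"icd_code": "S37.0", "type": "principal"}],
        "line_items": [{"cpt_code": "99232", "revenue_code_nubc": "0301", "detail_line_deductible_amount": 24.67}],
    },
    {"claim_identifier": "000991243", "patient_id": "M12345", "diagnoses": [], "line_items": []},
]
ndjson_text = to_ndjson(claims)
```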

Option 3: Send separate parent and child files. Two files may be used: a parent file and a child file. The parent file should contain the encapsulating record and all fields that are not nested arrays. The child file should contain the related fields, one row per array element, with an ID linking back to the parent file. Example:
If a member may be associated with many different addresses, you may send a member file and an address file. The member file should have one row per member. The address file may have multiple rows, each with an ID pointing back to the corresponding member in the member file.
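The member/address example can be produced from nested records with a few lines of Python. This sketch is illustrative rather than a required approach. Each member becomes one row in the parent file, and each address becomes one row in the child file. Every child row carries the parent's `member_id` plus its own UUID row identifier. The `address_id` column name and the input structure are hypothetical.

```python
import uuid


def split_member_addresses(members):
    """Split nested member records into parent (member) rows and child (address) rows."""
    member_rows, address_rows = [], []
    for member in members:
        # Parent row: every field except the nested array.
        member_rows.append({key: value for key, value in member.items() if key != "addresses"})
        for address in member.get("addresses", []):
            # Child row: its own unique row ID, the link back to the parent, then the address fields.
            address_rows.append({"address_id": str(uuid.uuid4()), "member_id": member["member_id"], **address})
    return member_rows, address_rows


members = [
    {"member_id": "M1", "first_name": "Ana", "addresses": [{"city": "Boston"}, {"city": "Denver"}]},
    {"member_id": "M2", "first_name": "Raj", "addresses": []},
]
member_rows, address_rows = split_member_addresses(members)
```

Members with no addresses still get a parent row; they simply have no child rows.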
Your approach to nested arrays should be documented in the Data Dictionary. If your data contains more than two layers of nesting, please contact your Solution Architect.

Deleting Existing Records
By default, deletions are sent alongside the rest of the data: the Patient extract should contain a column with a boolean value (true or false) indicating whether each record needs to be removed.
By default, when a patient record is flagged for deletion, all records linked to that patient are removed as well. This includes all Coverage, ExplanationOfBenefit, and clinical records, among others.
If only individual resources associated with a particular patient need to be deleted, please reach out to your solution architect.
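A sender can preview what a deletion flag will remove under the default behavior described above with a check like the Python sketch below. It only illustrates the documented cascade, in which a flagged patient takes all linked records with it. It is not 1upHealth's implementation. The `delete_flag` and `patient_id` column names are hypothetical, and the sketch assumes flags arrive as the text values `true` or `false`.

```python
def parse_flag(value):
    """Interpret a boolean text value from the Patient extract."""
    normalized = value.strip().lower()
    if normalized in ("true", "false"):
        return normalized == "true"
    raise ValueError(f"expected true or false, got {value!r}")


def records_removed_by_default(patient_rows, linked_records, flag_column="delete_flag", key="patient_id"):
    """Return, per resource type, the linked records that the default cascade would remove."""
    flagged = {row[key] for row in patient_rows if parse_flag(row[flag_column])}
    return {
        resource_type: [record for record in records if record[key] in flagged]
        for resource_type, records in linked_records.items()
    }


patients = [{"patient_id": "P1", "delete_flag": "true"}, {"patient_id": "P2", "delete_flag": "false"}]
linked = {
    "Coverage": [{"patient_id": "P1"}, {"patient_id": "P2"}],
    "ExplanationOfBenefit": [{"patient_id": "P1"}],
}
removed = records_removed_by_default(patients, linked)
```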

Synthetic vs Production (PHI) Data
All data extracts must contain either wholly synthetic data or wholly production data. 1upHealth is unable to ingest any extract that combines synthetic and production data into any environment.
1upHealth is also unable to ingest full data extracts into non-production environments. If you have questions or concerns regarding performance, please contact your solution architect.