Data and Research Core (DRC)
Address research priorities and needs to form an inclusive basis for conducting equity-focused AI/ML research targeting the use of electronic health records
The mission of the AIM-AHEAD Data and Research Core (DRC) is to broaden the diversity and representation of healthcare data, including electronic health record (EHR) data, in AI/ML and expand its availability to diverse teams of researchers to address health disparities.
The DRC is designed to support the development of a workforce capable of conducting high-quality, health equity-focused AI/ML research. It places particular emphasis on supporting persons from groups underrepresented in biomedical research (UBR) and/or building capacity at Minority-Serving Institutions (MSIs) to work with healthcare data. This work starts with EHR data and relevant linked social and environmental determinants of health (SEDOH) data. It will then expand to data from imaging, genomics, wearable devices, and text such as clinical notes.
The DRC is not a single database. Instead, AIM-AHEAD seeks to catalyze an ecosystem of datasets to help address the lack of population diversity in data used in most AI/ML models and applications.
AIM-AHEAD Data Partners
AIM-AHEAD-funded projects may apply to receive facilitated access and data concierge services from AIM-AHEAD data partners that emphasize historically under-resourced and under-represented populations:
OCHIN: OCHIN is a nonprofit health care innovation center whose core mission is to advance health equity. It operates the most comprehensive database on primary health care and outcomes for safety-net patients in the U.S. The database is connected through OCHIN's Epic EHR system and represents 6 million patients from 170 health systems and 1,600 clinic sites across 33 states.
MedStar Health: The MedStar Health System and the MedStar Health Research Institute (MHRI) span an extensive network of clinical facilities in the mid-Atlantic region. This network includes 10 hospitals and more than 200 affiliated facilities connected through MedStar's Cerner EHR system. It represents 5 million unique patients, approximately 31% of whom are African American.
The DRC and Infrastructure Core also collaborate to assist AIM-AHEAD awardees in locating other data sources to support their projects. As part of its mission to diversify datasets used in AI/ML, AIM-AHEAD has conducted a landscape survey to raise awareness about datasets that may be of interest to the research community. Each dataset has its own governance process and rules for access.
How AIM-AHEAD Data Partners Expand Representation
| UBR Category | Definition | OCHIN | MedStar Health |
|---|---|---|---|
| Race | People who select a single race other than White, or who select more than one race | 2,620,875 | 2,467,865 |
| Ethnicity | People who select an ethnicity other than those listed under the race of White | 2,224,756¹ | 2,233,438 |
| Age | <18 years old and 65 years and above | 1,337,789 | 1,424,082 |
| Sexual and Gender Minority | | 249,978² | Not well-captured |
| Income | Annual household income < $25,000 | 4,129,925 | Not well-captured |
| Education | People without a high school diploma or GED | Not well-captured, but FQHCs generally higher than general population | 2,560 |
| Access to Care | Needed a medical visit in the past 12 months but cannot readily use the health care system or pay for needed care | Not well-captured, but FQHCs generally higher than general population | Not well-captured |
| Geography | Residents of established rural and non-metropolitan zip codes, based on the HRSA Federal Office of Rural Health Policy data files | 1,072,666 | 36,345³ |
| Disability | People with a physical, functional, cognitive, or other condition that substantially limits one or more life activities | 632,574⁴ | 266,226⁴ |
Source: UBR categories as defined by the All of Us Research Program
¹ People with Hispanic ethnicity and any race
² People with 'other' sex, a non-straight sexual orientation, or a non-male, non-female gender identity
³ Based on rural and suburban hospital discharges
⁴ Based on ICD codes for disability in a study by Clark et al., including physical, visual, hearing, and intellectual/developmental disabilities
Other Dataset Options for AIM-AHEAD-funded Projects
| Data Set | Brief Description | Data Allowed | Size | Analysis Platform Tools |
|---|---|---|---|---|
| NHLBI BioData Catalyst | Selected large-scale cohorts related to heart, lung, blood, and sleep disorders, including both prospective clinical studies and associated TOPMed genomic data | De-identified data, including individual-level genomic (TOPMed full genomes) and clinical datasets | 60+ studies available to choose from | NHLBI BioData Catalyst PIC-SURE and Seven Bridges platforms |
| | A variety of datasets, including clinical and genomic data | Public and controlled-access data (depends on the dataset) | | |
| All of Us Research Program | The All of Us Research Program is building one of the largest biomedical data resources of its kind. The All of Us Research Hub stores health data from a diverse group of participants from across the United States. | | Electronic health records, 444,000+ biosamples | |
DRC Leadership