Data and Research Core (DRC)

Address research priorities and needs to form an inclusive basis for conducting equity-focused AI/ML research targeting the use of electronic health records

The mission of the AIM-AHEAD Data and Research Core (DRC) is to broaden the diversity and representation of healthcare data, including electronic health record (EHR) data, in AI/ML and expand its availability to diverse teams of researchers to address health disparities.


The DRC is designed to support the development of a workforce capable of conducting high-quality, health equity-focused AI/ML research. It emphasizes supporting persons from groups underrepresented in biomedical research (UBR) and/or building capacity at Minority-Serving Institutions (MSIs) to work with healthcare data. The work starts with electronic health record (EHR) data and relevant linked social and environmental determinants of health (SEDOH) data, and will expand to data from imaging, genomics, wearable devices, and text such as clinical notes.


The DRC is not a single database. Instead, AIM-AHEAD seeks to catalyze an ecosystem of datasets to help address the lack of population diversity in data used in most AI/ML models and applications.


AIM-AHEAD Data Partners 

AIM-AHEAD-funded projects may apply to receive facilitated access and data concierge services from AIM-AHEAD data partners that emphasize historically under-resourced and under-represented populations: 



AIM-AHEAD Community Health Equity Database
OCHIN, a nonprofit health care innovation center with a core mission to advance health equity, operates the most comprehensive database on primary healthcare and outcomes of safety net patients in the U.S., connected by OCHIN’s Epic EHR system, representing 6 million patients from 170 health systems and 1,600 clinic sites across 33 states.
AIM-AHEAD Data Bridge
From MedStar Health
The MedStar Health System and the MedStar Health Research Institute (MHRI) include an extensive network of clinical facilities in the mid-Atlantic region, including 10 hospitals and over 200 affiliated facilities connected by MedStar’s Cerner EHR system, representing 5 million unique patients, approximately 31% of whom are African American.

The DRC and Infrastructure Core also collaborate to assist AIM-AHEAD awardees in locating other data sources to support their projects. As part of its mission to diversify datasets used in AI/ML, AIM-AHEAD has conducted a landscape survey to raise awareness about datasets that may be of interest to the research community. Each dataset has its own governance process and rules for access.



How AIM-AHEAD Data Partners Expand Representation



People who select a single race other than White, or who select more than one race




People who select an ethnicity other than those listed under the race of White 




People under 18 years old, or 65 years and older



Sexual and Gender Minority

  • People who self-report intersex as their sex at birth
  • People who select any sexual orientation choice other than straight 
  • People who select any gender identity choice other than man or woman 


Not well-captured


Annual household income < $25,000


Not well-captured


People without a high school diploma or GED

Not well-captured but FQHCs generally higher than general population


Access to Care

Needed a medical visit in the past 12 months but cannot readily use the health care system or pay for needed care

Not well-captured but FQHCs generally higher than general population

Not well-captured


Residents of established rural and non-metropolitan zip codes, based on the HRSA Federal Office of Rural Health Policy data files




People with a physical, functional, cognitive, or other condition that substantially limits one or more life activities



Source: All of Us reference UBR categories  

1. People with Hispanic ethnicity and any race

2. People with ‘other’ sex or non-straight sexual orientation or non-male, non-female gender identity

3. Based on rural and suburban hospital discharges

4. Based on ICD codes for disability in the study by Clark et al., including physical, visual, hearing, and intellectual/developmental disabilities

Other Dataset Options for AIM-AHEAD-funded Projects 


Data Set: 60+ studies from NHLBI BioData Catalyst
Brief Description: Selected large-scale cohorts related to heart, lung, blood and sleep disorders. Includes both prospective clinical studies and associated genomic TOPMed data.
Data Allowed: De-identified dataset, including individual-level genomic (TOPMed full genomes) and clinical datasets.
Additional Description: List of studies (60+ studies are available to choose from)
Analysis Platform Tools: NHLBI BioData Catalyst PIC-SURE and Seven Bridges Platforms

Data Set: Selected 15 Open datasets on AWS
Brief Description: A variety of datasets available, including clinical and genomic data
Data Allowed: Public data and controlled-access data (depends on dataset)
Additional Description: Selected 15 Open datasets on AWS
Analysis Platform Tools: AIM-AHEAD Service Workbench

Data Set: NIH All of Us
Brief Description: The All of Us Research Program is building one of the largest biomedical data resources of its kind. The All of Us Research Hub stores health data from a diverse group of participants from across the United States.
Additional Description: 616,000+ participants, 360,000+ electronic health records, 444,000+
Analysis Platform Tools: All of Us Researcher Workbench

DRC Leadership

Blair Darney


Nawar Shara

MedStar Health Research Institute

Keith Norris

University of California, Los Angeles

Josh Lemieux


Erin Hernandez
Project Director


Megan Hoopes
Data Analyst Lead


Robert Schuff
Data Science Lead


Sara Stienecker
Project Manager

MedStar Health Research Institute

Wyatt Bensken

