AIM-AHEAD: All of Us Training Program
Traineeship in Advanced Data Analysis using the All of Us Clinical Database
The application of artificial intelligence and machine learning (AI/ML) to large datasets is dramatically expanding the capacity for hypothesis testing in the biomedical and socioeconomic domains. However, underrepresented communities, particularly those at heightened risk of socioeconomic and health disparities, are not receiving the benefits of AI/ML. Training a diverse workforce of researchers proficient in AI/ML is an opportunity to address this critical unmet need by extending those benefits to underrepresented, at-risk communities.
The central goal of this training program is to increase researcher diversity in AI/ML by training individuals from diverse backgrounds who are committed to gaining proficiency in AI/ML data analysis and applying their expertise to benefit communities underrepresented in biomedical research.
To accomplish this objective, diverse professionals committed to applying AI/ML to benefit underrepresented communities will complete an intensive 8-month program in advanced data analysis. The program was developed by Research Triangle Institute (RTI) and uses the resources of the All of Us database and AIM-AHEAD’s Data Science Training Core. Completing the training will equip motivated professionals to conduct the in-depth analysis of large datasets that is essential for cutting-edge biomedical and socioeconomic research.
The AIM-AHEAD consortium (Data Science Training Core and Communications Hub), All of Us and RTI are partnering to offer AIM-AHEAD stakeholders, trainees, mentees, and consortium partners a training opportunity designed to increase researcher diversity in AI/ML by leveraging the All of Us (AOU) data and infrastructure (Researcher Workbench).
The Researcher Workbench is a cloud-based platform where registered researchers can access Registered Tier and Controlled Tier data. Its tools support data analysis and collaboration. Researchers use the Workbench to access, store, and analyze data for specific research projects. They can also run high-powered queries and analyses on the All of Us datasets using R or Python in the integrated, cloud-based Jupyter Notebook environment.
Using the AIM-AHEAD Connect Platform, this 8-month training program will engage a diverse group of 25 graduate students, postdocs, early-career faculty, and non-academic professionals from underrepresented populations. Trainees will use the Dataset Builder to search, extract, and organize health information from the All of Us database. They will use the Cohort Builder to create, review, and annotate data from All of Us human subject cohorts. Trainees will also receive training and technical assistance in R, Python, Jupyter Notebook, and model development for All of Us data subsets in the Researcher Workbench. Training will include:
- Merging/validating data across All of Us sources;
- Building a supervised model;
- Splitting data into subsets for model training and testing;
- Considering biases that may be present in the data and whether the model detects or misses them; and
- Validating the model.
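The split, train, and validate workflow listed above can be sketched in a few lines of plain Python. This is a minimal, hypothetical illustration on synthetic data, not program material. The helper names (`train_test_split`, `fit_threshold`, `accuracy_by_group`), the one-feature decision-stump model, and the group labels are all assumptions. On the Researcher Workbench, a trainee would more likely load data produced by the Dataset Builder and use an established library.

The sketch shows only the order of the steps:
- Hold out test data before fitting.
- Fit on the training subset only.
- Report held-out performance per subgroup, so that a gap between groups is visible rather than averaged away.

```python
import random
from collections import defaultdict


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Fit a decision stump on one feature: predict 1 when x >= t.

    The threshold t is the training value that maximizes training accuracy
    (ties keep the smallest such value).
    """
    best_t, best_acc = None, -1.0
    for t in sorted({r["x"] for r in train}):
        acc = sum((r["x"] >= t) == bool(r["y"]) for r in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def accuracy_by_group(t, test):
    """Held-out accuracy, overall and for each subgroup."""
    hits, counts = defaultdict(int), defaultdict(int)
    for r in test:
        correct = (r["x"] >= t) == bool(r["y"])
        for key in ("overall", r["group"]):
            hits[key] += correct
            counts[key] += 1
    return {key: hits[key] / counts[key] for key in counts}


# Hypothetical synthetic data: group B is a minority (20% of rows) whose
# outcome cutoff differs from group A's. A single threshold tuned for
# overall accuracy therefore tends to favor the majority group A.
data_rng = random.Random(0)
rows = []
for i in range(200):
    group = "B" if i % 5 == 0 else "A"
    x = data_rng.random()
    cutoff = 0.5 if group == "A" else 0.7
    rows.append({"x": x, "group": group, "y": int(x >= cutoff)})

train, test = train_test_split(rows)
t = fit_threshold(train)  # fit on training data only
print(f"threshold={t:.3f}", accuracy_by_group(t, test))  # evaluate on held-out data
```

Comparing the per-group numbers with the overall figure is one simple way to surface a bias that a single aggregate metric would hide. The same pattern carries over to any model or metric.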
The training, which utilizes All of Us data collected from communities historically underrepresented in biomedical research, is directed particularly toward investigators conducting research at the intersection of AI/ML and health disparities.
Potential research topics that could be examined include, but are not limited to:
- Examining statistical variation in the social determinants of health and intersectionality
- Examining statistical interactions between family health history and lifestyle factors
- Using statistical analysis to identify socioeconomic, environmental, and heritable determinants of clinically significant diseases and syndromes
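For the second topic, a statistical interaction is commonly tested by adding a product term to a regression model. The following is an illustration only; the variables are hypothetical and are not specified by the program. Consider a binary outcome Y, a family-history indicator F, and a lifestyle factor L. A logistic model with an interaction could be written as:

```latex
\operatorname{logit} \Pr(Y = 1 \mid F, L) = \beta_0 + \beta_1 F + \beta_2 L + \beta_3 \,(F \times L)
```

Here β3 measures how the association between L and the outcome differs by family history. Testing the null hypothesis β3 = 0 asks whether an interaction is present on the log-odds scale.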
Having received advanced practical training in coding, model development, hypothesis testing, and data cleaning and analysis, trainees completing this program will be well prepared to apply AI/ML approaches to hypothesis-driven analysis of complex datasets. Each trainee will join a community of AI/ML professionals committed to extending the benefits of AI/ML to communities underrepresented in biomedical research.
-- Applications are due by December 6, 2023, at 11:59pm EST.
-- Training through AIM-AHEAD Connect will begin on January 8, 2024.
1. Applicants must be:
A. U.S. Citizens, Permanent Residents, or Non-Citizen U.S. Nationals
- US Citizen: An individual who is a citizen of the United States by law, birth or naturalization (https://www.law.cornell.edu/definitions/uscode.php?width=840&height=800&iframe=true&def_id=42-USC-630966247-802284531&term_occur=1&term_src=title:42:chapter:99:section:9102)
- Permanent Resident: An immigrant/non-citizen who can legally reside in the United States in perpetuity (https://www.law.cornell.edu/wex/lawful_permanent_resident_(lpr))
- Non-Citizen National: A person born in an outlying possession of the United States on or after the date of formal acquisition by the United States (https://www.law.cornell.edu/uscode/text/8/1408)
B. Able to submit Form W-9 (Request for Taxpayer Identification Number and Certification)
C. Affiliated with one of the following entities:
- Higher Education Institutions
  - Public/State Controlled Institutions of Higher Education
  - Private Institutions of Higher Education
  Concordant with the goals of the AIM-AHEAD Coordinating Center, individuals affiliated with the following types of Higher Education Institutions are highly encouraged to apply:
  - Hispanic-Serving Institutions (HSIs)
  - Historically Black Colleges and Universities (HBCUs)
  - Tribally Controlled Colleges and Universities (TCCUs)
  - Alaska Native and Native Hawaiian Serving Institutions
  - Asian American and Native American Pacific Islander Serving Institutions (AANAPISIs)
  - Other Minority Serving Institutions
- Nonprofits Other Than Institutions of Higher Education
  - Nonprofits with 501(c)(3) IRS Status (Other than Institutions of Higher Education)
  - Nonprofits without 501(c)(3) IRS Status (Other than Institutions of Higher Education)
- For-Profit Organizations
  - Small Businesses
  - For-Profit Organizations (Other than Small Businesses)
D. From an institution that holds an active Data Use and Registration Agreement (DURA) with All of Us. Please confirm that your institution has an active DURA before applying.
E. Willing to sign the Data User Code of Conduct (DUCC). This agreement outlines the program’s expectations for researchers who use the Researcher Workbench and describes how program data may be used.
Applicants may be post-baccalaureate or graduate students, postdoctoral fellows, medical students or residents, allied health trainees, early-career investigators, or early-career employees of the non-academic organizations listed in item 1C above. At a minimum, applicants must hold a bachelor’s degree from an accredited U.S. institution in one of the following or related fields:
- Physical sciences (e.g., chemistry, physics)
- Biological or life sciences (e.g., biology, zoology, biochemistry, microbiology)
- Mathematics or statistics
- Data science
- Health sciences (e.g., pharmacy, psychology, health information technology, nursing, therapy, social work)
- Public health (e.g., epidemiology, biostatistics, health administration, clinical implementation)
The goal of the AIM-AHEAD Coordinating Center is to diversify the research workforce in AI/ML and Health Equity. Consistent with the NIH Interest in Diversity (NOT-OD-20-031: Notice of NIH's Interest in Diversity), the following individuals are highly encouraged to apply for the traineeship:
- Individuals from health disparity populations that have been shown by the National Science Foundation to be underrepresented in health-related sciences on a national basis (see http://www.nsf.gov/statistics/showpub.cfm?TopID=2&SubID=27 and the report Women, Minorities, and Persons with Disabilities in Science and Engineering). The following racial and ethnic groups have been shown to be underrepresented in biomedical research:
  - Blacks or African Americans,
  - Hispanics or Latinos,
  - American Indians or Alaska Natives,
  - Native Hawaiians and other Pacific Islanders.
- In addition, it is recognized that underrepresentation can vary from setting to setting; individuals from racial or ethnic groups that can be demonstrated convincingly to be underrepresented by the grantee institution should be encouraged to participate in NIH programs to enhance diversity. For more information on racial and ethnic categories and definitions, see the OMB Revisions to the Standards for Classification of Federal Data on Race and Ethnicity (https://www.govinfo.gov/content/pkg/FR-1997-10-30/html/97-28653.htm).
- Individuals with disabilities, who are defined as those with a physical or mental impairment that substantially limits one or more major life activities, as described in the Americans with Disabilities Act of 1990, as amended. See National Science Foundation data at: https://www.nsf.gov/statistics/2017/nsf17310/static/data/tab7-5.pdf
- Individuals from disadvantaged backgrounds, defined as those who meet two or more of the following criteria:
  i. Were or currently are homeless, as defined by the McKinney-Vento Homeless Assistance Act (Definition: https://nche.ed.gov/mckinney-vento/);
  ii. Were or currently are in the foster care system, as defined by the Administration for Children and Families (Definition: https://www.acf.hhs.gov/cb/focus-areas/foster-care);
  iii. Were eligible for the Federal Free and Reduced Lunch Program for two or more years (Definition: https://www.fns.usda.gov/school-meals/income-eligibility-guidelines);
  iv. Have/had no parents or legal guardians who completed a bachelor’s degree (see https://nces.ed.gov/pubs2018/2018009.pdf);
  v. Were or currently are eligible for Federal Pell grants (Definition: https://www2.ed.gov/programs/fpg/eligibility.html);
  vi. Received support from the Special Supplemental Nutrition Program for Women, Infants and Children (WIC) as a parent or child (Definition: https://www.fns.usda.gov/wic/wic-eligibility-requirements);
  vii. Grew up in one of the following areas:
    - A U.S. rural area, as designated by the Health Resources and Services Administration (HRSA) Rural Health Grants Eligibility Analyzer (https://data.hrsa.gov/tools/rural-health), or
    - A Centers for Medicare and Medicaid Services-designated Low-Income and Health Professional Shortage Area (qualifying zip codes are included in the file).
  Note: Only one of the two areas under vii can be used as a criterion for the disadvantaged background definition.
We are particularly interested in applicants from historically underrepresented groups in AI/ML, such as women, racial/ethnic minorities, people with disabilities, and individuals from rural or socially disadvantaged backgrounds.
- Students from low socioeconomic status (SES) backgrounds have been shown to obtain bachelor’s and advanced degrees at significantly lower rates than students from middle- and high-SES groups (see https://nces.ed.gov/programs/coe/#indicators). They are consequently less likely to be represented in biomedical research. For background, see Department of Education data at https://nces.ed.gov/ and https://www2.ed.gov/rschstat/research/pubs/advancing-diversity-inclusion.pdf.
- Literature shows that women from the above backgrounds (underrepresented racial and ethnic groups and individuals with disabilities) face particular challenges at the graduate level and beyond in scientific fields (see, e.g., From the NIH: A Systems Approach to Increasing the Diversity of the Biomedical Research Workforce, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008902/).
- Women are known to be underrepresented in doctorate-granting research institutions at senior faculty levels in most biomedical-relevant disciplines, and may also be underrepresented at other faculty levels in some scientific disciplines (See data from the National Science Foundation National Center for Science and Engineering Statistics: Women, Minorities, and Persons with Disabilities in Science and Engineering, special report available at https://www.nsf.gov/statistics/2017/nsf17310/, especially Table 9-23, describing science, engineering, and health doctorate holders employed in universities and 4-year colleges, by broad occupation, sex, years since doctorate, and faculty rank).
Trainees are expected to:
- Attend all training sessions, both synchronous and asynchronous, including webinars and seminars
- Work on the program an average of at least 8 hours per week
- Engage with an AIM-AHEAD Mentor
- Engage in learning communities and peer networking
- Access the All of Us Researcher Workbench
- Complete the provided training related to R, Python, and Jupyter Notebook, available via AIM-AHEAD Connect
- Complete the training on building supervised models for analysis
- Complete the provided training on data splitting for algorithm training and testing, addressing biases in model development, and model validation
- Use the concierge services for the Workbench and for R/Python coding
- Utilize AIM-AHEAD Help Desk support
- Present a work-in-progress research poster at the AIM-AHEAD meeting in summer 2024
- Generate an abstract suitable for submission to a conference, and/or a manuscript suitable for peer-reviewed publication
- Play an active part in the AIM-AHEAD community
Each trainee will receive:
- An $8,000 stipend
- Travel expenses to attend the AIM-AHEAD 2024 conference
- Support and guidance from an experienced mentor
- Support from the AIM-AHEAD data science training core
- Direct 1:1 guidance, virtual office hours, help desk support, and concierge services for users of the All of Us Researcher Workbench and for R and Python coding
- Training on:
- Data analysis using All of Us Researcher Workbench, R, Python, Jupyter Notebook
- Use and applications of R, Python, and Jupyter Notebook
- Data merging and validation across All of Us sources
- Building and validating models for analysis
- Data splitting methods for model training vs. testing
- Detecting and addressing biases in model development
- Hypothesis development for testing by analysis of All of Us data
- Preparation of a valid data use agreement
- Using the All of Us database of de-identified medical data from >1 million Americans
The following are the objectives for trainees upon completing the program:
- The trainee will apply R, Python, and/or Jupyter Notebook to analyze datasets from diverse and underrepresented communities.
- The trainee will present their project at the 2024 AIM-AHEAD annual meeting.
- The trainee will present their project at a professional research conference.
- The trainee will formulate hypotheses testable by applying AI/ML and advanced data analyses.
This training will be most beneficial for individuals who have accomplished one or more of the following. Although these experiences are not mandatory for applicants, evidence of one or more of these experiences will be considered by the trainee selection committee:
- Has successfully completed an undergraduate or graduate course in probability and statistics
- Has practical experience in coding/programming with R or Python
- Has experience in data manipulation and management gained through coursework and/or research projects
- Has practical experience with Bayesian analysis and maximum likelihood estimation (if the trainee plans to conduct a project using the All of Us genomics data)
Introductory or refresher courses on these topics will be available to successful applicants at the start of the traineeship, via the AIM-AHEAD Connect platform.
Trainees will receive a cumulative professional fee (stipend) of $8,000 and up to $2,000 in travel support to attend the AIM-AHEAD Annual Meeting. All of Us Researcher Workbench cloud costs (e.g., credits) will be covered by the University of North Texas Health Science Center (UNTHSC). Each awarded trainee will receive mentorship from experienced, skilled investigators selected from AIM-AHEAD core members. These mentors will guide the trainee in developing testable hypotheses using All of Us data. The online mentoring platform AIM-AHEAD Connect (https://connect.aim-ahead.net) will be used to match mentors with awarded trainees and for mentor/trainee engagement and progress tracking.
November 6 - December 6, 2023
Application Review and Ranking
NIH Approval of Trainee Roster
December 18, 2023
Issue Notifications of Award
January 8, 2024
APPLICATION PROCESS & REQUIREMENTS
Applications must be submitted during the open application period (11/03/23-12/06/23). Applications should address the requirements below and any additional questions via the Traineeship Application Form. The application should be understandable to readers from outside the applicant’s field of study and must clearly present the project aims, applicable studies already completed, methods, materials, and AIM-AHEAD engagement plan.
- Provide your name, organization, department, position title, research area, email address and profile web page.
- Please answer the profile and prior-experience questions on InfoReady.
Letters of Support
- One signed letter of support from the applicant’s supervisor is required. Letters of support should include the referee’s contact information (full name, position title, organization, email/phone number, and signature).
- Letters of recommendation from faculty who taught the applicant and can attest to the applicant’s aptitude for advanced data analysis training and the rationale for this training.
- Academic transcripts from applicant’s undergraduate and, if applicable, graduate programs.
Biographical Sketch of the applicant
The applicant’s NIH biosketch (not to exceed 5 pages) is required.
- NIH biosketch form: https://grants.nih.gov/grants/forms/biosketch.htm
- Template: Traineeship Biosketch
- Predoctoral Fellowship biosketch sample
- Postdoctoral Fellowship biosketch sample
Statement of Rationale for Pursuing Training
Provide a personal statement of not more than 750 words addressing the following:
- Describe what you hope to accomplish through the Trainee Program. Provide your rationale and need for training in All of Us data and acquiring these skills.
- Describe your familiarity with and/or interest in the following:
  - AI/ML analysis
  - Programming
  - Electronic health record (EHR), clinical, or genomic data analysis
  - Cloud-based computation
  Also describe any background in biomedical science or public health.
- Explain how you plan to apply the training to achieve your long-term research interests and objectives.
SELECTION & NOTIFICATION OF AWARD
A Study Review Committee composed of AIM-AHEAD Consortium, All of Us, and RTI members will apply the following criteria to evaluate and prioritize applications:
Rationale for AI/ML Training:
- The applicant clearly articulates their expectations and reasons for participating in the program. The applicant also demonstrates the need for, and importance of, acquiring training in All of Us data to address health disparities using AI/ML.
- The applicant has the background and motivation to participate in and benefit from the training.
- The applicant demonstrates a willingness to engage and collaborate with the AIM-AHEAD community, contribute to documentation and training resources, welcome and empower new users, and help foster a diverse and inclusive community.
- The applicant describes specific plans for long-term application of the training to their research program and/or professional development.
Notification of Award
Applicants should expect notification of their acceptance status on Monday, December 18, 2023. Accepted applicants should be prepared to promptly submit their banking information to the University of North Texas Health Science Center to receive payments.
Submission using AIM-AHEAD Connect and InfoReady platform
Step 1: Register as a “mentee/learner” on AIM-AHEAD Connect (our Community Building Platform).
Step 2: Submit an application for review using the InfoReady platform*.
* To submit your application in InfoReady, please use Chrome, Firefox, or Edge. If you're using Safari, make sure to clear your cache before logging in.
Please note both steps must be completed for consideration.
All applications must be received by December 6, 2023 at 11:59pm EST.