AIM-AHEAD Research Spotlight Series: Showcasing Innovative Research Across the Consortium

The AIM-AHEAD Research Spotlight Series highlights the work of program participants across the consortium, including awardees, fellows, and trainees. Each showcase features AIM-AHEAD–supported research that uses artificial intelligence and machine learning (AI/ML) to address pressing healthcare challenges and drive meaningful impact in the healthcare research community.

Featured Program: AIM-AHEAD Federated Network Program

The AIM-AHEAD Federated Network Program seeks to advance biomedical research by developing a decentralized data network that prioritizes data privacy, security, and institutional sovereignty. Using federated technology, participating institutions maintain local control of patient-level data while securely contributing aggregate results for collaborative analyses. Supported by AIM-AHEAD, this infrastructure enables institutions to engage in AI/ML-driven health research while addressing sensitive health topics and advancing community-centered research initiatives.

The program supports institutions and communities that may face barriers to participation in AI/ML research due to limited resources or infrastructure. By equipping participating sites with AI/ML capabilities and collaborative research support, the Federated Network Program helps institutions generate insights and develop solutions tailored to the unique health challenges affecting their populations. These efforts strengthen local research capacity while fostering broader collaboration across the AIM-AHEAD consortium.

Institutions participating in the AIM-AHEAD Federated Network (Cohort 1), including HealthPartners Institute, the University of the Virgin Islands, Massachusetts Eye and Ear, and Fairview Health Services, follow tailored protocols for data curation, secure data sharing, and federated analysis. They work closely with AIM-AHEAD core teams to assess and expand their ability to run AI/ML models locally and contribute to coordinated research efforts. As part of the program’s second-year objectives, participating institutions compile and share aggregated data to support federated analyses addressing pressing health research questions. This coordinated approach aims to strengthen sustainable AI/ML research capabilities, enhance collaboration across sites, and support long-term advancements in health research outcomes.

Cohort 1 included 4 awardees.

Program Directors:

Paul Avillach, MD
Griffin Weber, MD, PhD
Usha Sambamoorthi, PhD
Gabriel Brat, MD, MPH
Tianxi Cai, PhD

Use of Clinical Data Platforms in Research Question Development

Leyla Warsame, MD
Fairview Health Services
South Central Hub

As healthcare organizations increasingly seek to leverage electronic health record (EHR) data to answer complex clinical questions, researchers face a common challenge: how to collaborate across institutions while maintaining patient privacy and accommodating differences in local data systems. Through participation in the AIM-AHEAD Federated Network Program, Dr. Leyla Warsame has helped advance a structured approach to developing multisite research questions that enable institutions to identify and refine clinically meaningful research opportunities without sharing patient-level data.

The project brought together investigators from HealthPartners Institute, the University of the Virgin Islands, Massachusetts Eye and Ear Infirmary, and Fairview Health Services to collaboratively explore research questions in cardiometabolic health, mental health, and cancer. By leveraging federated network principles and clinical data platforms, the team sought to establish a repeatable process for generating, evaluating, and refining research questions that could be pursued across various healthcare environments while preserving data privacy and institutional data governance.

Study Design and Methods

To guide the collaborative effort, participating investigators used the Population, Intervention, Comparison, and Outcome (PICO) framework, a widely adopted method for developing clinical research questions. This approach provided a consistent structure for defining study populations, identifying interventions or exposures of interest, establishing comparison groups, and determining measurable outcomes while also considering data availability and feasibility across participating sites.
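As a rough illustration of how a PICO-structured question might be recorded so that sites can review it in a consistent form, the sketch below uses a small Python data class. The class name, field names, and the example comparison group are hypothetical and are not taken from the project's materials; the population, exposures, and outcome paraphrase the diabetic nephropathy use case described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class PicoQuestion:
    """Hypothetical container for one PICO-structured research question."""
    population: str
    intervention: str   # intervention or exposure of interest
    comparison: str
    outcome: str
    required_variables: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A question is ready for review only when all four PICO parts are filled in.
        parts = (self.population, self.intervention, self.comparison, self.outcome)
        return all(part.strip() for part in parts)


# Illustrative example only; the comparison group is an assumption.
nephropathy = PicoQuestion(
    population="Adults with type 1 or type 2 diabetes",
    intervention="Candidate predictors such as glycemic control and smoking status",
    comparison="Patients with diabetes who do not develop nephropathy",
    outcome="Development of diabetic nephropathy",
    required_variables=["hba1c", "smoking_status", "diabetes_duration"],
)
draft = PicoQuestion(population="Adults with MDD", intervention="",
                     comparison="", outcome="Recurrent depressive episode")
```

Here `nephropathy.is_complete()` is true, while `draft.is_complete()` is false because the exposure and comparison group have not yet been specified.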

The project began with the development and discussion of eight potential research questions. Through an iterative process of review, refinement, and feasibility assessment, researchers evaluated each question's scientific relevance, data requirements, and suitability for a federated research environment. This process ultimately resulted in four priority research questions: predicting the development of diabetic nephropathy among adults with diabetes; predicting recurrence of depressive episodes among patients with major depressive disorder; identifying predictors of head and neck cancer; and examining the impact of clinical and non-medical health factors on diabetes diagnosis.

To assess whether these questions could be studied effectively across institutions, Fairview Health Services used a clinical data platform to evaluate data availability for the proposed analyses. Researchers examined 104 variables required to support the selected research questions and assessed how consistently those variables were represented across participating healthcare systems.

One of the selected projects focused on diabetic nephropathy, a serious complication of diabetes that can lead to chronic kidney disease and kidney failure. The proposed study sought to identify factors associated with the development of nephropathy, before the onset of advanced kidney disease, among adults with type 1 or type 2 diabetes. Potential predictors included medication use, duration of diabetes, smoking status, glycemic control, obesity, family history of kidney disease, and demographic characteristics. The team also reviewed existing predictive modeling approaches and identified opportunities to explore additional risk factors and improve the clinical applicability of machine learning models.

Key Findings

The feasibility assessment demonstrated strong data readiness across participating institutions. Of the 104 variables evaluated, 91 (87.5%) were available without modification, while the remaining 13 (12.5%) either required refinement or were unavailable. These findings suggest that clinical data platforms can help streamline study planning and support future participation in federated research networks by reducing the time and effort required for data extraction and harmonization.
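The availability figures above can be reproduced with a short tally. The sketch below is illustrative only: the status labels and the function name are assumptions, and the 13 variables that were not available as-is are grouped under one label because the article does not break them down further.

```python
from collections import Counter


def summarize_availability(statuses):
    """Return {status: (count, percent of total)} for a list of status labels."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return {
        status: (n, round(100 * n / total, 1))
        for status, n in counts.items()
    }


# 104 variables: 91 available without modification, 13 otherwise.
statuses = ["available"] * 91 + ["refinement_or_unavailable"] * 13
summary = summarize_availability(statuses)
print(summary)
```

A per-variable status list of this kind also makes it straightforward to see which specific variables would need refinement before a federated analysis could proceed.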

The project also demonstrated that a structured, collaborative approach can effectively support the development of research questions across institutions with differing EHR systems and data infrastructures. By evaluating feasibility early in the planning process, investigators identified research questions that were both scientifically meaningful and operationally achievable.

As part of the diabetic nephropathy use case, researchers developed an initial retrospective cohort of 72,023 patients with diabetes. Using diagnosis-based and laboratory-based identification methods, investigators identified 14,672 patients with evidence of diabetic nephropathy, representing approximately 20% of the overall cohort.

Notably, laboratory-based methods identified substantially more patients with nephropathy than diagnosis codes alone. Among patients identified through both approaches, laboratory indicators suggested evidence of nephropathy an average of 2.5 years before a formal diagnosis was documented. These findings highlight the potential value of incorporating laboratory data into future predictive models to identify patients at risk for kidney complications earlier in the disease process.
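To make the two comparisons above concrete, the following sketch computes the share of a cohort flagged with nephropathy and the average lead time between first laboratory evidence and first documented diagnosis. It is a minimal sketch under stated assumptions: the record layout (`lab_date`, `dx_date`) and function names are hypothetical, the three patient records are synthetic, and it is not the investigators' actual identification logic.

```python
from datetime import date


def lead_time_years(first_lab_evidence, first_diagnosis):
    """Years from first laboratory evidence to first documented diagnosis."""
    return (first_diagnosis - first_lab_evidence).days / 365.25


def mean_lead_time(patients):
    """Average lead time over patients identified by BOTH methods."""
    both = [p for p in patients if p.get("lab_date") and p.get("dx_date")]
    if not both:
        return None
    total = sum(lead_time_years(p["lab_date"], p["dx_date"]) for p in both)
    return total / len(both)


# Cohort counts reported in the article.
share_with_nephropathy = round(100 * 14672 / 72023, 1)

# Synthetic records for illustration only.
patients = [
    {"lab_date": date(2018, 1, 1), "dx_date": date(2020, 1, 1)},
    {"lab_date": date(2017, 1, 1), "dx_date": date(2020, 1, 1)},
    {"lab_date": date(2019, 6, 1), "dx_date": None},  # lab evidence only
]
avg_lead = mean_lead_time(patients)
```

Patients identified by only one method are excluded from the lead-time average, mirroring the article's restriction to patients identified through both approaches.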

Implications for Research and Clinical Practice

This work illustrates how federated research approaches can facilitate collaboration across institutions without requiring the transfer of sensitive patient-level data. As healthcare systems increasingly seek to conduct large-scale, multisite studies, structured methods for research question development and feasibility assessment may help accelerate study planning while ensuring projects remain both scientifically rigorous and operationally feasible.

The findings from the diabetic nephropathy use case further demonstrate the potential value of combining clinical and laboratory data to improve disease identification and risk prediction. Earlier recognition of patients at increased risk of nephropathy could support more timely interventions and ultimately improve patient outcomes.

More broadly, the framework developed through this project provides a foundation for future collaborative investigations in cardiometabolic health, mental health, and cancer research. By leveraging federated infrastructure and shared methodologies, participating institutions can pursue clinically meaningful research questions while maintaining privacy protections and local stewardship of patient data.

Next Steps

Building on this work, participating institutions plan to continue refining study protocols, evaluating data availability, and developing analytic strategies for the selected research questions. Future efforts will focus on applying predictive modeling approaches, expanding multisite collaboration, and leveraging federated research infrastructure to support secure, privacy-conscious analyses.

By developing a structured framework for collaborative research question generation and feasibility assessment, Dr. Warsame and colleagues have laid the foundation for future studies that can harness the strengths of multiple healthcare systems to address important clinical challenges.

Predicting Recurrence of Depressive Episodes among Patients with Major Depressive Disorder

Patricia Mabry, PhD
HealthPartners Institute
North/Midwest Hub

Major Depressive Disorder (MDD) is one of the most common mental health conditions worldwide and is often characterized by periods of remission followed by recurrence. Although many patients experience improvement in symptoms with treatment, depressive episodes frequently return, creating ongoing challenges for patients, healthcare providers, and health systems. Through participation in the AIM-AHEAD Federated Network Program, Dr. Patricia Mabry has explored how electronic health record (EHR) data, machine learning, and federated research approaches can be leveraged to predict the risk of depression recurrence and support earlier clinical intervention.

The project focuses on developing a predictive model capable of estimating an individual's risk of experiencing a recurrent depressive episode within six months. By identifying patients at elevated risk while they are still in remission, the research aims to support clinical decision-making and enable preventive strategies before symptoms worsen.

Study Design and Methods

Dr. Mabry’s work is designed to leverage EHR data from multiple healthcare institutions participating in AIM-AHEAD Federated Network Cohort 1, including HealthPartners Institute, the University of the Virgin Islands, Massachusetts Eye and Ear, and Fairview Health Services. Working within a federated research model, each institution retains control of its own clinical data while contributing to collaborative research efforts. This approach enables researchers to develop and evaluate predictive models across various healthcare settings without requiring the exchange of patient-level information.
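The idea that sites keep patient-level data local and contribute only aggregates can be sketched as follows. This is a simplified illustration, not the network's actual protocol: the function names, the record layout, and the site data are all hypothetical, and a production network would typically add safeguards such as small-cell suppression and secure transport.

```python
def local_summary(records):
    """Run inside a site: reduce patient-level rows to aggregate counts."""
    return {
        "n": len(records),
        "events": sum(1 for r in records if r["outcome"]),
    }


def pooled_rate(summaries):
    """Run by the coordinating team: combine site-level counts only."""
    total_n = sum(s["n"] for s in summaries)
    total_events = sum(s["events"] for s in summaries)
    return total_events / total_n if total_n else None


# Synthetic patient-level rows that never leave each (hypothetical) site.
site_a = [{"outcome": True}, {"outcome": False}, {"outcome": False}, {"outcome": True}]
site_b = [{"outcome": False}, {"outcome": True}, {"outcome": False},
          {"outcome": False}, {"outcome": False}, {"outcome": False}]

shared = [local_summary(site_a), local_summary(site_b)]  # only these are shared
rate = pooled_rate(shared)
```

Only the two small dictionaries in `shared` cross institutional boundaries; the pooled rate is computed from counts alone.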

The target population includes adults diagnosed with Major Depressive Disorder who are considered to be in remission at the time of a clinical encounter, defined by a Patient Health Questionnaire-9 (PHQ-9) score of less than 10. Researchers are evaluating a broad range of factors that may influence the risk of recurrence, including demographic characteristics, insurance status, social deprivation measures, healthcare utilization, antidepressant medication use and adherence, comorbid conditions, tobacco and alcohol use, and prior depression history.

The study incorporates a twelve-month look-back period to assess clinical characteristics prior to remission and a six-month risk window to identify recurrent depressive episodes. By examining repeated remission encounters over time, the project seeks to better understand how patterns captured within EHR data may predict future symptom recurrence.
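The remission definition and the two observation windows can be expressed as a small sketch. Several details here are assumptions made for illustration and are not stated in the article: twelve months is approximated as 365 days and six months as 183 days, and a recurrence is treated as any PHQ-9 score of 10 or higher inside the risk window. The function names are hypothetical.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 365      # assumption: twelve-month look-back as 365 days
RISK_WINDOW_DAYS = 183   # assumption: six-month risk window as 183 days
REMISSION_CUTOFF = 10    # from the article: remission is PHQ-9 < 10


def in_remission(phq9_score):
    return phq9_score < REMISSION_CUTOFF


def study_windows(index_date):
    """Look-back and risk windows around a remission encounter."""
    return {
        "lookback": (index_date - timedelta(days=LOOKBACK_DAYS), index_date),
        "risk": (index_date, index_date + timedelta(days=RISK_WINDOW_DAYS)),
    }


def recurred(index_date, later_scores):
    """later_scores: list of (date, PHQ-9 score) after the index encounter.

    Assumption: recurrence means a score at or above the cutoff within
    the risk window. The study's actual outcome definition may differ.
    """
    start, end = study_windows(index_date)["risk"]
    return any(start < d <= end and not in_remission(s) for d, s in later_scores)


index = date(2023, 1, 1)
inside = recurred(index, [(date(2023, 3, 1), 12)])
outside = recurred(index, [(date(2023, 9, 1), 15)])
```

In this example `inside` is true because the elevated score falls within the risk window, while `outside` is false because the later score falls after the window closes.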

To support this work, the team used RapidMLReady, a study design and cohort development platform built to streamline the creation of machine learning-ready research studies. The platform helps researchers separate study design decisions from programming implementation, supports collaboration across various data environments, and improves reproducibility and analytic consistency across participating sites. By automating key aspects of cohort specification and study configuration, RapidMLReady reduces the need to repeatedly translate study requirements into site-specific analytic workflows.

Key Findings

A major outcome of the project has been the successful development of a scalable framework for conducting predictive modeling studies across multiple healthcare systems. Using RapidMLReady, researchers standardized study definitions, cohort construction, observation periods, and feature selection processes while maintaining flexibility for local data environments.

The project also demonstrated how machine learning study designs can be translated into reproducible workflows that can be implemented consistently across participating institutions. This capability is particularly important for federated research networks, where differences in data systems and analytic processes can create barriers to large-scale collaboration.

The proposed predictive framework incorporates an extensive set of clinical, behavioral, and social factors that may contribute to the recurrence of depression. Future model development efforts will evaluate a range of machine learning approaches, including Random Forest, XGBoost, Gradient Boosting, and deep learning methods, to identify strategies that provide strong predictive performance while maintaining clinical relevance.

Beyond the depression recurrence use case, the project demonstrated the value of reusable research infrastructure that can support future studies across multiple disease areas. By creating tools and workflows that can be adapted to new research questions, investigators can reduce development time, improve consistency across studies, and facilitate broader participation in collaborative research efforts.

Implications for Research and Clinical Practice

Predicting depression recurrence represents an important opportunity to move from reactive care toward more proactive mental health management. If patients at elevated risk can be identified before symptoms return, healthcare providers may be able to increase monitoring, adjust treatment plans, or connect patients with additional support resources before a depressive episode develops.

The project also highlights the growing role of federated research networks in advancing artificial intelligence and machine learning applications in healthcare. By enabling collaboration without requiring the transfer of patient-level data, federated approaches can support large-scale research while maintaining privacy protections and institutional governance requirements.

In addition, RapidMLReady illustrates how reusable tools and infrastructure can help make multisite research more efficient, reproducible, and scalable. These capabilities may lower barriers to participation in collaborative research and support future efforts to develop clinically relevant AI applications across a variety of healthcare settings.

Recognition and Next Steps

Beyond advancing research on depression recurrence, the project has contributed to the development of sustainable infrastructure for future federated studies. Key accomplishments include the establishment of durable governance processes, scalable research design approaches, reusable analytic tools and infrastructure, and expanded workforce capacity to support collaborative AI and machine learning research.

Future efforts will focus on feature extraction, model development, and performance evaluation using a variety of machine learning approaches. Researchers plan to assess predictive performance using measures such as sensitivity, precision, and the area under the receiver operating characteristic curve (AUROC), while exploring strategies for implementing predictive models in federated research environments.
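For readers less familiar with these measures, the sketch below computes sensitivity, precision, and AUROC in plain Python on a tiny synthetic example. The function names, labels, scores, and the 0.5 decision threshold are illustrative assumptions and do not represent results from the study. AUROC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def sensitivity_precision(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    prec = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, prec


def auroc(y_true, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic labels (1 = recurrence) and model scores, for illustration only.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, prec = sensitivity_precision(y_true, y_pred)
auc = auroc(y_true, scores)
```

Sensitivity and precision depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason the measures are usually reported together.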

By combining clinical informatics expertise, machine learning methodologies, and principles of federated research, Dr. Mabry and her collaborators are advancing approaches that may ultimately support earlier identification of patients at risk of depression recurrence and inform preventive care strategies.
