AIM-AHEAD AI Optimization Subcore

About

The AIM-AHEAD AI Optimization Subcore page is a central resource for discussing and exploring AI optimization issues within the AIM-AHEAD Program. Here, you can access expert insights, engage in meaningful conversations, and find valuable resources to guide your AI decision-making.

  • Join Our Weekly Office Hours
    Participate in live presentations on current AI topics and bring your questions or concerns for group discussion and expert feedback.
  • Engage in the AI Optimization Discussion Group
    Connect with peers and AI optimization experts to share ideas and engage in ongoing conversations on AI optimization-related matters.
  • Get Personalized AI Optimization Consultation
    Receive tailored advice and support on questions specific to your project or activity.
  • Explore the Knowledgebase
    Delve into a wide range of AI optimization-related topics to deepen your understanding and inform your work.
  • Access AIM-AHEAD-Specific AI Optimization Resources
    Discover materials designed specifically for the AIM-AHEAD community, addressing key challenges in our field.

See the Related Links for more information on all of these initiatives.

AI Optimization Subcore Discussion Forums

Large Language Models (LLMs) are increasingly used in healthcare for tasks like clinical documentation, decision support, and patient education. Their intuitive, prompt-based interfaces make them attractive tools across a range of clinical settings. However, existing evaluations are limited: most rely on clean, exam-style datasets (e.g., MedQA, PubMedQA) or narrowly focus on isolated real-world tasks. These approaches fail to capture the complexity, variety, and multilingual nature of actual electronic health records (EHRs) text generated in real-world clinical practice.

In this talk, we will present BRIDGE, a large-scale, multilingual benchmark built from 87 real-world clinical tasks in nine languages, based on over one million EHR-derived samples. We evaluated 95 leading LLMs (including GPT-4o, Gemini, DeepSeek-R1, and gpt-oss) under multiple prompting strategies across 24,000+ experiments and 3.4 billion LLM inferences. Our findings highlight major performance differences across models, tasks, and languages, and show that some open-source models match proprietary ones. We anticipate that BRIDGE will serve as a critical foundation for the community to evaluate and safely deploy future LLMs in real-world clinical environments.


About the Speaker

Dr. Jie Yang is an Assistant Professor at Harvard Medical School and a lead investigator at Brigham and Women’s Hospital. His research focuses on developing and applying advanced AI and NLP/LLM methods to analyze large-scale healthcare data, particularly electronic health records (EHRs). He is a Fellow of the American College of Medical Informatics (FACMI), Fellow of the American Medical Informatics Association (FAMIA), an affiliate faculty member at the Broad Institute of MIT and Harvard, and serves on the Technical Advisory Group for the World Health Organization (WHO) Global Clinical Platform. Dr. Yang received the Best Paper Award at COLING and a Best Demonstration Paper nomination at ACL. He serves as an Associate Editor for npj Digital Medicine, IEEE Transactions on Neural Networks and Learning Systems, and npj Health Systems, and as an Area Chair for major NLP conferences, including ACL, EMNLP, and COLING.

Join on Zoom: AI Optimization Subcore Discussion Forum Link

Large language models (LLMs) have demonstrated strong potential in medicine and are increasingly adopted for clinical and biomedical tasks. Many studies continue to pretrain or fine-tune LLMs on medical data to further improve domain-specific performance. However, a fundamental question remains: to what extent do LLMs memorize medical training data, and what are the implications of such memorization for medical applications?

In this study, we present the first comprehensive evaluation of memorization in medical LLMs, characterizing its prevalence (how often it occurs), properties (what content is memorized), and downstream impacts (how memorization influences medical applications). We systematically analyze common adaptation settings, including (1) continued pretraining on medical corpora, (2) fine-tuning on standard medical benchmarks, and (3) fine-tuning on real-world clinical data consisting of over 13,000 unique inpatient records from Yale New Haven Health System. Our evaluation covers both medical foundation models (PMC-LLaMA, Meditron, Me-LLaMA, and Med-LLaMA-3 variants) and general-purpose LLMs (LLaMA-2 and LLaMA-3 families) widely adopted in medical research and practice. Based on empirical findings, we provide actionable guidance to promote beneficial memorization that enhances medical reasoning and factual accuracy, limit uninformative memorization to encourage true knowledge generalization, and mitigate harmful memorization to prevent leakage of sensitive or identifiable patient information.


About the Speaker

Dr. Qingyu Chen is a tenure-track Assistant Professor in the Department of Biomedical Informatics & Data Science at Yale University. Prior to joining Yale, he completed postdoctoral training at the National Library of Medicine, National Institutes of Health. His research focuses on data science and artificial intelligence in biomedicine and healthcare, with emphasis on three main areas: biomedical natural language processing and large language models; medical imaging and multimodal analysis; and accountability and trustworthy AI in medical applications.

Dr. Chen is the Principal Investigator of an R01 grant focused on improving the factuality of LLMs in medicine, as well as a K99/R00 grant on multimodal AI-assisted disease diagnosis. He has authored over 45 first/last-author publications among 100+ peer-reviewed papers.

Watch the Discussion Forum Video: Accountability of AI in Medicine - Dr. Qingyu Chen

Artificial intelligence (AI) is rapidly transforming medical imaging and diagnostic workflows. However, growing evidence suggests that these technologies may not perform consistently across all patient groups. In this talk, Dr. Lin will share recent work examining and addressing the robustness of image-based AI systems for clinical diagnosis.

He will first present findings from a study on primary open-angle glaucoma (POAG) diagnosis, where he observed that diagnostic performance varied across different patient populations, leading to increased rates of under- or over-diagnosis. Dr. Lin will then present one approach he developed to improve model robustness in image-based diagnosis. This will also highlight a second method he applied to chest X-ray diagnosis, which reduces variation in performance. These findings underscore the need for thorough validation across clinical scenarios to ensure consistent diagnostic performance in real-world practice.


About the Speaker

Mingquan Lin (PhD),is an Assistant Professor in the Division of Computational Health Sciences at the University of Minnesota. He has extensive experience in medical image analysis, including segmentation, diagnosis, prognosis, and biomarker identification. His research also explores the application of multimodal large language models (LLMs) in healthcare. Before joining the University of Minnesota, Dr. Lin was a Postdoctoral Associate at Weill Cornell Medicine and previously a Research Fellow at Emory University. He received his PhD in Electrical Engineering from the City University of Hong Kong.

Join on Zoom: AI Optimization Subcore Discussion Forum Link

Large language models (LLMs) have been integrated into numerous biomedical application frameworks. Despite their significant potential, they possess vulnerabilities that can lead to serious consequences. In this seminar, we will discuss vulnerabilities in LLMs and discuss potential solutions to address them.

Adversarial manipulations can cause LLMs to generate harmful medical suggestions or promote specific stakeholder interests. We will demonstrate two methods by which a malicious actor can achieve this: prompt injection and data poisoning. Thirteen models were tested, and all exhibited significant behavioral changes after manipulation in three tasks. Although newer models performed slightly better, they were still greatly affected. A few methods can be applied, however there is no guarantee.


About the Speaker

Yifan Yang is a PhD candidate at the University of Maryland, College Park, and a visiting fellow at NLM/NIH. His work mainly focuses on LLM applications in biomedical tasks, and medical AI safety.

Join on Zoom: AI Optimization Subcore Discussion Forum Link

This forum will examine the unexpected consequences of multi-source data scaling in healthcare. The discussion will showcase innovative applications of LLMs in maternal health and contraceptive care, demonstrate how LLMs can generate rationales for contraceptive medication switches using clinical notes, and emphasize vigilance and other considerations as we advance towards more data-driven and AI-assisted healthcare.


About the Speaker

Dr. Irene Chen is an Assistant Professor in UC Berkely and UCSF's Computational Precision Health, Electrical Engineering and Computer Science, Berkeley AI Research. Her work develops machine learning methods for healthcare that are robust and impactful, Irene received her PhD in Electrical Engineering and Computer Science from MIT, her AB/SM in Applied Math from Harvard.

Join on Zoom: AI Optimization Subcore Discussion Forum Link

This talk will cover research published in NEJM and JAMA on the clinical, occupational, and financial implications of including or excluding selected demographic features in equations used across various medical fields. It will explore the limitations of binary decision thresholds, socio-demographic model inputs, and population-derived reference ranges, and then discuss alternative approaches that prioritize precise causal measures and patient-centered outcomes.


About the Speaker

Dr. James Diao is a physician-scientist based at Brigham and Women’s Hospital and the Harvard Medical School Department of Biomedical Informatics. His research uses computational and statistical tools to develop and evaluate clinical algorithms with the goal of improving health for various populations. Previously, he developed AI models for pathology image analysis at PathAI and investigated wearable-derived measures of cardiovascular fitness at Apple. Dr. Diao earned his MD from the Harvard-MIT Program in Health Sciences and Technology (HST) as a PD Soros Fellow, MPhil from the University of Cambridge as a Churchill Scholar, and degrees in biochemistry and statistics from Yale College.

Join on Zoom: AI Optimization Subcore Discussion Forum Link

As AI applications continue to transform clinical research and care practices, ensuring their trustworthiness is critical. A key component of trust is reproducibility. Dr. Fu will discuss the various conceptual dimensions of reproducibility in the context of EHR-based AI applications. He will define the role of contextual metadata in the process of AI model development, evaluation, and dissemination, and illustrate his work on standardized frameworks, RITE-FAIR principles, and tools that promote AI reproducibility.

Dr. Sunyang Fu
About the Speaker

Dr. Sunyang Fu, PhD MHI, is an Assistant Professor and Associate Director of Team Science at the Center for Translational AI Excellence and Applications in Medicine (TEAM-AI), University of Texas Health Science Center at Houston. The overarching goal of my research is to accelerate, improve and govern the secondary use of EHR data for clinical and translational research toward high throughput, reproducible, and trustworthy discoveries.

Zoom link: https://us06web.zoom.us/j/84694432646?pwd=LfXyYerpK2ErARq5ha6b6DMfBb3COb.1

Community Health Workers (CHWs), or Promotores de Salud in Latino communities, help address healthcare challenges by providing education and services to targeted populations. To strengthen their research skills, the "Building Research Integrity and Capacity" (BRIC) initiative offers eight training modules for self-paced learning or professional development. A complementary course was also created to help faculty better engage local communities in research. Dr. Nebeker will discuss the co-design and evaluation of these educational resources.

About the Speaker

Dr. Camille Nebeker, EdD, MS, is Co-Founder and Director at UC San Diego Research Center for Optimal Digital Ethics - Health, Professor at Herbert Wertheim School of Public Health and Human Longevity Science, and Director of UC San Diego Research Ethics Program, University of California, San Diego.

Zoom link: https://us06web.zoom.us/j/84694432646?pwd=LfXyYerpK2ErARq5ha6b6DMfBb3COb.1

Scroll to top