NIH Other Transaction Award · HITI Lab, Emory University
Multimodal Fusion Initiative for Novel Disease Phenotype Discovery and Population-Specific Risk Prediction
A shared, reusable toolbox that helps clinicians make cancer treatment decisions using the radiology images, pathology images, and clinical records a hospital already has.
MEFINDER tests a unifying hypothesis: if the tooling is sound, every architecture should classify the same patient in the same direction — whether the model uses ROIs or whole images, two modalities or five, deep learning or graph neural networks, and whether the target is chemotherapy benefit, treatment response, or disease recurrence. We demonstrate this framework scales across clinical use cases and institutional boundaries, delivering consistent patient-level predictions regardless of site, cohort, or prediction task.
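One way to make the "same patient, same direction" hypothesis concrete is a directional agreement check across models. The sketch below is illustrative only: the model names, patient IDs, probabilities, and the 0.5 threshold are all assumptions and are not taken from MEFINDER code or results.

```python
from typing import Dict


def direction(prob: float, threshold: float = 0.5) -> int:
    """Map a predicted probability to a direction: 1 (high risk) or 0 (low risk)."""
    return 1 if prob >= threshold else 0


def directional_agreement(
    preds: Dict[str, Dict[str, float]], threshold: float = 0.5
) -> Dict[str, bool]:
    """For each patient scored by every model, report whether all models agree."""
    if not preds:
        return {}
    # Only patients present in every model's output can be compared.
    shared = set.intersection(*(set(p) for p in preds.values()))
    result = {}
    for pid in sorted(shared):
        dirs = {direction(model[pid], threshold) for model in preds.values()}
        result[pid] = len(dirs) == 1
    return result


def agreement_rate(agree: Dict[str, bool]) -> float:
    """Fraction of shared patients on which every model points the same way."""
    return sum(agree.values()) / len(agree) if agree else float("nan")


# Hypothetical outputs from three architectures for the same three patients.
preds = {
    "roi_cnn": {"p1": 0.82, "p2": 0.30, "p3": 0.55},
    "whole_image": {"p1": 0.74, "p2": 0.41, "p3": 0.48},
    "graph_fusion": {"p1": 0.91, "p2": 0.12, "p3": 0.60},
}
agree = directional_agreement(preds)
print(agree)
print(agreement_rate(agree))
```

In this toy input, patient p3 is the disagreement case, because one model falls below the threshold while the other two fall above it. A real evaluation would likely use task-specific thresholds and calibrated scores.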
260,815+ · Patients in EMBED v2 · Breast cancer cohort
~1 Million · Imaging Exams · Cross-modality
5 · Partner Institutions · Multi-site consortium
13+ · Open-Source Tools · Code & models released
What is MEFINDER?
“Patients with identical cancer diagnoses experience vastly different outcomes.”
MEFINDER — Multimodal Fusion Initiative for Novel Disease Phenotype Discovery and Population-Specific Risk Prediction — is an NIH-funded research initiative led by Dr. Judy Gichoya at Emory University’s HITI Lab. The project develops open-source computational frameworks that fuse medical imaging, digital pathology, clinical records, and social determinants of health to discover patterns current molecular assays cannot detect.
Current diagnostic tools like Decipher (≈$3,400 per test) remain expensive and may not capture the full complexity of disease biology. MEFINDER builds affordable, equity-aware alternatives by integrating information across modalities — mammography, MRI, whole-slide pathology, and structured clinical text — to deliver more accurate prognostication and treatment guidance.
The initiative addresses two clinical use cases: ER-positive breast cancer recurrence prediction and biochemical recurrence following definitive prostate cancer therapy. Both represent significant unmet clinical needs where multimodal AI can meaningfully improve patient outcomes.
Open Science
All tools publicly released through TCIA, Zenodo, and GitHub. Reproducible methods with comprehensive documentation enabling adoption across institutions.
Health Equity
Population-specific models built on diverse cohorts, including the EMBED v2 dataset. Equity-aware evaluation metrics and bias auditing embedded throughout the pipeline.
Clinical Translation
Designed for integration with existing clinical workflows. Models validated against standard-of-care tools with clinical usability studies guiding interface design.
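As an illustration of what an equity-aware evaluation metric can look like, the sketch below computes sensitivity separately for each population subgroup and reports the largest gap between groups. It is a minimal example under assumed inputs: the group names, labels, and the choice of sensitivity as the audited metric are hypothetical and do not describe the specific metrics MEFINDER uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sensitivity_by_group(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """records are (group, y_true, y_pred); returns TP / (TP + FN) for each group."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    # Groups with no positive cases are omitted, since sensitivity is undefined.
    return {g: true_pos[g] / positives[g] for g in positives}


def sensitivity_gap(per_group: Dict[str, float]) -> float:
    """Largest difference in sensitivity between any two groups."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical (group, true label, predicted label) triples.
records = [
    ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 1, 0), ("group_a", 0, 0),
    ("group_b", 1, 1), ("group_b", 1, 0), ("group_b", 1, 0), ("group_b", 0, 1),
]
per_group = sensitivity_by_group(records)
gap = sensitivity_gap(per_group)
print(per_group, gap)
```

A large gap flags a subgroup for which recurrences are missed more often, which is the kind of signal a bias audit would surface for follow-up.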
The Framework
From raw data to clinical insight.
A modular pipeline spanning data harmonization, multimodal feature extraction, graph-based fusion, and phenotype discovery.
Data Inputs
Radiology
FFDM (2D) Mammography · DBT (3D) Tomosynthesis · MRI DCE Multi-Phase
Digital Pathology
H&E whole-slide images · APIC features
Clinical / EHR
Structured data · Clinical notes · ICD-10 · HL7 FHIR · PI-RADS v2.1
Social Determinants
Demographics · Socioeconomic data · Geographic data
Pipeline
01 Data Harmonization: Radiology · Pathology · EHR · Federated
02 Feature Extraction: Radiomics · Pathomics · Deep · Causal
03 Multimodal Embedding: Engineered Fusion · Deep Joint Embedding · Graph Networks
04 Phenotype Discovery: Clustering · In-Context Learning
05 Use Cases: Breast · Prostate (Parallel)
06 Evaluation: Shared Benchmarking
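The simplest form of the fusion stage in the pipeline above can be sketched as standardize-and-concatenate: each modality's features are z-scored across patients, then joined into one vector per patient. This is a minimal illustration, not MEFINDER's implementation. The modality names, feature values, and the decision to keep only patients present in every modality are assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero spread; divide by 1.0 so it maps to 0.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def fuse(modalities: Dict[str, Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Concatenate standardized features for patients present in all modalities."""
    shared = sorted(set.intersection(*(set(m) for m in modalities.values())))
    fused: Dict[str, List[float]] = {pid: [] for pid in shared}
    # Iterate modalities in sorted order so the fused layout is deterministic.
    for name in sorted(modalities):
        rows = zscore_columns([modalities[name][pid] for pid in shared])
        for pid, row in zip(shared, rows):
            fused[pid].extend(row)
    return fused


# Hypothetical per-patient feature vectors for two modalities.
modalities = {
    "radiomics": {"p1": [1.0, 10.0], "p2": [3.0, 10.0]},
    "pathomics": {"p1": [0.2], "p2": [0.6], "p3": [0.9]},
}
fused = fuse(modalities)
print(fused)
```

Patient p3 is dropped here because it has no radiomics features. Deep joint embeddings and graph networks, the other options named in stage 03, would replace the concatenation step with a learned model.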
Use Case 01
ER-Positive Breast Cancer Recurrence
ER-positive breast cancer is the most common breast cancer subtype, yet patients with identical diagnoses experience markedly different recurrence rates. MEFINDER fuses mammography (FFDM and digital breast tomosynthesis), breast MRI, digital pathology from core needle biopsies, and structured clinical text to build a multimodal recurrence predictor. It draws on the EMBED v2 cohort of 260,815 patients and approximately one million exams from Emory University.
260,815
patients in the EMBED v2 cohort
EMBED v2
260,815 patients · ~1M exams · multimodal imaging
EPIP
~5,000 prostate MRI patients from Emory University
Clinical Text
NLP-based recurrence labeling from radiology & pathology reports
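To show the general idea behind labeling recurrence from report text, the sketch below uses a deliberately simple rule-based baseline: a keyword pattern plus a check for a negation cue earlier in the same sentence. This is not the consortium's NLP method, which is not described here. The patterns, label names, and sample reports are all hypothetical.

```python
import re

# Hypothetical keyword and negation patterns; a real system would need far more.
RECURRENCE = re.compile(
    r"\b(recurren(?:ce|t)|relapse[ds]?|local failure)\b", re.IGNORECASE
)
NEGATION = re.compile(
    r"\b(no|without|negative for|free of)\b", re.IGNORECASE
)


def label_sentence(sentence: str) -> str:
    """Classify one sentence as 'recurrence', 'negated', or 'no_mention'."""
    match = RECURRENCE.search(sentence)
    if not match:
        return "no_mention"
    # Treat the mention as negated if a negation cue precedes it in the sentence.
    if NEGATION.search(sentence[: match.start()]):
        return "negated"
    return "recurrence"


def label_report(text: str) -> str:
    """Combine sentence labels: any affirmed mention wins over negated ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    labels = [label_sentence(s) for s in sentences]
    if "recurrence" in labels:
        return "recurrence"
    if "negated" in labels:
        return "no_recurrence"
    return "unknown"


print(label_report("Findings consistent with local recurrence in the left breast."))
print(label_report("No evidence of recurrent disease. Stable postsurgical changes."))
print(label_report("Routine screening mammogram."))
```

Rules like these are brittle with hedged language, historical mentions, and long sentences, so they are best read as a baseline for comparison with learned models.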
Key Tools
Prostate Cancer Lesion Detection (nnU-Net)
nnU-Net framework trained on PI-CAI and Prostate-158 datasets; developed at Indiana University (Shiradkar Lab)
PI-QUAL vs MRQy Comparison
Code and analysis comparing PI-QUAL and MRQy prostate MRI quality metrics, using UMAP clustering, V-Net segmentation, and VoxelMorph registration
Niffler
DICOM framework for real-time and on-demand DICOM retrieval, metadata extraction, anonymization, and processing workflows
Use Case 02
Biochemical Recurrence After Prostate Therapy
Biochemical recurrence following radical prostatectomy or radiation affects up to 40% of prostate cancer patients. MEFINDER integrates prostate MRI data, clinical trial datasets (CHAARTED, STAMPEDE), and biparametric MRI from the VA to build multimodal frameworks for recurrence prediction and treatment benefit assessment.
387
biparametric MRI patients (VA dataset, expanding)
The Consortium
Five institutions. One framework.
A multi-site collaborative spanning academic medical centers, research universities, and federal health systems — each contributing unique expertise and datasets to the MEFINDER initiative.
Emory University (Lead)
Key Contribution: Project coordination, HITI Lab infrastructure, NLP recurrence labeling, multimodal fusion
Data Contributed: EMBED v2 (260,815 patients, ~1M exams), EPIP (~5,000 prostate MRI patients)
Indiana University
Key Contribution: Pathomic feature extraction (APIC), treatment benefit prediction, prostate aging research
Data Contributed: CHAARTED, STAMPEDE clinical trial datasets
Stanford University
Key Contribution: NLP toolkit validation, breast cancer data harmonization, cancer registry linkage
Data Contributed: Stanford breast and prostate cohorts, CA registry
Mayo Clinic
Key Contribution: Long-term follow-up, biobank infrastructure, clinical report parsing
Data Contributed: Mayo Clinic Biobank (75,000+ patients, 10–15 years follow-up)
VA
Key Contribution: Prostate MRI data, pathology slide digitization
Data Contributed: 387 biparametric MRI patients (expanding)
Publications & Outputs
Recent work from the consortium.
01 Predicting 30-day all-cause hospital readmission using multimodal spatiotemporal graph neural networks.
Tang, Siyi, Amara Tariq, Jared A. Dunnmon, Umesh Sharma, Praneetha Elugunti, Daniel L. Rubin, Bhavik N. Patel, and Imon Banerjee. IEEE Journal of Biomedical and Health Informatics 27, no. 4 (2023): 2071–2082.
02 Knowledge-grounded Adaptation Strategy for Vision-language Models: Building a Unique Case-set for Screening Mammograms for Residents Training.
Khan, Aisha Urooj, John Garrett, Tyler Bradshaw, Lonie Salkowski, Jiwoong Jeong, Amara Tariq, and Imon Banerjee. MICCAI 2024.
03 A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology Images.
Kathiravelu, Pradeeban, Puneet Sharma, Ashish Sharma, Imon Banerjee, Hari Trivedi, Saptarshi Purkayastha, Priyanshu Sinha, Alexandre Cadrin-Chenevert, Nabile Safdar, and Judy Wawira Gichoya. Journal of Digital Imaging (JDI), August 2021.
Funding & Institutional Support
Supported by the National Institutes of Health
NIH Other Transactions (OT) Award OT2OD038065 · Principal Investigator: Dr. Judy Gichoya, Emory University

HITI Lab · Emory University · Indiana University · Stanford University · Mayo Clinic · Veterans Affairs
This research is supported by the National Institutes of Health under Other Transactions Award Number OT2OD038065. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Get Involved
Collaborate with MEFINDER.
We welcome proposals for tool integration, data sharing arrangements, joint analyses, and clinical collaborations. Prospective graduate students and postdoctoral fellows interested in multimodal cancer AI are encouraged to reach out to the HITI Lab directly.
HITI Lab, Department of Radiology and Imaging Sciences
Emory University School of Medicine
Atlanta, Georgia
hitilab.com
For Researchers
We invite collaborations on tool development, dataset integration, model validation, and joint publications. Access to open-source tools and derived features is available through TCIA, Zenodo, and GitHub. Proposals for data use agreements are coordinated through Emory’s Office of Research.
For Clinicians & Advocates
We are actively recruiting clinical collaborators for prospective validation studies and patient advocates to participate in usability studies and focus groups. Your perspective shapes how MEFINDER tools are designed for real-world clinical integration and community benefit.