Mission, Team & Milestones
A multi-institutional NCI-funded initiative building open-source AI infrastructure for equitable cancer phenotype discovery.
Plain-Language Summary
MEFINDER develops free, open-source computer tools that combine different types of medical data — scans, tissue slides, and health records — to better understand why patients with the same cancer diagnosis have different outcomes. This helps doctors choose the right treatment for each patient, especially in communities that have had less access to expensive diagnostic tests.
Scientific Mission Statement
MEFINDER aims to develop an open-source framework that fuses medical imaging, digital pathology, clinical records, and social determinants of health data to discover novel disease phenotypes and deliver population-specific risk predictions for cancer patients.
The initiative focuses on two clinical use cases: ER-positive breast cancer recurrence and biochemical recurrence following definitive prostate cancer therapy.
NCI Funding
Funded by the National Cancer Institute under the U01 award mechanism. PI: Dr. Judy Gichoya, Emory University HITI Lab.
Task 1 — Open-Source Fusion Framework
Develop and release harmonization tools, feature extractors, and multimodal embedding approaches as community-accessible open-source software.
- Data harmonization tools
- Feature extraction modules
- Multimodal embedding API
- Comprehensive documentation
Task 2 — Clinical Use Case Application
Apply the framework to breast and prostate cancer recurrence prediction, demonstrating real-world clinical value and population-specific risk models.
- ER+ breast cancer recurrence
- Prostate BCR prediction
- Multi-site validation
- Fairness & equity analysis
The MEFINDER Team
A cross-disciplinary team of clinicians, data scientists, and biomedical engineers from five leading US institutions.
Judy Wawira Gichoya, MD, MS
Associate Professor
Emory University
Dr. Gichoya directs the Health Informatics in Radiology (HITI) Lab at Emory University. Her research focuses on AI fairness, multimodal medical AI, and clinical translation of deep learning systems. She leads MEFINDER's overall scientific strategy and the breast cancer use case.
Abhijeet Shrivastava, PhD
Research Scientist
Emory University
Dr. Shrivastava leads the open-source framework development within MEFINDER, including MamoCLIP, F-SYN, and the multimodal fusion components. His expertise spans self-supervised learning, federated learning, and medical image analysis.
Rohit Bhargava, PhD
Professor
Indiana University
Professor Bhargava leads the pathomic feature extraction and APIC development efforts at Indiana University. His laboratory has pioneered infrared and Raman spectroscopic imaging approaches for digital pathology.
Dhruv Bhatt, PhD
Research Scientist
Indiana University
Dr. Bhatt developed APIC and leads its clinical validation on prostate cancer datasets. His work bridges computational pathology with clinical decision support for treatment benefit prediction.
Imon Banerjee, PhD
Associate Professor
Stanford University
Professor Banerjee leads the NLP toolkit development and breast cancer data harmonization at Stanford. His research focuses on clinical NLP, radiology report analysis, and cancer registry linkage methods.
Paul Flint, MD
Professor
Mayo Clinic
Professor Flint provides access to Mayo Clinic's biobank infrastructure and its long-term follow-up data (10–15 years) for both breast and prostate cancer cohorts. He leads clinical validation efforts at Mayo.
Neel Mehta, MD
Radiologist
VA Medical Center
Dr. Mehta oversees the VA prostate MRI dataset curation and pathology slide digitization program. The VA dataset provides critical representation of diverse veteran populations in MEFINDER's prostate cancer analyses.
Five institutions. One mission.
Each partner brings complementary expertise and unique patient cohorts spanning the US.
Emory University
Lead Institution
Atlanta, GA
Contributions
- Project coordination
- HITI Lab infrastructure
- Data harmonization
- NLP recurrence labeling
- Multimodal fusion development
Datasets
- EMBED v2 (260,815 patients, ~1M exams)
- EPIP (~5,000 prostate MRI patients)
Indiana University
Pathomics Partner
Indianapolis, IN
Contributions
- Pathomic feature extraction (APIC)
- Treatment benefit prediction
- Prostate aging research
Datasets
- CHAARTED clinical trial
- STAMPEDE clinical trial
- Validated pathomic classifiers
Stanford University
NLP & Breast Data Partner
Stanford, CA
Contributions
- NLP toolkit validation
- Breast cancer data harmonization
- Cancer registry linkage (CA)
Datasets
- Stanford breast and prostate cohorts
- Registry-linked outcomes data
Mayo Clinic
Long-term Follow-up Partner
Rochester, MN
Contributions
- Long-term follow-up data (10–15 years)
- Biobank infrastructure
- Clinical report parsing
Datasets
- Mayo Clinic Biobank (75,000+ patients)
Veterans Affairs
Prostate MRI Partner
Nationwide
Contributions
- Prostate MRI data
- Pathology slide digitization
- Diverse veteran population
Datasets
- VA prostate dataset (387 biparametric MRI, expanding)
Milestones & Timeline
Year 1 milestones for the MEFINDER U01 award.
Open-Source Fusion Framework Development
Months 1–6
- DICOM preprocessing & quality assessment pipeline
- Pathology QC with HistoQC and F-SYN stain normalization
- NLP recurrence labeling toolkit for EMBED v2
- Prostate MRI segmentation (ProstateNet) validation
- MRI batch effect harmonization (PyComBatch)
- APIC prostate pathomics classifier release
Multimodal Data Integration & Phenotype Discovery
Months 6–12
- Graph-based multimodal feature fusion (imaging + EHR)
- Spatio-temporal fusion with longitudinal data
- Vision-language contrastive training (MamoCLIP)
- Co-attention causal fusion (MOSCARD)
- Phenotype cluster analysis & clinical validation
- Model interpretability (SHAP, attention maps)