About MEFINDER

Mission, Team & Milestones.

A multi-institutional NCI-funded initiative building open-source AI infrastructure for equitable cancer phenotype discovery.

Plain-Language Summary

MEFINDER develops free, open-source computer tools that combine different types of medical data — scans, tissue slides, and health records — to better understand why patients with the same cancer diagnosis have different outcomes. This helps doctors choose the right treatment for each patient, especially in communities that have had less access to expensive diagnostic tests.

01 Our Mission

Scientific Mission Statement

MEFINDER aims to develop an open-source framework that fuses medical imaging, digital pathology, clinical records, and social determinants of health data to discover novel disease phenotypes and deliver population-specific risk predictions for cancer patients.

The initiative focuses on two clinical use cases: ER-positive breast cancer recurrence and biochemical recurrence following definitive prostate cancer therapy.

NCI Funding

Funded by the National Cancer Institute under the U01 award mechanism. PI: Dr. Judy Gichoya, Emory University HITI Lab.

Task 1 — Open-Source Fusion Framework

Develop and release harmonization tools, feature extractors, and multimodal embedding approaches as community-accessible open-source software.

  • Data harmonization tools
  • Feature extraction modules
  • Multimodal embedding API
  • Comprehensive documentation

Task 2 — Clinical Use Case Application

Apply the framework to breast and prostate cancer recurrence prediction, demonstrating real-world clinical value and population-specific risk models.

  • ER+ breast cancer recurrence
  • Prostate biochemical recurrence (BCR) prediction
  • Multi-site validation
  • Fairness & equity analysis
02 Investigators

The MEFINDER Team

A cross-disciplinary team of clinicians, data scientists, and biomedical engineers from five leading US institutions.

Judy Wawira Gichoya, MD, MS

Associate Professor

Emory University

Principal Investigator

Dr. Gichoya directs the Health Informatics in Radiology (HITI) Lab at Emory University. Her research focuses on AI fairness, multimodal medical AI, and clinical translation of deep learning systems. She leads MEFINDER's overall scientific strategy and the breast cancer use case.

Abhijeet Shrivastava, PhD

Research Scientist

Emory University

Lead Software Developer & ML Engineer

Dr. Shrivastava leads the open-source framework development within MEFINDER, including MamoCLIP, F-SYN, and the multimodal fusion components. His expertise spans self-supervised learning, federated learning, and medical image analysis.

Rohit Bhargava, PhD

Professor

Indiana University

Prostate Pathomics Lead

Professor Bhargava leads the pathomic feature extraction and APIC development efforts at Indiana University. His laboratory has pioneered infrared and Raman spectroscopic imaging approaches for digital pathology.

Dhruv Bhatt, PhD

Research Scientist

Indiana University

APIC Development Lead

Dr. Bhatt developed APIC and leads its clinical validation on prostate cancer datasets. His work bridges computational pathology with clinical decision support for treatment benefit prediction.

Imon Banerjee, PhD

Associate Professor

Stanford University

NLP & Breast Cancer Data Lead

Professor Banerjee leads the NLP toolkit development and breast cancer data harmonization at Stanford. His research focuses on clinical NLP, radiology report analysis, and cancer registry linkage methods.

Paul Flint, MD

Professor

Mayo Clinic

Long-term Follow-up & Biobank Lead

Professor Flint provides access to Mayo Clinic's biobank infrastructure and its long-term follow-up data (10–15 years) for both the breast and prostate cancer cohorts. He leads clinical validation efforts at Mayo.

Neel Mehta, MD

Radiologist

VA Medical Center

VA Prostate MRI Lead

Dr. Mehta oversees the VA prostate MRI dataset curation and pathology slide digitization program. The VA dataset provides critical representation of diverse veteran populations in MEFINDER's prostate cancer analyses.

03 Consortium Partners

Five institutions. One mission.

Each partner brings complementary expertise and unique patient cohorts spanning the US.

Emory University

Lead Institution

Atlanta, GA

Contributions

  • Project coordination
  • HITI Lab infrastructure
  • Data harmonization
  • NLP recurrence labeling
  • Multimodal fusion development

Datasets

  • EMBED v2 (260,815 patients, ~1M exams)
  • EPIP (~5,000 prostate MRI patients)

Indiana University

Pathomics Partner

Indianapolis, IN

Contributions

  • Pathomic feature extraction (APIC)
  • Treatment benefit prediction
  • Prostate aging research

Datasets

  • CHAARTED clinical trial
  • STAMPEDE clinical trial
  • Validated pathomic classifiers

Stanford University

NLP & Breast Data Partner

Stanford, CA

Contributions

  • NLP toolkit validation
  • Breast cancer data harmonization
  • Cancer registry linkage (CA)

Datasets

  • Stanford breast and prostate cohorts
  • Registry-linked outcomes data

Mayo Clinic

Long-term Follow-up Partner

Rochester, MN

Contributions

  • Long-term follow-up data (10–15 years)
  • Biobank infrastructure
  • Clinical report parsing

Datasets

  • Mayo Clinic Biobank (75,000+ patients)

Veterans Affairs

Prostate MRI Partner

Nationwide

Contributions

  • Prostate MRI data
  • Pathology slide digitization
  • Diverse veteran population

Datasets

  • VA prostate dataset (387 biparametric MRI, expanding)
04 Project Progress

Milestones & Timeline

Year 1 milestones for the MEFINDER U01 award.

M1

Open-Source Fusion Framework Development

Months 1–6

Complete
Task completion: 6/6 · 100%
  • DICOM preprocessing & quality assessment pipeline
  • Pathology QC with HistoQC and F-SYN stain normalization
  • NLP recurrence labeling toolkit for EMBED v2
  • Prostate MRI segmentation (ProstateNet) validation
  • MRI batch effect harmonization (PyComBatch)
  • APIC prostate pathomics classifier release
M2

Multimodal Data Integration & Phenotype Discovery

Months 6–12

In Progress
Task completion: 1/6 · 17%
  • Graph-based multimodal feature fusion (imaging + EHR)
  • Spatio-temporal fusion with longitudinal data
  • Vision-language contrastive training (MamoCLIP)
  • Co-attention causal fusion (MOSCARD)
  • Phenotype cluster analysis & clinical validation
  • Model interpretability (SHAP, attention maps)