Open Science

Tools & Data

Every component of MEFINDER is freely available under open-source licenses. Use, adapt, and extend our tools for your own research.

Open-Source Toolkit

10 Tools, Freely Available

Organized across harmonization, feature extraction, embedding, and classification.

Central Repository

Data Harmonization

5 tools

HistoQC

v2.1
stable
Digital Pathology

HistoQC provides automated quality assessment for whole-slide pathology images, detecting artifacts including blur, tissue folds, air bubbles, pen marks, and staining anomalies. The tool generates per-tile quality masks and summary metrics, enabling downstream analysis to filter unreliable regions automatically.

#quality-control #pathology #preprocessing

Janowczyk A, et al. JCO CCI 2019.
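As a rough illustration of the kind of per-tile blur check described above, the sketch below scores a grayscale tile by the variance of its 4-neighbour Laplacian, a generic sharpness heuristic. It is not HistoQC's actual algorithm or API; the names `laplacian_variance` and `is_blurry`, the threshold, and the synthetic tiles are all assumptions made for this example.

```python
from statistics import pvariance


def laplacian_variance(tile):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a
    2D grayscale tile (a list of equal-length rows, at least 3x3)."""
    rows, cols = len(tile), len(tile[0])
    if rows < 3 or cols < 3:
        raise ValueError("tile must be at least 3x3")
    responses = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            responses.append(
                tile[r - 1][c] + tile[r + 1][c]
                + tile[r][c - 1] + tile[r][c + 1]
                - 4 * tile[r][c]
            )
    return pvariance(responses)


def is_blurry(tile, threshold=100.0):
    """Flag a tile whose edge response is weak (threshold is illustrative)."""
    return laplacian_variance(tile) < threshold


# Synthetic tiles: a checkerboard (strong edges) and a uniform patch (no edges).
sharp_tile = [[255 if (r + c) % 2 == 0 else 0 for c in range(8)] for r in range(8)]
flat_tile = [[128] * 8 for _ in range(8)]
```

A low score means few sharp intensity transitions, which is what defocus blur produces. In practice the threshold would have to be tuned per scanner and stain.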

F-SYN

stable
Digital Pathology

F-SYN performs stain normalization in the Fourier domain, transferring stain statistics between source and target slides while preserving tissue morphology. Unlike GAN-based approaches, F-SYN introduces no hallucinated structures, making it suitable for clinical and regulatory workflows.

#stain-normalization #pathology #harmonization

MQUAL

stable
MRI · Radiology

MQUAL provides automated quality scoring for prostate and breast MRI studies, evaluating signal-to-noise ratio, motion artifact severity, field uniformity, and sequence protocol compliance. Validated on EPIP and VA prostate MRI datasets, MQUAL enables scalable QC for large retrospective imaging cohorts.

#quality-control #MRI #radiology
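One of the metrics named above, signal-to-noise ratio, is often estimated as the mean intensity in a tissue region divided by the standard deviation in a background (air) region. The sketch below shows that textbook estimate only. It is not MQUAL's implementation, and the function name `roi_snr` and the sample intensities are assumptions.

```python
from statistics import mean, pstdev


def roi_snr(signal_roi, background_roi):
    """Two-region SNR estimate: mean of the signal ROI over the standard
    deviation of the background ROI. Both arguments are flat lists of
    pixel intensities."""
    noise = pstdev(background_roi)
    if noise == 0:
        return float("inf")  # noiseless background: SNR is unbounded
    return mean(signal_roi) / noise


# Hypothetical intensities sampled from a tissue ROI and an air ROI.
tissue = [100, 102, 98, 100]
air = [1, 3, 1, 3]
snr_value = roi_snr(tissue, air)
```

A study whose SNR falls below a site-specific cutoff could then be flagged for review.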

Beaks

stable
Radiology · Digital Pathology

Beaks provides a unified quality assurance interface for heterogeneous imaging collections, supporting both radiology (DICOM) and pathology (SVS/NDPI) formats. It generates standardized quality reports and flags outlier studies for radiologist or pathologist review.

#quality-control #multi-modal #DICOM

PyComBatch

stable
MRI · Radiology · Radiomics

PyComBatch implements ComBat harmonization with extensions for radiomic feature matrices, removing scanner-related batch effects while preserving biological variance. It supports both parametric and non-parametric harmonization modes, with integration hooks for IBSI-compliant feature sets.

#harmonization #radiomics #batch-effects #multi-site
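To make the idea of batch-effect removal concrete, the sketch below applies a plain location-scale adjustment to a single feature: each scanner's values are standardized within the scanner, then mapped to the overall mean and the pooled within-scanner spread. This is a deliberate simplification and not PyComBatch's API. Full ComBat additionally shrinks the per-batch estimates with empirical Bayes and protects covariates of interest through a design matrix. The function name `location_scale_harmonize` and the toy data are assumptions.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance


def location_scale_harmonize(values, batches):
    """Align every batch of one feature to the overall mean and the pooled
    within-batch standard deviation. `values` and `batches` are parallel lists."""
    n = len(values)
    grand_mean = mean(values)

    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)

    # Per-batch location (mean) and scale (population variance).
    stats = {b: (mean(vs), pvariance(vs)) for b, vs in by_batch.items()}
    # Pooled within-batch variance, weighted by batch size.
    pooled_sd = sqrt(sum(len(by_batch[b]) * var for b, (_, var) in stats.items()) / n)

    harmonized = []
    for value, batch in zip(values, batches):
        batch_mean, batch_var = stats[batch]
        z = (value - batch_mean) / sqrt(batch_var) if batch_var > 0 else 0.0
        harmonized.append(grand_mean + z * pooled_sd)
    return harmonized


# Toy feature measured on two hypothetical scanners with a constant offset.
feature = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
scanner = ["A", "A", "A", "B", "B", "B"]
harmonized = location_scale_harmonize(feature, scanner)
```

After adjustment both scanners share the same mean and spread. This simplified version would also erase a genuine biological difference between the two cohorts, which is exactly what ComBat's covariate modelling is designed to avoid.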

Classification & Prediction

2 tools

APIC

v1.0
stable
Digital Pathology

APIC uses deep learning on H&E pathology slides to quantify tumor-immune microenvironment features and predict treatment benefit from systemic therapies. Validated on CHAARTED and STAMPEDE clinical trial datasets, APIC achieves performance comparable to expensive molecular assays at a fraction of the cost.

#prostate-cancer #classification #clinical-trial

Bhatt D, et al. JCO CCI 2024.

MOSCARD

beta
Imaging · EHR · Multi-modal

MOSCARD addresses spurious correlations in multimodal fusion models by embedding causal structure into the learning process. Using structural causal models (SCMs) and do-calculus, MOSCARD separates true predictive features from confounders introduced by demographic or acquisition biases. It has been demonstrated on opportunistic cardiovascular screening, with strong fairness properties.

#causal-AI #de-confounding #fairness #multi-modal #MICCAI-2025

Gichoya JW, et al. MICCAI 2025.

Multimodal Embedding

2 tools

MamoCLIP

stable
Mammography · Radiology

MamoCLIP adapts the CLIP contrastive learning paradigm to mammography analysis, aligning imaging features with radiology report text embeddings. Trained in a federated setting across EMBED v2 institutions, MamoCLIP produces transferable representations for downstream breast cancer classification and risk prediction tasks.

#breast-cancer #contrastive-learning #federated #CLIP

Shrivastava A, et al. MICCAI 2024.
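The CLIP paradigm mentioned above trains image and text encoders so that matching pairs score higher than mismatched pairs within a batch. The sketch below computes the standard symmetric contrastive loss on already-computed embeddings. It is a generic illustration, not MamoCLIP's training code; the function name `clip_loss`, the temperature value, and the toy embeddings are assumptions.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.
    Row i of `image_emb` is assumed to match row i of `text_emb`."""
    images = [_normalize(v) for v in image_emb]
    texts = [_normalize(v) for v in text_emb]
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


# Toy batch: two image embeddings with correctly and incorrectly paired reports.
image_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = clip_loss(image_batch, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = clip_loss(image_batch, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each image is closest to its own report and grows when the pairs are swapped, which is the signal that pulls the two embedding spaces into alignment.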

HemaToMe

beta
Pathology · EHR · Multi-modal

HemaToMe provides a general-purpose multimodal fusion architecture combining image-derived pathology features with structured EHR data. The framework implements graph-based and attention-based fusion strategies with support for missing modality imputation, making it suitable for real-world clinical cohorts with incomplete data.

#fusion #EHR #multi-modal #graph-neural-network

Feature Extraction

1 tool

ProstateNet

stable
MRI · Radiology

ProstateNet provides pixel-level segmentation of the transition zone (TZ) and peripheral zone (PZ) in biparametric prostate MRI, enabling zone-specific radiomic feature extraction aligned with PI-RADS v2.1. Trained on EPIP and VA datasets, ProstateNet generalizes robustly across scanner vendors and field strengths.

#prostate-cancer #segmentation #MRI #radiomics
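Zone-specific feature extraction amounts to computing features separately over the pixels that a segmentation mask assigns to each zone. The sketch below does this for simple first-order statistics on a 2D slice. It does not reflect ProstateNet's output format; the label encoding in `ZONE_LABELS`, the function name `zone_statistics`, and the toy arrays are assumptions.

```python
from statistics import mean, pstdev

# Hypothetical mask encoding: 0 = background, 1 = transition zone, 2 = peripheral zone.
ZONE_LABELS = {1: "TZ", 2: "PZ"}


def zone_statistics(image, mask):
    """First-order intensity statistics per zone for a 2D slice.
    `image` and `mask` are lists of rows with identical shape."""
    per_zone = {name: [] for name in ZONE_LABELS.values()}
    for image_row, mask_row in zip(image, mask):
        for intensity, label in zip(image_row, mask_row):
            if label in ZONE_LABELS:
                per_zone[ZONE_LABELS[label]].append(intensity)
    # Zones with no pixels in this slice are omitted from the result.
    return {
        name: {"n_pixels": len(vals), "mean": mean(vals), "std": pstdev(vals)}
        for name, vals in per_zone.items()
        if vals
    }


# Toy 2x3 slice with its zone mask.
slice_image = [[10, 20, 30], [40, 50, 60]]
slice_mask = [[0, 1, 1], [2, 2, 0]]
zone_stats = zone_statistics(slice_image, slice_mask)
```

The same masking step would precede a fuller radiomic pipeline, with texture and shape features replacing the mean and standard deviation used here.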

Research Data

Primary Datasets

Large-scale, multi-site datasets providing the foundation for MEFINDER's analyses. Data access follows institutional DUA and IRB requirements.

EMBED v2

Emory Breast Imaging Dataset v2

Emory University

260,815 patients · ~1M exams · Multimodal imaging

The largest publicly available institutional mammography dataset, including FFDM, DBT, and breast MRI with paired pathology and outcomes data.

Mammography (FFDM/DBT) · Breast MRI · Digital Pathology · Clinical EHR

Data Access

TCIA (public)

Standards

DICOM · HL7 FHIR · ICD-10

EPIP

Emory Prostate Imaging & Pathology

Emory University

~5,000 patients · Biparametric MRI · H&E Pathology

Institutional prostate MRI and digital pathology cohort with linked PSA outcomes and definitive therapy records for biochemical recurrence analysis.

Biparametric MRI · Digital Pathology · Clinical Labs · EHR

Data Access

DUA Required

Standards

DICOM · PI-RADS v2.1 · IBSI

Mayo Biobank

Mayo Clinic Biobank

Mayo Clinic

75,000+ patients · 10–15 yr follow-up · Biospecimens

Long-term follow-up data (10–15 years) that enable validation of MEFINDER prognosis models at clinically meaningful time horizons.

Radiology · Pathology · Genomics · EHR

Data Access

Collaboration Required

Standards

DICOM · HL7 FHIR · ICD-10

VA Prostate

VA Prostate MRI Dataset

Veterans Affairs

387 patients (expanding) · Biparametric MRI · Diverse population

Biparametric prostate MRI dataset from the VA system, providing representation of a diverse veteran population and enabling cross-site generalization analysis.

Biparametric MRI · Digital Pathology

Data Access

VA Data Access

Standards

DICOM · PI-RADS v2.1

Clinical Trials

CHAARTED · STAMPEDE · RTOG

Multi-institutional

Multi-site · Randomized controlled · Treatment outcomes

Phase III clinical trial datasets providing gold-standard treatment outcome labels for validating APIC treatment benefit predictions in prostate cancer.

Pathology · Clinical · Survival data

Data Access

Via investigators

Standards

ICD-10 · RECIST

Data Standards Used

DICOM

Digital Imaging and Communications in Medicine

All imaging data

HL7 FHIR

Fast Healthcare Interoperability Resources

EHR data exchange

ICD-10

International Classification of Diseases, 10th Revision

Diagnosis codes

PI-RADS v2.1

Prostate Imaging Reporting and Data System

Prostate MRI scoring

IBSI

Image Biomarker Standardization Initiative

Radiomic features

Open Source

Central GitHub Repository

All MEFINDER software, documentation, and reproducibility notebooks are hosted on GitHub. Contributions from the community are welcome; read our contribution guidelines to get started.

# Clone the repository
git clone https://github.com/Emory-Empathathetic-AI-for-Health-Inst/Multimodal-Multi-scale-Framework-for-Ethical-AI-Model-Development