Open Science
Tools & Data
Every component of MEFINDER is freely available under open-source licenses. Use, adapt, and extend our tools for your own research.
Open-Source Toolkit
10 Tools, Freely Available
Organized across harmonization, feature extraction, embedding, and classification.
Data Harmonization
5 tools
HistoQC (v2.1)
HistoQC provides automated quality assessment for whole-slide pathology images, detecting artifacts including blur, tissue folds, air bubbles, pen marks, and staining anomalies. The tool generates per-tile quality masks and summary metrics, enabling downstream analysis to filter unreliable regions automatically.
Janowczyk A, et al. JCO CCI 2019.
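The downstream filtering step can be sketched as follows. This is a minimal illustration, not HistoQC's interface: it assumes each tile's quality mask has already been exported as a nested list of 0/1 values (1 = usable tissue), and the helper names and the 0.8 threshold are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: 1 = usable tissue pixel, 0 = pixel flagged as artifact.
Mask = List[List[int]]


def usable_fraction(mask: Mask) -> float:
    """Fraction of pixels in a tile mask that passed quality control."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in mask) / total


def filter_tiles(masks: Dict[str, Mask], min_fraction: float = 0.8) -> List[str]:
    """IDs of tiles whose usable fraction meets `min_fraction` (hypothetical helper)."""
    return sorted(tile_id for tile_id, mask in masks.items()
                  if usable_fraction(mask) >= min_fraction)


if __name__ == "__main__":
    tiles = {
        "tile_0_0": [[1, 1], [1, 1]],  # clean tissue
        "tile_0_1": [[1, 0], [0, 0]],  # mostly artifact (e.g. pen mark)
        "tile_1_0": [[1, 1], [1, 0]],  # 75% usable, below the 0.8 threshold
    }
    print(filter_tiles(tiles))  # ['tile_0_0']
```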
F-SYN
F-SYN performs stain normalization in the Fourier domain, transferring stain statistics between source and target slides while preserving tissue morphology. Unlike GAN-based approaches, F-SYN introduces no hallucinated structures, making it suitable for clinical and regulatory workflows.
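The general idea of Fourier-domain transfer can be sketched by giving a source signal the low-frequency amplitude spectrum of a target while keeping the source phase, which carries most of the spatial structure. This is a generic amplitude-swap illustration, not F-SYN's published algorithm: it uses a naive 1D DFT so it runs with the standard library alone, whereas real slides would need a 2D FFT per color channel, and the `radius` parameter is a hypothetical knob.

```python
import cmath
import math
from typing import List

# Hypothetical illustration of Fourier-domain amplitude transfer; NOT F-SYN's
# algorithm. A naive 1D DFT keeps the sketch dependency-free.


def dft(x: List[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def amplitude_transfer(source: List[float], target: List[float], radius: int) -> List[float]:
    """Give `source` the low-frequency amplitudes of `target`, keeping the source phase."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    n = len(source)
    src = dft([complex(v) for v in source])
    tgt = dft([complex(v) for v in target])
    mixed = []
    for k in range(n):
        # Distance from DC, symmetric in k and n - k so the output stays real.
        if min(k, n - k) <= radius:
            mixed.append(abs(tgt[k]) * cmath.exp(1j * cmath.phase(src[k])))
        else:
            mixed.append(src[k])
    return [v.real for v in idft(mixed)]


if __name__ == "__main__":
    # radius=0 swaps only the DC term, i.e. the mean intensity.
    out = amplitude_transfer([1.0, 2.0, 3.0, 4.0], [10.0] * 4, radius=0)
    print([round(v, 6) for v in out])  # [8.5, 9.5, 10.5, 11.5]
```

Because only amplitudes are exchanged, no new structure is synthesized; larger radii transfer progressively more of the target's intensity distribution.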
MQUAL
MQUAL provides automated quality scoring for prostate and breast MRI studies, evaluating signal-to-noise ratio, motion artifact severity, field uniformity, and sequence protocol compliance. Validated on EPIP and VA prostate MRI datasets, MQUAL enables scalable QC for large retrospective imaging cohorts.
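MQUAL's exact SNR estimator is not specified here, so the sketch below uses one common single-image definition: mean intensity in a tissue region divided by the standard deviation of a background (air) region. The ROI inputs, function names, and the minimum SNR of 20 are assumptions; the sketch also omits the Rayleigh correction often applied to background noise in magnitude MR images.

```python
import statistics
from typing import Sequence

# Hypothetical single-image SNR check; MQUAL's real estimator and thresholds may differ.


def snr(signal_roi: Sequence[float], background_roi: Sequence[float]) -> float:
    """Mean intensity of a tissue ROI divided by the SD of a background ROI."""
    noise_sd = statistics.stdev(background_roi)
    if noise_sd == 0:
        raise ValueError("background ROI has zero variance; SNR is undefined")
    return statistics.fmean(signal_roi) / noise_sd


def passes_snr_check(signal_roi: Sequence[float], background_roi: Sequence[float],
                     min_snr: float = 20.0) -> bool:
    """Accept a series if its SNR meets a (hypothetical) minimum."""
    return snr(signal_roi, background_roi) >= min_snr


if __name__ == "__main__":
    tissue = [200.0, 205.0, 195.0, 200.0]
    air = [9.0, 10.0, 11.0]
    print(snr(tissue, air))               # 200.0
    print(passes_snr_check(tissue, air))  # True
```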
Beaks
Beaks provides a unified quality assurance interface for heterogeneous imaging collections, supporting both radiology (DICOM) and pathology (SVS/NDPI) formats. It generates standardized quality reports and flags outlier studies for radiologist or pathologist review.
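Beaks' outlier criterion is not described here. As one plausible approach, the sketch below flags studies whose quality metric has a robust modified z-score (based on the median and median absolute deviation) above 3.5, a commonly used cutoff. The study IDs, metric values, and function name are hypothetical.

```python
import statistics
from typing import Dict, List

# Hypothetical outlier rule; Beaks' actual criterion may differ.


def flag_outliers(metric_by_study: Dict[str, float], cutoff: float = 3.5) -> List[str]:
    """Return study IDs whose modified z-score exceeds `cutoff`."""
    values = list(metric_by_study.values())
    median = statistics.median(values)
    # Median absolute deviation: robust to the very outliers we want to find.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # at least half the values equal the median; the rule is undefined
    return sorted(
        study for study, v in metric_by_study.items()
        if abs(0.6745 * (v - median) / mad) > cutoff
    )


if __name__ == "__main__":
    snr_by_study = {"s1": 10.0, "s2": 10.5, "s3": 9.5, "s4": 10.2, "s5": 30.0}
    print(flag_outliers(snr_by_study))  # ['s5']
```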
PyComBatch
PyComBatch implements ComBat harmonization with extensions for radiomic feature matrices, removing scanner-related batch effects while preserving biological variance. It supports both parametric and non-parametric harmonization modes and provides integration hooks for IBSI-compliant feature sets.
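The location/scale core of ComBat can be sketched for a single feature as follows. This is a simplified illustration, not PyComBatch's code: it omits biological covariates and the empirical Bayes shrinkage of the per-batch parameters that ComBat applies, and the function name is hypothetical. Each batch needs at least two samples with non-zero spread.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List

# Simplified ComBat-style location/scale adjustment (no covariates, no empirical Bayes).


def combat_location_scale(values: List[float], batches: List[str]) -> List[float]:
    """Harmonize one feature so every batch shares the pooled mean and SD."""
    if len(values) != len(batches):
        raise ValueError("values and batches must align")
    by_batch: Dict[str, List[float]] = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)

    grand_mean = statistics.fmean(values)
    batch_means = {b: statistics.fmean(vs) for b, vs in by_batch.items()}
    # Pooled SD of residuals after removing each batch's mean.
    ss = sum((v - batch_means[b]) ** 2 for v, b in zip(values, batches))
    pooled_sd = math.sqrt(ss / len(values))

    # Standardize, then estimate per-batch location (gamma) and scale (delta).
    z = [(v - grand_mean) / pooled_sd for v in values]
    z_by_batch: Dict[str, List[float]] = defaultdict(list)
    for zi, b in zip(z, batches):
        z_by_batch[b].append(zi)
    gamma = {b: statistics.fmean(zs) for b, zs in z_by_batch.items()}
    delta = {b: statistics.stdev(zs) for b, zs in z_by_batch.items()}

    # Remove batch location/scale and map back to the original units.
    return [pooled_sd * (zi - gamma[b]) / delta[b] + grand_mean
            for zi, b in zip(z, batches)]


if __name__ == "__main__":
    feature = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
    scanner = ["A", "A", "A", "B", "B", "B"]
    print(combat_location_scale(feature, scanner))
```

Applied column by column to a radiomic feature matrix, this aligns every batch to the pooled mean and SD. Full ComBat also keeps covariates such as diagnosis in the model so that biological differences are not removed along with scanner effects.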
Classification & Prediction
2 tools
APIC (v1.0)
APIC uses deep learning on H&E pathology slides to quantify tumor-immune microenvironment features and predict treatment benefit from systemic therapies. Validated on CHAARTED and STAMPEDE clinical trial datasets, APIC achieves performance comparable to expensive molecular assays at a fraction of the cost.
Bhatt D, et al. JCO CCI 2024.
MOSCARD
MOSCARD addresses spurious correlations in multimodal fusion models by embedding causal structure into the learning process. Using structural causal models (SCMs) and do-calculus, MOSCARD separates true predictive features from confounders introduced by demographic or acquisition biases. It has been demonstrated on opportunistic cardiovascular screening, where it showed strong fairness properties.
Gichoya JW, et al. MICCAI 2025.
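The kind of confounding MOSCARD targets can be illustrated with the backdoor adjustment, one standard identification result of do-calculus: P(y | do(x)) = Σ_z P(y | x, z) P(z). This toy sketch on counts is not MOSCARD's SCM-based method; the binary exposure, the confounder (e.g., acquisition site), and the data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical record layout: (x, z, y) with binary exposure x, binary
# confounder z (e.g. acquisition site) and binary outcome y.
Record = Tuple[int, int, int]


def naive_risk(records: List[Record], x: int) -> float:
    """Observational P(y=1 | x); confounded if z affects both x and y."""
    rows = [r for r in records if r[0] == x]
    return sum(r[2] for r in rows) / len(rows)


def adjusted_risk(records: List[Record], x: int) -> float:
    """Backdoor adjustment: P(y=1 | do(x)) = sum_z P(y=1 | x, z) * P(z)."""
    total = len(records)
    risk = 0.0
    for z, n_z in Counter(r[1] for r in records).items():
        stratum = [r for r in records if r[0] == x and r[1] == z]
        if not stratum:
            raise ValueError(f"no records with x={x}, z={z} (positivity violated)")
        risk += (sum(r[2] for r in stratum) / len(stratum)) * (n_z / total)
    return risk


if __name__ == "__main__":
    # Outcome depends only on z, but z=1 cases are mostly x=1.
    data = ([(1, 0, 0)] * 2 + [(0, 0, 0)] * 8 +
            [(1, 1, 1)] * 8 + [(0, 1, 1)] * 2)
    print(naive_risk(data, 1) - naive_risk(data, 0))        # about 0.6: spurious effect
    print(adjusted_risk(data, 1) - adjusted_risk(data, 0))  # 0.0: no causal effect
```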
Multimodal Embedding
2 tools
MamoCLIP
MamoCLIP adapts the CLIP contrastive learning paradigm to mammography analysis, aligning imaging features with radiology report text embeddings. Trained in a federated setting across EMBED v2 institutions, MamoCLIP produces transferable representations for downstream breast cancer classification and risk prediction tasks.
Shrivastava A, et al. MICCAI 2024.
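The CLIP-style objective behind MamoCLIP can be illustrated with the standard symmetric contrastive (InfoNCE) loss over a batch of paired image and report embeddings. This sketch assumes the embeddings are already computed; MamoCLIP's encoders, temperature, and federated aggregation are not shown, and 0.07 is only a commonly used starting temperature.

```python
import math
from typing import List

Vector = List[float]


def _l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def _mean_diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Average of -log softmax(row)[i], treating pair i as the correct match for row i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)
        log_sum_exp = top + math.log(sum(math.exp(v - top) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(image_emb: List[Vector], text_emb: List[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: image-to-text plus text-to-image cross-entropy."""
    images = [_l2_normalize(v) for v in image_emb]
    texts = [_l2_normalize(v) for v in text_emb]
    # Cosine similarity of every image with every report, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
              for img in images]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_diagonal_cross_entropy(logits)
                  + _mean_diagonal_cross_entropy(logits_t))


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_loss(images, [[0.9, 0.1], [0.1, 0.9]]))  # small: pairs aligned
    print(clip_loss(images, [[0.1, 0.9], [0.9, 0.1]]))  # large: pairs swapped
```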
HemaToMe
HemaToMe provides a general-purpose multimodal fusion architecture combining image-derived pathology features with structured EHR data. The framework implements graph-based and attention-based fusion strategies with support for missing modality imputation, making it suitable for real-world clinical cohorts with incomplete data.
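Attention-based fusion with an incomplete record can be sketched as scaled dot-product attention over whichever modality embeddings are present. This is a hypothetical illustration, not HemaToMe's architecture: it assumes all modalities are pre-projected to a shared dimension and uses a fixed query vector where a real model would learn one. It handles a missing modality by masking it out of the softmax, which is a different strategy from the imputation the framework supports.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]

# Hypothetical sketch: a missing modality (None) is masked out, not imputed.


def attention_fuse(embeddings: Dict[str, Optional[Vector]], query: Vector) -> Vector:
    """Scaled dot-product attention over the available modality embeddings."""
    present = {name: vec for name, vec in embeddings.items() if vec is not None}
    if not present:
        raise ValueError("at least one modality must be present")
    dim = len(query)
    scores = {name: sum(q * v for q, v in zip(query, vec)) / math.sqrt(dim)
              for name, vec in present.items()}
    # Softmax over present modalities only (subtract the max for stability).
    top = max(scores.values())
    exp_scores = {name: math.exp(s - top) for name, s in scores.items()}
    norm = sum(exp_scores.values())
    fused = [0.0] * dim
    for name, vec in present.items():
        weight = exp_scores[name] / norm
        for j in range(dim):
            fused[j] += weight * vec[j]
    return fused


if __name__ == "__main__":
    query = [0.0, 0.0]  # equal scores, so equal weights
    print(attention_fuse({"pathology": [1.0, 0.0], "ehr": [0.0, 1.0]}, query))  # [0.5, 0.5]
    print(attention_fuse({"pathology": [1.0, 0.0], "ehr": None}, query))        # [1.0, 0.0]
```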
Feature Extraction
1 tool
ProstateNet
ProstateNet provides pixel-level segmentation of the transition zone (TZ) and peripheral zone (PZ) in biparametric prostate MRI, enabling zone-specific radiomic feature extraction aligned with PI-RADS v2.1. Trained on EPIP and VA datasets, ProstateNet generalizes robustly across scanner vendors and field strengths.
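Zone-specific feature extraction from a segmentation like ProstateNet's can be sketched as grouping intensities by zone label. This assumes a hypothetical label encoding (0 = background, 1 = TZ, 2 = PZ) and computes only simple first-order statistics on a 2D slice; a real pipeline would operate on 3D volumes with an IBSI-compliant feature library.

```python
import statistics
from typing import Dict, List

# Hypothetical label encoding for the segmentation mask.
ZONE_LABELS = {1: "TZ", 2: "PZ"}


def zone_features(image: List[List[float]], mask: List[List[int]]) -> Dict[str, Dict[str, float]]:
    """Voxel count, mean, and population SD of intensities inside each zone."""
    voxels: Dict[str, List[float]] = {zone: [] for zone in ZONE_LABELS.values()}
    for image_row, mask_row in zip(image, mask):
        for value, label in zip(image_row, mask_row):
            if label in ZONE_LABELS:
                voxels[ZONE_LABELS[label]].append(value)
    features: Dict[str, Dict[str, float]] = {}
    for zone, values in voxels.items():
        if values:  # skip zones absent from this slice
            features[zone] = {
                "count": float(len(values)),
                "mean": statistics.fmean(values),
                "std": statistics.pstdev(values),
            }
    return features


if __name__ == "__main__":
    intensities = [[900.0, 1100.0, 1300.0],
                   [700.0, 800.0, 1000.0]]
    labels = [[0, 1, 1],
              [2, 2, 0]]
    print(zone_features(intensities, labels))
```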
Research Data
Primary Datasets
Large-scale, multi-site datasets providing the foundation for MEFINDER's analyses. Data access follows institutional DUA and IRB requirements.
EMBED v2
Emory Breast Imaging Dataset v2
Emory University
The largest publicly available institutional mammography dataset, including FFDM, DBT, and breast MRI with paired pathology and outcomes data.
Data Access
TCIA (public)
EPIP
Emory Prostate Imaging & Pathology
Emory University
Institutional prostate MRI and digital pathology cohort with linked PSA outcomes and definitive therapy records for biochemical recurrence analysis.
Data Access
DUA Required
Mayo Biobank
Mayo Clinic Biobank
Mayo Clinic
Long-term follow-up data enabling validation of MEFINDER prognosis models at clinically meaningful time horizons.
Data Access
Collaboration Required
VA Prostate
VA Prostate MRI Dataset
Veterans Affairs
Biparametric prostate MRI dataset from the VA health system, representing a diverse veteran population and enabling cross-site generalization analysis.
Data Access
VA Data Access
Clinical Trials
CHAARTED · STAMPEDE · RTOG
Multi-institutional
Phase III clinical trial datasets providing gold-standard treatment outcome labels for validating APIC treatment benefit predictions in prostate cancer.
Data Access
Via investigators
Data Standards Used
DICOM
Digital Imaging and Communications in Medicine
All imaging data
HL7 FHIR
Fast Healthcare Interoperability Resources
EHR data exchange
ICD-10
International Classification of Diseases, 10th Revision
Diagnosis codes
PI-RADS v2.1
Prostate Imaging Reporting and Data System
Prostate MRI scoring
IBSI
Image Biomarker Standardization Initiative
Radiomic features
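PI-RADS v2.1 assigns the overall category using a different dominant sequence per zone (DWI in the peripheral zone, T2W in the transition zone), which is why zone segmentation matters for scoring. The sketch below is a simplified reading of those rules, assuming 1 to 5 T2W and DWI scores are already assigned; it ignores non-diagnostic DWI and other edge cases, the function name is hypothetical, and it should be checked against the ACR PI-RADS v2.1 document before any real use.

```python
from typing import Optional

# Simplified, hypothetical encoding of PI-RADS v2.1 overall-category rules.


def pirads_overall(zone: str, t2w: int, dwi: int,
                   dce_positive: Optional[bool] = None) -> int:
    """Overall PI-RADS category; dce_positive=None models a biparametric exam."""
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be 1-5, got {score}")
    if zone == "PZ":
        # DWI is dominant; a positive DCE upgrades DWI 3 to 4.
        if dwi == 3 and dce_positive:
            return 4
        return dwi
    if zone == "TZ":
        # T2W is dominant; DWI can upgrade T2W 2 (DWI >= 4) and T2W 3 (DWI 5).
        if t2w == 2 and dwi >= 4:
            return 3
        if t2w == 3 and dwi == 5:
            return 4
        return t2w
    raise ValueError("zone must be 'PZ' or 'TZ'")


if __name__ == "__main__":
    print(pirads_overall("PZ", t2w=2, dwi=3))                     # 3 (biparametric)
    print(pirads_overall("PZ", t2w=2, dwi=3, dce_positive=True))  # 4
    print(pirads_overall("TZ", t2w=2, dwi=4))                     # 3
```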
Open Source
Central GitHub Repository
All MEFINDER software, documentation, and reproducibility notebooks are hosted on GitHub. Community contributions are welcome; read our contribution guidelines to get started.
# Clone the repository
git clone https://github.com/Emory-Empathathetic-AI-for-Health-Inst/Multimodal-Multi-scale-Framework-for-Ethical-AI-Model-Development