Tools & Data

Open-source infrastructure for multimodal cancer AI.


All tools developed under the MEFINDER initiative are released under academic open-source licenses and are designed for reproducible research.

01

Multimodal Fusion Models

MOSCARD

MICCAI 2025

Framework · Causal AI

Addresses bias in multimodal medical imaging by integrating causal reasoning.

Uses chest X-ray (CXR) as the primary modality and ECG as a complementary guiding modality. Employs a co-attention mechanism with a Vision Transformer (ViT)/MedCLIP encoder, and includes a structural causal model (SCM) for de-confounding. Supports four training modes: Baseline, Causal, Conf, and CaConf. Code available in repository.

Availability:

GitHub
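The co-attention idea described for MOSCARD can be illustrated with a toy cross-attention step, in which tokens from each modality attend over tokens from the other. This is a minimal sketch and not the MOSCARD implementation: the function names, the tiny token vectors, and the absence of learned projection weights are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Replace each query token with an attention-weighted mix of the other modality's tokens."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys_values]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[d] for w, kv in zip(weights, keys_values)) for d in range(dim)]
        )
    return attended


def co_attention(cxr_tokens, ecg_tokens):
    """Hypothetical two-way attention: CXR attends to ECG and ECG attends to CXR."""
    return cross_attend(cxr_tokens, ecg_tokens), cross_attend(ecg_tokens, cxr_tokens)


# Made-up 2-dimensional token embeddings, purely illustrative.
cxr = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ecg = [[0.5, 0.5], [1.0, -1.0]]
cxr_guided, ecg_guided = co_attention(cxr, ecg)
print(len(cxr_guided), len(ecg_guided))
```

Each output keeps the token count of the querying modality, so the guided CXR representation can be passed on to a downstream classifier without changing its shape.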

VLM for Mammography

MICCAI 2024

Breast Cancer · Vision-Language

Knowledge-grounded adaptation strategy for vision-language models for screening mammography.

Builds unique case-sets for screening mammography, using mini-batch selective sampling for VLM adaptation. Evaluated with two VLMs: MedCLIP (in-domain) and ALBEF (out-of-domain). Validated in zero-shot, few-shot, and supervised settings on UW Madison datasets, and externally on Mayo Clinic data. Authors: Aisha Urooj Khan et al. Model checkpoints available via download link.

Availability:

GitHub; model checkpoints via download link

MM-STGNN

IEEE Journal of Biomedical and Health Informatics, 2023

Framework · Multimodal Fusion

Multimodal spatiotemporal graph neural network for 30-day all-cause hospital readmission prediction.

Fuses longitudinal chest radiographs and EHR data using a GraphSAGE + GRU architecture. Achieved an AUROC of 0.79 on both evaluation datasets, including MIMIC-IV. Code available in repository.

Performance: AUROC 0.79 (MIMIC-IV)

Availability:

GitHub
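To make the GraphSAGE part of the architecture more concrete, here is a minimal sketch of a single GraphSAGE-style layer with a mean aggregator. It is not taken from the MM-STGNN code: the graph, the feature vectors, the weight matrices, and the function name `sage_mean_layer` are hypothetical, and the GRU that would process the resulting embeddings over time is not shown.

```python
def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer: combine each node with the mean of its neighbors, then ReLU.

    features:  dict mapping node -> feature vector (list of floats)
    neighbors: dict mapping node -> list of neighboring nodes
    w_self, w_neigh: weight matrices as lists of rows (learned in practice, fixed here)
    """
    updated = {}
    for node, h in features.items():
        dim = len(h)
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighborhood signal
        combined = [
            sum(w_self[i][d] * h[d] for d in range(dim))
            + sum(w_neigh[i][d] * mean[d] for d in range(dim))
            for i in range(len(w_self))
        ]
        updated[node] = [max(0.0, v) for v in combined]  # ReLU
    return updated


# Made-up three-node graph with 2-dimensional features.
features = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}
identity = [[1.0, 0.0], [0.0, 1.0]]

embeddings = sage_mean_layer(features, neighbors, identity, identity)
print(embeddings["a"])  # node "a" plus the mean of "b" and "c": [5.0, 7.0]
```

In a spatiotemporal setting, a layer like this would typically be applied at each time step, with the per-step node embeddings then fed to a recurrent unit.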

02

NLP & Clinical Text

BreastRecurrence_Transformer

Breast Cancer · NLP

Transformer-based NLP for identifying the occurrence and timing of breast cancer recurrence from EMRs.

Adaptable to other cancer sites. Validated on Mayo, Stanford, Emory, and UC Davis. Released with an academic open-source license and packaged in Docker. Model weights available via Google Drive.

Availability:

Google Drive (model weights); Docker

Breast Cancer Treatment Extraction

Breast Cancer · NLP

Hybrid UMLS parser + fine-tuned LLM for extracting longitudinal treatment timelines from free-text clinical notes.

Combines a UMLS-based parser with fine-tuned language models (GPT-2, BioGPT, LLaMA) to extract structured treatment timelines from unstructured clinical notes. Validated on Mayo, Stanford, Emory, and UC Davis. Released with an academic open-source license and packaged in Docker.

Availability:

GitHub; Docker

PCO Extraction

Breast Cancer · NLP

Fine-tuning framework for LLMs to extract patient-centered outcomes from breast cancer clinical notes.

Extracts treatment-related side effects including fatigue, depression, anxiety, nausea, and lymphedema from breast cancer clinical notes. Validated on Mayo, Stanford, Emory, and UC Davis. Released with an academic open-source license and packaged in Docker.

Availability:

GitHub; Docker

Recurrence Site Extraction (BioLinkBERT)

Breast Cancer · NLP

Fine-tuned BioLinkBERT model for extracting sites of distant recurrence from clinical, radiology, and pathology notes.

Fine-tuned on annotated clinical, radiology, and pathology notes to identify distant recurrence sites. Validated on Mayo, Stanford, Emory, and UC Davis. Released with an academic open-source license and packaged in Docker.

Availability:

GitHub; Docker

03

Data Quality & Infrastructure

Mammogram Implant Identifier

Breast Cancer · Data Infrastructure

ResNet18 CNN that identifies breast implants in mammograms without relying on DICOM tags.

Developed on 6,250 mammograms (5,000 for training and validation, 1,250 for testing). Model weights available in repository.

Performance: AUROC 0.998 · Sensitivity 0.966 · Specificity 1.000

Availability:

GitHub (model weights in repository)
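For readers less familiar with the reported metrics, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The counts in the example are hypothetical and are not the implant identifier's actual test results; the function name is also an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of implant cases correctly flagged
    specificity = TN / (TN + FP): fraction of non-implant cases correctly cleared
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=100, fp=0)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

A specificity of 1.000 corresponds to zero false positives on the evaluated set, as in the `fp=0` case here.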

Niffler

Journal of Digital Imaging (JDI), 2021

Framework · Data Infrastructure

DICOM framework for machine learning pipelines enabling real-time and on-demand DICOM retrieval from PACS.

Supports real-time and on-demand DICOM retrieval from PACS, along with metadata extraction, anonymization, and processing workflows.

HITI-Preproc

Data Infrastructure

Python package for DICOM preprocessing workflows.

Provides standardized DICOM preprocessing utilities installable directly via PyPI.

Availability:

PyPI
pip install hiti-preproc

RadPrompter

Framework · Data Infrastructure

Tool for simplified and reproducible LLM prompting for structured radiology reporting and dataset relabeling.

Provides a reproducible interface for prompting large language models to generate structured radiology reports and relabel datasets. Installable via PyPI.

Availability:

PyPI
pip install radprompter
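Since both HITI-Preproc and RadPrompter are installed from PyPI, a reproducible setup benefits from recording which versions are present in the environment. The sketch below uses only the Python standard library and makes no assumptions about either package's own API; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as given in the pip commands for the two packages.
for name in ("hiti-preproc", "radprompter"):
    found = installed_version(name)
    print(f"{name}=={found}" if found else f"{name} is not installed")
```

The printed `name==version` lines can be copied into a requirements file so that collaborators install the same releases.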

04

Prostate Cancer Tools

Prostate Cancer Lesion Detection (nnU-Net)

Prostate Cancer

nnU-Net-based framework for prostate cancer lesion detection.

Developed at Indiana University (Shiradkar Lab). Trained on the PI-CAI and Prostate-158 datasets. Model weights available on request from the authors.

Availability:

Request from authors (Indiana University, Shiradkar Lab)

PI-QUAL vs MRQy Comparison

Prostate Cancer · Data Infrastructure

Code and analysis for prostate MRI quality assessment comparing PI-QUAL and MRQy quality metrics.

Compares PI-QUAL and MRQy quality metrics for prostate MRI. Uses UMAP clustering, V-Net segmentation, and VoxelMorph deformable registration for analysis.

Availability:

GitHub