Our Research

Multimodal AI frameworks for cancer phenotype discovery.

MEFINDER develops and validates open-source computational frameworks that fuse imaging, pathology, and clinical text to discover patterns no single modality can reveal.

Multimodal Fusion · MICCAI 2025

MOSCARD

Causal AI

MOSCARD addresses a core challenge in multimodal medical imaging: bias introduced when one modality systematically confounds another. By integrating causal reasoning directly into the fusion architecture, MOSCARD learns de-confounded representations that are more robust across patient populations and imaging conditions.

The model uses chest X-ray (CXR) as the primary modality and ECG as a complementary guiding modality. A co-attention mechanism learns which regions of the CXR are most relevant given the ECG signal. Encoders based on Vision Transformer (ViT) and MedCLIP provide modality-specific representations.
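The idea of letting the ECG signal decide which CXR regions matter can be illustrated with a single-query, scaled dot-product attention. This is a minimal sketch, not the MOSCARD implementation: it omits the learned projections and the symmetric direction that a full co-attention block would have. The function names and toy embeddings are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ecg_guided_attention(ecg_vec, cxr_patches):
    """Weight CXR patch embeddings by their similarity to one ECG embedding.

    Assumption: both modalities were already encoded into the same dimension.
    Returns (attention weights over patches, attended CXR vector).
    """
    dim = len(ecg_vec)
    # Scaled dot product between the ECG query and every CXR patch key.
    scores = [
        sum(q * k for q, k in zip(ecg_vec, patch)) / math.sqrt(dim)
        for patch in cxr_patches
    ]
    weights = softmax(scores)
    # Weighted sum of patch embeddings: the ECG-conditioned CXR summary.
    attended = [
        sum(w * patch[i] for w, patch in zip(weights, cxr_patches))
        for i in range(dim)
    ]
    return weights, attended

# Toy, made-up embeddings: patch 0 aligns with the ECG vector, patch 2 opposes it.
ecg = [1.0, 0.0, 0.0, 0.0]
patches = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [-2.0, 0.0, 0.0, 0.0],
]
weights, attended = ecg_guided_attention(ecg, patches)
print(weights)
```

The patch most aligned with the ECG embedding receives the largest weight, which is the behavior the co-attention description implies.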

A structural causal model (SCM) is integrated into the training pipeline to explicitly model and remove confounding relationships between modalities. The framework supports four training modes: Baseline, Causal, Conf, and CaConf, enabling controlled ablation studies of each causal component.
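One plausible way to read the four modes is as a two-by-two ablation over a causal term and a confounder term. The sketch below encodes that reading as an assumption only: the flag names, the loss terms, and the weights are hypothetical and are not taken from the MOSCARD code, which may define the modes differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeConfig:
    use_causal: bool       # hypothetical flag: add a causal-consistency term
    use_confounder: bool   # hypothetical flag: add a confounder-removal term

# Assumed mapping: CaConf enables both components, Baseline enables neither.
MODES = {
    "Baseline": ModeConfig(use_causal=False, use_confounder=False),
    "Causal": ModeConfig(use_causal=True, use_confounder=False),
    "Conf": ModeConfig(use_causal=False, use_confounder=True),
    "CaConf": ModeConfig(use_causal=True, use_confounder=True),
}

def total_loss(mode, task_loss, causal_loss, conf_loss,
               lam_causal=1.0, lam_conf=1.0):
    """Combine loss terms according to the selected (assumed) training mode."""
    cfg = MODES[mode]
    loss = task_loss
    if cfg.use_causal:
        loss += lam_causal * causal_loss
    if cfg.use_confounder:
        loss += lam_conf * conf_loss
    return loss

for name in MODES:
    print(name, total_loss(name, task_loss=1.0, causal_loss=0.5, conf_loss=0.25))
```

Structuring the modes as independent switches is what makes a controlled ablation possible: each run differs from its neighbor by exactly one component.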

Architecture

Primary modality: Chest X-ray (CXR)
Guiding modality: ECG
Encoder: ViT / MedCLIP
Fusion: Co-attention mechanism
Causal model: Structural causal model (SCM)
Training modes: Baseline, Causal, Conf, CaConf
Code: Available in repository

Multimodal Fusion · MICCAI 2024

VLM for Mammography

Vision-Language

A knowledge-grounded adaptation strategy for vision-language models (VLMs) applied to screening mammography. The framework constructs unique case-sets designed for resident training and few-shot adaptation, addressing the challenge of limited annotated mammography data.

Mini-batch selective sampling is used to build case-sets that maximize representational diversity within each adaptation batch. Two VLMs are evaluated: MedCLIP (in-domain, trained on medical imaging) and ALBEF (out-of-domain, trained on general vision-language pairs).
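To make "maximize representational diversity" concrete, the sketch below uses greedy farthest-point selection over case embeddings. This is a stand-in technique chosen for illustration; the selection criterion actually used in the paper may differ, and the function names and toy embeddings are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_diverse(embeddings, k, start=0):
    """Greedy farthest-point selection: return indices of k mutually distant cases."""
    if not 0 < k <= len(embeddings):
        raise ValueError("k must be between 1 and the number of cases")
    chosen = [start]
    # min_dist[i] is the distance from case i to its nearest already-chosen case.
    min_dist = [euclidean(e, embeddings[start]) for e in embeddings]
    while len(chosen) < k:
        # Pick the case that is farthest from everything chosen so far.
        nxt = max(range(len(embeddings)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[nxt]))
    return chosen

# Toy 2-D embeddings forming three loose clusters (made-up values).
batch = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.1], [0.0, 5.0]]
picked = select_diverse(batch, k=3)
print(picked)
```

With these toy values the selection takes one case from each cluster rather than two near-duplicates, which is the property a diversity-driven case-set needs.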

The approach is validated in zero-shot, few-shot, and supervised settings on UW Madison datasets, with external validation on Mayo Clinic data. Authors include Aisha Urooj Khan, John Garrett, Tyler Bradshaw, Lonie Salkowski, Jiwoong Jeong, Amara Tariq, and Imon Banerjee. Code and model checkpoints are publicly available.

Details

Method: Mini-batch selective sampling
VLMs evaluated: MedCLIP (in-domain), ALBEF (out-of-domain)
Evaluation: Zero-shot, few-shot, supervised
Primary validation: UW Madison
External validation: Mayo Clinic
Code: Available in repository
Checkpoints: Available via download link

Multimodal Fusion · IEEE J. Biomed. Health Informatics 2023

MM-STGNN

Readmission Prediction

A multimodal spatiotemporal graph neural network for predicting 30-day all-cause hospital readmission. MM-STGNN fuses two distinct data streams: longitudinal chest radiographs (capturing imaging trajectory over time) and electronic health records (capturing structured clinical measurements and events).

The architecture combines GraphSAGE for learning patient similarity graphs from the EHR modality with a Gated Recurrent Unit (GRU) for modeling temporal dynamics in both imaging and clinical sequences. A cross-modal attention mechanism aligns representations from the two modalities before final prediction.
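The GraphSAGE half of this design can be illustrated with a single mean-aggregator layer over a small patient graph. This is a minimal sketch of the general GraphSAGE update (concatenate a node's features with the mean of its neighbors' features, apply a linear map and a ReLU). It is not the MM-STGNN code, and it leaves out the GRU and the attention step. The graph, features, and weight matrix are made-up toy values.

```python
def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def sage_mean_layer(features, neighbors, weight):
    """One GraphSAGE-style layer with a mean aggregator and ReLU.

    features:  dict node -> feature vector of length d
    neighbors: dict node -> list of neighboring nodes
    weight:    matrix with one row per output unit and 2*d columns
    """
    updated = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        # Isolated nodes aggregate to a zero vector.
        agg = mean_vec([features[u] for u in nbrs]) if nbrs else [0.0] * len(own)
        combined = own + agg  # list concatenation: [self features, neighbor mean]
        updated[node] = [
            max(0.0, sum(w * x for w, x in zip(row, combined))) for row in weight
        ]
    return updated

# Toy patient-similarity graph: p1 is similar to both p2 and p3.
features = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}
neighbors = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1"]}
# Illustrative weights that simply add self features to the neighbor mean.
weight = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0]]
updated = sage_mean_layer(features, neighbors, weight)
print(updated["p1"])
```

In a spatiotemporal setup, a layer like this would run at each time step, and a recurrent unit such as a GRU would then carry each patient's representation forward through time.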

Evaluated on the MIMIC-IV dataset, MM-STGNN achieved an AUROC of 0.79 on both evaluation splits, indicating that fusing longitudinal imaging with EHR data improves on single-modality baselines. Authors include Siyi Tang, Amara Tariq, Jared A. Dunnmon, and Imon Banerjee, among others.
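For readers less familiar with the metric, AUROC is the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not MM-STGNN outputs.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Toy example: 1 = readmitted within 30 days, 0 = not readmitted.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
value = auroc(labels, scores)
print(value)  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of patients and is meant for clarity. A rank-based implementation is the usual choice for datasets the size of MIMIC-IV.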

Performance

Task: 30-day all-cause readmission
Modalities: Chest X-ray + EHR
Architecture: GraphSAGE + GRU
Dataset: MIMIC-IV
AUROC: 0.79
Code: Available in repository

NLP Framework

Breast cancer clinical text mining.

Four NLP tools extract structured outcomes from free-text clinical notes. All four are validated on data from Mayo Clinic, Stanford University, Emory University, and UC Davis, released under academic open-source licenses, and packaged in Docker for reproducible deployment.

BreastRecurrence_Transformer

Transformer-based NLP model that identifies whether and when breast cancer recurrence occurs from electronic medical records (EMRs). Adaptable to other cancer sites.

Validated: Mayo Clinic · Stanford · Emory · UC Davis · Docker packaged

Breast Cancer Treatment Extraction

Hybrid UMLS parser + fine-tuned LLM (GPT-2/BioGPT/LLaMA) for extracting longitudinal treatment timelines from free-text clinical notes.

Validated: Mayo Clinic · Stanford · Emory · UC Davis · Docker packaged

PCO Extraction

Fine-tuning framework for LLMs to extract patient-centered outcomes (treatment-related side effects: fatigue, depression, anxiety, nausea, lymphedema) from breast cancer clinical notes.

Validated: Mayo Clinic · Stanford · Emory · UC Davis · Docker packaged

Recurrence Site Extraction (BioLinkBERT)

Fine-tuned BioLinkBERT model for extracting sites of distant recurrence from clinical, radiology, and pathology notes.

Validated: Mayo Clinic · Stanford · Emory · UC Davis · Docker packaged