Our Research
Multimodal AI frameworks for cancer phenotype discovery.
MEFINDER develops and validates open-source computational frameworks that fuse imaging, pathology, and clinical text to discover patterns no single modality can reveal.
Multimodal Fusion · MICCAI 2025
MOSCARD
MOSCARD addresses a core challenge in multimodal medical imaging: bias introduced when one modality systematically confounds another. By integrating causal reasoning directly into the fusion architecture, MOSCARD learns de-confounded representations that are more robust across patient populations and imaging conditions.
The model uses chest X-ray (CXR) as the primary modality and ECG as a complementary guiding modality. A co-attention mechanism learns which regions of the CXR are most relevant given the ECG signal. Encoders based on Vision Transformer (ViT) and MedCLIP provide modality-specific representations.
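As a rough illustration of ECG-guided attention over CXR regions, the sketch below scores each CXR patch embedding against a single ECG embedding and normalizes the scores with a softmax. It shows only one direction of a co-attention (ECG querying CXR) in plain Python and is not MOSCARD's implementation: the function name `ecg_guided_attention`, the toy 2-D embeddings, and the scaled dot-product scoring are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ecg_guided_attention(cxr_patches, ecg_embedding):
    """Weight CXR patch embeddings by their relevance to an ECG embedding.

    Hypothetical helper: scaled dot-product attention with the ECG embedding
    as the single query and the CXR patches as keys and values.
    """
    dim = len(ecg_embedding)
    scores = [
        sum(p * q for p, q in zip(patch, ecg_embedding)) / math.sqrt(dim)
        for patch in cxr_patches
    ]
    weights = softmax(scores)
    # Attended CXR representation: weighted sum of patch embeddings.
    fused = [
        sum(w * patch[i] for w, patch in zip(weights, cxr_patches))
        for i in range(dim)
    ]
    return weights, fused

# Toy example: three CXR patches, one ECG embedding (values are made up).
patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
ecg = [1.0, 0.0]
weights, fused = ecg_guided_attention(patches, ecg)
```

In this toy case the first patch, which is most aligned with the ECG embedding, receives the largest weight. A full co-attention layer would typically also attend in the opposite direction and use learned projections.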
A structural causal model (SCM) is integrated into the training pipeline to explicitly model and remove confounding relationships between modalities. The framework supports four training modes: Baseline, Causal, Conf, and CaConf, enabling controlled ablation studies of each causal component.
Architecture
Primary modality: Chest X-ray (CXR)
Guiding modality: ECG
Encoder: ViT / MedCLIP
Fusion: Co-attention mechanism
Causal model: Structural causal model (SCM)
Training modes: Baseline, Causal, Conf, CaConf
Code: Available in repository
Multimodal Fusion · MICCAI 2024
VLM for Mammography
A knowledge-grounded adaptation strategy for vision-language models (VLMs) applied to screening mammography. The framework constructs unique case-sets designed for resident training and few-shot adaptation, addressing the challenge of limited annotated mammography data.
Mini-batch selective sampling is used to build case-sets that maximize representational diversity within each adaptation batch. Two VLMs are evaluated: MedCLIP (in-domain, trained on medical imaging) and ALBEF (out-of-domain, trained on general vision-language pairs).
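The text above does not specify how diversity is measured, so the following is only a sketch of one common way to build a diverse batch: greedy farthest-point selection over case embeddings. The function name `select_diverse_batch`, the Euclidean distance, and the toy embeddings are assumptions and may differ from the selection rule actually used in the framework.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_diverse_batch(embeddings, batch_size, seed_index=0):
    """Greedy farthest-point selection of a diverse mini-batch.

    Hypothetical helper: starts from `seed_index` and repeatedly adds the
    case whose nearest already-selected case is farthest away.
    """
    if not 0 < batch_size <= len(embeddings):
        raise ValueError("batch_size must be between 1 and len(embeddings)")
    selected = [seed_index]
    # Distance from every case to its nearest selected case so far.
    min_dist = [euclidean(e, embeddings[seed_index]) for e in embeddings]
    while len(selected) < batch_size:
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        next_index = max(candidates, key=lambda i: min_dist[i])
        selected.append(next_index)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[next_index]))
    return selected

# Toy 2-D case embeddings: two tight clusters and one outlier (made-up values).
cases = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 9.0]]
batch = select_diverse_batch(cases, batch_size=3)
```

With these toy values the selection picks one case from each cluster plus the outlier instead of two near-duplicates, which is the behavior a diversity-maximizing sampler is meant to encourage.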
The approach is validated in zero-shot, few-shot, and supervised settings on UW Madison datasets, with external validation on Mayo Clinic data. Authors include Aisha Urooj Khan, John Garrett, Tyler Bradshaw, Lonie Salkowski, Jiwoong Jeong, Amara Tariq, and Imon Banerjee. Code and model checkpoints are publicly available.
Details
Method: Mini-batch selective sampling
VLMs evaluated: MedCLIP (in-domain), ALBEF (out-of-domain)
Evaluation: Zero-shot, few-shot, supervised
Primary validation: UW Madison
External validation: Mayo Clinic
Code: Available in repository
Checkpoints: Available via download link
Multimodal Fusion · IEEE J. Biomed. Health Informatics 2023
MM-STGNN
A multimodal spatiotemporal graph neural network for predicting 30-day all-cause hospital readmission. MM-STGNN fuses two distinct data streams: longitudinal chest radiographs (capturing imaging trajectory over time) and electronic health records (capturing structured clinical measurements and events).
The architecture combines GraphSAGE, which learns node representations on patient similarity graphs derived from the EHR modality, with a Gated Recurrent Unit (GRU) that models temporal dynamics in both the imaging and clinical sequences. A cross-modal attention mechanism aligns the representations from the two modalities before the final prediction.
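To make the graph component more concrete, here is a minimal sketch of a single GraphSAGE layer with a mean aggregator, written in plain Python. It leaves out the GRU and the cross-modal attention entirely, and the patient IDs, feature values, and hand-picked weight matrix are hypothetical; in a trained model the weights would be learned.

```python
def graphsage_mean_layer(features, neighbors, weight):
    """One GraphSAGE layer with a mean aggregator and ReLU activation.

    features:  dict node -> feature vector (list of floats)
    neighbors: dict node -> list of neighboring nodes
    weight:    matrix (list of rows) with 2 * feature_dim columns; in a real
               model this is learned, here it is supplied by hand.
    """
    updated = {}
    for node, h_self in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            h_nbr = [
                sum(features[n][i] for n in nbrs) / len(nbrs)
                for i in range(len(h_self))
            ]
        else:
            h_nbr = [0.0] * len(h_self)  # isolated node: no neighbor signal
        combined = h_self + h_nbr  # concatenate self and neighborhood vectors
        updated[node] = [
            max(0.0, sum(w * x for w, x in zip(row, combined)))
            for row in weight
        ]
    return updated

# Toy patient-similarity graph (hypothetical IDs and feature values).
features = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}
neighbors = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1"]}
# Hand-picked weights: output 0 copies the node's own first feature,
# output 1 copies the neighborhood mean of the second feature.
weight = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
embeddings = graphsage_mean_layer(features, neighbors, weight)
```

Each node's new embedding mixes its own features with the average of its neighbors' features, which is how information from similar patients can influence a given patient's representation.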
Evaluated on the MIMIC-IV dataset, MM-STGNN achieved an AUROC of 0.79 on both evaluation splits, indicating that fusing longitudinal imaging with EHR data improves on single-modality baselines. Authors include Siyi Tang, Amara Tariq, Jared A. Dunnmon, and Imon Banerjee, among others.
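For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen patient who was not readmitted. The sketch below computes it directly from that pairwise definition; the labels and scores are made up for illustration and are not MM-STGNN outputs.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Made-up labels (1 = readmitted within 30 days) and predicted risks.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
value = auroc(labels, scores)
```

Here five of the six positive/negative pairs are ordered correctly, so the toy AUROC is 5/6. The pairwise loop is quadratic in the number of cases, so a rank-based implementation from a standard library is preferable for real datasets.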
Performance
Task: 30-day all-cause readmission
Modalities: Chest X-ray + EHR
Architecture: GraphSAGE + GRU
Dataset: MIMIC-IV
AUROC: 0.79
Code: Available in repository
NLP Framework
Breast cancer clinical text mining.
Four NLP tools extract structured outcomes from free-text clinical notes. All four are validated on data from Mayo Clinic, Stanford University, Emory University, and UC Davis, released under academic open-source licenses, and packaged in Docker for reproducible deployment.
01
BreastRecurrence_Transformer
Transformer-based NLP model that identifies whether and when breast cancer recurrence occurred from electronic medical records (EMRs). Adaptable to other cancer sites.
Validated: Mayo Clinic · Stanford · Emory · UC Davis · Docker packaged
02
Breast Cancer Treatment Extraction
Hybrid UMLS parser + fine-tuned LLM (GPT-2/BioGPT/LLaMA) for extracting longitudinal treatment timelines from free-text clinical notes.
Validated: Mayo Clinic · Stanford · Emory · UC Davis · Docker packaged
03
PCO Extraction
Framework for fine-tuning LLMs to extract patient-centered outcomes (PCOs), i.e., treatment-related side effects (fatigue, depression, anxiety, nausea, lymphedema), from breast cancer clinical notes.
Validated: Mayo Clinic · Stanford · Emory · UC Davis · Docker packaged
04
Recurrence Site Extraction (BioLinkBERT)
Fine-tuned BioLinkBERT model for extracting sites of distant recurrence from clinical, radiology, and pathology notes.
Validated: Mayo Clinic · Stanford · Emory · UC Davis · Docker packaged