Logs for Cross-modal Distillation under the LUPI Paradigm
Dec 23, 2024 (updated Oct 9, 2025)
This serves as a log for applying the learning using privileged information (LUPI) paradigm to enhance WSI diagnosis.
 
Cover image credit: Fengtao Zhou (https://cvzzz.com/).
 
Note: this page records my thoughts while working through problems. It serves as my discovery log and should not be taken too seriously.

Sect 1. Introduction

All of us (or at least people doing multimodal learning) know multimodal learning is good for improving DL accuracy, but unfortunately, not every task has a complete paired dataset. Plenty of literature suggests using multimodal information and fusing the complementary signals for better predictions. However, obtaining complete modalities is challenging in real-world scenarios.
The motivation of this work is pretty simple: routine checks of WSIs are cheap, while molecular testing is expensive. Given that large datasets like TCGA already provide multimodal paired data, why don't we fully leverage that? A natural idea follows: if we could "fix" the genomics knowledge, or at least "remember" the transferable mapping (although this is incredibly hard), that part of the genomics knowledge could be reused to improve WSI-only prediction (as a pseudo-multimodal feature).

Sect 2. Results and Failed Attempts

 
The fundamental architecture of this work is a basic RRT [1] backbone, plus an L1 constraint for genomics distillation during training, plus an ABMIL aggregator.
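Since the whole objective fits in a few lines, here is a minimal PyTorch sketch of what I mean. The RRT re-embedding is abstracted as an arbitrary `re_embed` module, and names like `proj` and `lam` are my own placeholders; treat this as an illustration of the L1 genomics-distillation idea, not the exact training code.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedABMIL(nn.Module):
    """Gated attention-based MIL pooling (ABMIL, Ilse et al., 2018)."""
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.V = nn.Linear(dim, hidden)
        self.U = nn.Linear(dim, hidden)
        self.w = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (N_patches, dim)
        a = self.w(torch.tanh(self.V(x)) * torch.sigmoid(self.U(x)))
        a = torch.softmax(a, dim=0)              # attention weights over patches
        return (a * x).sum(dim=0)                # slide embedding: (dim,)

def lupi_loss(patch_feats, genomic_emb, label,
              re_embed, abmil, cls_head, proj, lam=0.5):
    """One slide's loss: classification + L1 genomics distillation.

    genomic_emb is the *privileged* modality: available for TCGA-style
    training slides, absent at test time, so it only appears in the loss.
    """
    h = re_embed(patch_feats)                    # RRT-style re-embedding (stand-in)
    z = abmil(h)                                 # aggregate patches -> slide vector
    logits = cls_head(z).unsqueeze(0)            # (1, n_classes)
    cls = F.cross_entropy(logits, label.view(1))
    distill = F.l1_loss(proj(z), genomic_emb)    # pull WSI embedding toward genomics
    return cls + lam * distill
```
At test time only the WSI branch (`re_embed` → `abmil` → `cls_head`) runs; the genomics embedding is used only inside the training loss, which is the LUPI part.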
 
Failed Attempts:
#1: Changing the aggregator to more complicated ones, e.g., DSMIL [2], DTFD [3], etc. None of them brings a meaningful improvement, and some drastically deteriorate on the external test set.
 
#2: Adding the Cox survival loss. For the survival tasks, I used the micro-batch technique to compute the survival loss, but unfortunately, the loss did not converge. A sketch of what I mean by micro-batching is below.
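For reference, this is roughly the micro-batch Cox setup I tried (the one that failed to converge): each WSI already fills GPU memory on its own, so the "batch" for the survival loss is built from accumulated per-slide risk scores rather than raw slides. Names like `wsi_branch`, `risk_head`, and the micro-batch size of 16 are my placeholders, not a fixed API.
```python
import torch

def cox_ph_loss(risk, time, event):
    """Negative Cox partial log-likelihood over one micro-batch.

    risk:  (B,) predicted log-risk scores
    time:  (B,) survival / follow-up times
    event: (B,) float, 1 = event observed, 0 = censored
    """
    order = torch.argsort(time, descending=True)   # risk sets via sorting by time
    risk, event = risk[order], event[order]
    log_cum_hazard = torch.logcumsumexp(risk, dim=0)
    nll = -((risk - log_cum_hazard) * event).sum()
    return nll / event.sum().clamp(min=1)          # average over observed events

def train_micro_batch(slide_loader, wsi_branch, risk_head, optimizer, mb_size=16):
    """Accumulate per-slide risk scores into a micro-batch, then apply Cox loss."""
    risk_scores, times, events = [], [], []
    for patch_feats, t, e in slide_loader:         # t, e: scalar tensors per slide
        z = wsi_branch(patch_feats)                # e.g., re_embed -> abmil from above
        risk_scores.append(risk_head(z).squeeze())
        times.append(t); events.append(e)
        if len(risk_scores) == mb_size:
            loss = cox_ph_loss(torch.stack(risk_scores),
                               torch.stack(times), torch.stack(events))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            risk_scores, times, events = [], [], []
```
The Cox loss is only well-defined relative to the other samples in its risk set, which is exactly why the micro-batch size matters; with small micro-batches the risk sets are tiny and the gradient is noisy, which may be related to the non-convergence.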

Sect 3. Discussion

To date, this work only contains basic classification tasks covering major cancer types across the TCGA cohorts and private external cohorts. I hope to extend it to regression tasks, as several Nat. Commun. [4] / Nat. Mach. Intell. [5] papers do. This part is worth investigating.
 
 
 
 

References

[1] Tang, Wenhao, et al. "Feature re-embedding: Towards foundation model-level performance in computational pathology." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
[2] Li, Bin, Yin Li, and Kevin W. Eliceiri. "Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[3] Zhang, Hongrun, et al. "DTFD-MIL: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[4] El Nahhas, Omar S. M., et al. "Regression-based Deep-Learning predicts molecular biomarkers from pathology slides." Nature Communications 15.1 (2024): 1253.
[5] Ing, Alex, et al. "Integrating multimodal cancer data using deep latent variable path modelling." Nature Machine Intelligence (2025): 1-23.
 
 
 
 
 
 
 