W8Yi posted an update 2 days ago
I built a feature dataset for TCGA whole-slide images (WSIs) using UNI2-h.

The official UNI2-h feature release currently has incomplete coverage (see the discussion):
MahmoodLab/UNI2-h-features#2

To make the features easier to use for research, I generated a new dataset:

W8Yi/tcga-wsi-uni2h-features

Key differences from the official release:

• All detected tissue tiles are encoded (not a sampled subset)
• Features can be downloaded per slide instead of as large ZIP archives
• QC overlay images are provided for visual inspection
• UNI2-h tile embeddings (1536-D) are stored in HDF5 (.h5) files
• Files are organized by TCGA project for easier use in multiple-instance learning (MIL) / retrieval pipelines
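A minimal sketch of loading one slide's embeddings as a MIL bag, assuming `h5py` and NumPy are installed. The post does not say which dataset key the .h5 files use, so this sketch does not hard-code one: it picks the first dataset whose last axis is 1536 wide. The helper names (`list_datasets`, `load_tile_embeddings`, `mean_pool`) are hypothetical, and mean pooling is only a simple baseline, not the aggregation of any particular MIL method.

```python
import h5py
import numpy as np

EMBED_DIM = 1536  # UNI2-h embedding width stated in the post


def list_datasets(path):
    """Return {dataset_path: shape} for every dataset in an .h5 file."""
    shapes = {}
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                shapes[name] = obj.shape
        f.visititems(visit)
    return shapes


def load_tile_embeddings(path, dim=EMBED_DIM):
    """Load one slide's tile embeddings as an (N, dim) float32 array.

    Assumption (hypothetical): the file contains a dataset whose last axis
    equals `dim`; the real key name is not assumed, the first match is used.
    """
    candidates = [name for name, shape in list_datasets(path).items()
                  if len(shape) >= 2 and shape[-1] == dim]
    if not candidates:
        raise KeyError(f"no dataset with last dimension {dim} in {path}")
    with h5py.File(path, "r") as f:
        bag = np.asarray(f[candidates[0]][()], dtype=np.float32)
    # Collapse any leading batch axis, e.g. (1, N, dim) -> (N, dim).
    return bag.reshape(-1, dim)


def mean_pool(bag):
    """Simplest MIL baseline: average tile embeddings into one slide vector."""
    return bag.mean(axis=0)
```

Running `list_datasets` on one downloaded file first shows what the file actually contains (for example, whether tile coordinates are stored alongside the features), which is worth checking before relying on the shape-based lookup.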

Example layout:

TCGA-HNSC/
  features/*.h5
  vis/*__overlay.png
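Because features and overlays sit in parallel folders, a small index pairing each slide's .h5 file with its QC overlay can help with batch inspection. This sketch assumes each overlay is named `<slide_id>__overlay.png`, where `<slide_id>` is the .h5 file name without its extension; that pairing rule is inferred from the layout, not documented. `index_project` is a hypothetical helper using only the standard library.

```python
from pathlib import Path


def index_project(project_dir):
    """Map slide ID -> {"features": Path, "overlay": Path or None}.

    Assumed layout (inferred, not documented):
      <project>/features/<slide_id>.h5
      <project>/vis/<slide_id>__overlay.png
    """
    root = Path(project_dir)
    index = {}
    for h5 in sorted((root / "features").glob("*.h5")):
        slide_id = h5.stem  # strips only the final ".h5" suffix
        overlay = root / "vis" / f"{slide_id}__overlay.png"
        index[slide_id] = {
            "features": h5,
            # None flags slides whose overlay is missing or named differently.
            "overlay": overlay if overlay.exists() else None,
        }
    return index
```

Usage: `index_project("TCGA-HNSC")` returns one entry per feature file, and entries with `"overlay": None` point to slides whose overlay does not follow the assumed naming.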


Hope this helps others working on computational pathology and TCGA WSI research.