I built a **TCGA WSI feature dataset using UNI2-h**.
The official release currently has incomplete coverage (see discussion):
MahmoodLab/UNI2-h-features#2
To make the features easier to use for research, I generated a new dataset:
W8Yi/tcga-wsi-uni2h-features
Key differences from the official release:
• **All detected tissue tiles are encoded** (not a sampled subset)
• **Features can be downloaded per slide** instead of large ZIP archives
• **QC overlay images** are provided for visual inspection
• **UNI2-h 1536-D tile embeddings** stored in H5 format
• Organized by TCGA project for easier use in MIL / retrieval pipelines
Example layout:
TCGA-HNSC/
    features/*.h5
    vis/*__overlay.png

Hope this helps others working on computational pathology and TCGA WSI research.
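
For anyone wiring the layout above into a pipeline, here is a minimal sketch of pairing each feature file with its QC overlay once a project folder has been downloaded locally. It only relies on the `features/*.h5` and `vis/*__overlay.png` patterns shown above. The assumption that an overlay is named after the stem of its H5 file, and the function name `pair_features_with_overlays`, are illustrative guesses and not something the dataset documents here.

```python
from pathlib import Path

# Suffix taken from the vis/*__overlay.png pattern in the example layout.
OVERLAY_SUFFIX = "__overlay"


def pair_features_with_overlays(project_dir):
    """Map slide ID -> (feature path, overlay path or None).

    Assumption (not confirmed by the dataset): the overlay for
    features/<slide_id>.h5 is stored as vis/<slide_id>__overlay.png.
    """
    project_dir = Path(project_dir)

    # Index overlays by the slide ID recovered from their file name.
    overlays = {}
    for png in (project_dir / "vis").glob(f"*{OVERLAY_SUFFIX}.png"):
        slide_id = png.stem[: -len(OVERLAY_SUFFIX)]
        overlays[slide_id] = png

    # Walk the feature files and attach the matching overlay, if any.
    pairs = {}
    for h5 in sorted((project_dir / "features").glob("*.h5")):
        pairs[h5.stem] = (h5, overlays.get(h5.stem))
    return pairs
```

Calling `pair_features_with_overlays("TCGA-HNSC")` on a local copy (the path is hypothetical) returns a dictionary keyed by slide ID, with `None` in place of the overlay for any slide that has no matching image, which makes it easy to spot slides to skip during visual QC.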