haotongl committed · verified
Commit 7cd62ae · 1 Parent(s): 6f9b584

Upload folder using huggingface_hub

Files changed (3):
  1. README.md +143 -3
  2. config.json +89 -0
  3. model.safetensors +3 -0
README.md CHANGED
@@ -1,3 +1,143 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - depth-estimation
+ - computer-vision
+ - monocular-depth
+ - multi-view-geometry
+ - pose-estimation
+ library_name: depth-anything-3
+ pipeline_tag: depth-estimation
+ ---
+
+ # Depth Anything 3: DA3-GIANT
+
+ <div align="center">
+
+ [![Project Page](https://img.shields.io/badge/Project_Page-Depth_Anything_3-green)](https://depth-anything-3.github.io)
+ [![Paper](https://img.shields.io/badge/arXiv-Depth_Anything_3-red)](https://arxiv.org/abs/)
+ [![Demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue)](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
+
+ </div>
+
+ ## Model Description
+
+ DA3-Giant is the flagship foundation model of the Depth Anything 3 series. Built on a unified depth-ray representation, it performs multi-view depth estimation, camera pose estimation, and 3D Gaussian estimation.
+
+ | Property | Value |
+ |----------|-------|
+ | **Model Series** | Any-view Model |
+ | **Parameters** | 1.15B |
+ | **License** | CC BY-NC 4.0 |
+
+ ⚠️ **Non-commercial use only** due to the CC BY-NC 4.0 license.
+
+ ## Capabilities
+
+ - ✅ Relative Depth
+ - ✅ Pose Estimation
+ - ✅ Pose Conditioning
+ - ✅ 3D Gaussians
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ git clone https://github.com/ByteDance-Seed/depth-anything-3
+ cd depth-anything-3
+ pip install -e .
+ ```
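+
+ To sanity-check the editable install, the import path used in the example below can be exercised directly (a minimal sketch):
+
+ ```python
+ # Confirm the package and its API entry point import cleanly;
+ # CPU-only machines also work, just more slowly.
+ import torch
+ from depth_anything_3.api import DepthAnything3
+
+ print("CUDA available:", torch.cuda.is_available())
+ ```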
+
+ ### Basic Example
+
+ ```python
+ import torch
+ from depth_anything_3.api import DepthAnything3
+
+ # Load model from Hugging Face Hub
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model = DepthAnything3.from_pretrained("depth-anything/da3-giant")
+ model = model.to(device=device)
+
+ # Run inference on images
+ images = ["image1.jpg", "image2.jpg"]  # List of image paths, PIL Images, or numpy arrays
+ prediction = model.inference(
+     images,
+     export_dir="output",
+     export_format="glb"  # Options: glb, npz, ply, mini_npz, gs_ply, gs_video
+ )
+
+ # Access results
+ print(prediction.depth.shape)       # Depth maps: [N, H, W] float32
+ print(prediction.conf.shape)        # Confidence maps: [N, H, W] float32
+ print(prediction.extrinsics.shape)  # Camera poses (w2c): [N, 3, 4] float32
+ print(prediction.intrinsics.shape)  # Camera intrinsics: [N, 3, 3] float32
+ ```
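+
+ The returned depth, intrinsics, and extrinsics are enough to lift each view into a shared world frame. Below is a small numpy sketch of that back-projection (an illustration, not part of the library API: `depth_to_world_points` is a hypothetical helper, a pinhole camera is assumed, and the extrinsics follow the w2c convention noted in the comments above):
+
+ ```python
+ import numpy as np
+
+ def depth_to_world_points(depth, K, w2c):
+     """Back-project one [H, W] depth map to world-space points [H, W, 3]."""
+     # If the prediction fields are torch tensors, convert with .cpu().numpy() first.
+     H, W = depth.shape
+     u, v = np.meshgrid(np.arange(W), np.arange(H))
+     pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # [3, H*W]
+     rays = np.linalg.inv(K) @ pix            # pixel grid -> camera-space rays
+     pts_cam = rays * depth.reshape(1, -1)    # scale each ray by its depth
+     R, t = w2c[:, :3], w2c[:, 3:]            # w2c = [R | t]
+     return (R.T @ (pts_cam - t)).T.reshape(H, W, 3)  # camera -> world
+
+ points = depth_to_world_points(
+     prediction.depth[0], prediction.intrinsics[0], prediction.extrinsics[0]
+ )
+ ```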
+
+ ### Command Line Interface
+
+ ```bash
+ # Process images with auto mode
+ da3 auto path/to/images \
+     --export-format glb \
+     --export-dir output \
+     --model-dir depth-anything/da3-giant
+
+ # Use the backend for faster repeated inference
+ da3 backend --model-dir depth-anything/da3-giant
+ da3 auto path/to/images --export-format glb --use-backend
+ ```
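+
+ With the backend running, repeated invocations skip model loading, which makes batch processing cheap. A sketch of looping over several scene folders (the `scenes/` layout is hypothetical; the backend is assumed to have been started as shown above, and only the documented flags are used):
+
+ ```python
+ import subprocess
+ from pathlib import Path
+
+ # Send every scene folder through the CLI, reusing the already-loaded backend.
+ for scene in sorted(Path("scenes").iterdir()):
+     subprocess.run(
+         ["da3", "auto", str(scene),
+          "--export-format", "glb",
+          "--export-dir", f"output/{scene.name}",
+          "--use-backend"],
+         check=True,
+     )
+ ```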
+
+ ## Model Details
+
+ - **Developed by:** ByteDance Seed Team
+ - **Model Type:** Vision Transformer for Visual Geometry
+ - **Architecture:** Plain transformer with unified depth-ray representation
+ - **Training Data:** Public academic datasets only
+
+ ### Key Insights
+
+ 💎 A **single plain transformer** (e.g., a vanilla DINO encoder) is sufficient as a backbone; no architectural specialization is required.
+
+ ✨ A single **depth-ray representation** obviates the need for complex multi-task learning.
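+
+ In this representation, a per-pixel ray (origin plus direction, which together encode the camera) and a per-pixel depth jointly pin down every 3D point as P = o + d · r, so one output space covers depth, pose, and geometry. A toy illustration of that identity (not the model's internal code; shapes and values are assumptions):
+
+ ```python
+ import numpy as np
+
+ H, W = 2, 2                                       # toy resolution
+ o = np.zeros((H, W, 3))                           # ray origins (camera center)
+ r = np.broadcast_to([0.0, 0.0, 1.0], (H, W, 3))   # unit ray directions
+ d = np.full((H, W, 1), 2.0)                       # predicted depth map
+ P = o + d * r                                     # point map: one 3D point per pixel
+ ```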
+
+ ## Performance
+
+ 🏆 Depth Anything 3 significantly outperforms:
+ - **Depth Anything 2** for monocular depth estimation
+ - **VGGT** for multi-view depth estimation and pose estimation
+
+ For detailed benchmarks, please refer to our [paper](https://depth-anything-3.github.io).
+
+ ## Limitations
+
+ - The model is trained only on public academic datasets and may underperform on certain domain-specific images
+ - Performance may vary with image quality, lighting conditions, and scene complexity
+ - ⚠️ **Non-commercial use only** due to the CC BY-NC 4.0 license
+
+ ## Citation
+
+ If you find Depth Anything 3 useful in your research or projects, please cite:
+
+ ```bibtex
+ @article{depthanything3,
+   title={Depth Anything 3: Recovering the visual space from any views},
+   author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
+   journal={arXiv preprint arXiv:XXXX.XXXXX},
+   year={2025}
+ }
+ ```
+
+ ## Links
+
+ - 🏠 [Project Page](https://depth-anything-3.github.io)
+ - 📄 [Paper](https://arxiv.org/abs/)
+ - 💻 [GitHub Repository](https://github.com/ByteDance-Seed/depth-anything-3)
+ - 🤗 [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
+ - 📚 [Documentation](https://github.com/ByteDance-Seed/depth-anything-3#-useful-documentation)
+
+ ## Authors
+
+ [Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/)
config.json ADDED
@@ -0,0 +1,89 @@
+ {
+   "model_name": "da3-giant",
+   "config": {
+     "__object__": {
+       "path": "depth_anything_3.model.da3",
+       "name": "DepthAnything3Net",
+       "args": "as_params"
+     },
+     "net": {
+       "__object__": {
+         "path": "depth_anything_3.model.dinov2.dinov2",
+         "name": "DinoV2",
+         "args": "as_params"
+       },
+       "name": "vitg",
+       "out_layers": [
+         19,
+         27,
+         33,
+         39
+       ],
+       "alt_start": 13,
+       "qknorm_start": 13,
+       "rope_start": 13,
+       "cat_token": true
+     },
+     "head": {
+       "__object__": {
+         "path": "depth_anything_3.model.dualdpt",
+         "name": "DualDPT",
+         "args": "as_params"
+       },
+       "dim_in": 3072,
+       "output_dim": 2,
+       "features": 256,
+       "out_channels": [
+         256,
+         512,
+         1024,
+         1024
+       ]
+     },
+     "cam_enc": {
+       "__object__": {
+         "path": "depth_anything_3.model.cam_enc",
+         "name": "CameraEnc",
+         "args": "as_params"
+       },
+       "dim_out": 1536
+     },
+     "cam_dec": {
+       "__object__": {
+         "path": "depth_anything_3.model.cam_dec",
+         "name": "CameraDec",
+         "args": "as_params"
+       },
+       "dim_in": 3072
+     },
+     "gs_head": {
+       "__object__": {
+         "path": "depth_anything_3.model.gsdpt",
+         "name": "GSDPT",
+         "args": "as_params"
+       },
+       "dim_in": 3072,
+       "output_dim": 38,
+       "features": 256,
+       "out_channels": [
+         256,
+         512,
+         1024,
+         1024
+       ]
+     },
+     "gs_adapter": {
+       "__object__": {
+         "path": "depth_anything_3.model.gs_adapter",
+         "name": "GaussianAdapter",
+         "args": "as_params"
+       },
+       "sh_degree": 2,
+       "pred_color": false,
+       "pred_offset_depth": true,
+       "pred_offset_xy": true,
+       "gaussian_scale_min": 1e-05,
+       "gaussian_scale_max": 30.0
+     }
+   }
+ }
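
The config is plain JSON, so the wiring above can be inspected with the standard library (a minimal sketch; the field names are exactly those in the diff):

```python
import json

# Read the architecture wiring recorded in config.json.
with open("config.json") as f:
    cfg = json.load(f)

print(cfg["model_name"])                         # "da3-giant"
print(cfg["config"]["net"]["name"])              # "vitg" backbone
print(cfg["config"]["gs_adapter"]["sh_degree"])  # 2
```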
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e47a08338ca73a6d6a21d37fd060b26b993b672bc6ddf6295fe474df2592001
+ size 5422814644
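
The weights are stored as a Git LFS pointer; after downloading the actual file, its size and checksum can be verified against the pointer with the standard library (a minimal sketch):

```python
import hashlib
from pathlib import Path

# Verify model.safetensors against the LFS pointer above.
path = Path("model.safetensors")
digest = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert path.stat().st_size == 5422814644
assert digest.hexdigest() == (
    "1e47a08338ca73a6d6a21d37fd060b26b993b672bc6ddf6295fe474df2592001"
)
```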