arXiv:2601.06378

RigMo: Unifying Rig and Motion Learning for Generative Animation

Published on Jan 10 · Submitted by Hao Zhang on Jan 16

Abstract

AI-generated summary: RigMo is a unified generative framework that simultaneously learns rig and motion from mesh sequences, encoding deformations into compact latent spaces for interpretable and physically plausible 3D animation.

Despite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation, are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent process, undermining scalability and interpretability. We present RigMo, a unified generative framework that jointly learns rig and motion directly from raw mesh sequences, without any human-provided rig annotations. RigMo encodes per-vertex deformations into two compact latent spaces: a rig latent that decodes into explicit Gaussian bones and skinning weights, and a motion latent that produces time-varying SE(3) transformations. Together, these outputs define an animatable mesh with explicit structure and coherent motion, enabling feed-forward rig and motion inference for deformable objects. Beyond unified rig-motion discovery, we introduce a Motion-DiT model operating in RigMo's latent space and demonstrate that these structure-aware latents can naturally support downstream motion generation tasks. Experiments on DeformingThings4D, Objaverse-XL, and TrueBones demonstrate that RigMo learns smooth, interpretable, and physically plausible rigs while achieving superior reconstruction and category-level generalization compared to existing auto-rigging and deformation baselines. RigMo establishes a new paradigm for unified, structure-aware, and scalable dynamic 3D modeling.
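To make the animation step concrete: decoded bones, skinning weights, and per-bone SE(3) transforms combine through standard linear blend skinning. Below is a minimal NumPy sketch of that blending step; the function name, argument shapes, and demo values are illustrative assumptions, not RigMo's released API.

```python
import numpy as np

def animate_frame(rest_vertices, weights, rotations, translations):
    """Linear blend skinning for one frame (illustrative sketch, not RigMo code).

    rest_vertices: (V, 3) rest-pose vertex positions
    weights:       (V, B) skinning weights, each row summing to 1
    rotations:     (B, 3, 3) rotation part of each bone's SE(3) transform
    translations:  (B, 3)    translation part of each bone's SE(3) transform
    """
    # Each bone moves every vertex rigidly: (B, V, 3)
    per_bone = np.einsum('bij,vj->bvi', rotations, rest_vertices)
    per_bone = per_bone + translations[:, None, :]
    # Blend the rigid candidates with the skinning weights: (V, 3)
    return np.einsum('vb,bvi->vi', weights, per_bone)

if __name__ == "__main__":
    V, B = 5, 2
    rng = np.random.default_rng(0)
    verts = rng.normal(size=(V, 3))
    w = rng.random((V, B))
    w /= w.sum(axis=1, keepdims=True)
    R = np.stack([np.eye(3)] * B)   # identity rotations
    t = np.zeros((B, 3))            # zero translations
    # An identity rig must leave the mesh unchanged
    assert np.allclose(animate_frame(verts, w, R, t), verts)
```

Animating a full sequence then amounts to repeating this per frame with the time-varying SE(3) transforms decoded from the motion latent.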

Community

Paper submitter

🚀 New work: RigMo — Unifying Rig & Motion Learning for Generative Animation

Rigging and motion are two hard problems—usually solved separately.
RigMo unifies them.

A feed-forward framework that jointly learns rig structure + motion
directly from raw mesh sequences
→ no rig annotations, no per-sequence optimization.
✨ Highlights
• Explicit Gaussian bones + skinning weights (see the sketch below)
• SE(3) bone motions from compact latents
• Motion-DiT for controllable motion synthesis
• Interpretable, reusable, truly animatable 4D assets
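One plausible reading of "Gaussian bones": each bone is a 3D Gaussian, and a vertex's skinning weight for a bone is that bone's normalized responsibility over the vertex. The self-contained NumPy sketch below illustrates that assumption only; the paper defines the actual formulation.

```python
import numpy as np

def gaussian_bone_weights(vertices, means, covs):
    """Skinning weights as normalized Gaussian responsibilities (illustrative).

    vertices: (V, 3) mesh vertices
    means:    (B, 3) Gaussian bone centers
    covs:     (B, 3, 3) Gaussian bone covariances (symmetric positive definite)
    returns:  (V, B) weights, rows summing to 1
    """
    diffs = vertices[:, None, :] - means[None, :, :]             # (V, B, 3)
    inv_covs = np.linalg.inv(covs)                               # (B, 3, 3)
    # Squared Mahalanobis distance from each vertex to each bone
    maha = np.einsum('vbi,bij,vbj->vb', diffs, inv_covs, diffs)  # (V, B)
    # Unnormalized log Gaussian density (constant factor cancels below)
    logits = -0.5 * maha - 0.5 * np.log(np.linalg.det(covs))[None, :]
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)
```

Explicit Gaussian primitives are what make the rig inspectable and editable, in contrast to opaque deformation latents.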

📊 Strong results on DeformingThings4D / Objaverse-XL / TrueBones
🔗 Project page: https://rigmo-page.github.io
📄 Paper (arXiv): https://arxiv.org/abs/2601.06378
📺 Video: https://youtube.com/watch?v=0H0lsM3USVM
💻 Code: coming soon
#ComputerGraphics #Animation #Rigging #4D #3D #GenerativeAI


