Visual Grounding of Learned Physical Models


Yunzhu Li      Toru Lin*      Kexin Yi*      Daniel M. Bear      Daniel L. K. Yamins
Jiajun Wu      Joshua B. Tenenbaum      Antonio Torralba

(* indicates equal contribution; these authors are listed in alphabetical order)


Abstract

Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions. The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models. In this work, we present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors. The visual prior predicts a particle-based representation of the system from visual observations. An inference module operates on those particles, predicting and refining estimates of particle locations, object states, and physical parameters, subject to the constraints imposed by the dynamics prior; we refer to this process as visual grounding. We demonstrate the effectiveness of our method in environments involving rigid objects, deformable materials, and fluids. Experiments show that our model can infer the physical properties from only a few observations, which allows it to adapt quickly to unseen scenarios and make accurate predictions far into the future.
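The inference procedure described above can be summarized as an optimization loop: the visual prior turns each frame into a set of particles, the dynamics prior rolls the particles forward given latent physical parameters, and the parameter and position estimates are refined until the two priors agree. The following is a minimal sketch of that idea in PyTorch; it is not the released implementation, and visual_prior, dynamics_prior, and all shapes and hyperparameters below are stand-in assumptions for illustration only.

import torch

def infer_physical_params(frames, visual_prior, dynamics_prior,
                          n_params=4, steps=100, lr=1e-2):
    """Refine latent physical parameters and particle positions so that
    the dynamics prior agrees with the visual prior's observations."""
    # Particle estimates from each observed frame: shape (T, N, 3).
    # Detached so the loop optimizes only the latents below.
    observed = torch.stack([visual_prior(f) for f in frames]).detach()

    # Latent physical parameters and refined positions, both optimized.
    params = torch.zeros(n_params, requires_grad=True)
    positions = observed.clone().requires_grad_(True)

    opt = torch.optim.Adam([params, positions], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # The dynamics prior predicts each next state from the current one.
        pred = dynamics_prior(positions[:-1], params)
        # Stay consistent with both the dynamics prior (first term)
        # and the visual prior's particle observations (second term).
        loss = ((pred - positions[1:]) ** 2).mean() \
             + ((positions - observed) ** 2).mean()
        loss.backward()
        opt.step()
    return params.detach(), positions.detach()

# Toy usage with stand-in priors (shapes are illustrative only):
# frames = [torch.randn(64, 64, 3) for _ in range(5)]  # 5 observed frames
# visual = lambda f: torch.randn(32, 3)                # 32 particles per frame
# dynamics = lambda x, p: x + 0.01 * p[:3]             # placeholder dynamics
# params, traj = infer_physical_params(frames, visual, dynamics)

Jointly optimizing the positions alongside the parameters mirrors the position-refinement component compared in the Results section below: the particle estimates from the visual prior are treated as noisy and are corrected by the dynamics constraint rather than taken at face value.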


Paper

Yunzhu Li, Toru Lin*, Kexin Yi*, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, and Antonio Torralba
Visual Grounding of Learned Physical Models
ICML 2020, [Paper] [Code] [BibTeX]
(* indicates equal contribution; these authors are listed in alphabetical order)


Video




Results

The following demo videos show how different physical properties affect future prediction.

Rigidness Estimation
Comparison between our full model and the baseline without rigidness estimation.



Physical Parameter Estimation
Comparison between our full model and the baseline with randomly sampled physical parameters.



Position Refinement
Comparison between our full model and the baseline without position refinement.



Related Work


Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, and Antonio Torralba
Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids
ICLR 2019, [website]

Zhenjia Xu, Jiajun Wu, Andy Zeng, Joshua B. Tenenbaum, and Shuran Song
DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions
RSS 2019, [website]

Jiajun Wu*, Ilker Yildirim*, Joseph J. Lim, William T. Freeman, and Joshua B. Tenenbaum
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
NeurIPS 2015, [website]