AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance

Abstract

Image animation is a key task in computer vision that aims to generate dynamic visual content from a static image. Recent image animation methods employ neural-based rendering techniques to generate realistic animations. Despite these advances, fine-grained, controllable image animation guided by text remains challenging, particularly for open-domain images captured in diverse real-world environments. In this paper, we introduce an open-domain image animation method that leverages the motion prior of a video diffusion model. Our approach introduces targeted motion area guidance and motion strength guidance, enabling precise control over which area moves and how fast it moves. This improves the alignment between the animated visual elements and the prompting text, enabling a fine-grained, interactive animation generation process for intricate motion sequences. We validate the effectiveness of our method through rigorous experiments on an open-domain dataset, and the results demonstrate its superior performance.

Method

We adopt the widely used 3D U-Net-based video diffusion model for image animation. Given a noisy video latent of shape (frames, height, width, channel), we concatenate the clean latent of the reference image with the noisy frames along the temporal dimension. We also concatenate the motion area mask with the video latent along the channel dimension. The resulting input to the 3D U-Net therefore has shape (frames+1, height, width, channel+1). To control the motion strength of the generated video, we encode the motion strength as a positional embedding and concatenate it with the time step embedding.
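The input layout described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the channels-last array layout, placing the reference latent as the first frame, repeating the mask for every frame (including the reference frame), the example sizes, and the helper name build_unet_input are all hypothetical choices made for the example.

```python
import numpy as np

def build_unet_input(noisy_latents, ref_latent, motion_mask):
    """Assemble a (frames+1, H, W, C+1) U-Net input (hypothetical helper).

    noisy_latents: (frames, H, W, C) noisy video latent
    ref_latent:    (H, W, C) clean latent of the reference image
    motion_mask:   (H, W) binary motion area mask at latent resolution
    """
    frames, height, width, channel = noisy_latents.shape
    assert ref_latent.shape == (height, width, channel)
    assert motion_mask.shape == (height, width)

    # Temporal concatenation: the clean reference latent becomes an extra frame
    # (assumed here to be the first one) -> (frames + 1, H, W, C).
    video = np.concatenate([ref_latent[None], noisy_latents], axis=0)

    # Channel concatenation: the same mask is repeated for every frame
    # -> (frames + 1, H, W, 1), then appended -> (frames + 1, H, W, C + 1).
    mask = np.broadcast_to(motion_mask[None, :, :, None], (frames + 1, height, width, 1))
    return np.concatenate([video, mask.astype(video.dtype)], axis=-1)

x = build_unet_input(np.random.randn(16, 32, 32, 4), np.random.randn(32, 32, 4), np.ones((32, 32)))
print(x.shape)  # (17, 32, 32, 5)
```

With 16 noisy frames and a 4-channel latent (illustrative sizes, not taken from the paper), the U-Net would receive 17 frames with 5 channels each, matching the (frames+1, height, width, channel+1) shape stated above.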

Animate Image with Text

A girl is talking. A boy sprays water.

A girl moves hands. A boy is smiling.

A lion moves head. Cartoon pigs are talking.

A cartoon turtle is talking. The sea ridges under the wind.

The windmill is spinning. A yellow car is running on the grass.

Animate Image with Text and Motion Area Mask

The clouds are moving to the left.

The girl is walking in the forest.

The couple is having an intimate conversation.

The woman is waving her hands.

A small boat sails in the sea.

The coconut tree is blowing in the wind.

A small boat sails in the sea.

The snow is falling slowly.

The sunflower is blowing in the wind.

The fish and tadpoles are playing.
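The motion area mask enters the model at latent resolution, while a user would typically draw it on the image itself. A minimal sketch of that conversion, assuming an 8x downsampling between pixel and latent space (common for Stable Diffusion-style VAEs, but not stated on this page) and a hypothetical helper mask_to_latent, marks a latent cell as movable if any pixel in its block is movable:

```python
import numpy as np

def mask_to_latent(pixel_mask, factor=8):
    """Downsample a binary (H, W) pixel mask to (H // factor, W // factor).

    Block-wise max pooling: a latent cell is movable if any pixel in its
    factor x factor block is movable. factor=8 is an assumed VAE stride.
    """
    h, w = pixel_mask.shape
    assert h % factor == 0 and w % factor == 0, "mask size must be divisible by factor"
    blocks = pixel_mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.max(axis=(1, 3)).astype(np.float32)

# Hypothetical user stroke: the top-left 40 x 120 pixel region (e.g. the sky) may move.
pixel_mask = np.zeros((256, 256), dtype=np.uint8)
pixel_mask[:40, :120] = 1
latent_mask = mask_to_latent(pixel_mask)
print(latent_mask.shape)       # (32, 32)
print(int(latent_mask.sum()))  # 5 * 15 = 75
```

Max pooling errs on the side of letting boundary cells move; averaging followed by a threshold would be an equally plausible choice.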

Animate Image with Text and Multiple Motion Area Masks

Two wild geese flying in the air.

Scarlet Witch is casting a spell.

People are walking. The ship sailed on the water.

The ship sailed on the water.
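The Method section describes a single extra mask channel, so one plausible way to support several motion areas, assumed here rather than confirmed by the page, is to merge the region masks into one binary mask by taking their union before concatenation. The helper merge_motion_masks and the region coordinates are hypothetical:

```python
import numpy as np

def merge_motion_masks(masks):
    """Union of several binary (H, W) region masks as one float32 (H, W) mask."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return np.logical_or.reduce(stacked, axis=0).astype(np.float32)

# Hypothetical latent-resolution regions: people on the left, a ship on the right.
people = np.zeros((32, 32)); people[20:30, 2:10] = 1
ship = np.zeros((32, 32)); ship[18:26, 20:30] = 1
merged = merge_motion_masks([people, ship])
print(int(merged.sum()))  # 80 + 80 = 160
```

In this sketch the merged mask only says where motion is allowed; what happens in each region is left to the text prompt, as in the "People are walking. The ship sailed on the water." example.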

BibTeX

@misc{dai2023finegrained, title={Fine-Grained Open Domain Image Animation with Motion Guidance}, author={Zuozhuo Dai and Zhenghao Zhang and Yao Yao and Bingxue Qiu and Siyu Zhu and Long Qin and Weizhi Wang}, year={2023}, eprint={2311.12886}, archivePrefix={arXiv}, primaryClass={cs.CV} }

Different Values of Motion Strength

A woman is smiling.

A gold rabbit moves hand.
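Varying motion strength relies on the conditioning described in the Method section: the motion strength is encoded as a positional embedding and concatenated with the time step embedding. A minimal sketch, assuming the sinusoidal form commonly used for diffusion time-step embeddings and an arbitrary embedding width of 320 (both assumptions, as are the helper names and the strength values 2 and 12), encodes each scalar separately and concatenates the two vectors:

```python
import numpy as np

def sinusoidal_embedding(value, dim=320, max_period=10000.0):
    """Sinusoidal embedding of a scalar: [cos(value * f_i), sin(value * f_i)]."""
    assert dim % 2 == 0
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = float(value) * freqs
    return np.concatenate([np.cos(args), np.sin(args)])

def conditioning_embedding(timestep, motion_strength, dim=320):
    # Same sinusoidal projection for both scalars, concatenated into a
    # (2 * dim,) vector; what the U-Net does with it next is not modeled here.
    return np.concatenate([sinusoidal_embedding(timestep, dim),
                           sinusoidal_embedding(motion_strength, dim)])

low = conditioning_embedding(timestep=500, motion_strength=2)
high = conditioning_embedding(timestep=500, motion_strength=12)
print(low.shape)  # (640,)
```

At a fixed time step only the second half of the vector changes with motion strength, which is what would let the network distinguish a weak motion setting from a strong one.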
