The World is Your Canvas

Painting Promptable Events with Reference Images, Trajectories, and Text

Hanlin Wang¹˒², Hao Ouyang², Qiuyu Wang², Yue Yu¹˒², Yihao Meng¹˒², Wen Wang³˒², Ka Leong Cheng², Shuailei Ma⁴˒², Qingyan Bai¹˒², Yixuan Li⁵˒², Cheng Chen⁶˒², Yanhong Zeng², Xing Zhu², Yujun Shen², Qifeng Chen¹

¹HKUST, ²Ant Group, ³ZJU, ⁴NEU, ⁵CUHK, ⁶NTU

📄 Paper 💻 GitHub 🤗 Model

Abstract

We present WorldCanvas, a framework for promptable world events that enables rich, user-directed simulation by combining text, trajectories, and reference images. Unlike text-only approaches and existing trajectory-controlled image-to-video methods, WorldCanvas pairs trajectories, which encode motion, timing, and visibility, with natural language for semantic intent and reference images for visual grounding of object identity. This combination enables the generation of coherent, controllable events, including multi-agent interactions, object entry and exit, reference-guided appearance, and counterintuitive events. The resulting videos show not only temporal coherence but also emergent consistency: object identity and the scene are preserved even when objects temporarily disappear from view. By supporting expressive world-event generation, WorldCanvas advances world models from passive predictors to interactive, user-shaped simulators.

Event Branches

What event should happen next?

Results Gallery

Subject & Camera motion

The car performs a drift at the intersection, aligning its front end in the correct direction, then drives off in that direction and exits the frame.

The girl is spinning while taking a selfie, continuously moving her head and smiling. The background behind her keeps rotating.

A man walks along the road, first slowly and then breaks into a faster run (Tip: use the trajectory to control motion speed).

The hot air balloon on the left flies through the other two balloons (Tip: use the trajectory to control occlusion).
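The two tips above rely on a trajectory carrying timing and visibility as well as position. The following is a minimal sketch under our own assumptions: a hypothetical representation of timestamped 2D points with a visibility flag. The names `TrajectoryPoint`, `segment_speeds`, and `hidden_intervals` are illustrative only and are not WorldCanvas's actual input format. It shows how spacing points unevenly in time encodes speed, and how a visibility flag can mark an occluded span.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class TrajectoryPoint:
    # Hypothetical trajectory sample; not WorldCanvas's real input schema.
    t: float        # timestamp in seconds
    x: float        # horizontal position in pixels
    y: float        # vertical position in pixels
    visible: bool   # False while the object is occluded or off-screen


def segment_speeds(points):
    """Return the speed (pixels per second) between consecutive points."""
    speeds = []
    for a, b in zip(points, points[1:]):
        dt = b.t - a.t
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        speeds.append(hypot(b.x - a.x, b.y - a.y) / dt)
    return speeds


def hidden_intervals(points):
    """Return (start_t, end_t) spans where the object is flagged invisible."""
    spans, start = [], None
    for p in points:
        if not p.visible and start is None:
            start = p.t                      # object becomes hidden
        elif p.visible and start is not None:
            spans.append((start, p.t))       # object reappears
            start = None
    if start is not None:                    # still hidden at the last sample
        spans.append((start, points[-1].t))
    return spans


# "Walk slowly, then run": equal time steps, wider spacing in the second half.
walker = [TrajectoryPoint(t, x, 300.0, True)
          for t, x in [(0.0, 100.0), (0.5, 120.0), (1.0, 140.0),
                       (1.5, 200.0), (2.0, 260.0)]]

# "Balloon passes behind the others": visibility drops mid-trajectory.
balloon = [TrajectoryPoint(0.0, 50.0, 80.0, True),
           TrajectoryPoint(1.0, 150.0, 80.0, False),
           TrajectoryPoint(2.0, 250.0, 80.0, True)]

print(segment_speeds(walker))     # [40.0, 40.0, 120.0, 120.0]
print(hidden_intervals(balloon))  # [(1.0, 2.0)]
```

Because the time steps are equal, the larger gaps between later points translate directly into higher speed. The visibility flag is one plausible way to express the occlusion tip; how WorldCanvas itself encodes visibility may differ.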

Interactive Actions

The woman moves closer to the man, who then lifts her up and spins her halfway around before putting her down. The man embraces the woman as she walks toward him, lifts her up, and spins her halfway around in place before putting her down.

The little girl spins around once in place, then walks toward the puppy approaching her, picks it up, and lifts it above her head. A cute little dog enters the frame, runs toward the little girl, and is then picked up by her and lifted above her head.

A hand moves toward the cup, grasps it, lifts it into the air, releases it, and then moves the hand away. The cup is lifted into the air by a hand; after the hand lets go, the cup falls, topples over upon hitting the table, and water spills out from inside.

A man rides a motorcycle forward at high speed, with the background continuously moving backward. Suddenly, he leans sharply to the left to dodge an incoming missile, then straightens his body again and continues riding forward. The motorcycle speeds forward, and suddenly the rider leans it sharply to the left to dodge an incoming missile, then straightens it back to an upright position. A missile flies toward the man, whizzes past his head, lands behind him, and explodes.

Subject appearance & disappearance

A UFO slowly emerges in the sky. A mysterious man wearing a cloak and a conical hat appears out of nowhere and walks toward the UFO. An alien flies out of the UFO and slowly descends to the ground.

A car drives into the frame and then slams on the brakes to avoid hitting a person. An elderly man walks into the frame and begins moving forward. Suddenly, he notices a car approaching in the distance and starts stepping backward to avoid being hit. He eventually walks out of the frame and disappears.

The thief flees hastily, running out of the frame. A police officer enters the frame, chasing the fleeing thief.

The ghost floats toward the window and then vanishes.

Reference-Image based generation

The man walks toward the polar bear, then rides on it and continues moving forward. The polar bear stands still, waits for the man to come to it and mount it, and then carries him forward. A Chinese dragon flies across the sky, its scales golden yellow.

The man wearing sunglasses turns around on the spot, gazes at the eagle flying across the sky, and removes his hat in salute. An eagle soars through the sky.

The dog leaped vigorously, lunging at the butterfly in midair, but missed and landed on the ground. The butterfly fluttered in the air, dodged the dog's lunge, and then flew upward.

The man wearing sunglasses walks forward, takes off his hat, and places it on the sheep's head. A white sheep walks into the frame.

The man wearing sunglasses turned toward the tower and punched it down with a single blow. The tower remained still at first, then, after being punched by the man, it tilted and collapsed, raising a cloud of dust.

Text-Trajectory correspondence

A cat walks into the frame and moves toward the goal. A dog walks into the frame and moves toward the goal.

A dog walks into the frame and moves toward the goal. A cat walks into the frame and approaches the goal.

The woman on the left moves to the left while raising both hands above her head. The woman on the right moves to the right while placing both hands on her back behind her. The woman in the middle slowly squats down in place.

The woman on the left moves to the left while placing both hands on her back behind her. The woman on the right moves to the right while raising both hands above her head. The woman in the middle slowly squats down in place.

The little girl in front walks forward, raising her hands above her head. The little girl behind walks forward, squats down, then starts crying and wipes her tears with both hands.

The little girl in front walks forward, starts crying, and wipes her tears with both hands. The little girl behind walks forward and squats down, raising her hands above her head.

A fighter jet flies by, leaving a white contrail behind. A UFO flies by, flashing with colorful lights.

Visual Memory

A patterned ball falls, disappears from the frame, then bounces off the ground and reappears in the shot. The cameraperson's hand pulls back and disappears from the frame.

The girl is spinning while taking a selfie, continuously moving her head and smiling. The background behind her keeps rotating.

The little boy walks out of the frame, then walks back into the shot, approaches his backpack on the ground, and picks it up. The backpack lies still on the ground, then is picked up by the little boy as he walks back.

Physical plausibility & Causal reasoning & Prediction

This domino falls and lands on the table.

The burning torch approaches the paper and makes contact with it.

The bottle of drink falls over onto the tabletop.

A book on the table is suddenly pulled away by a hand.

Counterfactual generation

A shark enters the frame, leaps into the air, then dives into the sand and disappears—only to burst back out of the sand and leap into the air again.

As the puppy was walking, it suddenly grew a pair of wings and flew up into the sky.

Failure cases

The girl spins around while taking a selfie, continuously shaking her head and smiling. The background behind her also rotates continuously. After she completes a full turn, the background returns to its original position.

The shot initially focuses on a glass of water, then tilts upward to look out the window, and subsequently moves downward again to refocus on the glass of water.

Comparisons

The camera moves forward and then stops. The door is then opened from the inside. A man in pajamas walks out through the door, raises his hand to greet the camera, then turns around and switches off the light.

Compared methods: ATI, Wan2.2 I2V, Ours

The little girl in front walks forward, starts crying, and wipes her tears with both hands. The little girl behind walks forward and squats down, raising her hands above her head.

Compared methods: ATI, Wan2.2 I2V, Ours

A cat walks into the frame and approaches the goal. A dog walks into the frame and moves toward the goal.

Compared methods: ATI, Wan2.2 I2V, Frame In-N-Out, Ours

Compared methods: Frame In-N-Out, Ours

Compared attention variants: Standard full cross-attention, Hard cross-attention, Ours