Introducing Paris 2.0
A Decentralized Diffusion Model for Video Generation
Today we’re releasing Paris 2.0, the first video generation model pre-trained on heterogeneous GPU types distributed across regions.
Three things make Paris 2.0 unique.
The model was trained on a deeply heterogeneous pool of GPUs spanning hardware generations and vendors.
Paris 2.0 training ran across geographically distributed regions and clouds rather than inside a single shared datacenter.
In low-resolution text-to-video training, against a monolithic baseline model at a matched total compute budget, Paris 2.0 cuts Fréchet Video Distance (FVD, lower is better) from 561.04 to 279.01, a roughly 2.0× improvement.
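For readers unfamiliar with the metric: FVD is the Fréchet distance between two Gaussians, one fitted to feature vectors of real videos and one fitted to feature vectors of generated videos (the original FVD formulation extracts these features with a pretrained I3D network). For means μ_r, μ_g and covariances Σ_r, Σ_g it equals ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), so lower is better. The sketch below is a minimal illustration under stated assumptions, not the Paris 2.0 evaluation pipeline: it assumes features have already been extracted into NumPy arrays, and the function names `frechet_distance` and `fvd_from_features` are hypothetical.

```python
import numpy as np

def _psd_sqrt(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    s1_half = _psd_sqrt(sigma1)
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the inner matrix is
    # symmetric PSD, so the eigendecomposition-based square root above applies.
    cross = _psd_sqrt(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross))

def fvd_from_features(real_feats, gen_feats):
    """Hypothetical helper: rows are per-video feature vectors (assumed precomputed)."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)

if __name__ == "__main__":
    eye = np.eye(2)
    # Equal covariances, shifted means: the distance reduces to ||mu1 - mu2||^2.
    print(frechet_distance(np.zeros(2), eye, np.array([3.0, 4.0]), eye))
    # The headline ratio from the baseline comparison above.
    print(round(561.04 / 279.01, 2))
```

Because FVD is a distance, the "~2.0×" figure is the ratio of baseline FVD to Paris 2.0 FVD: 561.04 / 279.01 ≈ 2.01.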

Figure: Relative improvement over the monolithic baseline, where a taller bar marks a larger gain. Paris 2.0 roughly halves FVD and lifts CLIP text-video and aesthetic scores.

Samples

A woman with long, blond, wavy hair is speaking directly to the camera. She is wearing a red sweater. While talking, her facial expressions change as she speaks. The background is a cluttered room.
A person’s hands performing a paper-folding craft on a green cutting mat with a grid. The person uses a black marker to make a small mark on a piece of purple paper that has already been folded into a specific shape. 
A pair of hands interacting with a translucent, gelatinous slime. The slime is a vibrant blue color. The hands are seen stretching, squeezing, and folding the slime, demonstrating its gooey and pliable texture.