A large language model for zero-shot video generation
VideoPoet is a simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator.
Google announced VideoPoet:
VideoPoet is a modeling approach that turns an autoregressive language model into a video generator. It represents images, video, and audio with a unified token vocabulary, so one model can work across all of these modalities.
Training on a mix of multimodal generative learning objectives lets the model cover tasks such as text-to-video and image-to-video, among others. It also shows zero-shot capability, for example on text-to-audio. In short, VideoPoet handles multimodal content generation with a single language model rather than a separate system for each task.
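
To make the "unified vocabulary" idea more concrete, here is a minimal sketch of one common way to let text, visual, and audio tokens share a single ID space: give each modality its own contiguous range of IDs. This is an illustration only, not VideoPoet's actual implementation. The codebook sizes and the helper names (`build_offsets`, `to_unified`, `from_unified`) are assumptions made for this example.

```python
# Hypothetical per-modality codebook sizes, chosen only for illustration.
# They are NOT VideoPoet's real vocabulary sizes.
MODALITY_SIZES = {"text": 1000, "visual": 4096, "audio": 2048}


def build_offsets(sizes):
    """Assign each modality a contiguous ID range in one shared vocabulary."""
    offsets = {}
    start = 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets, start


OFFSETS, VOCAB_SIZE = build_offsets(MODALITY_SIZES)


def to_unified(modality, local_ids):
    """Map modality-local token IDs into the shared ID space."""
    size = MODALITY_SIZES[modality]
    base = OFFSETS[modality]
    for i in local_ids:
        if not 0 <= i < size:
            raise ValueError(f"{modality} token {i} is out of range")
    return [base + i for i in local_ids]


def from_unified(token_id):
    """Recover (modality, local ID) from a shared-vocabulary token ID."""
    for name, base in OFFSETS.items():
        if base <= token_id < base + MODALITY_SIZES[name]:
            return name, token_id - base
    raise ValueError(f"token {token_id} is outside the unified vocabulary")


# One flat sequence mixing modalities, as an autoregressive model would see it.
sequence = (
    to_unified("text", [5, 17])
    + to_unified("visual", [0, 4095])
    + to_unified("audio", [3])
)
print(VOCAB_SIZE, sequence)
```

Because every token lands in the same ID space, a standard next-token language model can be trained on such mixed sequences without modality-specific output heads; that is the property this sketch is meant to show.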