PikaStream1.0 lets you have a real-time face-to-face video call with any AI agent. 24 FPS, 1.5s latency, single GPU. Built on a 9B DiT with persistent memory, lip-sync, and natural expressions — not a filter or avatar. Works in Google Meet today.
Pika just made it possible to have a face-to-face video call with an AI agent — in real time.
The problem: today's video generation models are too slow for live interaction. A single clip takes seconds to minutes — fundamentally incompatible with a live video call that needs continuous, identity-consistent, instantly responsive output.
The solution: PikaStream1.0 is a real-time visual engine that generates personalized video at 24 FPS with ~1.5 seconds end-to-end latency on a single GPU.
What stands out:
🎥 Real-time video calls with AI agents — works in Google Meet today, Zoom and FaceTime coming soon
⚡ ~1.5s speech-to-video latency — down from 4.5s on 8 GPUs with their previous model
🧠 Persistent memory and context maintained throughout the call
🤖 Agentic abilities enabled during video calls — get things done face to face
👄 Frame-level lip-sync accuracy via audio cross-attention (sketched after this list)
😊 Natural gestures and emotionally appropriate reactions
🔄 Mid-stream reference swap — change identity without interrupting generation
🔌 Available on GitHub for any agent, and via API for developers
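Curious what "audio cross-attention" means in practice? Pika hasn't published the architecture, so here's a generic sketch of the idea: the latent tokens for each video frame attend to the audio features for that frame's time window, which is what makes the sync frame-level rather than clip-level. Every class, name, and shape below is my assumption, not Pika's code.

```python
# Illustrative sketch only — PikaStream's internals are not public.
# Shapes, names, and the alignment scheme are assumptions.
import torch
import torch.nn as nn

class AudioCrossAttention(nn.Module):
    """Video-frame latents (queries) attend to audio features (keys/values)."""
    def __init__(self, dim: int, audio_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, kdim=audio_dim,
                                          vdim=audio_dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_tokens: torch.Tensor,
                audio_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (batch, frame_len, dim)       — latent tokens for ONE frame
        # audio_tokens: (batch, audio_len, audio_dim) — features for the audio
        # slice aligned to that frame; conditioning each frame on its own
        # slice is what ties mouth motion to the audio frame by frame.
        attended, _ = self.attn(frame_tokens, audio_tokens, audio_tokens)
        return self.norm(frame_tokens + attended)

# Usage: at 24 FPS, each frame sees roughly 1/24 s of audio features.
layer = AudioCrossAttention(dim=512, audio_dim=256)
frame = torch.randn(1, 64, 512)  # 64 latent tokens for one frame
audio = torch.randn(1, 4, 256)   # audio features for that frame's window
out = layer(frame, audio)        # (1, 64, 512)
```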
Under the hood:
FlashVAE — 441 FPS decoding at 480p, 1.1GB memory on a single H100
9B Diffusion Transformer trained with multi-reward RLHF for identity consistency, lip-sync, and motion naturalness (sketched below)
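"Multi-reward RLHF" presumably means the generator is optimized against several reward models at once rather than a single preference score. Pika hasn't published the training recipe, so this is only a minimal sketch of the scoring step under that assumption; the reward names mirror the three objectives above, and the values and weights are invented for illustration.

```python
# Minimal sketch of multi-reward scoring — NOT Pika's actual recipe.
import torch

def combined_reward(rewards: dict[str, torch.Tensor],
                    weights: dict[str, float]) -> torch.Tensor:
    """Weighted sum of per-objective rewards; one scalar per sample,
    used as the optimization signal during RLHF fine-tuning."""
    return sum(weights[k] * rewards[k] for k in weights)

# Three hypothetical reward models score a batch of two generated clips.
rewards = {
    "identity": torch.tensor([0.92, 0.85]),  # face matches reference image
    "lipsync":  torch.tensor([0.88, 0.90]),  # mouth motion matches audio
    "motion":   torch.tensor([0.75, 0.80]),  # gestures look natural
}
weights = {"identity": 1.0, "lipsync": 1.0, "motion": 0.5}
print(combined_reward(rewards, weights))  # tensor([2.1750, 2.1500])
```

Weighting the objectives separately lets training trade them off explicitly, e.g. never sacrificing identity consistency to squeeze out marginally smoother motion.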
What makes it different: this isn't a filter or an avatar overlay. It's a living AI Self responding in real time with persistent memory, natural expressions, and full agentic capability.
Perfect for creators, developers, and anyone building or using AI agents that need a human-feeling, face-to-face presence.
P.S. I hunt the latest and greatest launches in AI, SaaS, and tech — follow me to stay ahead.