Powerful robotics VLA that runs on consumer hardware
SmolVLA is a compact (450M-parameter) open-source Vision-Language-Action model for robotics. Trained on community data, it runs on consumer hardware and outperforms larger models. Released with code and training recipes.
Hi everyone!
I think there are a few really important ingredients for bringing AI agents into the physical world. First, they need to be able to interact with real environments. Second, due to the limits of on-robot hardware, the models need to be lightweight and efficient. And third, for the good of the community and wider adoption, these foundational models should ideally be open-source.
SmolVLA is an exciting new release because it squarely addresses these points. It's a compact (450M-parameter) Vision-Language-Action (VLA) model that runs on consumer-grade hardware, is fully open-source, and was trained entirely on open, community-contributed robotics datasets from the LeRobot project.
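To make that concrete, here's roughly what running the released checkpoint looks like with the `lerobot` Python package. This is a minimal sketch: the import path varies across `lerobot` versions, and the observation keys, image resolution, and state dimension below are assumptions about a typical single-camera setup, not an exact API reference.

```python
# Minimal sketch of running SmolVLA via lerobot. The import path and
# batch keys are assumptions and may differ across lerobot versions.
import torch
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
policy.eval()
policy.reset()  # clear any queued action chunk before a new episode

# Dummy observation: one RGB camera frame, proprioceptive state, and a
# natural-language instruction (keys and shapes are assumed here).
batch = {
    "observation.images.top": torch.rand(1, 3, 256, 256),  # assumed camera key + resolution
    "observation.state": torch.rand(1, 6),                 # assumed 6-DoF arm state
    "task": ["pick up the red cube"],
}

with torch.no_grad():
    action = policy.select_action(batch)  # next low-level action tensor

print(action.shape)
```

Because SmolVLA predicts chunks of actions, a single forward pass can serve several control steps, which helps keep the expensive vision-language backbone off the critical path on consumer GPUs.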
Despite its small size, SmolVLA outperforms much larger VLAs on both simulation and real-world tasks. The team also implemented asynchronous inference, which decouples action prediction from action execution so the robot keeps moving while the next actions are being computed. This is a fantastic contribution for making capable, real-world robotics research more accessible to everyone.
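The core idea behind asynchronous inference is simple: the robot keeps executing actions from the previously predicted chunk while the next chunk is computed in the background, so control never blocks on model latency. The toy sketch below illustrates just that pattern with a background thread and a bounded queue; the names (`predict_chunk`, `CHUNK_LEN`) and timings are made up for illustration, and this is not LeRobot's actual implementation.

```python
# Toy illustration of asynchronous inference: action execution never
# blocks on model latency. All names and timings are hypothetical.
import queue
import threading
import time

CHUNK_LEN = 10  # actions produced per model call (hypothetical)

def predict_chunk(obs_id: int) -> list[str]:
    """Stand-in for a slow VLA forward pass returning an action chunk."""
    time.sleep(0.2)  # simulate model latency
    return [f"action_{obs_id}_{i}" for i in range(CHUNK_LEN)]

# Bounded queue: the worker pauses once a full chunk is buffered,
# mimicking "predict the next chunk while the current one executes".
action_queue: queue.Queue = queue.Queue(maxsize=CHUNK_LEN)

def inference_worker() -> None:
    obs_id = 0
    while True:
        for action in predict_chunk(obs_id):
            action_queue.put(action)  # blocks while the buffer is full
        obs_id += 1

threading.Thread(target=inference_worker, daemon=True).start()

# Control loop: executes at a fixed rate and only waits if the buffer
# is empty, so model latency is hidden behind execution time.
for step in range(30):
    action = action_queue.get()
    print(f"t={step}: executing {action}")
    time.sleep(0.05)  # simulate a 20 Hz actuation rate
```

The same decoupling also allows the policy to run on a separate, beefier machine, with observations and action chunks exchanged over the network, while the robot itself only needs a lightweight controller.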