SmolVLA is a compact (450M-parameter) open-source Vision-Language-Action model for robotics. Trained on community data, it runs on consumer hardware and outperforms larger models. Released with code and training recipes.
Hi everyone!
I think there are a few really important ingredients for bringing AI agents into the physical world. First, they need to be able to interact with real environments. Second, due to the limits of on-robot hardware, the models need to be lightweight and efficient. And third, for the good of the community and wider adoption, these foundational models should ideally be open-source.
SmolVLA is an exciting new release because it squarely addresses these points. It's a compact (450M-parameter) Vision-Language-Action (VLA) model that runs on consumer-grade hardware, is fully open-source, and was trained entirely on open, community-contributed robotics datasets from the LeRobot project.
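For anyone who wants to poke at it, pulling the released checkpoint down is a few lines with the `lerobot` library. A minimal sketch, with the caveat that the import path and the `lerobot/smolvla_base` checkpoint name are what I'd expect but may differ between lerobot versions, and actually driving a robot also needs observations matching your camera/state config (see the LeRobot docs):

```python
import torch
# Import path may vary by lerobot version; this is the layout at release time.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Pull the ~450M-parameter base checkpoint from the Hugging Face Hub.
policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
policy.eval()
policy.to("cuda" if torch.cuda.is_available() else "cpu")

# Sanity check: the whole policy fits comfortably on consumer hardware.
n_params = sum(p.numel() for p in policy.parameters())
print(f"{n_params / 1e6:.0f}M parameters")
```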
Despite its small size, SmolVLA outperforms much larger VLAs on both simulation and real-world tasks. The team also implemented asynchronous inference, which decouples predicting actions from executing them, to make the system more responsive on real robots. This is a fantastic contribution for making capable, real-world robotics research more accessible to everyone.
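To give a feel for why the asynchronous part matters: instead of blocking the control loop on every (slow) model forward pass, the robot keeps executing actions from a queue while the policy is already computing the next chunk. A toy sketch of that pattern in plain Python, not the LeRobot implementation, just the concept:

```python
import queue
import threading
import time

action_queue: "queue.Queue[float]" = queue.Queue()

def policy_worker() -> None:
    """Stand-in for the VLA: periodically produces a chunk of future actions."""
    for step in range(5):
        time.sleep(0.3)  # pretend this is a slow model forward pass
        for a in range(step * 10, step * 10 + 10):
            action_queue.put(float(a))  # push a chunk of actions at once

def control_loop() -> None:
    """Runs at a fixed rate and never waits on the model directly."""
    for _ in range(50):
        try:
            action = action_queue.get(timeout=1.0)
        except queue.Empty:
            break
        # here you would send `action` to the robot
        time.sleep(0.02)  # ~50 Hz control rate

worker = threading.Thread(target=policy_worker)
worker.start()
control_loop()
worker.join()
```

Because the policy publishes chunks ahead of time, the control loop stays at its fixed rate even when a single forward pass takes longer than one control step.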
SmolVLA is a great example of efficient design meeting real-world usability — compact, open-source, and high-performing. Love that it’s accessible to the broader robotics community right out of the box.
Hugging Face is doing incredible work! Their open-source model hub—packed with thousands of pre-trained models—makes it a breeze to dive into NLP, vision, and generative AI. I love how the community and APIs make complex AI feel so accessible and fun. Huge kudos to the team for building such a welcoming ecosystem!
Love that SmolVLA was trained on open, community datasets and is itself fully open-source. Transparency and collaboration like this will really push robotics research forward.