SmolVLM2

Smallest Video LM Ever from HuggingFace

SmolVLM2, from Hugging Face, is a series of tiny, open-source multimodal models for video understanding. It processes video, images, and text, making it ideal for on-device applications.

Top comment

Hi everyone!

Sharing SmolVLM2, a new open-source multimodal model series from Hugging Face that's surprisingly small, with the smallest version at only 256M parameters! It's designed specifically for video understanding, opening up interesting possibilities for on-device AI.

What's cool about it:

📹 Video Understanding: Designed specifically for analyzing video content, not just images.
🤏 Tiny Size: The smallest version is only 256M parameters, meaning it can potentially run on devices with limited resources.
🖼️ Multimodal: Handles video, images, and text, and you can even interleave them in your prompts.
👐 Open Source: Apache 2.0 license.
🤗 Hugging Face Transformers: Easy to use with the transformers library (see the short sketch below).

It's based on Idefics3 and supports tasks like video captioning, visual question answering, and even storytelling from visual content.
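
To give a feel for the transformers usage mentioned above, here's a minimal sketch of video captioning with the 256M instruct variant. It assumes the model ships on the Hub as HuggingFaceTB/SmolVLM2-256M-Video-Instruct and loads through the image-text-to-text pipeline classes; exact model IDs and video support in apply_chat_template depend on your transformers version, so check the model card for the canonical snippet.

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

# Assumed Hub ID for the smallest video-instruct checkpoint; verify on the model card.
model_id = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")

# Chat-style prompt interleaving a video clip with a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "my_clip.mp4"},  # local video file (placeholder path)
            {"type": "text", "text": "Describe what happens in this video."},
        ],
    },
]

# The processor handles frame sampling and tokenization via the chat template.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

The same message format works for image inputs (swap "video" for "image"), which is how you interleave images and text in a single prompt.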

You can try a video highlight generation demo here.

VLMs this small could run on our phones and on other devices like smart glasses. That's the future.