InternVL3

Open MLLMs excelling in vision, reasoning & long context

Open Source
Artificial Intelligence
GitHub
Development

Open MLLM family (1B-78B) from OpenGVLab. Excels at vision, reasoning, long context & agents via native multimodal pre-training. Outperforms base LLMs on text tasks.

Top comment

Hi everyone!

Check out InternVL3 from OpenGVLab – a new family of open vision-language models.

InternVL3 is trained with native multimodal pre-training, mixing vision and text data from the start, which reportedly leads to strong performance on both image/video understanding and text-only tasks.

These models show good reasoning abilities and can handle long inputs. The weights and code are openly available.

You can try these models directly on their Chat Web demo and Hugging Face Space.
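If you'd rather run the open weights locally, here is a minimal sketch using Hugging Face transformers. The checkpoint name, the chat() helper exposed via trust_remote_code, and the generation settings are assumptions drawn from typical InternVL model cards, so double-check the official repo:

```python
# Minimal sketch: load InternVL3 open weights from Hugging Face.
# Assumptions: the model ID below, the chat() helper shipped via
# trust_remote_code, and the generation settings; see the official
# OpenGVLab model card for the exact usage.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVL3-8B"  # assumed checkpoint name; sizes range from 1B to 78B

model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # InternVL ships its multimodal code with the weights
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Text-only query; for images/video you would also pass preprocessed pixel_values.
question = "Summarize what native multimodal pre-training means."
response = model.chat(tokenizer, None, question, dict(max_new_tokens=256))
print(response)
```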

Comment highlights

This open MLLM family is truly impressive! What stands out is how well these models handle vision and reasoning tasks while outperforming base LLMs even on text benchmarks. The native multimodal pre-training approach seems to be a game-changer.

Can't wait to see what the community will build with these models. Wishing the OpenGVLab team continued success with this project!

Hey Zac Zuo & the OpenGVLab team (congrats on the hunt/launch!), this looks like a significant step forward for open vision-language models. Exciting to see strong performance reported from native multimodal pre-training, especially in reasoning and handling long context alongside vision tasks.

As we're building AI experiences (@UNI AI), having powerful, open models like InternVL3 available is fantastic for the ecosystem. The ability to handle both image/video and text tasks well from the start is key.

Question: Regarding long-context handling – what architectural innovations or training techniques allow InternVL3 to maintain strong performance on extended inputs compared to other MLLMs?

Great contribution to the open-source community. Wishing you success with the launch! 👁️‍🗨️🧠