Open MLLMs excelling in vision, reasoning & long context
Open MLLM family (1B-78B) from OpenGVLab. Excels at vision, reasoning, long context & agents via native multimodal pre-training. Outperforms base LLMs on text tasks.
Check out InternVL3 from OpenGVLab, a new family of open vision-language models.
InternVL3 is trained with native multimodal pre-training: vision and text data are mixed from the start rather than grafting vision onto a text-only LLM, which reportedly yields strong performance on both image/video understanding and pure text tasks.
The models show strong reasoning ability and handle long-context inputs, and the weights and code are openly available.
You can try the models' capabilities directly in their web chat demo and HF Space.
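If you'd rather run a checkpoint locally than use the demos, here is a minimal sketch. It assumes the 8B variant is published as `OpenGVLab/InternVL3-8B` on the Hub and that InternVL3 keeps the `trust_remote_code` loading path and `chat()` helper used in earlier InternVL model cards; check the actual model card for the exact API.

```python
# Minimal sketch of loading an InternVL3 checkpoint from the Hugging Face Hub.
# The repo id and the chat() helper follow the pattern of earlier InternVL
# releases and are assumptions; consult the model card for the exact usage.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVL3-8B"  # assumed repo id; sizes span 1B-78B

model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # InternVL ships custom modeling code
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Text-only query; the model also accepts pixel_values for images/video
# (pass a preprocessed image tensor in place of None).
question = "Explain native multimodal pre-training in one sentence."
response = model.chat(tokenizer, None, question, dict(max_new_tokens=256))
print(response)
```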