Ferret

Refer and ground anything anywhere at any granularity

Open Source
Artificial Intelligence
GitHub
Apple

A new multimodal large language model (MLLM) from Apple that excels at both image understanding and language processing, with a notable strength in resolving spatial references: it can refer to and ground objects anywhere in an image, at any granularity, from points to boxes to free-form regions.
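To make "referring at any granularity" concrete, here is a minimal, hypothetical sketch of what a region-grounded prompt could look like. The `make_region_prompt` helper and the `<region>` token format are illustrative assumptions for this listing, not Ferret's actual API; consult the project's GitHub repository for the real interface.

```python
# Hypothetical sketch of a region-grounded ("referring") prompt.
# The <region> token format and the helper below are illustrative
# assumptions, not Ferret's actual interface.

def make_region_prompt(question: str, box: tuple[int, int, int, int]) -> str:
    """Embed a bounding box (x1, y1, x2, y2 in pixel coordinates) into a prompt."""
    x1, y1, x2, y2 = box
    return f"{question} <region>[{x1}, {y1}, {x2}, {y2}]</region>"

# A model that supports referring would answer about only this region,
# rather than the whole image.
prompt = make_region_prompt("What is the object in this area?", (40, 60, 200, 180))
print(prompt)
```

The idea is that the question is tied to an explicit image region, so the model's answer is grounded in that area instead of the full scene.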

Top comment

Apple has released a new multimodal large language model that seems promising. I'm interested to learn more about how well it can comprehend spatial references, and eager to see it in action!

Comment highlights

Whoa, Apple's new multimodal large language model sounds amazing! It's wonderful to see advances in language processing and visual interpretation. I'd like more information about how it handles spatial references. I appreciate you sharing this great news!

Wow, Ferret looks like an amazing new MLLM! Can't wait to see what it can do for image and language processing. Have you found any noticeable improvements in handling multi-step tasks compared to other models? What applications do you…

Spatial references haven't met their match until now! Ferret, you're rewriting the rules of grounded AI!

Impressive work on the launch! Your tool seems like a game-changer for comprehending spatial references. Kudos on this fantastic project!