
Meta Perception Encoder

Vision encoder setting new standards in image & video tasks


A vision encoder setting new standards in image & video tasks. It excels in zero-shot classification & retrieval, surpassing existing models.

Top comment

Hey guys, can you please upload the video for this launch again? (ATM, it doesn't show the thumbnail)

After republishing, the bug should be fixed.

P.S.: This is very interesting. It's similar to the video-understanding product I saw hunted two days ago by @zaczuoTwelvelabs, plus a kind of "video reading" I've seen in Notebooks.app by @dev_singh.

Comment highlights

👋 Hey Hunters!

Introducing Meta Perception Encoder — Meta FAIR's powerful new family of vision-language models!

From zero-shot classification to multimodal reasoning, PE pushes the boundaries of what's possible in computer vision. With variants like PE-Core, PE-Lang, and PE-Spatial, it’s designed to tackle everything from image understanding to dense spatial tasks — all using a single contrastive objective.

What’s exciting?

✅ Intermediate embeddings for richer representations

✅ Advanced alignment techniques

✅ Strong zero-shot and retrieval performance

✅ Open-source and research-friendly!
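To make the zero-shot claim concrete, here is a minimal sketch of how CLIP-style zero-shot classification works with a contrastive image-text encoder like PE-Core: embed the image and a set of text prompts into the shared space and rank classes by cosine similarity. The `model`, `preprocess`, and `tokenizer` objects (and their `encode_image`/`encode_text` methods) are hypothetical stand-ins, not the exact API of the released checkpoints; substitute whatever loading utilities the repo provides.

```python
# Sketch of zero-shot classification with a contrastive vision-language encoder.
# `model`, `preprocess`, and `tokenizer` are assumed placeholders for the
# loading utilities shipped with the PE checkpoints.
import torch
from PIL import Image

def zero_shot_classify(model, preprocess, tokenizer, image_path, class_names, device="cpu"):
    # One text prompt per candidate class.
    prompts = [f"a photo of a {name}" for name in class_names]

    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)  # [1, 3, H, W]
    text = tokenizer(prompts).to(device)                                # [num_classes, seq_len]

    with torch.no_grad():
        img_emb = model.encode_image(image)   # [1, D]
        txt_emb = model.encode_text(text)     # [num_classes, D]

    # Cosine similarity in the shared embedding space, mirroring the contrastive objective.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)  # [1, num_classes]

    return dict(zip(class_names, probs.squeeze(0).tolist()))

# Illustrative usage (file name and classes are made up):
# scores = zero_shot_classify(model, preprocess, tokenizer, "dog.jpg", ["dog", "cat", "bird"])
# print(max(scores, key=scores.get))
```

The same normalized embeddings also power retrieval: index the image embeddings and rank them against a text query (or vice versa) by the same cosine similarity.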

Built for researchers, developers, and AI enthusiasts alike — let’s reimagine visual understanding together.

Would love your feedback! 💬👇