
Meta Perception Encoder

Vision encoder setting new standards in image & video tasks


A vision encoder setting new standards in image & video tasks. It excels in zero-shot classification & retrieval, surpassing existing models.

Top comment

Hey guys, can you please upload the video for this launch again? (ATM, it doesn't show the thumbnail)

After republishing, the bug should be fixed.

P.S.: This is very interesting. It's similar to the video-understanding product I saw hunted two days ago by @zaczuoTwelvelabs, plus a kind of "video reading" I've seen in Notebooks.app by @dev_singh.

Comment highlights

👋 Hey Hunters!

Introducing Meta Perception Encoder — Meta FAIR's powerful new family of vision-language models!

From zero-shot classification to multimodal reasoning, PE pushes the boundaries of what's possible in computer vision. With variants like PE-Core, PE-Lang, and PE-Spatial, it’s designed to tackle everything from image understanding to dense spatial tasks — all using a single contrastive objective.

What’s exciting?

✅ Intermediate embeddings for richer representations

✅ Advanced alignment techniques

✅ Strong zero-shot and retrieval performance

✅ Open-source and research-friendly!
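To make the zero-shot claim concrete, here is a minimal sketch of how CLIP-style zero-shot classification works with a contrastive image-text encoder like PE-Core: embed the image and a set of text prompts into the shared space and rank classes by cosine similarity. The `model`, `preprocess`, and `tokenizer` objects (and their `encode_image`/`encode_text` methods) are hypothetical stand-ins, not the exact API of the released checkpoints; substitute whatever loading utilities the repo provides.

```python
# Sketch of zero-shot classification with a contrastive vision-language encoder.
# `model`, `preprocess`, and `tokenizer` are assumed placeholders for the
# loading utilities shipped with the PE checkpoints.
import torch
from PIL import Image

def zero_shot_classify(model, preprocess, tokenizer, image_path, class_names, device="cpu"):
    # One text prompt per candidate class.
    prompts = [f"a photo of a {name}" for name in class_names]

    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)  # [1, 3, H, W]
    text = tokenizer(prompts).to(device)                                # [num_classes, seq_len]

    with torch.no_grad():
        img_emb = model.encode_image(image)   # [1, D]
        txt_emb = model.encode_text(text)     # [num_classes, D]

    # Cosine similarity in the shared embedding space, mirroring the contrastive objective.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)  # [1, num_classes]

    return dict(zip(class_names, probs.squeeze(0).tolist()))

# Illustrative usage (file name and classes are made up):
# scores = zero_shot_classify(model, preprocess, tokenizer, "dog.jpg", ["dog", "cat", "bird"])
# print(max(scores, key=scores.get))
```

The same normalized embeddings also power retrieval: index the image embeddings and rank them against a text query (or vice versa) by the same cosine similarity.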

Built for researchers, developers, and AI enthusiasts alike — let’s reimagine visual understanding together.

Would love your feedback! 💬👇