
OmniParser V2
Turn any LLM into a Computer Use Agent
User Experience
Artificial Intelligence
GitHub
Computers
Featured on February 15th, 2025
OmniParser ‘tokenizes’ UI screenshots, converting raw pixels into structured elements that LLMs can interpret. Given the set of parsed interactable elements, an LLM can then do retrieval-based next-action prediction, choosing which element to act on.
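To make the idea concrete, here is a minimal sketch of the loop described above: parsed elements are listed in a prompt, the LLM replies with an element id, and that id is mapped back to a screen coordinate. Everything in it is an assumption for illustration: the `UIElement` schema, `build_prompt`, and `click_point` are hypothetical names, not OmniParser's actual output format or API, and the sample elements are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    """One parsed screen element (hypothetical schema, not OmniParser's real output)."""
    element_id: int
    kind: str                        # e.g. "icon" or "text"
    label: str                       # caption or recognized text
    bbox: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    interactable: bool


def build_prompt(task: str, elements: list[UIElement]) -> str:
    """List only the interactable elements so the LLM can pick one by id."""
    lines = [f"Task: {task}", "Interactable elements:"]
    for el in elements:
        if el.interactable:
            lines.append(f"[{el.element_id}] {el.kind}: {el.label}")
    lines.append("Reply with the id of the element to click next.")
    return "\n".join(lines)


def click_point(element_id: int, elements: list[UIElement]) -> tuple[int, int]:
    """Map the id chosen by the LLM back to the center of its bounding box."""
    by_id = {el.element_id: el for el in elements if el.interactable}
    left, top, right, bottom = by_id[element_id].bbox
    return ((left + right) // 2, (top + bottom) // 2)


# Made-up parse result for a single screenshot.
elements = [
    UIElement(0, "text", "Search results", (40, 20, 300, 48), False),
    UIElement(1, "icon", "Settings", (900, 16, 932, 48), True),
    UIElement(2, "text", "Sign in", (820, 20, 880, 44), True),
]

prompt = build_prompt("Open the settings page", elements)
target = click_point(1, elements)  # pretend the LLM answered "1"
print(prompt)
print(target)
```

The point of the design is that the LLM never has to reason about pixels: it selects from a short list of ids, and the coordinate lookup stays in ordinary code.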
Top comment

Product of the Day: 3rd
Microsoft Research has unveiled its own Computer Use model, trained on a ton of labeled screenshots.
V2 achieves a 60% improvement in latency compared to V1 (average latency: 0.6 s/frame on an A100, 0.8 s/frame on a single 4090).
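For a sense of scale, the per-frame latencies quoted above can be converted into throughput. This is only arithmetic on the two figures from the comment, assuming frames are processed one after another with no batching; the dictionary names are illustrative.

```python
# Average per-frame latency in seconds, as quoted in the comment.
latency_s = {"A100": 0.6, "4090": 0.8}

# Sequential throughput is the reciprocal of per-frame latency.
frames_per_second = {gpu: 1.0 / seconds for gpu, seconds in latency_s.items()}

for gpu, fps in frames_per_second.items():
    print(f"{gpu}: {fps:.2f} frames/s")
# A100: 1.67 frames/s
# 4090: 1.25 frames/s
```

At these rates a parse takes under a second per screenshot on either GPU, which means roughly one to two parsed screens per second.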