OmniParser V2

Turn any LLM into a Computer Use Agent

OmniParser ‘tokenizes’ UI screenshots, converting them from pixel space into structured elements that LLMs can interpret. This lets an LLM perform retrieval-based next-action prediction over the set of parsed interactable elements.
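To make the idea concrete, here is a minimal Python sketch of how parsed elements might be handed to an LLM and how its answer could be mapped back to a click position. The `UIElement` schema, `build_prompt`, and `resolve_click` are hypothetical names invented for illustration; they are not OmniParser's actual output format or API, and the LLM reply is hard-coded rather than fetched from a model.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    """One parsed screen element (hypothetical schema, not OmniParser's real output)."""
    element_id: int
    label: str                                # short description of the element
    bbox: tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    interactable: bool


def build_prompt(task: str, elements: list[UIElement]) -> str:
    """List only the interactable elements so the LLM can pick one by ID."""
    lines = [f"Task: {task}", "Interactable elements:"]
    for el in elements:
        if el.interactable:
            lines.append(f"[{el.element_id}] {el.label}")
    lines.append("Reply with the ID of the element to click.")
    return "\n".join(lines)


def resolve_click(reply: str, elements: list[UIElement]) -> tuple[float, float]:
    """Map the model's reply (an element ID) to the center of that element's box."""
    by_id = {el.element_id: el for el in elements if el.interactable}
    el = by_id[int(reply.strip())]
    x0, y0, x1, y1 = el.bbox
    return ((x0 + x1) / 2, (y0 + y1) / 2)


# Hand-written stand-in for a parsed screenshot (assumed data).
elements = [
    UIElement(0, "Search text field", (100, 40, 500, 80), True),
    UIElement(1, "Search button", (520, 40, 600, 80), True),
    UIElement(2, "Page title", (100, 0, 400, 30), False),
]

prompt = build_prompt("Run the search", elements)
# A real agent would send `prompt` to an LLM; the reply "1" is assumed here.
click_xy = resolve_click("1", elements)
print(prompt)
print(click_xy)
```

The point of the sketch is that the LLM never sees raw pixels: it chooses among a short list of labeled elements, and the chosen ID is translated back into screen coordinates.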

Top comment

Microsoft Research has unveiled their own Computer Use model trained on a ton of labeled screenshots.


V2 achieves a 60% improvement in latency compared to V1 (average latency: 0.6 s/frame on an A100, 0.8 s/frame on a single 4090).
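For a rough sense of scale, the quoted per-frame latencies can be converted to throughput. This is a back-of-the-envelope Python sketch that assumes frames are processed one at a time with no batching or pipelining; the variable names are illustrative only.

```python
# Average latencies quoted in the comment, in seconds per frame.
latency_s_per_frame = {"A100": 0.6, "single 4090": 0.8}

# Throughput is the reciprocal of latency under the sequential assumption.
throughput_fps = {gpu: 1.0 / s for gpu, s in latency_s_per_frame.items()}

for gpu, fps in throughput_fps.items():
    print(f"{gpu}: {fps:.2f} frames/s")
```

That works out to roughly 1.67 frames/s on the A100 and 1.25 frames/s on the 4090.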