
Llama 3.1-405B: an open source model to rival GPT-4o / Claude-3.5

Meta is releasing three models: the new 3.1-405B, plus upgrades to its smaller models, 3.1-70B and 3.1-8B. If 405B is as good as the benchmarks indicate, this would be the first time an open source model has rivaled the best closed models, a profound shift.

Top comment

This could be The One: the open source model that closes the gap with the top closed models like GPT-4o / Claude-3.5. It's a "curves-crossing" moment reminiscent of how the Intel vs. ARM battle played out, perhaps with similarly profound effects on the landscape.

If you're in SF, join us tonight for a meetup and 405B panel including founders of Vercel and JuliusAI: https://lu.ma/4es9bfgs

Also, one time only: launch your product TODAY using Llama-3 405b and we'll feature it (this won't prevent you from launching in the near future). The top launches will be eligible to demo tonight after the panel.

In private conversations with launch partners, Meta has emphasized 405B's reasoning capability and multilingual abilities. This would seem to have big implications for interfaces, especially voice interfaces. Are people finding that the model lives up to this in practice?

Some more thoughts from a friend, @kwindla (Daily.co), who is a launch partner for 405B:

"1. 405B beats GPT-4o on 11 of 13 widely used benchmarks. Meta/FAIR has a history of being careful about these benchmarks, so they almost certainly went to a lot of effort to keep training data from leaking into the test sets. No open source model has previously come close to GPT-4o / Claude-3.5. It's a huge, huge deal if this is accurate and reflects the quality of 'reasoning' the model can do.

2. The two smaller 3.1 models (70B and 8B) also made big leaps in benchmark performance. That indicates Meta's training/distillation strategy is working. Having models small enough to run on single devices (or, on LPUs, very fast and inexpensively) that are this good may be equivalent to leap-frogging GPT-4o-mini. This also gives people the opportunity to experiment with fine-tuning really good models and with architecture/merge experiments.

3. Big models have a different 'tone/vibe' than small models. 3-70B was a pretty good model in a lot of ways, but as a conversational agent it just didn't feel as good qualitatively as GPT-4o and Claude-3.5. That feel really matters in things like consumer-facing voice chat use cases. If 405B is approximately as good as the proprietary models on benchmarks, *and* matches their 'vibe' for the first time, that's truly exciting for a whole range of next-generation conversational/interactive use cases."