[Dashboard charts, data not loaded: product upvotes vs the next 3; product comments vs the next 3; product upvote speed vs the next 3; product upvotes and comments; product vs the next 3]

The Alexandria Index

Massive internet datasets, embedded, open-sourced and free

Vector embeddings are incredibly powerful, compressing text into a low-dimensional 'meaning space', but they're also shockingly cheap. Alexandria is an open-source project to embed and release (for free) large datasets in research, law, medicine and more.

Top comment

Today, I'm announcing Alexandria, an open-source initiative to embed the internet. To start, we're releasing the embeddings for every research paper on the arXiv. That's over 4m items, 600m tokens, and 3.07 billion vector dimensions. We're not stopping here.

A significant number of the world's problems are just search, clustering, recommendation, or classification; all things embeddings are great at. For example, finding research papers via keywords is hard when there are 10 words that mean the same thing. Embeddings make this easy.

Embeddings are also a one-time cost and are incredibly cheap. In most cases, you'll never need to compute the same document twice. At the moment, we're embedding tokens at high performance for $1 per 100,000,000 tokens. That's the length of the Bible, 10 times, per dollar.

I was surprised when I couldn't find any open embedding datasets (research, law, finance, etc.), considering the immense value and low cost. There's too much to be built here... so we're building an org. and doing it ourselves.

You can download the arXiv embeddings (titles and abstracts, 6 GB and 8 GB respectively) at the link above. There are a lot of datasets to choose from, so we need your help to figure out what to work on next. Let us know by voting!

Note: Embeddings are most often used for search / question answering, so we're building those ourselves. Our arXiv embedding search launches next week, with more to come. We're also experimenting on an AI agent personal research assistant that helps you learn, teach, and publish.
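To make the search use case in the comment concrete, here is a minimal sketch of ranking documents by cosine similarity to a query embedding, plus the cost arithmetic implied by the quoted price. It is not Alexandria's code: the function names, the three-dimensional toy vectors, and the document IDs are all hypothetical, and the format of the released files is not described in the comment, so nothing here reads them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=3):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


def embedding_cost_usd(num_tokens, usd_per_100m_tokens=1.0):
    """Cost at the rate quoted in the comment: $1 per 100,000,000 tokens."""
    return num_tokens / 100_000_000 * usd_per_100m_tokens


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
toy_corpus = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.0, 1.0, 0.0],
    "paper_c": [0.7, 0.3, 0.1],
}

results = top_k([1.0, 0.0, 0.0], toy_corpus, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")

# The 600m tokens mentioned in the comment, at the quoted rate.
print(f"${embedding_cost_usd(600_000_000):.2f}")
```

A brute-force scan like this is fine for a handful of vectors. For a corpus of millions of items, an approximate nearest-neighbour index would be the usual choice, though the comment does not say what Alexandria uses.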

About The Alexandria Index on Product Hunt

The Alexandria Index launched on Product Hunt on May 27th, 2023 and earned 133 upvotes and 8 comments, placing #6 on the daily leaderboard.

On the analytics side, The Alexandria Index competes within Open Source, Artificial Intelligence and Data — topics that collectively have 537.4k followers on Product Hunt. The dashboard above tracks how The Alexandria Index performed against the three products that launched closest to it on the same day.

Who hunted The Alexandria Index?

The Alexandria Index was hunted by Chris Messina. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.

For a complete overview of The Alexandria Index including community comment highlights and product details, visit the product overview.