Bagel

Unified model for multimodal understanding and generation

BAGEL by ByteDance-Seed is an Apache 2.0 open-source unified multimodal model for advanced image/text understanding, generation, editing, and navigation, with capabilities comparable to proprietary systems.

Top comment

Hi everyone!

ByteDance-Seed has released BAGEL, an open-source model that handles both images and text. It's built to understand and create content in both modalities, offering an open alternative to some of the well-known proprietary systems.

With BAGEL, you can chat using images and text, generate realistic images, edit pictures while keeping important details, transfer styles, and navigate environments based on what it learned from video. It also has a "thinking" mode, which aims to improve outputs by reasoning about the prompt in more detail before responding.

BAGEL uses a Mixture-of-Transformer-Experts (MoT) architecture and was trained on a large amount of mixed image, text, and video data. It's open-source under the Apache 2.0 license, so you can fine-tune it and use it in your own projects.

BAGEL performs well on standard benchmarks for multimodal understanding and generation. The team also reports that its image generation quality is comparable to some dedicated image models, and that it can handle more advanced tasks like free-form image manipulation and world navigation.

You can try out the model with this demo.