# :chestnut: SEED Multimodal

[![Project Homepage](https://img.shields.io/badge/Project-Homepage-green)](https://ailab-cvc.github.io/seed/) [![arXiv](https://img.shields.io/badge/arXiv-2307.08041-b31b1b.svg)](https://arxiv.org/abs/2307.08041) [![arXiv](https://img.shields.io/badge/arXiv-2310.01218-b31b1b.svg)](https://arxiv.org/abs/2310.01218) [![Static Badge](https://img.shields.io/badge/Model-Huggingface-yellow)](https://huggingface.co/AILab-CVC/SEED/tree/main) [![Demo](https://img.shields.io/badge/Gradio-Demo-orange)](https://10a4e7976e6fc2032c.gradio.live/)

**Powered by [CV Center, Tencent AI Lab](https://ailab-cvc.github.io), and [ARC Lab, Tencent PCG](https://github.com/TencentARC).**

![image](https://github.com/AILab-CVC/SEED/blob/main/paper_images/milestone.jpg)

This repository provides the official implementation of [SEED](https://ailab-cvc.github.io/seed/seed.html) and [SEED-LLaMA](https://ailab-cvc.github.io/seed/seed_llama.html). For any inquiries, please email [seed-x@googlegroups.com](mailto:seed-x@googlegroups.com).

## News

**:beers: We are actively looking for self-motivated interns. Please feel free to reach out if you are interested. :beers:**

- [x] **2023-10-23** :hugs: We have reduced the memory overhead. With 8-bit quantization and dynamic loading, SEED-LLaMA 8B/14B can run on a single **16GB/24GB** GPU.
- [x] **2023-10-23** :hugs: All model weights are **downloaded automatically** when the demo starts.
- [x] **2023-10-20** :hugs: We release the [checkpoints](https://huggingface.co/AILab-CVC/SEED/tree/main) and code of the SEED-2 tokenizer and SEED-LLaMA-8B/14B.
- [x] **2023-10-20** :space_invader: We release an online [Gradio demo](https://10a4e7976e6fc2032c.gradio.live/); feel free to try it yourself.
- [x] **2023-10-02** :paperclip: We release the technical report of SEED-LLaMA on [arXiv](https://arxiv.org/abs/2310.01218), which is empowered by the improved SEED-2 tokenizer.
- [x] **2023-07-29** :octocat: We release the checkpoint of the SEED tokenizer and its inference code. Check it out via [SEED-1](./SEED-1.md).
- [x] **2023-07-16** :paperclip: We release the technical report of SEED on [arXiv](https://arxiv.org/abs/2307.08041).

Stay tuned for updates!

## Brief Introduction

It is recommended to check out our [papers](#citation) for technical details.

### :speech_balloon: What can SEED-LLaMA do?

![image](https://github.com/AILab-CVC/SEED/blob/main/paper_images/v2/teaser.jpg)

**SEED-LLaMA** is capable of both multimodal comprehension and generation, and exhibits compositional emergent abilities such as multi-turn in-context multimodal generation, acting like your AI assistant. [[Compare to SOTA]](https://ailab-cvc.github.io/seed/seed_llama_compare.html) [[More examples on X]](https://twitter.com/ge_yixiao/status/1710509538238157069?s=20)

### :bulb: How does SEED-LLaMA achieve it?

![image](https://github.com/AILab-CVC/SEED/blob/main/paper_images/seed_overview.jpg)

The core of SEED-LLaMA is the tailored **SEED** tokenizer, which quantizes visual signals into discrete visual tokens that capture the necessary semantics while being produced with 1D causal dependency. [[SEED-2 vs. SEED-1]](https://ailab-cvc.github.io/seed/seed_llama.html)
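To make the tokenizer's role concrete, below is a minimal sketch of the tokenize/de-tokenize round trip. Note that `SeedTokenizer` and its `from_pretrained` / `encode` / `decode` methods are hypothetical stand-ins for illustration only; the actual pipeline lives in `scripts/seed_tokenizer_inference.py`.

```python
# Illustrative sketch only: `SeedTokenizer` and its methods are hypothetical
# stand-ins; see scripts/seed_tokenizer_inference.py for the real entry point.
import torch
from PIL import Image

# Load the SEED-2 tokenizer (encoder) and the unCLIP SD-UNet de-tokenizer (decoder).
tokenizer = SeedTokenizer.from_pretrained("AILab-CVC/seed-tokenizer-2").to("cuda")

image = Image.open("example.jpg").convert("RGB")

with torch.no_grad():
    # Encode: one image -> a short 1D sequence of discrete code IDs with causal
    # dependency, so an LLM can predict them left-to-right like text tokens.
    visual_codes = tokenizer.encode(image)           # LongTensor of code IDs

    # Decode: reconstruct a semantically faithful (not pixel-identical) image
    # from the codes via the off-the-shelf unCLIP SD-UNet.
    reconstruction = tokenizer.decode(visual_codes)  # PIL.Image

reconstruction.save("example_reconstructed.jpg")
```

Because the visual codes are discrete and causally ordered, they can be interleaved with text tokens in a single vocabulary, which is what lets SEED-LLaMA both read and generate images within one autoregressive model.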
## Usage

### Dependencies

- Python >= 3.8 (we recommend using [Anaconda](https://www.anaconda.com/download/#linux))
- [PyTorch >= 1.11.0](https://pytorch.org/)
- NVIDIA GPU + [CUDA](https://developer.nvidia.com/cuda-downloads)

### Installation

Clone the repo and install the dependent packages:

```bash
git clone https://github.com/AILab-CVC/SEED.git
cd SEED
pip install -r requirements.txt
```

### Model Weights

We release the pretrained SEED tokenizer and de-tokenizer, as well as the pretrained and instruction-tuned SEED-LLaMA-8B and SEED-LLaMA-14B, on [SEED Hugging Face](https://huggingface.co/AILab-CVC/SEED).

- SEED tokenizer weights: [AILab-CVC/seed-tokenizer-2](https://huggingface.co/AILab-CVC/seed-tokenizer-2)
- SEED-LLaMA (8B) weights: [AILab-CVC/seed-llama-8b-sft](https://huggingface.co/AILab-CVC/seed-llama-8b-sft)
- SEED-LLaMA (14B) weights: [AILab-CVC/seed-llama-14b-sft](https://huggingface.co/AILab-CVC/seed-llama-14b-sft)

The weights of the unCLIP SD-UNet, which is used to reconstruct images, are downloaded automatically.

### Inference for visual tokenization and de-tokenization

To discretize an image into 1D visual codes with causal dependency, and to reconstruct the image from those codes using the off-the-shelf unCLIP SD-UNet:

```bash
cd ..  # SEED/
python scripts/seed_tokenizer_inference.py
```

### Inference for SEED-LLaMA

Because SEED-LLaMA-8B is based on Vicuna-7B and SEED-LLaMA-14B on LLaMA2-Chat-13B, we use Vicuna-7B's (`USER:` / `ASSISTANT:`) and LLaMA2-Chat-13B's (`[INST]` / `[/INST]`) prompt formats for the respective instruction tuning.

```bash
# Inference for SEED-LLaMA-8B
python scripts/seed_llama_inference_8B.py
```

```bash
# Inference for SEED-LLaMA-14B
python scripts/seed_llama_inference_14B.py
```

### Launching the Gradio Demo of SEED-LLaMA Locally

1. The local demo of SEED-LLaMA-14B currently requires a single **24GB** GPU.

```bash
# SEED/
# in the first terminal
bash scripts/start_backend_14b.sh
# in the second terminal
bash scripts/start_frontend_14b.sh
```

2. The local demo of SEED-LLaMA-8B currently requires a single **16GB** GPU.

```bash
# SEED/
# in the first terminal
bash scripts/start_backend_8b.sh
# in the second terminal
bash scripts/start_frontend_8b.sh
```

The demo can then be accessed at http://127.0.0.1:80.

## Citation

If you find this work helpful, please consider citing:

```bibtex
@article{ge2023making,
  title={Making LLaMA SEE and Draw with SEED Tokenizer},
  author={Ge, Yuying and Zhao, Sijie and Zeng, Ziyun and Ge, Yixiao and Li, Chen and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2310.01218},
  year={2023}
}

@article{ge2023planting,
  title={Planting a seed of vision in large language model},
  author={Ge, Yuying and Ge, Yixiao and Zeng, Ziyun and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2307.08041},
  year={2023}
}
```

The project is still in progress.

## License

`SEED` is released under the [Apache License Version 2.0](License.txt). `SEED-LLaMA` is released under the original [license](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) of [LLaMA2](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf).

## Acknowledgement

We thank the great work of [unCLIP SD](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip) and [BLIP2](https://github.com/salesforce/LAVIS).