---
license: apache-2.0
library_name: peft
tags:
- generated_from_trainer
base_model: BioMistral/BioMistral-7B
model-index:
- name: spanish_medica_llm
  results: []
datasets:
- somosnlp/SMC
language:
- es
pipeline_tag: text-generation
widget:
- text: "En el contexto médico, ¿qué es el antígeno linfocitario CD73?"
  example_title: "Pregunta sobre medicamento"
  # output:
  #   text: "Factor inmunológico"
- text: "En el contexto médico, ¿qué es la abdominopatía aguda?"
  example_title: "Pregunta sobre síntomas"
  # output:
  #   text: "Signo o síntoma"
- text: "Diga el tratamiento de un caso con la siguiente anamnesis: Mujer de 68 años, conocida por el servicio desde septiembre de 2009, alérgica a betalactámicos y contrastes yodados, con antecedentes de: HTA, dislipidemia, depresión, incidentaloma hipofisario, con déficit de GH e hipotiroidismo (2006), cefalea mixta (migrañosa y tensional), reflujo gastroesofágico y fracturas traumáticas D9, D11 por accidente de tráfico (1995). Sometida en diciembre de 2010 a tumorectomía dirigida y mamoplastia oncoterapéutica bilateral..."
  example_title: "Tratamiento médico"
  # output:
  #   text: "Neoplasia metastásica"
---

# Model Card for SpanishMedicaLLM

More than 600 million Spanish-speaking people need resources, such as LLMs, to obtain medical information freely and safely, in line with the United Nations Sustainable Development Goals of Good Health and Well-Being, Quality Education, and No Poverty. There are few LLMs for the medical domain in the Spanish language.

The objective of this project is to create a large language model (LLM) for the medical context in Spanish, enabling the creation of health information solutions and services in LATAM. The model covers conventional, natural, and traditional medicine. One output of the project is a public medical-domain dataset that pools resources from other sources and can be used to create or fine-tune LLMs. The performance of the LLM is compared with other state-of-the-art models such as BioMistral, Meditron, and Med-PaLM.

[**Model Card in Spanish**](README_es.md)

## Model Details

### Model Description

- **Developed by:** [Dionis López Ramos](https://www.linkedin.com/in/dionis-lopez-ramos/), [Alvaro Garcia Barragan](https://huggingface.co/Alvaro8gb), [Dylan Montoya](https://huggingface.co/dylanmontoya22), [Daniel Bermúdez](https://huggingface.co/Danielbrdz)
- **Funded by:** SomosNLP, HuggingFace
- **Model type:** Language model, instruction tuned
- **Language(s):** Spanish (`es-ES`, `es-CL`)
- **License:** apache-2.0
- **Fine-tuned from model:** [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B)
- **Dataset used:** [somosnlp/SMC](https://huggingface.co/datasets/somosnlp/SMC)

### Model Sources

- **Repository:** [spaces/somosnlp/SpanishMedicaLLM](https://huggingface.co/spaces/somosnlp/SpanishMedicaLLM/tree/main)
- **Paper:** Coming soon!
- **Demo:** [spaces/somosnlp/SpanishMedicaLLM](https://huggingface.co/spaces/somosnlp/SpanishMedicaLLM)
- **Video presentation:** [SpanishMedicaLLM | Proyecto Hackathon #SomosNLP](https://www.youtube.com/watch?v=tVe_MC7Da6k)

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

The creators of the LLM are not responsible for any harmful results it may generate. A rigorous evaluation process of the generated results, carried out with specialists, is recommended.

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

[More Information Needed]

## How to Get Started with the Model

Use the code below to get started with the model.
```
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

# Load the adapter configuration and the BioMistral base model,
# then attach the LoRA adapter weights on top of it.
config = PeftConfig.from_pretrained("somosnlp/spanish_medica_llm")
model = AutoModelForCausalLM.from_pretrained("BioMistral/BioMistral-7B")
model = PeftModel.from_pretrained(model, "somosnlp/spanish_medica_llm")
```

A hedged end-to-end generation sketch is provided in the "Usage Sketch" section at the end of this card.

## Training Details

### Training Data

The dataset used was [somosnlp/SMC](https://huggingface.co/datasets/somosnlp/SMC).

### Training Procedure

#### Training Hyperparameters

**Training regime:**

- learning_rate: 2.5e-05
- train_batch_size: 16
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 5
- training_steps: 2
- mixed_precision_training: Native AMP

A hedged `TrainingArguments` sketch mirroring these values is given in the "Training Configuration Sketch" section at the end of this card.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The test corpus was a 20% split of [somosnlp/SMC](https://huggingface.co/datasets/somosnlp/SMC).

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** GPU
- **Hours used:** 4 hours
- **Cloud Provider:** [Hugging Face](https://huggingface.co)
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

### Model Architecture and Objective

The architecture is that of [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B), chosen because it is a foundation model trained on a medical-domain dataset.

### Compute Infrastructure

[More Information Needed]

#### Hardware

Nvidia T4 Small: 4 vCPU, 15 GB RAM, 16 GB VRAM

#### Software

- transformers==4.38.0
- torch>=2.1.1+cu113
- trl @ git+https://github.com/huggingface/trl
- peft
- wandb
- accelerate
- datasets

## License

Apache License 2.0

## Citation

**BibTeX:**

```
@software{lopez2024spanishmedicallm,
  author = {López Ramos, Dionis and Garcia Barragan, Alvaro and Montoya, Dylan and Bermúdez, Daniel},
  title = {SpanishMedicaLLM},
  month = feb,
  year = 2024,
  url = {https://huggingface.co/somosnlp/spanish_medica_llm}
}
```

## More Information

This project was developed during the [Hackathon #Somos600M](https://somosnlp.org/hackathon) organized by SomosNLP. The model was trained using GPUs sponsored by HuggingFace.

**Team:**

- [Dionis López Ramos](https://huggingface.co/inoid)
- [Alvaro Garcia Barragan](https://huggingface.co/Alvaro8gb)
- [Dylan Montoya](https://huggingface.co/dylanmontoya22)
- [Daniel Bermúdez](https://huggingface.co/Danielbrdz)

## Contact

For any questions or suggestions, contact: PhD Dionis López (inoid2007@gmail.com)
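## Usage Sketch

A minimal end-to-end generation sketch complementing the loading snippet in "How to Get Started with the Model". The prompt, precision, and generation parameters below are illustrative assumptions, not settings taken from the project's code.

```
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the base model; half precision is an
# assumption made here so the model fits on a 16 GB GPU.
tokenizer = AutoTokenizer.from_pretrained("BioMistral/BioMistral-7B")
base_model = AutoModelForCausalLM.from_pretrained(
    "BioMistral/BioMistral-7B",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the fine-tuned LoRA adapter weights.
model = PeftModel.from_pretrained(base_model, "somosnlp/spanish_medica_llm")
model.eval()

# Illustrative prompt, taken from the widget examples above.
prompt = "En el contexto médico, ¿qué es el antígeno linfocitario CD73?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here only to make the sketch deterministic; sampling parameters can be tuned to match the demo's behavior.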
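## Training Configuration Sketch

A hedged sketch of a `transformers` `TrainingArguments` object mirroring the hyperparameters reported under "Training Procedure". The numeric values come from this card; `output_dir` and the `optim`/`fp16` flags are assumptions (the card reports Adam with the default betas/epsilon and native AMP, which these flags reproduce).

```
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./spanish_medica_llm",  # assumption: placeholder path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,      # 16 x 4 = total train batch size of 64
    learning_rate=2.5e-5,
    lr_scheduler_type="linear",
    warmup_steps=5,
    max_steps=2,                        # training_steps reported above
    seed=42,
    optim="adamw_torch",                # Adam, betas=(0.9, 0.999), eps=1e-08
    fp16=True,                          # mixed precision (native AMP)
)
```

These arguments would typically be passed, together with the model and the SMC dataset, to a `Trainer` or to `trl`'s `SFTTrainer`, which is listed in the software dependencies.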