Daniel van Strien PRO

davanstrien

https://danielvanstrien.xyz/

vanstriendaniel

davanstrien

AI & ML interests

Machine Learning Librarian

Articles

Synthetic dataset generation techniques: Self-Instruct

3 days ago

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

11 days ago

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Mar 20

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023

Huggy Lingo: Using Machine Learning to Improve Language Metadata on the Hugging Face Hub

Aug 2, 2023

The Hugging Face Hub for Galleries, Libraries, Archives and Museums

Jun 12, 2023

Introducing BERTopic Integration with Hugging Face Hub

May 31, 2023

Jupyter X Hugging Face

Mar 23, 2023

Image search with 🤗 datasets

Mar 16, 2022

Organizations

davanstrien's activity

commented 4 papers 3 days ago

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 28

Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning

Paper • 2307.03692 • Published Jul 5, 2023 • 24

Self-Alignment with Instruction Backtranslation

Paper • 2308.06259 • Published Aug 11, 2023 • 38

Generative AI for Synthetic Data Generation: Methods, Challenges and the Future

Paper • 2403.04190 • Published Mar 7

commented a paper 9 days ago

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published 15 days ago • 92

New activity in argilla/Capybara-Preferences 9 days ago

Update README.md

#1 opened 9 days ago by davanstrien

commented 12 papers 10 days ago

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

Paper • 2403.16187 • Published Mar 24

CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model

Paper • 2403.08350 • Published Mar 13

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

Paper • 2404.10306 • Published Apr 16

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

Paper • 2403.11808 • Published Mar 18

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

Paper • 2403.09192 • Published Mar 14

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Paper • 2403.14608 • Published Mar 21

MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA based Mixture of Experts

Paper • 2404.15159 • Published 26 days ago

BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models

Paper • 2404.02827 • Published Apr 3

Enhancing Pre-Trained Generative Language Models with Question Attended Span Extraction on Machine Reading Comprehension

Paper • 2404.17991 • Published 20 days ago

GeMQuAD : Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning

Paper • 2404.09163 • Published Apr 14

IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages

Paper • 2404.16816 • Published 22 days ago • 1

Optimizing Language Model's Reasoning Abilities with Weak Supervision

Paper • 2405.04086 • Published 11 days ago • 1

New activity in davanstrien/cosmochat 11 days ago

Improve third turn

#1 opened 11 days ago by davanstrien

New activity in teknium/openhermes 12 days ago

update tag

#5 opened 12 days ago by davanstrien

New activity in DIBT/aya_dutch_dpo 15 days ago

Librarian Bot: Add language metadata for dataset

#1 opened 26 days ago by librarian-bot

New activity in ibm/KVP10k 15 days ago

add minimal card template with citation info

#2 opened 15 days ago by davanstrien

New activity in Harvard-Edge/Wake-Vision 15 days ago

add minimal dataset card with link to paper

#5 opened 15 days ago by davanstrien

New activity in tomasonjo/synthetic-text2cypher-gpt4turbo 16 days ago

Update README.md

#1 opened 16 days ago by davanstrien

New activity in avramandrei/histnero 16 days ago

Add link to paper

#2 opened 16 days ago by davanstrien

New activity in PleIAs/Post-OCR-Correction 22 days ago

tags and typo

#2 opened 22 days ago by davanstrien

New activity in Eladio/emrqa-msquad 26 days ago

add outline for dataset card

#2 opened 26 days ago by davanstrien

New activity in argilla/argilla-template-space-with-oauth 29 days ago

Bump Argilla version

#3 opened 29 days ago by davanstrien

New activity in argilla/demo 29 days ago

Bump Argilla

#2 opened 29 days ago by davanstrien

New activity in DIBT-Dutch/prompt-translation-for-Dutch 29 days ago

Bump argilla version to 1.27.0

#2 opened 29 days ago by davanstrien

New activity in 2A2I/prompt-translation-for-Arabic 29 days ago

Upgrade argilla version to 1.27.0

#2 opened 29 days ago by davanstrien

New activity in mistralai/Mixtral-8x22B-Instruct-v0.1 about 1 month ago

Add language metadata to model card

#5 opened about 1 month ago by davanstrien

New activity in BramVanroy/orca_dpo_pairs_dutch about 1 month ago

Update metadata to add DPO tag and remove deprecated tag

#3 opened about 1 month ago by davanstrien

add dpo tag

#2 opened about 1 month ago by davanstrien

New activity in 5CD-AI/Vietnamese-Intel-orca_dpo_pairs-gg-translated about 1 month ago

add dpo tag

#2 opened about 1 month ago by davanstrien

New activity in kyujinpy/orca_math_dpo about 1 month ago

add dpo tag

#2 opened about 1 month ago by davanstrien

New activity in HuggingFaceH4/orca_dpo_pairs about 1 month ago

Add dpo tag

#4 opened about 1 month ago by davanstrien

Add dpo tag

#3 opened about 1 month ago by davanstrien

New activity in mlabonne/chatml_dpo_pairs about 1 month ago

Add DPO tag

#2 opened about 1 month ago by davanstrien

New activity in Intel/orca_dpo_pairs about 1 month ago

Add DPO tag

#4 opened about 1 month ago by davanstrien

New activity in SAGI-1/ultrafeedback_binarized_dpo about 1 month ago

add dpo tag

#2 opened about 1 month ago by davanstrien

New activity in xcodemind/vision2ui about 1 month ago

Add abstract from the paper to give a bit more context for what the dataset is about.

#2 opened about 1 month ago by davanstrien

New activity in DIBT/TemplateDashboardPromptTranslation about 1 month ago

Add background scheduler

#1 opened about 2 months ago by ZennyKenny

New activity in DIBT-Czech/Dashboard about 2 months ago

Update README.md

#1 opened about 2 months ago by davanstrien

New activity in DIBT-Persian/prompt-translation-for-Persian about 2 months ago

Update argilla to v1.26.1

#2 opened about 2 months ago by davanstrien

Update argilla to v1.26.1

#1 opened about 2 months ago by davanstrien

New activity in 2A2I/prompt-translation-for-Arabic about 2 months ago

How can I modify this space for Prompt Translation for other Languages?

#1 opened about 2 months ago by musfiqdehan

New activity in Bhawna/ChroniclingAmericaQA about 2 months ago

add basic dataset card

#2 opened about 2 months ago by davanstrien

New activity in davanstrien/dataset-tldr about 2 months ago

Is this a good idea?

#1 opened about 2 months ago by davanstrien

New activity in confit/wmms about 2 months ago

Add task categories

#1 opened about 2 months ago by davanstrien

New activity in louisbrulenaudet/code-rural-ancien about 2 months ago

add synthetic tag

#1 opened about 2 months ago by davanstrien

New activity in argilla/argilla-template-space-with-oauth 2 months ago

Suggest making allowed workspace default to "public"

#1 opened 2 months ago by davanstrien

New activity in NickyNicky/gemma-2b-it_oasst2_chatML_Cluster2_aya_multilingual_10k_prompts_ranked_all_json_V1 2 months ago

very cool!

#1 opened 2 months ago by dvilasuero

New activity in huggingface/cookbook-images 2 months ago

Upload two new images

#11 opened 2 months ago by sdiazlor

New activity in DIBT/prompt-collective-dashboard 2 months ago

Top contributors and Overall contributors stats Not updating

#3 opened 2 months ago by flozi00

New activity in DIBT/10k_prompts_ranked 3 months ago

Dataset error on generation

#8 opened 3 months ago by ArkaAbacus

Upload dataset

#9 opened 3 months ago by davanstrien

add link to space

#7 opened 3 months ago by davanstrien

remove placeholder texts

#6 opened 3 months ago by davanstrien

add illustration image

#5 opened 3 months ago by davanstrien

Articles

Data is better together

Extracting Insights from Model Cards Using Open Large Language Models

Creating open machine learning datasets? Share them on the Hugging Face Hub!