Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
davanstrien 
posted an update 17 days ago
Post
1500
Can you create domain-specific synthetic datasets in under 20 minutes?

@burtenshaw recently launched the Domain Specific Dataset Project as part of Data is Better Together. As part of this, Ben created a Space that you can use to define some key perspectives and concepts from a domain. This seed dataset can then be used to generate a synthetic dataset for a particular domain.

In less than 30 minutes this afternoon, I created a domain-specific dataset focused on data-centric machine learning using these tools: davanstrien/data-centric-ml-sft.

You can create your own domain specific datasets using this approach. Find the steps to follow here: https://github.com/huggingface/data-is-better-together/blob/main/domain-specific-datasets/README.md

quite cool, love quick solutions like this!