Common Corpus Collection: The largest public domain dataset for training LLMs. • 26 items • Updated Mar 20