Normalize size

by christopher - opened
BigScience Data org

Size currently uses the heuristic sqrt(num_bytes) so that English and the other large subsets don't dominate. What's a better rank-preserving transformation that stays faithful to the data but also makes smaller subsets visible?
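One way to compare candidates is to check two things for each transform: that it preserves the ordering of subsets, and how much of the total it gives the smallest subset. Any strictly increasing function keeps the ranks, so the choice comes down to how strongly it compresses large values. The sketch below compares a family of power transforms x**alpha (alpha=1 is linear, alpha=0.5 is the current sqrt) with log1p. The function names and the byte counts are hypothetical and only for illustration; they are not taken from the dataset.

```python
import math


def power_transform(num_bytes, alpha):
    """Map byte counts to display weights with x**alpha, 0 < alpha <= 1.

    alpha=1.0 is linear (faithful, but the largest subsets dominate);
    alpha=0.5 is the current sqrt heuristic; smaller alpha compresses more.
    """
    return [x ** alpha for x in num_bytes]


def log_transform(num_bytes):
    """Strongest compression of the three: log1p maps 0 bytes to weight 0."""
    return [math.log1p(x) for x in num_bytes]


def shares(weights):
    """Fraction of the total display size that each subset receives."""
    total = sum(weights)
    return [w / total for w in weights]


def ascending_order(values):
    """Indices of values from smallest to largest (equal orders = same ranks)."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical subset sizes in bytes, largest first.
sizes = [1e12, 1e10, 1e8, 1e6]

candidates = {
    "linear": power_transform(sizes, 1.0),
    "sqrt": power_transform(sizes, 0.5),
    "log1p": log_transform(sizes),
}

for name, weights in candidates.items():
    # Every strictly increasing transform keeps the original ordering.
    assert ascending_order(weights) == ascending_order(sizes)
    print(f"{name:>6}: smallest subset share = {shares(weights)[-1]:.6f}")
```

Intermediate exponents (for example alpha=0.25) sit between sqrt and log1p, so alpha can act as a single knob that trades faithfulness to the raw byte counts against visibility of the small subsets.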

BigScience Data org

By size I mean the area of the rectangles.
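Since size here means area, the transformed value is what each rectangle's area should be proportional to (so with sqrt, area grows with sqrt(num_bytes), not with num_bytes). A minimal sketch of that final scaling step follows, assuming a fixed width x height canvas; the function name and the 800x600 canvas are hypothetical, and positioning the rectangles is left to whatever treemap layout the visualization already uses.

```python
def rectangle_areas(weights, width, height):
    """Scale display weights so the rectangle areas exactly fill the canvas.

    Each area is proportional to its weight, so the ordering produced by a
    strictly increasing transform carries over to the rectangles unchanged.
    """
    total = sum(weights)
    canvas = width * height
    return [canvas * w / total for w in weights]


# Hypothetical subset sizes in bytes, largest first.
num_bytes = [1e12, 1e10, 1e8, 1e6]
weights = [x ** 0.5 for x in num_bytes]  # the current sqrt heuristic
areas = rectangle_areas(weights, 800, 600)
print([round(a, 1) for a in areas])
```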
