Competitions documentation

Competition Repo


NOTE: Competition repo must always be kept private. Do NOT make it public!

The competition repo consists of the following files:

├── COMPETITION_DESC.md
├── conf.json
├── DATASET_DESC.md
├── solution.csv
├── SUBMISSION_DESC.md
├── submission_info
│   └── *.json
├── submissions
│   └── *.csv
├── teams.json
└── user_team.json

COMPETITION_DESC.md

This file contains the description of the competition. It is a markdown file, so you can use markdown syntax to format the text and modify the file according to your needs. The competition description is shown on the front page of the competition.

DATASET_DESC.md

This file contains the description of the dataset. It is also a markdown file and is shown on the dataset page. In this file you can describe which columns are present in the dataset, what each column means, the format of the dataset, and so on.

conf.json

conf.json is the configuration file for the competition. An example conf.json is shown below:

{
   "COMPETITION_TYPE": "generic",
   "SUBMISSION_LIMIT": 5,
   "TIME_LIMIT": 10,
   "HARDWARE": "cpu-basic",
   "SELECTION_LIMIT": 10,
   "END_DATE": "2024-05-25",
   "EVAL_HIGHER_IS_BETTER": 1,
   "SUBMISSION_ID_COLUMN": "id",
   "SUBMISSION_COLUMNS": "id,pred",
   "SUBMISSION_ROWS": 10000,
   "EVAL_METRIC": "roc_auc_score",
   "LOGO": "https://github.com/abhishekkrthakur/public_images/blob/main/song.png?raw=true",
   "DATASET": "",
   "SUBMISSION_FILENAMES": ["submission.csv"],
   "SCORING_METRIC": "roc_auc_score"
}

This file is created when you create a new competition. You can modify this file according to your needs. However, we do not recommend changing the evaluation metric field once the competition has started as it would require you to re-evaluate all the submissions.

  • COMPETITION_TYPE: The type of competition. Two types are currently supported: generic and script.
    • In a generic competition, users submit a csv file (or a file in a different format), and the submissions are evaluated using a metric.
    • In a script competition, users submit a Hugging Face model repo containing a script.py. The script.py is run to generate submission.csv, which is then evaluated using a metric.
  • SUBMISSION_LIMIT: The number of submissions a user can make per day.
  • TIME_LIMIT: The time limit for each submission, in seconds (used only for script competitions).
  • HARDWARE: The hardware on which the submissions are evaluated.
  • SELECTION_LIMIT: The number of submissions that will be selected for the leaderboard (used only for script competitions).
  • END_DATE: The end date of the competition. The competition closes automatically on this date, and the private leaderboard becomes available.
  • EVAL_HIGHER_IS_BETTER: Whether a higher or a lower value of the evaluation metric is better. Use 1 if higher is better and 0 if lower is better.
  • SUBMISSION_ID_COLUMN: The name of the id column in the submission file.
  • SUBMISSION_COLUMNS: The names of the columns in the submission file, comma-separated without any spaces.
  • SUBMISSION_ROWS: The number of rows in the submission file, excluding the header.
  • EVAL_METRIC: The evaluation metric. All scikit-learn metrics are supported, as are custom metrics.
  • LOGO: The logo of the competition. It must be a png file and is shown on all pages of the competition.
  • DATASET: The PRIVATE dataset used in the competition. It is available to users only during the script run and is used only for script competitions.
  • SUBMISSION_FILENAMES: The name of the submission file. This is used only for script competitions with custom metrics and must not be changed for generic competitions.
  • SCORING_METRIC: When using a custom metric or multiple metrics, the name of the metric used to score the submissions.
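Before uploading an edited conf.json, it can help to sanity-check it locally against the field descriptions above. The following is a minimal sketch, not part of the competitions library: the helper name check_conf, the set of keys treated as required, and the assumption that END_DATE uses the YYYY-MM-DD format seen in the example are all hypothetical.

```python
import json
from datetime import date

# Assumption: these keys are treated as mandatory for this sketch only;
# the real tool may require a different set.
REQUIRED_KEYS = {
    "COMPETITION_TYPE", "SUBMISSION_LIMIT", "END_DATE", "EVAL_HIGHER_IS_BETTER",
    "SUBMISSION_ID_COLUMN", "SUBMISSION_COLUMNS", "SUBMISSION_ROWS", "EVAL_METRIC",
}


def check_conf(conf: dict) -> list:
    """Return a list of problems found in a parsed conf.json (empty if none)."""
    missing = sorted(REQUIRED_KEYS - set(conf))
    if missing:
        return [f"missing keys: {', '.join(missing)}"]

    problems = []
    if conf["COMPETITION_TYPE"] not in ("generic", "script"):
        problems.append("COMPETITION_TYPE must be 'generic' or 'script'")
    if conf["EVAL_HIGHER_IS_BETTER"] not in (0, 1):
        problems.append("EVAL_HIGHER_IS_BETTER must be 0 or 1")

    columns = conf["SUBMISSION_COLUMNS"]
    if " " in columns:
        problems.append("SUBMISSION_COLUMNS must not contain spaces")
    if conf["SUBMISSION_ID_COLUMN"] not in columns.split(","):
        problems.append("SUBMISSION_ID_COLUMN must be listed in SUBMISSION_COLUMNS")

    # Assumption: END_DATE follows the YYYY-MM-DD format used in the example.
    try:
        date.fromisoformat(conf["END_DATE"])
    except (TypeError, ValueError):
        problems.append("END_DATE must be in YYYY-MM-DD format")
    return problems


example = json.loads("""
{
   "COMPETITION_TYPE": "generic",
   "SUBMISSION_LIMIT": 5,
   "END_DATE": "2024-05-25",
   "EVAL_HIGHER_IS_BETTER": 1,
   "SUBMISSION_ID_COLUMN": "id",
   "SUBMISSION_COLUMNS": "id,pred",
   "SUBMISSION_ROWS": 10000,
   "EVAL_METRIC": "roc_auc_score"
}
""")
print(check_conf(example))  # prints []
```

Because json.loads rejects malformed JSON (for example a missing comma between fields), parsing the file this way also catches syntax errors before the competition reads it.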

solution.csv

This file contains the solution for the competition. It is a csv file. A sample is shown below:

id,pred,split
0,1,public
1,0,private
2,0,private
3,1,private
4,0,public
5,1,private
6,1,public
7,1,private
8,0,public
9,0,private
10,0,private
11,0,private
12,1,private
13,0,private
14,1,public

The solution file is used to evaluate the submissions. The solution file must always have an id column and a split column. The split column is used to split the solution into public and private parts. The split column can have two values: public and private. You can have multiple columns in the solution file. However, the evaluation metric must support multiple columns.

For example, if the evaluation metric is roc_auc_score, then the solution file must have an id column and a pred column in addition to the split column. The id and pred columns can have any names; the names are read from the conf.json file. Please make sure that the column names in conf.json match the solution file and that the solution file contains both public and private splits.
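The requirements on the solution file can be checked with a few lines of standard-library Python before the file is uploaded. This is only a sketch under stated assumptions: count_splits is a hypothetical helper, not a function of the competitions library, and it checks just the rules stated above (an id column, a split column, and both public and private rows).

```python
import csv
import io
from collections import Counter


def count_splits(solution_csv: str, id_column: str = "id") -> Counter:
    """Validate a solution file given as text and count rows per split."""
    reader = csv.DictReader(io.StringIO(solution_csv))
    fieldnames = reader.fieldnames or []
    for required in (id_column, "split"):
        if required not in fieldnames:
            raise ValueError(f"solution file is missing the '{required}' column")

    counts = Counter(row["split"] for row in reader)

    # Only the two documented split values are allowed.
    unexpected = sorted(set(counts) - {"public", "private"})
    if unexpected:
        raise ValueError(f"unexpected split values: {', '.join(unexpected)}")

    # Both splits must be present.
    for split in ("public", "private"):
        if counts[split] == 0:
            raise ValueError(f"solution file has no '{split}' rows")
    return counts


sample = "id,pred,split\n0,1,public\n1,0,private\n2,0,private\n"
print(dict(count_splits(sample)))  # {'public': 1, 'private': 2}
```

The id_column argument would normally be the value of SUBMISSION_ID_COLUMN from conf.json.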

SUBMISSION_DESC.md

This file contains the description of the submission. It is a markdown file, so you can use markdown syntax to format the text and modify the file according to your needs. The submission description is shown on the submission page.

Here you can mention the format of the submission file, what columns are required in the submission file, etc.

For the example solution file shown above, the submission file must have two columns: id and pred. An example of sample_submission.csv is shown below:

id,pred
0,0.6
1,0.1
2,0.5
3,1.6
4,0.8
5,1
6,1
7,1
8,0
9,0
10,0.1
11,0.4
12,1.9
13,0.01
14,1.1

When a user submits a submission file, the system will check if the submission file has the required columns. If the submission file does not have the required columns, the submission will be rejected.
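Organizers may want to run a similar check on their own sample submission before publishing it. The sketch below illustrates the kind of column check described above; it is not the system's actual implementation. The helper name check_submission is hypothetical, and the optional row-count check against SUBMISSION_ROWS is an assumption about how that field might be applied.

```python
import csv
import io


def check_submission(submission_csv, required_columns, expected_rows=None):
    """Raise ValueError if a submission (given as text) lacks required columns.

    required_columns is a comma-separated string such as the value of
    SUBMISSION_COLUMNS; expected_rows is optional (assumed to correspond
    to SUBMISSION_ROWS, which excludes the header).
    """
    required = required_columns.split(",")
    reader = csv.reader(io.StringIO(submission_csv))
    header = next(reader, [])

    missing = [column for column in required if column not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")

    if expected_rows is not None:
        # Count data rows, ignoring blank lines.
        n_rows = sum(1 for row in reader if row)
        if n_rows != expected_rows:
            raise ValueError(f"expected {expected_rows} rows, found {n_rows}")


# A well-formed two-row submission passes silently.
check_submission("id,pred\n0,0.6\n1,0.1\n", "id,pred", expected_rows=2)
```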

It is the responsibility of the organizer to provide a sample submission file in the correct format and a submission description file.

submission_info

This folder contains the submission info files. Each submission info file is a json file containing information about one submission. This folder is created when the first submission is made.

submissions

This folder contains the submissions made by the users. Each submission is a csv file (or a file in a different format). This folder is created when the first submission is made.

other files

The other files teams.json and user_team.json are used to store the information about the teams.