SetFit documentation

Utility Functions

setfit.get_templated_dataset

(dataset: Optional[Dataset] = None, candidate_labels: Optional[List[str]] = None, reference_dataset: Optional[str] = None, template: str = 'This sentence is {}', sample_size: int = 2, text_column: str = 'text', label_column: str = 'label', multi_label: bool = False, label_names_column: str = 'label_text') -> Dataset

Parameters

  • dataset (Dataset, optional) — A Dataset to add templated examples to.
  • candidate_labels (List[str], optional) — The list of candidate labels to be fed into the template to construct examples.
  • reference_dataset (str, optional) — The name of a dataset to take labels from, used if candidate_labels is not supplied.
  • template (str, optional, defaults to "This sentence is {}") — The template used to turn each label into a synthetic training example. It must include a {} placeholder, which is replaced by the candidate label. For example, with the default template "This sentence is {}" and the candidate label "sports", the generated example is "This sentence is sports".
  • sample_size (int, optional, defaults to 2) — The number of examples to generate for each candidate label.
  • text_column (str, optional, defaults to "text") — The name of the column containing the text of the examples.
  • label_column (str, optional, defaults to "label") — The name of the column in dataset containing the labels of the examples.
  • multi_label (bool, optional, defaults to False) — Whether or not multiple candidate labels can be true.
  • label_names_column (str, optional, defaults to "label_text") — The name of the column in reference_dataset that holds the label names, used when the label column has no ClassLabel feature.
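The template mechanics can be illustrated with plain Python string formatting. This is a minimal sketch that assumes the {} placeholder is filled with str.format, as the placeholder syntax suggests; render_example is a hypothetical helper, not part of SetFit.

```python
def render_example(template: str, label: str) -> str:
    """Fill the {} placeholder of `template` with `label` (hypothetical helper)."""
    # str.format ignores unused positional arguments, so a template without
    # "{}" would yield the same text for every label. Guard against that.
    if "{}" not in template:
        raise ValueError(f"template must contain '{{}}', got {template!r}")
    return template.format(label)


print(render_example("This sentence is {}", "sports"))
# This sentence is sports
```

The explicit check is there because str.format silently accepts extra arguments: without it, a template lacking {} would produce identical examples for every candidate label.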

Returns

Dataset

A copy of the input Dataset with templated examples added.

Raises

ValueError

  • ValueError — If the input Dataset is not empty and one or both of the provided column names are missing.
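The documented ValueError condition amounts to a set difference over column names. In this sketch, check_required_columns is a hypothetical helper, and an "empty" Dataset is assumed to mean one that has no columns yet; the library's own check may differ in detail.

```python
from typing import Iterable


def check_required_columns(
    column_names: Iterable[str], text_column: str = "text", label_column: str = "label"
) -> None:
    """Raise ValueError if a non-empty dataset lacks the text or label column."""
    present = set(column_names)
    if not present:
        # Assumption: a dataset with no columns counts as empty, so nothing to check.
        return
    missing = {text_column, label_column} - present
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")


check_required_columns(["text", "label"])  # passes
check_required_columns([])                 # passes (empty dataset)
```

Calling check_required_columns(["sentence", "label"]) would raise, because the default text_column "text" is absent.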

Create templated examples for a reference dataset or reference labels.

If candidate_labels is supplied, use it for generating the templates. Otherwise, use the labels loaded from reference_dataset.

If an input Dataset is supplied, the examples are added to it; otherwise a new Dataset is created. The input Dataset is assumed to have a text column named text_column and a label column named label_column, which contains one-hot or multi-hot encoded label sequences.
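The generation step can be approximated without the datasets library. This sketch uses a dict of column lists as a stand-in for a Dataset, takes labels only from candidate_labels (loading them from reference_dataset is left out), keeps only the text and label columns, and always encodes labels as one-hot vectors following the description above. build_templated_examples is a hypothetical name, and the real function's handling of multi_label may differ.

```python
from typing import Dict, List, Optional


def build_templated_examples(
    candidate_labels: List[str],
    template: str = "This sentence is {}",
    sample_size: int = 2,
    dataset: Optional[Dict[str, list]] = None,
    text_column: str = "text",
    label_column: str = "label",
) -> Dict[str, list]:
    """Append `sample_size` templated examples per label to a column dict."""
    # Copy the input columns so the caller's data is not mutated.
    if dataset:
        out = {text_column: list(dataset[text_column]), label_column: list(dataset[label_column])}
    else:
        out = {text_column: [], label_column: []}
    for label_id, label_name in enumerate(candidate_labels):
        # One-hot vector with a 1 at this label's position.
        one_hot = [0] * len(candidate_labels)
        one_hot[label_id] = 1
        for _ in range(sample_size):
            out[text_column].append(template.format(label_name))
            out[label_column].append(list(one_hot))
    return out


print(build_templated_examples(["sports", "politics"], sample_size=1))
# {'text': ['This sentence is sports', 'This sentence is politics'], 'label': [[1, 0], [0, 1]]}
```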

setfit.sample_dataset

(dataset: Dataset, label_column: str = 'label', num_samples: int = 8, seed: int = 42)

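Samples a Dataset to create an equal number of samples per class (when possible): up to num_samples examples are drawn for each value in label_column, and seed makes the sampling reproducible. Classes with fewer than num_samples examples contribute all of their examples.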
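The per-class balancing can be sketched with the standard library. This version operates on a list of row dicts rather than a Dataset, assumes label values are hashable, and uses random.Random(seed) for reproducibility; sample_per_class is a hypothetical helper, and its random draws will not match SetFit's.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


def sample_per_class(
    rows: List[Dict],
    label_column: str = "label",
    num_samples: int = 8,
    seed: int = 42,
) -> List[Dict]:
    """Draw up to num_samples rows for each label value (hypothetical helper)."""
    rng = random.Random(seed)
    # Group rows by their label value, preserving input order within groups.
    by_label: Dict[Hashable, List[Dict]] = defaultdict(list)
    for row in rows:
        by_label[row[label_column]].append(row)

    sampled: List[Dict] = []
    for group in by_label.values():
        # A class with fewer than num_samples rows contributes all of them.
        sampled.extend(rng.sample(group, min(num_samples, len(group))))

    # Shuffle so the output is not ordered by class.
    rng.shuffle(sampled)
    return sampled


rows = [{"text": f"example {i}", "label": i % 3} for i in range(20)]
subset = sample_per_class(rows, num_samples=4)
print(len(subset))  # 12
```

Capping each draw at the group size is what makes the balance hold only "when possible": a rare class simply contributes every row it has.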