This repository is publicly accessible, but you have to accept the conditions to access its files and content.

You agree to not use the model to conduct experiments that cause harm to human subjects.

NER-Luxury

A fine-tuned XLM-RoBERTa model for NER in the fashion and luxury industry

1. Goal

  • NER-Luxury is a fine-tuned XLM-RoBERTa model for Named Entity Recognition (NER) in English. It is domain-specific to the fashion and luxury industry and uses bespoke labels. NER-Luxury aims to bridge the aesthetic side and the quantitative side of the fashion and luxury industry.
  • As a downstream task, NER-Luxury can identify major fashion houses, artistic directors, fragrances, models, or influential artists on the website of a fashion magazine. It can also identify companies, listed groups, executives, financial analysts, and investment companies inside a 200-page quarterly financial report.
  • The goal of NER-Luxury is to create a clear hierarchical classification of luxury houses, fine watchmakers, beauty brands, sportswear labels, and fast fashion brands with respect to temporality, context, and sustainability. NER-Luxury tries to solve the "entity disambiguation" between the founder, the eponymous label, the company designation, the names of products, and the intellectual property rights for corporate lawyers, M&A bankers, and financial analysts.

For example, the disambiguation of Louis Vuitton:

  • The visionary founder, Louis Vuitton (1821-1892)
  • The luxury house, Louis Vuitton
  • The giant luxury group, LVMH Moët Hennessy Louis Vuitton SE
  • The collection with a Japanese artist, Louis Vuitton x Yayoi Kusama

2. Prototyping, dataset limitation, and bias

I first prototyped NER models on various architectures with my dataset in English, French, and Japanese. The results in English were weak because French linguistic specificities dominate the fashion and luxury industry (for example Hermès, prêt-à-porter, Château-Grillet, or Héliotrope). The Japanese results were also poor because spans cross word boundaries in the SentencePiece encoding, as kindly advised by Kaito Sugimoto of the University of Tokyo.

For NER-Luxury, I decided to focus on a carefully crafted bespoke dataset, a stable deep learning architecture, and the speed of PyTorch 2.0. I annotated more than 38,063 sentences in English (758,309 words) with 32 labels, including classic ones such as Date, Location, and Money (monetary value), and bespoke labels for the fashion and luxury industries.
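The card does not describe how the word-level annotations were mapped onto the SentencePiece sub-words of XLM-RoBERTa. A common convention, shown here as a minimal sketch and not as the author's actual preprocessing, keeps the label on the first sub-word of each word and masks the remaining sub-words and the special tokens with -100. The function name, the label ids, and the sub-word split of "Hermès" are illustrative assumptions; `word_ids` has the shape returned by the `word_ids()` method of a fast tokenizer.

```python
IGNORE_INDEX = -100  # label id ignored by PyTorch's cross-entropy loss by default


def align_labels_with_subwords(word_ids, word_labels, label_all_subwords=False):
    """Map one label per word onto one label per sub-word token.

    word_ids: list with the word index of each token, or None for special tokens.
    word_labels: list with one integer label id per word.
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as <s> and </s> never carry a label.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First sub-word of a word keeps the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later sub-words are masked unless the caller asks otherwise.
            aligned.append(word_labels[word_id] if label_all_subwords else IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Hypothetical example: "Hermès opened in Paris", with "Hermès" split in two pieces.
# Label ids are made up for the illustration: 0 = O, 1 = House, 2 = Location.
example_word_ids = [None, 0, 0, 1, 2, 3, None]
example_word_labels = [1, 0, 0, 2]
aligned_labels = align_labels_with_subwords(example_word_ids, example_word_labels)
print(aligned_labels)  # [-100, 1, -100, 0, 0, 2, -100]
```

Masking the trailing sub-words means that accented French names split into several pieces are scored once per word, not once per piece.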

3. Model technical description and hyperparameters

  • Architecture: BERT-style bidirectional Transformer encoder (BERT: Bidirectional Encoder Representations from Transformers)
  • Base model: XLM-RoBERTa (xlm-roberta-base) from Meta AI (Facebook)
  • AI framework: PyTorch version 2.0.1+cu118
  • Transformers version 4.35.0 from Hugging Face
  • GPU: NVIDIA Tesla T4 with CUDA version 12.0

4. Hierarchical methodology

The "Savigny Luxury Index" of Pierre Mallevays, Co-Head of Merchant Banking at Stanhope Capital, gave me a clear perspective on the luxury industry and its major groups such as LVMH, Kering, Richemont, and Swatch. I then added sportswear brands focused on performance such as Nike, Adidas, and Lululemon. I avoided the Transparency Index, since a company can be transparent about its own factories but lie about the supplier of its supplier. I chose to build the index of fast fashion brands such as Zara, Shein, Boohoo, Primark, and Victoria's Secret based on their popularity ranking among teenagers. For sustainability, I adopted keywords from the Principles for Responsible Investment (UNPRI or PRI) and the ESG corporate reporting tool developed by the Ethical Fashion Initiative, led by Simone Cipriani and his team.

5. Classification bias

The lexicon of the fashion and luxury industry is strongly biased toward French technical words that are clearly out-of-vocabulary (OOV) in English. NER-Luxury is also heavily biased toward the label "Company", with more than 30,000 companies that have been consolidated or have disappeared over the years.

The data is unbalanced across the portfolios of houses of listed groups: LVMH Group is composed of more than 70 entities (Louis Vuitton, Dior, Tiffany, Fendi, etc.), while Burberry is one major fashion house. Entities with the label "Holding/Trust", such as Artémis Group of Kering or Mousse Partners of Chanel, are quite rare, so I had to create synthetic sentences. During classification, I made the deliberate choice of labeling luxury watchmakers such as Cartier, Patek Philippe, and A. Lange & Soehne with the label "House" for their duality of creativity and technical skill. I also classified perfume houses such as Diptyque, Guerlain, and Lancôme with the label "House" for their creative side. In contrast, I attributed the label "Brand" to cosmetic brands such as La Roche-Posay, Clinique, and La Mer because of their grounding in scientific research.
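How the synthetic sentences for rare labels were produced is not documented. One simple approach, sketched here purely as an assumption, is template filling: a few hand-written templates receive the rare entity names and yield token-level annotations. The entity names come from this card; the templates, the function name, and the plain per-token tagging (no B-/I- prefixes) are hypothetical.

```python
# Entity names taken from the card; the templates below are invented for illustration.
HOLDING_ENTITIES = ["Artémis", "Mousse Partners", "Agache", "H51"]
TEMPLATES = [
    "{entity} declined to comment on the transaction .",
    "The stake is held through {entity} .",
]


def make_synthetic_example(entity, template, label="HoldingTrust"):
    """Fill one template and return (tokens, labels) with one label per token."""
    tokens, labels = [], []
    for piece in template.split():
        if piece == "{entity}":
            # A multi-word entity receives the same label on every token.
            entity_tokens = entity.split()
            tokens.extend(entity_tokens)
            labels.extend([label] * len(entity_tokens))
        else:
            tokens.append(piece)
            labels.append("O")
    return tokens, labels


synthetic_dataset = [
    make_synthetic_example(entity, template)
    for entity in HOLDING_ENTITIES
    for template in TEMPLATES
]

tokens, labels = make_synthetic_example("Mousse Partners", TEMPLATES[1])
print(list(zip(tokens, labels)))
```

Template filling raises the count of a rare label quickly, but the sentences share the same few contexts, so they are best mixed with real annotated text.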

6. NER bespoke labels

Entities evolve according to temporality and context.

Abbreviation | Description (Examples)
O | Outside of a named entity
House | Fashion & luxury houses (Louis Vuitton, Cartier, Gucci, Chanel, Burberry, Rolex, Diptyque, Lancôme, etc.)
Brand | Sportswear, beauty or diffusion labels (Nike, Lululemon, Clinique, Urban Decay, Shu Uemura, Marc by Marc Jacobs, etc.)
FastFashion | Mass-market retailers (Zara, H&M, Uniqlo, Shein, Primark, GAP, Victoria's Secret, etc.)
ArtisticDirector | Lead creative of houses (Karl Lagerfeld, Daniel Lee, Virginie Viard, Alessandro Michele, Kim Jones, etc.)
Founder | Lead creative & owner (Ralph Lauren, Rei Kawakubo, Tom Ford, Michael Kors, Calvin Klein, etc.)
Company | Corporate entity & out-of-domain entity (Stella McCartney Ltd, Valentino S.p.A, Facebook, Christie's, IKEA, etc.)
Group | Luxury groups listed on a stock exchange (LVMH, Hermès International SCA, Kering, Burberry Group PLC, Prada S.p.A, etc.)
HoldingTrust | Holding or family office (Agache, H51, Mousse Partners, Artémis, Compagnie Financière Rupert, Gado Srl, etc.)
Executive | C-level, board members, owners (Bernard Arnault, Patrizio Bertelli, François-Henri Pinault, Stefan Larsson, Françoise Bettencourt Meyers, etc.)
AnalystBanker | Equity analysts, M&A bankers (Luca Solca, Pierre Mallevays, Louise Singlehurst, Simeon Siegel, etc.)
InvestmentFirm | Investment banks, PE funds, M&A boutiques (KKR, L Catterton, Sequoia, Mayhoola, Bernstein, Stanhope Capital, Goldman Sachs, etc.)
Event | Critical moment in the luxury industry (World War II, Olympics, IPO, Quartz crisis, Covid pandemic, Paris Fashion Week, etc.)
ArtistKOL | Artists, celebrities, historical figures (Sophia Loren, BTS, Emma Watson, Audrey Hepburn, Kim Kardashian, Churchill, etc.)
AthleteTeam | Professional athletes or teams (David Beckham, Maria Sharapova, Cristiano Ronaldo, Luna Rossa, Scuderia Ferrari, Chelsea FC, etc.)
Model | Fashion models (Iman, Kate Moss, Adriana Lima, Naomi Campbell, Mariacarla Boscono, Lea T, etc.)
CreativeInsider | Photographers, stylists, make-up artists, watchmakers (Steven Meisel, Jacques Cavallier-Belletrud, Gérald Genta, Alexandre de Betak, etc.)
EditorJournalist | Editors-in-chief, fashion editors, journalists (Suzy Menkes, Anna Wintour, Carine Roitfeld, Tim Blanks, Derek Blasberg, etc.)
MediaPublisher | Fashion and finance media outlets (Bloomberg, Vogue, Business of Fashion, WWD, The New York Times, etc.)
DptStoreBoutique | Points of sale, department stores, boutiques, select shops (Bergdorf Goodman, Le Bon Marché, Takashimaya, Dover Street Market, Colette, etc.)
Sustainability | Relevant ESG factors or entities (Ethical Fashion Initiative, ESG, decoupling, biodiversity loss, chemical pollution, etc.)
MuseumGallery | Exhibition spaces (Louvre, Metropolitan Museum of Art, Victoria & Albert, Smithsonian’s, Pinault Collection, Grand Palais, etc.)
GarmCollection | Iconic garments or collections (Haute Couture, Bar suit, No.13 of McQueen, Green Jungle Dress, etc.)
Cosmetic | Cosmetic products (Tilbury Glow palette, Crème de La Mer, YSL Nu, Viva Glam, etc.)
Fragrance | Perfumes and Eau de Toilette (Chanel No.5, Dior Sauvage, Terre d'Hermès, Tom Ford Black Orchid, Angel Mugler, etc.)
BagTrvlGoods | Iconic bags, handbags, and leather goods (Hermès Birkin bag, Louis Vuitton Speedy bag, Chanel 2.55, Lady Dior, etc.)
Jewellery | Fine jewellery or gems (Alhambra of Van Cleef & Arpels, Juste un Clou Cartier, The Winston Blue, Bvlgari Serpenti, etc.)
Timepiece | Fine watches (Nautilus Patek Philippe, Reverso Jaeger-LeCoultre, Tank Cartier, Rolex Oyster, Audemars Piguet Royal Oak, etc.)
Footwear | High heels to sneakers (Rainbow of Ferragamo, Armadillo of McQueen, Birkenstock Arizona, Air Force 1, Gucci loafer, etc.)
WineSpirit | Wine & spirits (Château d'Yquem, Clos de Tart, Château Matras, Hennessy, Moët, Belvedere, etc.)

How to use NER-Luxury with HuggingFace?

Load NER-Luxury and its sub-word tokenizer:

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Download the tokenizer and the fine-tuned weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("AkimfromParis/NER-Luxury")
model = AutoModelForTokenClassification.from_pretrained("AkimfromParis/NER-Luxury")

# aggregation_strategy="simple" merges sub-word tokens back into whole entities
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

example = "CEO Leena Nair dismisses IPO rumours for Chanel."
ner_results = nlp(example)
print(ner_results)
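With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The sketch below groups such a list by label and drops low-confidence entities. The helper name `group_entities` and the `min_score` threshold are illustrative choices, and the sample list is hand-written to show the shape of the output; it is not actual model output.

```python
from collections import defaultdict


def group_entities(ner_results, min_score=0.0):
    """Group aggregated NER results by label, keeping entities above min_score."""
    grouped = defaultdict(list)
    for entity in ner_results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample in the shape produced by the pipeline above.
# Labels and scores are illustrative assumptions, not real predictions.
sample_results = [
    {"entity_group": "Executive", "score": 0.98, "word": "Leena Nair", "start": 4, "end": 14},
    {"entity_group": "Event", "score": 0.61, "word": "IPO", "start": 25, "end": 28},
    {"entity_group": "House", "score": 0.95, "word": "Chanel", "start": 41, "end": 47},
]

print(group_entities(sample_results, min_score=0.9))
```

The `start` and `end` character offsets can be used to highlight each entity in the original string when the surface form in `word` is not enough.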

NER-Luxury results with Seqeval:

This model is a fine-tuned version of xlm-roberta-base.

It achieves the following results on the evaluation set:

  • Loss: 0.3990
  • Precision: 0.7686
  • Recall: 0.8082
  • F1: 0.7879
  • Accuracy: 0.9427
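As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the two figures above. Seqeval computes precision, recall, and F1 at the entity level, while accuracy is computed per token, which is why accuracy is much higher than F1. The helper name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the evaluation set.
final_f1 = f1_score(0.7686, 0.8082)
print(round(final_f1, 4))  # 0.7879
```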

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
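The hyperparameters above map directly onto `TrainingArguments` in Transformers. The sketch below is an approximation and not the original training script: it assumes a single GPU (so that `train_batch_size` equals the per-device batch size), and the output directory name and the helper function are hypothetical. `fp16=True` corresponds to Native AMP and requires a CUDA device.

```python
# Values copied from the hyperparameter list; the key names are the
# corresponding TrainingArguments parameters.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 32,  # assumes one GPU
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    "fp16": True,  # Native AMP mixed precision; needs a CUDA GPU
}


def build_training_arguments(output_dir="ner-luxury-run"):
    """Create TrainingArguments from the values above (output_dir is hypothetical)."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```

The returned object can be passed to a `Trainer` together with the model, the tokenized dataset, and a metric function.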

Training results

Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy
1.1191 | 1.0 | 1072 | 0.5983 | 0.6200 | 0.6863 | 0.6515 | 0.9066
0.5722 | 2.0 | 2144 | 0.4722 | 0.6918 | 0.7444 | 0.7172 | 0.9242
0.4389 | 3.0 | 3216 | 0.4148 | 0.7187 | 0.7744 | 0.7455 | 0.9319
0.3545 | 4.0 | 4288 | 0.3942 | 0.7368 | 0.7862 | 0.7607 | 0.9361
0.3015 | 5.0 | 5360 | 0.3898 | 0.7425 | 0.7954 | 0.7680 | 0.9374
0.2542 | 6.0 | 6432 | 0.3888 | 0.7492 | 0.7964 | 0.7721 | 0.9380
0.2279 | 7.0 | 7504 | 0.3878 | 0.7602 | 0.8020 | 0.7805 | 0.9421
0.2015 | 8.0 | 8576 | 0.3913 | 0.7612 | 0.8028 | 0.7814 | 0.9412
0.1809 | 9.0 | 9648 | 0.3970 | 0.7648 | 0.8070 | 0.7853 | 0.9417
0.1663 | 10.0 | 10720 | 0.3990 | 0.7686 | 0.8082 | 0.7879 | 0.9427
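Two things can be read off the training results with a few lines of arithmetic: which epoch is best by validation loss versus by F1, and roughly how many sentences were in the training split. The size estimate assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of this is stated in the card.

```python
# Rows copied from the training results: (epoch, step, validation_loss, f1).
ROWS = [
    (1, 1072, 0.5983, 0.6515),
    (2, 2144, 0.4722, 0.7172),
    (3, 3216, 0.4148, 0.7455),
    (4, 4288, 0.3942, 0.7607),
    (5, 5360, 0.3898, 0.7680),
    (6, 6432, 0.3888, 0.7721),
    (7, 7504, 0.3878, 0.7805),
    (8, 8576, 0.3913, 0.7814),
    (9, 9648, 0.3970, 0.7853),
    (10, 10720, 0.3990, 0.7879),
]
BATCH_SIZE = 32  # train_batch_size from the hyperparameter list

best_by_loss = min(ROWS, key=lambda row: row[2])
best_by_f1 = max(ROWS, key=lambda row: row[3])

# 1072 optimizer steps per epoch with batches of 32 bound the training-set size.
steps_per_epoch = ROWS[0][1]
max_train_sentences = steps_per_epoch * BATCH_SIZE
min_train_sentences = (steps_per_epoch - 1) * BATCH_SIZE + 1

print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[2]})")
print(f"highest F1: epoch {best_by_f1[0]} ({best_by_f1[3]})")
print(f"training sentences: between {min_train_sentences} and {max_train_sentences}")
```

Under these assumptions the training split holds roughly 34,300 sentences, about 90% of the 38,063 annotated sentences. Validation loss is lowest at epoch 7, while F1 keeps improving slightly until epoch 10.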

Framework versions

  • Transformers 4.35.0.dev0
  • PyTorch 2.0.1+cu118
  • Datasets 2.14.5
  • Tokenizers 0.14.0

Please feel free to contact me: Akim Mousterou - moakim@protonmail.com

Copyright NER-Luxury [2023] [Akim Mousterou]
