stefan-it (Stefan)

upvoted a paper 5 days ago

Zero-Shot Tokenizer Transfer

Paper • 2405.07883 • Published 6 days ago • 2

upvoted a paper 6 days ago

Linearizing Large Language Models

Paper • 2405.06640 • Published 9 days ago • 1

upvoted a paper 9 days ago

xLSTM: Extended Long Short-Term Memory

Paper • 2405.04517 • Published 12 days ago • 7

upvoted a paper 17 days ago

HistNERo: Historical Named Entity Recognition for the Romanian Language

Paper • 2405.00155 • Published 19 days ago • 2

upvoted a paper 26 days ago

SpaceByte: Towards Deleting Tokenization from Large Language Modeling

Paper • 2404.14408 • Published 27 days ago • 6

upvoted a paper 30 days ago

Investigating Gender Bias in Turkish Language Models

Paper • 2404.11726 • Published Apr 17 • 1

upvoted 8 papers about 1 month ago

Fewer Truncations Improve Language Modeling

Paper • 2404.10830 • Published Apr 16 • 2

Token Dropping for Efficient BERT Pretraining

Paper • 2203.13240 • Published Mar 24, 2022 • 2

Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset

Paper • 2403.19559 • Published Mar 28 • 1

Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

Paper • 2404.05694 • Published Apr 8 • 2

BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models

Paper • 2404.04113 • Published Apr 5 • 3

Willkommens-Merkel, Chaos-Johnson, and Tore-Klose: Modeling the Evaluative Meaning of German Personal Name Compounds

Paper • 2404.04031 • Published Apr 5 • 1

Tokenizer Choice For LLM Training: Negligible or Crucial?

Paper • 2310.08754 • Published Oct 12, 2023 • 2

Understanding Back-Translation at Scale

Paper • 1808.09381 • Published Aug 28, 2018 • 1

upvoted 4 papers about 2 months ago

Revisiting subword tokenization: A case study on affixal negation in large language models

Paper • 2404.02421 • Published Apr 3 • 1

Cross-lingual Named Entity Corpus for Slavic Languages

Paper • 2404.00482 • Published Mar 30 • 2

Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions

Paper • 2403.15279 • Published Mar 22 • 1

CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction

Paper • 2403.15322 • Published Mar 22 • 1

upvoted 3 papers 2 months ago

MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank

Paper • 2403.10293 • Published Mar 15 • 1

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

Paper • 2403.08693 • Published Mar 13 • 1

MaiBaam Annotation Guidelines

Paper • 2403.05902 • Published Mar 9 • 1

upvoted a paper 3 months ago

Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models

Paper • 2402.18397 • Published Feb 28 • 1

upvoted a collection 3 months ago

LiT5

Collection

Linguistically-Informed T5 models from the LREC-COLING paper "Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It)". • 6 items • Updated Feb 28 • 2

upvoted 3 papers 3 months ago

SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages

Paper • 2402.08638 • Published Feb 13 • 1

Pixel Sentence Representation Learning

Paper • 2402.08183 • Published Feb 13 • 2

Fractal Patterns May Unravel the Intelligence in Next-Token Prediction

Paper • 2402.01825 • Published Feb 2 • 2

upvoted 15 papers 4 months ago

Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks

Paper • 2401.17396 • Published Jan 30 • 1

SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30 • 22

ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks

Paper • 2401.16589 • Published Jan 29 • 1

DrBERT: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining

Paper • 2401.15861 • Published Jan 29 • 1

Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation

Paper • 2305.18893 • Published May 30, 2023 • 2

TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation

Paper • 2401.14373 • Published Jan 25 • 10

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

Paper • 2401.13160 • Published Jan 24 • 9

LangBridge: Multilingual Reasoning Without Multilingual Supervision

Paper • 2401.10695 • Published Jan 19 • 4

Headless Language Models: Learning without Predicting with Contrastive Weight Tying

Paper • 2309.08351 • Published Sep 15, 2023 • 3

Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions

Paper • 2207.14251 • Published Jul 28, 2022 • 1

upvoted 13 papers 5 months ago

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

Paper • 2312.17482 • Published Dec 29, 2023 • 1

Observable Propagation: A Data-Efficient Approach to Uncover Feature Vectors in Transformers

Paper • 2312.16291 • Published Dec 26, 2023 • 1

Language Resources for Dutch Large Language Modelling

Paper • 2312.12852 • Published Dec 20, 2023 • 9

WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models

Paper • 2112.06598 • Published Dec 13, 2021 • 1

PromptBench: A Unified Library for Evaluation of Large Language Models

Paper • 2312.07910 • Published Dec 13, 2023 • 14

On Meta-Prompting

Paper • 2312.06562 • Published Dec 11, 2023 • 1

Aligner: One Global Token is Worth Millions of Parameters When Aligning Large Language Models

Paper • 2312.05503 • Published Dec 9, 2023 • 1

Gated Linear Attention Transformers with Hardware-Efficient Training

Paper • 2312.06635 • Published Dec 11, 2023 • 3

RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training

Paper • 2312.04032 • Published Dec 7, 2023 • 1

Advancing State of the Art in Language Modeling

Paper • 2312.03735 • Published Nov 28, 2023 • 1

SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM

Paper • 2312.03788 • Published Dec 6, 2023 • 1

Monarch: Expressive Structured Matrices for Efficient and Accurate Training

Paper • 2204.00595 • Published Apr 1, 2022 • 1

NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval

Paper • 2310.14282 • Published Oct 22, 2023 • 5

upvoted 6 papers 6 months ago

Larger-Scale Transformers for Multilingual Masked Language Modeling

Paper • 2105.00572 • Published May 2, 2021 • 1

SeaLLMs -- Large Language Models for Southeast Asia

Paper • 2312.00738 • Published Dec 1, 2023 • 23

Instruction-tuning Aligns LLMs to the Human Brain

Paper • 2312.00575 • Published Dec 1, 2023 • 10

CoLLiE: Collaborative Training of Large Language Models in an Efficient Way

Paper • 2312.00407 • Published Dec 1, 2023 • 2

Nonparametric Variational Regularisation of Pretrained Transformers

Paper • 2312.00662 • Published Dec 1, 2023 • 1

Mark My Words: Analyzing and Evaluating Language Model Watermarks

Paper • 2312.00273 • Published Dec 1, 2023 • 3

Articles

Fine-tune Flair Models on NER Dataset with 🤗 AutoTrain SpaceRunner