Stefan PRO
stefan-it
AI & ML interests
Flair Library, NER & PoS Tagging, LM Pretraining (mostly encoder-only), Historical Language Models
stefan-it's activity
upvoted a paper 5 days ago
upvoted a paper 6 days ago
upvoted a paper 9 days ago
upvoted a paper 17 days ago
upvoted a paper 26 days ago
upvoted a paper 30 days ago
Fewer Truncations Improve Language Modeling • Paper • 2404.10830 • Published • 2
Token Dropping for Efficient BERT Pretraining • Paper • 2203.13240 • Published • 2
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset • Paper • 2403.19559 • Published • 1
Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding • Paper • 2404.05694 • Published • 2
BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models • Paper • 2404.04113 • Published • 3
Willkommens-Merkel, Chaos-Johnson, and Tore-Klose: Modeling the Evaluative Meaning of German Personal Name Compounds • Paper • 2404.04031 • Published • 1
Tokenizer Choice For LLM Training: Negligible or Crucial? • Paper • 2310.08754 • Published • 2
Understanding Back-Translation at Scale • Paper • 1808.09381 • Published • 1
Revisiting subword tokenization: A case study on affixal negation in large language models • Paper • 2404.02421 • Published • 1
Cross-lingual Named Entity Corpus for Slavic Languages • Paper • 2404.00482 • Published • 2
Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions • Paper • 2403.15279 • Published • 1
CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction • Paper • 2403.15322 • Published • 1
upvoted a paper 3 months ago
upvoted a collection 3 months ago
Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks • Paper • 2401.17396 • Published • 1
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity • Paper • 2401.17072 • Published • 22
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks • Paper • 2401.16589 • Published • 1
DrBERT: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining • Paper • 2401.15861 • Published • 1
Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation • Paper • 2305.18893 • Published • 2
TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation • Paper • 2401.14373 • Published • 10
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection • Paper • 2401.13160 • Published • 9
LangBridge: Multilingual Reasoning Without Multilingual Supervision • Paper • 2401.10695 • Published • 4
Headless Language Models: Learning without Predicting with Contrastive Weight Tying • Paper • 2309.08351 • Published • 3
Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions • Paper • 2207.14251 • Published • 1
Cross-lingual Editing in Multilingual Language Models • Paper • 2401.10521 • Published • 1
Mission: Impossible Language Models • Paper • 2401.06416 • Published • 3
RoBERTurk: Adjusting RoBERTa for Turkish • Paper • 2401.03515 • Published • 1
PIXAR: Auto-Regressive Language Modeling in Pixel Space • Paper • 2401.03321 • Published • 1
German Text Embedding Clustering Benchmark • Paper • 2401.02709 • Published • 5
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining • Paper • 2312.17482 • Published • 1
Observable Propagation: A Data-Efficient Approach to Uncover Feature Vectors in Transformers • Paper • 2312.16291 • Published • 1
Language Resources for Dutch Large Language Modelling • Paper • 2312.12852 • Published • 9
WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models • Paper • 2112.06598 • Published • 1
PromptBench: A Unified Library for Evaluation of Large Language Models • Paper • 2312.07910 • Published • 14
On Meta-Prompting • Paper • 2312.06562 • Published • 1
Aligner: One Global Token is Worth Millions of Parameters When Aligning Large Language Models • Paper • 2312.05503 • Published • 1
Gated Linear Attention Transformers with Hardware-Efficient Training • Paper • 2312.06635 • Published • 3
RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training • Paper • 2312.04032 • Published • 1
Advancing State of the Art in Language Modeling • Paper • 2312.03735 • Published • 1
SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM • Paper • 2312.03788 • Published • 1
Monarch: Expressive Structured Matrices for Efficient and Accurate Training • Paper • 2204.00595 • Published • 1
NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval • Paper • 2310.14282 • Published • 5
Larger-Scale Transformers for Multilingual Masked Language Modeling • Paper • 2105.00572 • Published • 1
SeaLLMs -- Large Language Models for Southeast Asia • Paper • 2312.00738 • Published • 23
Instruction-tuning Aligns LLMs to the Human Brain • Paper • 2312.00575 • Published • 10
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way • Paper • 2312.00407 • Published • 2
Nonparametric Variational Regularisation of Pretrained Transformers • Paper • 2312.00662 • Published • 1
Mark My Words: Analyzing and Evaluating Language Model Watermarks • Paper • 2312.00273 • Published • 3