Santiago Viquez
santiviquez
AI & ML interests
ML @ NannyML. A bit of everything. NLP, RL, and, of course, tabular. In the GenAI era, how can you not love tabular data? Educational content and OSS.
Posts
Looking for someone with 10+ years of experience training Deep Kolmogorov-Arnold Networks.
Any suggestions?
More open research updates 🧵
Performance estimation is currently the best way to quantify the impact of data drift on model performance. 💡
I've been benchmarking performance estimation methods (CBPE and M-CBPE) against data drift signals.
I'm using the drift results as features for many regression algorithms, which then predict the model's performance. Finally, I measure the Mean Absolute Error (MAE) between the regression models' predictions and the actual performance.
So far, in all my experiments, the performance estimation methods achieve a lower MAE than the drift signals. 👨‍🔬
Bear in mind that these are early results; I'm running the pipeline on more datasets as we speak.
Hopefully, I will have more results to share by next week.
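A minimal, self-contained sketch of the kind of comparison described above, built under loud assumptions that are not from the post: synthetic data with a single drifting feature, a hypothetical already-trained binary classifier whose scores are calibrated by construction, accuracy as the performance metric, two hand-picked drift signals (absolute mean shift and standard-deviation ratio), plain least squares standing in for the "many regression algorithms", and a simplified label-free accuracy estimate in the spirit of CBPE (not NannyML's implementation, and not M-CBPE). Every function name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate_chunk(shift, n=1000):
    """Hypothetical production chunk: one feature whose mean drifts by `shift`."""
    x = rng.normal(loc=shift, scale=1.0, size=n)
    proba = sigmoid(2.0 * x)    # scores of an assumed, already-trained binary classifier
    y = rng.binomial(1, proba)  # labels drawn from the scores, so the scores are calibrated
    return x, proba, y


def drift_features(x, x_ref):
    """Simple drift signals: absolute mean shift and std ratio vs. the reference chunk."""
    return np.array([abs(x.mean() - x_ref.mean()), x.std() / x_ref.std()])


def realized_accuracy(proba, y):
    """Actual accuracy of thresholded predictions, which needs labels."""
    return float(np.mean((proba >= 0.5).astype(int) == y))


def cbpe_style_accuracy(proba):
    """Label-free estimate: with calibrated scores, each prediction is correct
    with probability max(p, 1 - p). Simplified sketch of the CBPE idea."""
    proba = np.asarray(proba, dtype=float)
    return float(np.mean(np.maximum(proba, 1.0 - proba)))


def fit_ols(X, y):
    """Ordinary least squares with an intercept column."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def run_benchmark(n_train=40, n_test=20):
    x_ref, _, _ = simulate_chunk(shift=0.0)
    shifts = rng.uniform(-2.0, 2.0, size=n_train + n_test)
    feats, actual, estimated = [], [], []
    for s in shifts:
        x, proba, y = simulate_chunk(s)
        feats.append(drift_features(x, x_ref))
        actual.append(realized_accuracy(proba, y))
        estimated.append(cbpe_style_accuracy(proba))
    feats, actual, estimated = map(np.asarray, (feats, actual, estimated))

    # Drift signals as regression features: fit on labeled chunks, evaluate on held-out ones
    coef = fit_ols(feats[:n_train], actual[:n_train])
    drift_pred = predict_ols(coef, feats[n_train:])
    return {
        "drift_regression_mae": mae(actual[n_train:], drift_pred),
        "performance_estimation_mae": mae(actual[n_train:], estimated[n_train:]),
    }


print(run_benchmark())
```

Because the labels here are drawn from the model's own scores, the CBPE-style estimate is favoured by construction, so whatever this prints says nothing about the benchmark results in the post. Swapping in real datasets, other regressors, and other drift methods is where the actual comparison happens.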
Collections
Collection of LLM hallucination and evaluation papers that I've been exploring and implementing. Some of them have my comments and annotated doodles.
- Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation (arXiv:2208.05309)
- LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models (arXiv:2305.13711)
- Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation (arXiv:2302.09664)
- BARTScore: Evaluating Generated Text as Text Generation (arXiv:2106.11520)
Models
- santiviquez/t5-small-finetuned-samsum-en (Summarization)
- santiviquez/bart-base-finetuned-samsum-en (Summarization)
- santiviquez/amazon-reviews-sentiment-bert-base-uncased-6000-samples
- santiviquez/amazon-reviews-sentiment-distilbert-base-uncased-6000-samples (Text Classification)
- santiviquez/amazon-reviews-finetuning-distilbert-base-uncased (Text Classification)
- santiviquez/amazon-reviews-finetuning-distilbert-base-uncased_books (Text Classification)
- santiviquez/amazon-reviews-finetuning-bert-base-sentiment (Text Classification)
- santiviquez/amazon_reviews_finetuning-sentiment-model-3000-samples (Text Classification)
- santiviquez/noisy_human_cnn
- santiviquez/ssr-base-finetuned-samsum-en (Summarization)