Knut Jägersberg

KnutJaegersberg

AI & ML interests

NLP, opinion mining, narrative intelligence

Posts 3

Shocking: 2/3 of LLMs fail at 2K context length

code_your_own_ai makes a great vlog, mostly about LLM-related AI content.
As I watched the video below, I wondered about current best practices for LLM evaluation. We have benchmarks, we have SOTA LLMs evaluating LLMs, and we have tools that evaluate based on human comparison.
Often I hear: just play with the LLM for 15 minutes to form an opinion.
While I think this could yield signal-carrying experiences for a specific use case with clear expectations, I also see single prompts being used to judge models.
While benchmarks have their weaknesses and are by themselves not enough to judge model quality, I still think systematic methods that try to reduce scientifically known sources of error should be the way forward, even for qualitative estimates.
What do you think? How could a public tool for judging models, like lmsys/chatbot-arena-leaderboard, leverage standards known in social science?

https://www.youtube.com/watch?v=mWrivekFZMM
QuIP# ecosystem is growing :)

I saw a QuIP# 2-bit Qwen-72B-Chat model on the Hub today, which shows that vLLM inference is supported.
This will speed up inference and make high-performing 2-bit models more practical. I'm considering quipping MoMo now, as otherwise I can only use a short context window of Qwen-72B on my system, even with bnb double quantization.

keyfan/Qwen-72B-Chat-2bit

Also note the easier-to-use QuIP-for-all library :)

https://github.com/chu-tianxiang/QuIP-for-all