Knut Jägersberg

KnutJaegersberg

AI & ML interests

NLP, opinion mining, narrative intelligence

KnutJaegersberg's activity

replied to s3nh's post 13 days ago

Don't burn out! Lighten up again, will you?

posted an update 13 days ago
replied to BramVanroy's post 3 months ago

It mixed up content in the output and gave weird answers. I didn't have that problem with other models. Maybe the update they released solved that issue; I just never bothered to check, given the alternatives.

replied to BramVanroy's post 3 months ago

I got some weird results. Since there are a lot of other models in that performance-parameter range, I just didn't try it again.

replied to macadeliccc's post 4 months ago
replied to bwang0911's post 4 months ago
replied to JustinLin610's post 4 months ago
replied to osanseviero's post 4 months ago

I hear there is an incredible amount of competition among LLM makers within China; I guess one would publish, and thus promote, only the best. Hundreds of models. Competition is good for performance.

replied to s3nh's post 5 months ago

I didn't dive deeply into all the creative role-play models, although I sense there is a great deal of unrecognized innovation happening there. Beautiful art!

replied to their post 5 months ago

That's a nice Space you made there, but it's also unrelated to my post.

replied to their post 5 months ago

I didn't see a link to the prompt in the video, but the prompt format can be optimized.

replied to their post 5 months ago
posted an update 5 months ago
Shocking: 2/3 of LLMs fail at 2K context length

code_your_own_ai makes a great vlog, mostly about LLM-related AI content.
As I watched the video below, I wondered about current best practices for LLM evaluation. We have benchmarks, we have SOTA LLMs evaluating other LLMs, and we have tools that evaluate based on human comparison.
Often I hear: just play with the LLM for 15 minutes to form an opinion.
While I think this could yield signal-carrying experience for a specific use case with clear expectations, I also see that often a single prompt is used to judge models.
Benchmarks have their weaknesses and are by themselves not enough to judge model quality, but I still think systematic methods that try to reduce well-known sources of error should be the way forward, even for qualitative estimates.
What do you think? How can a public tool for judging models like lmsys/chatbot-arena-leaderboard help leverage measurement standards known from social science?

https://www.youtube.com/watch?v=mWrivekFZMM
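
For concreteness, arena-style tools turn pairwise human votes into a ranking; here is a minimal Elo-style sketch of that aggregation (the K-factor and starting ratings are illustrative, not lmsys's exact method):

```python
def update_elo(rating_a: float, rating_b: float, winner: str, k: float = 32.0):
    """One rating update from a single pairwise vote; winner is 'a', 'b', or 'tie'."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return rating_a, rating_b

# Example: model A beats model B once, both starting at 1000
print(update_elo(1000.0, 1000.0, "a"))  # (1016.0, 984.0)
```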
posted an update 5 months ago
QuIP# ecosystem is growing :)

I saw a QuIP# 2-bit Qwen-72B-Chat model on the Hub today, which shows there is support for vLLM inference.
This will speed up inference and make high-performing 2-bit models more practical. I'm considering quipping MoMo now, as otherwise I can only use a brief context window of Qwen-72B on my system, even with bnb double quantization (see the sketch below).

keyfan/Qwen-72B-Chat-2bit

Also note the easier-to-use QuIP#-for-all library :)

https://github.com/chu-tianxiang/QuIP-for-all
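
For anyone curious what I mean by bnb double quantization, here is a minimal loading sketch with transformers + bitsandbytes (the model ID and dtype choices are just illustrative, not my exact setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 with nested ("double") quantization to squeeze the weights further
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B-Chat",          # illustrative model ID
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
```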
posted an update 5 months ago
Microsoft: Improving Text Embeddings with Large Language Models

- uses an LLM instead of complex pipelines to create the training data
- directly generates data for numerous text embedding tasks
- fine-tunes standard models with a contrastive loss, achieving strong performance
- critical thought: isn't this a kind of benchmark hacking? If the benchmarks are encompassing enough to capture the complete idea of embedding, it may be a good approach, but I find they often oversimplify.

Feel free to share your thoughts, even if, like mine, they don't beat the benchmarks ;P


https://arxiv.org/abs/2401.00368
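
The contrastive fine-tuning step is essentially an InfoNCE-style loss over (query, positive) pairs with in-batch negatives; a minimal PyTorch sketch (the temperature is an illustrative default, not the paper's exact value):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, pos_emb: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss over a batch of (query, positive) embedding pairs.

    Every other positive in the batch serves as a negative for a given query.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    logits = q @ p.T / temperature                      # scaled cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # the matching row is the positive
    return F.cross_entropy(logits, labels)
```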
replied to fffiloni's post 6 months ago