Brain issues

#4
by CyberTimon - opened

Hello!

Thank you for creating this model. It's an interesting experiment. I've downloaded it, but sadly it generates only nonsense. The text reads like it comes from maybe a 3B model. I know it won't work perfectly since it's distilled, but I'm still wondering what the further plan for this model is.

Thank you!

Here is some example output of the model:

Input: What is Coldplay?
Response: Coldplay is a style of playing music. It is a genre of music that is played using only a guitar and a microphone. The bass is a string of nets stretched over a drum, creating a percussion-like sound when the bottom of the drum is beaten.


The bass is often played using a mallet driven by a foot pedal, and the top of the drum is often attached to a cymbal.


The most famous coldplay bassist is John Lathymer.

Example 2:

Input: What is Huggingface?
Response: Huggingface is a brand-new AI model that is designed to generate human-like faces. It is the first AI model of its kind, and it is creating quite a stir right now. The model is trained on an extensive database of high-quality face images, and it uses the latest AI techniques to create new and unique faces. It has been praised for its ability to to create create create....

Example 3:

Input: What is Beamng drive?
Response: BeamnG drive is a set of rules for playing the game of BeananG. It was created by the team at WFSD to help people who were new to the game understand the rules and how to play.

Example 4:

Input: tell me a joke
Response: I can't tell you a joke, but I can help you find one. Here is an example joke:


Punchy and his friend decided to go on a hiking trip. They started their journey in the city and moved deeper and deeper into the wilderness. Finally, they came across a small stream. Punchy's friend was the first to make it across, and then he called out to Punchy, "Come over, it's not that hard!"


Punchy was a bit more cautious, and he made his way across the stream. On the other side, they were both greeted by an old man, who was also making his wa... (the story never ended lol)

It seems like the knowledge basically got reset to zero.

The model had very, very little training. I chose a small set of 1k questions: just 500 Dolly questions and 500 tested Python code samples. The v2 has 8k questions: 2,800 of them ShareGPT-4 multi-turn, 1,500 multi-turn code, 1,000 Dolly, 1,000 tested Python code, plus JSON mode and agent abilities; I forgot how many were for the other ones. However, it seems like I can get a sponsor for v3, so I might just be able to throw OpenHermes, Dolphin, and some ShareGPT at it, along with the 8k high-quality questions I picked out. Yeah, see, I gave it too much code for its repair dataset 😅. Either way, hopefully the v2 training will fix this! Thank you for reporting this to us! Keep in mind this was still very, very experimental; I have many more ideas to potentially make this better.

That's great to hear that you're working on improving the model. But as far as I know, instruct tuning doesn't improve the model's world knowledge.

Owner

@CyberTimon
If that is still the case after v2, then for v3 I will use the knowledge from just 2 experts and compress that instead of all 8, since the model uses 2 experts per token. Also, getting revived with a bunch of code and a few human-generated questions is probably not the best, so I guess after v2 I'll see which direction to take!
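For context, "2 experts per token" refers to top-2 routing in a mixture-of-experts layer: each token's hidden state is scored against every expert, only the two best experts run, and their outputs are mixed with normalized gate weights. A minimal sketch of that routing step (all names, shapes, and values here are illustrative assumptions, not the actual model code):

```python
# Sketch of top-2 expert routing as used in Mixtral-style MoE layers.
# Purely illustrative: the gate matrix and hidden size are made up.
import numpy as np

def top2_route(token_hidden, gate_weights):
    """Return the indices of the 2 highest-scoring experts for one token,
    plus softmax-normalized mixing weights over just those 2 experts."""
    logits = gate_weights @ token_hidden            # one score per expert
    top2 = np.argsort(logits)[-2:][::-1]            # 2 best experts, best first
    scores = np.exp(logits[top2] - logits[top2].max())
    return top2, scores / scores.sum()

rng = np.random.default_rng(0)
num_experts, hidden_size = 8, 16
idx, w = top2_route(rng.normal(size=hidden_size),
                    rng.normal(size=(num_experts, hidden_size)))
print(idx, w)  # 2 expert indices; weights sum to 1
```

So out of the 8 experts, only 2 ever contribute to a given token, which is why distilling from just those 2 routing paths could make sense.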

Thanks for your effort. I really appreciate what you do for the community!

@CyberTimon
❤️ you as well! Thank you for sharing your findings. You might find it interesting that the model can answer simple math questions despite not being trained on any math. Kinda cool that it got some emergent properties.

Yes, but I wouldn't say it has not been trained on any math. It definitely saw math during its pretraining, although I'm not sure how much of that is usable after the distillation.

Owner

Yeah agreed, I meant that I was able to regain some of its abilities, even the ones I didn’t train it on, which is a positive sign.


Would love to see Google's infinite attention method used once they release the code for it!
