
Reflection 70b AI Model Update. Is it Broken? What is going on here?

#Reflection #70b #Model #Update #Broken

“Digital Spaceport”

The Reflection 70b Llama 3.1 was going to be the best Llama 3.1 fine-tune we had ever seen, even lauded as rivaling OpenAI and Claude. The current state is not that, however, and I wanted to give a quick update on what I found out and the direction things are going. HOPEFULLY we can see a killer…

source

 


6 Comments

  1. I think the benchmarks are very limited. I get that they want "smart", but the biggest failing of LLMs so far is their utter inability "to ask the important question". Instead of asking, we get one of two things: a false assumption (and, because that's usually triggered by the 'safety' training, a good scolding with it), or a massive outpouring of verbiage trying to cover every possible facet of a topic without ever getting to any useful point. This is awful for two reasons: first, they know nothing about who we on the other end are (what we already know, believe, or understand), so an insightful question or two would dramatically improve their answers; and second, since we're apparently all supposed to live with these things every day, we will spend our lives being barked at or drowned in data. In fact, that's exactly why the description of this "reflection" model seemed so interesting to me: it sounded like the model asks itself "what if I'm wrong?" before giving the answer. If it goes to the next step of asking itself how it can find out, maybe we'll finally get an endurable artificial companion!

  2. Already deleted the 2nd upload due to its erratic performance. Some things it answered very well, but on others it was much worse than the 3.1 70b (which I will never delete).

  3. I understand what you mean in this video. The way we evaluate LLMs now isn’t based on graphs or scores, but more on the experience and feel behind it. Unquantized and quantized models can give different results even if the answer is correct—hard to explain, isn’t it?

  4. Didn't "matt" already post that they had a problem during the upload?

    Idk the actual reality of the benchmark scores, but if it's not the real model, then it's probably just premature judgement.

  5. A simple prompt fails:
    "Write a script that implements the “tree -L 2” functionality, but in bash without using tree. make a one-line script, without arguments, for the current directory."
    ANY other LLM can do this more or less correctly, except Reflection (70b-q8_0 tested); Reflection's code just does something useless. (One plausible reference answer is sketched below.)
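
For reference, one plausible bash one-liner for that prompt (a sketch built from find and sed; this is an illustrative answer, not output from any of the models discussed):

    find . -maxdepth 2 | sort | sed -e 's|[^/]*/|  |g'

find limits the walk to two levels below the current directory, and sed turns each leading path component into two spaces of indentation. Unlike tree, it also lists hidden entries and prints plain indentation rather than branch characters, but it is a single line, takes no arguments, and covers the current directory two levels deep, which is what the prompt asks for.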
