Llama 3 and GPT-4: A Befitting Comparison!

Read about how Meta's Llama 3 70B compares with OpenAI's GPT-4 across reasoning, math, instruction-following, and retrieval tests

Meta's new model, Llama 3, is available in 8B and 70B sizes and has been released to the AI community as open source. Although Llama 3 70B is much smaller than GPT-4, it has proven to be a compelling model, as its standing on the LMSYS leaderboard demonstrates. We compared the performance of the Llama 3 and GPT-4 models in the following tests:
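For readers who want to try the open-weights model themselves, here is a minimal sketch of querying Llama 3 through the Hugging Face transformers library. The model id "meta-llama/Meta-Llama-3-8B-Instruct" and the chat-style pipeline call are assumptions based on Hugging Face's published conventions, not something from this article.

```python
# A minimal sketch, assuming the Hugging Face transformers library and the
# gated model id "meta-llama/Meta-Llama-3-8B-Instruct" (the smaller of the
# two released sizes); requires accepting Meta's license and a capable GPU.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,  # half-precision to fit the weights in memory
    device_map="auto",           # needs the accelerate package
)

messages = [{"role": "user", "content": "In one word: is the sky blue?"}]
result = generator(messages, max_new_tokens=32)

# The pipeline returns the full chat transcript; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```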

In the magic lift test, which checks logical reasoning ability, Llama 3 gave the correct answer while GPT-4 did not. This comes as a surprise, as Llama 3 has 70 billion parameters, while GPT-4 is reported to have around 1.7 trillion.

Note: The test was run against GPT-4 hosted on ChatGPT, which was still using the older GPT-4 Turbo model. The recently released GPT-4 model, tested through the OpenAI Playground, passed the test as well. According to OpenAI, the newest model is being rolled out to ChatGPT.

This is a classic reasoning question that requires no arithmetic. On the other test comparing the two models' reasoning skills, Llama 3 70B came close to the right answer but omitted the box, while GPT-4 answered correctly.

Both Llama 3 and GPT-4 gave correct answers to a straightforward logical question. It is interesting to note that the much smaller Llama 3 70B competes with the top-of-the-line GPT-4. On a complicated mathematical problem, however, GPT-4 passed flawlessly, while Llama 3 failed to provide the correct answer.

Following user instructions is crucial for any AI model, and Meta's Llama 3 70B is no exception. Asked to "generate 10 sentences ending with mango," it produced all 10 sentences ending with the word mango, whereas GPT-4 managed only eight. A rough way to score this test automatically is sketched below.
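One way to make this instruction-following test repeatable is to check the model's reply programmatically. Here is a minimal Python sketch, assuming the reply arrives as a single string; the sentence-splitting heuristic is naive and only meant to illustrate the check.

```python
# Minimal sketch: scoring the "10 sentences ending with mango" test.
# `reply` stands in for a model's actual response.
import re

def count_mango_endings(reply: str) -> int:
    """Count sentences whose final word (ignoring punctuation) is 'mango'."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", reply) if s.strip()]
    return sum(
        1 for s in sentences
        if re.sub(r"[^\w]", "", s.split()[-1]).lower() == "mango"
    )

reply = "I ate a mango. The tree grew a mango! Bananas are yellow."
print(count_mango_endings(reply))  # -> 2 of the 3 sentences end with mango
```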

Llama 3 does not yet support a long context window; Llama 3 70B tops out at an 8K context length. Even so, it performed well in the Needle In A Haystack (NIAH) test, which checks retrieval capability: when a needle (a random statement) was inserted into a 35K-character text (roughly 8K tokens) and the model was asked to locate the information, it found the text quickly. GPT-4 found the needle just as readily. A sketch of this setup follows below.
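The NIAH setup is easy to reproduce. The sketch below builds a roughly 35K-character haystack, hides a needle at a random position, and prepares the retrieval prompt; `ask_model` is a hypothetical placeholder for whichever client (Llama 3 or GPT-4) you actually call.

```python
# A minimal sketch of the NIAH test described above. `ask_model` is a
# hypothetical stand-in for a call to Llama 3 or GPT-4; swap in your client.
import random

def build_haystack(filler: str, needle: str, target_chars: int = 35_000) -> str:
    """Repeat filler text up to ~target_chars, then hide the needle at a random spot."""
    text = (filler * (target_chars // len(filler) + 1))[:target_chars]
    pos = random.randrange(len(text))
    return text[:pos] + " " + needle + " " + text[pos:]

needle = "The secret ingredient in the recipe is toasted cardamom."
haystack = build_haystack("The quick brown fox jumps over the lazy dog. ", needle)

prompt = (
    "Read the text below, then answer: what is the secret ingredient?\n\n"
    + haystack
)
# answer = ask_model(prompt)              # hypothetical model call
# print("cardamom" in answer.lower())     # crude pass/fail check
```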

Llama 3 70B matched or outperformed GPT-4 in nearly all of these tests. Whether it's advanced reasoning, following user instructions, or retrieval capability, the model trails GPT-4 only in mathematical calculation. According to Meta, "Llama 3 is trained on a large number of examples of code, so it should also perform well in terms of coding."

It is important to note that this compares Llama 3 70B, a much smaller model, against GPT-4. Llama 3 is also a dense model, whereas GPT-4 is reportedly based on a Mixture-of-Experts (MoE) architecture consisting of 8 x 222B expert models. Meta has clearly done an excellent job with this family of models, and when the 400B+ model comes out in the future, we can expect it to be even better and perhaps outperform the best AI models on the market.
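To make the dense-versus-MoE distinction concrete, here is a toy sketch of top-1 expert routing in PyTorch. It is purely illustrative, with made-up layer sizes, and is not a description of GPT-4's actual implementation: the point is that a router activates only one expert per token, so compute per token stays small even as total parameters grow, whereas a dense model like Llama 3 uses every parameter for every token.

```python
# Toy illustration of Mixture-of-Experts (MoE) top-1 routing; sizes are
# made up and this is NOT GPT-4's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model: int = 16, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pick one expert per token; only that expert's weights are used.
        weights = F.softmax(self.router(x), dim=-1)   # (tokens, experts)
        top_w, top_idx = weights.max(dim=-1)          # top-1 routing
        out = torch.stack(
            [self.experts[i](t) for t, i in zip(x, top_idx.tolist())]
        )
        return out * top_w.unsqueeze(-1)              # scale by router weight

tokens = torch.randn(4, 16)       # 4 toy "tokens"
print(TinyMoE()(tokens).shape)    # torch.Size([4, 16])
```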
