New Qwen2.5-Max surpasses DeepSeek V3
11:54, 31.01.2025
Following the releases of Qwen2.5 and Qwen2.5-VL, a new model, Qwen2.5-Max, has become available. The new version of Qwen outperforms DeepSeek V3 on several benchmarks, including GPQA-Diamond, Arena-Hard, LiveCodeBench, and LiveBench.
Architecture and Model Features
Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model. What sets it apart is its training recipe: pretraining on over 20 trillion tokens, followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) based on real user preferences.
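Since the architecture details have not been published, the following is only a generic illustration of how an MoE layer works: a learned gate routes each token to a few experts, so total parameter count grows without a matching growth in per-token compute. This PyTorch sketch is a minimal, assumption-laden example, not Qwen's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k gated Mixture-of-Experts layer (illustrative only)."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # token router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores each token against every expert.
        scores = self.gate(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)        # normalize over the chosen k
        out = torch.zeros_like(x)
        # Only the selected experts run for each token -- this sparsity is what
        # lets MoE models scale parameter count without scaling per-token FLOPs.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

# Example: 16 tokens of width 512, 8 experts, 2 active per token.
y = MoELayer(d_model=512, d_ff=2048)(torch.randn(16, 512))
print(y.shape)  # torch.Size([16, 512])
```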
At the moment, the model weights have not been posted on GitHub or Hugging Face; only the API and Qwen Chat are available. The absence of open weights may indicate either a rushed release or a deliberate move by the company to drive adoption of its cloud platform.
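API access goes through Alibaba Cloud's OpenAI-compatible endpoint. The sketch below assumes the DashScope compatible-mode base URL and the model name `qwen-max-2025-01-25`; both should be verified against the current documentation before use.

```python
# Usage sketch: the base_url and model name below are assumptions taken from
# Alibaba Cloud's DashScope documentation at the time of writing -- verify
# them before relying on this.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-max-2025-01-25",  # assumed snapshot name for Qwen2.5-Max
    messages=[{"role": "user", "content": "Explain Mixture of Experts in one paragraph."}],
)
print(response.choices[0].message.content)
```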
Qwen has published benchmark results for the new model. In the released comparison table against LLaMA 3.1 and DeepSeek-V3, the Max version outperforms both competitors on most metrics. When compared with Claude Sonnet and GPT, however, the Max version still falls short of GPT.
The company has invested a significant budget in training data, and while the model does lead its competitors, the margin is relatively small. Because of this, some experts theorize that the capabilities of language models can be extended further by spending additional compute at inference time (test-time compute) rather than only during training.
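As a toy illustration of test-time compute scaling (not something announced for this model), a best-of-N strategy samples several candidate answers and keeps the one a scoring function prefers. Here `generate` and `score_answer` are hypothetical stand-ins for a sampling call and a verifier or reward model.

```python
# Toy best-of-N sketch of test-time compute scaling. `generate` and
# `score_answer` are hypothetical placeholders, not part of any Qwen API.
import random

def generate(prompt: str) -> str:
    # Stand-in for a temperature > 0 LLM sampling call.
    return f"candidate answer {random.randint(0, 999)} to: {prompt}"

def score_answer(answer: str) -> float:
    # Stand-in for a verifier or reward model.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Larger n = more inference compute = higher expected best score.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score_answer)

print(best_of_n("What is 17 * 23?"))
```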