DeepSeek V3: record results in benchmarks
13:47, 26.03.2025
The new DeepSeek model appeared on Hugging Face without any announcement. A detailed description followed only a day later.
Programming and Math Benchmarks
DeepSeek-V3-0324 posts record results and scores significantly higher than DeepSeek-V3 on all of the following benchmarks:
- AIME: 59.4
- MMLU-Pro: 81.2
- LiveCodeBench: 49.2
- GPQA: 68.4
V3-0324 also scores better than Claude 3.5 in most of the results.
DeepSeek noted that the new model outperforms Claude 3.7 as well. After this announcement, rumors appeared that the new model may have been trained on Claude 3.7. These rumors have been neither confirmed nor denied so far.
Model Updates
The main updates concern coding, along with changes to how the model generates game interfaces and web pages. The quality of Function Calling has also been improved.
The new model also handles web search results and file reading well. In addition, it has been tested on the Mac Studio and runs without problems.