GPT-4.5: A new stage in the development of language models

13:51, 28.02.2025

OpenAI has released a new language model, GPT-4.5, which is meant to feel more natural than previous versions but comes at a higher price.

GPT-4.5 is now available as a “Research Preview” for developers and users of the Pro version. Team and Plus users are scheduled to get access next week.

A significant difference between GPT-4.5 and the o1 and o3-mini reasoning models is that the new version responds much faster. It relies on the “unsupervised learning” approach rather than on reasoning, so it does not think before responding, and response times are considerably shorter.

GPT-4.5, also known by the codename Orion, is OpenAI's largest model so far. OpenAI nevertheless states that the new model is not a “frontier” model; such statements from the company may be related to the training of another model, o3.

The model is significantly more expensive than GPT-4o and o1: $75 per million input tokens and $150 per million output tokens. Like GPT-4o, it has a context length of 128,000 tokens.
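To put these prices in perspective, the cost of a single request can be estimated from its token counts. The following is a minimal sketch that uses only the two prices quoted above; the function name `estimate_cost` and the example token counts are hypothetical and chosen purely for illustration, and actual billing may differ.

```python
# Prices quoted in the article, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = 75.0
OUTPUT_PRICE_PER_MILLION = 150.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Sum the per-million charges first, then scale down to per-token cost.
    total = (input_tokens * INPUT_PRICE_PER_MILLION
             + output_tokens * OUTPUT_PRICE_PER_MILLION)
    return total / 1_000_000


# Illustrative request: 10,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost(10_000, 2_000):.2f}")  # prints $1.05
```

At these rates, output tokens cost twice as much as input tokens, so long generated answers dominate the bill more quickly than long prompts do.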

OpenAI stated that the two main approaches, reasoning and unsupervised learning, will complement each other. GPT-4.5 is already much smarter thanks to pre-training alone, and it is quite possible that GPT-5 will combine the two.

Benchmarking results

In performance tests, GPT-4.5 does well, reaching 62.5% accuracy on SimpleQA. On the same test, Grok 3 scores 43.6% and GPT-4o scores 38.2%. The hallucination rate is also significantly lower, at 37.1%. In addition, GPT-4.5 comes out ahead in human preference evaluations covering everyday questions, creative intelligence, and professional queries.

In STEM tests, the picture is mixed. On AIME '24, GPT-4.5 scores 36.7%, compared with 87.3% for o3-mini and 9.3% for GPT-4o. On SWE-Bench Verified, GPT-4.5 reaches 38.0%, compared with 61.0% for o3-mini and 30.7% for GPT-4o.

Across the remaining benchmarks, the gains are fairly modest, and none of them shows a jump comparable to the one on SimpleQA.
