OpenAI's GPT-5.2 Achieves 70.9% on GDPval: Beating Human Experts in Professional Work

OpenAI released GPT-5.2 on December 11, 2025, marking a significant breakthrough in artificial intelligence capabilities for professional work. The model achieves a 70.9% score on the GDPval benchmark, meaning it beats or ties top industry professionals on knowledge work tasks according to expert human judges OpenAI. This represents a massive jump from earlier versions and signals a new era where AI can genuinely compete with human expertise across dozens of occupations.

The timing of this release is no accident. CEO Sam Altman declared a code red internally to marshal resources toward improving ChatGPT amid intensifying competition from Google’s well-received Gemini 3 model Towardsai. OpenAI needed to respond quickly to maintain its leadership position in the AI race, and GPT-5.2 appears to be that response.

Three Variants for Different Needs

OpenAI structured GPT-5.2 around three specialized variants, each designed for specific use cases. GPT-5.2 Instant is a fast, capable workhorse for everyday work and learning, with clear improvements in info-seeking questions, how-tos and walk-throughs, technical writing, and translation OpenAI. This version prioritizes speed and handles routine tasks efficiently.

GPT-5.2 Thinking is designed for deeper work, helping users tackle more complex tasks with greater polish, especially for coding, summarizing long documents, answering questions about uploaded files, working through math and logic step by step, and supporting planning and decisions with clearer structure OpenAI. This variant applies more computational resources to solve challenging problems that require multi-step reasoning.

For the most demanding work, GPT-5.2 Pro offers maximum intelligence and reliability, though it comes with longer processing times and higher costs. This version is designed for high-stakes decisions where accuracy cannot be compromised.

Dramatic Performance Improvements

The numbers tell a compelling story about GPT-5.2’s capabilities. The model scored 100% on the AIME 2025 math competition, solving every single problem correctly R&D World. In coding, the improvements are equally impressive, with the model achieving top scores on benchmarks that measure real-world software engineering tasks.

The hallucination rate dropped to 6.2%, down from 10-15% in earlier generations Turingcollege. While one incorrect response out of every sixteen might still seem high, this represents a major step forward in reliability. OpenAI says the Thinking model hallucinated 38% less than GPT-5.1 on benchmarks measuring factual accuracy Towardsai.

Vision capabilities received substantial upgrades as well. The model now handles charts, diagrams, and technical images with significantly fewer errors compared to previous versions. This makes GPT-5.2 particularly valuable for professionals who work with visual data, financial charts, or technical documentation.

Real-World Impact on Productivity

OpenAI claims that Enterprise users will save 40-60 minutes a day, and intensive users will be able to save more than 10 hours a week OpenAI. These time savings come from the model’s improved ability to handle complex tasks like spreadsheet automation, presentation creation, and data analysis.

The GDPval benchmark that measures professional knowledge work provides concrete evidence of these capabilities. GPT-5.2 Thinking beats or ties top industry professionals on 70.9% of comparisons on GDPval knowledge work tasks, completing them 11x faster at under 1% of human cost OpenAI. This covers 44 different occupations, from data analysis to content creation to project management.

All three ChatGPT models have a new knowledge cutoff of August 2025, so answers are more accurate and useful, with more relevant examples and context, even before turning to web search TechCrunch. This updated training data ensures the model has more current information about recent events and trends.

Technical Infrastructure and Training

GPT-5.2 was built in collaboration with long-standing partners NVIDIA and Microsoft, with Azure data centers and NVIDIA GPUs, including H100, H200, and GB200-NVL72, underpinning OpenAI’s at-scale training infrastructure OpenAI. The use of NVIDIA’s latest Blackwell architecture GPUs enabled faster training times and more efficient model development.

The model supports extensive context windows and can process hundreds of documents simultaneously. This long-context understanding capability makes it ideal for enterprise applications where users need to analyze large volumes of information quickly.

For developers, GPT-5.2 introduces new features like preambles, which are brief, user-visible explanations that GPT-5.2 generates before invoking any tool or function, outlining its intent or plan TechCrunch. This transparency helps developers understand the model’s reasoning and improves debugging.

Availability and Pricing

GPT-5.2 began rolling out December 11, starting with paid plans including Plus, Pro, Go, Business, and Enterprise OpenAI. The gradual rollout ensures system stability as more users gain access.

Pricing wise, GPT-5.2 is a rare increase, costing 1.4x more than GPT-5.1, at $1.75 per million input tokens and $14 per million output tokens 9to5Mac. While this represents a 40% cost increase, OpenAI argues the improved capabilities justify the higher price for applications that benefit from enhanced reasoning and reliability.

In the API, GPT-5.2 Thinking is available as gpt-5.2, and GPT-5.2 Instant as gpt-5.2-chat-latest OpenAI. This allows developers to integrate the new models into their applications immediately.

Competitive Landscape

The release comes at a crucial moment in the AI industry. Google’s Gemini 3 has gained significant attention for its multimodal capabilities and performance on various benchmarks. In OpenAI’s own benchmark chart, GPT-5.2 Thinking edges out Gemini 3 on most listed reasoning tests including GDPval, SWE-Bench Pro, and GPQA Diamond R&D World.

However, third-party reviewers note that Gemini 3 remains very strong for certain creative workflows and multimodal tasks. The competition between major AI labs continues to drive rapid innovation, with each company pushing the boundaries of what these systems can accomplish.

Safety and Reliability Improvements

This release includes meaningful improvements in how models respond to prompts indicating signs of suicide or self harm, mental health distress, or emotional reliance on the model OpenAI. OpenAI has implemented targeted interventions to ensure safer responses in sensitive conversations.

The company is also rolling out an age prediction model to automatically apply content protections for users under 18, limiting access to sensitive content. This builds on existing parental controls and shows OpenAI’s commitment to responsible AI deployment.

Despite these improvements, OpenAI acknowledges ongoing challenges. The company continues to work on reducing over-refusals, where the model unnecessarily declines to help with legitimate requests. Balancing safety with helpfulness remains a complex challenge that requires continuous refinement.

What This Means for Users

For everyday users, GPT-5.2 represents a more capable assistant that handles complex tasks with greater reliability. The improved instruction following means users spend less time crafting perfect prompts and more time getting useful results.

Professional users in fields like coding, data analysis, and content creation will likely see the most dramatic benefits. The ability to handle long documents, complex reasoning tasks, and multi-step workflows makes GPT-5.2 a genuine productivity tool rather than just an interesting experiment.

Early testers particularly noted clearer explanations that surface key information upfront OpenAI. This improved communication style makes interactions feel more natural and reduces the need for follow-up questions.

Enterprise users can leverage GPT-5.2 for tasks that previously required human experts, from financial modeling to technical documentation. The combination of speed and accuracy enables new workflows that weren’t economically viable before.

OpenAI stated that GPT-5.2 is one step in an ongoing series of improvements, and they’re far from done OpenAI. The company continues to work on known issues while raising the bar on safety and reliability.

The rapid pace of development suggests we’ll see further iterations soon. As training techniques improve and computational resources grow, models will likely become even more capable. The question isn’t whether AI will continue improving, but how quickly these improvements will arrive and what new applications they’ll enable.

For now, GPT-5.2 sets a new standard for AI performance on professional knowledge work. Whether analyzing financial data, writing code, or creating presentations, the model demonstrates that AI can genuinely compete with human expertise in many domains. The technology has moved beyond novelty into practical utility, and that shift will likely accelerate in the coming months.

The competition between major AI labs ensures continued innovation. Each breakthrough from one company pushes others to respond, driving a cycle of rapid improvement that benefits users across industries. GPT-5.2 represents OpenAI’s latest entry in this competitive race, and it’s a strong one.