OpenAI gets caught vibe graphing

During its major GPT-5 presentation on Thursday, OpenAI unveiled several charts intended to highlight the model’s capabilities. However, upon closer inspection, some of these graphs contained inaccuracies.

One chart, meant to show GPT-5’s performance in “deception evals across models,” was drawn on an inconsistent scale. In the “coding deception” category, for instance, it listed GPT-5 at a 50.0 percent deception rate next to o3 at 47.4 percent, yet o3’s lower score was given the larger bar.

In another chart, GPT-5’s score was lower than o3’s yet was drawn with a bigger bar. The same chart gave o3 and GPT-4o identically sized bars despite their differing scores. It was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup,” and a member of OpenAI’s marketing team apologized for what they termed an “unintentional chart crime.”

OpenAI did not immediately respond to a request for comment. It is unclear whether GPT-5 was used to generate the charts, but the errors cast a shadow over the company’s major launch event, particularly as it was promoting the “significant advances in reducing hallucinations” achieved with the new model.
