AI is just one year away from beating 'Humanity's Last Exam'

Developers claim that artificial intelligence will be capable of achieving a perfect score on one of the world’s most demanding assessments, dubbed Humanity’s Last Exam (HLE), within months.

The HLE was devised by technology leaders to evaluate the intelligence of their systems. It features 2,500 carefully curated questions that cover a broad spectrum of topics, including rocket science, mythology, and physiology.

To tackle these questions, a PhD-level understanding is required, and achieving near-perfect marks would bestow the title of ‘universal expert’ upon the examinee.

Merely two years ago, OpenAI’s highly praised ChatGPT managed to score only 3 per cent on this exam, with competitors at Google and Anthropic showing similar results.

This test served as a reassurance against fears concerning AI’s burgeoning influence, with researchers highlighting that it demonstrated a significant gap still existed between large language models (LLMs) and the world’s top scholars.

However, what was once thought to be an insurmountable challenge posed by the HLE might soon become just another landmark in the relentless advancement of AI.

Google Gemini scored an impressive 45.9 per cent on the exam last month, having already climbed to 18.8 per cent within months of its first attempt.

And full marks are on the horizon, according to Calvin Zhang, the research lead at Scale, the AI company behind HLE.

AI will be ready to score full marks on one of the world’s most challenging knowledge tests branded Humanity’s Last Exam (HLE) in a matter of months, developers claim (Stock Photo)

‘We wanted to create this close-ended academic benchmark, set to the frontier of expert humans, that only a handful of people on earth can really solve,’ he said.

‘We’ve seen over the past few years insane progress on these language models. It’s impressive, model builders have really done a great job at improving these reasoning models.’

Kate Olszewska, a product manager at Google DeepMind, added: ‘If we truly cared about this as the only thing in life, I think we could get to it pretty quickly.’

Anthropic – the company behind the Claude AI system – has achieved a score of 34.2 per cent in HLE and is improving its marks at a rapid pace.

AI returning a score of 100 per cent in the exam would be a significant development given the test is ‘designed to be the final closed-ended academic benchmark of its kind’, according to its authors.

It means that if the technology cracks the HLE, it will need to be tested on questions no human knows the answer to in future.

The test was created by researchers at Scale and the Center for AI Safety, a non-profit organisation, to examine both the AI’s breadth of knowledge and its depth of reasoning.

Experts from roughly 50 countries submitted 70,000 questions for consideration in response to a global appeal in September 2024 which offered a $500,000 prize pot.

Each question had to have a short, unambiguous answer and be difficult to find on the internet.

The list was whittled down to 13,000 after questions which any existing model could answer were removed from consideration.

Some of the 2,500 that were chosen have since been removed or edited following feedback from users. 

They require a wide range of expertise – from knowledge of biology to proficiency in languages – and a large number of them have remained secret in a bid to stop systems benefiting from answers being publicly discussed online.

Success in HLE would evoke memories of IBM’s supercomputer Deep Blue defeating world chess champion Garry Kasparov in a six-game match in 1997, confounding most experts’ predictions.

Since then, a string of major AI benchmarks have been cleared including the multi-disciplinary Massive Multitask Language Understanding, released in 2020, which was canned after systems began finding it too easy, often scoring above 90 per cent.

As AI approaches the stage where it can master human-made tests, expanding beyond the existing limits of human knowledge has increasingly become the main focus of developers, Ms Olszewska added.

But there will always be room for human specialism, according to Zhang, with physical fields such as surgery, as well as decision-based skills including judgment and creativity, harder for AI to master.
