By Shin Seo-Hui
Publication Date: 2026-03-31 09:36:00
When major generative artificial intelligence models were given the nationwide mock exam that high school seniors took last month, their scores varied widely, from levels high enough to apply to Seoul's top universities down to the lower-mid tier, a new evaluation has found. On the same test paper, the best-performing model's average score was roughly double that of the worst, and some models made surprising errors on basic questions requiring contextual understanding.
Jongno Academy administered the March nationwide mock exam’s Korean, math and English sections to major generative AI models and scored them based on raw points, with results released on the 31st. The average scores were 87.8 for Gemini, 59.5 for ChatGPT and 43.7 for Perplexity.
When converted to grade levels, Gemini achieved Grade 1 in Korean and math and Grade 2 in English, a performance evaluated as strong enough to apply to the so-called "SKY" universities: Seoul National University, Yonsei University and Korea University. ChatGPT, by contrast, remained broadly at Grade 4, while Perplexity fell to Grades 6 through 8 in math, starkly exposing the performance gaps between models. The paid subscription version of each model was used for the test.
In terms of time spent per subject, most models completed Korean and English within minutes, but math took considerably longer: approximately 40 minutes for Gemini, 30 minutes for ChatGPT and about one hour for Perplexity.
Math produced the widest gap. Perplexity scored just 19 in…