GPT-4, the latest version of the model behind the AI chatbot ChatGPT, outperforms its predecessor in several ways. It can score around the 90th percentile on high school and law school tests, while the earlier version placed in the bottom 10% on some of them. GPT-4 also has capabilities the earlier version lacks, including identifying vulnerabilities in Ethereum smart contracts and converting image, audio, and video inputs to text. OpenAI, the creator of ChatGPT, released GPT-4's test scores on March 14. They showed that it scored in the top 10% of simulated bar exam takers, while the previous version scored in the bottom 10%. GPT-4 also scored 163 on the LSAT, which placed it in the 88th percentile. The LSAT is required for admission to most law schools in the US.
GPT-4's LSAT score is impressive and would make it a strong candidate for admission to a top-20 law school. However, it falls short of the scores reportedly required for admission to the most prestigious law schools, such as Harvard, Stanford, or Yale. By contrast, the previous version of ChatGPT scored poorly on the LSAT, placing in the bottom 40%.
GPT-4 also did well on the Uniform Bar Exam, scoring 298 out of 400. Newly graduated law students take this exam to become licensed to practice law in the US jurisdictions that accept it. By contrast, the earlier version of ChatGPT struggled with the exam, scoring 213 out of 400, which placed it in the bottom 10%.
GPT-4 also performed strongly on the SAT Evidence-Based Reading & Writing and SAT Math sections, which US high school students take to gauge their readiness for college. It scored in the 93rd and 89th percentiles, respectively.
GPT-4 also excelled in the "hard" sciences, scoring well above average: it placed in the 85th to 100th percentile in AP Biology, the 71st to 88th percentile in AP Chemistry, and the 66th to 84th percentile in AP Physics 2.
Assessment theory was a big chunk of my life for several years. I was banging on about this day coming many years ago. I literally sounded like the resident crank at the time. But… really this means that anything but invigilated assessment is over from this point on.
drnick 🗳️² (@DrNickA), March 14, 2023
GPT-4's AP Calculus score was roughly average, in the 43rd to 59th percentile. On two separate English exams, GPT-4 scored between the 8th and 44th percentiles, suggesting weaker proficiency in that area. OpenAI noted that GPT-4 and GPT-3.5 took these tests using 2022-2023 practice exams and received no training specific to them. Nick Almond, the founder of FactoryDAO, predicted on Twitter that GPT-4 would collapse the global education system.
Conor Grogan, a former director of Coinbase, pasted a live Ethereum smart contract into GPT-4, which identified several security vulnerabilities that could be exploited. That said, earlier smart contract audits found that the first version of ChatGPT could also spot code bugs to a reasonable degree. Rowan Cheung, the founder of the AI newsletter The Rundown, shared a video of GPT-4 turning a hand-drawn website mockup on paper into working code.
I just watched GPT-4 turn a hand-drawn sketch into a functional website. This is insane. pic.twitter.com/P5nSjrk7Wn
Rowan Cheung (@rowancheung), March 14, 2023