
ChatGPT and Google Gemini Pass Ethical Hacking Exams

Researchers at the University of Missouri say the automation and speed of large language models could be useful in cyber defense, but they can’t yet replace human cybersecurity experts.

A robot holds its hands over a shield with a lock. (Shutterstock)
In a time when artificial intelligence is increasingly used in cyber attacks, a research team at the University of Missouri asked, “What would happen if we recruited AI to the other side?” They found that large language models could be used to study and address basic cybersecurity issues.

“These AI tools can be a good starting point to investigate issues before consulting an expert,” Prasad Calyam, the director of the university’s Cyber Education, Research and Infrastructure Center, said in a news release about the study this week. “They can also be good training tools for those working with information technology or who want to learn the basics on identifying and explaining emerging threats.”

In collaboration with Amrita University in India, Calyam's team used ethical hacking to put the idea to the test. Ethical hacking uses the same techniques as malicious hacking to find flaws in cybersecurity systems and fix them. The Certified Ethical Hacker exam, a globally recognized credential based on a multiple-choice test administered by the cybersecurity company EC-Council, is one way for professionals to learn these techniques and advance in the workforce.

The research team fed questions from the CEH exam into ChatGPT and Google Gemini, which was called Bard when the study concluded in November 2023. For example, one question asked the models to explain a man-in-the-middle attack, in which a third party intercepts communication between two systems. In this case, both models were able to explain the attack and suggest protections.
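The man-in-the-middle concept can be made concrete with a short sketch. The following Python example (local-only, with made-up port numbers, not from the study) relays plaintext TCP traffic through an attacker-controlled proxy that records everything it forwards. Protections of the kind the models suggested, such as TLS, work precisely because they deny such a relay readable plaintext.

```python
# Minimal local sketch of a man-in-the-middle relay on plaintext TCP.
# Hosts and ports are hypothetical examples; everything runs on localhost.
import socket
import threading

def run_server(port, ready):
    """The legitimate server: echoes a greeting to whoever connects."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    msg = conn.recv(1024)
    conn.sendall(b"server saw: " + msg)
    conn.close()
    srv.close()

def run_mitm(listen_port, target_port, captured, ready):
    """The attacker: relays traffic unchanged, but records everything."""
    prx = socket.socket()
    prx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    prx.bind(("127.0.0.1", listen_port))
    prx.listen(1)
    ready.set()
    client, _ = prx.accept()
    upstream = socket.create_connection(("127.0.0.1", target_port))
    data = client.recv(1024)
    captured.append(data)      # interception: attacker reads the plaintext
    upstream.sendall(data)     # faithful relay, so neither side notices
    reply = upstream.recv(1024)
    captured.append(reply)
    client.sendall(reply)
    for s in (client, upstream, prx):
        s.close()

captured = []
srv_ready, mitm_ready = threading.Event(), threading.Event()
threading.Thread(target=run_server, args=(9801, srv_ready)).start()
srv_ready.wait()
threading.Thread(target=run_mitm, args=(9800, 9801, captured, mitm_ready)).start()
mitm_ready.wait()

# The victim believes it is talking directly to the server on port 9800.
victim = socket.create_connection(("127.0.0.1", 9800))
victim.sendall(b"secret password")
reply = victim.recv(1024)
victim.close()
print(reply.decode())        # the exchange works end to end...
print(captured[0].decode())  # ...yet the attacker saw the plaintext
```

The victim's exchange completes normally, which is what makes the attack dangerous: nothing on either end signals that a third party read the traffic.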

In the event of an incorrect answer, researchers gave the prompt “are you sure?” and recorded the chatbot’s response. Whether correct on the first try, the second try or not at all, the models were asked to provide an explanation of their response.

The cumulative accuracy for ChatGPT was 80.8 percent. For Gemini, it was 82.6 percent. The study also measured comprehensiveness, clarity and conciseness, and both models performed well in all areas.
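The scoring protocol described above, ask each question, follow up with "are you sure?" on a miss, and count an answer as correct if either attempt succeeds, can be sketched as a simple loop. This is a hypothetical reconstruction for illustration only; `ask_model` and the toy question data are stand-ins, not the study's actual code or questions.

```python
# Hypothetical sketch of the study's scoring protocol as described in the
# article. `ask_model` is a stand-in for a real LLM API call: here it just
# replays a scripted first and second answer for each toy question.

def ask_model(question, follow_up=False):
    # Stand-in for querying the chatbot; returns a multiple-choice letter.
    return question["scripted"][1 if follow_up else 0]

def cumulative_accuracy(questions):
    """Fraction of questions answered correctly on the first or second try."""
    correct = 0
    for q in questions:
        if ask_model(q) == q["answer"]:
            correct += 1
        elif ask_model(q, follow_up=True) == q["answer"]:  # "are you sure?"
            correct += 1
    return correct / len(questions)

# Toy data: one right first try, one fixed on the follow-up, one never right.
questions = [
    {"answer": "A", "scripted": ("A", "A")},
    {"answer": "B", "scripted": ("C", "B")},
    {"answer": "D", "scripted": ("A", "C")},
]
print(cumulative_accuracy(questions))  # 2 of 3 correct overall
```

"Cumulative" here means first-try and prompted-second-try successes are pooled into one figure, which is how the roughly 81 and 83 percent scores should be read.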

“Both passed the test and had good responses that were understandable to individuals with backgrounds in cyber defense — but they are giving incorrect answers, too,” Calyam said in a public statement. “And in cybersecurity, there’s no room for error. If you don’t plug all of the holes and rely on potentially harmful advice, you’re going to be attacked again. And it’s dangerous if companies think they fixed a problem but haven’t.”

Calyam said these tools may provide useful baseline information for individuals or small companies needing assistance, but they can’t replace human cybersecurity experts.

“Amidst the escalating concern of burgeoning cyber crime, projected to incur costs of up to $10.8 trillion or 50 percent of the digital economy by 2025, an effective approach to countering cyber threats would be to exploit the automation, precision, and speed of large language models (LLMs) in the context of ethical hacking,” the study suggests.