
Can AI-Written Prompts Yield More Accurate Math Answers?

Researchers found that by asking large language models to tweak the wording of prompts to make them more effective, they could get the LLMs to generate more correct answers to math problems.

[Image: A robot standing in front of a chalkboard completing complex math problems.]
(TNS) — Instructing an AI chatbot to answer questions as if it were in the sci-fi franchise Star Trek seems to improve its mathematical ability, but no one is sure why.

The output of chatbots like ChatGPT can be improved by prompting the AI to adopt a certain persona. To discover the most effective prompts, Rick Battle and Teja Gollapudi at software firm VMware in California turned to the large language models (LLMs) that power the chatbots.

They used the LLMs to fine-tune human-created prompts, then rated each prompt's effectiveness by how well it helped solve a set of U.S. grade-school mathematics questions called the GSM8K benchmark.

The researchers gave 60 prompts to three LLMs: one developed by French firm Mistral, and two versions of Llama2, made by Meta. The AIs were then instructed to tweak the wording of the messages to make them more effective.

For example, from an initial prompt of "You are an expert mathematician. Solve the following math problem. Take a deep breath and think carefully," the AI-improved prompt might add requirements to define any assumptions, or to flag any loopholes used.

The resulting prompts were fed back into the AIs in an effort to tackle the GSM8K questions, which require simple arithmetic to solve, but take between two and eight steps to complete.
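The loop described above — ask a model to reword a prompt, score the result on the benchmark, keep the best version — can be sketched roughly as follows. This is a minimal illustration, not the researchers' actual code; `rewrite` and `score` are hypothetical stand-ins for an LLM rewriting call and a GSM8K accuracy check.

```python
def optimize_prompt(seed_prompt, rewrite, score, rounds=10):
    """Iteratively reword a prompt, keeping the highest-scoring version.

    rewrite(prompt) -> a candidate rewording (in practice, an LLM call)
    score(prompt)   -> a quality measure (in practice, accuracy on GSM8K)
    """
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        candidate = rewrite(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

In the study, `seed_prompt` would be something like the "expert mathematician" instruction quoted earlier, and the loop would surface whichever rewording — however unusual — scored highest.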

In nearly all instances, the AI models produced prompts that generated more correct answers to the questions than human-created prompts. "In my opinion, nobody should ever attempt to hand-write a prompt again," says Battle. "Let the model do it for you."

However, entrusting a chatbot to write prompts can result in unusual ideas. The highest-scoring prompt generated by the Llama2-70B model, for instance, asked the chatbot to adopt the persona of the captain of a Star Trek spaceship, jotting down answers in its "captain's log" (arXiv, doi.org/mjjb).

Why the AI produced such unusual prompts is "the $64 million question," says Battle. "To a certain extent, the answer is 'I don't care, just give the model what it wants.'" However, he thinks it is a product of the data used for training, perhaps with Star Trek content appearing more often with correct information. "There's a lot of Star Trek references on the Internet," he says.

"We won't ever know why [these models] do what they do because ultimately they are a melange of weights and probabilities and at the end a result is spat out," says Catherine Flick at Staffordshire University, U.K.

Although the Star Trek-themed prompt was the most successful, Flick says you shouldn't start addressing chatbots as "commander" any time soon. "The model is not a Trekkie," she says. "It doesn't 'understand' anything better or worse when preloaded with the prompt, it just accesses a different set of weights and probabilities."

©2024 New Scientist Ltd. Distributed by Tribune Content Agency, LLC.