Researchers studying artificial intelligence at Apple have found that large language models from firms such as Meta and OpenAI struggle with simple logic. These models, widely used in chatbots and other applications, often give contradictory answers to closely related questions.
To gauge these models’ reasoning abilities, the researchers developed a new benchmark they call GSM-Symbolic. Their tests show that even small changes to a question’s phrasing can produce wildly different answers.
To see how well the models handled mathematical reasoning, the researchers also added extra, seemingly relevant information to their queries. For several of the models, this added information changed the answer, even though it should not have. This shows that these models remain unreliable on more sophisticated reasoning tasks.
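The approach the article describes can be sketched in a few lines. This is an illustrative reconstruction, not Apple's actual benchmark code: the template, names, and number ranges are assumptions. The idea is to turn one word problem into a template, re-sample its numeric values, and optionally append a clause that looks relevant but does not affect the answer; the ground truth is recomputed for each variant, so only the surface form changes.

```python
import random

# Illustrative template; Apple's real GSM-Symbolic templates differ.
TEMPLATE = ("{name} picks {a} kiwis on Friday and {b} kiwis on Saturday. "
            "On Sunday, he picks twice as many as on Friday. "
            "How many kiwis does he have?")
# A distractor clause that changes the wording but not the answer.
DISTRACTOR = " Note that some of the kiwis are smaller than average."

def make_variant(rng, add_distractor=False):
    # Re-sample the numbers so the reasoning stays identical
    # while the surface form of the question changes.
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    question = TEMPLATE.format(name=rng.choice(["Oliver", "Mia"]), a=a, b=b)
    if add_distractor:
        question += DISTRACTOR
    return question, a + b + 2 * a  # ground truth, recomputed per variant

rng = random.Random(0)
question, answer = make_variant(rng, add_distractor=True)
print(question)
print("ground truth:", answer)
```

A reliable reasoner should score the same on every variant; the study's finding is that current models do not.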
In its report, the research team wrote:
Specifically, the performance of all models declines [even] when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, the fragility of mathematical reasoning in these models is evident in how drastically their performance declines as the number of clauses in a question increases.
According to the study, adding even a single sentence that appears relevant to the question can reduce the accuracy of the solution by up to 65%. The researchers said that because small adjustments to a query can produce drastically different responses, it is hard to build trustworthy AI agents on these methods.
For example, one query asked the AI model to solve a straightforward word problem that an elementary school pupil might face. The question stated:
On Friday, Oliver picks 44 kiwis. Then on Saturday, he picks 58 kiwis. On Sunday, he picks twice as many kiwis as he did on Friday.
The question asked how many kiwis Oliver ended up with, then went on to mention that some of the kiwis were smaller than usual. This detail should not have affected the answer, yet both OpenAI’s model and Meta’s Llama3-8b subtracted the smaller kiwis from the total.