Researchers reveal bias in a widely used measure of algorithm performance

When scientists test algorithms that cluster or classify data, they often turn to a trusted tool called Normalized Mutual Information (NMI) to measure how well an algorithm's output matches the ground truth. But according to new research, that tool may not be as reliable as many assume.
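The teaser doesn't spell out which bias the researchers found, but one well-documented pitfall of NMI is a finite-size effect: scoring a ground truth against purely random labelings still yields values above zero, and the score tends to grow with the number of clusters. A minimal from-scratch sketch illustrates this (the `nmi` helper and the toy labelings below are our own illustration, not the researchers' code):

```python
import math
import random
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information of two labelings (arithmetic-mean normalization)."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Plug-in estimate of mutual information (in nats).
    mi = sum((n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
             for (a, b), n_ab in joint.items())
    # Entropies of each labeling.
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    denom = (h_a + h_b) / 2
    return mi / denom if denom > 0 else 1.0

random.seed(0)
truth = [random.randrange(2) for _ in range(100)]
guess_few = [random.randrange(2) for _ in range(100)]    # random labels, 2 clusters
guess_many = [random.randrange(20) for _ in range(100)]  # random labels, 20 clusters
# Neither guess shares any real structure with `truth`, yet both
# typically score above zero, and the many-cluster guess usually
# scores higher -- a known finite-size effect of the measure.
print(round(nmi(truth, guess_few), 3))
print(round(nmi(truth, guess_many), 3))
```

This is why bias-corrected variants such as adjusted mutual information exist; whether the new study targets this particular effect or another one is not stated in the teaser.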


Enabling small language models to solve complex reasoning tasks

As language models (LMs) improve at tasks like image generation, trivia questions, and simple math, you might think that human-like reasoning is just around the corner. In reality, they still trail us by a wide margin on complex tasks. Try playing Sudoku with one, for instance: the goal is to fill a nine-by-nine grid so that each number from one through nine appears exactly once in every row, column, and three-by-three section. Your AI opponent will either fail to fill in the boxes on its own or do so inefficiently, although it can verify whether you've filled in yours correctly.
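That asymmetry is real: verifying a finished Sudoku is a few lines of code, while solving one requires search. A generic checker makes the easy half concrete (this is a standard illustration, not anything from the research):

```python
def valid_sudoku(grid):
    """Check a completed 9x9 grid: digits 1-9 appear exactly once in
    every row, every column, and every 3x3 box."""
    def ok(cells):
        return sorted(cells) == list(range(1, 10))
    return (all(ok(row) for row in grid)
            and all(ok([grid[r][c] for r in range(9)]) for c in range(9))
            and all(ok([grid[r + dr][c + dc] for dr in range(3) for dc in range(3)])
                    for r in (0, 3, 6) for c in (0, 3, 6)))

# A filled grid built from a standard shifting pattern (valid by construction).
solved = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(valid_sudoku(solved))  # True
```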


AI agents debate their way to improved mathematical reasoning

Large language models (LLMs), artificial intelligence (AI) systems that can process and generate text in many languages, are now widely used to write content, find information, and even code websites or applications. While these models have improved significantly over the past couple of years, their answers can still contain factual inaccuracies or logical inconsistencies.
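The teaser doesn't describe the mechanism, but the "debate" in the headline is commonly implemented as a loop in which several model instances answer independently, read one another's answers, revise, and then settle by vote. A toy sketch with deterministic stand-in agents (no real LLM calls; every name and the two-round protocol here are illustrative assumptions, not the paper's code):

```python
from collections import Counter

def make_agent(initial_answer):
    """Toy stand-in for an LLM agent: answers, and revises toward the
    majority position when it sees its peers agree."""
    def agent(question, peer_answers=None):
        if peer_answers:
            top, count = Counter(peer_answers).most_common(1)[0]
            if count > 1:  # peers agree -> revise toward their answer
                return top
        return initial_answer
    return agent

def debate(question, agents, rounds=2):
    """Each round, every agent sees the others' current answers and may revise."""
    answers = [agent(question) for agent in agents]
    for _ in range(rounds):
        answers = [agent(question,
                         peer_answers=[a for j, a in enumerate(answers) if j != i])
                   for i, agent in enumerate(agents)]
    # Final answer by majority vote.
    return Counter(answers).most_common(1)[0][0]

agents = [make_agent("12"), make_agent("12"), make_agent("15")]  # one starts out wrong
print(debate("What is 3 + 4 + 5?", agents))  # prints "12"
```

Here the lone dissenting agent sees two peers agreeing and falls in line; with real models, the hope is that correct reasoning is likewise more persuasive than errors.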
