A fresh approach to constructing neural networks could shed light on AI’s decision-making process.

A slight change in how the artificial neurons inside neural networks work might make AI systems more transparent, offering better insight into how they generate their outputs.
For decades, artificial neurons—the core components of deep neural networks—have remained largely unchanged. While these networks power modern AI, they’re often seen as mysterious black boxes.
Current artificial neurons, found in advanced language models like GPT-4, process numerous inputs, combine them, and transform the result into an output using a complex operation within the neuron. Neural networks are made up of these neurons, and their collective behavior can be challenging to interpret.
However, the new neuron design works differently. It moves some of the complexity of existing neurons outside the neuron itself: the new neurons simply add up their inputs and produce an output, with no additional internal calculations. These networks are called Kolmogorov-Arnold Networks (KANs), named after the Russian mathematicians who inspired their creation.
This simplification, extensively researched by a team led by MIT scientists, could make it easier to understand why neural networks produce certain results, verify their decisions, and even examine them for bias. Early findings also suggest that as KANs grow larger, their accuracy improves more rapidly than networks using traditional neurons.
Andrew Wilson, a machine learning expert at New York University, says, “It’s promising research. It’s great to see people fundamentally rethinking the design of these [networks].”
While the basic concepts of KANs were first proposed in the 1990s, with researchers building simple versions over time, the MIT-led team has taken the idea further. They’ve shown how to construct and train larger KANs, conducted empirical tests, and analyzed some KANs to demonstrate how humans could interpret their problem-solving abilities. “We’ve given new life to this concept,” says team member Ziming Liu, a PhD student in Max Tegmark’s lab at MIT. “And hopefully, with this improved interpretability, we may no longer have to view neural networks as black boxes.”
Although it’s still early, the team’s work on KANs is gaining attention. GitHub pages have emerged, showcasing how to use KANs for various applications, from image recognition to solving fluid dynamics problems.
Discovering the formula
The recent breakthrough occurred when Liu and his team from MIT, Caltech, and other institutions sought to unravel the internal mechanisms of standard artificial neural networks.
Today, nearly all AI systems, including those used in large language models and image recognition, incorporate sub-networks known as multilayer perceptrons (MLPs). In an MLP, artificial neurons are organized into densely connected layers. Each neuron applies an “activation function”—a mathematical operation that transforms the neuron’s combined input in a predetermined way to produce an output.
In an MLP, every artificial neuron receives inputs from neurons in the previous layer, multiplies each input by a corresponding “weight” (a value representing the significance of that input), and sums them up. This sum is then passed through the neuron’s activation function to generate an output, which is transmitted to neurons in the next layer. The MLP learns tasks such as differentiating between images of cats and dogs by adjusting the input weights for each neuron. Notably, the activation function itself remains fixed during the training process.
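The weighted-sum-then-activation step described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the team’s paper; the ReLU activation and the toy numbers are assumptions for the sake of the example:

```python
import numpy as np

def mlp_neuron(inputs, weights, bias):
    """One MLP neuron: multiply each input by its weight, sum them,
    then pass the total through a fixed activation function (ReLU here)."""
    z = np.dot(weights, inputs) + bias   # weighted sum of the inputs
    return np.maximum(z, 0.0)            # fixed activation: ReLU

# Training adjusts `weights` and `bias`; the activation itself never changes.
x = np.array([1.0, -2.0])          # outputs from the previous layer
w = np.array([0.5, 0.25])          # learned weights
print(mlp_neuron(x, w, bias=0.1))  # prints 0.1
```

Note that all the learning lives in the scalar weights; the nonlinearity applied inside the neuron is the same before and after training.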
Once training is complete, the entire network of neurons and their connections effectively acts as a complex function. This function takes an input, like thousands of pixels in an image, and produces the desired output, such as identifying a cat with a 0 or a dog with a 1. Understanding the mathematical form of this function is crucial for comprehending why a network produces certain outputs. For instance, why does it label someone as creditworthy based on their financial information? However, MLPs often behave like “black boxes,” making it extremely difficult to reverse-engineer their processes for complex tasks like image recognition.
Even when Liu’s team attempted to reverse-engineer MLPs using simplified “synthetic” data, they encountered significant challenges.
“If we can’t interpret neural networks using synthetic datasets, there’s little hope for real-world ones,” says Liu. “We realized it was incredibly hard to understand these networks, so we aimed to redesign the architecture.”
Mapping the math
The main change was removing the fixed activation function. Instead, they added a simpler learnable function to transform each input before it enters the neuron.
Unlike an MLP neuron’s activation function, which handles many inputs, each simple function outside a KAN neuron takes one number and outputs another. During training, instead of learning individual weights like an MLP, the KAN learns how to represent each simple function. In a paper on the arXiv preprint server this year, Liu’s team showed that these simple functions outside the neurons are easier to interpret. This makes it possible to reconstruct the mathematical form of the function the whole KAN is learning.
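By contrast, a KAN neuron can be sketched as follows. This is a toy illustration that assumes piecewise-linear edge functions; the published KANs represent each edge function with learnable splines:

```python
import numpy as np

def edge_function(x, grid, values):
    """A learnable 1-D function on a KAN edge: one number in, one number out.
    Modeled here as piecewise-linear interpolation over fixed grid points;
    training adjusts `values` rather than a single scalar weight."""
    return np.interp(x, grid, values)

def kan_neuron(inputs, grid, edge_values):
    """A KAN neuron simply sums its already-transformed inputs -- there is
    no fixed activation function inside the neuron itself."""
    return sum(edge_function(x, grid, v) for x, v in zip(inputs, edge_values))

grid = np.array([-1.0, 0.0, 1.0])
f1 = np.array([-1.0, 0.0, 1.0])   # learned values approximating f(x) = x
f2 = np.array([1.0, 0.0, 1.0])    # learned values approximating f(x) = |x|
print(kan_neuron([0.5, -0.5], grid, [f1, f2]))  # 0.5 + 0.5 = 1.0
```

Because each edge function maps one number to another, it can be plotted or written down symbolically, which is what makes a trained KAN easier to read off than an MLP’s entangled weights.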
The team has only tested KANs on simple, synthetic datasets so far, not on complex real-world problems like image recognition. “We’re slowly pushing the boundary,” Liu notes. “Interpretability can be very challenging.”
Liu’s group has also shown that KANs become more accurate with size faster than MLPs, both theoretically and empirically for science-related tasks like approximating physics functions. “It’s unclear if this will extend to standard machine learning tasks, but it seems promising for science-related ones,” Liu says.
However, Liu admits KANs have a significant drawback: they need more time and computing power to train compared to MLPs.
“This limits KANs’ application efficiency on large-scale datasets and complex tasks,” says Di Zhang from Xi’an Jiaotong-Liverpool University. But he suggests that more efficient algorithms and hardware accelerators could help address this issue.
Despite these challenges, the potential for better interpretability and faster accuracy gains with size makes KANs an intriguing alternative to traditional MLPs for certain applications.
Ready to Dive Deeper into AI?
Are you fascinated by new AI architectures like Kolmogorov-Arnold Networks? Imagine harnessing the power of these innovations to boost your own AI skills! If you’re looking to understand AI on a practical level without all the tech jargon, explore our Prompt Engineering for Leaders: Mastering AI Conversations Without the Tech Jargon course. This course is designed to empower professionals to leverage AI with ease, even without a technical background.
Start your AI journey today and see how AI can work for you!