DeepMind’s run of discoveries in fundamental computer science continues. Last year it used a version of its game-playing AI AlphaZero to find new ways to speed up the calculation of a crucial piece of math at the heart of many different kinds of code, beating a 50-year-old record.
Now the company (recently renamed Google DeepMind after merging with Google Brain, its sister company’s AI lab, in April) has pulled the same trick again—twice. Using a new version of AlphaZero called AlphaDev, the UK-based firm has discovered a way to sort items in a list up to 70% faster than the best existing method.
It has also found a way to speed up a key algorithm used in cryptography by 30%. These algorithms are among the most common building blocks in software. Small speedups can make a huge difference, cutting costs and saving energy.
“Moore’s law is coming to an end, where chips are approaching their fundamental physical limits,” says Daniel Mankowitz, a research scientist at Google DeepMind. “We need to find new and innovative ways of optimizing computing.”
“It’s an interesting new approach,” says Peter Sanders, who studies the design and implementation of efficient algorithms at the Karlsruhe Institute of Technology in Germany and who was not involved in the work. “Sorting is still one of the most widely used subroutines in computing,” he says.
DeepMind published its results in Nature today. But the techniques that AlphaDev discovered are already being used by millions of software developers. In January 2022, DeepMind submitted its new sorting algorithms to the organization that manages C++—one of the most popular programming languages in the world—and, after two months of rigorous independent vetting, AlphaDev’s algorithms were added to the language. This was the first change to C++’s sorting algorithms in more than a decade and the first update ever to involve an algorithm discovered using AI.
DeepMind added its other new algorithms to Abseil, an open-source collection of prewritten C++ algorithms that can be used by anybody coding with C++. These cryptography algorithms compute numbers called hashes that can be used as unique IDs for any kind of data. DeepMind estimates that its new algorithms are now being used trillions of times a day.
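For a sense of where that code sits, the snippet below shows how Abseil’s generic hashing is typically invoked from C++. It is a generic illustration of the library’s public API, not the specific internal routine AlphaDev improved, and the function name HashName is ours:

```cpp
#include <cstddef>
#include <string>

#include "absl/hash/hash.h"

// Hash a string with Abseil's generic hasher. The resulting number can serve
// as a compact identifier or a hash-table bucket index for the value.
std::size_t HashName(const std::string& name) {
  return absl::Hash<std::string>{}(name);
}
```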
AlphaDev is built on top of AlphaZero, the reinforcement-learning model that DeepMind trained to master games such as Go and chess. DeepMind’s breakthrough was to treat the problem of finding a faster algorithm as a game and then train its AI to win it—the same method it used to speed up calculations in last year’s research.
In AlphaDev’s case, the game involves choosing computer instructions and placing them in order so that the resulting lines of code make up an algorithm. AlphaDev wins the game if the algorithm is both correct and faster than existing ones. It sounds simple, but to play the game well AlphaDev must search through an astronomical number of possible instruction combinations.
DeepMind chose to work with assembly, a programming language that can be used to give specific instructions for how to move numbers around on a computer chip. Few humans write in assembly; it is the language that code written in languages like C++ gets translated into before it is run. The advantage of assembly is that it allows algorithms to be broken down into fine-grained steps—a good starting point if you’re looking for shortcuts.
Computer chips have different slots, called registers, where numbers get put and processed. Assembly includes basic instructions for manipulating what’s in these slots, like mov(A,B), which tells the computer to move the number that’s in slot A to slot B, and cmp(A,B), which tells the computer to check whether what’s in slot A is less than, equal to, or greater than what’s in slot B. Long sequences of such instructions can carry out everything that computers do.
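To make that concrete, here is roughly what one building block of a sort looks like before it gets translated into such instructions. The C++ snippet below is our own illustration, not code from the paper: a single compare-and-exchange of two values, which an optimizing compiler typically lowers to a short run of cmp and conditional-move instructions of the kind described above.

```cpp
#include <algorithm>

// One compare-and-exchange step: afterwards, a holds the smaller value and b
// the larger. With optimizations on, compilers usually emit a cmp followed by
// conditional moves rather than a branch. (Illustrative only.)
void compare_exchange(int& a, int& b) {
  int lo = std::min(a, b);
  int hi = std::max(a, b);
  a = lo;
  b = hi;
}
```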
AlphaDev plays a move in the game by adding a new assembly instruction to the algorithm it is building. To start, AlphaDev would add instructions at random, generating algorithms that would not run. Over time, just like AlphaZero did with board games, AlphaDev learned to play winning moves. It added instructions that led to algorithms that not only ran, but were correct and fast.
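A loose sketch of that game, written in C++ for illustration (our own reconstruction from the description above, not DeepMind’s code; the names and the scoring are assumptions):

```cpp
#include <string>
#include <vector>

// Toy model of the instruction-picking game described above: a state is the
// program built so far, a move appends one assembly instruction, and the
// score rewards programs that are correct and fast.
struct AssemblyGame {
  std::vector<std::string> program;  // instructions chosen so far

  void play_move(const std::string& instruction) {
    program.push_back(instruction);  // one move = one more instruction
  }

  double reward() const {
    if (!is_correct(program)) return -1.0;  // incorrect output: losing position
    return -measured_latency(program);      // faster programs score higher
  }

  // Stubs standing in for real correctness tests and latency measurements.
  static bool is_correct(const std::vector<std::string>&) { return true; }
  static double measured_latency(const std::vector<std::string>& p) {
    return static_cast<double>(p.size());
  }
};
```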
DeepMind focused on algorithms for sorting short lists of three to five items. Such algorithms get called over and over again by programs that sort longer lists. Speedups in these short routines therefore have a cumulative knock-on effect.
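A rough sketch of why that matters, under the simplified assumption of a divide-and-conquer sort that hands tiny ranges to a fixed-size kernel (the names sort_small and sort_range are ours, not the library’s):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in for a hand-tuned (or AlphaDev-tuned) fixed-size sort of at most
// five elements. Real libraries specialize this case heavily.
void sort_small(int* a, std::size_t n) { std::sort(a, a + n); }

// Toy divide-and-conquer sort: every short range ends up in sort_small, so a
// faster small-case kernel speeds up the whole routine many times over.
void sort_range(std::vector<int>& v, std::size_t lo, std::size_t hi) {
  if (hi - lo <= 5) {
    sort_small(v.data() + lo, hi - lo);
    return;
  }
  std::size_t mid = lo + (hi - lo) / 2;
  sort_range(v, lo, mid);
  sort_range(v, mid, hi);
  std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
}
```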
But short algorithms like these have also been studied and optimized by humans for decades. Mankowitz and his colleagues started with an algorithm for sorting a list of three items, just as a proof of concept. The best human-devised version of this algorithm takes 18 instructions. They didn’t believe they’d be able to improve on it.
“We honestly didn’t expect to achieve anything better,” he says. “But to our surprise, we managed to make it faster. We initially thought this was a mistake or a bug or something, but when we analyzed the program we realized that AlphaDev had actually discovered something.”
AlphaDev found a way to sort a list of three items in 17 instructions instead of 18. What it had discovered was that certain steps could be skipped. “When we looked at it afterwards, we were like, ‘Wow, that definitely makes sense,’” says Mankowitz. “But to discover something like this [without AlphaDev], it requires people that are experts in assembly language.”
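For comparison, a conventional three-element sort written by hand in C++ looks like this. It is the familiar textbook version, not AlphaDev’s 17-instruction sequence: three compare-and-swap steps, each of which compiles down to a handful of mov- and cmp-style instructions.

```cpp
#include <utility>

// Classic three-element sorting network: after the three conditional swaps,
// a <= b <= c. AlphaDev's discovery was that, at the assembly level, one of
// the expected steps in routines like this can be skipped.
void sort3(int& a, int& b, int& c) {
  if (a > b) std::swap(a, b);
  if (b > c) std::swap(b, c);
  if (a > b) std::swap(a, b);
}
```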
AlphaDev could not beat the best human version of the algorithm for sorting a list of four items, which takes 28 instructions. But it beat the best human version for five items, cutting the number of instructions down from 46 to 42.
That equates to a significant speedup. The existing C++ algorithm for sorting a list of five items took around 6.91 nanoseconds on a typical Intel Skylake chip; AlphaDev’s took 2.01 nanoseconds, around 70% faster.
DeepMind compares AlphaDev’s discovery to one of AlphaGo’s weird but winning moves in its Go match against grandmaster Lee Sedol in 2016. “All the experts looked at this move and said ‘This isn’t the right thing to do, this is a poor move,’” says Mankowitz. “But actually it was the right move and AlphaGo ended up not just winning the game but also influencing the strategies that professional Go players started using.”
Sanders is impressed, but does not think the results should be oversold. “I agree that machine learning techniques are increasingly a game changer in programming and everybody is expecting that AIs will soon be able to invent new better algorithms,” he says. “But we are not quite there yet.”
For one thing, Sanders points out that AlphaDev only uses a subset of the instructions available in assembly. Many existing sorting algorithms use instructions that AlphaDev did not try, he says. Without those instructions, it is harder to compare AlphaDev with the best rival approaches.
It’s true that AlphaDev has its limits. The longest algorithm that AlphaDev produced was 130 instructions long, for sorting a list of up to five items. At each step, AlphaDev picked from 297 possible assembly instructions (out of many more). “Beyond 297 instructions and assembly games of more than 130 instructions long, learning became slow,” says Mankowitz.
That’s because even with 297 instructions (or game moves), the number of possible algorithms AlphaDev could construct is larger than the number of possible games of chess (10^120) and the number of atoms in the universe (around 10^80).
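A quick back-of-the-envelope check of that claim (our own arithmetic, not a figure from the paper):

```cpp
#include <cmath>
#include <cstdio>

// With roughly 297 choices per move and programs up to 130 instructions long,
// a naive count of possible programs is about 297^130 (around 10^321), which
// dwarfs both 10^120 and 10^80.
int main() {
  double digits = 130 * std::log10(297.0);
  std::printf("297^130 is roughly 10^%.0f\n", digits);
  return 0;
}
```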
For longer algorithms, the team plans to adapt AlphaDev to work with C++ instructions instead of assembly. With less fine-grained control, AlphaDev might miss certain shortcuts, but the approach would become applicable to a wider range of algorithms.
Sanders would also like to see a more exhaustive comparison with the best human-devised approaches, especially for longer algorithms. DeepMind says that’s part of its plan. Mankowitz wants to combine AlphaDev with the best human-devised methods, getting AlphaDev to build on human intuition rather than starting from scratch.
After all, there may be more speedups to be found. “For a human to do this, it requires significant expertise and a huge amount of hours—maybe days, maybe weeks—to look through these programs and identify improvements,” says Mankowitz. “As a result, it hasn’t been attempted before.”