Alongside its Gemini generative AI model, Google took the wraps off AlphaCode 2, an enhanced version of the code-generating AlphaCode that its DeepMind lab introduced roughly a year ago.
AlphaCode 2 is in fact powered by Gemini, or at least some variant of it (Gemini Pro) fine-tuned on coding contest data. And it's far more capable than its predecessor, Google says — at least on one benchmark.
In a subset of competitions hosted on Codeforces, a competitive programming platform, AlphaCode 2 — coding in languages spanning Python, Java, C++ and Go — performed better than an estimated 85% of competitors on average, according to Google. That's compared to the roughly 50% of competitors its predecessor managed to best on the same subset.
"We chose the 12 most recent contests with over 8,000 participants, either from division 2 or the more difficult division '1+2.' That comes to 77 problems," reads a technical whitepaper on AlphaCode 2. "AlphaCode 2 solves 43% of problems within 10 attempts, just about twice as many problems as the original AlphaCode (25%)."
AlphaCode 2 can solve programming challenges involving "complex" math and theoretical computer science. It's also capable of dynamic programming, among other reasonably sophisticated techniques, according to DeepMind research scientist Rémi Leblond, speaking in a prerecorded video.
Dynamic programming makes complex problems tractable by breaking them down into simpler sub-problems, solving each one only once and reusing the answers; Leblond says that AlphaCode 2 knows not just when to apply this strategy but where to apply it. That's impressive, considering that problems requiring dynamic programming were a major failure point for the original AlphaCode.
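To make the technique concrete (this is a textbook illustration, not anything from AlphaCode 2's internals), here's a minimal Python example of dynamic programming applied to a classic contest problem, the longest common subsequence: each sub-problem's answer is stored in a table and reused instead of being recomputed.

```python
def longest_common_subsequence(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic dynamic programming: dp[i][j] holds the answer for the
    prefixes a[:i] and b[:j], so each sub-problem is solved once
    and its result reused by the larger problems that contain it.
    """
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the best sub-answer by one.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise, reuse the better of the two smaller sub-answers.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


print(longest_common_subsequence("dynamic", "programming"))  # prints 3 ("ami")
```

The hard part for a model isn't writing loops like these; it's recognizing, from the problem statement alone, that the problem decomposes into reusable sub-problems at all — the "when and where" Leblond describes.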
"AlphaCode 2 needs to be able to show some kind of understanding, some kind of reasoning and designing of code solutions before it can get to the actual implementation to solve [a] coding problem," Leblond said. "And it does all that on problems it's never seen before."
AlphaCode 2 solves problems by first tapping a family of "policy models" that generate a number of code samples for each problem. Samples that don't fit the problem description are filtered out, then a clustering algorithm groups "semantically similar code samples" to avoid redundancy. Finally, a scoring model within AlphaCode 2 surfaces the best candidate from each of the 10 largest clusters of code samples — constituting AlphaCode 2's answer to the problem.
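As a rough sketch of that pipeline's shape — not DeepMind's actual code; the generator, filter, clustering key, and scorer below are all hypothetical stand-ins — the generate-filter-cluster-score flow might look like this in Python:

```python
from collections import defaultdict
from typing import Callable

def best_candidates(
    problem: str,
    generate: Callable[[str], list[str]],       # stand-in: policy models emit code samples
    fits_problem: Callable[[str, str], bool],   # stand-in: does a sample fit the description?
    behavior_key: Callable[[str], str],         # stand-in: proxy for "semantically similar"
    score: Callable[[str], float],              # stand-in: scoring model
    top_k: int = 10,
) -> list[str]:
    """Sketch of a sample-filter-cluster-score pipeline.

    1. Generate many candidate programs for the problem.
    2. Drop samples that don't fit the problem description.
    3. Group semantically similar samples (here keyed by observed
       behavior, e.g. outputs on shared test inputs) to avoid redundancy.
    4. From each of the top_k largest clusters, keep the best-scoring sample.
    """
    samples = [s for s in generate(problem) if fits_problem(problem, s)]

    clusters: dict[str, list[str]] = defaultdict(list)
    for s in samples:
        clusters[behavior_key(s)].append(s)

    biggest = sorted(clusters.values(), key=len, reverse=True)[:top_k]
    return [max(cluster, key=score) for cluster in biggest]
```

The design intuition is that large clusters represent solution strategies many independent samples converged on, so picking one representative per big cluster yields diverse, plausible candidates rather than 10 near-duplicates.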
Certainly, all AI models have imperfections — and AlphaCode 2 is no different. The whitepaper notes that AlphaCode 2 requires a lot of trial and error, is too expensive to run at scale, and relies heavily on being able to filter out obviously terrible code samples. Migrating to a stronger version of Gemini, such as Gemini Ultra, might lessen some of this, the whitepaper speculates.
As for whether we can ever expect to see a product built on AlphaCode 2 — AlphaCode itself was never released — Eli Collins, VP of product at DeepMind, alluded to the possibility in a briefing.
"One of the things that was most exciting to me about the latest results is that when programmers collaborate with [AlphaCode 2 powered by] Gemini, by defining certain properties for the code to follow, the performance [of the model] gets even better," Collins said. In the future, we see programmers making use of highly capable AI models as collaborative tools that assist with the entire software development process from reasoning about problems to assisting with implementation.