Lightmatter's $400 million funding round has generated excitement among AI hyperscalers for photonic data centers.

In the case of photonic computing, startup Lightmatter has raised $400 million to blow one of the bottlenecks in the modern data center wide open: the interconnect layer. Letting hundreds of GPUs work synchronously greatly simplifies the extremely costly and complex task of training and running AI models.

The growth of AI, and its correspondingly massive compute requirements, has supercharged the data center industry, but it isn't as simple as plugging in another thousand GPUs. High-performance computing experts have known for years that it doesn't matter how fast each node of your supercomputer is if those nodes sit idle half the time waiting for data to arrive.

The interconnect layer or layers are really what turn racks of CPUs and GPUs into, effectively, one giant machine, so it follows that the faster the interconnect, the faster the data center. And it's looking like Lightmatter builds the fastest interconnect layer by a long shot, using the photonic chips it has been developing since 2018.

"Hyperscalers know if they want a computer with a million nodes, they can't do it with Cisco traditional switches. Once you leave the rack you go from high-density interconnect to basically a cup on a string," Nick Harris, CEO and founder of the company, told TechCrunch. (You can see a short talk he gave summarizing this issue here.)

The state of the art, he said, is NVLink and particularly the NVL72 platform, which puts 72 Nvidia Blackwell units wired together in a rack, capable of a maximum of 1.4 exaFLOPS at FP4 precision. But no rack is an island, and all that compute has to be squeezed out through 7 terabits of "scale up" networking. That sounds like a lot, and it is, but the inability to network these units faster, to one another and to other racks, is one of the main impediments to doing more with this sort of system.
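Those two figures imply a striking compute-to-bandwidth imbalance. A back-of-envelope sketch, using only the peak numbers quoted above:

```python
# Rough compute-to-bandwidth ratio for an NVL72 rack.
# Assumption: both values below are the peak figures cited in this article.
peak_flops = 1.4e18   # 1.4 exaFLOPS at FP4 precision
scale_up_bps = 7e12   # 7 terabits per second of "scale up" networking

flops_per_bit = peak_flops / scale_up_bps
print(f"{flops_per_bit:,.0f} FLOPs of peak compute per bit of off-rack bandwidth")
```

In other words, roughly 200,000 operations of peak compute sit behind every bit that can leave the rack, which is why the interconnect, not the silicon, becomes the ceiling.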

"For a million GPUs, you need multiple layers of switches, and that adds a huge latency burden," said Harris. "You have to go from electrical to optical to electrical to optical… the amount of power you use and the amount of time you wait is huge. And it gets dramatically worse in bigger clusters."

So what does Lightmatter bring to the table? Fiber. Lots and lots of fiber, routed through a purely optical interface. With up to 1.6 terabits per fiber (using multiple colors), and up to 256 fibers per chip… well, let's just say that 72 GPUs at 7 terabits starts to sound positively quaint.
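The arithmetic behind that comparison is simple. A quick sketch, assuming the per-fiber and per-chip figures above hold at peak:

```python
# Back-of-envelope aggregate bandwidth for Lightmatter's optical interface.
# Assumptions: 1.6 Tb/s per fiber (via multiple wavelengths), 256 fibers per
# chip, compared against the 7 Tb/s NVL72 scale-up figure quoted earlier.
tbps_per_fiber = 1.6
fibers_per_chip = 256

aggregate_tbps = tbps_per_fiber * fibers_per_chip
print(f"Per-chip aggregate: {aggregate_tbps:.1f} Tb/s")
print(f"Versus 7 Tb/s scale-up: {aggregate_tbps / 7.0:.0f}x")
```

That works out to about 400 terabits per chip, dozens of times the rack-level figure above, hence "positively quaint."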

"Photonics is coming way faster than people thought — people have been struggling to get it working for years, but we're there," said Harris. "After seven years of absolutely murderous grind," he added.

The photonic interconnect now available from Lightmatter does 30 terabits, while the on-rack optical wiring could let 1,024 GPUs work in lockstep within their own specially designed racks. If you're wondering why the two numbers don't scale proportionally, it's because much of what would otherwise need to be networked to another rack can be done on-rack in a thousand-GPU cluster. (And anyway, 100 terabit is on its way.)

The market is huge for this, Harris noted, with companies from Microsoft and Amazon to newer entrants such as xAI and OpenAI showing limitless appetites for compute. "They are chaining together buildings! I wonder how long they can keep it going," he quipped.

Many of these hyperscalers are already customers, though Harris wouldn't name any. "Think of Lightmatter a little like a foundry, like TSMC," he said. "We don't pick favorites or attach our name to other people's brands. We provide a roadmap and a platform for them — just helping grow the pie."

But, he coyly added, "you don't quadruple your valuation without leveraging this tech," perhaps an allusion to OpenAI's recent funding round valuing the company at $157 billion, but the remark could just as easily be about his own company.

This $400 million D round values it at $4.4 billion, roughly quadruple its mid-2023 valuation, a figure that "makes us by far the largest photonics company. So that's cool!" Harris said. The round was led by T. Rowe Price Associates, with participation from existing investors Fidelity Management & Research Company and GV.

What's next? In addition to interconnect, the company is working on new substrates for its chips so they can better perform those more intimate forms of networking, if you will, using light.

Apart from interconnect, Harris forecasted that power per chip will be the big differentiator going forward. "In 10 years you'll have wafer-scale chips from everybody — there's just no other way to improve the performance per chip," he said. Cerebras is, of course, already working on this, but whether wafer-scale designs can deliver the full value of that advance, at this point in the technology curve, remains to be seen.

Harris, though, seeing the chip industry coming up against a wall, intends to be ready and waiting with the next step. "Ten years from now, interconnect is Moore's Law," he said.

Blog | 2024-10-17 19:50:48