CUDA Proves Nvidia Is a Software Company
Briefly

"Popularized decades ago by Warren Buffett to refer to a company's competitive advantage, the word found its way into Silicon Valley pitch decks when a memo purportedly leaked from Google, titled "We Have No Moat, and Neither Does OpenAI," fretted that open-source AI would pillage Big Tech's castle. A few years on, the castle walls remain safe. Apart from a brief bout of panic when DeepSeek first appeared, open-source AI models have not vastly outperformed proprietary models. Still, none of the frontier labs-OpenAI, Anthropic, Google-has a moat to speak of."
"The company that does have a moat is Nvidia. CEO Jensen Huang has called it his most precious "treasure." It is not, as you might assume for a chip company, a piece of hardware. It's something called CUDA. What sounds like a chemical compound banned by the FDA may be the one true moat in AI. CUDA technically stands for Compute Unified Device Architecture, but much like laser or scuba, no one bothers to expand the acronym; we just say "KOO-duh.""
"If forced to give a one-word answer: parallelization. Here's a simple example. Let's say we task a machine with filling out a 9×9 multiplication table. Using a computer with a single core, all 81 operations are executed dutifully one by one. But a GPU with nine cores can assign tasks so that each core takes a different column-one from 1×1 to 1×9, another from 2×1 to 2×9, and so on-for a ninefold speed gain."
"Modern GPUs can be even cleverer. For example, if programmed to recognize commutativity-7×9 = 9×7-they can avoid duplicate work, reducing 81 operations to 45, nearly halving the workload. When a single training run costs a hundred million dollars, every optimization counts."
“Moats” describe durable competitive advantages. Open-source AI has not significantly displaced frontier proprietary models, and the major labs lack clear moats. Nvidia has a stronger advantage through CUDA, which CEO Jensen Huang has called his most precious treasure. CUDA (Compute Unified Device Architecture) accelerates computation by parallelizing tasks across many cores. In a 9×9 multiplication table example, a single-core system performs 81 operations sequentially, while a nine-core GPU assigns columns to different cores for a ninefold speed gain. Additional optimizations, such as exploiting commutativity, can reduce the operation count from 81 to 45. With training runs costing around $100 million, small efficiency improvements matter.
Read at WIRED