My head is currently spinning like a top. I foolishly wondered how much power AI-heavy data centers are currently consuming, and how much they are expected to consume in the coming years, and now I'm sorry I asked.
www.eejournal.com, Aug. 26, 2025 –
The International Energy Agency (IEA) forecasts that global electricity demand from data centers will more than double to approximately 945 TWh by 2030, primarily driven by AI-optimized facilities. That’s about the current annual electricity use of Japan. A Deloitte estimate predicts that worldwide AI data center consumption will reach ~90 TWh by 2026, which would account for roughly one-seventh of all data center power use (681 TWh).
By 2030, US data centers are expected to consume approximately 8% of the nation’s electricity, up from 3% in 2022. Utility delays for data center grid connections can stretch up to seven years. Without upgrades, AI expansion could outpace the electric grid’s ability to keep up.
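For readers who, like me, need to see the numbers laid out, here's a quick back-of-the-envelope check of the figures above. Everything comes from the estimates already quoted, except the US total of roughly 4,000 TWh per year, which is my own ballpark assumption rather than anything from the IEA or Deloitte reports.

```python
# Back-of-the-envelope check of the data center power figures quoted above.
# The ~4,000 TWh US annual consumption is an assumed round number, not from the cited reports.

ai_dc_2026_twh = 90          # Deloitte estimate: worldwide AI data center use, 2026
all_dc_2026_twh = 681        # Deloitte estimate: all data center use, 2026
global_dc_2030_twh = 945     # IEA forecast: all data centers worldwide, 2030

us_total_twh = 4000          # assumed US annual electricity consumption (rough ballpark)
us_dc_share_2022 = 0.03      # ~3% of US electricity in 2022
us_dc_share_2030 = 0.08      # ~8% expected by 2030

print(f"AI share of data center power in 2026: {ai_dc_2026_twh / all_dc_2026_twh:.0%}")
print(f"Implied US data center use, 2022: ~{us_total_twh * us_dc_share_2022:.0f} TWh")
print(f"Implied US data center use, 2030: ~{us_total_twh * us_dc_share_2030:.0f} TWh")
```

Running this gives an AI share of about 13% (call it one-seventh) and, under that assumed 4,000 TWh total, US data center consumption climbing from roughly 120 TWh to roughly 320 TWh per year.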
According to Data Center Dynamics, Google CEO Sundar Pichai has stated that the company is now designing data centers that require over 1 gigawatt of power, which is roughly equivalent to the entire output of a conventional large nuclear reactor.
Fortunately, the use of AI is slowing down and is expected to decline next year.
I’M JOKING! (You’re so easy.) In a 2024 forecast, Gartner said that AI inferencing compute demand is expected to grow at a ~40–50% compound annual growth rate (CAGR) through 2027, much faster than most other data center workloads. Also in 2024, the International Data Corporation (IDC) estimated that by 2028, ~80–90% of all AI compute in the cloud will be inference, not training, since once models are trained, they’re deployed and queried billions of times. In a crunchy nutshell, total AI inference compute demand is projected to increase by an order of magnitude over the next five years.
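If you want to sanity-check how a 40–50% CAGR squares with "an order of magnitude over five years," the compounding arithmetic is easy enough to sketch. The growth rates below are just the ranges quoted above; the arithmetic is mine.

```python
# Compound growth sanity check for the projections quoted above.

def compound_growth(cagr: float, years: int) -> float:
    """Total growth multiple after `years` of growth at the given CAGR."""
    return (1 + cagr) ** years

# 40-50% CAGR sustained for five years:
for cagr in (0.40, 0.50):
    print(f"{cagr:.0%} CAGR over 5 years -> {compound_growth(cagr, 5):.1f}x")

# Conversely, a 10x increase over five years implies this annual growth rate:
implied_cagr = 10 ** (1 / 5) - 1
print(f"10x over 5 years implies a CAGR of about {implied_cagr:.0%}")
```

A sustained 40–50% CAGR works out to roughly a 5x to 7.5x increase over five years, and a full 10x implies growth closer to 58% per year, so "order of magnitude" is about the right level of precision.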
Suppose you are the proud owner of an AI data center (or that you are in the process of building one). Now, suppose I told you I knew a way to halve the amount of power required to perform AI inferencing on large language models (LLMs). Of course, knowing the ways of the world, you’d probably double the amount of inferencing you were performing while maintaining the same power envelope, but that’s just a different way of looking at the same thing.
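The "halve the power or double the throughput" equivalence is simply the same performance-per-watt improvement viewed from two directions. Here's a minimal sketch of that trade-off; the baseline power and throughput numbers are made up purely for illustration.

```python
# Two ways to spend a 2x improvement in inference performance per watt.
# Baseline numbers below are illustrative only, not figures from the article.

baseline_power_kw = 1000        # hypothetical inference power budget
baseline_queries_per_s = 50000  # hypothetical throughput at that power
efficiency_gain = 2.0           # 2x better performance per watt

# Option 1: keep throughput constant and cut power.
power_same_work = baseline_power_kw / efficiency_gain

# Option 2: keep the power envelope and scale up throughput.
queries_same_power = baseline_queries_per_s * efficiency_gain

print(f"Same throughput: {power_same_work:.0f} kW instead of {baseline_power_kw} kW")
print(f"Same power envelope: {queries_same_power:,.0f} queries/s instead of {baseline_queries_per_s:,}")
```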
The reason for my waffling about all this here is that I was just chatting with Jan Pantzar, who is VP of Sales and Marketing at VSORA. This is a French fabless semiconductor company, founded in 2015 by a team of seasoned DSP and AI engineers, based in Meudon-La-Forêt near Paris, with additional offices across Taiwan, Japan, Korea, and the US. Their mission is to deliver ultra-efficient, high-performance chips designed specifically for real-time inference (not training), with a focus on reducing latency, lowering power consumption, and cutting the cost per query.
The last time Jan and I spoke was circa 2020/2021. At that time, VSORA had recently released an intellectual property (IP) block for other designers to incorporate into their System-on-Chip (SoC) devices. The primary target market at that time was the automotive industry.
This IP provided a combination of AI inferencing and DSP. It also provided 1 petaflop of inferencing power, which was around eight times the performance of anything else available on the market at that time. In fact, this little scamp received the “Best Processor IP of the Year” award from the Linley Group (now part of TechInsights).
The way in which (and the reasons why) the guys and gals at VSORA transmogrified themselves from an IP provider into a fabless semiconductor company is jolly interesting, but not particularly relevant to the tale I’m about to tell. All we really need to know is that they’ve designed an AI chiplet that provides sufficient inference performance to make your eyes water while also being 4 to 5 times more efficient than anything else on the market.
Take a look at the image below. This shows the insides of VSORA’s flagship product, the Jotunn 8. Implemented using TSMC’s 5nm technology node and intended for AI data center inferencing, this bodacious beauty boasts eight of the aforementioned VSORA chiplets along with eight high-bandwidth memory (HBM3E) die stacks. The external host processor can be an Arm, x86, or RISC-V device, for example.