2024-07-25

Computing errors have historically been blamed on bad code, flawed algorithms, or user error. That makes sense: many performance issues are easily traced to software, which has long seemed to be one of the major root causes of computer errors.

Or has it?

Over the last decade or so, a sleeping giant has been uncovered, lurking in the components that undergird all computing: hardware. More specifically, a hardware problem known as Silent Data Corruption (SDC) is to blame for many performance issues. As computing scales rapidly to meet the demands of AI and machine learning algorithms, the problem of Silent Data Corruption has become more acute.

But what is Silent Data Corruption? How do we stop it? And why is it such a pervasive, difficult problem to address?

We sat down with Rama Govindaraju, principal engineer at Google, and Robert S. Chappell, partner hardware architecture at Microsoft, to get to the bottom of these questions and more.

