Stop Debugging Code That Works: Identifying False Failures in Kubernetes
(Mon, 19 Jan 2026)
Production debugging has a particular kind of frustration reserved for problems that don't actually exist.
A function deployment fails. The dashboard turns red. Alerts fire across multiple channels. Engineers abandon their current work and start combing through recent commits, reviewing dependencies, and running local tests. Code reviews get scheduled. Rollback plans get discussed. Hours pass.
Copilot, Code, and CI/CD: Securing AI-Generated Code in DevOps Pipelines
(Mon, 19 Jan 2026)
Three months ago, I watched a senior engineer at a Series B startup ship an authentication bypass to production. Not because he was incompetent. He'd been writing secure code since Django was considered cutting-edge. He shipped it because GitHub Copilot suggested it, the tests turned green, and he'd learned to trust the little ghost icon more than his own instincts.
The bug sat in prod for six days before a security researcher found it during a routine pen test. No customer data leaked. They got lucky. But that engineer quit two weeks later, not because he was fired (he wasn't), but because he couldn't reconcile fifteen years of hard-won expertise with the fact that he'd stopped thinking the moment the AI started typing.
RAG at Scale: The Data Engineering Challenges
(Fri, 16 Jan 2026)
Retrieval-augmented generation (RAG) has emerged as a powerful technique for building AI systems that can access and reason over external knowledge bases. RAG has enabled us to build accurate, up-to-date systems by combining the generative capabilities of LLMs with precise retrieval of user- and context-specific information.
However, deploying RAG systems at scale in production reveals a different reality, one that most blog posts and conference talks gloss over. While the core RAG concept is straightforward, the engineering needed to make it work reliably, efficiently, and cost-effectively at production scale is substantial and often underestimated.
IT Asset, Vulnerability, and Patch Management Best Practices
(Fri, 16 Jan 2026)
The vulnerability management lifecycle is a continuous process for discovering, prioritizing, and addressing vulnerabilities in an organization's IT assets.
A typical cycle has five phases:
DevOps Cafe Ep 79 - Guests: Joseph Jacks and Ben Kehoe
(Mon, 13 Aug 2018)
Triggered by Google Next 2018, John and Damon chat with Joseph Jacks (stealth startup) and Ben Kehoe (iRobot) about their public disagreements (and agreements) about Kubernetes and Serverless.
DevOps Cafe Ep 78 - Guest: J. Paul Reed
(Mon, 23 Jul 2018)
John and Damon chat with J. Paul Reed (Release Engineering Approaches) about Systems Safety and Human Factors, the field that studies why accidents happen and how to minimize their occurrence and impact.
Show notes at http://devopscafe.org
DevOps Cafe Ep. 77 - Damon interviews John
(Wed, 20 Jun 2018)
A new season of DevOps Cafe is here. The topic of this episode is "DevSecOps." Damon interviews John about what this term means, why it matters now, and the overall state of security.
Show notes at http://devopscafe.org