latest news



DZone.com Feed

A Unified Framework for SRE to Troubleshoot Database Connectivity in Kubernetes Cloud Applications (Wed, 25 Feb 2026)
The ability to have an application or business connect with the right information at the right time is key to making informed decisions in today’s digital and AI world. Having an efficient, reliable connection between an application and its database enables businesses to best serve their customers. Traditional troubleshooting methods used on many enterprise systems are no longer sufficient to troubleshoot these complex, multi-layered Kubernetes systems. The layered troubleshooting framework described in this article can be used by developers, cloud architects, and site reliability engineers (SREs) as a structured approach to quickly determine the root cause of failures and achieve stability in production environments.  A layered approach to troubleshooting is necessary to provide an understanding of how all the different components of a system relate to one another, which is critical to being able to resolve problems quickly and efficiently. Troubleshooting the communication layer between an application and its database is one of the most complex tasks for developers, cloud architects, and SREs working with Kubernetes-based cloud-native applications. 
>> Read More

From Keywords to Meaning: The New Foundations of Intelligent Search (Wed, 25 Feb 2026)
I still remember a moment that should have been simple. A product team wanted a search experience that felt obvious to users. Type “red running shoe” and get red running shoes. We had the catalog, filters, indexing, and engineers (including me) confidently saying, “This is straightforward.”
>> Read More

How We Cut AI API Costs by 70% Without Sacrificing Quality: A Technical Deep-Dive (Wed, 25 Feb 2026)
The Wake-Up Call I'll be honest — we screwed up. Like a lot of engineering teams, we built our AI features fast and worried about costs later. "Later" came faster than expected when our finance team flagged our OpenAI bill crossing five figures monthly. The real problem wasn't just the dollar amount. It was that we had zero visibility. We didn't know:
>> Read More

Cutting P99 Latency From ~3.2s To ~650ms in a Policy‑Driven Authorization API (Python + MongoDB) (Wed, 25 Feb 2026)
Modern authorization endpoints often do more than approve a request. They evaluate complex policies, compute rolling aggregates, call third‑party risk services, and enforce company/card limits, all under a hard latency budget. If you miss it, the transaction fails, and the failure is customer-visible. This post walks through a practical approach to take a Python authorization API from roughly ~3.2s P99 down to ~650ms P99, using a sequence of changes that compound: query/index correctness, deterministic query planning, connection pooling and warmup, and parallelizing third‑party I/O.
>> Read More


DevOps Cafe Podcast

DevOps Cafe Ep 79 - Guests: Joseph Jacks and Ben Kehoe (Mon, 13 Aug 2018)
Triggered by Google Next 2018, John and Damon chat with Joseph Jacks (stealth startup) and Ben Kehoe (iRobot) about their public disagreements — and agreements — about Kubernetes and Serverless. 
>> Read More

DevOps Cafe Ep 78 - Guest: J. Paul Reed (Mon, 23 Jul 2018)
John and Damon chat with J.Paul Reed (Release Engineering Approaches) about the field of Systems Safety and Human Factors that studies why accidents happen and how to minimize the occurrence and impact. Show notes at http://devopscafe.org
>> Read More

DevOps Cafe Ep. 77 - Damon interviews John (Wed, 20 Jun 2018)
A new season of DevOps Cafe is here. The topic of this episode is "DevSecOps." Damon interviews John about what this term means, why it matters now, and the overall state of security.  Show notes at http://devopscafe.org
>> Read More