DevOps Content on InfoQ
-
QCon London 2026: Morgan Stanley Rethinks Its API Program for the MCP Era
Morgan Stanley engineers Jim Gough and Andreea Niculcea showed how they are retooling the bank's API program for AI agents using the Model Context Protocol (MCP) and FINOS CALM. Live demos covered compliance guardrails, deployment gates, and zero-downtime rollouts across 100+ APIs. The time to a first API deployment shrank from two years to two weeks. They also demoed Google's Agent2Agent (A2A) protocol running alongside MCP.
-
Microsoft Adds DRA-Backed NVIDIA vGPU Support to AKS
The Azure Kubernetes Service (AKS) team shared a detailed guide on using Dynamic Resource Allocation (DRA) with NVIDIA vGPU technology on AKS. The update improves control and efficiency when sharing GPUs across AI and media workloads.
-
QCon London 2026: Wrangling Telemetry at Scale, a Guide to Self-Hosted Observability
At QCon London 2026, Colin Douch discussed building and operating self-hosted monitoring stacks, surveyed the current tooling landscape, and explained how to build a coherent observability setup rather than treating logs, metrics, and traces as separate pillars.
-
QCon London 2026: SBOMs Move From Best Practice to Legal Obligation as CRA Enforcement Looms
In a talk at QCon London 2026, Viktor Petersson argued that software teams are running out of time to adopt SBOMs (Software Bills of Materials) due to pending legislative changes in both the US and Europe. He walked through the current regulatory landscape, the practical mechanics of generating high-quality SBOMs, and the emerging standards for distributing the resulting artefacts.
-
War in Iran Damages Multiple AWS Data Centers, Challenging Multi-AZ Assumptions
Earlier this month, Iranian drone strikes damaged three AWS data centers in the UAE and Bahrain, causing outages and disruptions across several services. Because the strikes affected more than one facility within the same AWS region, they sparked discussion in the community about how geopolitical conflict can directly impact global cloud infrastructure and multi-AZ deployments.
-
QCon London 2026: Uncorking Queueing Bottlenecks with OpenTelemetry
At QCon London 2026, Julian Wreford and Oli Lane from Gearset showed how distributed tracing and SLOs close observability gaps in asynchronous systems. By shifting from queue-size metrics to latency-based alerts, the team improved incident response. Key technical takeaways included using OpenTelemetry trace state to track the duration of asynchronous work, and using wide events to uncover hidden architectural waste.
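As a rough illustration of the trace-state idea, the sketch below stamps an enqueue time into a W3C `tracestate` entry when a message is produced, then computes how long the message waited when it is consumed and compares that wait to a latency target instead of looking at queue size. It is a minimal, standard-library-only sketch, not Gearset's implementation: the `qenq` key, the millisecond encoding, and the 30-second threshold are hypothetical choices, and a production version would more likely rely on the OpenTelemetry SDK's propagation and trace-state handling than on hand-parsing the header.

```python
from __future__ import annotations

import time

TRACESTATE_KEY = "qenq"  # hypothetical key holding the enqueue timestamp (ms)
MAX_ENTRIES = 32  # W3C Trace Context limit on tracestate list members


def parse_tracestate(header: str) -> list[tuple[str, str]]:
    """Split a W3C tracestate header into ordered (key, value) pairs."""
    entries = []
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue  # empty list members are permitted and skipped
        key, sep, value = member.partition("=")
        if sep:
            entries.append((key.strip(), value.strip()))
    return entries


def format_tracestate(entries: list[tuple[str, str]]) -> str:
    """Join (key, value) pairs back into a tracestate header."""
    return ",".join(f"{k}={v}" for k, v in entries)


def stamp_enqueue(header: str, now_ms: int) -> str:
    """Record the enqueue time; the updated entry moves to the front, as the spec requires."""
    rest = [(k, v) for k, v in parse_tracestate(header) if k != TRACESTATE_KEY]
    entries = [(TRACESTATE_KEY, str(now_ms))] + rest
    return format_tracestate(entries[:MAX_ENTRIES])  # drop rightmost entries if over the limit


def queue_wait_ms(header: str, now_ms: int) -> int | None:
    """Return how long the message waited in the queue, or None if it was never stamped."""
    for key, value in parse_tracestate(header):
        if key == TRACESTATE_KEY:
            try:
                return now_ms - int(value)
            except ValueError:
                return None
    return None


def breaches_latency_slo(wait_ms: int | None, threshold_ms: int = 30_000) -> bool:
    """Alert on how long work waited, not on how many items are queued."""
    return wait_ms is not None and wait_ms > threshold_ms


if __name__ == "__main__":
    enqueued_at = int(time.time() * 1000)
    header = stamp_enqueue("vendor1=abc", enqueued_at)
    wait = queue_wait_ms(header, enqueued_at + 45_000)
    print(header, wait, breaches_latency_slo(wait))
```

The point of the design is that a queue with few items can still be slow and a long queue can still be draining quickly; measuring each item's wait time ties alerts directly to the user-facing latency an SLO describes.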
-
QCon London 2026: Shipping Constantly with Humans and Beyond at Monzo
At QCon London 2026, Suhail Patel, a principal engineer at Monzo who leads the bank’s platform group, described how Monzo has built a developer platform capable of shipping hundreds of changes to production every day.
-
QCon London 2026: Your Multi-Cloud Strategy Is a Product Problem — Treat It Like One
JPMorgan Chase engineers Luis Albinati and Surabhi Mahajan argued that multi-cloud complexity can't be solved with engineering alone. Speaking at QCon London, they showed how treating multi-cloud as a product, with capability mapping, demand governance, and clearly defined users, tames the chaos.
-
AI Is Amplifying Software Engineering Performance, Says the 2025 DORA Report
Artificial intelligence is rapidly reshaping the way software is built, but its impact is more nuanced than many organizations expected. The 2025 DevOps Research and Assessment (DORA) report, titled State of AI-Assisted Software Development, finds that AI does not automatically improve software delivery performance.
-
QCon London 2026: How to Run on Three Clouds at Once, and When Not to
Form3 runs UK bank payments across three clouds simultaneously. At QCon London, its engineers explained how they built the platform with custom Kubernetes operators, cross-cloud DNS tricks, and distributed databases, and what happened when they tried to sell it in the US. Spoiler: US customers wanted East/West failover, not triple-active multi-cloud.
-
AWS Launches Managed OpenClaw on Lightsail amid Critical Security Vulnerabilities
AWS has launched managed OpenClaw on Lightsail for deploying AI agents, even as security concerns mount. The 250k-star GitHub project is affected by CVE-2026-25253, which enables one-click remote code execution (RCE), and more than 17,500 vulnerable instances are exposed. Bitdefender found that 20% of ClawHub skills were malicious. The AWS blueprint provides automated hardening but doesn't address the project's architectural security limits.
-
Elastic Releases Version 9.3.0 with Enhanced AI Tools and OTel Support
Elastic 9.3.0 is now available, featuring enhanced vector search indexing for RAG applications and significant upgrades to the ES|QL query language. The release deepens OpenTelemetry integration for vendor-neutral observability and updates the AI Assistant with better contextual analysis. Security visibility is also expanded across Kubernetes and serverless architectures.
-
Cloudflare Introduces Support for ASPA, an Emerging Internet Routing Security Standard
Cloudflare recently announced support for ASPA (Autonomous System Provider Authorization). The new cryptographic standard, built on the RPKI, helps make Internet routing safer by letting networks verify the AS path in BGP route announcements, which helps prevent traffic from traversing unreliable or untrusted networks on its way to its destination.
-
Netflix Uncovers Kernel-Level Bottlenecks While Scaling Containers on Modern CPUs
Engineers at Netflix have uncovered deep performance bottlenecks in container scaling that cannot be attributed to Kubernetes or containerd alone, but extend down into the CPU architecture and the Linux kernel itself.
-
Claude Opus 4.6 Introduces Adaptive Reasoning and Context Compaction for Long-Running Agents
Anthropic’s Claude Opus 4.6 introduces "Adaptive Thinking" and a "Compaction API" to address context rot in long-running agents. The model supports a 1M-token context window with 76% multi-needle retrieval accuracy. Although it leads agentic coding benchmarks, independent tests found a 49% detection rate for binary backdoors, highlighting the gap between state-of-the-art (SOTA) claims and production security.