When the CEO or CIO comes to you and says “We need AI in our development process,” the right response is not to start implementing immediately. The right response is to ask: why? In this talk at Conf42 Platform Engineering, I walk through the complete journey from identifying value stream bottlenecks to implementing AI-augmented DevOps on a real platform, including a live demo.
Start with the Value Stream, Not with AI
When management asks for AI, the real goals behind that request are usually: faster time to market, more value for money, and higher quality. These are legitimate goals. But to achieve them, you first need to understand your value stream.
Value stream mapping is the essential starting point. Bring all the people working across a value stream into a room, lay out every step from idea to production, identify who is responsible for each step, and measure three things: lead time (total elapsed time), process time (actual value-adding work), and percentage complete and accurate (quality of output).
When you have this picture, bottlenecks become immediately visible. For example, a testing step might show 8 hours of actual work against 336 hours of lead time, with a 50% complete-and-accurate rate, meaning half the work has to be redone. These are the spots where improvement will have the biggest impact.
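As a rough illustration (mine, not from the talk), here is a minimal sketch of how those three measurements expose a bottleneck once you put them side by side. The step names and numbers are hypothetical, with the testing step mirroring the example above.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    lead_time_h: float            # total elapsed time, including waiting
    process_time_h: float         # actual value-adding work
    pct_complete_accurate: float  # share of output usable without rework

    @property
    def flow_efficiency(self) -> float:
        return self.process_time_h / self.lead_time_h

# Hypothetical value stream; the Test step mirrors the example above.
steps = [
    Step("Refine idea", 40, 8, 0.9),
    Step("Implement", 80, 40, 0.8),
    Step("Test", 336, 8, 0.5),
    Step("Deploy", 24, 1, 0.95),
]

# The steps with the worst flow efficiency and quality are where improvement pays off most.
for s in sorted(steps, key=lambda s: s.flow_efficiency):
    print(f"{s.name:12s} flow efficiency {s.flow_efficiency:5.1%}, %C&A {s.pct_complete_accurate:.0%}")
```

The testing step's roughly 2% flow efficiency is exactly what the value stream map makes visible at a glance.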
Not everything needs to be solved with AI. Sometimes it is far better to just improve the process or rethink it entirely.
Where AI Can Help Across the DevOps Lifecycle
Once you have identified genuine bottlenecks, AI can be applied strategically across the entire DevOps lifecycle:
- Plan: Analyze historical project data to predict risks, resource needs, and delivery timelines
- Code: Generate, refactor, debug, and explain code with copilots
- Build: Auto-remediate security vulnerabilities
- Test: Analyze the impact of changes and select tests intelligently
- Deploy: Predict deployment impact, monitor health, auto-trigger rollbacks
- Release: Verify releases continuously and analyze their impact
- Operate: Detect and fix configuration drift automatically
- Monitor: Recognize patterns, detect anomalies, correlate events, find root causes, and self-heal (this is often called AIOps; a small sketch follows this list)
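To make the Monitor item a bit more concrete, here is a minimal sketch of one AIOps building block: flagging anomalous values in a metric stream with a rolling z-score. This is my illustration, not an implementation from the talk; in practice the flagged points would feed event correlation and root cause analysis rather than just being printed.

```python
import statistics

def detect_anomalies(values: list[float], window: int = 30, threshold: float = 3.0) -> list[int]:
    """Return indices where a value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev and abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

# Example: a latency series with one spike that event correlation would then
# tie back to a deployment or configuration change.
latency_ms = [120 + (i % 5) for i in range(60)] + [480] + [120 + (i % 5) for i in range(20)]
print(detect_anomalies(latency_ms))  # -> [60]
```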
These are powerful capabilities, but they all require the right foundation.
Platform Engineering: The Foundation for AI at Scale
Without standardization, AI-augmented DevOps does not scale. When every team has its own local development environment and its own tool landscape, there is no consistent foundation to build on.
Platform engineering solves this. You create one platform where tools and capabilities are standardized. The target operating model has two types of teams: small, focused product teams, and a platform team that provides self-service capabilities through an internal developer platform.
The architectural principle I always emphasize is to build a floating platform. This means you plug in services and tools (GitHub, GitLab, Kubernetes, cloud providers) and provide a developer experience layer on top. You never duplicate features from the tools below. You integrate them through adapters so they can be swapped when needed. If you start hiding, abstracting, or duplicating features, your platform will sink.
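A minimal sketch of what that adapter idea can look like in code. The names (`SourceControlAdapter`, `GitHubAdapter`, `GitLabAdapter`, `onboard_product_team`) are hypothetical, not from the platform; the point is that the developer experience layer depends only on a thin interface, so swapping a tool stays local to one adapter.

```python
from typing import Protocol

class SourceControlAdapter(Protocol):
    """What the developer experience layer needs from any source control tool.
    Deliberately thin: no tool features are duplicated or hidden behind it."""
    def create_repository(self, name: str) -> str: ...
    def repository_url(self, name: str) -> str: ...

class GitHubAdapter:
    def create_repository(self, name: str) -> str:
        # Call the GitHub API here; omitted in this sketch.
        return f"github:{name}"
    def repository_url(self, name: str) -> str:
        return f"https://github.com/example-org/{name}"

class GitLabAdapter:
    def create_repository(self, name: str) -> str:
        # Call the GitLab API here; omitted in this sketch.
        return f"gitlab:{name}"
    def repository_url(self, name: str) -> str:
        return f"https://gitlab.com/example-org/{name}"

def onboard_product_team(scm: SourceControlAdapter, team: str) -> str:
    """Platform self-service action: works identically whichever tool is plugged in."""
    scm.create_repository(team)
    return scm.repository_url(team)

print(onboard_product_team(GitHubAdapter(), "payments"))
print(onboard_product_team(GitLabAdapter(), "payments"))
```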
AI as a Platform Capability
In the platform architecture, AI is just another capability, but a powerful one. When you zoom into that AI capability, the layers look like this:
- Developer Portal and APIs at the top, exposing AI services to product teams
- Application Layer with chatbots, AI coding assistants, and knowledge management
- Tooling Layer with prompt engineering, RAG systems, and vector databases (the retrieval step is sketched after this list)
- Model Layer with a model hub, enterprise-specific models, and fine-tuned configurations
- Infrastructure Layer integrating Gen AI APIs from cloud providers
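To make the tooling layer tangible, here is a minimal sketch of the retrieval step in a RAG system. Everything here is illustrative: the documents are made up, and the word-overlap score stands in for the vector similarity search that a real vector database and an embedding model from the model layer would provide.

```python
def score(question: str, document: str) -> float:
    """Toy relevance score: word overlap. In a real platform this is a vector
    similarity search against embeddings served by the model layer."""
    q, d = set(question.lower().split()), set(document.lower().split())
    return len(q & d) / len(q)

# Documents stored in the vector database of the tooling layer (contents made up).
documents = {
    "runbook-rollback": "How to roll back a failed deployment on the platform",
    "runbook-logging": "Where platform logs are stored and how to query them",
}

def build_prompt(question: str, top_k: int = 1) -> str:
    """Retrieve the most relevant documents and assemble the prompt that the
    application layer sends down to the model layer."""
    ranked = sorted(documents, key=lambda doc_id: score(question, documents[doc_id]), reverse=True)
    context = "\n".join(documents[doc_id] for doc_id in ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How do I roll back a deployment?"))
```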
Live Demo: AI in Action on a Real Platform
During the talk, I demonstrated the platform we built together with LGT, a bank in Liechtenstein. The platform, called Zühlke Plane, is used internally at Zühlke, and LGT runs its own instance. Here are some of the AI features in production:
AI Chat: A ChatGPT-like interface deployed at Zühlke in a standardized, governed way. Employees do not need to know which LLM is running behind it, and the underlying service can be replaced transparently.
Container Image Analysis: Developers can view container image layers and click a button to get an AI analysis. The LLM is optimized for container image analysis and provides actionable insights. The feedback from developers has been extremely positive.
Log File Analysis: With all logs flowing through the platform, developers can analyze entire Kubernetes namespaces using AI. The system examines log patterns and provides structured insights, saving significant time in troubleshooting.
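A sketch of what such a log analysis flow can look like, assuming the official Kubernetes Python client and an OpenAI-compatible chat endpoint. This is my illustration, not the platform's actual implementation, and the model name is a placeholder.

```python
from kubernetes import client, config  # pip install kubernetes
from openai import OpenAI              # pip install openai

def analyze_namespace_logs(namespace: str, tail_lines: int = 200) -> str:
    """Collect recent logs from every pod in a namespace and ask an LLM for
    structured troubleshooting insights. A sketch, not the platform's code."""
    config.load_kube_config()
    core = client.CoreV1Api()
    chunks = []
    for pod in core.list_namespaced_pod(namespace).items:
        logs = core.read_namespaced_pod_log(
            name=pod.metadata.name, namespace=namespace, tail_lines=tail_lines
        )
        chunks.append(f"=== {pod.metadata.name} ===\n{logs}")

    llm = OpenAI()  # assumes an OpenAI-compatible endpoint provided by the platform
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "You analyze Kubernetes logs. "
             "Return errors, likely root causes, and suggested next steps."},
            {"role": "user", "content": "\n\n".join(chunks)},
        ],
    )
    return response.choices[0].message.content

print(analyze_namespace_logs("payments-prod"))  # namespace name is made up
```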
Self-Service AI Capabilities: Product teams can add Azure OpenAI or a full LLM platform to their applications through the service catalog. Teams have built applications like a project reference finder and a bidding process optimizer using specialized AI agents with custom system prompts. These were built in minimal time because the platform provided all the infrastructure.
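To show how little team-specific code such an application needs, here is a hypothetical sketch of a reference-finder agent: the team supplies a system prompt and its project data, and the platform supplies the chat capability (represented here by the made-up `ChatFn` callable).

```python
from typing import Callable

# `ChatFn` stands in for the LLM capability a team gets from the service
# catalog (hypothetical): it takes chat messages and returns the model's reply.
ChatFn = Callable[[list[dict]], str]

class ReferenceFinderAgent:
    """Specialized agent: the only team-specific parts are the system prompt
    and the project data; everything else comes from the platform."""

    SYSTEM_PROMPT = (
        "You help bid teams find relevant past projects. "
        "Given a client request and a list of past projects, return the best "
        "matching references with a one-line justification each."
    )

    def __init__(self, chat: ChatFn, past_projects: list[str]):
        self.chat = chat
        self.past_projects = past_projects

    def find_references(self, client_request: str) -> str:
        return self.chat([
            {"role": "system", "content": self.SYSTEM_PROMPT},
            {"role": "user", "content": f"Request:\n{client_request}\n\nPast projects:\n"
                                        + "\n".join(self.past_projects)},
        ])
```

Wiring `chat` to an Azure OpenAI resource ordered from the service catalog is a small, mechanical step, which is why such applications can be built in minimal time.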
The Floating Platform Principle
One of the most important lessons from this work: the platform must float. It must sit on top of all the tools and cloud providers without trying to replace them. When tools release new features, you benefit automatically. When you need to swap a tool (for example, replacing GitLab with GitHub), the adapter pattern makes this possible without disrupting the platform.
The moment you abstract away or duplicate a feature from the underlying tools, your platform starts to sink. This is the single biggest architectural mistake I see in platform engineering.
Key Takeaways
- Start with value stream mapping, not with AI. Understand your bottlenecks before reaching for technology solutions.
- AI is not always the answer. Sometimes process improvement is simpler and more effective.
- Platform engineering is the foundation for AI-augmented DevOps at scale. Without standardization, nothing scales.
- Build a floating platform. Integrate tools through adapters, never duplicate their features.
- AI is a platform capability. Provide it as a self-service to product teams so they can build their own use cases.
- Enable the business, not just engineering. The platform’s AI capabilities can power business applications like reference finders and process optimizers, not just developer tools.
We are entering the age of industrialized software development. Platform teams build the platform that enables development teams to do AI-augmented DevOps, and that also enables the business to innovate with AI.
