From Sense to Resilience
Wave 2 Active
Context
AI Bulletin
Industry trends, leadership expectations, and technology challenges shaping why SOAR exists
Key Industry Trends
Forces reshaping how operations teams must think, organise, and deliver
Intelligence Over Monitoring
From Monitoring to Operational Intelligence
Operational intelligence delivering insights across incidents, performance, and cost. Explainable analytics supporting decisions and risk management — not just dashboards.
Resilient Outcomes
Reliability & Measurable Service Outcomes
Shifting focus from infrastructure uptime to service reliability and business outcomes. Growing emphasis on proactive operations over reactive firefighting.
Federated Execution
Federated Operations, Centralised Governance
Federated execution with centralised reliability and governance oversight. Integrated ecosystem across cloud providers, platform teams, security, and tooling.
AI as Multiplier
AI-Embedded, Cloud-Scaled Delivery
AI and automation as operational effectiveness multipliers — reducing toil, accelerating decisions, and enabling faster knowledge discovery and collaboration.
Expectations from Leadership
What the organisation needs operations to deliver across people, process, data, and technology
People
Demonstrate measurable operational impact
Upskill teams with AI, governance, and cloud skills
Retain high-value operational talent
Reduce toil and improve engineer experience
Process
Accelerate operational response cycles
Embed FinOps and observability for a holistic view
Standardise tooling to reduce fragmentation
Treat operational data as a strategic asset
Data
Build scalable and reliable operational data platforms
Governance for accurate, explainable, and compliant AI
Cross-platform and cross-domain operational analytics
Embed risk controls and explainability into AI processes
Technology
Build scalable cloud platforms supporting AI-assisted ops
Use AI and automation as effectiveness multipliers
Drive maturity through phased SOAR adoption
Maintain compliance, security, and governance standards
Technology Challenges to Address
Landscape
Unmanageable technology sprawl
Infrastructure complexity outpacing team capacity to govern it
Cost
Uncontrolled recurring spend
Right-fit cloud service selection and tier discipline needed
Risk
Regulatory exposure
Regulatory controls must be designed into architecture, not bolted on
Delivery
Fragmented tooling & inconsistent deployments
Unified tooling and accelerated modernisation without increasing debt
SOAR directly addresses each of these challenges through structured, wave-based AI adoption across all operational pillars.
Reference
Operations Layer Diagram
How product offerings map to the operations capabilities required to sustain them
As organisations leverage AI to transform their product offerings, operations must equally evolve to enable reliability, governance, and intelligent support.
Experience Layer
Performance & Experience Monitoring
Usage Analytics · A/B Testing · Latency · Uptime / Availability · Endpoint Protection
Functional Layer
Data Integrity, Retrieval Ops & AI Ops
Rules Validation · Human-in-the-Loop · RAG Checks · Storage Performance · Data Security · Back-up & Recovery
Intelligence Layer
Model Observability, Quality Assurance & MLOps
Workflow Management · LLM / SLM Models · Prompt & Context Ops · Model Bias & Drift · Hallucination Detection · Model Versioning
Data Layer
Resource Optimisation, Reliability & Compliance
Semantic & Feed Layer · Embeddings · Data Sovereignty · Legal Checks · Business Continuity · Incident Management
Infra Layer
FinOps, Compliance Auditing & Networking
Auto-Scaling · GPU Utilisation · API Health · Networking · FinOps · Compliance Auditing
Governance Layer
Security, Risk & Organisational Data
Transactional Data · Proprietary Content · Public / Open Data · Organisation Data · Cost View · Auto-Scaling
Each SOAR pillar maps to one or more layers — ensuring operational capabilities evolve in step with product and AI platform needs.
AI Adoption Program · Cloud Operations
Operational AI Value Tracker
Tracking AI adoption, productivity gains and business value across all SOAR pillars
Program start: Jan 2025
Current wave: Wave 2
Last updated: Mar 2026
Program Vision
Transform operations into an intelligence-driven reliability organisation powered by AI, automation, and insights.
SOAR evolves operations from reactive execution to intelligence-driven service reliability — augmenting engineers, accelerating decisions, and strengthening resilience.
SOAR Enables — Evidenced in Numbers
Proactive reliability over reactive firefighting
134
incidents prevented
AI-assisted decision support cutting cognitive load
-47%
MTTR · 90→48 min
Engineering productivity removing operational friction
3,200h
engineering hours reclaimed
Governance strengthened while enabling innovation
68%
AI adoption · target 80%
Measurable operational and financial outcomes
$2.4M
cost saved · +18% vs target
SOAR Capability Themes
Sense
Detect & Understand
MTTR improvement -47%
Alert noise reduction -62%
Proactive detections 134
Wave 2 Active 2 pillars →
Optimize
Reduce & Improve
Cost saved $2.4M
Waste eliminated $890K
Rightsizing adoption 71%
Wave 1 Done 1 pillar →
Accelerate
Speed & Automate
Hours saved 3,200
Ticket resolution -38%
Deploy failure rate -22%
Wave 2 Active 1 pillar →
Data
Govern & Enable
Platform availability 99.7%
Data quality score 87%
AI data readiness 72%
Wave 2 Active 1 pillar →
Reinforce
Protect & Scale
Security incidents -31%
Onboarding time -40%
SME dependency -28%
Wave 3 Planned 2 pillars →
Guiding Principles
01
Value-driven & Measurable
Every initiative is tied to a measurable operational or financial outcome — no vanity metrics.
02
Augmentation Before Automation
AI assists and amplifies engineers first — autonomous execution follows once trust is established.
03
Workflow Integration
AI is embedded into existing operational workflows — not deployed as standalone tooling.
04
Phased Adoption
Structured wave delivery builds capability incrementally — from quick wins to autonomous operations.
Delivery Waves
Wave 1 — Quick Wins · Complete
Q1 2025
Incident AI summarization
Runbook assistant
Cost anomaly detection
Ticket summarization
Wave 2 — Decision Enablement · Active
Q2–Q3 2025
Root cause assistant
Deployment risk analysis
Log anomaly detection
Wave 3 — Autonomous Operations · Planned
Q4 2025
Predictive incident detection
Auto-remediation workflows
Intelligent auto-scaling
Recent AI Activity
Cost anomaly detected
Idle EC2 cluster — $42K/mo savings identified
2h ago
Incident triage completed
P1 root cause identified in 4 min vs 90 min avg
5h ago
Runbook auto-generated
DB failover runbook created from incident history
Yesterday
Compliance drift detected
3 misconfigured S3 buckets flagged automatically
Yesterday
Alert deduplication active
412 duplicate alerts suppressed this week
2d ago
Deploy risk assessment
High-risk change blocked pre-production — saved est. 3h outage
3d ago
Team Adoption Progress
Platform Eng.
88%
Cloud Ops
76%
FinOps
72%
Dev Teams
65%
Security
48%
Knowledge
41%
Sense
Intelligent Incident Management
AI-assisted triage and root cause analysis enabling faster, smarter incident response
MTTR
48 min
Down from 90 min
↑ -47% improvement
MTTD
6 min
Down from 22 min
↑ -73% improvement
Escalations
-34%
Fewer P1 escalations
↑ AI triage impact
Recurrence
-28%
Repeat incidents
↑ KB generation effect
MTTR Before vs After AI
Mean Time To Resolve
Before AI: 90 min
After AI: 48 min
Target: 30 min
On track to reach 30 min target by Wave 3 with auto-remediation.
AI Maturity — Incident Management
Current Capability Level
Level 0 Manual operations — human driven
Level 1 AI insights — visibility & summarization Done
Level 2 AI recommendations — root cause & fix suggestions Current
Level 3 AI assisted execution — with human approval Wave 3
Level 4 Autonomous — auto-remediation Future
Active Initiatives
Incident Management AI Portfolio
01 AI incident triage assistant Active High
02 Root cause analysis summarization Active High
03 Log anomaly detection Active Med
04 Auto remediation suggestions Planned High
05 Knowledge base generation from incidents Done Med
AI Triage — Sample Output
Incident P1-2024-0847 · Live Analysis
AI Summary · generated in 12s
Root Cause: Memory pressure on prod-api-03 caused cascade failure. OOM triggered at 02:14 UTC. Pod evictions followed across 3 nodes.
Impact: 4.2% error rate on /checkout endpoint. ~1,400 affected users. Latency p99 elevated to 8.4s.
Suggested Fix: Increase memory limits on prod-api deployment. Apply runbook RB-0041. Scale horizontally +2 pods. Monitor for 30 min.
Analysis time
12s
vs 90 min manual
Confidence
91%
Root cause match
Runbook
RB-0041
Auto-matched
Sense
Observability & Reliability Engineering
Unified operational telemetry enabling proactive, intelligence-driven reliability
Alert Noise
-62%
Alerts suppressed/deduped
↑ 412 suppressed/week
False Positives
-54%
Fewer false alerts
↑ Engineer trust up
Proactive Detections
134
Pre-incident catches YTD
↑ +34 this quarter
SLO Compliance
99.4%
Up from 98.1%
↑ +1.3% improvement
Alert Volume — Before vs After AI
Weekly Alert Categories
Total alerts (before): 2,840 / week
After deduplication: 1,080 / week
Actionable alerts: 390 / week
AI noise reduction freed ~6h/week of on-call engineer time.
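The deduplication step behind these numbers can be sketched minimally: alerts sharing an identifying fingerprint are collapsed into one record with a count, and the rest are counted as suppressed. The service/check/severity triple used as the key here is an illustrative assumption; a real AIOps platform would use richer correlation features.

```python
import hashlib

def fingerprint(alert: dict) -> str:
    """Build a stable dedup key from the fields that identify a recurring alert."""
    key = f"{alert['service']}|{alert['check']}|{alert['severity']}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

def deduplicate(alerts: list[dict]) -> tuple[list[dict], int]:
    """Keep the first alert per fingerprint; count the rest as suppressed."""
    seen: dict[str, dict] = {}
    suppressed = 0
    for alert in alerts:
        fp = fingerprint(alert)
        if fp in seen:
            # Same fingerprint already active: bump its count, suppress this copy.
            seen[fp]["count"] = seen[fp].get("count", 1) + 1
            suppressed += 1
        else:
            seen[fp] = dict(alert)
    return list(seen.values()), suppressed
```

Run over a week of raw alerts, the suppressed count is what feeds the "412 suppressed/week" chip above.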
SLO Compliance by Service
Current Period Performance
API Gateway
99.0%
Auth Service
99.8%
Data Pipeline
98.9%
Payments
99.5%
Notifications
97.2%
Notifications SLO below 99% target — AI anomaly probe active.
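SLO figures like these translate directly into error budgets. A minimal sketch, assuming a 30-day rolling window: a 99% target allows 432 minutes of downtime in the window, and a service at 97.2% (as Notifications is here) has consumed well past that.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Total allowed downtime (minutes) in the window for a given SLO target."""
    return (1 - slo_target) * window_days * 24 * 60

def budget_remaining(slo_target: float, observed: float, window_days: int = 30) -> float:
    """Minutes of budget left given observed availability over the same window.

    Negative means the SLO has been breached for the window.
    """
    consumed = (1 - observed) * window_days * 24 * 60
    return error_budget_minutes(slo_target, window_days) - consumed
```

For example, `budget_remaining(0.99, 0.972)` is deeply negative, which is exactly why the anomaly probe is active on Notifications.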
Active Initiatives
Observability AI Portfolio
01 Smart alert deduplication Done High
02 Log anomaly detection Active High
03 Performance anomaly detection Active Med
04 Failure prediction Planned High
05 Capacity risk prediction Planned Med
Proactive Detections — YTD Breakdown
AI-caught issues before user impact
Memory pressure
41
Disk saturation
25
Latency spikes
34
Config drift
20
Capacity risk
14
Avg detection lead time
38 min
Est. incident cost avoided
$1.1M
Optimize
Cloud Cost Optimization
FinOps-driven cost governance and AI model flexibility delivering measurable financial outcomes
Total Cost Saved
$2.4M
Cloud waste eliminated YTD
↑ +18% vs target
Waste Eliminated
$890K
Idle & orphaned resources
↑ 214 resources reclaimed
Rightsizing Adoption
71%
Recommendations accepted
↑ Up from 42% last Qtr
Forecast Accuracy
93%
AI cost forecast precision
↑ Up from 71% manual
Cost Savings Breakdown
AI-identified savings by category
Idle resources
$890K
Rightsizing
$690K
Reserved coverage
$530K
Storage tiering
$290K
Monthly run rate
$200K
Annual projection
$2.4M
Next anomaly
Today
Recent Cost Anomalies
AI-detected spend anomalies
EC2 cluster spike — prod-batch
+340% above baseline · Auto-scaling misconfiguration
+$42K/mo
S3 egress anomaly — data-lake
+180% above baseline · Uncompressed exports
+$18K/mo
RDS over-provisioned — staging
db.r5.4xlarge at 8% avg CPU · Downsize candidate
+$9K/mo
NAT gateway waste — dev accounts
34 idle NAT gateways across dev accounts
+$6K/mo
Last scan: 2h ago · Next scheduled: in 4h
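Baseline-deviation detection of the kind behind these anomalies can be sketched as follows. Flagging on both a percentage jump over the rolling baseline and a z-score against baseline variance keeps a naturally noisy account from alerting on every bump; the 50% and 3-sigma thresholds here are illustrative assumptions, not the programme's tuned values.

```python
from statistics import mean, stdev

def detect_cost_anomaly(history: list[float], current: float,
                        pct_threshold: float = 0.5,
                        z_threshold: float = 3.0) -> tuple[bool, float]:
    """Flag current spend if it deviates sharply from the rolling baseline.

    Returns (is_anomaly, fractional change vs baseline), so a +340% spike
    like the prod-batch example comes back as pct_change == 3.4.
    """
    baseline = mean(history)
    pct_change = (current - baseline) / baseline
    spread = stdev(history) or 1e-9  # guard against a perfectly flat baseline
    z_score = (current - baseline) / spread
    is_anomaly = pct_change > pct_threshold and z_score > z_threshold
    return is_anomaly, pct_change
```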
Active Initiatives
Cost Optimization AI Portfolio
01 Idle resource detection Done High
02 Rightsizing recommendations Done High
03 Cost anomaly detection & alerts Active High
04 Waste pattern detection Active Med
05 AI cost forecasting Planned Med
Rightsizing Adoption by Team
Recommendation acceptance rate
FinOps
92%
Platform
80%
Data Eng.
70%
Dev Teams
55%
Security
40%
Dev teams below 60% target — AI recommendation UX review planned.
Accelerate
Developer Productivity
AI-assisted workflows evolving toward autonomous execution and reduced engineering toil
Hours Saved
3,200
Engineering hours YTD
↑ 1.6 FTE equivalent
Ticket Resolution
-38%
Avg. resolution time
↑ 4.2h → 2.6h avg
Deploy Failures
-22%
Failed deployments
↑ Risk assessment impact
Change Success
94%
Up from 81%
↑ +13 points
Toil Reduction by Activity
Hours saved per engineer task type
Incident triage
820h
Doc search
610h
Ticket writing
510h
Deploy diagnosis
410h
Runbook execution
350h
Avg saving / engineer
4.2h
Teams benefiting
14
AI Deploy Risk Assessment
Change success rate — before vs after AI
Change success (before): 81%
Change success (after AI): 94%
Sample Risk Assessment · CHG-2024-1183
Risk Level: Medium — 2 similar changes caused incidents in past 90 days
Recommendation: Deploy during low-traffic window. Enable feature flag. Have rollback ready.
Similar incidents: INC-0812 (DB migration), INC-0934 (cache invalidation)
23 high-risk changes blocked in staging this quarter.
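Scoring a change request against historical failure patterns, as the sample assessment above does, can be sketched like this. The similarity features (same service and change type) and the band cut-offs are simplifying assumptions; note that two similar failures in the window yields "medium", matching the CHG-2024-1183 sample.

```python
def risk_score(change: dict, history: list[dict], window_days: int = 90) -> str:
    """Rate a change by how often similar recent changes caused incidents.

    "Similar" here means touching the same service with the same change type;
    a real system would use richer features (diff content, dependency graph).
    """
    similar = [h for h in history
               if h["service"] == change["service"]
               and h["type"] == change["type"]
               and h["age_days"] <= window_days]
    failures = sum(1 for h in similar if h["caused_incident"])
    if failures >= 3:
        return "high"
    if failures >= 1:
        return "medium"
    return "low"
```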
Active Initiatives
Developer Productivity AI Portfolio
01 AI runbook assistant Done High
02 AI change risk assessment Active High
03 Deployment failure analysis Active High
04 AI platform documentation search Active Med
05 Infra troubleshooting assistant Planned Med
Ticket Resolution Time
Average time by category
Access requests
0.8h
Infra questions
1.3h
Deploy issues
1.7h
Config changes
2.0h
Incident follow-up
2.6h
All categories improved by 25–45% since AI assistant rollout.
Reinforce
Security & Compliance Automation
Policy-as-code enabling continuous compliance, security resilience, and automated governance
Security Incidents
-31%
YoY reduction
↑ AI detection impact
Time to Patch
-58%
Critical: 14d → 6d avg
↑ AI prioritisation
Compliance Violations
-44%
Drift detections resolved
↑ Continuous monitoring
Risk Score
42
Down from 74 (low = good)
↑ -32 points improvement
Risk Exposure — Before vs After AI
Organisational risk score trend
Risk score (before AI): 74 / High
Risk score (current): 42 / Medium
Target: 25 / Low
Open findings by severity
Critical Unpatched CVEs (CVSS ≥ 9.0) 3
High Misconfigured IAM policies 11
Medium Public S3 buckets / open ports 28
Compliance Drift Detections
AI-flagged violations — last 7 days
Encryption at rest — data-store-07
CIS AWS 2.1.1 · Auto-remediated
Fixed
MFA not enforced — 4 IAM users
SOC2 CC6.1 · Awaiting owner action
Open
CloudTrail disabled — dev account
PCI DSS 10.1 · Ticket raised
Open
VPC flow logs missing — 3 regions
NIST 800-53 AU-12 · Auto-remediated
Fixed
Auto-remediated
67%
Avg fix time
1.8h
Open items
15
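The scan-then-remediate loop behind these drift detections can be sketched as policy-as-code rules: each rule carries a compliance check and, only where a fix is safe, an automatic remediation; everything else stays an open finding for a human. The control IDs and resource fields below are illustrative, not a real policy engine's schema.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    control: str                                   # e.g. a CIS / SOC2 control ID
    check: Callable[[dict], bool]                  # True when the resource complies
    remediate: Optional[Callable[[dict], None]] = None  # None: not safe to auto-fix

def scan(resources: list[dict], rules: list[Rule]) -> list[dict]:
    """Evaluate every resource against every rule.

    Auto-fixes only where a safe remediation is registered; otherwise
    records an open finding for owner action.
    """
    findings = []
    for res in resources:
        for rule in rules:
            if rule.check(res):
                continue
            if rule.remediate is not None:
                rule.remediate(res)
                status = "fixed"
            else:
                status = "open"
            findings.append({"control": rule.control,
                             "resource": res["id"], "status": status})
    return findings
```

This mirrors the split in the panel above: encryption drift auto-remediates, MFA enforcement waits on an owner.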
Active Initiatives
Security & Compliance AI Portfolio
01 Misconfiguration detection Done High
02 Compliance drift detection Active High
03 AI security log analysis Active High
04 Vulnerability prioritisation Planned Med
05 AI policy explanation assistant Planned Med
Time to Patch — By Severity
Average days before vs after AI prioritisation
Critical (CVSS ≥ 9.0)
Before
14d
After
6d
High (CVSS 7–8.9)
Before
30d
After
15d
AI prioritisation cut patch backlog by 58% — on track for 5d critical target.
Reinforce
Knowledge Management & Operational Intelligence
AI-powered knowledge discovery reducing SME dependency and accelerating operational learning
Onboarding Time
-40%
8 weeks → 5 weeks avg
↑ AI onboarding assistant
SME Dependency
-28%
Fewer SME escalations
↑ Self-serve queries up
Knowledge Reuse
73%
Queries resolved by AI KB
↑ Up from 31% baseline
Repeat Incidents
-35%
Same-cause recurrence
↑ AI post-mortems impact
AI Knowledge Assistant — Usage
Query resolution by source
Runbooks
88%
Incident history
76%
Architecture docs
62%
Policy docs
54%
Onboarding guides
70%
Queries/day
340
Avg response
8s
Satisfaction
4.4/5
AI Post-Incident Analysis — Sample
INC-2024-0931 · Auto-generated in 45s
Post-Incident Summary · AI Generated
What happened: Payment service degraded for 22 min due to DB connection pool exhaustion triggered by a batch job overlap.
Root cause: Batch job RB-cron-042 not throttled. Consumed all 200 pool connections at peak load.
Action items: Add connection limit to batch jobs. Implement pool monitoring alert at 80%. Review cron schedule overlap policy.
Similar past incidents: INC-0714, INC-0823 — same root cause pattern. Prevention runbook created: RB-0054.
Generated in
45s
vs manual
3h
Runbooks created
142
Active Initiatives
Knowledge Management AI Portfolio
01 AI Ops knowledge assistant Active High
02 AI runbook generator Done High
03 AI post-incident analysis generator Active High
04 AI architecture explainer Planned Med
05 AI onboarding assistant Planned Med
SME Dependency Reduction
Escalation volume by domain
Networking
-35%
Kubernetes
-42%
Database
-28%
Security policy
-20%
CI/CD pipelines
-38%
AI knowledge assistant now handles 73% of tier-1 queries without SME involvement.
Data
Data Platforms & AI Governance
Strong data foundations enabling accurate, explainable, and compliant AI operations
Platform Availability
99.7%
Operational data platforms
↑ Up from 98.2%
Data Quality Score
87%
Across governed datasets
↑ Up from 61% baseline
Governed Datasets
1,240
Tagged, searchable, compliant
↑ +340 this quarter
AI Data Readiness
72%
Datasets ready for AI use
→ Target: 90% by Wave 3
Data Platform Health
Availability by Platform
Ops Data Lake
99.7%
Event Streaming
99.4%
ML Feature Store
98.8%
Metrics Pipeline
99.1%
All platforms above 98.5% SLO threshold. Metrics pipeline improved after Nov incident.
AI Data Readiness by Domain
% Datasets Ready for AI Consumption
Incident data
91%
Cost & usage
88%
Security events
76%
Change history
68%
Capacity metrics
52%
Capacity metrics dataset below 60% readiness — tagging and lineage work in progress.
Active Initiatives
01
Operational data catalogue & tagging
Data
Done High
02
Data quality monitoring & alerting
Data
Active High
03
AI data lineage & explainability tracking
Data
Active High
04
Compliance data governance framework
Data
Planned Medium
05
Searchable ops knowledge & data store
Data
Planned Medium
Treat operational data as a strategic asset — searchable, tagged, governed, and AI-ready across all SOAR pillars.
Roadmap
Initiative Roadmap
Wave-based delivery across all capability pillars
Total Initiatives
18
Across all SOAR pillars
Complete
6
Wave 1 delivered
In Progress
7
Wave 2 active
Planned
5
Wave 3 pipeline
Wave 1 — Quick Wins · Complete · Q1 2025 · Fastest credibility — high value, low friction
Incident AI Summarization
Done
AI condenses 5,000 log lines into a plain-English root cause summary in under 30 seconds.
Sense High value
Readiness
Data
Ready
Process
Ready
People
Ready
Runbook Assistant
Done
AI surfaces and guides engineers through the correct runbook steps during live incidents.
Accelerate High value
Readiness
Data
Ready
Platform
Ready
People
Ready
Cost Anomaly Detection
Done
AI monitors spend patterns and alerts teams to anomalies within hours of emergence.
Optimize High value
Readiness
Data
Ready
Governance
Ready
People
Ready
Wave 2 — Decision Enablement · Active · Q2–Q3 2025 · Better decisions, reduced risk
Root Cause Assistant
Active
AI correlates alerts, logs and topology to suggest a ranked list of probable root causes.
Sense High value
Readiness
Data
Ready
Platform
In prog
People
In prog
Deploy Risk Analysis
Active
Scores change requests against historical failure patterns before approval is granted.
Accelerate High value
Readiness
Data
Ready
Process
In prog
Governance
Ready
Compliance Drift Detection
Active
Continuous scanning against CIS, SOC2 and PCI controls with auto-remediation for safe fixes.
Reinforce High value
Readiness
Data
Ready
Governance
In prog
Platform
Ready
Wave 3 — Autonomous Operations · Planned · Q4 2025 · Execution acceleration and self-healing
Predictive Incident Detection
Planned
ML models detect degradation signals 20–60 min before a user-impacting incident occurs.
Sense High value
Readiness
Data
Prep needed
Platform
Prep needed
People
Training
Auto-Remediation Workflows
Planned
AI executes safe remediation steps autonomously — with human approval gates for high-risk actions.
Accelerate High value
Readiness
Process
Prep needed
Governance
Defining
Platform
Prep needed
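The approval-gate pattern this initiative describes can be sketched in a few lines: a whitelist of pre-approved low-risk actions executes immediately, while anything else waits on a human approver callback. The action names are hypothetical placeholders.

```python
# Illustrative whitelist; the real set would be governed per environment.
SAFE_ACTIONS = {"restart_pod", "clear_cache", "scale_out"}

def remediate(action: str, target: str, approver=None) -> str:
    """Run pre-approved low-risk actions immediately; gate everything else.

    `approver` stands in for the human approval step (e.g. a ChatOps
    prompt); with no approver available, the action is queued.
    """
    if action in SAFE_ACTIONS:
        return f"executed {action} on {target}"
    if approver is None:
        return f"queued {action} on {target} for approval"
    if approver(action, target):
        return f"executed {action} on {target} (approved)"
    return f"rejected {action} on {target}"
```

Keeping the gate in code, rather than in process documents, is what makes Level 3 "assisted execution" auditable.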
Intelligent Auto-Scaling
Planned
Predictive scaling driven by AI demand forecasting, reducing over-provisioning and latency spikes.
Optimize Med value
Readiness
Data
In prog
Platform
Prep needed
Process
Prep needed
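Predictive scaling of this kind can be sketched as a naive trend forecast feeding a replica calculation. Real demand forecasting would use a proper time-series model, and the 20% headroom figure is an assumption for illustration only.

```python
import math

def forecast_demand(recent_rps: list[float], trend_window: int = 3) -> float:
    """Naive next-interval forecast: last observation plus the average
    per-interval trend over the last `trend_window` samples."""
    trend = (recent_rps[-1] - recent_rps[-trend_window]) / (trend_window - 1)
    return recent_rps[-1] + trend

def target_replicas(forecast_rps: float, rps_per_replica: float,
                    headroom: float = 0.2, min_replicas: int = 2) -> int:
    """Replicas needed to serve forecast demand with capacity headroom."""
    needed = math.ceil(forecast_rps * (1 + headroom) / rps_per_replica)
    return max(needed, min_replicas)
```

Scaling ahead of the forecast, rather than reacting to current load, is what trims both over-provisioning and latency spikes.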
Value vs Feasibility Matrix
High Value · Low Feasibility
Predictive incidents
Auto-remediation
Plan carefully
High Value · High Feasibility ★
Incident summarisation
Cost anomaly detection
Deploy risk analysis
Compliance drift
Prioritise now
Low Value · Low Feasibility
AI policy explainer
Defer
Low Value · High Feasibility
Architecture explainer
Doc search
Quick add-ons
← Low Feasibility · High Feasibility →
Theme Legend
Sense
Optimize
Accelerate
Reinforce
Readiness Key
Ready
In progress
Prep needed
Value
Value Metrics
Three-layer measurement framework: Adoption → Productivity → Business Value
Layer 1
Adoption
Are people using AI? Without adoption there is no value.
Layer 2
Productivity
Is AI making engineers faster and removing toil?
Layer 3
Business Value
What is the measurable organisational impact? Executives fund this layer.
Layer 1 — Adoption · Are teams using AI tools?
Monthly Active AI Users
68%
of all engineers
↑ Target: 80% by Q3 2026
Teams Onboarded
9 / 12
teams actively using SOAR tools
↑ 3 teams in onboarding
AI-Assisted Incidents
82%
of P1/P2 incidents use AI triage
↑ Up from 12% a year ago
Queries per Engineer
14
avg AI queries / engineer / week
↑ Up from 3 at programme start
Adoption Rate by Team
Monthly active users as % of team headcount
Platform Eng.
88%
Cloud Ops
76%
FinOps
72%
Dev Teams
65%
Security
48%
Knowledge
41%
Feature Adoption Rate
% of users actively using each AI feature
Incident triage
82%
Cost insights
74%
Runbook assist
68%
Knowledge search
61%
Deploy risk
55%
Compliance scan
44%
Layer 2 — Productivity · Are engineers doing more with less toil?
Eng. Hours Saved
3,200h
reclaimed from toil YTD
↑ 1.6 FTE equivalent
Ticket Resolution
-38%
avg resolution time
↑ 4.2h → 2.6h average
Automation Rate
34%
of ops tasks AI-assisted
↑ Target: 60% by Wave 3
Toil Reduction
-41%
self-reported toil per sprint
↑ Engineer NPS up +22pts
Layer 3 — Business Value · What is the executive-level impact?
Financial
Cost impact
Cloud cost reduction $2.4M
Incidents avoided (est.) $1.1M
Eng. hours saved ($) $640K
Total value delivered $4.14M
Reliability
System stability impact
MTTR improvement -47%
Availability improvement +1.3%
Customer incidents -31%
SLO breach rate -52%
Risk Reduction
Security & compliance
Security incidents -31%
Compliance violations -44%
Velocity
Engineering speed
Deploy frequency +28%
Change success rate 81%→94%
Maturity
AI Maturity Model
Level 0–4 capability progression across all pillars
Program Overall Maturity
2.1
out of 4.0
AI Recommendations — Level 2
AI provides insights and recommendations. Engineers make final decisions. Augmentation before automation.
L0 Manual · L1 Insights · L2 Recommend ◎ · L3 Assist · L4 Autonomous
Sense
Incident Management
Level 2
L0 · L1 ✓ · L2 ◎ · L3 · L4
MTTR improvement -47%
Next target L3 in Wave 3 →
Sense
Observability
Level 2
L0 · L1 ✓ · L2 ◎ · L3 · L4
Alert noise reduction -62%
Next target L3 in Wave 3 →
Optimize
Cost Optimization
Level 2+
L0 · L1 ✓ · L2 ✓ · L3 → · L4
Cost saved $2.4M
Next target L3 near-term →
Accelerate
Dev Productivity
Level 2
L0 · L1 ✓ · L2 ◎ · L3 · L4
Hours saved 3,200h
Next target L3 in Wave 3 →
Reinforce
Security & Compliance
Level 1
L0 · L1 ◎ · L2 → · L3 · L4
Security incidents -31%
Next target L2 active →
Reinforce
Knowledge Management
Level 1
L0 · L1 ◎ · L2 · L3 · L4
Knowledge reuse 73%
Next target L2 in Wave 3 →
Maturity Level Definitions
Level 0
Manual
Fully human-driven. No AI assistance. High cognitive load on engineers.
Level 1
Insights
AI provides visibility and summarisation. Engineers still decide and act. Done
Level 2
Recommendations
AI suggests root causes and fixes. Human approves every action. Current
Level 3
Assisted Execution
AI executes safe actions with human approval gates. Builds trust. Wave 3
Level 4
Autonomous
Self-healing operations. AI acts independently within defined guardrails. Future
AI augments engineers to improve effectiveness and reduce operational burden. Augmentation before automation — build trust at every level.
SOAR Programme Guiding Principle
Insights
AI Adoption Insights
What drives successful AI adoption — and what to avoid
Common Pitfalls to Avoid
Patterns that stall adoption across organisations
Starting with tools instead of problems to solve
Many pilots launched with no path to scale
Poor data readiness — untagged, ungoverned, unsearchable
No clear ROI model or business value definition
Change management under-invested — communicate intent clearly
Standalone AI tools deployed with low workflow integration
What Is Working
Validated approaches across the SOAR programme
AI as assistant, not a replacement — trust is built gradually
Identify pain first, apply AI, then measure the outcome
Standardise → Integrate → Scale: structured rollout model
Tracking adoption, productivity, and business impact together
Rollout Model — AIOps Approach
Alert Intelligence
AIOps platform for alert correlation, incident clustering, and de-duplication
LLM Assist
Root cause analysis, ticket creation, and workflow routing via language models
AI Investigation
Diagnostic analysis with suggested mitigation actions — human approves