Quantum Cloud Inc.
From IoT Telemetry to Intelligent Service Operations
Your MQTT infrastructure generates millions of data points daily. Your service desk handles the complexity. Now Assist makes both smarter.
Baybora Gülec — Former ServiceNow Lead Architect at Robert Bosch | Stanford
Diagnostic
AI Readiness Assessment — Your Platform Today
Based on your 5-year ITSM Pro deployment. Each capability scored for AI readiness.
High data volume from MQTT broker alerts and infrastructure incidents. Good categorization. Ready for Now Assist Summarization — AI can immediately start summarizing complex multi-system incidents involving sensor networks and gateway failures.
Automated CAB workflows with CI/CD integration. High data quality. AI can assess change risk by analyzing historical patterns — e.g., firmware updates that caused MQTT broker instability in the past.
SLAs defined for key services including uptime guarantees for telecom clients. AI can predict SLA breaches based on current incident velocity — critical for public sector contracts.
Configuration items include MQTT brokers, IoT gateways, edge computing nodes, and cloud infrastructure. 80% coverage is good but ownership gaps in edge devices create blind spots for AI Agent automation.
Root cause analysis for recurring MQTT connection drops and sensor calibration failures is poorly documented. AI pattern detection can't learn from what isn't recorded. This must improve for Phase 2.
Broad catalog covering everything from standard laptop requests to IoT gateway provisioning. But templates vary wildly — some have full automation, others are manual forms. AI Agents need consistent structure to automate fulfillment.
Highly customized request workflows for client onboarding and infrastructure provisioning. Custom UI policies add complexity. Needs standardization before AI Agent Orchestrator can handle multi-step fulfillment.
THIS IS THE BLOCKER. 6,000+ articles with limited metadata, significant duplication, and outdated content. Many articles reference deprecated MQTT v3.1 configurations instead of v5.0. AI Search returns irrelevant results. Knowledge Generation can't build on a broken foundation. MUST fix first.
Piloted in IT support (EN + DE) but barely used. No sentiment tracking. Can't handle IoT-specific queries like "My gateway in Munich is showing high latency" because it lacks the knowledge base and NLU topics. GenAI Virtual Agent will transform this — but needs clean KM data first.
Knowledge Management (20%) is pulling down the entire score. Every AI capability depends on clean, structured knowledge. This is Day 1, Week 1.
Why Self-Service Fails Today — Data Quality Gap
A Quantum Cloud engineer searches for help with an MQTT broker issue...
Based on available knowledge, the recommended resolution for MQTT packet loss is to restart the broker service and verify network connectivity using standard ping tests.
If the issue persists, please escalate to your network team and reference KB0003421 for further troubleshooting steps.
Sources: KB0003421, KB0008234 | Confidence: moderate
After Knowledge Management Cleanup + Now Assist
The same engineer. The same question. But this time the knowledge base is clean.
Your Frankfurt MQTT broker (mqtt-fra-prod-03) is experiencing packet loss due to a known issue with QoS 2 message handling when shared subscriptions exceed 500 concurrent clients.
The broker’s max_inflight_messages is set to the default (20). With 12 telecom client gateways each pushing 40+ MQTT topics, the inflight queue saturates, causing packet drops.
- SSH to mqtt-fra-prod-03 (see CMDB CI003847 for credentials)
- Edit /etc/mosquitto/mosquitto.conf
- Set max_inflight_messages 100
- Set max_queued_messages 5000
- Restart: systemctl restart mosquitto
- Verify: check Grafana dashboard “MQTT Broker Health” (link)
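The two setting changes above translate to a small mosquitto.conf fragment. Only the two limit values come from the resolution steps; the comments are explanatory, and you should confirm the defaults against your installed Mosquitto version:

```conf
# /etc/mosquitto/mosquitto.conf — excerpt (illustrative)

# Raise the per-client window of unacknowledged QoS 1/2 messages.
# Mosquitto's default (20) saturates with 12 gateways pushing 40+ topics each.
max_inflight_messages 100

# Allow more messages to queue while clients are slow or reconnecting.
max_queued_messages 5000
```

After editing, restart the broker (systemctl restart mosquitto) and confirm packet loss clears on the “MQTT Broker Health” dashboard.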
The Difference
| Metric | Before (Messy KM) | After (Clean KM + Now Assist) | Impact |
|---|---|---|---|
| Self-service resolution | 0% | 85%+ | +85% |
| Time to information | 25 min (fail) | 3 min (success) | −88% |
| Incident created | Yes (P2) | No (deflected) | −1 ticket |
| Senior engineer needed | Yes (€150/hr) | No | Cost saved |
| Monthly recurring saves | 0 | 50+ hours | +50 hrs/month |
Target: −30% MTTR — Now Assist for ITSM
Incident Summarization, Resolution Notes, and Chat Summary deploy in weeks 1–4. No custom development. These three skills directly attack your Mean-Time-to-Resolution target.
mqtt-fra-01 went down, standby mqtt-fra-02 did not assume leadership. 847 IoT devices lost connection. Telekom SLA breach imminent.
mqtt-fra-01 health check failed. MQTT connections dropping across Cluster A.
mqtt-fra-02 shows 'STANDBY' status but hasn't triggered election. Checking ZooKeeper ensemble.
zk-fra-03 lost quorum at 03:41 UTC. Manual intervention required.
Manually promoted mqtt-fra-02 to primary. 612 of 847 devices reconnected. Remaining 235 require client-side reconnect. Telekom notified.
Key Actions Taken: ZooKeeper diagnosis, manual broker promotion, partial device recovery
Outstanding: 235 devices, Telekom SLA review, root cause for ZK quorum loss
Similar Incidents: INC0009823 (Feb 2025, same cluster), INC0011102 (Dec 2024, Munich cluster)
Resolution Steps:
- Identified ZooKeeper split-brain via ensemble health check
- Manually promoted mqtt-fra-02 to primary (04:30 UTC)
- Reconnected 612/847 IoT devices automatically
- Initiated client-side reconnect for remaining 235 devices
- Applied NIC firmware patch to zk-fra-03 (v4.2.1 → v4.2.3)
- Restored ZooKeeper quorum — all 3 nodes healthy
- Telekom reported IoT gateway connectivity loss at 03:48 UTC
- Quantum Cloud confirmed MQTT cluster issue, ETA for fix: 1 hour
- Telekom requested hourly updates for SLA documentation
- Interim workaround discussed: route critical devices through Munich backup cluster
- Telekom accepted workaround for 47 priority-1 gateways
- Follow-up call scheduled for 09:00 UTC with Telekom SLA manager
Target Architecture: Every Now Assist skill follows the same pipeline: context is extracted from the incident record, processed through the Skill Kit, filtered by Guardian (16 safety categories) and DPNA (PII masking), then sent to the LLM. The response is grounded to ServiceNow data only — no internet access, no hallucination from external sources. Each interaction is logged to sys_generative_ai_metric for full auditability.
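The pipeline above can be sketched as a few explicit stages. This is an illustrative Python sketch, not the actual Skill Kit API — the function names (guardian_check, dpna_mask, call_llm) and the PII patterns are assumptions standing in for the platform's built-in components:

```python
import re
import time

def dpna_mask(text, patterns):
    """Replace PII matches with tokens (DPNA-style masking, simplified).
    Returns the masked text plus a token map so original values can be
    restored for internal display only -- the LLM never sees raw values."""
    token_map, masked = {}, text
    for label, pattern in patterns.items():
        for match in re.findall(pattern, masked):
            token = f"[{label}_{len(token_map)}]"
            token_map[token] = match
            masked = masked.replace(match, token)
    return masked, token_map

def run_skill(incident, guardian_check, call_llm, audit_log):
    """Illustrative Now Assist-style pipeline:
    context -> Guardian safety filter -> DPNA masking -> LLM -> audit log."""
    context = f"{incident['short_description']}\n{incident['work_notes']}"
    if not guardian_check(context):   # safety categories, prompt injection, etc.
        return None
    patterns = {
        "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
        "IP": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    }
    masked, token_map = dpna_mask(context, patterns)
    response = call_llm(masked)       # grounded to platform data only
    # every interaction is logged before anything is shown to the user
    audit_log.append({"input": masked, "output": response, "ts": time.time()})
    for token, original in token_map.items():
        response = response.replace(token, original)  # internal display only
    return response
```

The key property to note: the audit record stores the masked input, so even the log contains no raw PII.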
Key deployment note: Now Assist skills are PLATFORM CAPABILITIES — not custom code. Your ServiceNow Admin enables them in the Now Assist admin console. No development sprints, no custom integrations. This is why Phase 1 deploys in weeks 1–4.
Target: +25% Self-Service Resolution — Virtual Agent GenAI
Your Virtual Agent pilot deflects 35%. The target is 60%+. Same engineer, same problem — see what changes when NLU becomes GenAI.
Target Architecture: The GenAI Virtual Agent replaces rigid NLU intent matching with natural language understanding. Instead of pre-built conversation trees, it dynamically generates responses by combining AI Search (knowledge), CMDB (device-specific context like gateway GW-MUC-042), and Service Catalog (fulfillment actions). When it can't resolve, it hands off to a live agent WITH a full conversation summary — no "can you repeat the issue?" moments.
Key difference from current pilot:
| Dimension | Detail |
|---|---|
| Before | NLU topics (manually created), keyword matching, scripted flows |
| After | GenAI topics (zero-shot), semantic understanding, dynamic resolution |
| CMDB integration | VA knows WHICH device the user is asking about |
| Proactive | Can detect related issues across fleet |
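A single GenAI VA turn as described above can be sketched as retrieve, ground, generate, and hand off on low confidence. All names here (ai_search, cmdb_lookup, generate, the confidence floor) are illustrative assumptions, not ServiceNow APIs:

```python
def handle_turn(utterance, ai_search, cmdb_lookup, generate, confidence_floor=0.6):
    """Illustrative GenAI VA turn: retrieve knowledge, resolve the device the
    user mentions, generate a grounded answer, or hand off with full context."""
    articles = ai_search(utterance)        # semantic KB retrieval, not keywords
    device = cmdb_lookup(utterance)        # e.g. resolves "GW-MUC-042" to a CI
    answer, confidence = generate(utterance, articles, device)
    if confidence < confidence_floor or not articles:
        # hand off to a live agent WITH a summary -- no "repeat the issue" moment
        return {"handoff": True,
                "summary": {"utterance": utterance,
                            "device": device,
                            "candidate_articles": [a["id"] for a in articles]}}
    return {"handoff": False, "answer": answer,
            "citations": [a["id"] for a in articles]}
```

The design point: the handoff payload carries everything the VA already learned, which is what eliminates the "can you repeat the issue?" restart.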
AI Agent Orchestrator — Autonomous Multi-Step Resolution
Phase 3 introduces autonomous multi-step workflows. Here's what it looks like when an IoT gateway provisioning request comes in.
Target Architecture — 3 Layers:
Layer 1: AI Control Tower (Governance)
Enterprise command center. Inventories all AI agents, monitors performance, detects drift, enforces policies. The CISO's view into everything AI does.
Layer 2: AI Agent Orchestrator (Coordination)
The meta-agent. Coordinates teams of specialized agents working across tasks, systems, and departments. Evaluates every step against policies in real-time. Powered by Workflow Data Fabric for cross-silo data access.
Layer 3: Specialized Agents (Execution)
Individual agents with defined scopes. ITSM Agent handles incident triage and auto-resolution. Provisioning Agent deploys IoT gateways and manages access. Monitoring Agent correlates MQTT health alerts and triggers proactive remediation.
Human-in-the-Loop: Configurable per risk level. Low-risk actions (password reset) = fully autonomous. Medium-risk (gateway provisioning) = human approval gate. High-risk (production changes) = human-only with AI analysis support.
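The tiered human-in-the-loop model above reduces to a small policy table plus a gate. The tiers and their examples come from the text; the code itself is a minimal sketch, not a platform API:

```python
# Risk tiers from the autonomy model above; the table shape is illustrative.
AUTONOMY_POLICY = {
    "low":    "autonomous",       # e.g. password reset
    "medium": "human_approval",   # e.g. gateway provisioning
    "high":   "human_only",       # e.g. production changes (AI analysis only)
}

def execute_action(action, risk, approve, run):
    """Gate an agent action by its risk tier before executing it."""
    mode = AUTONOMY_POLICY[risk]
    if mode == "human_only":
        return {"executed": False, "reason": "human_only: AI provides analysis support"}
    if mode == "human_approval" and not approve(action):
        return {"executed": False, "reason": "approval denied"}
    return {"executed": True, "result": run(action)}
```

Because every action passes through one gate, the audit trail and the "zero unauthorized autonomous actions" metric fall out of the same code path.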
24-Week Transformation — With Guardrails
Each phase has explicit go/no-go criteria. We don't advance until the foundation is proven.
Deliverables
- Knowledge Base audit and deduplication
- CMDB health assessment and remediation
- AI Readiness Score baseline measurement
- Now Assist pilot: Incident Summary, Resolution Notes, Chat Summary
- Guardrail configuration (PII filters, topic blocks)
Success Metrics
- AI Readiness Score ≥ 70%
- KB duplicate articles reduced by ≥ 50%
- 70% agent satisfaction with Now Assist pilot
- Governance Council formed and operational
Key Dependencies
- ServiceNow Vancouver/Washington instance
- Now Assist SKU licensing activated
- Executive sponsor commitment
- KB subject matter experts identified
Risks to Watch
- KB cleanup takes longer than estimated
- CMDB data quality lower than assumed
- Agent resistance to AI-generated summaries
Deliverables
- Full Now Assist ITSM rollout (all agent teams)
- Knowledge Generation from resolved incidents
- AI Search enhancement for Employee Center
- Virtual Agent GenAI topic blocks (top 10 intents)
- Weekly governance reviews established
Success Metrics
- MTTR reduction ≥ 15%
- VA deflection rate ≥ 40%
- AI hallucination rate < 5%
- Phase 1 adoption ≥ 70% of agents
Key Dependencies
- Gate 1 criteria all passed
- Agent training completed
- KB quality thresholds maintained
- Feedback loop from pilot operational
Risks to Watch
- Agent adoption plateau after initial enthusiasm
- Knowledge Generation creates low-quality KB articles
- VA GenAI topic scope creep
Deliverables
- Full VA GenAI rollout (English + German)
- Employee Center AI integration
- KB Generation flywheel operational
- EU AI Act compliance documentation
- Multilingual AI Search optimization
Success Metrics
- Self-service resolution rate +20%
- CISO sign-off on agentic autonomy levels
- Zero critical PII violations
- Employee satisfaction (CSAT) ≥ 4.0/5.0
Key Dependencies
- Gate 2 criteria all passed
- German language model quality validated
- Employee Center portal deployed
- Legal review of EU AI Act requirements
Risks to Watch
- German language GenAI quality gaps
- EU AI Act requirements shifting mid-phase
- User trust erosion from any PII incident
Deliverables
- AI Agent Orchestrator deployment
- Custom agent skills (password reset, access provisioning)
- Autonomous workflow execution (tiered autonomy)
- Monthly governance reviews with drift monitoring
- Production agentic AI with full audit trail
Success Metrics
- Autonomous resolution rate ≥ 30%
- Zero unauthorized autonomous actions
- Cost per ticket reduction ≥ 25%
- Full EU AI Act compliance maintained
Key Dependencies
- Gate 3 criteria all passed
- CISO sign-off on autonomy level matrix
- Integration APIs for target systems
- Rollback procedures tested and documented
Risks to Watch
- Autonomous actions with unintended consequences
- Model drift degrading decision quality
- Scope expansion beyond approved autonomy levels
"At Bosch, we used the same gated approach across 8 divisions. Phase 1 gate review saved us from rolling out AI Search before KB cleanup was complete — which would have eroded agent trust in the entire program. The gates aren't bureaucracy — they're insurance."
Concern #2 — Adoption
Concern #2: Adoption — How We Drive Usage & Track ROI
Technology is half the battle. Quantum Cloud's 600+ service desk agents need to trust, use, and champion these tools. Here's the proven playbook.
Champions (Week 1-2)
- 5-10 power users per team, early access to Now Assist
- They become the internal advocates who show peers the value
- Selection criteria: tech-curious, respected by peers, cross-shift coverage
Early Adopters (Week 3-4)
- Expand to full Infrastructure Ops and NOC teams
- Guided onboarding: 30-min workshop + cheat sheet
- Champions support their peers — not IT training department
Majority (Week 5-8)
- All L1 and L2 agents across EU regions
- Now Assist enabled in Service Operations Workspace by default
- Feedback loop: in-app thumbs up/down on every AI suggestion
Full Organization (Week 9-12)
- All fulfillers + end users via Virtual Agent GenAI
- Self-service AI Search available across Employee Center
- Adoption dashboard visible to management
L1 Agents
Incident Summarization + Resolution Notes + Chat Summary
30-min guided session + reference card
"AI writes the boring parts so you can focus on solving."
L2/L3 Engineers
AI Search + Knowledge Generation + Agentic Workflows
1-hour deep dive + sandbox environment
"AI is your research assistant, not your replacement."
Service Desk Managers
Now Assist Analytics + Adoption Dashboard
45-min dashboard walkthrough
"See exactly how AI impacts your team's KPIs."
End Users
Virtual Agent GenAI + AI Search
No training needed — GenAI understands natural language
"Just ask your question in normal language."
| Concern | Response |
|---|---|
| "AI will replace my job" | AI handles documentation so you can focus on complex problem-solving. No headcount reduction — reallocation to higher-value work. |
| "I don't trust AI answers" | Every AI response is grounded to your KB data, with citations. You review and approve — AI never acts alone on high-risk tasks. |
| "This is another tool that will be abandoned" | We measure adoption weekly. If Wave 1 champions don't see value, we adjust before expanding. Phase gates prevent 'deploy and hope.' |
| "Works council / Betriebsrat concerns" (EU) | AI assists, doesn't monitor. No individual performance tracking via AI. Governance council includes employee representative. |
Real-World Proof Point
At Bosch (2023-2025): When I led the ServiceNow rollout for Central Services across 8 divisions, initial adoption of the Automation Hub was 22% in week 2. We implemented a champions program (3 per division), ran weekly 'AI Success Stories' in the internal newsletter, and introduced gamified leaderboards. By week 8, adoption reached 78%. The key lesson: agents adopt when their peers show them the value, not when IT tells them to use it.
Concern #3 — Sustainability
Concern #3: Sustainability — No Technical Debt
Prompt Review
Monthly for 90 days, quarterly after. Quality scored via user feedback.
Upgrade Resilience
Zero customization policy. ATF regression suite. Pre-upgrade smoke tests.
KB Hygiene
Health score, quarterly audits, auto-retirement, AI-generated article review loop.
No Tech Debt
No custom scripts. Flow Designer only. Config register in CMDB. No shadow AI.
Team Ramp
M1-3: partner leads. M4-6: internal leads. M7+: fully self-sufficient.
Risk Management
Risk Register — 7 Risks, All Mitigated
| # | Risk | Likelihood | Impact | Mitigation | Owner |
|---|---|---|---|---|---|
| R1 | PII leakage via AI response | Medium | Critical | DPNA masking, Guardian content filtering, human review for external-facing | CISO |
| R2 | EU AI Act non-compliance for IoT use cases | Medium | Critical | AI system register, conformity assessment for Annex III, governance council | CISO + Legal |
| R3 | Low agent adoption (shelfware risk) | High | High | Champions program, phased rollout, weekly adoption metrics, phase gates | Platform Owner |
| R4 | KB quality too low for AI Search accuracy | High | High | Foundation phase cleanup, KB health scoring, quarterly audits | ServiceNow Admin |
| R5 | ServiceNow upgrade breaks AI workflows | Medium | Medium | ATF regression suite, zero-customization policy, pre-upgrade smoke tests | ServiceNow Admin |
| R6 | Agentic workflow exceeds risk appetite | Low | High | Human-in-the-loop gates, per-action policy evaluation, audit logging | AI Governance Lead |
| R7 | Model drift degrades AI output quality | Medium | Medium | Monthly prompt review, user feedback tracking, quality scoring | Platform Owner |
Concern #1: Governance — 100% Logged, 0 PII Violations
Quantum Cloud serves EU public sector and telecom. Both are Annex III categories. EU AI Act compliance isn't optional — it's a procurement prerequisite.
16 safety categories. Detects offensive content, prompt injection attacks, and sensitive topics. Runs in parallel with LLM calls — minimal latency impact. Supports 9 languages including German.
Blocks harmful outputs before they reach users
PII masked in real-time BEFORE data reaches the LLM. Names, emails, device IDs, IP addresses — all replaced with tokens. Original values restored only for internal display.
Zero raw data exposure to AI models
Every AI interaction logged to sys_generative_ai_metric: input, output, model version, timestamp, user, confidence score. Exportable. Queryable. Auditor-friendly.
100% traceability — Art. 12 compliant
Configurable per risk level. Low risk: autonomous. Medium risk: approval gate. High risk: human-only. AI escalates automatically when confidence drops below threshold.
Art. 14 Human Oversight — built in
AI responses grounded to ServiceNow data only. No internet access. Citations link to specific KB articles and CMDB records. Confidence scoring on every response.
Grounded. Cited. Verifiable.
Enterprise governance dashboard: inventory all AI agents, monitor performance, detect drift, track ROI. One place to see everything AI does across your platform.
Full visibility — CISO's command center
EU AI Act — What's At Stake
Art. 4: All AI users must be trained. Already in force.
Art. 5: 8 categories of unacceptable AI banned.
Art. 9-15: Conformity assessment for high-risk AI systems.
Up to €15M or 3% of global turnover for non-compliance.
- Telecom clients → Annex III Category 2 (critical digital infrastructure)
- Public sector clients → Annex III Category 5 (essential public services)
- AI Agents taking autonomous actions on infrastructure → potentially high-risk
- Double regulatory exposure: EU AI Act + NIS2 (telecom = essential entity)
ServiceNow's platform-level controls (Guardian, DPNA, Audit Trail, Human-in-the-Loop) cover the majority of Art. 9-15 requirements. But Quantum Cloud must also: document risk assessments, maintain technical documentation, and establish the AI Governance Council.
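For the documentation duties above, the Art. 12 audit trail can be exported through the standard ServiceNow Table API against sys_generative_ai_metric. This sketch only builds the query URL; the instance name is a placeholder, and you would fetch it with an authenticated GET (for example requests.get(url, auth=(user, password))):

```python
from urllib.parse import urlencode

def audit_export_url(instance, since_iso, limit=1000):
    """Build a Table API query URL for the Now Assist audit table.
    'instance' and the date format are placeholders to adapt to your tenant."""
    params = {
        # encoded query: records created since the cutoff, newest first
        "sysparm_query": f"sys_created_on>={since_iso}^ORDERBYDESCsys_created_on",
        "sysparm_limit": str(limit),
    }
    return f"https://{instance}/api/now/table/sys_generative_ai_metric?{urlencode(params)}"
```

Scheduling this export (or an equivalent scheduled report) gives the Governance Council a standing evidence package for auditors.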
AI Governance Council — Operational in 30 Days
Your Success Criteria — Mapped to ServiceNow Metrics
Every metric maps to Quantum Cloud's stated success criteria. All measured natively in ServiceNow Performance Analytics.
Through AI Search + clean KM + GenAI Virtual Agent
Measured via in-app thumbs up/down on every AI response
From 25 min (failed search) to 3 min (AI-resolved)
Incident Summarization + Resolution Notes + Knowledge reuse
AI-generated articles from resolved MQTT incidents
Now Assist augments every fulfiller interaction
sys_generative_ai_metric — every AI action auditable
DPNA masking + Guardian + zero persistence architecture
AI Control Tower continuous monitoring
Four Actions, Starting This Week
CISO, Platform Owner, business stakeholders. First meeting within 7 days.
Audit 6,000 articles. Identify duplicates, outdated MQTT v3.1 content. Define cleanup plan.
Incident Summarization, Resolution Notes, Chat Summarization. Zero custom dev required.
Track KPIs. Review AI interactions. Prepare for Aug 2026 EU AI Act enforcement.
"Let's make Quantum Cloud the reference customer for GenAI-powered ITSM in the EU."
Contact Baybora Gülec