
Chapter 25: Agentic Development Methodology

Development model, parallel workflow, phase timelines, sensitivity analysis, recommendations

Note: This chapter describes development methodology and project planning guidance. It is not kernel specification — the kernel's behavior is defined in Chapters 1-24.


UmkaOS is developed using agentic programming — AI agents perform both design and implementation from the architecture docs. Phase timelines, hardware bottlenecks, and sensitivity analysis are project planning guidance, not definitions of kernel behavior.

25.1 Understanding the Bottleneck

25.1.1 What AI Agents Are Fast At

At 50 t/s, an AI agent can:
  • Read 30K lines of architecture docs: ~5-10 minutes (vs human: 3-5 hours)
  • Write 500 lines of Rust code: ~5-10 minutes (vs human: 2-4 hours)
  • Understand complex context: ~2-5 minutes (vs human: 20-60 minutes)
  • Generate test cases: ~2-5 minutes (vs human: 30-60 minutes)

AI speedup for pure cognitive work: 10-30x

25.1.2 What AI Agents Are NOT Fast At

The real bottlenecks in agentic development:

  1. Compilation time (hardware-bound):

    • Full cargo build --release for the UmkaOS kernel: ~15-25 minutes (300K SLOC Rust with heavy monomorphization via LLVM)
    • Incremental rebuild: ~30 seconds to 2 minutes
    • AI can't speed this up — it's CPU/disk I/O

  2. Test execution time (hardware-bound):

    • QEMU boot + run tests: ~2-5 minutes per test suite
    • Real hardware boot: ~1-2 minutes
    • Integration tests (network, distributed): ~5-15 minutes
    • AI can't speed this up — it's waiting for hardware

  3. Iteration cycles (required for bugs):

    • Average bug requires 3-5 test-fix-test cycles (even for AI)
    • Each cycle: code (2 min) + compile (5 min) + test (3 min) = 10 minutes
    • AI can reduce iteration count slightly (fewer logic bugs) but not eliminate it

  4. Real hardware testing (physics-bound):

    • Testing a WiFi driver on 10 different chipsets: ~1-2 days per chipset (firmware loading, WPA3, roaming, power save, monitor mode)
    • Suspend/resume testing: ~4 hours per laptop (1000 cycles, 10-15 seconds per cycle + failure analysis)
    • Battery life validation: ~10-15 hours per test run (actually drain the battery)
    • AI can't speed this up — you must wait for physical hardware

  5. Unknown unknowns (spec bugs):

    • The architecture had 89 documented flaws from initial reviews; ~50 remain after three rounds of architecture review and targeted fixes (individual findings are resolved in-place across the architecture documents as they are identified)
    • Implementation will find more (estimated 200-300 additional issues)
    • Each requires: discovery → spec fix → re-implementation → re-test
    • AI speeds up the fix but not the discovery

25.2 Development Model: Parallel Agentic Workflow

25.2.1 Agent Parallelization

Key advantage: Unlike humans (1-10 developers), you can run 100+ AI agents in parallel with proper coordination.

Parallelization strategy:

Phase 1.1: Core kernel (Roadmap Phase 1: Foundations)
  - Agent 1: Boot code (x86_64)
  - Agent 2: Boot code (aarch64)
  - Agent 3: Boot code (riscv64)
  - Agent 4: Memory management
  - Agent 5: Scheduler
  - Agent 6: Capabilities
  - Agent 7: IPC
  - Agent 8: RCU
  ... (20 agents in parallel)

Phase 2.1: Essential drivers (Roadmap Phase 2: Self-Hosting Shell)
  - Agent 1: NVMe driver
  - Agent 2: Intel NIC driver
  - Agent 3: Realtek NIC driver
  - Agent 4: USB core
  - Agent 5: WiFi (Intel)
  - Agent 6: WiFi (Realtek)
  ... (50 agents in parallel)

Bottleneck: Integration conflicts, shared infrastructure dependencies.

Agent coordination protocol: Agent coordination uses git branches (one feature branch per agent task), task files (.claude/ skills and plans), and project-level CLAUDE.md for shared instructions — not a formal inter-agent protocol. File-level locking (one agent per file) prevents merge conflicts. CI merge validation (all 8 architectures must pass QEMU boot + unit tests) gates every merge. Agent code review protocol: each agent's changes are reviewed by a separate agent instance before merge.
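The file-level locking rule can be stated as a small scheduling function. This is a minimal sketch, not the actual coordination tooling; the task names and file paths below are illustrative, not the real UmkaOS task graph:

```python
# File-level locking sketch: a task may start only if none of the files it
# touches are currently held by a running agent. Greedy wave scheduler.

def schedule(tasks, max_agents):
    """tasks = {task_name: set_of_files}; returns waves of concurrent tasks."""
    waves = []
    pending = dict(tasks)
    while pending:
        locked, wave = set(), []
        for name, files in list(pending.items()):
            # Start the task only if it fits in this wave and holds no
            # file another task in the wave already locked.
            if len(wave) < max_agents and not (files & locked):
                wave.append(name)
                locked |= files
                del pending[name]
        waves.append(wave)
    return waves

tasks = {
    "A1-boot-alloc": {"mm/boot_alloc.rs", "mm/mod.rs"},
    "A2-page-desc":  {"mm/page.rs", "mm/mod.rs"},   # shares mm/mod.rs with A1
    "B1-locks":      {"sync/spinlock.rs"},
    "F1-kabi":       {"tools/kabi/main.rs"},
}
print(schedule(tasks, max_agents=4))
# A2 waits for A1 because both touch mm/mod.rs; B1 and F1 run alongside A1.
```

The same rule is what makes the CI merge gate cheap: two branches that never touched the same file merge without conflict resolution.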

Realistic parallelism: Theoretical parallelism of 100+ agents per the task graph; practical sustained throughput limited to ~10-20 concurrent agents by coordination overhead, context switching, and merge conflict resolution.

25.2.2 Coordination Overhead

With N agents working in parallel:
  • Code review: each agent's code must be reviewed by another agent
  • Integration: merging N parallel branches requires conflict resolution
  • Testing: the integrated system must be tested after each merge
  • Synchronization: agents must wait for shared infrastructure (memory allocator before scheduler, etc.)

Estimated coordination overhead: ~20-30% of total time with 10-20 agents (estimate based on independent subsystem boundaries; may increase to ~40-50% for tightly-coupled subsystems with shared infrastructure dependencies).
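A toy model makes the overhead estimate concrete. The constants below are illustrative assumptions, chosen only to reproduce the ~20-30% figure cited above for 10-20 agents:

```python
# Toy throughput model: a fixed coordination cost plus a per-additional-agent
# cost. Constants are illustrative, tuned to the ~20-30% overhead estimate.

def effective_speedup(n_agents, base=0.10, per_agent=0.01):
    overhead = min(0.9, base + per_agent * (n_agents - 1))
    return n_agents * (1 - overhead), overhead

for n in (10, 20, 50):
    speedup, ovh = effective_speedup(n)
    print(f"{n:3d} agents: {speedup:5.1f}x effective throughput, {ovh:.0%} overhead")
```

The shape, not the constants, is the point: adding agents past the coordination knee buys progressively less, which is why practical parallelism saturates around 10-20.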


25.3 Phase-by-Phase Timeline (Agentic)

Note: Phase numbering uses the Chapter 24 roadmap as the primary reference. Sub-phases (e.g., Phase 2.1) correspond to agentic workflow steps within each roadmap phase. See Section 24.2 for the top-level five-phase structure.

25.3.1 Phase 1.1: Core Kernel (all 8 architectures, minimal functionality)

Scope: Boot, memory, scheduler, capabilities, syscall interface, ELF loader — sufficient to boot a statically linked hello-world ELF binary on all 8 architectures.

Exit criteria: make test passes on all 8 architectures (hello-world binary boots and prints to serial in QEMU).

Status update (2026-03-20): The architecture spec has been through 3 major review cycles (~900 findings processed, ~400 spec fixes applied). Spec coverage for Phase 1 subsystems is ~95% — struct definitions, pseudocode, per-arch tables, memory ordering annotations, error paths, and lock hierarchies are all explicit. Boot scaffolding (entry assembly, serial drivers, linker scripts) exists for all 8 architectures. This fundamentally changes the agentic development model: the implementation task is spec translation, not design.

25.3.1.1 What Already Exists

  • Boot entry assembly for all 8 architectures (Multiboot1/2, DTB, SBI, IPL, SLOF)
  • Serial drivers for all 8 architectures (COM1, PL011, 16550, SCLP, NS16550)
  • Linker scripts for all 8 architectures
  • umka-core skeleton with cap, ipc, phys modules (~850 LOC, most needs rewrite)
  • umka-driver-sdk types and ring buffer definitions
  • Build system (make build/test/run for all architectures)

25.3.1.2 Spec Readiness by Subsystem

| Stream | Subsystems | Spec LOC | Spec Detail | Implementation LOC |
|---|---|---|---|---|
| A: Memory | Boot alloc → buddy → slab → heap | ~13K | Pseudocode for alloc/free/coalesce, zone model, PCP magazines, GfpFlags, watermarks | ~4K |
| B: Concurrency | Locks → CpuLocal → RCU → IRQ → workqueue | ~12K | Per-arch register tables, lock state machines, ordering annotations, BoundedMpmcRing | ~4K |
| C: Scheduling | EEVDF → timekeeping | ~10K | Augmented RB-tree walk, weight table, vDSO seqlock, per-arch clock sources | ~3.5K |
| D: Security | Capabilities → isolation domains | ~7K | XArray-based CapSpace, delegation protocol, per-arch isolation (MPK/POE/DACR) | ~2K |
| E: Boot/HW | ACPI/DTB → features → SMP → RNG → clocks | ~10K | Per-arch init sequences, fan-out tree, feature flag tables | ~3.5K |
| F: KABI/Syscall | KABI compiler → dispatch → ELF loader | ~6K | IDL grammar, dispatch table, ELF PT_LOAD walk, user stack layout | ~2.3K |
| G: Tests | Host unit tests → in-kernel harness | ~2K | Test patterns, KTest macro, assertion protocol | ~2.5K |
| Z: Core0/Core1 | Image split → boot rewrite | ~5K | Linker sections, vtable population, canonical boot phase table | ~1.1K |

Total: ~65K lines of reviewed spec → ~23K lines of production Rust. Spec-to-code ratio: ~3:1 (expected when spec includes pseudocode).

25.3.1.3 Agent Batching (8 Streams, 7 Sequential Layers)

Phase 1 has 31 subsystems organized into 8 parallel streams (A-G, Z). The critical path is the memory stream: boot_alloc → vmemmap → buddy → slab → heap. All other streams can begin in parallel once their dependencies from the memory stream are met.

Effective parallelism: 7-8 concurrent agents (limited by dependency graph, not agent availability). The KABI compiler (F1) has zero kernel dependencies and can run from the first batch.

Layer 0 (parallel, no deps):
  F1: KABI compiler (host tool)        — independent
  Z2: Boot sequence rewrite            — independent (restructure main.rs)
  E4: CPU feature detection            — independent (reads CPUID/ID regs)
  E5: Hardware RNG                     — independent (reads RDRAND/RNDR)

Layer 1 (after Layer 0):
  A1: Boot allocator                   — needs Z2 (boot flow)
  B1: Locking primitives               — needs E4 (arch features)
  E1: ACPI parsing (x86)               — needs A1 (memory for tables)
  E2: DTB parsing (non-x86)            — needs A1

Layer 2 (after Layer 1):
  A2: Page descriptor + vmemmap        — needs A1
  B2: CpuLocal + PerCpu                — needs B1
  E3: Clock framework                  — needs E2/E1

Layer 3 (after Layer 2):
  A3: Buddy allocator                  — needs A1, A2
  B3: RCU                              — needs B1, B2
  B4: IRQ domain hierarchy             — needs B1, E4
  E6: SMP bringup                      — needs B2, E4

Layer 4 (after Layer 3):
  A4: Slab allocator                   — needs A3, B2
  B5: Workqueues                       — needs B1, B3
  C2: Timekeeping                      — needs B4, E3

Layer 5 (after Layer 4):
  A5: Heap allocator bridge            — needs A4
  C1: EEVDF scheduler                  — needs A4, B2, B5, C2
  D1: Capability system                — needs A4, B1
  D2: Isolation domain infra           — needs E4, B1

Layer 6 (convergence):
  F2: Syscall dispatch (3 syscalls)    — needs C1, D1
  F3: ELF loader (static only)        — needs A5, C1
  Z1: Core0/Core1 boundary             — needs A1
  G1: Host unit tests                  — incremental throughout
  G2: In-kernel test harness           — needs F2

Critical path: Z2 → A1 → A2 → A3 → A4 → C1 → F2 (7 layers; F3 depends on A5 and C1 and completes in the same final layer). Each layer's wall-clock time is dominated by the largest subsystem in that layer.
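The layer count and critical path can be recomputed mechanically from the dependency lists above. The sketch below encodes that subset of the graph (G1/G2 omitted since the test stream is incremental) and derives the longest chain by longest-path depth:

```python
# Recompute the Phase 1 critical path from the stated dependencies.
# Longest-path depth + 1 gives the number of sequential layers.
from functools import lru_cache

DEPS = {
    "F1": [], "Z2": [], "E4": [], "E5": [],
    "A1": ["Z2"], "B1": ["E4"], "E1": ["A1"], "E2": ["A1"],
    "A2": ["A1"], "B2": ["B1"], "E3": ["E2", "E1"],
    "A3": ["A1", "A2"], "B3": ["B1", "B2"], "B4": ["B1", "E4"], "E6": ["B2", "E4"],
    "A4": ["A3", "B2"], "B5": ["B1", "B3"], "C2": ["B4", "E3"],
    "A5": ["A4"], "C1": ["A4", "B2", "B5", "C2"], "D1": ["A4", "B1"], "D2": ["E4", "B1"],
    "F2": ["C1", "D1"], "F3": ["A5", "C1"], "Z1": ["A1"],
}

@lru_cache(maxsize=None)
def depth(node):
    return 0 if not DEPS[node] else 1 + max(depth(d) for d in DEPS[node])

def chain(node):
    """Walk back through a depth-maximizing predecessor at each step."""
    if not DEPS[node]:
        return [node]
    return chain(max(DEPS[node], key=depth)) + [node]

print(chain("F2"))                     # ['Z2', 'A1', 'A2', 'A3', 'A4', 'C1', 'F2']
print(1 + depth("F2"), "sequential layers")
```

This reproduces the 7-layer figure and confirms the memory stream (A1-A4) dominates the schedule.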

25.3.1.4 Key Differences From Prior Estimate

  1. Phase 1.1 and 1.2 are merged. The prior estimate split Phase 1 into x86-only (1.1) then multi-arch port (1.2). This was based on the assumption that multi-arch code would need separate design work. With the spec now containing per-arch tables for every subsystem (register mappings, init sequences, feature flags, barrier instructions), multi-arch support is a compile-time configuration, not a design phase. All 8 architectures are implemented from the start.

  2. Integration testing is NOT the bottleneck. The spec review cycles eliminated most design ambiguities. Integration bugs will be spec-implementation mismatches (caught by unit tests) rather than design-level incompatibilities (which required redesign in the old model).

  3. IPC is not in Phase 1. The original estimate included IPC as a Phase 1 subsystem. Per the current roadmap (Section 24.2), full IPC is Phase 2. Phase 1 needs only the syscall dispatch path (write/execve/exit_group) — no inter-process communication.

  4. Boot code already exists. Entry assembly, serial drivers, and linker scripts for all 8 architectures are implemented and tested. The boot sequence rewrite (Z2) restructures the existing code to follow the canonical phase table, not writes it from scratch.

25.3.1.5 Risk Assessment

High risk (most likely to need iteration):
  • EEVDF augmented RB-tree (C1) — algorithm complexity, subtle invariants
  • BoundedMpmcRing (B5) — lock-free CAS with per-arch barriers
  • Buddy allocator zone model (A3) — largest Phase 1 subsystem
  • Context switch (within C1) — 8 different architectures, register conventions

Medium risk:
  • Capability system (D1) — security-critical, XArray integration
  • SMP bringup (E6) — per-arch IPI mechanisms, timing-sensitive

Low risk (mechanical translation):
  • Boot allocator (A1) — bump allocator, minimal state
  • Page descriptor (A2) — pure data structure
  • Syscall dispatch (F2) — 3 syscalls, trivial table
  • ELF loader (F3) — static binaries only, PT_LOAD walk
  • KABI compiler (F1) — host tool, no kernel dependency

25.3.2 Phase 2.1: Essential Drivers (NVMe, NIC, USB, I/O)

Scope: NVMe, Intel NIC, USB core, serial, framebuffer

Human estimate: 6-9 months
Agent estimate:

| Driver | Agent Work | Hardware Testing | Debug Cycles | Real Time |
|---|---|---|---|---|
| NVMe | 8 hours | 10 hours | 8x | 7 days |
| Intel e1000e NIC | 6 hours | 8 hours | 6x | 5 days |
| USB core | 12 hours | 15 hours | 10x | 10 days |
| USB HID | 4 hours | 5 hours | 4x | 3 days |
| Framebuffer (full per §21.4 DRM/KMS) | 3 hours | 4 hours | 3x | 2 days |
| Serial (all arches) | 2 hours | 3 hours | 2x | 1 day |

With 6 agents in parallel:
  • Wall clock time: ~10 days (2 weeks)
  • Bottleneck: real hardware testing (need NVMe drives, NICs, USB devices)

25.3.3 Phase 2.2: Linux Compatibility Layer

Scope: 330 syscalls, eBPF verifier (ext4 is in Phase 3.1, not here — Phase 2 uses tmpfs/initramfs)

Human estimate: 9-12 months (eBPF verifier alone is 6+ months)
Agent estimate:

| Component | Agent Work | Testing | Iterations | Real Time |
|---|---|---|---|---|
| Syscall dispatch | 4 hours | 6 hours | 4x | 3 days |
| File I/O syscalls (50) | 15 hours | 20 hours | 10x | 12 days |
| Process/thread (40) | 12 hours | 18 hours | 8x | 10 days |
| Memory syscalls (30) | 10 hours | 15 hours | 8x | 8 days |
| Network syscalls (50) | 15 hours | 20 hours | 10x | 12 days |
| Misc syscalls (160) | 30 hours | 40 hours | 15x | 20 days |
| eBPF verifier | 40 hours | 60 hours | 30x | TBD (see caveat below) |

(ext4 moved to Phase 3.1 — implemented once, completely)

Scope caveat (eBPF verifier): The eBPF verifier is implemented as a complete subsystem per Section 19.2: all program types (socket filter, XDP, tc, kprobe, tracepoint, cgroup, LSM, struct_ops), all map types, full abstract interpretation with the complete RegState/RegType type system. This is not a "socket filters only" partial implementation — the verifier is one subsystem and ships complete. Linux's verifier.c is ~23K SLOC (v6.12) with a decade of security hardening and dozens of CVE-driven fixes. Reaching equivalent security coverage will require sustained fuzzing campaigns (see Section 24.3). The eBPF verifier is one of the most complex single subsystems (comparable in complexity to a compiler backend) and its real-time estimate is not meaningfully reducible to a day count.

With 10 agents in parallel (syscall groups can be independent):
  • Wall clock time: dominated by the eBPF verifier (the most complex single subsystem — comparable in complexity to a compiler backend)
  • Bottleneck: eBPF verifier complexity (Linux's verifier.c is ~23K SLOC with a decade of security hardening; even AI needs extensive iteration and security fuzzing)
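To make "full abstract interpretation" concrete, here is a deliberately tiny illustration of the verifier's core idea — tracking value ranges per register and rejecting memory accesses that are not provably in bounds. This is a didactic fragment, not the Section 19.2 RegState/RegType system:

```python
# Toy abstract interpretation: each register carries [min, max] scalar
# bounds; a load is admitted only if the bounds prove it stays in-buffer.
from dataclasses import dataclass

@dataclass
class RegState:
    min: int   # inclusive lower bound
    max: int   # inclusive upper bound

def verify_load(index: RegState, buf_len: int, access_size: int) -> bool:
    """Allow buf[index .. index+access_size) only if provably in bounds."""
    return index.min >= 0 and index.max + access_size <= buf_len

# r1 = packet offset, narrowed to [0, 60] by a preceding bounds check
r1 = RegState(0, 60)
print(verify_load(r1, buf_len=64, access_size=4))   # True:  60 + 4 <= 64
print(verify_load(r1, buf_len=64, access_size=8))   # False: 60 + 8 >  64
```

The real verifier does this across all paths, with pointer provenance, alignment, and speculation constraints — which is why its iteration count dwarfs every other subsystem.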

25.3.4 Phase 2.3: Networking Stack

Scope: TCP/IP, UDP, routing, netfilter, WiFi subsystem

Human estimate: 6-9 months
Agent estimate:

| Component | Agent Work | Testing | Iterations | Real Time |
|---|---|---|---|---|
| Ethernet layer | 6 hours | 10 hours | 6x | 5 days |
| IPv4/IPv6 stack | 15 hours | 25 hours | 12x | 15 days |
| TCP | 20 hours | 35 hours | 15x | 20 days |
| UDP | 8 hours | 12 hours | 6x | 6 days |
| Routing | 10 hours | 15 hours | 8x | 8 days |
| Netfilter/firewall | 12 hours | 18 hours | 10x | 10 days |
| WiFi subsystem | 15 hours | 25 hours | 12x | 15 days |

Scope caveat (TCP): TCP is implemented as a complete, conformant stack per Section 16.1: full state machine, SACK, congestion control (Reno + CUBIC — each a complete pluggable module per §16.4), and ECN. MPTCP multi-path, BBRv2, and TCP-AO are separate subsystems added in Phase 3.2+ — they do not extend the core TCP implementation, they plug into it. The core TCP stack is complete at Phase 2.3 exit.

With 7 agents in parallel:
  • Wall clock time: ~20 days (3 weeks) (core TCP complete; MPTCP/BBRv2 are separate Phase 3.2 subsystems)
  • Bottleneck: TCP complexity, WiFi driver integration
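The "plug into it, don't extend it" claim for congestion control can be sketched as a module registry. The interface and names below are illustrative assumptions, not the §16.4 API:

```python
# Pluggable congestion-control sketch: modules register by name; the core
# TCP stack looks them up per-socket and never special-cases any algorithm.

CC_MODULES = {}

def register_cc(name):
    def wrap(cls):
        CC_MODULES[name] = cls
        return cls
    return wrap

@register_cc("reno")
class Reno:
    def on_ack(self, cwnd, ssthresh):
        # Slow start below ssthresh, then linear congestion avoidance.
        return cwnd * 2 if cwnd < ssthresh else cwnd + 1

@register_cc("cubic")
class Cubic:
    def on_ack(self, cwnd, ssthresh):
        return cwnd + 3   # placeholder; real CUBIC grows along a cubic curve

def cc_for_socket(name="cubic"):
    return CC_MODULES[name]()

print(cc_for_socket("reno").on_ack(cwnd=4, ssthresh=16))   # 8: slow start doubles
```

Under this shape, adding BBRv2 in Phase 3.2+ is one more `register_cc` entry; the core state machine is untouched.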

25.3.5 Phase 3.1: Storage Stack (VFS, filesystems, DM/MD)

Scope: VFS layer, ext4, XFS, Btrfs core, device mapper

Human estimate: 6-9 months
Agent estimate:

| Component | Agent Work | Testing | Iterations | Real Time |
|---|---|---|---|---|
| VFS layer | 20 hours | 30 hours | 15x | 20 days |
| Page cache | 12 hours | 18 hours | 10x | 10 days |
| ext4 (full) | 30 hours | 45 hours | 20x | TBD (see caveat below) |
| XFS | 25 hours | 40 hours | 18x | 25 days |
| Btrfs core (COW, subvols, snaps, checksums) | 35 hours | 50 hours | 22x | TBD (see caveat below) |
| Device mapper (DM) | 15 hours | 25 hours | 12x | 15 days |
| MD RAID | 12 hours | 20 hours | 10x | 12 days |

Scope caveat (filesystems): "ext4 (full)" means feature-complete for the on-disk format (extents, journaling, inline data, encryption hooks) but not bug-for-bug compatibility with Linux's fs/ext4/ (~50-70K SLOC including jbd2). "Btrfs core" means COW B-tree, subvolumes, snapshots, and checksums — a complete subsystem per Section 15.8. Btrfs RAID5/6, send/receive, and deduplication are separate subsystems added in Phase 4+ — they plug into the Btrfs core but do not modify it. These estimates assume the filesystem trait API is stable; if significant VFS redesign is needed, add 50-100% contingency.

With 7 agents in parallel:
  • Wall clock time: ~35 days (5 weeks) (all filesystems complete per spec; Btrfs RAID/dedup are separate Phase 4 subsystems)
  • Bottleneck: filesystem complexity (ext4, Btrfs are massive)

25.3.6 Phase 3.2: Advanced Features (Distributed, Observability, Power)

Scope: DSM, DLM, FMA, power budgeting, live evolution

Human estimate: 9-12 months
Agent estimate:

| Component | Agent Work | Testing | Iterations | Real Time |
|---|---|---|---|---|
| DSM (distributed shared memory) | 40 hours | 60 hours | 25x | 40 days |
| DLM (distributed lock manager) | 35 hours | 50 hours | 20x | 35 days |
| RDMA integration | 20 hours | 30 hours | 15x | 20 days |
| Cluster membership | 15 hours | 25 hours | 12x | 15 days |
| FMA (telemetry) | 12 hours | 18 hours | 10x | 10 days |
| Power budgeting | 18 hours | 28 hours | 14x | 18 days |
| Live kernel evolution | 25 hours | 40 hours | 18x | 25 days |
| Observability (umkafs) | 15 hours | 22 hours | 12x | 12 days |

With 8 agents in parallel:
  • Wall clock time: ~40 days (6 weeks)
  • Bottleneck: distributed systems testing (need cluster hardware)

25.3.7 Phase 4.1: Consumer Hardware (WiFi, Bluetooth, Audio, Graphics)

Scope: WiFi drivers (5 chipsets), Bluetooth, audio, touchpad, suspend/resume

Human estimate: 12-18 months (hardware compatibility is painful)
Agent estimate:

| Component | Agent Work | Hardware Testing | Iterations | Real Time |
|---|---|---|---|---|
| WiFi driver (Intel) | 12 hours | 20 hours | 12x | 15 days |
| WiFi driver (Realtek) | 12 hours | 20 hours | 12x | 15 days |
| WiFi driver (Qualcomm) | 12 hours | 20 hours | 12x | 15 days |
| Bluetooth stack | 15 hours | 25 hours | 15x | 20 days |
| Audio (Intel HDA) | 10 hours | 15 hours | 10x | 10 days |
| Touchpad (I2C-HID) | 8 hours | 12 hours | 8x | 8 days |
| Graphics (i915 modesetting + display) | 20 hours | 30 hours | 18x | 25 days |
| S3 suspend/resume | 15 hours | 40 hours | 20x | 30 days |
| Power management UX | 10 hours | 15 hours | 10x | 10 days |

Scope caveat (i915): "i915 modesetting + display" is a complete subsystem: modesetting, framebuffer, and display output for Gen 9+ (Skylake and later) per Section 21.5. GPU compute (OpenCL/Vulkan) is a separate subsystem requiring the accelerator framework from Section 22.1 — added in Phase 5+, not an extension of the display driver.

With 9 agents in parallel:
  • Wall clock time: ~30 days (4 weeks) (modesetting only; see scope caveat)
  • Bottleneck: suspend/resume testing (need real laptops, slow iteration)

25.3.8 Phase 5.1: Windows Emulation Acceleration (WEA)

Scope: NT object manager, IOCP, memory management, SEH

Human estimate: 12-15 months
Agent estimate:

| Component | Agent Work | Testing | Iterations | Real Time |
|---|---|---|---|---|
| NT object manager | 15 hours | 25 hours | 15x | 20 days |
| Synchronization (wait) | 12 hours | 20 hours | 12x | 15 days |
| IOCP | 18 hours | 30 hours | 18x | 25 days |
| Memory (VirtualAlloc) | 10 hours | 18 hours | 10x | 12 days |
| Thread model (TEB, APC) | 12 hours | 20 hours | 12x | 15 days |
| Security tokens | 8 hours | 12 hours | 8x | 8 days |
| SEH support | 15 hours | 25 hours | 15x | 20 days |
| WINE integration | 10 hours | 30 hours | 15x | 20 days |

With 8 agents in parallel:
  • Wall clock time: ~25 days (3.5 weeks)
  • Bottleneck: WINE testing (need many games, slow iteration)


25.4 Total Timeline (Sequential Phases)

Note: Superseded by realistic-full-timeline.md. Phase 1.1 and 1.2 have been merged per implementation-phases.md.

If phases are done sequentially (each phase depends on previous):

| Phase | Human Estimate | Agentic Estimate (10-20 agents) |
|---|---|---|
| Phase 1: Core kernel + multi-arch | 2-3 months | 3-4 weeks |
| Phase 2.1: Essential drivers | 6-9 months | 2 weeks |
| Phase 2.2: Linux compat | 9-12 months | 8-12 weeks |
| Phase 2.3: Networking | 6-9 months | 3 weeks |
| Phase 3.1: Storage | 6-9 months | 5 weeks |
| Phase 3.2: Advanced features | 9-12 months | 6 weeks |
| Phase 4.1: Consumer hardware | 12-18 months | 4 weeks |
| Phase 5.1: WEA | 12-15 months | 3.5 weeks |
| TOTAL (sequential) | 5-7 years | ~36-42 weeks (~9-10 months) |

But many phases can overlap!


25.5 Total Timeline (Optimized Parallelism)

Key insight: After Phase 1.1 (core kernel), many subsystems are independent:
  • Drivers (Phase 2.1) can start immediately after Phase 1.1
  • Networking (Phase 2.3) can start after essential drivers
  • Storage (Phase 3.1) can start after essential drivers
  • Advanced features (Phase 3.2) can start after Phase 2.2 (syscall layer)
  • Consumer hardware (Phase 4.1) can start after Phase 2.1 (USB core)
  • WEA (Phase 5.1) can start after Phase 2.2 (syscall layer)

Critical path (longest dependency chain):
  1. Phase 1.1: Core kernel + multi-arch (5 weeks)
  2. Phase 2.2: Linux compat (8-12 weeks) — depends on Phase 1.1; dominated by eBPF verifier
  3. Phase 3.2: Advanced features (6 weeks) — depends on Phase 2.2

Critical path total: 19-23 weeks (best case assumes the eBPF verifier at the lower bound)

Parallel work (can happen alongside the critical path):
  • Phase 2.1 (drivers) starts at week 3, finishes week 5
  • Phase 2.3 (networking) starts at week 5, finishes week 8
  • Phase 3.1 (storage) starts at week 5, finishes week 10
  • Phase 4.1 (consumer) starts at week 5, finishes week 9
  • Phase 5.1 (WEA) starts at week 8, finishes week 11.5

Optimized timeline with smart parallelization:

Week 0-5:   Phase 1.1 (Core kernel + multi-arch) [critical path]
            Phase 2.1 (Drivers) [parallel, starts week 3]
Week 5-17:  Phase 2.2 (Linux compat) [critical path, 8-12 weeks]
            Phase 2.3 (Networking) [parallel, weeks 5-8]
            Phase 3.1 (Storage) [parallel, weeks 5-10]
            Phase 4.1 (Consumer) [parallel, weeks 5-9]
Week 17-23: Phase 3.2 (Advanced) [critical path]
            Phase 5.1 (WEA) [parallel, weeks 8-11.5]
Week 23-27: Integration, testing, bug fixes

Total optimized timeline: ~27 weeks (~7 months) (best case; eBPF verifier complexity may extend this)
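The overlapped schedule above can be recomputed from the phase durations and dependencies. Durations are taken from this chapter's tables (eBPF at the 12-week upper bound); the explicit start offsets mirror the chart, and the dependency shape is an illustrative simplification:

```python
# Recompute the optimized timeline: each phase starts at the max finish
# time of its prerequisites (or its stated earliest start). Weeks.

PHASES = {                   # name: (duration_weeks, prerequisites)
    "1.1": (5,   []),
    "2.1": (2,   []),        # overlaps 1.1, starts week 3 (see START)
    "2.2": (12,  ["1.1"]),
    "2.3": (3,   ["2.1"]),
    "3.1": (5,   ["2.1"]),
    "4.1": (4,   ["2.1"]),
    "3.2": (6,   ["2.2"]),
    "5.1": (3.5, ["2.1"]),
}
START = {"2.1": 3, "5.1": 8}   # explicit start offsets from the chart

finish = {}
def done(p):
    if p not in finish:
        dur, pre = PHASES[p]
        start = max([START.get(p, 0)] + [done(q) for q in pre])
        finish[p] = start + dur
    return finish[p]

last = max(done(p) for p in PHASES)
print(f"critical path ends week {last}; +4 weeks integration = week {last + 4}")
```

This reproduces the week-23 critical-path finish (1.1 → 2.2 → 3.2) and the ~27-week total after integration.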


25.6 What About Spec Bugs?

~50 remaining documented flaws (down from 89 after three review rounds) + estimated 200-300 more undiscovered = ~250-350 spec bugs.

Per-bug handling:
  1. Discovery during implementation: ~10-30 minutes (test fails, agent analyzes)
  2. Spec fix: ~30-60 minutes (human architect or agent)
  3. Re-implementation: ~30-120 minutes (agent rewrites affected code)
  4. Re-testing: ~10-30 minutes (compile + test)

Average: ~2-4 hours per bug

300 bugs × 3 hours average = 900 hours = ~37 days with 1 agent

But many bugs can be fixed in parallel (different subsystems):
  • With 10 agents handling bugs in parallel: ~4 days
  • Spread across 5 months: absorbed into iteration cycles
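The arithmetic behind these figures, for the record (note the ~37-day figure assumes agents run around the clock, 24 hours/day):

```python
# Spec-bug budget arithmetic: ~300 bugs at ~3 hours each, divided across
# agents working in parallel on independent subsystems.
bugs, hours_per_bug, agents = 300, 3, 10
total_hours = bugs * hours_per_bug          # 900 agent-hours
days_one_agent = total_hours / 24           # agents run 24h/day
days_parallel = days_one_agent / agents
print(total_hours, days_one_agent, days_parallel)
```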

Impact on timeline: Spec bugs already accounted for in the "iterations" column above. The iteration counts (3-25x) include discovering and fixing spec bugs.


25.7 Hardware Bottlenecks

25.7.1 Real Hardware Testing Requirements

Cannot be parallelized beyond physical hardware availability:

  1. Suspend/resume testing (Phase 4.1):

    • Need: 10 different laptop models
    • Test: 1000 cycles per laptop
    • Time: ~4 hours per laptop (even if fully automated)
    • Total: ~40 hours (2 days) minimum
  2. Battery life validation (Phase 4.1):

    • Need: 5 laptop models
    • Test: Full discharge cycle
    • Time: ~10-15 hours per laptop
    • Total: ~60 hours (3 days) minimum
  3. WiFi compatibility testing (Phase 4.1):

    • Need: 10 different WiFi chipsets
    • Test: Connect, transfer, disconnect, repeat
    • Time: ~1-2 days per chipset (firmware, WPA3, roaming, power save, monitor mode)
    • Total: ~10-20 days (2-4 weeks)
  4. Multi-GPU testing (Phase 3.2):

    • Need: 5 different GPU models
    • Test: P2P transfers, workload distribution
    • Time: ~4 hours per GPU
    • Total: ~20 hours (1 day)
  5. Cluster testing (Phase 3.2):

    • Need: 8-16 node cluster with RDMA
    • Test: DSM, DLM, membership, failover
    • Time: ~40-60 hours (multiple days)
    • Total: ~3-5 days

Hardware testing adds: ~2-3 weeks to timeline (but overlaps with development).

25.7.2 Specialized Hardware Acquisition

Before development can start, the following must be acquired:
  • 10+ laptop models (Intel, AMD, ARM)
  • 20+ WiFi/Bluetooth adapters
  • 10+ NVMe drives (different vendors)
  • 5+ GPUs (NVIDIA, AMD, Intel)
  • 8-16 node RDMA cluster
  • Touchpads, touchscreens, webcams, audio devices

Procurement time: ~2-4 weeks
Cost: $150,000-350,000 for full hardware lab (8-16 dual-socket servers with RDMA NICs, switches, and infrastructure)


25.8 QEMU CPU Feature Testing Matrix

Target: QEMU 10.2.1. Defines the CPU configurations tested per architecture to ensure UmkaOS handles feature presence/absence correctly at boot and runtime.

25.8.1 Design Principles

  1. Every feature-dependent code path must have a test configuration where that feature is absent. If UmkaOS has if cpu_has_X { fast_path } else { fallback }, both branches must be exercised.

  2. One "minimal" config per arch — the weakest CPU the architecture must boot on. Tests fallback paths, graceful degradation, no-SIMD paths, software TLB flush, etc.

  3. One "maximal" config per arch — everything on. Tests feature detection, fast paths, and that no feature combination triggers unexpected interactions.

  4. Feature-targeted configs — isolate specific features UmkaOS cares about (driver isolation, SIMD dispatch, MMU modes, crypto acceleration).

  5. Profiles map to real hardware generations where possible — not synthetic combinations that no silicon ever shipped.
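Principles 1-3 amount to keeping the matrix as data and generating uniform invocations from it. The sketch below shows that shape; profile names match the tables in this section, but the generator itself (and its simplification of machine options, e.g. omitting mte=on) is illustrative:

```python
# Minimal/maximal profile matrix as data, with a uniform command generator.
# Machine options are simplified; e.g. the real aarch64 max profile also
# needs -M virt,mte=on (see the AArch64 section).

PROFILES = {
    "x86_64":  {"minimal": "Haswell",    "maximal": "max"},
    "aarch64": {"minimal": "cortex-a53", "maximal": "max"},
    "riscv64": {"minimal": "sifive-u54", "maximal": "max"},
}
MACHINE = {"x86_64": "", "aarch64": "-M virt", "riscv64": "-M virt"}

def qemu_cmd(arch, profile):
    cpu = PROFILES[arch][profile]
    parts = [f"qemu-system-{arch}", MACHINE[arch], f"-cpu {cpu}",
             "-serial stdio -display none -no-reboot -m 256M"]
    return " ".join(p for p in parts if p)

for arch in PROFILES:
    print(qemu_cmd(arch, "minimal"))
```

Keeping the matrix in one table makes principle 1 auditable: for every `cpu_has_X` branch in the kernel, grep the table for a profile where X is absent.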


25.8.2 x86-64

QEMU binary: qemu-system-x86_64. Machine: QEMU default (no -M flag; profiles are selected via -cpu alone).

Features tested: PKU (Tier 1 isolation), AVX2/AVX-512 (SIMD dispatch), SHA-NI (crypto acceleration), SMAP/SMEP (kernel hardening), PCID/INVPCID (TLB management), UMIP (userspace instruction prevention), LA57 (5-level paging).

| Profile | -cpu Argument | Real HW Equivalent | Key Features | Tests |
|---|---|---|---|---|
| x86-minimal | Haswell | Intel 4th gen (2013) | AVX2, PCID. No PKU, no AVX-512, no SMAP, no SHA-NI | Fallback isolation (no MPK), no SMAP guard, software crypto |
| x86-broadwell | Broadwell | Intel 5th gen (2015) | +SMAP over Haswell. Still no PKU | SMAP enforcement, still no MPK |
| x86-skylake | Skylake-Server | Intel Xeon Scalable 1st gen (2017) | +PKU, +AVX-512F/BW/CD/DQ/VL | MPK Tier 1 isolation, AVX-512 SIMD dispatch |
| x86-icelake | Icelake-Server | Intel 10th gen server (2019) | +SHA-NI, +UMIP, +LA57, +AVX-512-VNNI | 5-level paging, SHA-NI crypto, UMIP |
| x86-sapphire | SapphireRapids | Intel 4th gen Xeon (2023) | +AMX, +AVX-512-FP16, +SERIALIZE | Full Intel feature set |
| x86-epyc-v1 | EPYC-v1 | AMD EPYC Naples (2017) | No PCID, no INVPCID, no PKU. Has SHA-NI | AMD fallback: no PCID TLB, no MPK, AMD crypto |
| x86-epyc-genoa | EPYC-Genoa | AMD EPYC 4th gen (2022) | Full: PKU, AVX-512, SHA-NI, PCID, LA57 | Full AMD feature set |
| x86-max | max | Synthetic (all features) | Everything QEMU can emulate | Maximum coverage, feature interaction testing |

Feature toggle examples (apply to any base model):

# Skylake without PKU (test MPK fallback on otherwise modern CPU)
-cpu Skylake-Server,-pku

# Haswell with PKU added (test MPK on older baseline)
-cpu Haswell,+pku

# Icelake without LA57 (test 4-level paging on modern CPU)
-cpu Icelake-Server,-la57

# EPYC-v1 with PCID added (test PCID on AMD)
-cpu EPYC-v1,+pcid,+invpcid

Concrete QEMU command (example: x86-skylake):

qemu-system-x86_64 \
    -cdrom target/umka-kernel.iso \
    -serial stdio -display none -no-reboot -m 256M \
    -cpu Skylake-Server


25.8.3 AArch64

QEMU binary: qemu-system-aarch64. Machine: -M virt.

Features tested: MTE (memory tagging), SVE (scalable vectors), PAuth (pointer authentication), BTI (branch target identification), LSE/LSE2 (atomics), GICv3 (interrupt controller). POE (FEAT_S1POE) is NOT emulated by QEMU 10.2.1 — Tier 1 POE-based driver isolation cannot be tested; only page-table+ASID fallback is testable.

| Profile | -cpu Argument | Machine Options | Real HW Equivalent | Key Features | Tests |
|---|---|---|---|---|---|
| arm64-minimal | cortex-a53 | -M virt | Raspberry Pi 3/4, many SoCs | ARMv8.0-A only. No MTE, no SVE, no PAuth, no BTI, no LSE2 | All fallback paths |
| arm64-a72 | cortex-a72 | -M virt | AWS Graviton 1, many server SoCs | ARMv8.0-A. No MTE/SVE/PAuth/BTI. LSE (v8.1 atomics via QEMU) | Current default. Baseline server |
| arm64-a76 | cortex-a76 | -M virt | Graviton 2 era, mobile flagship | ARMv8.2-A. +LSE. No MTE/SVE/BTI | LSE atomics, DotProd |
| arm64-n2 | neoverse-n2 | -M virt,mte=on | Graviton 3, Ampere Altra Max | ARMv9.0-A. +MTE2, +SVE (128-bit), +PAuth, +BTI | MTE tagging, SVE dispatch, PAuth, BTI |
| arm64-a710 | cortex-a710 | -M virt,mte=on | Mobile ARMv9 (Cortex-X2 era) | ARMv9.0-A. +MTE2, +SVE2, +BTI, +PAuth | SVE2, mobile-class ARMv9 |
| arm64-max | max | -M virt,mte=on | Synthetic (all features) | MTE3, SVE (max VL), PAuth2, BTI, RME, everything | Maximum coverage |

Feature toggle notes (AArch64 is more restrictive than x86):

# SVE can be toggled on most models:
-cpu neoverse-n2,sve=off

# MTE requires BOTH machine and CPU support:
-M virt,mte=on -cpu neoverse-n2    # MTE on
-M virt         -cpu neoverse-n2    # MTE off (machine doesn't enable it)

# PAuth can be toggled:
-cpu neoverse-n2,pauth=off

# SVE vector length can be set:
-cpu max,sve=on,sve128=on,sve256=on,sve512=off

POE testing gap: FEAT_S1POE (Permission Overlay Extension, ARMv8.9-A / ARMv9.4-A) is not implemented in any QEMU version as of 10.2.1. UmkaOS AArch64 Tier 1 driver isolation falls back to page-table+ASID switching (~150-300 cycles) when POE is absent. This fallback path IS tested (all profiles except a hypothetical future POE-enabled one exercise it). POE fast-path testing requires real hardware (e.g., ARM Neoverse V3, Apple M4).
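The gap forces a runtime mechanism selection: POE on hardware that reports it, page-table+ASID switching everywhere else. The dispatch shape below is an illustrative sketch (feature names and the fallback cost come from the text):

```python
# Isolation-mechanism selection sketch: prefer FEAT_S1POE when the CPU
# reports it; otherwise fall back to page-table+ASID switching, which is
# the only path exercisable under QEMU 10.2.1.

def select_isolation(features: set) -> str:
    if "S1POE" in features:
        return "poe"                          # fast path: real HW only (e.g. Neoverse V3, Apple M4)
    return "pt_asid (~150-300 cycles)"        # fallback: tested under QEMU

print(select_isolation({"MTE2", "SVE"}))      # QEMU neoverse-n2 profile: fallback
print(select_isolation({"S1POE", "MTE2"}))    # future POE-capable hardware
```

Because every QEMU profile lacks S1POE, CI coverage of the POE branch itself must come from unit tests with the feature flag injected, plus eventual real-hardware runs.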

Concrete QEMU command (example: arm64-n2):

qemu-system-aarch64 \
    -M virt,mte=on \
    -cpu neoverse-n2 \
    -serial stdio -display none -no-reboot -m 256M \
    -kernel target/aarch64-unknown-none/release/umka-kernel


25.8.4 ARMv7

QEMU binary: qemu-system-arm. Machine: -M vexpress-a15 (primary), -M virt (for alternative CPUs).

Features tested: DACR (Tier 1 isolation — all ARMv7-A CPUs have this), LPAE (large physical address), VFPv4/NEON (floating point/SIMD), Thumb-2.

| Profile | -cpu Argument | Machine | Real HW Equivalent | Key Features | Tests |
|---|---|---|---|---|---|
| armv7-a15 | cortex-a15 | -M vexpress-a15 | Exynos 5, OMAP5 | LPAE, VFPv4-D32, NEON, HYP, TrustZone | Primary target. Full feature set |
| armv7-a7 | cortex-a7 | -M virt | Raspberry Pi 2, many IoT SoCs | LPAE, VFPv4-D32, NEON. No HYP/TZ by default | Lower-end ARMv7, tests core paths |

Note: The vexpress-a15 machine only accepts cortex-a15. Alternative CPUs require -M virt. Since all ARMv7-A CPUs have DACR (the Tier 1 isolation mechanism), there is no feature-absent fallback to test — DACR is architectural.

The ARMv7 matrix is intentionally small. ARMv7 is a legacy support tier: two profiles suffice to cover the feature space.

Concrete QEMU command (example: armv7-a15):

qemu-system-arm \
    -M vexpress-a15 \
    -cpu cortex-a15 \
    -serial stdio -display none -no-reboot -m 256M \
    -kernel target/armv7a-none-eabi/release/umka-kernel


25.8.5 RISC-V 64

QEMU binary: qemu-system-riscv64. Machine: -M virt.

Features tested: V (vector), H (hypervisor), Zicbom (cache block management), Svpbmt (page-based memory types), Sstc (stimecmp timer), Zba/Zbb/Zbc/Zbs (bitmanip). RISC-V has no fast Tier 1 isolation mechanism — all drivers run as Tier 0 (in-kernel) or Tier 2 (Ring 3 + IOMMU).

| Profile | -cpu Argument | Real HW Equivalent | Key Extensions | Tests |
|---|---|---|---|---|
| rv64-minimal | sifive-u54 | SiFive HiFive Unleashed | RV64GC only. No V, no H, no Bitmanip, no Sstc, no Svpbmt | All fallback paths. Minimal RISC-V |
| rv64-default | rv64 | Generic baseline | +Zba/Zbb/Zbc/Zbs, +Zicbom, +Sstc. No V, no H, no Svpbmt | Current boot default. Bitmanip, timer |
| rv64-rva22 | rva22s64 | RVA22 profile HW | +Svpbmt, +Zicbom, +Zba/Zbb/Zbs. No V, no H, no Sstc | Profile compliance, page-based mem types |
| rv64-rva23 | rva23s64 | RVA23 profile HW (modern) | +V, +H, +Sstc, +Svpbmt, +Zba/Zbb/Zbs. No Zbc | Modern profile. Vector, hypervisor |
| rv64-veyron | veyron-v1 | Ventana Veyron V1 | +H, +Sstc, +Svpbmt, +Zba/Zbb/Zbc/Zbs, +Zicbom. No V | Real high-perf core without Vector |
| rv64-ascalon | tt-ascalon | Tenstorrent Ascalon | +V, +H, +Sstc, +Svpbmt, +Zba/Zbb/Zbs. No Zbc | Real high-perf core with Vector |
| rv64-max | max | Synthetic (all extensions) | Everything: V, H, Zicbom, Svpbmt, Sstc, all Bitmanip, crypto, CFI | Maximum coverage |
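
Because the profiles above differ mainly in which extensions are present, the kernel must pick code paths per boot. The sketch below shows one way to key dispatch off a riscv,isa-style string; the parsing is deliberately simplified (no "g" expansion, no version suffixes), and the function names are illustrative, not UmkaOS APIs.

```rust
// Simplified riscv,isa-string extension check. Single-letter extensions live
// in the base "rv64imafdc..." segment; multi-letter ones appear as "_zba"
// style suffixes. Illustrative only.
fn has_ext(isa: &str, ext: &str) -> bool {
    let isa = isa.to_ascii_lowercase();
    let mut parts = isa.split('_');
    let base = parts.next().unwrap_or("");
    // Strip the "rv64"/"rv32" prefix so its letters aren't mistaken for extensions.
    let base = base
        .strip_prefix("rv64")
        .or_else(|| base.strip_prefix("rv32"))
        .unwrap_or(base);
    if ext.len() == 1 {
        base.contains(ext)
    } else {
        parts.any(|p| p == ext)
    }
}

// Pick a memory-fill implementation: vector path if V is present, scalar fallback otherwise.
fn select_memfill(isa: &str) -> &'static str {
    if has_ext(isa, "v") { "vector" } else { "scalar-unrolled" }
}

fn main() {
    let rva23 = "rv64imafdcv_zba_zbb_zbs_zicbom_sstc_svpbmt"; // rv64-rva23-like
    let u54 = "rv64imafdc";                                   // rv64-minimal-like
    assert!(has_ext(rva23, "zba"));
    assert!(!has_ext(u54, "zba"));
    assert_eq!(select_memfill(rva23), "vector");
    assert_eq!(select_memfill(u54), "scalar-unrolled");
    println!("dispatch ok");
}
```

Each QEMU profile in the matrix then exercises a different outcome of this dispatch, which is exactly why the feature-absent profiles matter.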

Concrete QEMU command (example: rv64-rva23):

```sh
qemu-system-riscv64 \
    -M virt \
    -cpu rva23s64 \
    -serial stdio -display none -no-reboot -m 256M \
    -kernel target/riscv64gc-unknown-none-elf/release/umka-kernel \
    -bios default
```


25.8.6 PPC32

QEMU binary: qemu-system-ppc. Machine: -M ppce500.

Features tested: SPE (signal processing engine), BookE MMU (TLB-based, no hash page table). PPC32 is a legacy support tier.

| Profile | -cpu Argument | Real HW Equivalent | Key Features | Tests |
|---|---|---|---|---|
| ppc32-e500v2 | e500v2 (default) | Freescale/NXP P1020, P2020 | SPE, BookE MMU, 36-bit phys | Primary target. SPE SIMD, BookE TLB |
| ppc32-e500mc | e500mc | QorIQ P3041, P5020 | No SPE. BookE, HW virtualization | Tests SPE-absent path, e500mc multicore |

Concrete QEMU command (example: ppc32-e500v2):

```sh
qemu-system-ppc \
    -M ppce500 \
    -cpu e500v2 \
    -nographic -no-reboot -m 256M \
    -kernel target/powerpc-unknown-none/release/umka-kernel
```


25.8.7 PPC64LE

QEMU binary: qemu-system-ppc64. Machine: -M pseries.

Features tested: VSX (vector-scalar), HTM (hardware transactional memory), Radix MMU vs Hash Page Table, MMA (matrix math). PPC64LE uses the pseries machine with SLOF firmware.

| Profile | -cpu Argument | Real HW Equivalent | Key Features | Tests |
|---|---|---|---|---|
| ppc64-power8 | power8 | IBM POWER8 (2014) | VSX, HTM, Hash Page Table only | HPT MMU, HTM paths |
| ppc64-power9 | power9 | IBM POWER9 (2017) | VSX, HTM, Radix MMU + HPT fallback | Radix MMU (primary UmkaOS path), HTM |
| ppc64-power10 | power10 | IBM POWER10 (2021) | VSX, MMA, Radix MMU only. No HTM | Modern POWER. MMA, no HTM fallback test |

Note on HTM: POWER8 and POWER9 have HTM. POWER10 removed it. UmkaOS should detect HTM absence gracefully — the power10 profile tests this.

Concrete QEMU command (example: ppc64-power10):

```sh
qemu-system-ppc64 \
    -M pseries \
    -cpu power10 \
    -nographic -no-reboot -m 1G \
    -kernel target/powerpc64le-unknown-none/release/umka-kernel
```


25.8.8 s390x

QEMU binary: qemu-system-s390x. Machine: -M s390-ccw-virtio.

Features tested: Vector Facility (SIMD), Vector Enhancements (IEEE 754), MSA (message-security-assist — crypto acceleration), NNPA (neural network processing assist), DFLT (deflate conversion). s390x uses Storage Keys for memory protection, but these are page-granularity and too coarse for Tier 1 fast domain isolation — Tier 1 is unavailable; drivers choose Tier 0 or Tier 2. Tier 2 available via channel I/O subchannel protection.

| Profile | -cpu Argument | Real HW Equivalent | Key Features | Tests |
|---|---|---|---|---|
| s390x-z14 | z14-base | IBM z14 (2017) | Vector Facility, MSA5. No NNPA, no DFLT, no Vector Enhancements 2 | Baseline vector, crypto. No NNPA fallback |
| s390x-z15 | gen15a-base | IBM z15 T01 (2019) | +Vector Enhancements 2, +DFLT, +MSA9 (CPACF enhancements). No NNPA | Enhanced vector, deflate acceleration |
| s390x-z16 | gen16a-base | IBM z16 / 3931 (2022) | +NNPA (AI inference), +enhanced crypto (MSA10). Full facility set | NNPA detection, full crypto suite |
| s390x-max | max | Synthetic (all facilities) | Everything QEMU can emulate | Maximum coverage, facility interaction testing |

Feature matrix:

| Feature | z14-base | gen15a-base | gen16a-base | max |
|---|---|---|---|---|
| Vector Facility | Yes | Yes | Yes | Yes |
| Vector Enhancements 2 | No | Yes | Yes | Yes |
| MSA (crypto) | MSA5 | MSA9 | MSA10 | All |
| ETOKEN | No | Yes | Yes | Yes |
| NNPA | No | No | Yes | Yes |
| DFLT (deflate) | No | Yes | Yes | Yes |

Known testing gaps: SIE (Start Interpretive Execution — nested virtualization) is not functional in QEMU TCG mode. z16-specific NNPA operations are partially emulated; instruction-level fidelity depends on QEMU version. Storage Key protection is emulated but performance characteristics differ from real hardware.

Concrete QEMU command (example: s390x-z16):

```sh
qemu-system-s390x \
    -M s390-ccw-virtio \
    -cpu gen16a-base \
    -nographic -no-reboot -m 512M \
    -kernel target/s390x-unknown-linux-gnu/release/umka-kernel
```


25.8.9 LoongArch64

QEMU binary: qemu-system-loongarch64. Machine: -M virt.

Features tested: LSX (128-bit SIMD), LASX (256-bit SIMD), CRYPTO (CRC32, AES acceleration), LBT (binary translation assist for x86/ARM/MIPS code), PTW (hardware page table walker). LoongArch64 has no hardware memory domain isolation mechanism — Tier 1 is unavailable; drivers choose Tier 0 or Tier 2. Tier 2 available via IOMMU.

| Profile | -cpu Argument | Real HW Equivalent | Key Features | Tests |
|---|---|---|---|---|
| la64-base | la464 | Loongson 3A5000, LA464 core | LSX, LASX, CRYPTO. Software TLB refill | Primary target. SIMD dispatch, software TLB |
| la64-max | max | Synthetic (all features) | Everything QEMU can emulate: LSX, LASX, LBT, PTW | Maximum coverage, all feature paths |

Feature matrix:

| Feature | la464 | max |
|---|---|---|
| LSX (128-bit SIMD) | Yes | Yes |
| LASX (256-bit SIMD) | Yes | Yes |
| CRYPTO (CRC32/AES) | Yes | Yes |
| LBT (binary translation) | No | Yes |
| PTW (HW page table walker) | No | Yes |

Known testing gaps: QEMU has limited CPU model variety for LoongArch — only the LA464 core and max are available. The LA664 core (Loongson 3A6000) is not yet modeled. Hardware page table walker (PTW) behavior in max mode differs from real silicon software TLB refill in LA464 — both paths must be tested but performance characteristics are synthetic. LBT (binary translation assist) is a QEMU max-only feature with no real hardware equivalent in currently modeled cores.

The LoongArch64 matrix is intentionally small — two profiles cover the available QEMU CPU models. As QEMU adds LA664 and future cores, additional profiles should be added.

Concrete QEMU command (example: la64-base):

```sh
qemu-system-loongarch64 \
    -M virt \
    -cpu la464 \
    -serial stdio -display none -no-reboot -m 512M \
    -kernel target/loongarch64-unknown-linux-gnu/release/umka-kernel
```


25.8.10 CI Test Tiers

Three tiers of test coverage, from fast (every commit) to thorough (nightly):

25.8.10.1 Tier 1: Every Commit (8 configs, ~3 min total)

One profile per architecture — the current defaults, ensuring nothing regresses:

| Arch | Profile | -cpu |
|---|---|---|
| x86-64 | x86-max | max |
| AArch64 | arm64-a72 | cortex-a72 |
| ARMv7 | armv7-a15 | cortex-a15 |
| RISC-V 64 | rv64-default | rv64 |
| PPC32 | ppc32-e500v2 | e500v2 |
| PPC64LE | ppc64-power10 | power10 |
| s390x | s390x-max | max |
| LoongArch64 | la64-base | la464 |

25.8.10.2 Tier 2: Every PR (22 configs, ~10 min total)

Adds feature-absent profiles to exercise fallback paths:

All Tier 1 configs, plus:

| Arch | Profile | Tests |
|---|---|---|
| x86-64 | x86-minimal (Haswell) | No PKU, no SMAP, no SHA-NI |
| x86-64 | x86-skylake | PKU + AVX-512 |
| x86-64 | x86-epyc-v1 | AMD, no PCID |
| AArch64 | arm64-minimal (cortex-a53) | No MTE/SVE/PAuth/BTI |
| AArch64 | arm64-n2 + mte=on | MTE + SVE + PAuth + BTI |
| AArch64 | arm64-max + mte=on | Full feature set |
| RISC-V 64 | rv64-minimal (sifive-u54) | Bare RV64GC |
| RISC-V 64 | rv64-rva23 | Vector + Hypervisor |
| PPC64LE | ppc64-power8 | HPT MMU, HTM |
| PPC64LE | ppc64-power9 | Radix + HTM |
| PPC32 | ppc32-e500mc | No SPE |
| ARMv7 | armv7-a7 (virt) | Lower-end ARMv7 |
| s390x | s390x-z14 | Baseline vector, no NNPA/DFLT |
| LoongArch64 | la64-max | All features, HW PTW + LBT |

25.8.10.3 Tier 3: Nightly (34 configs, ~25 min total)

All Tier 2 configs, plus:

| Arch | Profile | Tests |
|---|---|---|
| x86-64 | x86-broadwell | +SMAP, still no PKU |
| x86-64 | x86-icelake | SHA-NI, LA57, UMIP |
| x86-64 | x86-sapphire | Full Intel |
| x86-64 | x86-epyc-genoa | Full AMD |
| x86-64 | Skylake-Server,-pku | Modern CPU, MPK disabled |
| x86-64 | Icelake-Server,-la57 | Modern CPU, 4-level paging |
| AArch64 | arm64-a76 | LSE, no MTE/SVE |
| AArch64 | arm64-a710 + mte=on | SVE2, mobile ARMv9 |
| RISC-V 64 | rv64-rva22 | Svpbmt, no V/H |
| RISC-V 64 | rv64-veyron | Real core, no V |
| s390x | s390x-z15 | Enhanced vector, DFLT |
| s390x | s390x-z16 | NNPA, full crypto |

25.8.11 Known Testing Gaps

| Gap | Impact | Mitigation |
|---|---|---|
| AArch64 POE (FEAT_S1POE) | Cannot test POE-based Tier 1 fast isolation (~40-80 cycles) | Test page-table+ASID fallback (all profiles exercise this). POE fast path requires real hardware (Neoverse V3, Apple M4). |
| AArch64 RME (Realm Management Extension) | Cannot test CCA confidential computing | RME is a Phase 4+ feature. max CPU advertises RME but full CCA requires firmware support not emulated. |
| RISC-V Tier 1 isolation | No fast isolation mechanism exists in RISC-V ISA | By design: RISC-V Tier 1 runs as Tier 0. Tier 2 (Ring 3 + IOMMU) tested via rv64-max with H extension. |
| x86 TDX/SEV | Cannot test confidential VM support | Phase 4+ feature. Requires KVM passthrough, not QEMU TCG. |
| Real NUMA topology | QEMU -smp + -numa can simulate, but latency is synthetic | Functional testing only; performance characteristics require real hardware. |
| PPC64 PowerVM LPAR | pseries emulates, but not full LPAR partitioning | Acceptable for functional testing. |
| s390x SIE (nested virtualization) | Cannot test KVM-on-s390x in QEMU TCG mode | Requires real z/Architecture hardware or KVM passthrough. Functional guest support only. |
| s390x NNPA fidelity | z16 NNPA instruction emulation is partial in QEMU | Feature detection tested; instruction-level behavior requires real z16 hardware. |
| LoongArch64 LA664 core | No LA664 (3A6000) model in QEMU | Only LA464 (3A5000) available. LA664-specific features untestable until QEMU adds the model. |
| LoongArch64 PTW vs software refill | HW PTW in max mode differs from LA464 software TLB refill | Both paths tested functionally, but real PTW latency characteristics require hardware. |

25.9 Human Involvement Required

Agentic development is not fully autonomous. Humans are needed for:

25.9.1 Architectural Decisions (Non-Automatable)

Canonical list: Section 24.11 (§24.11). This table mirrors that list — update both when status changes.

| Question | Status | Notes |
|---|---|---|
| OEM partnerships strategy | OPEN | Framework, System76, Dell, HP — go-to-market for Phase 5b |
| GPU confidential computing (VRAM encryption) | RESOLVED | Both paths supported: runtime detection + admin override (umka.cc_device_dma=). See Section 9.7 |
| Nested GPU passthrough | RESOLVED | Supported if hardware allows it (IOMMU nested + TEE firmware + ≤3x overhead). See Section 9.7 |
| Policy module measurement enforcement | RESOLVED | Tied to boot security posture: enforce/advisory/off. See Section 19.9 |
| DPU io_uring submission offload | RESOLVED | Not a separate question — Tier M peer protocol IS the transport; dumb drivers use normal KABI path |
| Multi-arch fallback acceptance criteria | RESOLVED | Per-feature thresholds (native ≤5%, fallback ≤10%), sysfs + dmesg notification. See Section 2.22 |
| Cross-feature CI testing formalization | RESOLVED | 21 pairs, 3-tier CI spec. See Section 24.10 |

All previously open items (WiFi tier, BlueZ, proprietary drivers, default filesystem, eBPF verifier, io_uring+SEV-SNP, CXL+DSM, live evolution attestation) are also RESOLVED — see the full resolved decisions table in Section 24.11.

25.9.2 Spec Review & Correction

Automated review cycles (ZAI, Opus, Qwen, cross-subsystem) have resolved the majority of spec bugs and design gaps. Remaining work is incremental: each review cycle finds fewer issues as the spec matures. Human review is needed for ambiguous cases where multiple valid design approaches exist.

25.9.3 External Coordination

  • WINE/Proton integration: Negotiate with Valve, CodeWeavers
  • OEM partnerships: Framework, System76, Dell, HP
  • Upstream contributions: Linux driver code reuse, licensing
  • Community building: Documentation, marketing, beta testing

25.10 Realistic Full Timeline (Agentic + Human)

Assuming:

- 50 t/s inference (fast model)
- 10-20 AI agents in parallel
- Human architect for decisions
- Hardware lab available
- Spec is corrected first (arch-review loops)

| Activity | Duration | Notes |
|---|---|---|
| Pre-development | | |
| Arch review + spec fixes | 2-3 weeks | Human-in-loop with AI review agents |
| Hardware procurement | 2-4 weeks | Can overlap with spec fixes |
| Setup CI/CD infrastructure | 1 week | Automated build/test pipelines (see below) |
| Core development | | |
| Phases 1.1–5.1 (optimized) | 20 weeks (~5 months) | AI agents, parallelized |
| Hardware testing | 3 weeks | Overlaps with development |
| Post-development | | |
| Integration testing | 2-3 weeks | Full system, all architectures |
| Security testing | 3-4 weeks | Adversarial testing (see below) |
| Bug fixing (found in integration + security) | 2-3 weeks | Final polish |
| Performance tuning | 2-3 weeks | Optimize hot paths |
| Documentation | 2 weeks | User docs, admin guides |
| Beta testing | | |
| Internal alpha (10 users) | 4 weeks | Find major issues |
| Public beta (100 users) | 8 weeks | Broader hardware, edge cases |
| TOTAL | ~12-14 months | From spec to public beta |

CI/CD infrastructure: GitHub Actions with an 8-architecture QEMU matrix (x86-64, AArch64, ARMv7, RISC-V, PPC32, PPC64LE, s390x, LoongArch64).

- Local pre-push validation: make check (= fmt + lint + build + test for the default architecture).
- Each PR runs: make fmt → make lint-all → make build (per arch) → make test (per arch) → syzkaller fuzz (x86-64 only, 5-minute regression run).
- Hardware lab: real-device testing on x86-64 (Intel + AMD) and AArch64 (Raspberry Pi 5, Apple M1) for non-emulatable hardware paths (PCIe link training, IOMMU fault injection, actual suspend/resume cycles).
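
The 8-architecture matrix can be expressed as data so one harness drives all QEMU boots. The sketch below reproduces the per-commit (Tier 1) rows from the tables in Section 25.8; the q35 machine for x86-64 is an assumption (the x86 profile table is defined elsewhere), and the command builder itself is illustrative, not the real CI tooling.

```rust
// One row per architecture of the every-commit test matrix.
struct Target {
    arch: &'static str,
    qemu: &'static str,
    machine: &'static str,
    cpu: &'static str,
}

const TIER1: &[Target] = &[
    Target { arch: "x86-64",      qemu: "qemu-system-x86_64",      machine: "q35",             cpu: "max" }, // machine assumed
    Target { arch: "aarch64",     qemu: "qemu-system-aarch64",     machine: "virt",            cpu: "cortex-a72" },
    Target { arch: "armv7",       qemu: "qemu-system-arm",         machine: "vexpress-a15",    cpu: "cortex-a15" },
    Target { arch: "riscv64",     qemu: "qemu-system-riscv64",     machine: "virt",            cpu: "rv64" },
    Target { arch: "ppc32",       qemu: "qemu-system-ppc",         machine: "ppce500",         cpu: "e500v2" },
    Target { arch: "ppc64le",     qemu: "qemu-system-ppc64",       machine: "pseries",         cpu: "power10" },
    Target { arch: "s390x",       qemu: "qemu-system-s390x",       machine: "s390-ccw-virtio", cpu: "max" },
    Target { arch: "loongarch64", qemu: "qemu-system-loongarch64", machine: "virt",            cpu: "la464" },
];

// Build one boot command; per-arch extras (-bios, -serial) omitted for brevity.
fn cmdline(t: &Target, kernel: &str) -> String {
    format!("{} -M {} -cpu {} -nographic -no-reboot -kernel {}",
            t.qemu, t.machine, t.cpu, kernel)
}

fn main() {
    assert_eq!(TIER1.len(), 8);
    let c = cmdline(&TIER1[3], "umka-kernel");
    assert!(c.starts_with("qemu-system-riscv64 -M virt -cpu rv64"));
    println!("{} tier-1 configs", TIER1.len());
}
```

Tier 2 and Tier 3 become additional rows in the same table, so the CI harness needs no per-architecture special cases beyond the extra flags each target requires.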

Security testing phase (included in post-development):

- Syzkaller fuzzing: Continuous syscall fuzzing across all 8 architectures. Minimum 72-hour clean fuzzing gate per architecture before beta (continuous fuzzing continues beyond this gate in CI; the 72-hour requirement is the minimum before a release is considered candidate-ready).
- eBPF verifier adversarial testing: Crafted programs targeting bypass, out-of-bounds access, and infinite loops. Coverage-guided with KASAN.
- Namespace/capability escape testing: Adversarial seccomp/setns/clone sequences attempting privilege escalation across namespace boundaries.
- Tier 1 isolation testing: Verify that a compromised Tier 1 driver cannot read PKEY 0 memory (on hardware with MPK; on QEMU, verify the software enforcement path).
- Penetration testing: External audit of the capability system, IPC paths, and SysAPI layer for TOCTOU, use-after-free, and confused deputy.

KVM implementation is included in Roadmap Phase 4 (Production Ready) as a sub-item, since KVM host-side (VMX/SVM, EPT, vCPU scheduling) depends on the scheduler and memory subsystems from Phases 1.1 through 2.2. Estimated: 40-60 agent hours, 30-45 days elapsed (comparable to a substantial driver subsystem).

Breakdown:

- Pre-development: 1 month
- Core development: 5 months
- Post-development (including security testing): 2.5 months
- Beta testing: 3 months
- Buffer: 1 month (unexpected issues)


25.11 Comparison: Human vs Agentic

| Metric | Human Development | Agentic Development (50 t/s) |
|---|---|---|
| Team size | 10-15 developers | 10-20 AI agents (+ 1 architect) |
| Timeline (to public beta) | 5-7 years | 12-14 months |
| Cost (developer salaries) | $10-15 million (7 years × $150K × 10 devs) | $200K-500K (compute + 1 architect; assumes negotiated enterprise API pricing — retail token costs would be 3-5× higher) |
| Cost (hardware) | $100K (same) | $100K (same) |
| Total cost | $10-15M | $0.3-0.6M |
| Speedup | 1x | ~5x faster |
| Code quality | Varies by developer | Consistent (determined by spec) |
| Bugs from spec errors | Same | Same (GIGO applies) |
| Bugs from implementation | Higher (human error) | Different profile (consistent patterns, but risk of systematic errors across similar code) |

Key insight: Agentic development is 5x faster and 10-20x cheaper, but bottlenecked by:

1. Hardware testing (physics-bound)
2. Iteration cycles (compile/test, not coding)
3. Spec quality (AI can't fix bad specs without human guidance)


25.12 Sensitivity Analysis: Slower Inference

What if inference is slower?

| Inference Speed | Agent Coding Time | Impact on Timeline | Total Timeline |
|---|---|---|---|
| 50 t/s (base case) | ~5-10 min/component | | 12-14 months |
| 25 t/s (2x slower) | ~10-20 min/component | +10-15% | 13-16 months |
| 10 t/s (5x slower) | ~25-50 min/component | +25-30% | 15-18 months |
| 5 t/s (10x slower) | ~50-100 min/component | +40-50% | 18-21 months |

Key insight: Even at 10x slower inference, agentic development is only +50% longer (18-21 months vs 12-14 months), because most time is spent in compilation/testing, not AI inference.

Inference speed matters less than you'd expect once it's above ~5-10 t/s.
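
The arithmetic behind the table can be made explicit: only the inference-bound slice of core development scales with token rate, while compile, test, hardware, and beta time are fixed. The month figures and the 15% coding share below are illustrative fits to this chapter's estimates, not measurements.

```rust
// Toy timeline model: fixed phases + core development whose coding share
// scales inversely with tokens/second. All constants are illustrative.
fn timeline_months(tokens_per_s: f64) -> f64 {
    let fixed = 9.0;         // pre-dev + integration + beta + buffer (months)
    let dev_at_50 = 5.0;     // core development months at 50 t/s
    let coding_share = 0.15; // fraction of dev time that is pure inference
    fixed + dev_at_50 * ((1.0 - coding_share) + coding_share * 50.0 / tokens_per_s)
}

fn main() {
    let base = timeline_months(50.0); // 14.0 months
    assert!((base - 14.0).abs() < 0.5);
    // 10x slower inference stretches the total by roughly +50%, not 10x.
    assert!(timeline_months(5.0) / base < 1.6);
    println!("50 t/s: {:.1} mo, 5 t/s: {:.1} mo", base, timeline_months(5.0));
}
```

With these constants, 5 t/s yields about 21 months versus 14, matching the +40-50% row above; the fixed terms dominate, which is the whole point.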


25.13 Optimistic vs Pessimistic Scenarios

25.13.1 Best Case (Everything Goes Right)

Assumptions:

- Spec has zero showstoppers after initial review
- Hardware available immediately
- AI agents rarely hit bugs requiring human intervention
- Beta testing finds only minor issues

Timeline: 10-11 months to public beta

25.13.2 Realistic Case (Some Issues)

Assumptions:

- Spec has ~300 bugs (as analyzed)
- Hardware procurement takes time
- Some subsystems need multiple rewrites
- Beta testing finds 50-100 additional issues

Timeline: 12-14 months to public beta (our base estimate)

25.13.3 Pessimistic Case (Major Problems)

Assumptions:

- Spec has fundamental architectural flaws (e.g., RCU design is unsound)
- Major subsystem needs redesign (e.g., DSM quorum logic)
- Hardware compatibility worse than expected (WiFi works on 3/10 chipsets)
- Beta testing finds showstopper issues (data corruption, security vulnerabilities)

Timeline: 18-24 months to public beta


25.14 What Determines Success?

The bottleneck is NOT AI speed — it's specification quality.

Critical success factors:

1. ✅ Spec correctness (run arch-review → fix loops until zero showstoppers)
2. ✅ Hardware availability (don't wait 6 months for cluster procurement)
3. ✅ Automated testing (CI/CD must catch regressions immediately)
4. ✅ Human architectural guidance (AI can't make strategic decisions)
5. ⚠️ Unknown unknowns (things you discover only during implementation)

With perfect spec: 10-12 months is achievable.
With current spec (~50 remaining flaws): 12-14 months realistic.
With flawed spec (fundamental issues): 18-24 months or requires redesign.


25.15 Recommendations

25.15.1 Before Starting Implementation

  1. Run 2-3 more arch-review cycles (eliminate all showstoppers)
  2. Procure hardware lab (10 laptops, cluster, WiFi adapters)
  3. Set up CI/CD (automated build/test on every commit)
  4. Define architectural decision process (who decides Tier 1 vs Tier 2 for WiFi?)

Time investment: 1 month
Payoff: Saves 2-4 months during implementation

25.15.2 During Implementation

  1. Daily integration testing (catch cross-subsystem bugs early)
  2. Weekly human review (architect reviews AI agent work)
  3. Parallel spec updates (fix spec bugs as they're discovered)
  4. Hardware testing from day 1 (don't wait until "code complete")

25.15.3 Metrics to Track

Leading indicators (predict timeline):

- Spec bugs discovered per week (should decrease over time)
- Test pass rate (should increase toward 95%+)
- Integration conflicts per week (should stabilize <10)

Lagging indicators (measure progress):

- Lines of code (target: ~300K SLOC)
- Test coverage (target: >80%)
- Supported hardware (target: 50+ laptop models)


25.16 Final Answer: Realistic Timeline

Question: With 50 t/s inference and agentic development, how long to develop UmkaOS?

Answer: 12-14 months from spec finalization to public beta

Breakdown:

- Spec review & fixes: 1 month
- Core development (Phases 1.1–5.1): 5 months
- Integration & polish: 2 months
- Beta testing: 3 months
- Buffer for unknowns: 1 month

Compared to human development: 5x faster (5-7 years → 12-14 months)

Cost: roughly 20x cheaper ($10-15M → $0.3-0.6M)

Caveat: This assumes good spec quality and hardware availability. Poor spec quality adds 6-12 months. Hardware unavailability adds 2-6 months.

The bottleneck is not AI — it's specification correctness and hardware testing.


25.17 Agentic Live Development Workflow

Phase numbering: This chapter uses the same phase numbering as Section 24.2: Phase 1 = Foundations (boot to hello-world), Phase 2 = Self-hosting shell + Tier 1 fault recovery (busybox, VirtIO-blk), Phase 3 = Real workloads + Tier M peer demo (systemd, Docker, TCP, NVMe), Phase 4 = Production ready (K8s, KVM, LTP, real hardware), Phase 5 = Ecosystem and platform maturity (5a-5e sub-phases).

UmkaOS's architecture enables a development workflow fundamentally different from traditional kernel development: live, on-host, iterative development where an AI agent develops and tests kernel code on the same machine the kernel is running on, without rebooting.

This is not a distant Phase 5 aspiration — it falls out naturally from three existing architectural features:

  1. KABI tier isolation (Section 11.2): Drivers compiled against umka-driver-sdk run at any tier (0/1/2) without recompilation. Tier 2 (Ring 3 + IOMMU) provides full crash containment — a buggy driver cannot harm the host.

  2. Live kernel evolution (Section 13.18): Core components and KABI services can be replaced at runtime via the EvolvableComponent protocol. State is serialized, swapped atomically (~1-10 μs), and rolled back if the new version crashes.

  3. Multikernel peer model (Section 11.1, Section 5.1): Peer kernels on DPUs or remote nodes provide isolated test environments within the same cluster. A new service version can be deployed to one peer, soaked, and rolled to others.

25.17.1 Driver Development on a Live Host

The standard agentic driver development loop:

```
repeat {
    1. Agent writes/modifies driver code against umka-driver-sdk.
    2. Agent compiles: cargo build --release -p my-driver
    3. Agent loads driver at Tier 2:
         echo "load my-driver.kabi tier=2" > /ukfs/kernel/drivers/control
       Tier 2: Ring 3, IOMMU-isolated. Crashes cannot harm the host kernel.
    4. Agent runs test suite against the driver.
       - Functional tests (does the driver handle I/O correctly?)
       - Stress tests (high IOPS, error injection, timeout simulation)
       - KABI conformance tests (vtable completeness, ring buffer protocol)
    5. If tests fail: agent reads crash log, analyzes, modifies code → goto 1.
    6. If tests pass: agent promotes to Tier 1:
         echo "1" > /ukfs/kernel/drivers/my-driver/tier
}
```
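
Steps 3 and 6 of the loop are plain writes to the /ukfs control files named above. A minimal sketch of how the agent's tooling might script them, assuming those paths and the command syntax shown in the loop; error handling and the real tooling are out of scope.

```rust
use std::io::Write;

// Build the control-file command from step 3 of the loop above.
fn control_cmd(driver: &str, tier: u8) -> String {
    format!("load {driver}.kabi tier={tier}")
}

// Write the command to the driver control file (only works on a live
// UmkaOS host with /ukfs mounted; shown for illustration).
fn load_driver(driver: &str, tier: u8) -> std::io::Result<()> {
    let mut f = std::fs::OpenOptions::new()
        .write(true)
        .open("/ukfs/kernel/drivers/control")?;
    writeln!(f, "{}", control_cmd(driver, tier))
}

fn main() {
    // On a dev box without /ukfs we can still check the command we'd send.
    assert_eq!(control_cmd("my-driver", 2), "load my-driver.kabi tier=2");
    let _ = load_driver; // invoked only on a live UmkaOS host
    println!("{}", control_cmd("my-driver", 2));
}
```

Promotion to Tier 1 (step 6) is the same pattern against /ukfs/kernel/drivers/my-driver/tier.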

Iteration speed: Steps 1-5 take ~2-5 minutes per cycle (compile ~30s incremental, load ~10ms, test ~1-3 min). This is 10-100x faster than traditional kernel development (which requires reboot per change on a monolithic kernel) and comparable to userspace development speed.

No QEMU needed for driver development: Because Tier 2 drivers run in Ring 3 with IOMMU isolation, the host kernel is protected. The agent can develop drivers on the target hardware directly, accessing real devices (NVMe, NIC, GPU) with real firmware and real interrupt behavior — not emulated. QEMU is still needed for:

- Core kernel development (scheduler, memory manager — not tier-isolated)
- Multi-architecture testing (cross-compile + QEMU boot for non-host architectures)
- CI/CD validation (reproducible environments)

25.17.2 Kernel Service Development via Live Replacement

For kernel subsystems (VFS, networking, block layer) — not just drivers — the live evolution framework (Section 13.18) enables iterative development without rebooting:

```
repeat {
    1. Agent modifies KABI service code (e.g., umka-net TCP congestion module).
    2. Agent compiles: cargo build --release -p umka-net
    3. Agent triggers live replacement:
         echo "evolve umka-net /path/to/new/umka-net.uko" > /ukfs/kernel/evolution/control
       The evolution framework runs Phase A/A'/B/C:
         Phase A  — Loads new binary, validates KABI signature and vtable compatibility.
                    Exports old service state in chunks (connection table, routing FIB,
                    etc.) while old service continues handling requests.
         Phase A' — Quiescence (100ms deadline for services): blocks new syscall
                    entries (-ERESTARTSYS), drains in-flight operations, final
                    atomic state re-export. See Section 13.18
                    for quiescence failure handling (abort, rollback, retry).
         Phase B  — Atomic swap (~1-10 μs stop-the-world): IPI, vtable pointer
                    swap, pending ops ring transfer, CPUs released.
         Phase C  — New service drains pending ops, blocked syscalls auto-retry
                    (transparent to userspace).
    4. Agent runs tests against the new service.
    5. If new service crashes within watchdog window (10s for services):
       → Automatic reload of previous version via forward evolution from retained
         state. Agent reads crash log → goto 1.
    6. If tests pass: new version is now the active service.
}
```

Constraints:

- Non-replaceable data components (memory allocator data — PageArray, BuddyFreeList, PcpPagePool; page table hardware ops; capability data — CapTable, CapEntry; page reclaim data; KABI dispatch trampoline; evolution primitive — see Section 13.18) still require QEMU or reboot for development. Their corresponding policy layers (PhysAllocPolicy, VmmPolicy, PageReclaimPolicy, CapPolicy) are replaceable via atomic pointer swap.
- The first deployment of a service (from nothing) requires a boot. Live replacement only works for updating an already-running service.
- State format changes between versions must provide a migration function (v(N-1) → v(N)). If the agent makes a major restructuring, a chained migration or fresh restart may be needed.
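
The chained-migration constraint above can be made concrete: each version ships a v(N-1) → v(N) converter, and an update that skips versions composes them. The types below are illustrative stand-ins, not the EvolvableComponent API.

```rust
// Three state formats for a hypothetical connection-tracking service.
struct StateV1 { conns: Vec<u32> }
struct StateV2 { conns: Vec<u32>, rtt_us: Vec<u32> }  // v2 adds per-conn RTT
struct StateV3 { conns: Vec<(u32, u32)> }             // v3 merges the arrays

// Each migration fills new fields with safe defaults; old data is preserved.
fn migrate_v1_v2(s: StateV1) -> StateV2 {
    let n = s.conns.len();
    StateV2 { conns: s.conns, rtt_us: vec![0; n] } // RTT unknown: default 0
}

fn migrate_v2_v3(s: StateV2) -> StateV3 {
    StateV3 { conns: s.conns.into_iter().zip(s.rtt_us).collect() }
}

// A v1 -> v3 update chains the per-step converters rather than shipping a
// bespoke migration for every version pair.
fn migrate_v1_v3(s: StateV1) -> StateV3 {
    migrate_v2_v3(migrate_v1_v2(s))
}

fn main() {
    let old = StateV1 { conns: vec![7, 9] };
    let new = migrate_v1_v3(old);
    assert_eq!(new.conns, vec![(7, 0), (9, 0)]);
    println!("migration ok");
}
```

If a restructuring is too radical for any sensible per-field default, a fresh restart of the service is the honest fallback, as the constraint notes.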

25.17.3 Multikernel Testing Strategy

On a multikernel cluster (host + DPU peers, or multi-host), the agent can use the rolling replacement protocol (Section 13.18) for safe incremental testing:

```
Development cluster topology:
  ┌──────────┐   ┌──────────┐   ┌──────────┐
  │  Host    │   │  DPU #1  │   │  DPU #2  │
  │ (dev box)│←→│ (test    │←→│ (prod    │
  │          │   │  target) │   │  control)│
  └──────────┘   └──────────┘   └──────────┘

Agent development workflow:
  1. Agent writes new network stack version on Host.
  2. Agent deploys to DPU #1 only (via rolling replacement, one peer).
  3. DPU #1 runs the new version; DPU #2 and Host run the old version.
     Capability service routing ensures clients are served by whichever
     peer has an active provider.
  4. Agent runs workload against DPU #1:
     - Network throughput tests (iperf3, netperf)
     - Latency tests (sockperf, ping)
     - Error injection (packet drop, reorder, corrupt)
  5. If DPU #1 crashes: automatic reload of previous version via forward evolution. Only DPU #1 affected.
     Host and DPU #2 continue serving. Agent analyzes → goto 1.
  6. If soak succeeds (1 hour default):
     → Roll to DPU #2.
     → Roll to Host.
     → Cluster-wide verification.
```

Key advantage: The host running the development tools (compiler, editor, agent runtime) is the last node to receive the update. If the new code has a fatal bug, the development environment is never disrupted — the agent can always diagnose and fix the issue from the unaffected host.

25.17.4 LTP as Agentic Development Substrate

Implementing Linux syscall compatibility is the largest single task in UmkaOS development: ~400 syscalls, each with complex edge cases, architecture-specific behavior, and underdocumented invariants. For human developers, this is years of tedious work. For agentic development, the existence of the Linux Test Project (LTP) transforms this from an open-ended research problem into a structured, test-driven implementation task.

Why LTP is uniquely valuable for agents:

  1. Machine-readable behavioral specification. LTP contains ~5,000+ test cases that encode the actual expected behavior of Linux syscalls — not what the man page says, but what the kernel actually does. Each test is a concrete input → expected output pair. An agent can read a test, understand the contract, implement the syscall to satisfy the test, and verify correctness — all without human involvement.

  2. Natural task decomposition. LTP tests are organized by syscall family (open, mmap, clone, futex, etc.). Each family is an independent work unit that an agent can claim, implement, and validate. The test suite provides a natural progress tracker: "242/400 syscall families passing" is an unambiguous status.

  3. Edge case discovery. LTP tests exercise corner cases that are difficult to derive from documentation alone: mmap with MAP_FIXED overlapping an existing mapping, clone with invalid flag combinations, futex wake-vs-requeue races, signalfd interaction with SA_SIGINFO. These tests encode decades of bug reports and regression fixes. The agent gets this knowledge for free.

  4. Regression prevention. Once a test passes, it must never fail again. The agent runs the full LTP suite after every change. Any regression is caught immediately with a specific failing test that identifies the exact syscall and edge case — the agent can localize the bug without human debugging.

  5. Cross-architecture validation. LTP runs on all architectures. The same test suite validates that syscall behavior is identical on x86-64, AArch64, ARMv7, RISC-V 64, PPC32, PPC64LE, s390x, and LoongArch64. Architecture-specific bugs (wrong register convention, wrong signal frame layout, wrong struct stat padding) are caught automatically.

Agentic LTP workflow:

```
For each syscall family (e.g., "mmap"):
  1. Agent reads LTP tests for the family (e.g., ltp/testcases/kernel/syscalls/mmap/)
  2. Agent reads the architecture spec (sysapi/syscall-interface.md, process/*.md, etc.)
  3. Agent implements the syscall handler to satisfy the spec
  4. Agent runs the LTP tests for that family on all 8 arches in QEMU
  5. If tests fail: agent reads failure output, identifies the discrepancy, fixes → goto 3
  6. If tests pass: commit, move to next family
  7. After every N families: run full LTP regression to catch cross-syscall interactions
```

Scale advantage: A human developer implementing mmap compatibility might spend 2-4 weeks reading documentation, writing code, and debugging edge cases. An agent with LTP tests as a feedback signal can iterate in minutes per cycle. The combinatorial complexity of ~400 syscalls × ~10 edge cases each = ~4,000 implementation decisions is within agent capability when each decision has an immediate correctness signal (LTP test pass/fail).

Complementary test suites: The LTP pattern generalizes to at least 12 subsystems. See Section 25.18 for the comprehensive inventory of Linux test suites usable as agentic development accelerators — including xfstests (~1,500 filesystem tests), packetdrill (2,000+ TCP scripts from Google), BPF selftests (~1,000 eBPF tests), liburing tests, kvm-unit-tests, IGT GPU Tools (2,228+ DRM/KMS subtests), blktests, and 20+ kselftest subdirectories covering mm, cgroup, seccomp, futex, ptrace, and more.

Cross-references: - Syscall interface spec: Section 19.1 - Verification strategy (LTP gate): Section 24.3 - Phase 4 exit criteria (>95% LTP): Section 24.2

25.17.5 Development Acceleration Summary

| Development task | Traditional Linux | UmkaOS agentic |
|---|---|---|
| Driver bug fix | Edit → compile → reboot → reproduce → verify (~10-30 min) | Edit → compile → reload Tier 2 → verify (~2-5 min) |
| Network stack change | Edit → compile → reboot → reconfigure → test (~15-45 min) | Edit → compile → live evolve → test (~3-8 min) |
| Scheduler policy tuning | Edit → compile → reboot → benchmark (~20-60 min) | Edit → compile → policy hot-swap → benchmark (~2-5 min) |
| Cluster service update | N × (reboot + rejoin) (~N × 5-10 min) | N × (live replace + soak) (~N × 200ms + 1 hour soak) |
| Multi-arch driver testing | 8 × (cross-compile → QEMU boot → test) (~8 × 10 min) | Tier 2 on native hardware + 7 × QEMU (~1 × 5 min + 7 × 10 min) |

Net effect: The compile-test cycle for drivers and services drops from 10-60 minutes (reboot-bound) to 2-8 minutes (compile-bound). With incremental compilation (~30s), the bottleneck shifts entirely to test execution time — exactly where it should be.

Cross-references: - KABI driver model: Section 12.1 - Tier isolation: Section 11.2 - Crash recovery: Section 11.9 - Live kernel evolution: Section 13.18 - KABI service live replacement: Section 13.18 - Driver tier promotion: Section 13.18 (promotion protocol) - Policy hot-swap: Section 19.9 - Multikernel peer model: Section 11.1 - Cluster membership: Section 5.1 - ServiceDrainNotify: Section 5.11 - FMA telemetry: Section 20.1

25.18 Linux Test Suite Inventory for Agentic Development

The "LTP as development substrate" pattern (Section 25.17) generalizes far beyond syscalls. Linux has accumulated decades of test suites across nearly every subsystem — each encoding the actual behavioral contract that UmkaOS must implement. For agentic development, these suites convert open-ended "implement Linux compatibility" tasks into structured implement → run tests → fix failures → repeat loops with unambiguous pass/fail signals.

This section catalogues every major Linux test suite usable as an agentic development accelerator for UmkaOS, organized by value tier.

25.18.1 Tier 1: High-Value Test Suites (Pure Userspace API, Directly Usable)

These suites test exclusively via userspace API (syscalls, ioctls, procfs/sysfs) and require no Linux kernel internals. They validate exactly the external ABI boundary UmkaOS must implement.

25.18.1.1 LTP (Linux Test Project)

| Property | Value |
|---|---|
| Repository | github.com/linux-test-project/ltp |
| Test count | ~5,000+ test cases |
| Interface | Userspace: syscalls, procfs, sysfs |
| UmkaOS chapters | Section 8.1 (fork/exec/signals), Section 17.1 (namespaces), Section 17.2 (cgroups), Section 19.1 (syscalls), Section 9.1 (capabilities), Section 17.3 (IPC) |

LTP is the single largest general-purpose Linux kernel test suite. Tests are organized by syscall family (open, mmap, clone, futex, etc.). Each family is an independent work unit. See Section 25.17 for the full agentic LTP workflow.
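Because each family is an independent work unit, an agent can derive its work-unit list mechanically from an LTP runtest file. A minimal sketch — the sample lines are illustrative stand-ins for entries in ltp/runtest/syscalls:

```shell
# Derive syscall family names from LTP runtest entries ("tag command" pairs)
# by stripping everything from the first digit or underscore in the tag.
runtest='open01 open01
open02 open02
mmap01 mmap01
futex_wait01 futex_wait01'
families=$(printf '%s\n' "$runtest" | sed 's/[0-9_].*//' | sort -u)
printf '%s\n' "$families"
```

Each resulting family name (futex, mmap, open, ...) becomes one implement → test → fix loop.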

25.18.1.2 xfstests (fstests)

| Property | Value |
|---|---|
| Repository | git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git, mirror at github.com/kdave/xfstests |
| Test count | ~1,500+ tests across generic + per-filesystem groups |
| Interface | Userspace: open/read/write/fsync/fallocate/ioctl/xattr |
| Filesystems | ext4 (L4), XFS (L4), btrfs (L4), f2fs (L3), tmpfs (L2), NFS (L2), overlayfs (L2), many others |
| UmkaOS chapters | Section 14.1 (VFS), Section 15.6 (ext4), Section 15.7 (XFS), Section 15.8 (Btrfs), Section 14.11 (FUSE), Section 14.7 (overlayfs) |

Probably the single highest-ROI test suite for agentic development. xfstests encodes the complete POSIX filesystem behavioral contract plus Linux-specific extensions (fallocate, FIEMAP, O_TMPFILE, copy_file_range, splice, hole punch, CoW semantics). The tests/generic/ directory contains ~700+ tests that apply to any filesystem — these test VFS-layer semantics independent of filesystem implementation.

Agentic workflow:

For each filesystem operation (e.g., "fsync + CoW"):
  1. Agent reads xfstests/tests/generic/ tests exercising that operation
  2. Agent reads VFS spec ([Section 14.1](14-vfs.md#virtual-filesystem-layer))
  3. Agent implements the VFS operation
  4. Agent runs xfstests generic group: ./check -g generic/quick
  5. Failures → agent reads test output, fixes → goto 3
  6. Pass → run full generic suite, then per-filesystem tests

Full suite runtime: 5-6 days for all filesystems (per LWN 2022 report). The quick group runs in ~30 minutes — suitable for per-iteration feedback.
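The per-iteration feedback in step 5 reduces to two numbers plus a failure list. A sketch of parsing the ./check summary — the sample text is illustrative, but the "Failures:" and "Failed N of M tests" lines are the suite's standard summary format:

```shell
# Turn an xfstests run summary into a machine-readable pass/fail signal.
summary='Ran: generic/001 generic/005 generic/013
Failures: generic/005
Failed 1 of 3 tests'
failed=$(printf '%s\n' "$summary" | awk '/^Failed/ { print $2 }')
total=$(printf '%s\n' "$summary" | awk '/^Failed/ { print $4 }')
failing=$(printf '%s\n' "$summary" | awk '/^Failures:/ { $1=""; print }' | xargs)
echo "failed=$failed/$total tests: $failing"
```

The failure list names the exact tests to read next, which keeps the agent's context focused on one behavioral gap at a time.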

25.18.1.3 Packetdrill

| Property | Value |
|---|---|
| Repository | github.com/google/packetdrill |
| Test count | 300+ open-source scripts; 2,000+ at Google (being incrementally open-sourced); 66 in mainline tools/testing/selftests/net/packetdrill/ as of Linux 6.12 |
| Interface | Userspace: socket API (socket/bind/listen/accept/send/recv) + injected/verified packets via TUN device |
| Protocols | TCP, UDP, ICMP over IPv4 and IPv6 |
| UmkaOS chapters | Section 16.3 (sockets), Section 16.9 (congestion), Section 16.15 (kTLS), Section 16.11 (MPTCP) |

Packetdrill is a scriptable network stack testing tool developed by Google. Each script specifies a precise sequence of socket syscalls interleaved with timestamped packet injections/expectations — testing the full TCP state machine from userspace to wire. Google's internal suite of 2,000+ scripts covers congestion control, loss recovery, buffer management, ECN, SACK, Fast Open, Tail Loss Probe, Path MTU discovery, and more.

Key advantage for agentic development: Packetdrill scripts are machine-readable behavioral specifications. Each script encodes:
- Exact syscall sequence with expected return values
- Exact packet sequence with expected header values (tcpdump-like syntax)
- Timing constraints (±tolerance for jitter)
- Internal TCP state assertions (via TCP_INFO getsockopt)

This is the richest available test suite for TCP implementation correctness. An agent implementing UmkaOS TCP can use packetdrill scripts as the primary feedback signal, iterating in seconds per script.
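For illustration, a minimal script in packetdrill's syntax: a passive-open three-way handshake. This is a hand-written sketch of the script format described above, not a script from the suite, and the field values (window sizes, TCP options) are illustrative:

```
// Passive open: socket syscalls on the timeline, injected (<) and
// expected (>) packets in tcpdump-like syntax.
0    socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0   setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0   bind(3, ..., ...) = 0
+0   listen(3, 1) = 0

// Inject a SYN; the stack under test must answer with SYN+ACK.
+0   < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0   > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+.1  < . 1:1(0) ack 1 win 257
+0   accept(3, ..., ...) = 4
```

A failing run reports the first point where observed behavior diverged from the script — exactly the expected-vs-actual signal the agentic loop needs.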

25.18.1.4 BPF Selftests

| Property | Value |
|---|---|
| Location | tools/testing/selftests/bpf/ in Linux source tree |
| Test count | ~1,000+ test cases |
| Interface | Userspace: bpf() syscall, perf_event_open, netlink |
| UmkaOS chapters | Section 19.1 (§eBPF subsystem) |

Tests eBPF verifier correctness, map operations, program types, BTF (BPF Type Format), tracing hooks, networking hooks, and the full bpf() syscall interface. Already identified as a key agentic accelerator.

25.18.1.5 liburing Tests

| Property | Value |
|---|---|
| Repository | github.com/axboe/liburing (test/ directory) |
| Test count | ~200+ test cases |
| Interface | Userspace: io_uring_setup/io_uring_enter syscalls |
| UmkaOS chapters | Section 19.1 (§io_uring subsystem) |

Tests io_uring submission/completion ring semantics: read/write/poll/timeout/link/cancel, fixed files, multishot, provided buffers, and registered buffers. The liburing test suite is the de facto compliance test for io_uring implementations.

25.18.1.6 KVM Unit Tests

| Property | Value |
|---|---|
| Repository | github.com/kvm-unit-tests/kvm-unit-tests |
| Test count | Hundreds of tests per architecture |
| Architectures | x86_64, arm64, s390x, ppc64/ppc64le, riscv64 |
| Interface | KVM ioctl interface (/dev/kvm) |
| UmkaOS chapters | Section 18.1 (KVM), Section 18.4 (migration) |

Each test is a tiny guest OS that exercises specific KVM features: VM-exit handling, interrupt injection, nested virtualization, memory mapping, timer emulation. Tests run as userspace programs that interact with /dev/kvm — they test the hypervisor's ioctl interface as a black box. Multi-architecture coverage aligns perfectly with UmkaOS's 8-arch support matrix.

25.18.1.7 IGT GPU Tools

Property Value
Repository gitlab.freedesktop.org/drm/igt-gpu-tools
Test count 2,228+ subtests (50 KMS test binaries alone). ~6M subtests executed per week across ~130 test machines
Interface Userspace: DRM ioctls (/dev/dri/*)
UmkaOS chapters Section 21.4 (§DRM/KMS in user-io), Section 22.1 (accelerators)

IGT tests DRM/KMS kernel APIs: mode setting, planes, flips, atomic modesetting, vblanks, color management, rotation, cursor. Vendor-agnostic tests cover the core DRM subsystem; vendor-specific directories cover Intel, AMD, and vc4/v3d. Tests operate via DRM ioctls — pure userspace.

Note: Originally "Intel GPU Tools" but now vendor-agnostic. VKMS (Virtual KMS) support means tests can run without physical GPU hardware in QEMU — ideal for agentic CI.

25.18.1.8 blktests

| Property | Value |
|---|---|
| Repository | github.com/osandov/blktests |
| Test count | ~200+ tests across test groups |
| Test groups | block (generic block layer), loop (loop devices), nvme (NVMe), scsi (SCSI), dm (device-mapper), nbd (network block device), zbd (zoned block devices), thp (transparent huge pages interaction) |
| Interface | Userspace: block device ioctls, sysfs, procfs |
| UmkaOS chapters | Section 15.2 (block I/O), Section 15.19 (NVMe), Section 15.18 (I/O scheduling) |

Complements xfstests: while xfstests tests filesystem semantics, blktests tests the block layer below. Full suite runs in ~1 day (vs 5-6 days for xfstests).

25.18.2 Tier 2: Kselftest Subdirectories

tools/testing/selftests/ in the Linux source tree contains 100+ subdirectories, each a mini test suite for a specific subsystem. Kselftest runs as userspace programs. The following subdirectories have the highest value for UmkaOS agentic development:

| Subdirectory | Approx Tests | UmkaOS Chapter | What It Validates |
|---|---|---|---|
| net/ | 100+ scripts | Section 16.2 | Routing, VLANs, tunnels, namespaces, policy routing, GRO/GSO, TCP, UDP, MPTCP |
| net/mptcp/ | 20+ | Section 16.11 | MPTCP subflows, path management, fallback to TCP |
| net/netfilter/ | 20+ | Section 16.18 | nftables, conntrack, NAT, packet filtering rules |
| mm/ | 50+ | Section 4.8, Section 4.7 | mmap, mprotect, madvise, userfaultfd, THP, KSM, NUMA balancing, mremap |
| cgroup/ | 30+ | Section 17.2 | Cgroup v2 hierarchy, memory controller, CPU controller, freezer |
| seccomp/ | 20+ | Section 10.3 | seccomp-BPF filter installation, SECCOMP_RET_* actions, TSYNC |
| capabilities/ | 10+ | Section 9.1, Section 9.9 | Capability inheritance, ambient caps, no_new_privs |
| futex/ | 15+ | Section 19.1 (§futex) | futex ops: wait/wake/waitv, PI futexes, robust lists |
| ptrace/ | 20+ | Section 20.4 | ptrace attach/detach, PEEKUSER, register read/write, syscall tracing |
| ipc/ | 10+ | Section 17.3 | POSIX IPC: semaphores, shared memory, message queues. Achieves 73% line coverage on Linux — small subsystem, extremely well tested |
| io_uring/ | 30+ | Section 19.1 (§io_uring) | io_uring submission/completion, various op types |
| timers/ | 15+ | Section 7.1 (§timekeeping) | POSIX timers, timerfd, clock_gettime, nanosleep precision |
| clone3/ | 10+ | Section 8.1 | clone3() flags, pidfd, CLONE_INTO_CGROUP |
| pidfd/ | 10+ | Section 8.1 | pidfd_open, pidfd_send_signal, pidfd_getfd |
| mount/ | 10+ | Section 14.1 (§mount) | New mount API: fsopen/fsmount/move_mount |
| filesystems/ | 15+ | Section 14.1, Section 14.13 | statx, fanotify/inotify, overlayfs |
| landlock/ | 15+ | Section 9.8 | Landlock LSM access rules |
| perf_events/ | 10+ | Section 20.8 | perf_event_open, PMU counters, sampling modes |
| kvm/ | 20+ | Section 18.1 | KVM ioctls, vCPU creation, memory regions |

Agentic kselftest workflow:

For each subsystem (e.g., "cgroup"):
  1. Agent reads selftests/cgroup/ tests
  2. Agent reads UmkaOS spec ([Section 17.2](17-containers.md#control-groups))
  3. Agent implements the cgroup v2 interface
  4. Agent runs: make -C tools/testing/selftests TARGETS=cgroup run_tests
  5. Failures → fix → goto 3
  6. Pass → commit, move to next subsystem
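Kselftest emits TAP (Test Anything Protocol) output, so the pass/fail signal in step 5 is trivially machine-readable. A sketch using an illustrative kselftest-style TAP fragment:

```shell
# Count passing and failing kselftest cases from TAP output.
tap='TAP version 13
1..3
ok 1 selftests: cgroup: test_core
not ok 2 selftests: cgroup: test_memcontrol
ok 3 selftests: cgroup: test_freezer'
pass=$(printf '%s\n' "$tap" | grep -c '^ok ')
fail=$(printf '%s\n' "$tap" | grep -c '^not ok ')
echo "cgroup: $pass passing, $fail failing"
```

The "not ok" lines name the exact failing test binaries, giving the agent its next work items without any log interpretation.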

25.18.3 Tier 3: Specialized External Suites

| Test Suite | Repository | UmkaOS Chapter | Notes |
|---|---|---|---|
| nftables test suite | nftables.org + selftests/net/netfilter/ | Section 16.18 | Packet filtering rules, sets, maps, chains, conntrack |
| iproute2 tests | iproute2 source tree | Section 16.17 | Validates netlink interface: route/rule/neigh/bridge/tc commands |
| FRRouting Topotests | github.com/FRRouting/frr | Section 16.6 | Routing protocol compliance (BGP, OSPF) via netlink/routing socket |
| perf test (built-in) | perf test command (~80 subtests) | Section 20.8, Section 20.2 | PMU events, tracepoints, symbol resolution, dwarf unwinding |
| syzkaller | github.com/google/syzkaller | All chapters | Fuzz-driven syscall testing. Not deterministic but finds crashes/hangs LTP misses. Agent reads crash reports as bug specs |
| Docker/K8s test suites | Docker CE tests, K8s e2e conformance | All chapters | Application-level validation: "does docker run nginx work?" is the ultimate compat test |

25.18.4 Coverage Map: UmkaOS Chapters × Available Test Suites

The following table shows which test suites provide coverage for each major UmkaOS subsystem area, and the expected value for test-driven agentic development:

| Subsystem Area | Primary Suite | Supporting Suites | Agentic Value |
|---|---|---|---|
| Syscall dispatch | LTP | kselftest (clone3, pidfd) | Very high |
| Memory management | kselftest/mm | LTP (mm tests) | High |
| Process lifecycle | LTP (fork/exec/signals) | kselftest (clone3, pidfd) | High |
| Capabilities + credentials | LTP, kselftest/capabilities | kselftest/landlock | High |
| seccomp-BPF | kselftest/seccomp | — | High |
| VFS + filesystems | xfstests (~1,500 tests) | kselftest/mount, kselftest/filesystems | Very high |
| Block I/O | blktests | xfstests (I/O path tests) | Very high |
| TCP/IP networking | packetdrill (2,000+) | kselftest/net | Very high |
| eBPF | BPF selftests (~1,000) | — | Very high |
| io_uring | liburing tests (~200) | kselftest/io_uring | Very high |
| Cgroups v2 | kselftest/cgroup | LTP (cgroup tests) | High |
| Namespaces | LTP | kselftest/net | High |
| IPC (SysV + POSIX) | LTP | kselftest/ipc | High (73% coverage) |
| KVM hypervisor | kvm-unit-tests | kselftest/kvm | High |
| DRM/KMS display | IGT GPU Tools (2,228+) | — | High |
| Packet filtering | kselftest/net/netfilter | nftables test suite | Medium-high |
| Perf events + PMU | kselftest/perf_events | perf test | Medium |
| ptrace + debugging | kselftest/ptrace | LTP | Medium |
| Timers + clocks | kselftest/timers | LTP | Medium |
| Audio (ALSA) | alsa-utils (bat, speaker-test) | — | Low (thin tests) |
| Scheduler | — | — | Low (not directly testable via API) |

25.18.5 The Test-Driven Agentic Development Pattern

All test suites in this inventory share a common property that makes them uniquely valuable for agentic development: they test via the userspace API boundary, which is exactly the external ABI contract UmkaOS must implement. The pattern is:

For each subsystem with an available test suite:
  1. Agent reads the test suite → derives the behavioral contract
  2. Agent reads the UmkaOS spec → understands the intended design
  3. Agent implements the subsystem
  4. Agent runs the test suite as the feedback signal
  5. Test failures provide specific, actionable error descriptions
     (expected vs actual values, specific syscall, specific edge case)
  6. Agent fixes → re-runs → iterates until pass
  7. Full regression after each subsystem completes

Why this works for agents but not (as well) for humans:
- Agents can run thousands of test iterations per day (compile + test in minutes)
- Agents can read test source code to understand the exact behavioral expectation
- Agents do not get fatigued by repetitive fix → test → fix cycles
- The test suite provides an unambiguous progress metric (N/M tests passing)
- Cross-architecture testing (8 arches × QEMU) is parallelizable
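The progress metric in particular makes iteration verifiable: an agent (or its supervisor) can require that each commit strictly increases the pass rate. A minimal sketch, with illustrative before/after counts (not measured results):

```shell
# Pass-rate comparison between two suite runs, with a simple regression gate.
rate() { awk -v p="$1" -v t="$2" 'BEGIN { printf "%.1f", 100 * p / t }'; }
before=$(rate 4200 5000)   # e.g. tests passing before a fix batch
after=$(rate 4750 5000)    # tests passing after the fix batch
echo "pass rate: ${before}% -> ${after}%"
# Gate: succeed only if the rate strictly improved.
awk -v b="$before" -v a="$after" 'BEGIN { exit !(a > b) }' && echo "progress: ok"
```

Wiring this gate into CI turns "iterate until pass" from a convention into an enforced invariant.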

Phase integration (see Section 24.2):
- Phase 2 (Self-hosting): LTP core syscalls, basic kselftest/net, basic xfstests/generic
- Phase 3 (Real workloads): Full LTP, full xfstests, packetdrill TCP suite, BPF selftests, blktests, liburing tests
- Phase 4 (Production): Full kvm-unit-tests, IGT GPU Tools, syzkaller soak, Docker/K8s e2e
- Phase 5 (Ecosystem): nftables suite, FRR topotests, perf test, vendor-specific IGT

Cross-references:
- LTP agentic workflow: Section 25.17
- Phase exit criteria: Section 24.2
- Verification strategy: Section 24.3
- QEMU CPU testing matrix: Section 25.8