UmkaOS Architecture Design Document
Canonical reference for all development. This document defines the complete architecture of UmkaOS. All implementation decisions must trace back to this specification.
The architecture is organized into 25 chapters using chapter-scoped section numbering. Cross-references are clickable links between files.
Copyright © 2025 Anton Starikov <ant.starikov@gmail.com>. All rights reserved. This design document is the original work of the author. Reproduction or distribution without explicit permission is prohibited.
How to Read This Document
Section numbering uses Chapter.Section format:
- Section 11.4 = Chapter 11, Section 4 (in 11-drivers.md)
- Section 11.4.2 = Chapter 11, Section 4, Subsection 2
- Adding sections to one chapter never affects other chapters
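As a rough illustration of the numbering scheme, the sketch below splits a reference such as "Section 11.4.2" into its chapter, section, and optional subsection, then looks up the chapter file. The names `parse_section`, `chapter_file`, and `CHAPTER_FILES` are hypothetical helpers invented for this example, and the mapping holds only a few rows copied from the Master Index; nothing here is part of UmkaOS tooling.

```python
import re

# Hypothetical subset of the Master Index: chapter number -> file name.
CHAPTER_FILES = {
    1: "01-overview.md",
    11: "11-drivers.md",
    14: "14-vfs.md",
}

# Accepts "11.4" or "11.4.2", with an optional leading "Section ".
_SECTION_RE = re.compile(r"^(?:Section\s+)?(\d+(?:\.\d+){1,2})$")


def parse_section(ref):
    """Split a reference into (chapter, section, subsection or None)."""
    match = _SECTION_RE.match(ref.strip())
    if match is None:
        raise ValueError(f"not a Chapter.Section reference: {ref!r}")
    parts = [int(p) for p in match.group(1).split(".")]
    chapter, section = parts[0], parts[1]
    subsection = parts[2] if len(parts) == 3 else None
    return chapter, section, subsection


def chapter_file(ref):
    """Return the file that holds the chapter named by the reference."""
    chapter, _, _ = parse_section(ref)
    return CHAPTER_FILES[chapter]
```

Because the chapter number is the first component, a lookup depends only on that chapter's own file, which matches the note that adding sections to one chapter never affects the others.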
Cross-references are clickable markdown links:
{ref:mount-tree-data-structures-and-operations}
Master Index
| Chapter | File | Domain |
|---|---|---|
| 1 | 01-overview.md | Design philosophy, architectural goals, performance budget |
| 2 | 02-boot-hardware.md | Boot chain, device discovery, ACPI/DT, multi-architecture support, hardware memory safety |
| 3 | 03-concurrency.md | Locking strategy, lock-free structures, PerCpu, RCU, atomic operations, memory ordering, interrupt handling |
| 4 | 04-memory.md | Physical allocator, virtual memory, page tables, slab, NUMA, compression tier, page cache |
| 5 | 05-distributed.md | Cluster topology, distance matrix, RDMA transport, distributed lock manager, SmartNIC/DPU integration |
| 6 | 06-dsm.md | Page-granularity coherence over RDMA for workloads that benefit from |
| 7 | 07-scheduling.md | EEVDF, RT, deadline scheduling, per-CPU runqueues, EAS, power budgeting, CPU bandwidth, timekeeping |
| 8 | 08-process.md | Task/Process structs, fork/exec/exit, signals, process groups, sessions, real-time guarantees |
| 9 | 09-security.md | Capabilities, credentials, LSM framework, verified boot, TPM, IMA, post-quantum cryptography, confidential computing |
| 10 | 10-security-extensions.md | Companion to Chapter 9: Security Architecture. |
| 11 | 11-drivers.md | Three-tier protection model, isolation mechanisms, KABI, driver model, device registry, zero-copy I/O, IPC, crash recovery, driver subsystems, D-Bus bridge |
| 12 | 12-kabi.md | Stable driver ABI, KABI IDL, vtable design, driver signing, compatibility windows |
| 13 | 13-device-classes.md | NIC, GPU, USB, I2C/SMBus, WiFi, Bluetooth, Camera, Printers, Live Kernel Evolution, Watchdog, SPI, rfkill, MTD, IPMI, UIO, NVMEM, SoundWire |
| 14 | 14-vfs.md | VFS architecture, dentry cache, mount tree, path resolution, overlayfs, mount namespace operations |
| 15 | 15-storage.md | Durability guarantees, block I/O, volume management, block storage networking, clustered filesystems, persistent memory, SATA/AHCI, ext4/XFS/Btrfs, ZFS |
| 16 | 16-networking.md | Socket layer, NetBuf, routing, TCP stack, congestion control, kTLS, overlays/tunnels, netlink, packet filtering, interface naming, network service provider |
| 17 | 17-containers.md | Namespace architecture (8 types), cgroups v2, POSIX IPC, OCI runtime |
| 18 | 18-virtualization.md | KVM host/guest integration, VMX/VHE/H-ext, live migration, PV features, suspend/resume |
| 19 | 19-sysapi.md | Syscall interface, futex, netlink, Windows emulation, dropped compatibility, native syscalls, safe extensibility |
| 20 | 20-observability.md | Fault management architecture, stable tracepoints, debugging/ptrace, unified object namespace (umkafs) |
| 21 | 21-user-io.md | TTY/PTY, console/logging, input (evdev), audio (ALSA), display/graphics (DRM/KMS) |
| 22 | 22-accelerators.md | Unified accelerator framework, accelerator memory/P2P DMA, isolation/scheduling, in-kernel inference, accelerator networking, unified compute model |
| 23 | 23-ml-policy.md | Companion to Chapter 22: AI/ML and Accelerators. |
| 24 | 24-roadmap.md | Driver ecosystem, implementation phases, verification strategy, technical risks, formal verification, appendices |
| 25 | 25-agentic.md | Development model, parallel workflow, phase timelines, sensitivity analysis, recommendations |
Parts
Part I: Foundations
- Chapter 1: Architecture Overview — Design philosophy, architectural goals, performance budget
- Chapter 2: Boot and Hardware Discovery — Boot chain, device discovery, ACPI/DT, multi-architecture support, hardware memory safety
- Chapter 3: Concurrency Model — Locking strategy, lock-free structures, PerCpu, RCU, atomic operations, memory ordering, interrupt handling
- Chapter 4: Memory Management — Physical allocator, virtual memory, page tables, slab, NUMA, compression tier, page cache
- Chapter 5: Distributed Kernel Architecture — Cluster topology, distance matrix, RDMA transport, distributed lock manager, SmartNIC/DPU integration
Part II: Core Subsystems
- Chapter 6: Distributed Shared Memory — Page-granularity coherence over RDMA for workloads that benefit from
- Chapter 7: Scheduling and Power Management — EEVDF, RT, deadline scheduling, per-CPU runqueues, EAS, power budgeting, CPU bandwidth, timekeeping
- Chapter 8: Process and Task Management — Task/Process structs, fork/exec/exit, signals, process groups, sessions, real-time guarantees
- Chapter 9: Security Architecture — Capabilities, credentials, LSM framework, verified boot, TPM, IMA, post-quantum cryptography, confidential computing
- Chapter 10: Security Extensions — Companion to Chapter 9: Security Architecture.
- Chapter 11: Driver Architecture and Isolation — Three-tier protection model, isolation mechanisms, KABI, driver model, device registry, zero-copy I/O, IPC, crash recovery, driver subsystems, D-Bus bridge
- Chapter 12: KABI — Kernel Driver ABI — Stable driver ABI, KABI IDL, vtable design, driver signing, compatibility windows
- Chapter 13: Device Class Frameworks — NIC, GPU, USB, I2C/SMBus, WiFi, Bluetooth, Camera, Printers, Live Kernel Evolution, Watchdog, SPI, rfkill, MTD, IPMI, UIO, NVMEM, SoundWire
Part III: Major Subsystems
- Chapter 14: Virtual Filesystem Layer — VFS architecture, dentry cache, mount tree, path resolution, overlayfs, mount namespace operations
- Chapter 15: Storage and Filesystems — Durability guarantees, block I/O, volume management, block storage networking, clustered filesystems, persistent memory, SATA/AHCI, ext4/XFS/Btrfs, ZFS
- Chapter 16: Networking — Socket layer, NetBuf, routing, TCP stack, congestion control, kTLS, overlays/tunnels, netlink, packet filtering, interface naming, network service provider
- Chapter 17: Containers and Namespaces — Namespace architecture (8 types), cgroups v2, POSIX IPC, OCI runtime
- Chapter 18: Virtualization — KVM host/guest integration, VMX/VHE/H-ext, live migration, PV features, suspend/resume
- Chapter 19: System API — Syscall interface, futex, netlink, Windows emulation, dropped compatibility, native syscalls, safe extensibility
Part IV: Specialized Subsystems
- Chapter 20: Observability and Diagnostics — Fault management architecture, stable tracepoints, debugging/ptrace, unified object namespace (umkafs)
- Chapter 21: User I/O Subsystems — TTY/PTY, console/logging, input (evdev), audio (ALSA), display/graphics (DRM/KMS)
- Chapter 22: AI/ML and Accelerators — Unified accelerator framework, accelerator memory/P2P DMA, isolation/scheduling, in-kernel inference, accelerator networking, unified compute model
- Chapter 23: AI/ML Policy Framework — Companion to Chapter 22: AI/ML and Accelerators.
Part V: Meta
- Chapter 24: Roadmap and Verification — Driver ecosystem, implementation phases, verification strategy, technical risks, formal verification, appendices
- Chapter 25: Agentic Development Methodology — Development model, parallel workflow, phase timelines, sensitivity analysis, recommendations
Detailed Table of Contents
Chapter 1: Architecture Overview
- Section 1.1: Overview and Philosophy
- Section 1.1.1: What UmkaOS Is
- Section 1.1.2: Replaceability Model: Nucleus and Evolvable
- Section 1.2: Architecture Coverage
- Section 1.2.1: Why UmkaOS Exists
- Section 1.2.2: What UmkaOS Delivers
- Section 1.2.3: The Core Technical Challenge
- Section 1.2.4: Design Principles
- Section 1.3: Performance Budget
- Section 1.3.2: Per-Operation Overhead
- Section 1.3.3: Macro Benchmark Targets
- Section 1.3.4: Where the Overhead Comes From
- Section 1.3.5: Comprehensive Overhead Budget
- Section 1.3.6: Counter and Identifier Longevity Budget
Chapter 2: Boot and Hardware Discovery
- Section 2.1: Boot Overview and Protocols
- Section 2.1.1: Boot Protocols
- Section 2.2: x86-64 Boot Entry and Initialization
- Section 2.2.1: x86-64 Entry Sequence
- Section 2.2.2: Kernel Initialization Phases (x86-64)
- Section 2.3: Boot Init Reference and SMP Bringup
- Section 2.3.1: Kernel Init Phase Reference (Cross-Architecture)
- Section 2.3.2: Secondary CPU Bringup (x86-64 SMP)
- Section 2.4: ACPI Table Parsing
- Section 2.5: AArch64 Boot Sequence
- Section 2.6: ARMv7 Boot Sequence
- Section 2.7: RISC-V 64 Boot Sequence
- Section 2.8: Device Tree and Platform Discovery
- Section 2.8.1: Device Tree Blob Parsing
- Section 2.8.2: Cross-Architecture Comparison
- Section 2.9: Boot Memory Management
- Section 2.9.1: Multiboot1 Memory Map Parsing
- Section 2.9.2: Boot Allocator Design
- Section 2.10: PPC32 Boot Sequence
- Section 2.11: PPC64LE Boot Sequence
- Section 2.12: s390x Boot Sequence
- Section 2.13: LoongArch64 Boot Sequence
- Section 2.14: Interrupt Controller Architecture
- Section 2.15: NUMA Topology Discovery
- Section 2.16: Extended State and CPU Features
- Section 2.16.1: Per-Architecture Extended State (FPU) Initialization
- Section 2.16.2: CPU Feature Registry
- Section 2.17: Production Boot Target
- Section 2.17.1: Goal: Drop-in Kernel Package
- Section 2.17.2: Boot Requirements
- Section 2.17.3: Target Boot Sequences
- Section 2.17.4: Initramfs Detection and Loading
- Section 2.18: CPU Errata and Speculation Mitigations
- Section 2.18.1: ARM SoC and RISC-V Vendor Diversity (Non-x86 Platform Quirks)
- Section 2.18.2: Speculation Mitigations (All Architectures)
- Section 2.18.3: Dual-Boot Safety
- Section 2.18.4: Boot Protocol Migration Path
- Section 2.19: Secure Boot and Measured Boot
- Section 2.19.1: UEFI Secure Boot
- Section 2.19.2: Measured Boot (TPM PCR Chain)
- Section 2.19.3: Kernel Responsibilities Summary
- Section 2.20: UEFI Runtime Services
- Section 2.20.1: Virtual Address Mapping
- Section 2.20.2: NVRAM (EFI Variables)
- Section 2.20.3: Time Services
- Section 2.20.4: Reset and Shutdown
- Section 2.21: Kernel Image Structure and Loading Model
- Section 2.21.1: Nucleus: Verified Nucleus (Non-Replaceable)
- Section 2.21.2: Evolvable: Boot Monolith (First Loadable, Swappable)
- Section 2.21.3: On-Demand KABI Services (Loaded When First Requested)
- Section 2.21.4: Device Drivers (Tier 1 / Tier 2)
- Section 2.21.5: Distribution Model
- Section 2.21.6: CPU-Dependent Adaptation
- Section 2.21.7: Loading Architecture Summary
- Section 2.21.8: Cross-references
- Section 2.22: First-Class Architectures
- Section 2.22.1: Architecture-Specific Code
- Section 2.22.2: No 32-bit Compatibility Modes on 64-bit Kernels
- Section 2.22.3: 64-bit Atomics on 32-bit Architectures
- Section 2.22.4: Advanced Feature Architecture Parity
- Section 2.22.5: QEMU vs Real-Silicon Divergences
- Section 2.23: Hardware Memory Safety
- Section 2.23.1: ARM MTE (Memory Tagging Extension)
- Section 2.23.2: Design: Tag-Aware Memory Allocator
- Section 2.23.3: Integration Points
- Section 2.23.4: Intel LAM (Linear Address Masking)
- Section 2.23.5: AArch64 Pointer Authentication (PAC)
- Section 2.23.6: CHERI (Future)
- Section 2.23.7: Performance Impact
- Section 2.23.8: Hardware Fault Handler Constraints
- Section 2.24: Clock Framework
- Section 2.24.1: Design: Typed Clock Tree vs Linux CCF
- Section 2.24.2: Core Types
- Section 2.24.3: Consumer API
- Section 2.24.4: Clock Tree Population
- Section 2.24.5: Deferred Probe Integration
- Section 2.24.6: Architecture-Specific Notes
- Section 2.24.7: Linux External ABI
Chapter 3: Concurrency Model
- Section 3.1: Rust Ownership for Lock-Free Paths
- Section 3.1.1: PerCpuCounter: Batched Per-CPU Counter for Warm Paths
- Section 3.1.2: ArcSwap — Lock-Free Atomic `Arc<T>` Replacement
- Section 3.2: CpuLocal: Register-Based Per-CPU Fast Path
- Section 3.2.2: Initialization Sequence
- Section 3.3: PerCpu Borrow Checking: Debug-Only in Release Builds
- Section 3.3.1: IRQ Save/Restore Elision: `get_mut_nosave()`
- Section 3.3.2: NMI Safety
- Section 3.4: Cumulative Performance Budget
- Section 3.4.2: Hierarchical Quiescent State Reporting
- Section 3.4.3: Grace Period State Machine (`rcu_gp_kthread`)
- Section 3.4.4: Force-Quiescent-State (FQS) Scan
- Section 3.4.5: Expedited Grace Periods
- Section 3.4.6: RCU Interaction with Live Kernel Evolution
- Section 3.4.7: CPU Hotplug and the RCU Tree
- Section 3.4.8: NUMA-Aware Tree Construction
- Section 3.5: Locking Strategy
- Section 3.5.1: Locking Primitive Types
- Section 3.5.2: Preemption and Interrupt Context Model
- Section 3.5.3: Lock Contention Tracking
- Section 3.6: Lock-Free Data Structures
- Section 3.6.1: `SeqLock<T>` — Sequence Lock
- Section 3.6.2: Compare-and-Swap Semantics Differ by Architecture
- Section 3.6.3: LL/SC Restrictions on Non-Cacheable (MMIO) Memory
- Section 3.6.4: `Idr<T>` — Integer ID Allocator
- Section 3.6.5: `WaitQueueHead` — Blocking Wait Queue
- Section 3.6.6: `Completion` — One-Shot (or Multi-Shot) Signaling Primitive
- Section 3.6.7: `SpscRing<T, N>` — Lock-Free Single-Producer Single-Consumer Ring Buffer
- Section 3.7: Scalability Analysis: Hot-Path Metadata on 256+ Cores
- Section 3.8: Interrupt Handling
- Section 3.8.1: s390x Interrupt Model
- Section 3.8.2: LoongArch64 Interrupt Model
- Section 3.8.3: Softirq: Deferred Interrupt Processing
- Section 3.9: Memory Model Differences Across Architectures
- Section 3.9.1: x86 TSO Conceals Ordering Bugs
- Section 3.9.2: ARM, RISC-V, and PowerPC: Explicit Memory Ordering Surfaces True Requirements
- Section 3.9.3: Ordering Instruction Cost: Ring Buffer and RCU Performance by Architecture
- Section 3.9.4: Endianness
- Section 3.10: Algorithm Dispatch and In-Kernel SIMD
- Section 3.10.1: AlgoDispatch: Zero-Cost Runtime Dispatch
- Section 3.10.2: SimdKernelGuard: Safe In-Kernel SIMD Use
- Section 3.10.3: Combined Usage Pattern
- Section 3.10.4: Feature-Dependent Subsystem Catalog
- Section 3.11: Workqueue / Deferred Work
- Section 3.11.1: Core Types
- Section 3.11.2: API
- Section 3.11.3: Standard Named Queues
- Section 3.11.4: BoundedMpmcRing Memory Ordering Specification
- Section 3.11.5: Cgroup Integration
- Section 3.12: IRQ Chip and irqdomain Hierarchy
- Section 3.12.1: Core Types
- Section 3.12.2: Root Domain Implementations Per Architecture
- Section 3.12.3: IRQ Receipt Flow
- Section 3.13: Collection Usage Policy
- Section 3.14: Error Handling and Fault Containment
- Section 3.14.1: Kernel Error Model
- Section 3.14.2: Fault Containment Boundaries
- Section 3.14.3: Panic Handling
- Section 3.14.4: Error Reporting to Userspace
- Section 3.14.5: Error Escalation Paths
Chapter 4: Memory Management
- Section 4.1: Boot Allocator
- Section 4.1.1: Kernel Address Types
- Section 4.2: Physical Memory Allocator
- Section 4.2.1: BuddyAllocator
- Section 4.2.2: Allocator Replaceability — 50-Year Uptime Design
- Section 4.3: Slab Allocator
- Section 4.3.2: Page Frame Descriptor
- Section 4.3.3: Slab Cache Garbage Collection
- Section 4.4: Page Cache
- Section 4.4.1: Generational LRU Page Reclaim
- Section 4.4.2: Readahead Engine
- Section 4.5: OOM Killer and Process Memory Hibernation
- Section 4.5.1: OOM Killer Policy
- Section 4.5.2: Process Memory Hibernation
- Section 4.6: Writeback Subsystem
- Section 4.6.1: Writeback Thread Organization
- Section 4.7: Transparent Huge Page Promotion and Memory Compaction
- Section 4.7.1: Struct Definitions
- Section 4.8: Virtual Memory Manager
- Section 4.8.1: MmStruct — Per-Process Address Space
- Section 4.8.2: Maple Tree VMA Management
- Section 4.8.3: Page Fault Handler
- Section 4.8.4: Page Fault Metadata by Architecture
- Section 4.8.5: Custom Fault Handler Registration
- Section 4.8.6: Page Table Walk and User Page Pinning
- Section 4.8.7: TLB Invalidation by Architecture
- Section 4.8.8: VMM Replaceability — 50-Year Uptime Design
- Section 4.9: PCID / ASID Management
- Section 4.9.1: Lazy TLB Mode for Kernel Threads
- Section 4.9.2: Observability
- Section 4.10: Memory Tagging (Hardware-Assisted)
- Section 4.11: NUMA Topology and Policy
- Section 4.11.1: Topology Discovery
- Section 4.11.2: Memory Allocation Policy
- Section 4.11.3: Automatic NUMA Balancing
- Section 4.11.4: NUMA-Aware Kernel Allocations
- Section 4.11.5: NUMA Balancing and Isolation Domain Memory
- Section 4.11.6: Memory Tier Classification
- Section 4.12: Memory Compression Tier
- Section 4.12.1: Problem
- Section 4.12.2: Architecture
- Section 4.12.3: Compressed Page Pool
- Section 4.12.4: Compression Policy
- Section 4.12.5: Decompression Path
- Section 4.12.6: NUMA Awareness
- Section 4.12.7: Compression Algorithm Selection
- Section 4.12.8: Latency Spikes and Fragmentation
- Section 4.12.9: Linux Interface Exposure
- Section 4.13: Swap Subsystem
- Section 4.13.1: Swap Area
- Section 4.13.2: Swap Entry
- Section 4.13.3: Swap Cache
- Section 4.13.4: Swap Slot Allocator
- Section 4.13.5: swapon(2) / swapoff(2) Syscalls
- Section 4.13.6: Reclaim Integration
- Section 4.13.7: Swap Readahead
- Section 4.13.8: Per-Cgroup Swap Accounting
- Section 4.13.9: procfs Interface
- Section 4.13.10: Encrypted Swap
- Section 4.13.11: Performance Budget
- Section 4.14: DMA Subsystem
- Section 4.14.1: Design: IOMMU-First DMA
- Section 4.14.2: Core Types
- Section 4.14.3: DmaDevice Trait
- Section 4.14.4: Cache Coherency Per Architecture
- Section 4.14.5: SWIOTLB (Software IOMMU / Bounce Buffering)
- Section 4.14.6: Linux External ABI
- Section 4.14.7: Tier 1 DMA Allocation Path
- Section 4.14.8: DMA-BUF: File-Descriptor-Based Buffer Sharing
- Section 4.15: Extended Memory Operations
- Section 4.15.1: mremap — Remap a Virtual Address Region
- Section 4.15.2: mincore — Query Page Residency
- Section 4.15.3: membarrier — Expedite Memory Barrier
- Section 4.15.4: userfaultfd — User-Space Page Fault Handling
- Section 4.15.5: memfd_create — Create Anonymous File
- Section 4.15.6: memfd_secret — Create a Secret Memory Region
- Section 4.15.7: process_vm_readv / process_vm_writev — Cross-Process Memory I/O
- Section 4.15.8: process_madvise — Batch madvise for Another Process
- Section 4.15.9: Memory Page Offlining (memory_hotplug)
Chapter 5: Distributed Kernel Architecture
- Section 5.1: Distributed Kernel Architecture
- Section 5.1.2: UmkaOS Peer Protocol Wire Specification
- Section 5.2: Cluster Topology Model
- Section 5.2.1: Extending the Device Registry
- Section 5.2.2: Cluster Node Descriptor
- Section 5.2.3: Host-Side Component: umka-peer-transport
- Section 5.2.4: Live Firmware Update Without Host Reboot
- Section 5.2.5: Attack Surface Reduction
- Section 5.2.6: Toward a Universal Device Protocol
- Section 5.2.7: Hierarchical Cluster Topology
- Section 5.2.8: Topology Discovery
- Section 5.2.9: Peer Registry and Topology
- Section 5.3: Peer Kernel Isolation and Crash Recovery
- Section 5.3.1: The Isolation Model Shift
- Section 5.3.2: Host Unilateral Controls
- Section 5.3.3: Isolation Comparison
- Section 5.3.4: Crash Detection
- Section 5.3.5: Recovery Sequence
- Section 5.3.6: What Survives Peer Kernel Crash Intact
- Section 5.3.7: Relationship to Other Failure Handling Sections
- Section 5.4: RDMA-Native Transport Layer
- Section 5.4.1: Design: Kernel RDMA Transport (umka-rdma)
- Section 5.4.2: RDMA Transport Implementation
- Section 5.4.3: RDMA Device Capability Flags
- Section 5.4.4: Pre-Registered Kernel Memory
- Section 5.4.5: Performance Characteristics
- Section 5.4.6: Security Considerations
- Section 5.4.7: RDMA Pool Manager
- Section 5.4.8: QP Tear-down Protocol
- Section 5.5: Distributed IPC
- Section 5.5.1: Extending Ring Buffers to RDMA
- Section 5.5.2: Transparent Transport Selection
- Section 5.5.3: Ring Buffer RDMA Protocol
- Section 5.5.4: Batching and Coalescing
- Section 5.6: Cluster-Aware Scheduler
- Section 5.6.1: Problem
- Section 5.6.2: Design: Two-Level Scheduler
- Section 5.6.3: Global Scheduler State
- Section 5.6.4: Process Migration
- Section 5.6.5: Capability-Gated Migration
- Section 5.6.6: Reconciliation: Local vs Distributed Scheduling
- Section 5.6.7: Cluster Placement Policy Expression Language (CPPEL)
- Section 5.7: Network-Portable Capabilities
- Section 5.7.1: Problem
- Section 5.7.2: Design: Cryptographically-Signed Capabilities
- Section 5.7.3: Verification
- Section 5.7.4: Revocation
- Section 5.7.5: Key Rotation and Revocation
- Section 5.7.6: Use Case: Remote GPU Access
- Section 5.7.7: Distributed Device Fabric
- Section 5.8: Failure Handling and Distributed Recovery
- Section 5.8.1: Split-Brain Detection and Recovery
- Section 5.8.2: Graceful Shutdown Protocol
- Section 5.8.3: Cross-Subsystem Recovery Ordering: DSM and DLM
- Section 5.9: CXL 3.0 Fabric Integration
- Section 5.9.1: Why CXL Changes Everything
- Section 5.9.2: Design: CXL as a First-Class Memory Tier
- Section 5.9.3: CXL + RDMA Hybrid
- Section 5.9.4: CXL Shared Memory for DSM
- Section 5.9.5: CXL Devices as UmkaOS Peers
- Section 5.10: Compatibility, Integration, and Phasing
- Section 5.10.1: Linux Compatibility and MPI Integration
- Section 5.10.2: Integration with UmkaOS Architecture
- Section 5.10.3: Implementation Phasing
- Section 5.10.4: Licensing Summary
- Section 5.10.5: Comparison: Why Previous DSM Projects Failed and Why This Succeeds
- Section 5.11: SmartNIC and DPU Integration
- Section 5.11.1: Problem
- Section 5.11.2: Design: DPUs as Tier M Peers
- Section 5.11.3: How It Works
- Section 5.11.4: DPU Discovery and Join
- Section 5.11.5: Use Cases
- Section 5.11.6: DPU Failure Handling
- Section 5.11.7: Service Registry Integration (PeerServiceProxy)
- Section 5.11.8: Shared State Consistency
- Section 5.11.9: Performance Impact
- Section 5.12: Affinity-Based Service Placement
- Section 5.12.1: Affinity Model
- Section 5.12.2: Affinity in KABI Manifests
- Section 5.12.3: Placement Algorithm
- Section 5.12.4: Automatic Offload Example
- Section 5.12.5: Re-evaluation Triggers
- Section 5.12.6: Performance Bounds
- Section 5.12.7: Relationship to Existing Mechanisms
- Section 5.12.8: Small Cluster Optimization
Chapter 6: Distributed Shared Memory
- Section 6.1: DSM Foundational Types
- Section 6.1.1: DsmMsgType Range Allocation
- Section 6.1.2: Wire Format Integer Types
- Section 6.2: Design Overview
- Section 6.3: Page Ownership Model
- Section 6.4: Home Node Directory
- Section 6.5: Page Fault Flow
- Section 6.6: DSM Coherence Protocol: MOESI
- Section 6.6.1: MOESI Protocol States
- Section 6.6.2: Directory Entry at the Home Node
- Section 6.6.3: State Transitions — Requestor's View
- Section 6.6.4: Message Types
- Section 6.6.5: Deadlock Avoidance
- Section 6.6.6: Performance Characteristics
- Section 6.6.7: Write-Update Protocol (DSM_WRITE_UPDATE Flag)
- Section 6.6.8: DSM Coherence Message Wire Format
- Section 6.6.9: Write-Update Wire Encoding
- Section 6.6.10: Causal Consistency Protocol (DSM_CAUSAL)
- Section 6.7: PageLocationTracker Extension
- Section 6.8: DSM Region Lifecycle
- Section 6.8.1: DSM Region Management
- Section 6.8.2: DSM Region Destruction Protocol
- Section 6.8.3: Linux Compatibility Interface
- Section 6.9: DSM Operational Properties
- Section 6.9.1: False Sharing Mitigation
- Section 6.9.2: Error Handling
- Section 6.9.3: Honest Performance Expectations
- Section 6.9.4: Interaction with Memory Compression
- Section 6.10: Global Memory Pool
- Section 6.10.1: Design: Cluster Memory as a Unified Tier Hierarchy
- Section 6.10.2: Memory Pool Accounting
- Section 6.10.3: The Killer Use Case: AI Model Memory
- Section 6.10.4: Migration Policy
- Section 6.10.5: Cgroup Integration
- Section 6.11: Distributed Page Cache
- Section 6.11.1: Problem
- Section 6.11.2: Design: Cooperative Page Cache
- Section 6.11.3: Page Cache Directory
- Section 6.11.4: RDMA Probe Protocol
- Section 6.11.5: Cache Coherence for Shared Files
- Section 6.11.6: AI Training Data Pipeline
- Section 6.11.7: DSM Dirty Tracking Coordination
- Section 6.11.8: DSM Eviction Policy
- Section 6.12: Subscriber-Controlled Caching
- Section 6.12.1: Subscriber Trait
- Section 6.12.2: Per-Region Cache Policy
- Section 6.12.3: Subscriber Control API
- Section 6.12.4: DLM-DSM Bidirectional Notification Hooks
- Section 6.12.5: Coherence Mechanism Selection
- Section 6.12.6: DLM Integration Pattern
- Section 6.12.7: DLM Token Binding
- Section 6.12.8: Writeback Ordering and Barriers
- Section 6.12.9: Optimistic Prefetch
- Section 6.12.10: Subscriber Usage Patterns
- Section 6.13: Anti-Entropy Protocol for DSM_RELAXED
- Section 6.13.1: Version Vectors
- Section 6.13.2: Anti-Entropy on Region Join
- Section 6.13.3: Anti-Entropy on Partition Heal
- Section 6.13.4: Wire Messages
- Section 6.13.5: Performance Bounds
- Section 6.13.6: Stale Sharer Reconciliation
- Section 6.13.7: Interaction with Other Consistency Modes
- Section 6.14: Application-Visible DSM (Level 2)
- Section 6.14.1: Syscall Interface
- Section 6.14.2: Kernel-Side Implementation
- Section 6.14.3: Per-Process DSM Region Tracking
- Section 6.14.4: Distributed Futex on DSM Pages
- Section 6.14.5: Security Model
- Section 6.14.6: Compatibility with Linux Shared Memory
- Section 6.14.7: Performance Considerations
Chapter 7: Scheduling and Power Management
- Section 7.1: Scheduler
- Section 7.1.1: Multi-Policy Design
- Section 7.1.2: Architecture
- Section 7.1.3: Key Properties
- Section 7.1.4: Scheduler Classes
- Section 7.2: Heterogeneous CPU Support (big.LITTLE / Intel Hybrid / RISC-V)
- Section 7.2.1: CPU Capacity Model
- Section 7.2.2: Energy Model
- Section 7.2.3: Energy-Aware Scheduling Algorithm
- Section 7.2.4: Per-Entity Load Tracking (PELT)
- Section 7.2.5: Frequency Domain Awareness and Cpufreq Integration
- Section 7.2.6: Intel Thread Director (ITD) Integration
- Section 7.2.7: Asymmetric Packing
- Section 7.2.8: Hierarchical Group Scheduling (cpu.weight Backing Mechanism)
- Section 7.2.9: Cgroup Integration
- Section 7.2.10: RISC-V Heterogeneous Hart Support
- Section 7.2.11: Topology Discovery
- Section 7.2.12: Linux Compatibility
- Section 7.2.13: Performance Impact
- Section 7.3: Context Switch and Register State
- Section 7.3.1: Context Switch Procedure
- Section 7.3.2: Extended Register State Management
- Section 7.3.3: Post-Context-Switch Cleanup (finish_task_switch)
- Section 7.3.4: CPU Hotplug Integration
- Section 7.4: Platform Power Management
- Section 7.4.1: Problem and Scope
- Section 7.4.2: RAPL — Running Average Power Limit
- Section 7.4.3: Per-Architecture Power Management Interfaces
- Section 7.4.4: Thermal Framework
- Section 7.4.5: Powercap Interface (sysfs)
- Section 7.4.6: Cgroup Power Accounting
- Section 7.4.7: VM Power Budget Enforcement
- Section 7.4.8: DCMI / IPMI Rack Power Management
- Section 7.4.9: Battery and SMBus Monitoring
- Section 7.4.10: Consumer Power Profiles
- Section 7.5: Suspend, Resume, and Runtime Power Management
- Section 7.5.1: Suspend and Resume Protocol
- Section 7.5.2: Integration Points
- Section 7.5.3: Per-Device Runtime Power Management
- Section 7.5.4: Cpuidle Governor
- Section 7.6: CPU Bandwidth Guarantees
- Section 7.6.1: Problem
- Section 7.6.2: Design: CBS-Based Group Bandwidth Reservation
- Section 7.6.3: Cgroup v2 Interface
- Section 7.6.4: Kernel-Internal Design
- Section 7.6.5: Overcommit Prevention
- Section 7.6.6: Interaction with Existing Controls
- Section 7.6.7: Nested Cgroup Hierarchy
- Section 7.6.8: cpu.max vs cpu.guarantee Interaction
- Section 7.6.9: Use Case: Driver Tier Isolation
- Section 7.6.10: CBS Task Migration Between Cores
- Section 7.6.11: cpu.max Ceiling Enforcement (Bandwidth Throttling)
- Section 7.6.12: ML Policy Integration
- Section 7.6.13: cpu.stat CBS Guarantee Statistics
- Section 7.7: Power Budgeting
- Section 7.7.1: Problem
- Section 7.7.2: Design: Power as a Schedulable Resource
- Section 7.7.3: Cgroup Integration
- Section 7.7.4: Power-Aware Scheduler
- Section 7.7.5: System-Level Power Accounting
- Section 7.7.6: Performance Impact
- Section 7.8: Timekeeping and Clock Management
- Section 7.8.1: Clock Source Hierarchy
- Section 7.8.2: Timekeeping Subsystem
- Section 7.8.3: vDSO Fast Path
- Section 7.8.4: Timer Infrastructure
- Section 7.8.5: Time Namespace Offsets
- Section 7.8.6: Clocksource Watchdog
- Section 7.8.7: Interaction with RT and Power Management
- Section 7.9: System Event Bus
- Section 7.9.1: Event Subscription Model
- Section 7.9.2: Subscription via Capability
- Section 7.9.3: Integration Points
- Section 7.10: Intent-Based Resource Management
- Section 7.10.1: The Abstraction Gap
- Section 7.10.2: Design: Resource Intents
- Section 7.10.3: Cgroup Integration
- Section 7.10.4: Objective Function and Conflict Resolution
- Section 7.10.5: The Optimization Loop
- Section 7.10.6: Performance Impact
- Section 7.10.7: Explainability Interface
- Section 7.10.8: Integration with ML Policy Framework
- Section 7.11: Core Provisioning and Workload Partitioning
- Section 7.11.1: Problem
- Section 7.11.2: Core Classes
- Section 7.11.3: Provisioning and Allocation
- Section 7.11.4: Cgroup Interface
- Section 7.11.5: Backfill Protocol
- Section 7.11.6: Gang Scheduling Protocol
- Section 7.11.7: Integration with Existing Scheduler
- Section 7.11.8: Relationship to Linux cpuset/isolcpus
- Section 7.11.9: Performance Bounds
Chapter 8: Process and Task Management
- Section 8.1: Process and Task Management
- Section 8.1.1: Task Model
- Section 8.1.2: Process Creation
- Section 8.1.3: Program Execution (exec)
- Section 8.1.4: Task Exit and Resource Cleanup
- Section 8.1.5: Address Space Operations
- Section 8.1.6: Namespaces
- Section 8.1.7: User-Mode Scheduling (Fibers and M:N Threading)
- Section 8.1.8: Signal Delivery Wakeup Protocol
- Section 8.2: Process Lifecycle Teardown
- Section 8.2.1: `do_exit()` — Full Teardown Sequence
- Section 8.2.2: Core Dump Generation
- Section 8.3: ELF Binary Loader
- Section 8.3.1: Binary Parameters Struct (BinPrm)
- Section 8.3.2: ELF Header Validation
- Section 8.3.3: Program Header Processing
- Section 8.3.4: Segment Loading Algorithm
- Section 8.3.5: Dynamic Linker Setup
- Section 8.3.6: Script Handler (`#!`)
- Section 8.3.7: Initial Stack Layout
- Section 8.3.8: Auxiliary Vector
- Section 8.3.9: Address Space Layout and ASLR
- Section 8.3.10: vDSO and VVAR Mapping
- Section 8.3.11: UmkaOS-Specific Page Mappings
- Section 8.3.12: Error Handling
- Section 8.4: Real-Time Guarantees
- Section 8.4.1: Beyond CBS
- Section 8.4.2: Design: Bounded Latency Paths
- Section 8.4.3: Key Design Decisions for RT
- Section 8.4.4: RT + Domain Isolation Interaction
- Section 8.4.5: CPU Isolation for Hard RT
- Section 8.4.6: Driver Crash During RT-Critical Path
- Section 8.4.7: Linux Compatibility
- Section 8.4.8: Performance Impact
- Section 8.4.9: Hardware Resource Determinism
- Section 8.5: Signal Handling
- Section 8.5.1: Signal Table
- Section 8.5.2: Signal Data Structures
- Section 8.5.3: Signal Delivery Algorithm
- Section 8.5.4: Signal Frame Layout (x86-64)
- Section 8.5.5: Signal-Related System Calls
- Section 8.5.6: SA_RESTART and EINTR
- Section 8.5.7: Signal Inheritance Across fork() and exec()
- Section 8.5.8: SIGCHLD and wait()
- Section 8.6: Process Groups and Sessions
- Section 8.6.1: Structures
- Section 8.6.2: System Calls
- Section 8.6.3: Job Control Signals
- Section 8.6.4: Orphaned Process Groups
- Section 8.6.5: Controlling Terminal Association
- Section 8.7: Resource Limits and Accounting
- Section 8.7.1: Resource Limit Types
- Section 8.7.2: Wire Format and Syscalls
- Section 8.7.3: Internal Structures
- Section 8.7.4: Enforcement Points
- Section 8.7.5: Inheritance Across fork() and exec()
- Section 8.7.6: UID-Level Accounting
- Section 8.7.7: getrusage Wire Format
- Section 8.7.8: /proc/PID/limits Format
- Section 8.7.9: /proc/PID/stat Field Mapping
- Section 8.7.10: Linux Compatibility Notes
Chapter 9: Security Architecture
- Section 9.1: Capability-Based Foundation
- Section 9.1.1: Capability Token Model
- Section 9.2: Permission and ACL Model
- Section 9.2.1: Linux Permission Emulation
- Section 9.2.2: System Administration Capabilities
- Section 9.2.3: Dual ACL Model: POSIX Draft ACLs + NFSv4 ACLs
- Section 9.2.4: Driver Sandboxing
- Section 9.2.5: Security by Default
- Section 9.3: Verified Boot Chain
- Section 9.3.1: Problem
- Section 9.3.2: Boot Chain Verification
- Section 9.3.3: Kernel Image Signing
- Section 9.3.4: Key Revocation
- Section 9.3.5: RCU Read-Side Timeout (DoS Mitigation)
- Section 9.4: TPM Runtime Services
- Section 9.4.1: Measured Boot (TPM)
- Section 9.4.2: TPM Runtime Services
- Section 9.4.3: Anti-Rollback Counter (TOCTOU-Safe Initialization)
- Section 9.4.4: TPM Service Provider (Cluster-Wide TPM Access)
- Section 9.5: Runtime Integrity Measurement (IMA)
- Section 9.5.1: Policy Rule Grammar
- Section 9.5.2: Algorithm Agility
- Section 9.5.3: Crypto API Integration
- Section 9.5.4: Container and Namespace Interaction
- Section 9.5.5: Signed Hash Lifecycle (Package Updates)
- Section 9.5.6: EVM — Extended Verification Module
- Section 9.6: Post-Quantum Cryptography
- Section 9.6.1: Why This Cannot Wait
- Section 9.6.2: Boot Stub Cryptographic Algorithm Subset
- Section 9.6.3: Design: Algorithm-Agile Crypto Abstraction
- Section 9.6.4: Impact on Distributed Capabilities
- Section 9.6.5: Hybrid Mode (Transition Period)
- Section 9.6.6: Performance Impact
- Section 9.6.7: PQC Key Management
- Section 9.7: Confidential Computing
- Section 9.7.1: Why This Cannot Wait
- Section 9.7.2: Architectural Requirements
- Section 9.7.3: Design Approach: Opaque Page Handles
- Section 9.7.4: Guest Mode: UmkaOS as a Confidential Guest
- Section 9.7.5: Host Mode: UmkaOS Hosting Confidential VMs
- Section 9.7.6: Linux Compatibility
- Section 9.7.7: TEE Observability Degradation Model
- Section 9.7.8: Performance Impact
- Section 9.7.9: Device Passthrough for Confidential VMs
- Section 9.7.10: Confidential VM Live Migration
- Section 9.7.11: TEE Key Negotiation Wire Format (UmkaOS-TEE-v1)
- Section 9.8: Linux Security Module (LSM) Framework
- Section 9.8.1: Design Goals
- Section 9.8.2: The SecurityModule Trait
- Section 9.8.3: Operation Discriminants
- Section 9.8.4: Security Blob Model
- Section 9.8.5: Hook Dispatch and Stacking
- Section 9.8.6: Per-Namespace LSM Profiles
- Section 9.8.7: Hook Integration Points
- Section 9.8.8: /sys/kernel/security Interface
- Section 9.8.9: LSM Registration and Boot Sequence
- Section 9.8.10: Policy Format Compatibility
- Section 9.8.11: SELinux KABI Interface, AVC, and Policy Load Protocol
- Section 9.9: Credential Model and Capabilities
- Section 9.9.1: Credential Structure
- Section 9.9.2: Credential Lifecycle (Copy-on-Write via RCU)
- Section 9.9.3: The capable() and ns_capable() Check Functions
- Section 9.9.4: The execve() Capability Transformation
- Section 9.9.5: UID Transition Capability Adjustments
- Section 9.9.6: capget() and capset() Syscalls
- Section 9.9.7: setuid(), setreuid(), setresuid(), and setfsuid() Syscalls
- Section 9.9.8: setgroups() and Supplementary Group Management
- Section 9.9.9: prctl() Credential Operations
- Section 9.9.10: Credential Translation
- Section 9.9.11: Bounding Sets and Dropping Privileges
- Section 9.9.12: Linux Capability to UmkaOS Capability Translation
- Section 9.9.13: Security Subsystem Lock Ordering
Chapter 10: Security Extensions
- Section 10.1: Kernel Crypto API
- Section 10.1.1: Algorithm Type Taxonomy
- Section 10.1.2: Algorithm Descriptor and Registration
- Section 10.1.3: Transform Objects
- Section 10.1.4: Algorithm Lookup and Transform Allocation
- Section 10.1.5: Registered Algorithm Descriptors
- Section 10.1.6: Hardware Acceleration Integration
- Section 10.1.7: Hardware Crypto Acceleration by Architecture
- Section 10.1.8: FIPS Mode
- Section 10.1.9: sysfs Interface
- Section 10.1.10: AF_ALG — Userspace Crypto via Sockets
- Section 10.2: Kernel Key Retention Service
- Section 10.2.1: Key Object
- Section 10.2.2: Key Types
- Section 10.2.3: Keyring Hierarchy
- Section 10.2.4: Key Quotas
- Section 10.2.5: The add_key() and request_key() Syscalls
- Section 10.2.6: The keyctl() Syscall
- Section 10.2.7: The request_key() Upcall
- Section 10.2.8: LSM Hooks
- Section 10.2.9: Integration: NVMe TLS Authentication
- Section 10.2.10: Integration: RPCSEC_GSS (NFS Kerberos)
- Section 10.3: Seccomp-BPF Syscall Filter
- Section 10.3.1: Entry Points
- Section 10.3.2: seccomp() Operations
- Section 10.3.3: seccomp_data Struct (BPF Program Input)
- Section 10.3.4: BPF Wire Format
- Section 10.3.5: Return Values (Actions)
- Section 10.3.6: Filter Chain Data Structures
- Section 10.3.7: Filter Installation Algorithm (SECCOMP_SET_MODE_FILTER)
- Section 10.3.8: Syscall Interception Path
- Section 10.3.9: JIT Compilation
- Section 10.3.10: Userspace Notification (SECCOMP_USER_NOTIF)
- Section 10.3.11: SECCOMP_MODE_STRICT
- Section 10.3.12: Inheritance and exec Semantics
- Section 10.3.13: Audit Logging
- Section 10.3.14: /proc Integration
- Section 10.3.15: Linux Compatibility
- Section 10.3.16: Kernel-Internal Seccomp API for Tier 2 Drivers
- Section 10.4: ARM Memory Tagging Extension (MTE)
- Section 10.4.1: MTE Overview and Architecture Coverage
- Section 10.4.2: MTE Modes (SYNC / ASYNC / ASYMM)
- Section 10.4.3: Kernel Data Structures
- Section 10.4.4: MTE-Aware Allocator Design
- Section 10.4.5: Context Switch Handling
- Section 10.4.6: Userspace Interface (prctl, mmap PROT_MTE)
- Section 10.4.7: Integration with UmkaOS Security Model
- Section 10.4.8: Comparison with x86-64 Mitigations
- Section 10.4.9: Linux Compatibility
- Section 10.5: DebugCap — Capability-Based Process Debugging
- Section 10.5.1: DebugCap Data Structures
- Section 10.5.2: Obtaining a DebugCap
- Section 10.5.3: Using a DebugCap
- Section 10.5.4: Capability Transfer
- Section 10.5.5: PR_SET_DEBUG_ACCEPT — Cross-UID Debug Grant
- Section 10.5.6: Revocation
- Section 10.5.7: Audit Logging
- Section 10.5.8: Linux Compatibility
- Section 10.5.9: LSM Hooks
- Section 10.5.10: DebugCap Request Rate Limiting
Chapter 11: Driver Architecture and Isolation
- Section 11.1: Three-Tier Protection Model
- Section 11.1.2: How the Tiers Interact
- Section 11.1.3: Tier M: Multikernel Peer Isolation
- Section 11.2: Isolation Mechanisms and Performance Modes
- Section 11.2.1: Isolation Philosophy: Best Effort Within Performance Budget
- Section 11.2.2: How MPK Works
- Section 11.2.3: Cost Comparison
- Section 11.2.4: MPK Domain Allocation
- Section 11.2.5: WRPKRU Threat Model: Crash Containment, Not Exploitation Prevention
- Section 11.2.6: Isolation on Other Architectures
- Section 11.2.7: Adaptive Isolation Policy (Graceful Degradation)
- Section 11.2.8: Isolation Tiers vs. Replaceability: Orthogonal Axes
- Section 11.3: Driver Isolation Tiers
- Section 11.3.1: Tier Classification
- Section 11.3.2: Tier 0: Boot-Critical and Core Framework Code
- Section 11.3.3: Tier 1: Kernel-Adjacent Drivers (Hardware Memory Domain Isolated)
- Section 11.3.4: Protection Key Exhaustion (Hardware Domain Limit)
- Section 11.3.5: Tier 2: User-Space Drivers (Process-Isolated)
- Section 11.3.6: Tier Mobility and Auto-Demotion
- Section 11.3.7: Graceful Tier Degradation
- Section 11.3.8: Debugging Across Isolation Domains (ptrace)
- Section 11.3.9: Signal Delivery Across Isolation Boundaries
- Section 11.3.10: eBPF Interaction with Driver Isolation Domains
- Section 11.3.11: Tier 2 Interface and SDK
- Section 11.4: Device Registry and Bus Management
- Section 11.4.1: Motivation and Prior Art
- Section 11.4.2: Design Principles
- Section 11.4.3: Registry Data Model
- Section 11.4.4: Device Matching
- Section 11.4.5: Device Lifecycle
- Section 11.4.6: Power Management
- Section 11.4.7: Hot-Plug
- Section 11.4.8: Concurrency and Performance
- Section 11.4.9: Resolved Design Decisions
- Section 11.5: IOMMU and DMA Mapping
- Section 11.5.1: IOMMU Groups
- Section 11.5.2: IOMMU Implementation Complexity
- Section 11.5.3: Per-Device DMA Identity Mapping (Opt-In Escape Hatch)
- Section 11.5.4: IOMMU Fault Routing for VM-Assigned Devices
- Section 11.5.5: PCIe ASPM (Active State Power Management)
- Section 11.5.6: Tier 2 Streaming DMA Syscalls
- Section 11.6: Device Services and Boot Integration
- Section 11.6.1: Service Discovery
- Section 11.6.2: KABI Integration
- Section 11.6.3: Crash Recovery Integration
- Section 11.6.4: Boot Sequence Integration
- Section 11.6.5: Sysfs Compatibility
- Section 11.6.6: Firmware Management
- Section 11.6.7: Appendix: Comparison with Prior Art
- Section 11.7: Zero-Copy I/O Path
- Section 11.7.1: NVMe Read Example (io_uring SQPOLL + Registered Buffers)
- Section 11.7.2: NVMe Write Example (Buffered write() → Page Cache → Writeback → NVMe)
- Section 11.7.3: TCP Receive Path
- Section 11.8: IPC Architecture and Message Passing
- Section 11.8.1: IPC Primitives
- Section 11.8.2: Domain Ring Buffer Design
- Section 11.8.3: Channel Types and Capability Passing
- Section 11.8.4: Flow Control and Ordering
- Section 11.8.5: Versioned Ring Entry Format
- Section 11.8.6: Terminology Reference
- Section 11.9: Crash Recovery and State Preservation
- Section 11.9.1: The Linux Problem
- Section 11.9.2: UmkaOS Tier 1 Recovery Sequence
- Section 11.9.3: Reload Failure Handling
- Section 11.9.4: FLR Timeout Recovery
- Section 11.9.5: Crash State Buffer Wire Format
- Section 11.9.6: UmkaOS Tier 2 Recovery Sequence
- Section 11.9.7: State Preservation and Checkpointing
- Section 11.9.8: Crash Dump Infrastructure
- Section 11.9.9: Recovery Comparison
- Section 11.9.10: Crash History and Auto-Demotion
- Section 11.9.11: Compound Recovery Scenarios
- Section 11.9.12: Power Failure During Recovery
- Section 11.9.13: Per-Architecture Crash Isolation Comparison
- Section 11.9.14: Swap Device Crash Interaction
- Section 11.10: Channel I/O Subsystem (s390x)
- Section 11.10.1: Architecture Overview
- Section 11.10.2: Subchannel Enumeration
- Section 11.10.3: CCW Program Model
- Section 11.10.4: Indirect Data Address Lists (IDAL)
- Section 11.10.5: QDIO (Queued Direct I/O)
- Section 11.10.6: virtio-ccw Transport
- Section 11.10.7: Integration with UmkaOS Device Registry
- Section 11.10.8: Protection and Isolation
- Section 11.10.9: Cross-References
- Section 11.11: D-Bus Bridge Service
- Section 11.11.1: Motivation
- Section 11.11.2: Architecture
- Section 11.11.3: D-Bus Interface Schema in KABI Manifest
- Section 11.11.4: D-Bus Type Mapping
- Section 11.11.5: Message Flow
- Section 11.11.6: Bus Types: System vs Session
- Section 11.11.7: Capability Integration
- Section 11.11.8: Crash Recovery
- Section 11.11.9: Object Path Mapping
- Section 11.11.10: Canonical Tier 2 Migration Candidates
- Section 11.11.11: Universal Driver Management Bus
- Section 11.11.12: Non-D-Bus Userspace Protocols
- Section 11.11.13: Relationship to umkafs
- Section 11.11.14: Cross-References
Chapter 12: KABI — Kernel Driver ABI
- Section 12.1: KABI Overview
- Section 12.1.1: The Problem We Solve
- Section 12.1.2: Interface Definition Language (.kabi)
- Section 12.2: ABI Rules and Version Lifecycle
- Section 12.2.1: ABI Rules (Enforced by CI)
- Section 12.2.2: KABI Version Lifecycle and Deprecation Policy
- Section 12.2.3: Behavioral Compatibility Rules
- Section 12.3: Bilateral Capability Exchange
- Section 12.3.1: CapValidationToken: Amortized Capability Validation
- Section 12.3.2: CapValidationToken Invalidation on Driver Crash
- Section 12.3.3: KABI Operation Permission Requirements
- Section 12.3.4: Generation Counter Wrap Policy
- Section 12.4: Version Negotiation
- Section 12.4.1: Vtable Bounds Safety (Zero-Extension Contract)
- Section 12.4.2: Deprecation Tombstones
- Section 12.5: KABI IDL Language Specification
- Section 12.5.1: File Format
- Section 12.5.2: Type System
- Section 12.5.3: Struct Definition
- Section 12.5.4: Vtable Definition
- Section 12.5.5: Enum Definition
- Section 12.5.6: requires and provides Declarations
- Section 12.5.7: Version Compatibility Rules
- Section 12.5.8: Compiler Invocation
- Section 12.6: KABI Transport Classes
- Section 12.6.1: Why Transport Is Separate from Interface
- Section 12.6.2: Three Transport Classes
- Section 12.6.3: Call Direction at the Tier 0 Boundary
- Section 12.6.4: IDL Toolchain Transport Parameter
- Section 12.6.5: KabiDriverManifest: Transport Capability Advertisement
- Section 12.6.6: Default Policy: All Drivers Ship All Three Transports
- Section 12.7: KABI Service Dependency Resolution
- Section 12.7.1: The Problem
- Section 12.7.2: IDL requires and provides Declarations
- Section 12.7.3: KabiProviderIndex — Boot-Time Service Map
- Section 12.7.4: KabiServiceRegistry — Runtime Service Map
- Section 12.7.5: Requesting a Service: Probe Deferral
- Section 12.7.6: Demand Loading
- Section 12.7.7: Linux Module Tool Compatibility (Dual-Boot Support)
- Section 12.7.8: Circular Dependency Prohibition
- Section 12.7.9: Tier 0 Module Lifecycle (load_once)
- Section 12.7.10: Version Negotiation
- Section 12.7.11: Security Model
- Section 12.7.12: Signing Key Initialization
- Section 12.7.13: IMA Measurement Hook
- Section 12.8: Domain Runtime — Unified Domain Model Mechanics
- Section 12.8.2: kabi_call! Macro Specification
- Section 12.8.3: IDL-Generated Consumer Loop
- Section 12.8.4: Bind-Time Transport Selection
- Section 12.8.5: Module Hello Protocol
- Section 12.8.6: Cross-Domain Ring Setup
- Section 12.8.7: Per-Driver IRQ Ring
- Section 12.8.8: Rebinding on Promotion/Demotion
- Section 12.8.9: Per-Domain Service
- Section 12.8.10: Implementation Phases
- Section 12.8.11: Replaceability Classification
- Section 12.8.12: ML Policy Integration
Chapter 13: Device Class Frameworks
- Section 13.1: Device Class Overview
- Section 13.2: Wireless Subsystem
- Section 13.3: Display Subsystem
- Section 13.4: Audio Subsystem
- Section 13.5: GPU Compute
- Section 13.5.1: DMA Fence Behavior on GPU Crash
- Section 13.6: RDMA
- Section 13.7: Video / Media Pipeline
- Section 13.8: AI / NPU Accelerator
- Section 13.9: DMA Engine
- Section 13.10: GPIO and Pin Control
- Section 13.11: Crypto Accelerator
- Section 13.12: USB Class Drivers and Mass Storage
- Section 13.12.1: USB Host Controller (xHCI, Tier 1)
- Section 13.12.2: USB Mass Storage (UMS) and USB Attached SCSI (UAS)
- Section 13.12.3: USB4 and Thunderbolt
- Section 13.13: I2C/SMBus Bus Framework
- Section 13.13.1: I2C Bus Trait
- Section 13.13.2: SMBus and Hardware Sensors
- Section 13.13.3: I2C-HID Protocol
- Section 13.13.4: Precision Touchpad (PTP)
- Section 13.14: Bluetooth HCI Driver
- Section 13.14.1: Kernel HCI Driver (Tier 1)
- Section 13.14.2: BlueZ Daemon (Tier 2)
- Section 13.14.3: A2DP Audio Routing to PipeWire
- Section 13.14.4: HID Input Routing
- Section 13.14.5: Architectural Decision
- Section 13.15: WiFi Driver
- Section 13.15.1: WiFi Driver Architecture
- Section 13.15.2: Firmware Isolation Model
- Section 13.15.3: TX/RX Ring Buffer Design
- Section 13.15.4: Power Management
- Section 13.15.5: WoWLAN (Wake-on-WLAN)
- Section 13.15.6: Scan Offload
- Section 13.15.7: Roaming
- Section 13.15.8: Architectural Decision: WiFi Tier Classification
- Section 13.15.9: nl80211 — Linux Wireless Configuration Interface
- Section 13.16: Camera and Video Capture
- Section 13.16.1: CameraDevice Trait
- Section 13.16.2: Camera Controls
- Section 13.16.3: Pixel Formats
- Section 13.16.4: Stream Configuration and Captured Frames
- Section 13.16.5: ISP Pipeline Model
- Section 13.16.6: Privacy and Security
- Section 13.16.7: UVC Driver Contract
- Section 13.16.8: MIPI CSI-2 Integration
- Section 13.16.9: V4L2 Compatibility
- Section 13.16.10: Error Types and Events
- Section 13.16.11: Cross-References
- Section 13.17: Printers and Scanners
- Section 13.18: Live Kernel Evolution
- Section 13.18.1: The Theseus Model
- Section 13.18.2: Design: Explicit State Ownership Graph
- Section 13.18.3: Component Replacement Flow
- Section 13.18.4: Export Symbol Contract
- Section 13.18.5: What Can Be Live-Replaced
- Section 13.18.6: Performance Impact
- Section 13.18.7: Evolution Framework Data/Code Split
- Section 13.18.8: Evolution Framework Formal Invariants
- Section 13.18.9: KABI Service Live Replacement
- Section 13.18.10: Runtime Evolution Trigger Interface
- Section 13.18.11: Graceful Tier 1 Driver Replacement
- Section 13.18.12: Data Format Evolution
- Section 13.18.13: Evolvable Module Developer SDK
- Section 13.19: Hardware Watchdog Framework
- Section 13.19.1: WatchdogOps KABI Vtable
- Section 13.19.2: WatchdogDev — The Watchdog Device Descriptor
- Section 13.19.3: Character Device Interface — /dev/watchdog
- Section 13.19.4: Nowayout Boot Option
- Section 13.19.5: Pretimeout Notifier
- Section 13.19.6: Software Watchdog (softdog)
- Section 13.19.7: systemd Integration
- Section 13.19.8: Device Registration
- Section 13.20: SPI Bus Framework
- Section 13.20.1: SpiController KABI Trait
- Section 13.20.2: spidev — Userspace SPI Access
- Section 13.21: rfkill — RF Kill Switch Framework
- Section 13.21.1: Data Structures
- Section 13.21.2: /dev/rfkill — Userspace Interface
- Section 13.21.3: rfkill-input: Hardware Kill Switch
- Section 13.22: MTD — Memory Technology Device Framework
- Section 13.22.1: MtdInfo and MtdDevice
- Section 13.22.2: MTD Partitions
- Section 13.22.3: Character Devices: /dev/mtdN and /dev/mtdblockN
- Section 13.22.4: UBI (Unsorted Block Images)
- Section 13.23: IPMI — Intelligent Platform Management Interface
- Section 13.23.1: IPMI Message
- Section 13.23.2: System Interface Drivers
- Section 13.23.3: /dev/ipmiN Character Device
- Section 13.23.4: Platform Event / Panic Notifier
- Section 13.24: UIO — Userspace I/O
- Section 13.24.1: UioDevice Trait
- Section 13.24.2: /dev/uioN Character Device
- Section 13.24.3: uio_pdrv_genirq
- Section 13.25: NVMEM — Non-Volatile Memory Framework
- Section 13.25.1: Data Structures
- Section 13.25.2: Consumer API
- Section 13.25.3: sysfs Interface
- Section 13.26: SoundWire Bus Framework
- Section 13.26.1: Bus Architecture
- Section 13.26.2: Data Structures
- Section 13.26.3: Power States
- Section 13.26.4: Integration with ASoC (ALSA SoC)
- Section 13.27: Regulator Framework
- Section 13.27.1: Design: Voltage Voting Model
- Section 13.27.2: Core Types
- Section 13.27.3: API
- Section 13.27.4: Voltage Voting Algorithm
- Section 13.27.5: Linux External ABI
- Section 13.27.6: Multi-Architecture Notes
- Section 13.28: RTC Subsystem
- Section 13.28.1: RtcDevice Trait
- Section 13.28.2: RTC Registry
- Section 13.28.3: Linux External ABI — /dev/rtcN
- Section 13.28.4: sysfs Interface
- Section 13.28.5: Boot Sequence and Y2K38
- Section 13.29: USB Device Forwarding Service Provider
- Section 13.29.1: Service Identity and Discovery
- Section 13.29.2: Wire Protocol
- Section 13.29.3: Client-Side Integration (VHCI)
- Section 13.29.4: Exclusive Access
- Section 13.29.5: Isochronous Transfer Support
- Section 13.29.6: Performance
- Section 13.29.7: Security
- Section 13.29.8: Drain and Disconnect
- Section 13.29.9: URB Timeout Handling
- Section 13.29.10: Comparison with Linux USB/IP
- Section 13.30: Auxiliary Device Subsystems
- Section 13.30.1: Auxiliary Bus Framework
- Section 13.30.2: devfreq — Device Frequency Scaling
- Section 13.30.3: LED Subsystem
- Section 13.30.4: PWM — Pulse Width Modulation
- Section 13.30.5: Backlight Subsystem
- Section 13.30.6: power_supply Subsystem
- Section 13.30.7: Multi-Architecture Notes
Chapter 14: Virtual Filesystem Layer
- Section 14.1: Virtual Filesystem Layer
- Section 14.1.2: VFS Architecture
- Section 14.1.3: Pipe Subsystem
- Section 14.1.4: Inode Cache (icache)
- Section 14.1.5: Dentry Cache
- Section 14.1.6: Path Resolution
- Section 14.1.7: Mount Namespace and Capability-Gated Mounting
- Section 14.2: VFS Ring Buffer Protocol (Cross-Domain Dispatch)
- Section 14.3: Per-CPU VFS Ring Extension
- Section 14.3.1: Motivation
- Section 14.3.2: Design Principles
- Section 14.3.3: Ring Topology
- Section 14.3.4: Request ID Generation
- Section 14.3.5: Doorbell Coalescing Across N Rings
- Section 14.3.6: Mount-Time Negotiation
- Section 14.3.7: Driver-Side Multiplexing
- Section 14.3.8: Driver-Side Ring Entry Prefetch
- Section 14.3.9: Completion Coalescing (Response Direction)
- Section 14.3.10: Crash Recovery
- Section 14.3.11: Live Evolution
- Section 14.3.12: CPU Hotplug
- Section 14.3.13: Performance Analysis
- Section 14.3.14: Backward Compatibility
- Section 14.3.15: Cross-References
- Section 14.3.16: Phase Assignment
- Section 14.4: fsync / fdatasync End-to-End Flow
- Section 14.4.1: Copy-on-Write and Redirect-on-Write Infrastructure
- Section 14.5: Character and Block Device Node Framework
- Section 14.5.1: Character Device Region Registration
- Section 14.5.2: Block Device Registration
- Section 14.5.3: Major Number Allocation Table
- Section 14.5.4: Devtmpfs: Automatic /dev Node Lifecycle
- Section 14.5.5: Initial Device Naming
- Section 14.5.6: File Operations Replacement (replace_fops)
- Section 14.6: Mount Tree Data Structures and Operations
- Section 14.6.1: Mount Flags
- Section 14.6.2: Propagation Type
- Section 14.6.3: Mount Node
- Section 14.6.4: Mount Hash Table
- Section 14.6.5: Mount Namespace
- Section 14.6.6: DCACHE_MOUNTED Integration
- Section 14.6.7: Filesystem Context (New Mount API)
- Section 14.6.8: Mount Attribute Structure (mount_setattr)
- Section 14.6.9: Mount Operations — Algorithms
- Section 14.6.10: Mount Propagation Algorithms
- Section 14.6.11: Namespace Operations
- Section 14.6.12: New Mount API Syscalls
- Section 14.6.13: Mount Introspection Syscalls
- Section 14.6.14: /proc/PID/mountinfo Format
- Section 14.6.15: Path Resolution Integration
- Section 14.6.16: Performance Characteristics
- Section 14.6.17: Cross-References
- Section 14.7: Distribution-Aware VFS Extensions
- Section 14.8: overlayfs: Union Filesystem for Containers
- Section 14.8.1: Mount Options and Configuration
- Section 14.8.2: Core Data Structures
- Section 14.8.3: Overlay Dentry Operations
- Section 14.8.4: Lookup Algorithm
- Section 14.8.5: Copy-Up Protocol
- Section 14.8.6: Metacopy Mode
- Section 14.8.7: Directory Operations
- Section 14.8.8: Whiteout and Deletion
- Section 14.8.9: Volatile Mode
- Section 14.8.10: Extended Attribute Handling
- Section 14.8.11: statfs Behavior
- Section 14.8.12: Inode Number Composition (xino)
- Section 14.8.13: Mount and Unmount Flow
- Section 14.8.14: Performance Characteristics
- Section 14.8.15: dm-verity Integration for Container Image Layers
- Section 14.8.16: Linux Compatibility
- Section 14.9: binfmt_misc — Arbitrary Binary Format Registration
- Section 14.9.1: Data Structures
- Section 14.9.2: Registration Interface
- Section 14.9.3: Exec Path Integration
- Section 14.9.4: The binfmt_misc Filesystem
- Section 14.9.5: Persistence and systemd Integration
- Section 14.9.6: Security Model
- Section 14.10: autofs — Kernel Automount Trigger
- Section 14.10.1: Architecture
- Section 14.10.2: Data Structures
- Section 14.10.3: Packetized Pipe Protocol
- Section 14.10.4: Automount Protocol
- Section 14.10.5: Control Interface
- Section 14.10.6: Expiry
- Section 14.10.7: VFS Integration
- Section 14.10.8: Mount Options
- Section 14.10.9: systemd Integration
- Section 14.10.10: Linux Compatibility
- Section 14.11: FUSE — Filesystem in Userspace
- Section 14.11.1: Architecture
- Section 14.11.2: Core Data Structures
- Section 14.11.3: Wire Protocol
- Section 14.11.4: FUSE Opcodes
- Section 14.11.5: FUSE_INIT Handshake
- Section 14.11.6: VFS Integration
- Section 14.11.7: Security Model
- Section 14.11.8: io_uring FUSE
- Section 14.11.9: Linux Compatibility
- Section 14.11.10: VFS Service Provider
- Section 14.12: configfs — Kernel Object Configuration Filesystem
- Section 14.12.1: Architecture
- Section 14.12.2: Data Structures
- Section 14.12.3: Mount Point and Directory Layout
- Section 14.12.4: VFS Operations
- Section 14.12.5: Linux Compatibility
- Section 14.13: File Notification System
- Section 14.13.1: inotify
- Section 14.13.2: fanotify
- Section 14.13.3: UmkaOS-Native File Watch Capabilities
- Section 14.13.4: Cross-References
- Section 14.14: Local File Locking (flock / fcntl POSIX Locks / OFD Locks)
- Section 14.14.1: Data Structures
- Section 14.14.2: Conflict Detection
- Section 14.14.3: Locking Algorithm
- Section 14.14.4: Deadlock Detection
- Section 14.14.5: Lock Release on File Description Close
- Section 14.14.6: memfd Sealing (F_ADD_SEALS / F_GET_SEALS)
- Section 14.14.7: Cross-References
- Section 14.14.8: Lock Semantics Mode (POSIX Default / OFD Opt-in)
- Section 14.15: Disk Quota Subsystem (quotactl)
- Section 14.15.1: Data Structures
- Section 14.15.2: quotactl(2) Dispatch
- Section 14.15.3: VFS Enforcement Hooks
- Section 14.15.4: In-Memory Quota Cache
- Section 14.15.5: Linux Compatibility
- Section 14.15.6: Cross-References
- Section 14.16: Extended Attributes (xattr)
- Section 14.16.1: Syscall Interface
- Section 14.16.2: XattrFlags
- Section 14.16.3: Namespace Prefixes
- Section 14.16.4: Size Limits
- Section 14.16.5: VFS Dispatch Pipeline
- Section 14.16.6: Per-Filesystem Storage
- Section 14.16.7: POSIX ACL Wire Format
- Section 14.16.8: chmod / ACL Mask Interaction
- Section 14.16.9: Default ACL Inheritance
- Section 14.16.10: EVM Integration
- Section 14.16.11: Performance Budget
- Section 14.17: Pipes and FIFOs
- Section 14.17.1: Pipe Data Buffer
- Section 14.17.2: Capacity and fcntl(F_SETPIPE_SZ)
- Section 14.17.3: MPSC Pipes (Multiple Writers)
- Section 14.17.4: O_DIRECT Pipe Mode
- Section 14.17.5: Named FIFOs (mkfifo)
- Section 14.17.6: Splice and Zero-Copy
- Section 14.17.7: Linux Compatibility
- Section 14.18: Pseudo-Filesystems
- Section 14.18.1: Common Registration Framework
- Section 14.18.2: debugfs — Kernel Debug Filesystem
- Section 14.18.3: tracefs — Tracing Filesystem
- Section 14.18.4: hugetlbfs — Huge Page Filesystem
- Section 14.18.5: bpffs — BPF Filesystem
- Section 14.18.6: securityfs — Security Module Filesystem
- Section 14.18.7: efivarfs — EFI Variable Filesystem
- Section 14.18.8: Boot Initialization Order
- Section 14.19: procfs and sysfs
- Section 14.19.1: procfs — Process Information Filesystem
- Section 14.19.2: sysfs — Device Model Filesystem
- Section 14.19.3: Boot Initialization Order
Chapter 15: Storage and Filesystems
- Section 15.1: Durability Guarantees
- Section 15.1.1: Boot Initialization Sequence
- Section 15.1.2: Filesystem Error Mode Selection by Error Code
- Section 15.1.3: I/O Result Codes
- Section 15.2: Block I/O and Volume Management
- Section 15.2.1: Evolvable/Nucleus Classification
- Section 15.2.2: Storage Driver Isolation Tiers at Boot
- Section 15.2.3: Block Device Trait
- Section 15.2.4: Device-Mapper and Volume Management
- Section 15.2.5: RAID Write Hole Mitigation
- Section 15.3: SATA/AHCI and Embedded Flash Storage
- Section 15.3.1: SATA/AHCI
- Section 15.3.2: eMMC (Embedded MultiMediaCard)
- Section 15.3.3: SD Card Reader (SDHCI)
- Section 15.4: AHCI/SATA Driver Architecture
- Section 15.4.1: HBA Global Registers
- Section 15.4.2: Per-Port Registers
- Section 15.4.3: FIS (Frame Information Structure) Types
- Section 15.4.4: Command Header and Command Table
- Section 15.4.5: FIS Receive Area
- Section 15.4.6: AhciPort Driver State
- Section 15.4.7: Initialization Sequence
- Section 15.4.8: Command Submission (Non-NCQ)
- Section 15.4.9: NCQ Command Submission
- Section 15.4.10: Flush and Standby Submission
- Section 15.4.11: Interrupt Handler
- Section 15.4.12: Error Recovery
- Section 15.4.13: Hot-Plug
- Section 15.4.14: ATAPI Passthrough
- Section 15.4.15: Power Management
- Section 15.4.16: BlockDeviceOps Implementation
- Section 15.4.17: KABI Driver Manifest
- Section 15.4.18: Design Decisions
- Section 15.5: VirtIO-blk Driver Architecture
- Section 15.5.1: VirtIO Transport
- Section 15.5.2: Device Configuration Space
- Section 15.5.3: Feature Negotiation
- Section 15.5.4: Virtqueue Usage
- Section 15.5.5: Request Format
- Section 15.5.6: Multi-Queue Support
- Section 15.5.7: I/O Submission
- Section 15.5.8: I/O Completion
- Section 15.5.9: Initialization Sequence
- Section 15.5.10: Crash Recovery
- Section 15.5.11: BlockDeviceOps Implementation
- Section 15.5.12: KABI Driver Manifest
- Section 15.5.13: Design Decisions
- Section 15.6: ext4 Filesystem Driver
- Section 15.6.2: ext4
- Section 15.7: XFS Filesystem Driver
- Section 15.7.2: XFS
- Section 15.8: Btrfs Filesystem Driver
- Section 15.8.2: Btrfs
- Section 15.9: Removable Media, Interoperability Filesystems, and FUSE
- Section 15.9.1: Removable Media and Interoperability Filesystems
- Section 15.9.2: Summary of Design Decisions
- Section 15.10: ZFS Integration
- Section 15.10.1: Native ZFS and Filesystem Licensing
- Section 15.10.2: ZFS Advanced Features
- Section 15.11: NFS Client, SunRPC, and RPCSEC_GSS
- Section 15.11.1: SunRPC Transport Layer
- Section 15.11.2: RPC Authentication (RpcAuth)
- Section 15.11.3: RPCSEC_GSS and Kerberos
- Section 15.11.4: NFSv4 Client State Machine
- Section 15.11.5: netfs Page Cache Layer
- Section 15.11.6: Mount Options and Integration
- Section 15.11.7: Locking: lockd and NFSv4 Built-in Locks
- Section 15.11.8: Design Decisions
- Section 15.12: NFS Server (nfsd)
- Section 15.12.1: Overview
- Section 15.12.2: VFS ExportOps Interface
- Section 15.12.3: Exports Database
- Section 15.12.4: Server Threads
- Section 15.12.5: Duplicate Request Cache (DRC)
- Section 15.12.6: NFSv3 Protocol Dispatch
- Section 15.12.7: NFSv4.1 Compound Dispatch
- Section 15.12.8: NFSv4 State Management
- Section 15.12.9: Authentication and Security
- Section 15.12.10: /proc/fs/nfsd Interface
- Section 15.12.11: NLM (Network Lock Manager) Server
- Section 15.12.12: Linux Compatibility
- Section 15.12.13: Design Decisions
- Section 15.13: Block Storage Networking
- Section 15.13.1: Wire Format Validation
- Section 15.13.2: NVMe-oF Reconnect Policy
- Section 15.13.3: Block Service Provider
- Section 15.14: Clustered Filesystems
- Section 15.15: Distributed Lock Manager
- Section 15.15.1: Design Overview and Linux Problem Statement
- Section 15.15.2: Lock Modes and Compatibility Matrix
- Section 15.15.3: Lock Value Blocks (LVBs)
- Section 15.15.4: Lock Resource Naming and Master Assignment
- Section 15.15.5: Transport-Agnostic Lock Operations
- Section 15.15.6: Lease-Based Lock Extension
- Section 15.15.7: Speculative Multi-Resource Lock Acquire
- Section 15.15.8: Targeted Writeback on Lock Downgrade
- Section 15.15.9: Deadlock Detection
- Section 15.15.10: Integration with Cluster Membership ({ref:failure-handling-and-distributed-recovery})
- Section 15.15.11: Recovery Protocol
- Section 15.15.12: UmkaOS Recovery Advantage
- Section 15.15.13: Application-Level Distributed Locking
- Section 15.15.14: Capability Model
- Section 15.15.15: Lockspace Lifecycle API
- Section 15.15.16: Performance Summary
- Section 15.15.17: Data Structures
- Section 15.15.18: Licensing
- Section 15.15.19: DLM Master Election and Liveness Integration
- Section 15.15.20: VFS Lock Integration (ClusterLockAdapter)
- Section 15.15.21: DLM as Foundation for UPFS Token Management
- Section 15.15.22: DLM Wire Protocol
- Section 15.16: Persistent Memory
- Section 15.16.1: The Hardware
- Section 15.16.2: Design: DAX (Direct Access) Integration
- Section 15.16.3: Memory-Mapped Persistent Storage
- Section 15.16.4: Crash Consistency Protocol
- Section 15.16.5: PMEM Error Handling
- Section 15.16.6: Integration with Memory Tiers
- Section 15.16.7: Linux Compatibility
- Section 15.16.8: Performance Impact
- Section 15.16.9: Filesystem Repair and Consistency Checking
- Section 15.16.10: SCSI-3 Persistent Reservations
- Section 15.17: Computational Storage
- Section 15.17.1: Problem
- Section 15.17.2: Design: CSD as AccelBase Device
- Section 15.17.3: CSD Command Submission
- Section 15.17.4: CSD Security Model
- Section 15.17.5: CSD Error Handling
- Section 15.17.6: Linux Compatibility
- Section 15.17.7: Performance Impact
- Section 15.18: I/O Priority and Scheduling
- Section 15.18.1: Syscall Interface
- Section 15.18.2: IoPriority Encoding
- Section 15.18.3: Priority Inheritance from CPU Nice
- Section 15.18.4: Task Storage and Inheritance
- Section 15.18.5: Permission Model
- Section 15.18.6: UmkaOS I/O Scheduler: Multi-Queue Priority-Aware (MQPA)
- Section 15.18.7: NVMe Multi-Queue Integration
- Section 15.18.8: cgroup Integration
- Section 15.18.9: /proc/PID/io Accounting
- Section 15.18.10: sysfs Interface
- Section 15.18.11: Linux Compatibility Notes
- Section 15.19: NVMe Host Controller Driver Architecture
- Section 15.19.1: Controller Memory Space (CMS) Registers
- Section 15.19.2: Submission/Completion Queue Pair Model
- Section 15.19.3: NVMe Command Opcodes
- Section 15.19.4: Driver State
- Section 15.19.5: Namespace State
- Section 15.19.6: Initialization Sequence
- Section 15.19.7: I/O Path
- Section 15.19.8: Interrupt Handling
- Section 15.19.9: Error Recovery
- Section 15.19.10: Namespace Management
- Section 15.19.11: Power State Management
- Section 15.19.12: Tier 1 Isolation Integration
- Section 15.19.13: Zoned Namespaces (ZNS)
- Section 15.19.14: NVMe-oF Fabrics Bridge
- Section 15.19.15: BlockDeviceOps Implementation
- Section 15.19.16: KABI Driver Manifest
- Section 15.19.17: Design Decisions
- Section 15.19.18: Error Recovery
- Section 15.19.19: Autonomous Power State Transitions (APST)
- Section 15.20: fscrypt — File-Level Encryption
- Section 15.20.1: Encryption Policies
- Section 15.20.2: Key Derivation
- Section 15.20.3: Ioctls
- Section 15.20.4: I/O Path Integration
- Section 15.20.5: Inline Crypto Engine (ICE) Support
- Section 15.20.6: Filesystem Integration Points
- Section 15.20.7: Crypto Backend Integration
- Section 15.20.8: In-Core State
- Section 15.20.9: Security Considerations
- Section 15.20.10: Cross-References
- Section 15.21: SMB Server (ksmbd)
- Section 15.21.1: Server State
- Section 15.21.2: SMB Session
- Section 15.21.3: Dialect Negotiation
- Section 15.21.4: Share Configuration
- Section 15.21.5: Oplock and Lease Model
- Section 15.21.6: SMB Multichannel
- Section 15.21.7: SMB Direct (RDMA)
- Section 15.21.8: ksmbd.mountd IPC Protocol
- Section 15.21.9: VFS Integration
- Section 15.21.10: Security
- Section 15.21.11: Cross-references
- Section 15.21.12: Design Decisions
Chapter 16: Networking
- Section 16.1: TCP Stack Extensibility
- Section 16.1.1: TCP Socket Options (SOL_TCP = IPPROTO_TCP = 6)
- Section 16.1.2: SO-Level Socket Options (SOL_SOCKET = 1)
- Section 16.1.3: TcpInfo Structure
- Section 16.1.4: TCP Fast Open (TFO)
- Section 16.1.5: TCP_REPAIR (CRIU Checkpoint/Restore)
- Section 16.1.6: SO_REUSEPORT
- Section 16.1.7: TCP-MD5 Signature (RFC 2385)
- Section 16.1.8: TCP_ULP (Upper Layer Protocols)
- Section 16.1.9: TCP Zero-Copy
- Section 16.1.10: SYN Cookie Mechanism
- Section 16.1.11: Cross-References
- Section 16.2: Network Stack Architecture
- Section 16.2.1: RX Packet Delivery Path (L2 → L3 → L4)
- Section 16.2.2: NetRxContext: GRO State
- Section 16.2.3: TCP Receive Path (L4 → Socket Buffer → Userspace)
- Section 16.2.4: Tier 1 recvmsg() Cross-Domain Data Path
- Section 16.2.5: IP Layer Implementation (IPv4 / IPv6)
- Section 16.2.6: UDP Subsystem
- Section 16.2.7: Neighbour Subsystem (ARP / NDP)
- Section 16.3: Socket Abstraction
- Section 16.3.1: Namespace-Scoped Network Privilege Checks
- Section 16.3.2: io_uring Socket Operations
- Section 16.4: Socket Operation Dispatch
- Section 16.4.1: Ring Protocol
- Section 16.4.2: Per-CPU Socket Ring Extension
- Section 16.4.3: Dispatch Flow
- Section 16.4.4: epoll Cross-Domain Integration
- Section 16.4.5: Zero-Copy Paths
- Section 16.4.6: Crash Recovery
- Section 16.4.7: Evolvable Classification
- Section 16.4.8: ML Policy Integration
- Section 16.4.9: Shared Buffer Management
- Section 16.4.10: Performance Analysis
- Section 16.4.11: Architecture-Specific Notes
- Section 16.4.12: Cross-References
- Section 16.5: NetBuf: Packet Buffer
- Section 16.5.1: NetBufPool: Per-CPU Slab Pool
- Section 16.5.2: NetBufRingEntry: KABI Wire Format
- Section 16.5.3: NetBufQueue and NetBufList
- Section 16.5.4: NetBufHandle ↔ NetBuf Conversion
- Section 16.5.5: NetBuf Operations
- Section 16.5.6: Domain Crossing Protocol
- Section 16.6: Routing Table (FIB — Forwarding Information Base)
- Section 16.6.1: Data Structures
- Section 16.6.2: Policy Routing Rules
- Section 16.6.3: Route Lookup Algorithm
- Section 16.6.4: FIB Trie Construction: Level-Compressed Trie (LC-Trie) Reference Algorithm
- Section 16.6.5: Batch Mutation API
- Section 16.6.6: VRF Integration
- Section 16.6.7: Netlink Interface
- Section 16.6.8: bpf_fib_lookup() Integration
- Section 16.7: Neighbor Subsystem (ARP/NDP)
- Section 16.7.1: IPv6 Neighbor Discovery Protocol (NDP)
- Section 16.8: TCP Control Block and State Machine
- Section 16.8.1: TCP State Machine
- Section 16.8.2: TCP Zero-Copy Receive (SO_ZEROCOPY)
- Section 16.9: Congestion Control Framework
- Section 16.10: Pluggable TCP Congestion Control
- Section 16.10.1: CongestionOps Trait (Full Specification)
- Section 16.10.2: Supporting Types
- Section 16.10.3: Registration API
- Section 16.10.4: Per-Socket Selection Lifecycle
- Section 16.10.5: System Default
- Section 16.10.6: TCP Sysctl Entries (/proc/sys/net/ipv4/tcp_*)
- Section 16.10.7: /proc/net/ Filesystem Entries
- Section 16.11: MPTCP as First-Class Transport
- Section 16.12: Domain Switch Overhead Analysis
- Section 16.13: Network Device Interface (NetDevice)
- Section 16.13.1: TX Dispatch (Tier-Aware Transmission)
- Section 16.13.2: NetDevice Registration Lifecycle
- Section 16.13.3: VETH (Virtual Ethernet Pair)
- Section 16.13.4: Software Bridge (L2 Switch)
- Section 16.14: NAPI — New API for Packet Polling
- Section 16.15: Kernel TLS (kTLS)
- Section 16.15.1: kTLS Mandatory Cipher Support
- Section 16.15.2: kTLS Position in the TX Pipeline
- Section 16.15.3: kTLS Session Teardown on Crypto Transform Death
- Section 16.16: Network Overlay and Tunneling
- Section 16.16.1: GRE (Generic Routing Encapsulation)
- Section 16.16.2: IPIP (IP-in-IP Encapsulation)
- Section 16.16.3: SIT (Simple Internet Transition)
- Section 16.16.4: Common Tunnel Infrastructure
- Section 16.16.5: Software L2 Switch (Bridge)
- Section 16.16.6: Veth (Virtual Ethernet Pairs)
- Section 16.17: Netlink Socket Interface
- Section 16.18: Packet Filtering (BPF-Based)
- Section 16.19: Network Interface Naming
- Section 16.20: AF_UNIX Socket Specification
- Section 16.21: Traffic Control and Queue Disciplines (tc/qdisc)
- Section 16.21.1: Architecture
- Section 16.21.2: TcHandle and the Handle Namespace
- Section 16.21.3: QdiscOps Trait
- Section 16.21.4: Qdisc Struct
- Section 16.21.5: Builtin Qdiscs
- Section 16.21.6: Classifiers (tc Filters)
- Section 16.21.7: Netlink Interface
- Section 16.21.8: Integration with cgroups Network Bandwidth Enforcement
- Section 16.21.9: Ingress Path
- Section 16.21.10: Qdisc Ownership and Domain Crossing
- Section 16.21.11: State Ownership for Live Evolution
- Section 16.22: IPsec and XFRM Framework
- Section 16.22.1: Security Association (SA) -- XfrmState
- Section 16.22.2: Security Policy (SP) -- XfrmPolicy
- Section 16.22.3: SA and SP Databases
- Section 16.22.4: Packet Processing Hooks
- Section 16.22.5: Anti-Replay Window
- Section 16.22.6: xfrm_user Netlink Interface
- Section 16.22.7: Crypto API Integration
- Section 16.23: SCTP -- Stream Control Transmission Protocol
- Section 16.23.1: Association State Machine
- Section 16.23.2: SctpAssoc Struct
- Section 16.23.3: Multi-Homing
- Section 16.23.4: Multi-Streaming
- Section 16.23.5: SCTP Chunk Types
- Section 16.23.6: Socket API Compatibility
- Section 16.23.7: Integration with NetBuf
- Section 16.24: AF_VSOCK -- Virtual Machine Sockets
- Section 16.24.1: Address Space
- Section 16.24.2: VsockTransport Trait
- Section 16.24.3: Virtio-Vsock Transport
- Section 16.24.4: VsockSock Struct
- Section 16.24.5: Flow Control
- Section 16.24.6: Integration with KVM
- Section 16.24.7: sysfs Interface
- Section 16.25: AF_PACKET Raw Socket
- Section 16.25.1: Socket Creation
- Section 16.25.2: Address Format
- Section 16.25.3: PacketSocket Internal State
- Section 16.25.4: PACKET_MMAP V3 — Zero-Copy Ring Buffer
- Section 16.25.5: PACKET_FANOUT — Socket Load Distribution
- Section 16.25.6: BPF Filter Attachment
- Section 16.25.7: Socket Options
- Section 16.25.8: RX Delivery Path
- Section 16.25.9: TX Path
- Section 16.25.10: Namespace Isolation
- Section 16.25.11: Tier Assignment
- Section 16.26: AF_XDP -- eXpress Data Path Socket
- Section 16.26.1: Socket Creation and Binding
- Section 16.26.2: UMEM (User Memory Region)
- Section 16.26.3: Ring Buffer Protocol
- Section 16.26.4: SockAddrXdp
- Section 16.26.5: XDP Program Integration
- Section 16.26.6: Zero-Copy Mode
- Section 16.26.7: Copy Mode Fallback
- Section 16.26.8: NEED_WAKEUP Mechanism
- Section 16.26.9: Multi-Buffer Support
- Section 16.26.10: XskSocket Kernel State
- Section 16.26.11: Namespace Isolation
- Section 16.26.12: setsockopt / getsockopt Interface
- Section 16.26.13: Comparison with Alternatives
- Section 16.26.14: Shared UMEM
- Section 16.26.15: Per-Architecture Notes
- Section 16.26.16: Performance Budget
- Section 16.26.17: DPDK Migration Path
- Section 16.26.18: Tier Assignment
- Section 16.27: 802.1Q VLAN Subsystem
- Section 16.27.1: Overview
- Section 16.27.2: VLAN Device Model
- Section 16.27.3: Transmit Path
- Section 16.27.4: Receive Path
- Section 16.27.5: GARP and MRP
- Section 16.27.6: Userspace Interface
- Section 16.27.7: Bridge Integration
- Section 16.27.8: Linux Compatibility
- Section 16.28: NIC Bonding and Link Aggregation
- Section 16.28.1: BondDevice Structure
- Section 16.28.2: Bond Modes
- Section 16.28.3: BondSlave
- Section 16.28.4: LACP (802.3ad) Protocol
- Section 16.28.5: Link Monitoring
- Section 16.28.6: TX Hash Policies
- Section 16.28.7: Bond Parameters
- Section 16.28.8: Netlink and Sysfs Interface
- Section 16.28.9: Failover Behavior
- Section 16.28.10: Network Namespace Integration
- Section 16.28.11: Feature Negotiation
- Section 16.28.12: Linux Compatibility
- Section 16.29: Multicast Routing
- Section 16.29.1: IGMP (Internet Group Management Protocol)
- Section 16.29.2: MLD (Multicast Listener Discovery) -- IPv6
- Section 16.29.3: Multicast Forwarding Cache (MFC)
- Section 16.29.4: Virtual Interface (VIF)
- Section 16.29.5: MRT Socket API
- Section 16.29.6: Upcall Mechanism
- Section 16.29.7: PIM Register Tunnel
- Section 16.29.8: IPv6 Multicast Routing
- Section 16.29.9: Per-Namespace Multicast Routing State
- Section 16.29.10: Sysctls
- Section 16.29.11: Cross-References
- Section 16.30: IPVS — IP Virtual Server
- Section 16.30.1: Overview
- Section 16.30.2: Data Structures
- Section 16.30.3: Scheduling Algorithms
- Section 16.30.4: Connection Table
- Section 16.30.5: Health Checking Integration
- Section 16.30.6: Userspace Interface
- Section 16.30.7: IPVS and Kubernetes kube-proxy
- Section 16.30.8: Linux Compatibility
- Section 16.31: Network Service Provider
- Section 16.31.1: Motivation
- Section 16.31.2: Service Provider and Wire Protocol
- Section 16.31.3: Access Modes
- Section 16.31.4: Wire Protocol
- Section 16.31.5: Integration with IP Stack
- Section 16.31.6: Drain Protocol
- Section 16.31.7: Relationship to DPU NIC Offload
Chapter 17: Containers and Namespaces
- Section 17.1: Namespace Architecture
- Section 17.1.1: Capability Domain Mapping
- Section 17.1.2: Namespace Implementation
- Section 17.1.3: Container Root Filesystem: pivot_root(2)
- Section 17.1.4: Joining Namespaces: setns(2) and nsenter
- Section 17.1.5: Namespace Hierarchy and Inheritance
- Section 17.1.6: User Namespace UID/GID Mapping Security
- Section 17.1.7: User Namespace Mount Restrictions
- Section 17.1.8: Devtmpfs Namespace Awareness
- Section 17.1.9: Security Policy Integration
- Section 17.1.10: Cross-Node Namespace ID Translation
- Section 17.2: Control Groups (Cgroups v2)
- Section 17.2.1: Core Data Structures
- Section 17.2.2: CPU Controller Integration
- Section 17.2.3: Memory Controller Integration
- Section 17.2.4: I/O Controller Integration
- Section 17.2.5: PIDs Controller (Fork Bomb Prevention)
- Section 17.2.6: Cpuset Controller (CPU and NUMA Pinning)
- Section 17.2.7: Freezer (Cgroup Pause/Resume)
- Section 17.2.8: Additional Controllers
- Section 17.2.9: Cgroup v1 Compatibility Translation
- Section 17.3: POSIX Inter-Process Communication (IPC)
- Section 17.3.1: AF_UNIX Sockets
- Section 17.3.2: Pipes and FIFOs
- Section 17.3.3: Shared Memory (POSIX and SysV)
- Section 17.3.4: IPC Namespace Dispatch (SysV IPC)
Chapter 18: Virtualization
- Section 18.1: Host and Guest Integration
- Section 18.1.1: KVM Host-Side Implementation
- Section 18.2: KVM Architecture Backends
- Section 18.2.1: x86-64 VMX Implementation
- Section 18.2.2: AArch64 Host-Side Implementation
- Section 18.2.3: RISC-V Host-Side Implementation
- Section 18.2.4: LoongArch64 LVZ Implementation
- Section 18.2.5: PPC64LE KVM-HV Implementation
- Section 18.3: KVM Operational Integration
- Section 18.3.1: vCPU Scheduling Integration
- Section 18.3.2: In-Kernel Device Models
- Section 18.3.3: Nested Virtualization
- Section 18.3.4: KVM Crash Recovery and vCPU Thread Management
- Section 18.3.5: Guest Memory Integration with VMM Reclaim and NUMA
- Section 18.3.6: Post-Copy SwitchToPostCopy and mmap_lock Latency
- Section 18.3.7: vGIC — Virtual GICv3 Emulation
- Section 18.4: Suspend and Resume
- Section 18.4.1: Suspend Modes
- Section 18.4.2: Device State Save/Restore Ordering
- Section 18.4.3: CPU State Save/Restore
- Section 18.4.4: Memory Handling
- Section 18.4.5: Timer Re-synchronization
- Section 18.4.6: Interrupt Controller State
- Section 18.5: VFIO and iommufd — Device Passthrough Framework
- Section 18.5.1: VFIO Object Model
- Section 18.5.2: iommufd Object Model
- Section 18.5.3: ioctl Interface
- Section 18.5.4: irqbypass — Zero-Latency Interrupt Delivery
- Section 18.5.5: KVM Integration
- Section 18.5.6: SR-IOV and VF Passthrough
- Section 18.5.7: Security Model
- Section 18.5.8: VFIO Unbind and Isolation Domain Teardown
- Section 18.5.9: Integration with UmkaOS IOMMU (Section 11.4)
Chapter 19: System API
- Section 19.1: Syscall Interface
- Section 19.1.1: Design Goal
- Section 19.1.2: Syscall Dispatch Architecture
- Section 19.1.3: Foundational ABI Types
- Section 19.1.4: Virtual Filesystems
- Section 19.1.5: Complete Feature Coverage
- Section 19.1.6: Modern File Descriptor Operations
- Section 19.1.7: Signal Handling
- Section 19.1.8: Capability and Credential Syscalls
- Section 19.1.9: Scheduling Syscalls
- Section 19.1.10: Key Management Syscalls
- Section 19.1.11: cgroups: v2 Native with v1 Compatibility Shim
- Section 19.1.12: Event Notification (epoll, poll, select)
- Section 19.2: eBPF Subsystem
- Section 19.2.2: eBPF Verifier Architecture
- Section 19.2.3: eBPF Verifier Risk Mitigation
- Section 19.2.4: BPF Isolation Model
- Section 19.2.5: eBPF Helper Function IDs and Dispatch Table
- Section 19.2.6: TC/Classifier BPF Program Context (SkBuff)
- Section 19.2.7: BPF Kfunc Framework
- Section 19.2.8: KABI Tracepoint Ring for Tier 1 Tracing
- Section 19.3: io_uring Subsystem
- Section 19.3.1: io_uring VFS Integration
- Section 19.3.2: Credential Personalities (IORING_REGISTER_PERSONALITY)
- Section 19.3.3: Direct I/O Path (O_DIRECT)
- Section 19.3.4: io_uring Exit Cleanup
- Section 19.3.5: Direct I/O Operations (O_DIRECT)
- Section 19.4: Futex and Userspace Synchronization
- Section 19.4.1: Futex Implementation
- Section 19.4.2: Priority-Inheritance Futexes (PI)
- Section 19.4.3: Robust Futexes
- Section 19.4.4: futex2 (FUTEX_WAITV)
- Section 19.4.5: Physical Page Stability for Shared Futex Keys
- Section 19.4.6: Cross-Domain Futex Considerations
- Section 19.4.7: UmkaOS Simplified Futex API
- Section 19.5: Netlink Event Compatibility
- Section 19.5.1: NETLINK_KOBJECT_UEVENT (Device Events)
- Section 19.5.2: NETLINK_ROUTE (Network Events)
- Section 19.5.3: NETLINK_GENERIC (Generic Netlink)
- Section 19.5.4: Other Netlink Families
- Section 19.6: Windows Emulation Acceleration (WEA)
- Section 19.6.1: Capability Gating
- Section 19.6.2: NT Object Manager
- Section 19.6.3: Fast Synchronization Primitives
- Section 19.6.4: I/O Completion Ports (IOCP)
- Section 19.6.5: Memory Management Acceleration
- Section 19.6.6: NT Thread Model and Fiber Support
- Section 19.6.7: Security & Token Model
- Section 19.6.8: Structured Exception Handling (SEH)
- Section 19.6.9: Performance: Projected Comparison
- Section 19.6.10: API Surface & Stability
- Section 19.6.11: Implementation Roadmap
- Section 19.6.12: Benefits Summary
- Section 19.6.13: Open Questions
- Section 19.7: Deliberately Dropped Compatibility
- Section 19.8: UmkaOS Native Syscall Interface
- Section 19.8.1: Motivation
- Section 19.8.2: Design Principles
- Section 19.8.3: Syscall Families
- Section 19.8.4: Userspace Library
- Section 19.8.5: Relationship to Linux Syscalls
- Section 19.9: Safe Kernel Extensibility
- Section 19.9.1: The Paradigm
- Section 19.9.2: Extensible Policy Points
- Section 19.9.3: Policy Module Trust Boundary
- Section 19.9.4: Side-Channel Mitigations
- Section 19.9.5: Module Lifecycle
- Section 19.9.6: KabiPolicyManifest
- Section 19.9.7: KABI Vtable Wrappers for Policy Traits
- Section 19.9.8: Stateless Policy Swap Watchdog
- Section 19.9.9: Relationship to eBPF
- Section 19.9.10: Linux Compatibility
- Section 19.9.11: Performance Impact
- Section 19.9.12: Policy Module Error Handling and Fallback
- Section 19.10: Special File Descriptor Objects
- Section 19.10.1: eventfd — Event Notification Counter
- Section 19.10.2: signalfd — Signal Delivery via File Descriptor
- Section 19.10.3: timerfd — Timer Notification via File Descriptor
- Section 19.10.4: pidfd — Process File Descriptor
- Section 19.10.5: Linux Compatibility Reference
- Section 19.10.6: UmkaOS Typed Event Notification API
- Section 19.11: Legacy AIO (Asynchronous I/O)
- Section 19.11.1: Syscall Interface
- Section 19.11.2: ABI Structures
- Section 19.11.3: IOCB_CMD Opcodes
- Section 19.11.4: AioContext (Internal Kernel State)
- Section 19.11.5: AioCompletionRing (Shared Memory Ring)
- Section 19.11.6: Submission Path (io_submit)
- Section 19.11.7: Completion Path
- Section 19.11.8: io_getevents Blocking Behavior
- Section 19.11.9: io_cancel
- Section 19.11.10: Cleanup (io_destroy)
- Section 19.11.11: Resource Limits
- Section 19.11.12: Performance Considerations
Chapter 20: Observability and Diagnostics
- Section 20.1: Fault Management Architecture
- Section 20.1.1: Problem
- Section 20.1.2: Architecture
- Section 20.1.3: Telemetry Collection
- Section 20.1.4: Telemetry Buffer
- Section 20.1.5: Diagnosis Engine
- Section 20.1.6: Response Executor
- Section 20.1.7: Linux Interface Exposure
- Section 20.1.8: FaultEvent — Structured FMA Event Type
- Section 20.1.9: Health Score and Escalation Policy
- Section 20.1.10: FMA to umkafs Publishing
- Section 20.2: Stable Tracepoint ABI
- Section 20.2.1: Problem
- Section 20.2.2: Two Categories of Tracepoints
- Section 20.2.3: Stable Tracepoint Interface
- Section 20.2.4: Zero-Overhead When Disabled
- Section 20.2.5: Stable Tracepoint Catalog
- Section 20.2.6: Versioning Rules
- Section 20.2.7: Built-In Aggregation Maps
- Section 20.2.8: Linux Tool Compatibility
- Section 20.2.9: Audit Subsystem
- Section 20.3: Audit Subsystem
- Section 20.3.1: Overview
- Section 20.3.2: Audit State
- Section 20.3.3: Audit Events
- Section 20.3.4: Audit Rules
- Section 20.3.5: Netlink Protocol
- Section 20.3.6: Syscall Audit Path
- Section 20.3.7: Login UID and Session Tracking
- Section 20.3.8: LSM Audit Integration
- Section 20.3.9: Performance Considerations
- Section 20.3.10: Namespace Scoping
- Section 20.3.11: Cross-References
- Section 20.4: Debugging and Process Inspection
- Section 20.4.1: Capability-Gated ptrace
- Section 20.4.2: Ptrace Lifecycle
- Section 20.4.3: Hardware Debug Registers
- Section 20.4.4: Core Dump Generation
- Section 20.4.5: Kernel Debugging and Crash Dumps
- Section 20.4.6: /proc/pid Interface
- Section 20.5: Unified Object Namespace
- Section 20.5.1: Problem
- Section 20.5.2: Design: Kernel-Internal Object Tree
- Section 20.5.3: Namespace Layout
- Section 20.5.4: What The Namespace Provides
- Section 20.5.5: Registration Strategy: Eager vs Lazy
- Section 20.5.6: Linux Interface Exposure — Standard Mechanisms
- Section 20.5.7: umkafs Detail
- Section 20.5.8: Admin Operations via umkafs (write)
- Section 20.5.9: How Subsystems Register Objects
- Section 20.5.10: Device Naming and Registration
- Section 20.5.11: Relationship to Existing Interfaces
- Section 20.5.12: Unified Management CLI (umkactl)
- Section 20.6: EDAC — Error Detection and Correction Framework
- Section 20.6.1: Architecture
- Section 20.6.2: Core Data Structures
- Section 20.6.3: Error Reporting
- Section 20.6.4: sysfs Interface
- Section 20.6.5: umkafs Integration
- Section 20.6.6: Integration with FMA
- Section 20.6.7: Polling Mechanism
- Section 20.7: pstore — Panic Log Persistence
- Section 20.7.1: Architecture
- Section 20.7.2: Backend Interface
- Section 20.7.3: EFI Backend (efi_pstore)
- Section 20.7.4: Ramoops Backend
- Section 20.7.5: pstorefs
- Section 20.7.6: Panic Handler Integration
- Section 20.7.7: umkafs Integration
- Section 20.8: Performance Monitoring Unit (perf_event_open)
- Section 20.8.1: Syscall Interface
- Section 20.8.2: perf_event_attr Wire Format
- Section 20.8.3: Event Types (PERF_TYPE_*)
- Section 20.8.4: Sample Type Flags (PERF_SAMPLE_*)
- Section 20.8.5: Internal Data Structures
- Section 20.8.6: perf_event_mmap_page Header
- Section 20.8.7: PMU Driver Trait (PmuOps)
- Section 20.8.8: Architecture PMU Implementations
- Section 20.8.9: Sampling Overflow Handling
- Section 20.8.10: Ring Buffer Write Protocol
- Section 20.8.11: Event Multiplexing
- Section 20.8.12: ioctl Operations
- Section 20.8.13: /proc/sys Tunables and Linux Compatibility
- Section 20.8.14: Namespace-Aware Perf Output
- Section 20.8.15: Perf Event Exit Cleanup
- Section 20.9: Kernel Parameter Store (Typed Sysctl)
- Section 20.9.1: Problem with /proc/sys/ Sysctl
- Section 20.9.2: Design: Typed Parameter Descriptors
- Section 20.9.3: umkafs Layout
- Section 20.9.4: Read and Write Protocol
- Section 20.9.5: Schema Validation on Write
- Section 20.9.6: Registration Macro
- Section 20.9.7: eBPF Access
- Section 20.9.8: Per-Namespace Parameter Scoping
- Section 20.9.9: Inter-Parameter Validation
- Section 20.9.10: Boot Parameter Registry
- Section 20.9.11: Bulk Enumeration
- Section 20.9.12: Linux Compatibility
Chapter 21: User I/O Subsystems
- Section 21.1: TTY and PTY Subsystem
- Section 21.1.1: The Problem
- Section 21.1.2: UmkaOS's Lock-Free Ring Architecture
- Section 21.1.3: Character Device Registration
- Section 21.1.4: The devpts Pseudo-Filesystem
- Section 21.1.5: Asynchronous Line Disciplines (N_TTY)
- Section 21.1.6: Serial TTY — Full POSIX termios and Modem Control
- Section 21.1.7: Serial Service Provider (Cluster-Wide Serial Access)
- Section 21.2: Console Framework and Kernel Logging
- Section 21.2.1: Kernel Log Ring Buffer
- Section 21.2.2: Console Framework
- Section 21.2.3: Kernel Command Line Console Parameters
- Section 21.2.4: Serial Console Backend
- Section 21.2.5: Netconsole
- Section 21.2.6: Panic Console Path
- Section 21.2.7: Boot Phase Integration
- Section 21.3: Input Subsystem (evdev)
- Section 21.3.1: Tier 2 Input Drivers
- Section 21.3.2: Input Device Registration
- Section 21.3.3: Secure VT Switching and Panic Console
- Section 21.4: Audio Architecture (ALSA Compatibility)
- Section 21.4.1: ALSA PCM as DMA Rings
- Section 21.4.2: Audio Driver Tier Policy and Resilience
- Section 21.4.3: Audio Device Trait
- Section 21.4.4: Intel HDA Driver Model
- Section 21.4.5: USB Audio Class 2.0 Driver Model
- Section 21.4.6: HDMI/DP Audio Endpoint Model
- Section 21.4.7: PipeWire Integration
- Section 21.4.8: Character Device Registration
- Section 21.4.9: ALSA PCM Compatibility Ioctls
- Section 21.4.10: Jack Detection
- Section 21.4.11: Architectural Decision
- Section 21.4.12: ALSA MIDI Sequencer
- Section 21.4.13: ALSA Timer Interface
- Section 21.4.14: ALSA Hardware-Dependent (hwdep) Interface
- Section 21.4.15: ALSA Control Interface
- Section 21.5: Display and Graphics (DRM/KMS)
- Section 21.5.1: DRM as a Tier 1 Subsystem
- Section 21.5.2: DMA-BUF and Secure File Descriptor Passing
- Section 21.5.3: Display Device Model
- Section 21.5.4: Atomic Modesetting Protocol
- Section 21.5.5: Framebuffer Objects
- Section 21.5.6: Scanout Planes
- Section 21.5.7: Hotplug Detection
- Section 21.5.8: Panel Self-Refresh (PSR)
- Section 21.5.9: Variable Refresh Rate (VRR)
- Section 21.5.10: VBlank Handling and Synchronization
- Section 21.5.11: Multi-Monitor Coordination
- Section 21.5.12: Display Register Abstraction
- Section 21.5.13: DRM/KMS Compatibility Interface
- Section 21.5.14: Architectural Decision
Chapter 22: AI/ML and Accelerators
- Section 22.1: Unified Accelerator Framework
- Section 22.1.1: Motivation
- Section 22.1.2: Unified Accelerator Framework
- Section 22.2: Kernel-Side Accelerator Scheduler
- Section 22.3: AccelComputeVTable — Compute-Specific Extensions
- Section 22.3.1: AccelDisplayVTable — Display-Capable Extensions
- Section 22.3.2: Command Buffer Creation Path
- Section 22.3.3: In-Flight Limits and Semaphore Dependency Validation
- Section 22.3.4: Scheduler Integration
- Section 22.3.5: Integration with UmkaOS Architecture
- Section 22.3.6: Device Registry Integration
- Section 22.3.7: Crash Recovery
- Section 22.3.8: GPU Firmware as Cluster Member (Future)
- Section 22.3.9: FMA Integration
- Section 22.3.10: Stable Tracepoints
- Section 22.3.11: ML Policy Observation Types
- Section 22.3.12: Object Namespace
- Section 22.3.13: Partial Failure Handling
- Section 22.3.14: Resolved Design Decisions
- Section 22.3.15: Shared Accelerator Helper Services
- Section 22.3.16: Implementation Phasing
- Section 22.3.17: Priority Rationale
- Section 22.3.18: Licensing Summary
- Section 22.4: Accelerator Memory and P2P DMA
- Section 22.4.1: Heterogeneous Memory Management
- Section 22.4.2: Peer-to-Peer DMA
- Section 22.4.3: GPUDirect Storage (GDS)
- Section 22.5: Accelerator Isolation and Scheduling
- Section 22.5.1: Capability-Based Access Control
- Section 22.5.2: Cgroup Integration
- Section 22.5.3: Memory Isolation
- Section 22.5.4: Compute Time Isolation
- Section 22.5.5: Device Partitioning
- Section 22.5.6: GPU Virtualization Modes
- Section 22.5.7: Hardware Reset-on-Timeout (HROT)
- Section 22.5.8: Multi-Instance GPU (MIG) Partitioning
- Section 22.5.9: GPU Time-Slicing
- Section 22.6: In-Kernel Inference Engine
- Section 22.6.1: Rationale
- Section 22.6.2: Constraints
- Section 22.6.3: Supported Model Types
- Section 22.6.4: Model Loading and Lifecycle
- Section 22.6.5: Use Cases
- Section 22.6.6: Safety Guarantees
- Section 22.6.7: Adversarial Robustness
- Section 22.6.8: Fallback Mode Safety Specification
- Section 22.6.9: Model Binary Format
- Section 22.6.10: Model Drift Detection and Retraining Pipeline
- Section 22.6.11: Tier 2 Inference Services
- Section 22.7: Accelerator Networking, RDMA, and Linux GPU Compatibility
- Section 22.7.1: RDMA and Collective Operations
- Section 22.7.2: Linux Compatibility Layer
- Section 22.7.3: Accelerator Service Provider
- Section 22.8: Unified Compute Model
- Section 22.8.1: The Convergence Problem
- Section 22.8.2: Design Principle: Overlay, Not Replacement
- Section 22.8.3: Multi-Dimensional Compute Capacity
- Section 22.8.4: Unified Compute Topology
- Section 22.8.5: Cross-Device Energy Optimization
- Section 22.8.6: Workload Profile Classification
- Section 22.8.7: Unified Cgroup Compute Budget (Optional)
- Section 22.8.8: Unified Memory Domain Tracking
- Section 22.8.9: NVIDIA Compatibility: No Changes Required
- Section 22.8.10: What the Kernel Does NOT Do
- Section 22.8.11: Sysfs Interface for Userspace Runtimes
- Section 22.8.12: Linux Compatibility
- Section 22.8.13: Convergence Path: Accelerators as Peer Kernel Nodes
- Section 22.8.14: Performance Impact
Chapter 23: AI/ML Policy Framework
- Section 23.1: AI/ML Policy Framework: Closed-Loop Kernel Intelligence
- Section 23.1.1: Design Principles
- Section 23.1.2: Kernel Observation Bus
- Section 23.1.3: Tunable Parameter Store
- Section 23.1.4: Policy Consumer KABI (Tier 2 → Kernel)
- Section 23.1.5: Subsystem Integration Catalog
- Section 23.1.6: Heavy Model Integration Pattern
- Section 23.1.7: Model Weight Update Flow
- Section 23.1.8: Security and Capability Model
- Section 23.1.9: Reference Policy Services
- Section 23.1.10: Performance Impact
Chapter 24: Roadmap and Verification
- Section 24.1: Driver Ecosystem Strategy
- Section 24.1.1: The Challenge
- Section 24.1.2: Agentic Driver Rewrite Project
- Section 24.1.3: Prioritized Driver List
- Section 24.1.4: Nvidia / Proprietary Driver Strategy
- Section 24.1.5: Community Incentive
- Section 24.1.6: Standalone UmkaOS Peer Protocol Specification
- Section 24.2: Implementation Phases
- Section 24.2.2: Phase 1: Foundations
- Section 24.2.3: Phase 2: Self-Hosting Shell + Tier 1 Fault Recovery
- Section 24.2.4: Phase 3: Real Workloads + Tier M Peer Demo
- Section 24.2.5: Phase 4: Production Ready
- Section 24.2.6: Phase 5: Ecosystem and Platform Maturity
- Section 24.2.7: Adoption Story: From Drivers to Distributed
- Section 24.2.8: Licensing Summary
- Section 24.2.9: Performance Impact Summary
- Section 24.3: Verification Strategy
- Section 24.3.1: Testing Layers
- Section 24.3.2: LTP as Agentic Compatibility Substrate
- Section 24.3.3: Key Benchmarks
- Section 24.3.4: Crash Recovery Testing
- Section 24.3.5: CI Pipeline
- Section 24.4: Formal Verification Readiness
- Section 24.4.1: The Opportunity
- Section 24.4.2: What To Verify
- Section 24.4.3: Design for Verifiability
- Section 24.4.4: Verification Tooling
- Section 24.4.5: Performance Impact
- Section 24.5: Technical Risks
- Section 24.5.1: Risks from Advanced Features (Chapters 16-18)
- Section 24.5.2: Risk Response Priority
- Section 24.5.3: Domain Grouping: Degraded Isolation Analysis
- Section 24.6: Appendices
- Section 24.7: Licensing Model: Open Kernel License Framework (OKLF) v1.3
- Section 24.8: Project Structure
- Section 24.9: What UmkaOS Provides That Linux Cannot
- Section 24.10: Cross-Feature Integration Map
- Section 24.10.1: Cross-Feature Integration Map
- Section 24.10.2: Implementation Dependency Graph
- Section 24.10.3: Cross-Feature Integration Testing Specification
- Section 24.11: Open Questions
- Section 24.11.1: Resolved Decisions (collapsed — full rationale in referenced sections)
- Section 24.11.2: Open Questions (genuinely unresolved)
- Section 24.12: KABI IDL Compiler Specification
Chapter 25: Agentic Development Methodology
- Section 25.1: Understanding the Bottleneck
- Section 25.1.1: What AI Agents Are Fast At
- Section 25.1.2: What AI Agents Are NOT Fast At
- Section 25.2: Development Model: Parallel Agentic Workflow
- Section 25.2.1: Agent Parallelization
- Section 25.2.2: Coordination Overhead
- Section 25.3: Phase-by-Phase Timeline (Agentic)
- Section 25.3.1: Phase 1.1: Core Kernel (all 8 architectures, minimal functionality)
- Section 25.3.2: Phase 2.1: Essential Drivers (NVMe, NIC, USB, I/O)
- Section 25.3.3: Phase 2.2: Linux Compatibility Layer
- Section 25.3.4: Phase 2.3: Networking Stack
- Section 25.3.5: Phase 3.1: Storage Stack (VFS, filesystems, DM/MD)
- Section 25.3.6: Phase 3.2: Advanced Features (Distributed, Observability, Power)
- Section 25.3.7: Phase 4.1: Consumer Hardware (WiFi, Bluetooth, Audio, Graphics)
- Section 25.3.8: Phase 5.1: Windows Emulation Acceleration (WEA)
- Section 25.4: Total Timeline (Sequential Phases)
- Section 25.5: Total Timeline (Optimized Parallelism)
- Section 25.6: What About Spec Bugs?
- Section 25.7: Hardware Bottlenecks
- Section 25.7.1: Real Hardware Testing Requirements
- Section 25.7.2: Specialized Hardware Acquisition
- Section 25.8: QEMU CPU Feature Testing Matrix
- Section 25.8.1: Design Principles
- Section 25.8.2: x86-64
- Section 25.8.3: AArch64
- Section 25.8.4: ARMv7
- Section 25.8.5: RISC-V 64
- Section 25.8.6: PPC32
- Section 25.8.7: PPC64LE
- Section 25.8.8: s390x
- Section 25.8.9: LoongArch64
- Section 25.8.10: CI Test Matrix (Recommended)
- Section 25.8.11: Known Testing Gaps
- Section 25.9: Human Involvement Required
- Section 25.9.1: Architectural Decisions (Non-Automatable)
- Section 25.9.2: Spec Review & Correction
- Section 25.9.3: External Coordination
- Section 25.10: Realistic Full Timeline (Agentic + Human)
- Section 25.11: Comparison: Human vs Agentic
- Section 25.12: Sensitivity Analysis: Slower Inference
- Section 25.13: Optimistic vs Pessimistic Scenarios
- Section 25.13.1: Best Case (Everything Goes Right)
- Section 25.13.2: Realistic Case (Some Issues)
- Section 25.13.3: Pessimistic Case (Major Problems)
- Section 25.14: What Determines Success?
- Section 25.15: Recommendations
- Section 25.15.1: Before Starting Implementation
- Section 25.15.2: During Implementation
- Section 25.15.3: Metrics to Track
- Section 25.16: Final Answer: Realistic Timeline
- Section 25.17: Agentic Live Development Workflow
- Section 25.17.1: Driver Development on a Live Host
- Section 25.17.2: Kernel Service Development via Live Replacement
- Section 25.17.3: Multikernel Testing Strategy
- Section 25.17.4: LTP as Agentic Development Substrate
- Section 25.17.5: Development Acceleration Summary
- Section 25.18: Linux Test Suite Inventory for Agentic Development
- Section 25.18.1: Tier 1: High-Value Test Suites (Pure Userspace API, Directly Usable)
- Section 25.18.2: Tier 2: Kselftest Subdirectories
- Section 25.18.3: Tier 3: Specialized External Suites
- Section 25.18.4: Coverage Map: UmkaOS Chapters × Available Test Suites
- Section 25.18.5: The Test-Driven Agentic Development Pattern