Chapter 11: KABI — Kernel Driver ABI
Stable driver ABI, KABI IDL, vtable design, driver signing, compatibility windows
11.1 Driver Model and Stable ABI (KABI)
11.1.1 The Problem We Solve
Linux has NO stable in-kernel ABI. This means:
- Drivers must recompile with every kernel update
- Nvidia ships binary blobs that constantly break
- DKMS rebuilds are fragile and frequently fail
- The community's answer is "get upstream or suffer"
- Enterprise customers cannot independently update kernel and drivers
UmkaOS provides a stable, versioned, append-only C-ABI (called KABI) that survives kernel updates. A driver compiled against KABI v1 will load and run correctly on any future kernel that supports KABI v1 -- without recompilation.
11.1.2 Interface Definition Language (.kabi)
All driver interfaces are defined in .kabi IDL files. The kabi-compiler tool
generates both Rust and C bindings from these definitions.
// interfaces/block_device.kabi
@version(1)
interface BlockDevice {
fn submit_io(op: IoOp, lba: u64, count: u32, buf: DmaBuffer) -> IoResult;
fn poll_completion(handle: RequestHandle) -> PollResult;
fn get_capabilities() -> BlockCapabilities;
}
@version(2) @extends(BlockDevice, 1)
interface BlockDeviceV2 {
fn discard_blocks(lba: u64, count: u32) -> IoResult;
fn zone_management(op: ZoneOp, zone: u64) -> ZoneResult;
}
This compiles down to a C-compatible vtable:
#[repr(C)]
pub struct BlockDeviceVTable {
pub vtable_size: u64, // Primary version discriminant
pub version: u32,
// V1 methods -- mandatory, never Option
pub submit_io: unsafe extern "C" fn(
ctx: *mut c_void, op: IoOp, lba: u64, count: u32, buf: DmaBuffer,
) -> IoResult,
pub poll_completion: unsafe extern "C" fn(
ctx: *mut c_void, handle: RequestHandle,
) -> PollResult,
pub get_capabilities: unsafe extern "C" fn(
ctx: *mut c_void,
) -> BlockCapabilities,
// V2 methods -- optional, wrapped in Option for graceful absence
pub discard_blocks: Option<unsafe extern "C" fn(
ctx: *mut c_void, lba: u64, count: u32,
) -> IoResult>,
pub zone_management: Option<unsafe extern "C" fn(
ctx: *mut c_void, op: ZoneOp, zone: u64,
) -> ZoneResult>,
}
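Dispatch over this vtable illustrates the `Option` pattern: the kernel must check a V2 slot before calling it. A minimal sketch, with the SDK types reduced to simple stand-ins (the names `MiniVTable`, `IO_ENOTSUP`, and the dropped `ctx` parameter are illustrative, not the real definitions):

```rust
// Simplified stand-ins for the SDK types (illustration only).
type IoResult = i32;
const IO_ENOTSUP: IoResult = -95;

#[repr(C)]
pub struct MiniVTable {
    pub vtable_size: u64,
    pub version: u32,
    // V2 method: optional, absent (None) in drivers built against KABI v1.
    pub discard_blocks: Option<unsafe extern "C" fn(lba: u64, count: u32) -> IoResult>,
}

/// Call discard_blocks if the driver provides it, else report "not supported".
pub fn try_discard(vt: &MiniVTable, lba: u64, count: u32) -> IoResult {
    match vt.discard_blocks {
        // SAFETY: a populated slot is a valid driver-provided function.
        Some(f) => unsafe { f(lba, count) }, // V2 driver: dispatch
        None => IO_ENOTSUP,                  // V1 driver: graceful degradation
    }
}

unsafe extern "C" fn discard_impl(_lba: u64, _count: u32) -> IoResult {
    0 // trivial stand-in driver implementation
}
```

`Option<unsafe extern "C" fn(...)>` is guaranteed by Rust to have the same size and ABI as a plain (nullable) function pointer, which is what makes this pattern sound across the C boundary.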
11.1.3 ABI Rules (Enforced by CI)
These rules are non-negotiable and enforced by the kabi-compat-check tool in CI:
- Vtables are append-only -- new methods are added at the end only.
- Existing methods are never removed, reordered, or changed in signature.
- All types crossing the ABI use `#[repr(C)]` with explicit sizes (`u32`, `u64`; never `usize`, which varies by platform).
- Enums use `#[repr(u32)]` with explicit discriminant values.
- New struct fields are appended only, never removed or reordered.
- The `vtable_size` field enables runtime version detection. A kernel can determine which methods are present by comparing `vtable_size` against known offsets.
- Padding fields are reserved and must be zero-initialized for forward compatibility.
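As a concrete instance of the enum rule, an ABI-crossing enum carries explicit discriminants so that appending a variant can never shift existing values (the type below is illustrative, not a real UmkaOS definition):

```rust
/// ABI-stable status enum: #[repr(u32)] with explicit discriminants.
/// New variants may only be appended with fresh, higher values; existing
/// discriminants are frozen forever.
#[repr(u32)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum IoStatus {
    Ok = 0,
    Timeout = 1,
    DeviceError = 2,
    // Appended in a later KABI version; values above are untouched.
    MediaRetired = 3,
}
```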
kabi-compat-check tool specification:
The CI tool enforces the rules above by diffing the current .kabi IDL against the
previous release baseline. The algorithm:
- Parse both versions: load `old.kabi` and `new.kabi` into an AST representation (vtable definitions, struct definitions, enum definitions, constant declarations).
- Vtable diff: for each vtable present in `old.kabi`:
  - Reject if any method was removed, reordered, or had its signature changed.
  - Accept appended methods (new entries after the old vtable's last method).
  - Reject if the vtable name or module path changed.
- Struct diff: for each `#[repr(C)]` struct in `old.kabi`:
  - Reject if any existing field was removed, reordered, or changed type.
  - Accept appended fields (new fields after the last old field).
  - Verify explicit padding (`_pad: [u8; N]`) is preserved (not repurposed).
  - Verify the `#[repr(C)]` attribute is present on all ABI-crossing types.
- Enum diff: for each `#[repr(u32)]` enum:
  - Reject if any existing variant was removed or had its discriminant changed.
  - Accept new variants appended at the end.
- Type size check: verify that no `usize`, `isize`, `bool`, or `Vec` appears in ABI-crossing types. Only fixed-width types (`u8`-`u64`, `i8`-`i64`, `f32`, `f64`, `*const T`, `*mut T`) and `#[repr(C)]` composites are permitted.
- Report: on any rule violation, emit a structured error identifying the breaking change (field name, old type, new type, line number in the `.kabi` file) and exit non-zero. CI treats this as a hard failure — no KABI-breaking change can merge.
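The vtable-diff step reduces to a prefix check over an ordered method list. A self-contained sketch of that one check, with method signatures modeled as plain strings (the real tool diffs ASTs, and the names here are illustrative):

```rust
/// Outcome of diffing one vtable's ordered method list between releases.
#[derive(Debug, PartialEq)]
pub enum VtableDiff {
    Compatible { appended: usize },
    Breaking(String),
}

/// Append-only rule: the old methods must be an exact, in-order prefix
/// of the new methods. Anything else is a breaking change.
pub fn diff_vtable(old: &[&str], new: &[&str]) -> VtableDiff {
    if new.len() < old.len() {
        return VtableDiff::Breaking("method removed".into());
    }
    for (i, (o, n)) in old.iter().zip(new.iter()).enumerate() {
        if o != n {
            return VtableDiff::Breaking(format!(
                "slot {i}: '{o}' removed, reordered, or signature-changed to '{n}'"
            ));
        }
    }
    VtableDiff::Compatible { appended: new.len() - old.len() }
}
```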
The .kabi files are the single source of truth for the stable ABI surface. They are
checked into the repository alongside the Rust source and are versioned with the KABI
major version number (e.g., kabi-v1.kabi, kabi-v2.kabi).
11.1.4 KABI Version Lifecycle and Deprecation Policy
Append-only vtables ensure forward compatibility indefinitely — a driver compiled against KABI v1 runs on a kernel implementing KABI v47. But without a deprecation policy, vtables grow without bound, accumulating dead methods that no driver uses, wasting cache lines, and complicating auditing. This section defines the lifecycle.
Version numbering — KABI versions are integer-incremented (v1, v2, v3...). Each version corresponds to a vtable layout. A new version is minted when methods are appended or struct fields are added (never removed or reordered). Major kernel releases bump the KABI version; minor releases do not.
Support window — each KABI version is supported for 5 major kernel releases from the release that introduced it. This provides a concrete, predictable window:
KABI v1: introduced in UmkaOS 1.0 → supported through UmkaOS 5.x → removed in 6.0
KABI v5: introduced in UmkaOS 5.0 → supported through UmkaOS 9.x → removed in 10.0
Deprecation process:
- Deprecation announcement (N-2 releases before removal): KABI v1 is marked deprecated when UmkaOS 4.0 ships. Loading a driver built against a deprecated KABI version logs a warning: `umka: driver nvme.ko uses deprecated KABI v1 (supported until UmkaOS 5.x, rebuild recommended)`
- Compatibility shim (during deprecation window): deprecated vtable methods are backed by shim implementations that translate old calls to current equivalents. This is a vtable-level adapter, not per-call overhead.
- Removal (at window expiry): when UmkaOS 6.0 ships, the KABI v1 compatibility shim is removed. Drivers compiled against KABI v1 fail to load with a clear error: `umka: driver nvme.ko requires KABI v1 (minimum supported: v2)`
- Never break within window: a driver compiled against any supported KABI version must load and function correctly. This is a hard contract, verified by CI testing with driver binaries compiled against every supported KABI version.
Vtable compaction — when a KABI version is removed, the kernel MAY reorganize internal vtable storage to reclaim space from removed shims. This is invisible to drivers (they see only their own KABI version's vtable layout, which never changes within the support window). Compaction is an implementation optimization, not a semantic change.
Practical impact — with annual major releases and a 5-release window, drivers have ~5 years before they must recompile. This is dramatically longer than Linux's "recompile every kernel update" reality, while avoiding the "append forever" problem.
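Under the stated assumptions (annual major releases, a 5-release window, deprecation announced two releases before removal), the lifecycle milestones are simple arithmetic. A sketch, not kernel code:

```rust
/// Major release in which a KABI version's compatibility shim is removed.
/// Assumes the 5-release support window described above: a version introduced
/// in release N is supported through (N + 4).x and removed in N + 5.
pub fn removal_release(introduced_in: u32) -> u32 {
    introduced_in + 5
}

/// Major release at which the deprecation warning starts (N-2 before removal).
pub fn deprecation_release(introduced_in: u32) -> u32 {
    removal_release(introduced_in) - 2
}
```

For KABI v1 (introduced in 1.0) this reproduces the schedule above: deprecation warnings from 4.0, removal at 6.0.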
11.1.5 Bilateral Capability Exchange
Unlike Linux's global kernel symbol table (EXPORT_SYMBOL), UmkaOS uses a bilateral
vtable exchange model. There are no global symbols, no symbol versioning, and no
uncontrolled dependencies.
Driver Loading Sequence:
1. Kernel resolves ONE well-known symbol: __kabi_driver_entry
2. Kernel passes KernelServicesVTable TO driver
(this is what the kernel provides to the driver)
3. Driver passes DriverVTable TO kernel
(this is what the driver provides to the kernel)
4. All further communication flows through these two vtables
5. No other symbols are resolved -- ever
The KernelServicesVTable is also versioned and append-only:
#[repr(C)]
pub struct KernelServicesVTable {
pub vtable_size: u64,
pub version: u32,
// Memory management.
// All sizes use u64, not usize, to maintain ABI stability across
// 32-bit (ARMv7, PPC32) and 64-bit targets (rule 3, Section 11.1.3).
pub alloc_dma_buffer: unsafe extern "C" fn(
size: u64, align: u64, flags: AllocFlags,
) -> AllocResult,
pub free_dma_buffer: unsafe extern "C" fn(
handle: DmaBufferHandle,
) -> FreeResult,
// Interrupt management
pub register_interrupt: unsafe extern "C" fn(
irq: u32, handler: InterruptHandler, ctx: *mut c_void,
) -> IrqResult,
pub deregister_interrupt: unsafe extern "C" fn(
irq: u32,
) -> IrqResult,
// Logging
pub log: unsafe extern "C" fn(
level: u32, msg: *const u8, len: u32,
),
// Ring buffer creation (added in v2)
pub create_ring_buffer: Option<unsafe extern "C" fn(
entries: u32, entry_size: u32, flags: RingFlags,
) -> RingResult>,
// ... extends over time, always append-only ...
}
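A minimal driver-side sketch of the bilateral exchange: the single exported entry point receives the kernel's services vtable and hands back the driver's. The vtable types here are cut-down stand-ins (the real signatures live in the SDK; a real driver would also mark the entry `#[no_mangle]` so the kernel can resolve it):

```rust
// Cut-down stand-in vtable types for illustration.
#[repr(C)]
pub struct KernelServices {
    pub vtable_size: u64,
    pub version: u32,
    pub log: unsafe extern "C" fn(level: u32, msg: *const u8, len: u32),
}

#[repr(C)]
pub struct DriverVTable {
    pub vtable_size: u64,
    pub version: u32,
}

static DRIVER_VTABLE: DriverVTable = DriverVTable {
    vtable_size: core::mem::size_of::<DriverVTable>() as u64,
    version: 1,
};

/// The one resolved symbol: receives kernel services, returns the driver vtable.
/// In a real driver this carries #[no_mangle] so the loader can find it.
pub extern "C" fn __kabi_driver_entry(kernel: *const KernelServices) -> *const DriverVTable {
    let msg = b"driver online";
    // SAFETY: the kernel guarantees `kernel` is valid for the duration of the call.
    unsafe { ((*kernel).log)(6, msg.as_ptr(), msg.len() as u32) };
    &DRIVER_VTABLE
}
```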
11.1.5.1 ValidatedCap: Amortized Capability Validation
Capability checks on every KABI dispatch add per-call overhead. The ValidatedCap
mechanism amortizes this cost: a caller validates a capability once against a driver
and receives a token. Subsequent dispatches present the token rather than
re-validating the full capability tree.
/// A validate-once token for KABI dispatch.
/// Created by presenting a `Capability` to the target driver domain.
/// Valid only while the driver instance that produced it remains alive.
pub struct ValidatedCap {
/// The underlying capability.
pub cap: Capability,
/// Driver domain that validated this capability.
/// Checked on every dispatch to prevent cross-domain misuse.
pub domain_id: DriverDomainId,
/// Generation counter of the driver domain at validation time.
/// The driver domain's `generation` field is incremented each time
/// the driver crashes and reloads. A mismatch means the token is stale.
pub driver_generation: u64,
/// Opaque rights bitmask extracted from the capability at validation time.
/// Cached here to avoid re-parsing the capability on each dispatch.
pub cached_rights: u32,
}
Using a ValidatedCap: The KABI dispatch trampoline checks the token before forwarding the call to the driver:
fn kabi_dispatch_with_vcap(
vcap: &ValidatedCap,
domain: &DriverDomain,
request: &KabiRequest,
) -> Result<KabiResponse, KabiError> {
// Single atomic load — L1-resident, ~1-3 cycles.
let current_gen = domain.generation.load(Ordering::Acquire);
if vcap.domain_id != domain.id || vcap.driver_generation != current_gen {
return Err(KabiError::StaleValidatedCap);
}
// Token is fresh: dispatch without re-validating the full capability.
dispatch_to_domain(domain, request, vcap.cached_rights)
}
11.1.5.2 ValidatedCap Invalidation on Driver Crash
When a Tier 1 driver crashes and is reloaded, any ValidatedCap tokens issued
against the old instance are stale. The generation counter mechanism closes this
window without requiring a global scan of all callers.
DriverDomain generation counter:
// All generation counters in UmkaOS use u64.
// Rationale: even at one billion crash-reload cycles per second (two
// increments per cycle, far beyond any realistic scenario), a u64 takes
// roughly 290 years to wrap. Silent u64→u32 truncation in generation
// comparisons could allow stale handles to pass validation after a counter
// wraparound, creating a security vulnerability. The uniform u64 policy
// prevents this class of bug entirely.
/// Kernel-managed state for a Tier 1 driver isolation domain.
/// Stored in umka-core memory (never in the driver's own domain) so it
/// remains valid and writable after the driver domain is torn down.
pub struct DriverDomain {
/// Unique domain identifier. Never reused after domain destruction.
pub id: DriverDomainId,
/// Generation counter. Starts at 1 (odd = active). Incremented to an
/// even value on crash (marking inactive), then to the next odd value
/// when the replacement driver instance is ready (marking active again).
/// Stored with `Ordering::SeqCst` writes, `Ordering::Acquire` reads.
pub generation: AtomicU64,
// ... isolation key, ring buffer references, etc.
}
On driver crash (performed by the domain fault handler before teardown):
- The fault handler atomically increments `DriverDomain::generation` from odd (active) to even (inactive): `domain.generation.fetch_add(1, Ordering::SeqCst);`
- Any subsequent `kabi_dispatch_with_vcap` call that compares `vcap.driver_generation` against the now-even generation will find a mismatch and return `KabiError::StaleValidatedCap` immediately — no stale dispatch reaches the crashed (or reloaded) driver.
- After the replacement driver completes initialization, the fault handler increments `generation` again (even → odd), activating the new instance.
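The odd/even protocol can be sketched as a pair of transitions over an `AtomicU64` plus the freshness check used by the trampoline. This is a simplified model (the real state lives in `DriverDomain` inside umka-core):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Simplified domain model: odd generation = active, even = inactive.
pub struct Domain {
    pub generation: AtomicU64,
}

impl Domain {
    pub fn new() -> Self {
        Domain { generation: AtomicU64::new(1) } // starts at 1 (odd = active)
    }
    /// Fault handler: odd -> even, marking the domain crashed.
    pub fn mark_crashed(&self) {
        debug_assert!(self.generation.load(Ordering::Acquire) % 2 == 1);
        self.generation.fetch_add(1, Ordering::SeqCst);
    }
    /// Replacement driver ready: even -> odd, reactivating the domain.
    pub fn mark_recovered(&self) {
        debug_assert!(self.generation.load(Ordering::Acquire) % 2 == 0);
        self.generation.fetch_add(1, Ordering::SeqCst);
    }
    /// The check performed by kabi_dispatch_with_vcap: exact match only.
    pub fn token_is_fresh(&self, token_generation: u64) -> bool {
        self.generation.load(Ordering::Acquire) == token_generation
    }
}
```

Note that a token minted before the crash stays stale even after recovery: the new instance runs at generation 3, not 1, so callers are forced through re-validation.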
Per-CPU ValidatedCap cache flush:
Tier 1 drivers maintain a per-CPU cache of recently issued ValidatedCap tokens
(up to 16 entries per CPU) to avoid repeatedly re-creating tokens for the same
capability. When a domain's generation is incremented, these caches must be purged:
- The fault handler sets a `cap_flush_pending` bit in each CPU's `CpuLocal` data, targeted at the crashing domain's `DriverDomainId`.
- The fault handler issues a cross-CPU IPI to all CPUs that have touched this domain since the last quiescent state (tracked via a per-domain CPU bitmask updated on each KABI call).
- Each IPI handler clears all `ValidatedCap` cache entries with `domain_id == crashed_domain.id`.
- The IPI completes before the fault handler releases the domain's memory. Any in-flight dispatch that passed the generation check before the IPI but has not yet completed will fault into the (now revoked) isolation domain and be caught by the domain fault handler — returning `KabiError::StaleValidatedCap` to the caller via the crash-recovery path, not a kernel panic.
Caller recovery: A caller that receives KabiError::StaleValidatedCap must:
- Discard the stale `ValidatedCap`.
- Wait for the driver to recover (poll the domain's generation for an odd value, or use the `service_recovered` callback defined in Section 10.5.8.5a).
- Re-validate the original `Capability` against the new driver instance to obtain a fresh `ValidatedCap`.
This three-step recovery parallels the ServiceHandle re-open protocol described
in Section 10.5.8.5a. Both
mechanisms force callers to observe the crash boundary rather than silently continuing
through a reloaded driver. The difference in representation: KabiServiceHandle (this
chapter) is the C-ABI stable handle with a generation counter (detects driver
instance replacement); liveness is maintained by the capability system, not by a
per-handle reference count. ServiceHandle (Section 10.5.8.5a) is the lower-level
cross-domain handle used in the trampoline dispatch path. A KabiServiceHandle is
resolved from a ServiceHandle — the registry fills in the generation from the
provider's current state_generation at lookup time.
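The three-step recovery can be sketched as a bounded polling loop. The helper below is hypothetical (a real caller would block on the `service_recovered` callback rather than spin, and `revalidate` stands in for presenting the original `Capability` again):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Simplified token for this sketch.
pub struct ValidatedCap {
    pub driver_generation: u64,
    pub cached_rights: u32,
}

/// Poll the domain generation until it is odd (active), then mint a fresh
/// token bound to the new generation. Returns None if the driver never
/// recovers within `max_polls` iterations.
pub fn recover_vcap(
    generation: &AtomicU64,
    revalidate: impl Fn(u64) -> ValidatedCap,
    max_polls: u32,
) -> Option<ValidatedCap> {
    for _ in 0..max_polls {
        let g = generation.load(Ordering::Acquire);
        if g % 2 == 1 {
            // Step 3: re-validate against the new instance.
            return Some(revalidate(g));
        }
        std::hint::spin_loop(); // real code: block on service_recovered
    }
    None
}
```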
Performance: The generation check is a single Ordering::Acquire atomic load
(~3-5 cycles, L1-resident in the domain descriptor cache line). This is cheaper than
re-validating the capability tree on each dispatch. The IPI on crash is infrequent
(driver crashes are exceptional events) and bounded to CPUs that have actively used
the domain.
11.1.5.3 Generation Counter Wrap Policy
The DriverDomain::generation counter is a u64 that increments by 1 on every driver
crash (odd → even) and again when the replacement driver becomes ready (even → odd).
Two increments per crash cycle means up to ~9.2 × 10^18 crash cycles before the counter
reaches u64::MAX. At one billion crash-reload cycles per second (far beyond any realistic
scenario) the counter would take roughly 290 years to exhaust. Although this is
effectively unreachable in practice, production correctness requires explicit handling
of wrap so that no combination of inputs can produce a silent invariant violation.
On increment — wrap detection:
/// Increment the generation counter from odd (active) to even (inactive),
/// marking the domain as crashed. Returns Err(EOVERFLOW) if the next
/// generation value would be zero (wrap boundary).
pub fn mark_crashed(domain: &DriverDomain) -> Result<(), KernelError> {
let prev = domain.generation.load(Ordering::Acquire);
let next = prev.wrapping_add(1);
if next == 0 {
// The counter has wrapped. The slot must be reset by the operator
// before it can accept another driver load. Log and refuse.
log::error!(
"DriverDomain {:?}: generation counter exhausted after {} cycles. \
Operator must clear the slot before reloading.",
domain.id,
prev / 2,
);
return Err(KernelError::EOVERFLOW);
}
domain.generation.store(next, Ordering::SeqCst);
Ok(())
}
When EOVERFLOW is returned, the crash recovery path logs the event to the FMA
telemetry ring (Section 19.1)
and marks the driver slot as SlotState::GenerationExhausted. The device node
remains in the registry (so userspace observability tools can see its state) but all
driver_load() calls for that slot return -EOVERFLOW until an operator issues
driver_slot_reset() via the management KABI. The reset reinitializes the generation
counter to 1 and clears the exhausted state, allowing the slot to be used again.
Outstanding handles — invalidation across wrap:
Any ValidatedCap or DriverHandle that carries a generation value from before the
wrap is automatically invalidated by the existing mismatch check in
kabi_dispatch_with_vcap (Section 11.1.5.1): the current domain generation (now reset
to 1 after operator reset) will not match any previously issued token carrying a
generation value near u64::MAX. No additional logic is required.
HMAC key rotation across wrap:
The DriverHmacKey.generation field (Section 10.5.3.1) is an input to the HKDF
derivation. After an operator slot reset the generation restarts at 1. A key derived
with generation=1 after a reset is cryptographically distinct from a key derived with
generation=1 at initial driver load, because HKDF's Info field includes a
time_of_creation field (a monotonic kernel timestamp, captured at key allocation
time). The DriverHmacKey struct carries this timestamp for generation-0-after-reset
disambiguation:
pub struct DriverHmacKey {
key: Zeroizing<[u8; 32]>, // zeroed on drop (zeroize crate wrapper)
driver_slot: DriverSlot,
generation: u64,
/// Monotonic nanosecond timestamp at which this key was created.
/// Included in HKDF `Info` to distinguish keys with the same (slot, generation)
/// pair that arise after a generation counter reset. Never exported.
created_at_ns: u64,
}
The HKDF Info field is therefore:
b"umka-driver-hmac" || slot_id.to_le_bytes() || generation.to_le_bytes() || created_at_ns.to_le_bytes()
This prevents an attacker who can observe an HMAC tag from a pre-reset driver from
replaying it against a post-reset driver with the same (slot, generation) tuple.
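The Info construction above reduces to byte concatenation. A sketch, assuming a u64 slot id for illustration (the width of `DriverSlot` is not specified here):

```rust
/// Build the HKDF `Info` bytes for driver HMAC key derivation.
/// Layout: 16-byte domain separator || slot_id || generation || created_at_ns,
/// all integers little-endian. Slot id width assumed u64 for this sketch.
pub fn hkdf_info(slot_id: u64, generation: u64, created_at_ns: u64) -> Vec<u8> {
    let mut info = Vec::with_capacity(16 + 8 + 8 + 8);
    info.extend_from_slice(b"umka-driver-hmac"); // 16-byte domain separator
    info.extend_from_slice(&slot_id.to_le_bytes());
    info.extend_from_slice(&generation.to_le_bytes());
    info.extend_from_slice(&created_at_ns.to_le_bytes());
    info
}
```

Two keys with the same (slot, generation) pair but different creation timestamps produce different Info bytes, which is exactly the post-reset disambiguation property the text relies on.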
Operational note:
Generation counter exhaustion requires approximately 9.2 × 10^18 crash-and-reload cycles on a single driver slot. At one crash per second (an extremely unstable driver), this takes over 292 billion years. The wrap check satisfies production correctness requirements and makes the invariant explicit in code, but it is not expected to trigger in any operational environment. Systems that monitor driver crash rates via FMA telemetry will have flagged a repeatedly crashing driver long before generation exhaustion becomes possible.
11.1.6 Version Negotiation
When a driver loads, version negotiation proceeds as follows:
1. Driver calls __kabi_driver_entry(kernel_vtable, &driver_vtable)
2. Driver reads kernel_vtable.version:
- If kernel version >= driver's minimum required version: proceed
- If kernel version < driver's minimum: return KABI_ERR_VERSION_MISMATCH
3. Kernel reads driver_vtable.version:
- If driver version >= kernel's minimum for this interface: proceed
- If driver version < kernel's minimum: reject driver with log message
4. Both sides use vtable_size to detect which optional methods are present
5. Optional methods (Option<fn>) are checked before each call
This allows:
- Old drivers on new kernels (new kernel methods are simply not called)
- New drivers on old kernels (driver checks for method presence, degrades gracefully)
- Independent kernel and driver update cycles
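Steps 2-3 of the negotiation can be sketched as a pure function over the two version numbers (the enum and parameter names are illustrative):

```rust
#[derive(Debug, PartialEq)]
pub enum Negotiation {
    Proceed,
    KernelTooOld, // driver returns KABI_ERR_VERSION_MISMATCH
    DriverTooOld, // kernel rejects the driver with a log message
}

/// Bilateral version check: each side enforces its own minimum.
pub fn negotiate(
    kernel_version: u32,
    driver_min_kernel: u32, // driver's minimum required kernel KABI version
    driver_version: u32,
    kernel_min_driver: u32, // kernel's minimum driver version for this interface
) -> Negotiation {
    if kernel_version < driver_min_kernel {
        Negotiation::KernelTooOld
    } else if driver_version < kernel_min_driver {
        Negotiation::DriverTooOld
    } else {
        Negotiation::Proceed
    }
}
```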
See also: Section 18.7 (Safe Kernel Extensibility) extends the KABI vtable pattern to kernel policy modules, enabling hot-swappable scheduler policies, memory policies, and fault handlers using the same append-only ABI mechanism.
11.1.6.1 Vtable Version Compatibility (Zero-Extension Contract)
Step 4 of the negotiation protocol above states that both sides use vtable_size to detect
which methods are present. This subsection defines the exact binary-level rules — the
zero-extension contract — for all three size relationships.
Case 1: Newer driver, older kernel (driver vtable larger than kernel expects)
The driver was compiled against a newer KABI version that appended methods the kernel does not know about.
Rule: The kernel reads only the first min(driver_vtable_size, KERNEL_VTABLE_SIZE)
bytes of the driver's vtable. Methods appended by the driver beyond KERNEL_VTABLE_SIZE
are silently ignored. The kernel MUST NOT access any byte at or beyond its own compiled
vtable size.
// At driver load time, in the kernel's vtable acceptance path:
let effective_size = driver_vtable.vtable_size.min(KERNEL_VTABLE_SIZE as u64);
// All subsequent method dispatches use effective_size as the bound.
This is safe because vtables are append-only (Section 11.1.3 Rule 1): the first
KERNEL_VTABLE_SIZE bytes of any newer driver vtable are layout-identical to the
kernel's own definition of that vtable.
Case 2: Older driver, newer kernel (driver vtable smaller than kernel expects)
The driver was compiled against an older KABI version that did not include methods the kernel now defines.
Rule: Every byte in the kernel's vtable definition that lies beyond
driver_vtable.vtable_size MUST be treated as zero. Concretely: any fn pointer in
a slot beyond the driver's declared size is null. The kernel MUST check for null before
calling any such method pointer and MUST use the per-method default declared in the IDL
(see below) when the pointer is null.
This rule extends to Option<fn(...)> fields that happen to fall inside the driver's
declared size but are nonetheless null: null-check is always required regardless of
whether the slot might be beyond vtable_size.
The canonical dispatch helper is:
/// Dispatch a vtable method with a bounds check, null check, and fallback default.
///
/// Evaluates to `$default` when the method slot lies beyond the driver's
/// declared `vtable_size`, or when the slot is `None` (an older driver that
/// predates this method).
///
/// # Safety
///
/// The caller must ensure `$vtable` points to a valid, immutable vtable of
/// type `$vt_ty` for the lifetime of the call.
macro_rules! kabi_call {
    ($vt_ty:ty, $vtable:expr, $method:ident, $default:expr $(, $args:expr)*) => {{
        let vt: *const $vt_ty = $vtable;
        let offset = core::mem::offset_of!($vt_ty, $method);
        let slot_end = offset + core::mem::size_of::<*const ()>();
        // SAFETY: vtable pointer validity is a precondition of kabi_call!.
        let declared = unsafe { (*vt).vtable_size } as usize;
        if slot_end <= declared {
            // SAFETY: the slot lies within the driver's declared vtable size.
            match unsafe { (*vt).$method } {
                Some(f) => unsafe { f($($args),*) },
                None => $default, // optional method left null by the driver
            }
        } else {
            $default // slot beyond the driver's vtable: zero-extension contract
        }
    }};
}
In practice, the macro is wrapped by per-method helper functions generated by
kabi-gen so driver authors and kernel subsystems never write the offset arithmetic
by hand.
Case 3: Exact version match
Both sides declare the same vtable_size. No special handling is required. All method
pointers within the vtable may still be null if they are declared Option<fn(...)> in
the IDL; null-checks on optional methods are always required regardless of version
match.
Per-method null-pointer defaults in the IDL
Each method in a .kabi IDL file MUST carry a default annotation. kabi-gen
uses this annotation to generate the fallback arm of the dispatch helper and to
enforce that every call site provides a default value. The annotation syntax is:
// Accepted default forms:
fn on_suspend() -> () = default_noop; // null → do nothing, return ()
fn get_capabilities() -> u64 = 0u64; // null → return literal zero
fn handle_interrupt() -> bool = false; // null → interrupt not claimed
fn custom_ioctl(cmd: u32, arg: u64)
-> KabiResult<u64, IoctlError>
= default_err(KabiError::NOT_SUPPORTED); // null → return error variant
Methods without a default annotation are mandatory: they must be non-null in
every driver vtable, and their absence causes driver load rejection (see below).
kabi-gen marks these fields as bare unsafe extern "C" fn(...) (not Option<fn>).
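As a sketch of the kind of wrapper kabi-gen would emit for `fn handle_interrupt() -> bool = false;`, assuming a simplified vtable with a `ctx` field (names and plumbing are illustrative, not actual generated output):

```rust
use core::ffi::c_void;

#[repr(C)]
pub struct DriverVTable {
    pub vtable_size: u64,
    pub version: u32,
    pub ctx: *mut c_void,
    // Optional method: null (None) in drivers that predate it.
    pub handle_interrupt: Option<unsafe extern "C" fn(ctx: *mut c_void) -> bool>,
}

/// Generated helper: applies the IDL default (`false`) when the slot is null,
/// so call sites never write the null check by hand.
pub fn handle_interrupt(vt: &DriverVTable) -> bool {
    match vt.handle_interrupt {
        // SAFETY: a non-null slot is a valid driver-provided function.
        Some(f) => unsafe { f(vt.ctx) },
        None => false, // IDL default: interrupt not claimed
    }
}
```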
Minimum vtable size and load rejection
If driver_vtable.vtable_size < KABI_MINIMUM_VTABLE_SIZE, the kernel MUST reject the
driver at load time with ENOEXEC and log:
umka: driver <name> rejected: vtable_size=<N> below minimum=<M> (KABI baseline v1)
KABI_MINIMUM_VTABLE_SIZE is the byte size of the vtable struct as it existed at KABI
v1 — the first release that established the mandatory baseline method set. It is a
compile-time constant derived from the v1 IDL snapshot and never changes (because KABI
v1 methods are never removed or reordered). A driver that cannot even fill the v1 layout
is either corrupt, built against a pre-release ABI, or targeting a different device
class entirely; loading it would be unsafe.
This check is distinct from the KABI version support-window check (Section 11.1.4):
a driver can be within the support window yet still produce ENOEXEC if its vtable
is truncated below the v1 baseline (e.g., due to a linker error).
Load rejection decision tree:
vtable_size < KABI_MINIMUM_VTABLE_SIZE → ENOEXEC (corrupt / pre-baseline)
driver version < minimum supported → ENOEXEC (outside support window)
vtable_size < KERNEL_VTABLE_SIZE → load OK, older driver (Case 2 above)
vtable_size = KERNEL_VTABLE_SIZE → load OK, exact match (Case 3)
vtable_size > KERNEL_VTABLE_SIZE → load OK, newer driver (Case 1 above)
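The decision tree maps directly to code. A sketch with the thresholds passed as parameters (in the real kernel, `KERNEL_VTABLE_SIZE` and `KABI_MINIMUM_VTABLE_SIZE` are compile-time constants):

```rust
#[derive(Debug, PartialEq)]
pub enum LoadDecision {
    RejectEnoexecCorrupt,     // below the KABI v1 baseline
    RejectEnoexecUnsupported, // outside the support window
    LoadOlderDriver,          // Case 2: zero-extend missing slots
    LoadExactMatch,           // Case 3
    LoadNewerDriver,          // Case 1: ignore trailing methods
}

/// Classify a driver load request per the decision tree above.
/// Checks run in order: baseline size, support window, then size relation.
pub fn classify_load(
    vtable_size: u64,
    driver_version: u32,
    kernel_vtable_size: u64,  // KERNEL_VTABLE_SIZE
    minimum_vtable_size: u64, // KABI_MINIMUM_VTABLE_SIZE (v1 baseline)
    minimum_version: u32,     // oldest KABI version in the support window
) -> LoadDecision {
    if vtable_size < minimum_vtable_size {
        LoadDecision::RejectEnoexecCorrupt
    } else if driver_version < minimum_version {
        LoadDecision::RejectEnoexecUnsupported
    } else if vtable_size < kernel_vtable_size {
        LoadDecision::LoadOlderDriver
    } else if vtable_size == kernel_vtable_size {
        LoadDecision::LoadExactMatch
    } else {
        LoadDecision::LoadNewerDriver
    }
}
```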
11.1.7 KABI IDL Language Specification
The .kabi IDL defines the stable driver ABI. The umka-kabi-gen compiler
transforms .kabi source files into C headers and Rust modules for use by
drivers and the kernel. This section is the canonical reference for authoring
.kabi files; implementors of umka-kabi-gen must conform to every rule
described here.
KabiResult<T, E> (used in vtable return types) is a #[repr(C)] result type
with a discriminant tag and a union payload, defined in
umka-driver-sdk/src/abi.rs. Rust's Result<T, E> has no guaranteed
#[repr(C)] layout and must never cross the ABI boundary directly.
The single resolved symbol: Drivers export exactly one symbol —
__kabi_driver_entry: extern "C" fn(*const KernelServicesVTable) -> *const DriverVTable.
The kernel calls this at load time, passing the kernel's services vtable and
receiving the driver's vtable in return. All subsequent communication flows
through vtable method calls; no further symbol resolution occurs. This
eliminates the entire class of symbol versioning problems that plagues Linux's
EXPORT_SYMBOL / MODULE_VERSION mechanism.
umka-kabi-gen vs kabi-compat-check: umka-kabi-gen generates code
(Rust structs, C headers, validation stubs) from .kabi IDL files.
kabi-compat-check (described in Section 11.1.3)
validates that a new .kabi file is backward-compatible with the previous release
baseline. Both tools share the same .kabi parser frontend but serve different
purposes: umka-kabi-gen runs at build time, kabi-compat-check runs in CI.
11.1.7.1 File Format
A .kabi file is UTF-8 encoded plain text. Line comments use //. Block
comments use /* */ (may not be nested). File extension: .kabi. Convention:
one vtable definition per file, stored in umka-driver-sdk/interfaces/.
The first non-comment, non-blank statement in a .kabi file must be the
version declaration:
kabi_version <N>;
where <N> is a positive integer giving the highest version number defined in
this file. Fields and methods introduced in earlier versions remain present.
When a new field or method is added, kabi_version is bumped and the new
item carries a @version(N) annotation matching the new version number.
11.1.7.2 Type System
All types in the IDL map to fixed-layout C and Rust equivalents. The type system deliberately excludes types whose layout is platform-dependent.
11.1.7.2.1 Primitive Types
| IDL type | C type (64-bit targets) | C type (32-bit targets) | Rust output | Width |
|---|---|---|---|---|
| `u8` | `uint8_t` | `uint8_t` | `u8` | 8-bit |
| `u16` | `uint16_t` | `uint16_t` | `u16` | 16-bit |
| `u32` | `uint32_t` | `uint32_t` | `u32` | 32-bit |
| `u64` | `uint64_t` | `uint64_t` | `u64` | 64-bit |
| `u128` | `__uint128_t` | `umka_kabi_u128_t` (`struct { uint64_t lo, hi; }`) | `u128` | 128-bit |
| `i8` | `int8_t` | `int8_t` | `i8` | 8-bit |
| `i16` | `int16_t` | `int16_t` | `i16` | 16-bit |
| `i32` | `int32_t` | `int32_t` | `i32` | 32-bit |
| `i64` | `int64_t` | `int64_t` | `i64` | 64-bit |
| `i128` | `__int128_t` | `umka_kabi_i128_t` (`struct { uint64_t lo; int64_t hi; }`) | `i128` | 128-bit |
| `f32` | `float` | `float` | `f32` | IEEE 754 32-bit |
| `f64` | `double` | `double` | `f64` | IEEE 754 64-bit |
| `bool` | `uint8_t` | `uint8_t` | `u8` | 8-bit; 0 = false, 1 = true, other values undefined |
Warning: `f32` and `f64` have platform-defined NaN representations and must not be used to carry values that must be bit-identical across architectures. Use scaled integers (e.g., `u32` in milliunits) instead.
11.1.7.2.1a 128-bit Integer Portability Shim
__uint128_t and __int128_t are GCC/Clang extensions available on 64-bit
targets only. On 32-bit UmkaOS targets (ARMv7, PPC32), the compiler-defined
__SIZEOF_INT128__ macro is absent and these types do not exist. The
umka-kabi-gen tool emits the following shim in the C header preamble for
all generated headers, selecting the correct representation at compile time:
/* --- umka_kabi_u128_t / umka_kabi_i128_t portability shim ---
*
* Generated by umka-kabi-gen in the C header preamble for all targets.
* On 64-bit targets with __SIZEOF_INT128__, maps directly to compiler built-ins.
* On 32-bit targets (ARMv7, PPC32) where __SIZEOF_INT128__ is absent, uses a
* two-u64-field struct. Field order is endian-aware: little-endian targets have
* lo first; big-endian targets (PPC32 BE) have hi first.
*
* Use the UMKA_KABI_U128_LO / UMKA_KABI_U128_HI macros for portable access.
*/
#if defined(__SIZEOF_INT128__)
typedef unsigned __int128 umka_kabi_u128_t;
typedef __int128 umka_kabi_i128_t;
#define UMKA_KABI_U128_LO(v) ((uint64_t)((v) & 0xFFFFFFFFFFFFFFFFULL))
#define UMKA_KABI_U128_HI(v) ((uint64_t)((v) >> 64))
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
typedef struct { uint64_t lo; uint64_t hi; } umka_kabi_u128_t; /* LE layout */
typedef struct { uint64_t lo; int64_t hi; } umka_kabi_i128_t; /* LE layout; sign bit in hi */
#define UMKA_KABI_U128_LO(v) ((v).lo)
#define UMKA_KABI_U128_HI(v) ((v).hi)
#else
typedef struct { uint64_t hi; uint64_t lo; } umka_kabi_u128_t; /* BE layout */
typedef struct { int64_t hi; uint64_t lo; } umka_kabi_i128_t; /* BE layout */
#define UMKA_KABI_U128_LO(v) ((v).lo)
#define UMKA_KABI_U128_HI(v) ((v).hi)
#endif
Design guidance: In KABI IDL, avoid u128 in interfaces unless mathematically
required (e.g., cryptographic nonces, UUIDs, 128-bit packet counters). Prefer two
u64 fields with explicit semantics for cross-platform clarity and debuggability.
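On the Rust side, splitting a `u128` into the shim's lo/hi halves mirrors the `UMKA_KABI_U128_LO` / `UMKA_KABI_U128_HI` macros. A sketch for little-endian targets (the type name is illustrative):

```rust
/// Two-u64 little-endian representation matching umka_kabi_u128_t on LE targets.
#[repr(C)]
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct KabiU128 {
    pub lo: u64,
    pub hi: u64,
}

impl KabiU128 {
    /// Split: equivalent to UMKA_KABI_U128_LO / UMKA_KABI_U128_HI.
    pub fn from_u128(v: u128) -> Self {
        KabiU128 { lo: v as u64, hi: (v >> 64) as u64 }
    }
    /// Recombine the halves into a native u128.
    pub fn to_u128(self) -> u128 {
        ((self.hi as u128) << 64) | self.lo as u128
    }
}
```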
The IDL has no usize or isize type. These are intentionally excluded:
pointer-sized integers differ between 32-bit and 64-bit UmkaOS targets (ARMv7,
PPC32 are 32-bit; x86-64, AArch64, RISC-V 64, PPC64LE are 64-bit). Use u64
or i64 for ABI-stable pointer-sized values. For actual pointers, use *const T
or *mut T.
11.1.7.2.2 Pointer and Aggregate Types
| IDL syntax | C output | Rust output | Notes |
|---|---|---|---|
| `*const T` | `const T*` | `*const T` | Non-null; use `Option<*const T>` for nullable |
| `*mut T` | `T*` | `*mut T` | Non-null; use `Option<*mut T>` for nullable |
| `Option<*const T>` | `const T*` | `Option<*const T>` | Nullable pointer; C uses `NULL` |
| `Option<*mut T>` | `T*` | `Option<*mut T>` | Nullable pointer; C uses `NULL` |
| `[T; N]` | `T arr[N]` | `[T; N]` | `N` must be a positive integer literal |
Pointers in the IDL are always raw. Rust references (&T, &mut T) are never
permitted in .kabi files because lifetime annotations cannot cross the ABI
boundary. Raw pointers carry no lifetime, which is correct for cross-language
vtable dispatch.
11.1.7.2.3 Type Aliases
Type aliases give semantic names to primitive types for readability. They have no effect on ABI layout.
type DeviceId = u64;
type Timeout = u32; // milliseconds; a comment documenting units is required
type ErrCode = i32;
Type aliases must appear before their first use in the file. An alias to another alias is allowed; cycles are rejected with a compile error.
11.1.7.3 Struct Definition
Structs map to #[repr(C)] in Rust and to a typedef struct with explicit
alignment in C. Field layout follows the standard C ABI rules for the target
architecture.
11.1.7.3.1 Syntax
@version(<N>) // highest version any field in this struct was introduced
@align(<A>) // optional; force struct alignment to A bytes (power of 2)
struct <Name> {
@version(<V>) // version this field was introduced; required on every field
<field_name>: <Type>,
@version(<V>)
@deprecated(since = <D>) // optional; informational only
<field_name>: <Type>,
}
Rules enforced by the compiler:
- For vtable structs, vtable_size must be the first field declared. For plain structs not used as vtables, there is no mandatory first field.
- Field version annotations must be monotonically non-decreasing top to bottom (a field annotated @version(5) may not appear before a field annotated @version(3)).
- No field may be removed between IDL versions (enforced by kabi-compat-check).
- No field may be reordered between IDL versions.
- @align must be a power of 2 between 1 and 4096 inclusive.
- @deprecated(since = N) is informational only; the field remains in generated output with no layout or runtime effect.
11.1.7.3.2 Example
// interfaces/block_device.kabi
kabi_version 2;
@version(1)
@align(8)
struct BlockDeviceInfo {
@version(1)
vtable_size: u32, // set by caller to sizeof(BlockDeviceInfo)
@version(1)
device_flags: u32,
@version(1)
block_size: u32, // bytes per logical block
@version(1)
queue_depth: u32, // maximum number of in-flight commands
@version(2) // field added in version 2
numa_node: u32, // preferred NUMA node for DMA allocation
@version(2)
_pad: [u8; 4], // explicit padding for 8-byte alignment of capacity_blocks
@version(2)
capacity_blocks: u64, // device capacity in logical blocks
}
11.1.7.3.3 Generated C Output
/* Generated by umka-kabi-gen from interfaces/block_device.kabi */
/* DO NOT EDIT — regenerate with: umka-kabi-gen block_device.kabi */
#define KABI_BLOCK_DEVICE_INFO_V1_SIZE \
((size_t)offsetof(kabi_BlockDeviceInfo, numa_node))
#define KABI_BLOCK_DEVICE_INFO_V2_SIZE \
((size_t)sizeof(kabi_BlockDeviceInfo))
typedef struct __attribute__((aligned(8))) {
uint32_t vtable_size;
uint32_t device_flags;
uint32_t block_size;
uint32_t queue_depth;
/* Version 2+: present only if vtable_size >= KABI_BLOCK_DEVICE_INFO_V2_SIZE */
uint32_t numa_node;
uint8_t _pad[4];
uint64_t capacity_blocks;
} kabi_BlockDeviceInfo;
11.1.7.3.4 Generated Rust Output
// Generated by umka-kabi-gen from interfaces/block_device.kabi
// DO NOT EDIT — regenerate with: umka-kabi-gen block_device.kabi
#[repr(C, align(8))]
pub struct BlockDeviceInfo {
pub vtable_size: u32,
pub device_flags: u32,
pub block_size: u32,
pub queue_depth: u32,
// Version 2+:
pub numa_node: u32,
pub _pad: [u8; 4],
pub capacity_blocks: u64,
}
impl BlockDeviceInfo {
/// Size of the struct through the last V1 field (up to, not including, numa_node).
pub const V1_SIZE: usize =
core::mem::offset_of!(BlockDeviceInfo, numa_node);
/// Full size of the struct including all fields through version 2.
pub const V2_SIZE: usize =
core::mem::size_of::<BlockDeviceInfo>();
}
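A consumer receiving a BlockDeviceInfo from the other side of the boundary must gate access to V2 fields on the sender's declared size. A minimal sketch, with an illustrative accessor name (the struct layout mirrors the generated output above; the `numa_node_checked` helper is not generated output):

```rust
// Layout mirrors the generated BlockDeviceInfo shown above.
#[repr(C, align(8))]
pub struct BlockDeviceInfo {
    pub vtable_size: u32,
    pub device_flags: u32,
    pub block_size: u32,
    pub queue_depth: u32,
    pub numa_node: u32,
    pub _pad: [u8; 4],
    pub capacity_blocks: u64,
}

impl BlockDeviceInfo {
    pub const V1_SIZE: usize = core::mem::offset_of!(BlockDeviceInfo, numa_node);
    pub const V2_SIZE: usize = core::mem::size_of::<BlockDeviceInfo>();

    /// Illustrative accessor: return the V2 `numa_node` field only when the
    /// sender's declared size proves the field is present; None for a V1 sender.
    pub fn numa_node_checked(&self) -> Option<u32> {
        if self.vtable_size as usize >= Self::V2_SIZE {
            Some(self.numa_node)
        } else {
            None // V1 sender: bytes beyond V1_SIZE carry no meaning
        }
    }
}
```

A V1 sender fills only the first 16 bytes and declares `vtable_size = 16`, so the accessor correctly refuses to read `numa_node` even though the receiver's struct definition contains the field.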
11.1.7.4 Vtable Definition
Vtables are the primary unit of KABI. A vtable is a C-compatible struct of
function pointers with a mandatory vtable_size: u64 field as the first member.
11.1.7.4.1 Syntax
@version(<N>)
vtable <Name> {
@version(1)
vtable_size: u64, // MUST be first, MUST be version 1, MUST be u64
@version(<V>)
fn <method_name>(<param>: <Type>, ...) -> <ReturnType>;
@version(<V>)
@optional // null function pointer is permitted
fn <method_name>(<param>: <Type>, ...) -> <ReturnType>;
}
The vtable_size field type is always u64, not u32. u64 ensures the same
wire encoding on both 32-bit and 64-bit UmkaOS targets (ARMv7 and PPC32 are 32-bit
platforms where usize is 32 bits; using u64 avoids a layout discrepancy when
a 64-bit kernel talks to a 32-bit driver in a cross-arch scenario).
Function parameters and return types use the same primitive and aggregate types
as structs. The return type () denotes no return value (void in C). The
KabiResult<T, E> type for error-returning methods is defined in the KABI
runtime support library (umka-driver-sdk/src/abi.rs) and referenced by name
in the IDL.
@optional marks a method whose function pointer may be null in a loaded
driver. Callers generated by umka-kabi-gen always check for null before
calling an @optional method and invoke the fallback branch when it is null.
Methods without @optional must be non-null in every loaded vtable; a null
mandatory pointer causes the driver loader to reject the driver with ENOEXEC.
11.1.7.4.2 Versioning Contract
- The kernel sets vtable_size to sizeof(KernelServicesVTable) (the kernel's own compile-time size) before passing its vtable to the driver.
- The driver sets vtable_size to sizeof(DriverVTable) (the driver's own compile-time size) before passing its vtable to the kernel.
- The receiver of a vtable must check vtable_size >= offset_of!(VTable, method) before calling any method that may not be present in an older vtable.
- The umka-kabi-gen-generated _or_fallback helpers perform this check automatically; driver code must always call the helper, never the raw function pointer, for a versioned (non-V1) method.
- New methods are always appended at the end; the append-only rule is enforced by kabi-compat-check in CI.
11.1.7.4.3 Example
kabi_version 2;
/// Kernel-provided services vtable. The kernel fills this struct and passes
/// a pointer to the driver at load time via __kabi_driver_entry.
@version(2)
vtable KernelServicesVTable {
@version(1)
vtable_size: u64,
@version(1)
fn alloc_dma(size: u64, align: u32, flags: u32) -> Option<*mut u8>;
@version(1)
fn free_dma(ptr: *mut u8, size: u64);
@version(1)
fn log(level: u32, msg: *const u8, len: u32);
@version(2)
@optional
fn alloc_dma_node(size: u64, align: u32, flags: u32, node: u32) -> Option<*mut u8>;
}
11.1.7.4.4 Generated C Output
/* Generated by umka-kabi-gen from interfaces/kernel_services.kabi */
/* DO NOT EDIT */
#define KABI_KERNEL_SERVICES_V1_SIZE \
((size_t)offsetof(kabi_KernelServicesVTable, alloc_dma_node))
#define KABI_KERNEL_SERVICES_V2_SIZE \
((size_t)sizeof(kabi_KernelServicesVTable))
typedef struct {
uint64_t vtable_size;
void *(*alloc_dma)(uint64_t size, uint32_t align, uint32_t flags);
void (*free_dma)(void *ptr, uint64_t size);
void (*log)(uint32_t level, const uint8_t *msg, uint32_t len);
/* Version 2+ (@optional): may be NULL; check vtable_size before calling */
void *(*alloc_dma_node)(uint64_t size, uint32_t align,
uint32_t flags, uint32_t node);
} kabi_KernelServicesVTable;
/* Version-safe call helper (use this; never call the pointer directly): */
static inline void *
kabi_alloc_dma_node_or_fallback(
const kabi_KernelServicesVTable *vtable,
uint64_t size, uint32_t align, uint32_t flags, uint32_t node)
{
if (vtable->vtable_size >= KABI_KERNEL_SERVICES_V2_SIZE &&
vtable->alloc_dma_node != NULL)
return vtable->alloc_dma_node(size, align, flags, node);
return vtable->alloc_dma(size, align, flags);
}
11.1.7.4.5 Generated Rust Output
// Generated by umka-kabi-gen from interfaces/kernel_services.kabi
// DO NOT EDIT
#[repr(C)]
pub struct KernelServicesVTable {
pub vtable_size: u64,
pub alloc_dma: unsafe extern "C" fn(u64, u32, u32) -> Option<*mut u8>,
pub free_dma: unsafe extern "C" fn(*mut u8, u64),
pub log: unsafe extern "C" fn(u32, *const u8, u32),
// Version 2+ (@optional):
pub alloc_dma_node: Option<unsafe extern "C" fn(u64, u32, u32, u32) -> Option<*mut u8>>,
}
impl KernelServicesVTable {
pub const V1_SIZE: usize =
core::mem::offset_of!(KernelServicesVTable, alloc_dma_node);
pub const V2_SIZE: usize =
core::mem::size_of::<KernelServicesVTable>();
/// Version-safe wrapper for `alloc_dma_node`.
/// Falls back to `alloc_dma` (ignoring the node hint) when the kernel's
/// vtable predates version 2 or when the method pointer is null.
/// Always use this helper; never call `alloc_dma_node` directly.
///
/// # Safety
///
/// `self` must point to a valid kernel-provided vtable. All pointer
/// arguments must satisfy the documented preconditions of the underlying
/// `alloc_dma` / `alloc_dma_node` methods.
pub unsafe fn alloc_dma_node_or_fallback(
&self,
size: u64,
align: u32,
flags: u32,
node: u32,
) -> Option<*mut u8> {
if self.vtable_size as usize >= Self::V2_SIZE {
if let Some(f) = self.alloc_dma_node {
return f(size, align, flags, node);
}
}
(self.alloc_dma)(size, align, flags)
}
}
umka-kabi-gen generates one _or_fallback helper per versioned method. Driver
code must call the helper instead of the raw function pointer for any method
introduced in version 2 or later. Calling the raw pointer directly on an older
vtable produces undefined behavior.
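To make the fallback behavior concrete, here is a hedged, self-contained mock that exercises the helper's dispatch logic against both a V1-sized and a V2-sized vtable. The vtable is deliberately trimmed (free_dma and log omitted) and the stub allocators return sentinel addresses rather than real DMA memory; only the version-check logic matches the generated code:

```rust
// Simplified mock of the generated vtable; only the dispatch logic is real.
#[repr(C)]
pub struct KernelServicesVTable {
    pub vtable_size: u64,
    pub alloc_dma: unsafe extern "C" fn(u64, u32, u32) -> Option<*mut u8>,
    pub alloc_dma_node: Option<unsafe extern "C" fn(u64, u32, u32, u32) -> Option<*mut u8>>,
}

impl KernelServicesVTable {
    pub const V1_SIZE: usize = core::mem::offset_of!(KernelServicesVTable, alloc_dma_node);
    pub const V2_SIZE: usize = core::mem::size_of::<KernelServicesVTable>();

    /// Same shape as the generated helper: size check, then null check, then fallback.
    pub unsafe fn alloc_dma_node_or_fallback(
        &self, size: u64, align: u32, flags: u32, node: u32,
    ) -> Option<*mut u8> {
        if self.vtable_size as usize >= Self::V2_SIZE {
            if let Some(f) = self.alloc_dma_node {
                return f(size, align, flags, node);
            }
        }
        (self.alloc_dma)(size, align, flags)
    }
}

// Stubs: distinguishable sentinel "addresses" instead of real DMA memory.
unsafe extern "C" fn stub_alloc(_s: u64, _a: u32, _f: u32) -> Option<*mut u8> {
    Some(0x1000 as *mut u8)
}
unsafe extern "C" fn stub_alloc_node(_s: u64, _a: u32, _f: u32, _n: u32) -> Option<*mut u8> {
    Some(0x2000 as *mut u8)
}
```

Against a vtable declaring V2_SIZE the node-aware path runs; against one declaring only V1_SIZE the helper silently falls back to `alloc_dma`, which is exactly the contract a driver relies on when loaded onto an older kernel.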
11.1.7.5 Enum Definition
Enums map to #[repr(<repr>)] in Rust and to a typedef of the underlying
integer type in C. Every enum requires an explicit @repr annotation.
11.1.7.5.1 Syntax
@version(<N>)
@repr(<UnsignedIntType>) // required; one of: u8, u16, u32, u64
@flags // optional; see Section 11.1.7.5.3
enum <Name> {
@version(<V>)
<Variant> = <IntegerLiteral>,
}
Rules:
- Every variant must carry an explicit integer discriminant value.
- Discriminant values must be unique within the enum.
- For @flags enums, every discriminant value must be a power of 2. The compiler validates this and rejects any non-power-of-2 value.
- New variants may only be appended; existing discriminant values may never be reassigned.
- Code receiving an enum value from the ABI must handle unknown discriminants gracefully (return an error or, for @flags, silently mask unknown bits). The generated Rust type carries #[non_exhaustive] to enforce this at compile time. C code must include a default: branch in every switch on a KABI enum.
11.1.7.5.2 Example: Exclusive States
@version(2)
@repr(u32)
enum DriverState {
@version(1)
Initializing = 0,
@version(1)
Running = 1,
@version(1)
Suspended = 2,
@version(2)
Degraded = 3, // new in version 2; code built against V1 must treat this as an unknown discriminant
}
Generated Rust:
#[repr(u32)]
#[non_exhaustive]
pub enum DriverState {
Initializing = 0,
Running = 1,
Suspended = 2,
// Version 2+:
Degraded = 3,
}
Generated C:
typedef uint32_t kabi_DriverState;
#define KABI_DRIVER_STATE_INITIALIZING ((kabi_DriverState)0u)
#define KABI_DRIVER_STATE_RUNNING ((kabi_DriverState)1u)
#define KABI_DRIVER_STATE_SUSPENDED ((kabi_DriverState)2u)
/* Version 2+: */
#define KABI_DRIVER_STATE_DEGRADED ((kabi_DriverState)3u)
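Because #[non_exhaustive] cannot stop a raw u32 arriving over the ABI from carrying a future discriminant, decoding should go through a fallible conversion rather than a transmute. A hedged sketch; the decoder function is illustrative, not generated output:

```rust
#[repr(u32)]
#[derive(Debug, PartialEq)]
#[non_exhaustive]
pub enum DriverState {
    Initializing = 0,
    Running = 1,
    Suspended = 2,
    Degraded = 3,
}

/// Illustrative decoder: map a raw ABI u32 to a known variant, or report
/// the unknown discriminant instead of panicking or transmuting blindly.
pub fn decode_driver_state(raw: u32) -> Result<DriverState, u32> {
    match raw {
        0 => Ok(DriverState::Initializing),
        1 => Ok(DriverState::Running),
        2 => Ok(DriverState::Suspended),
        3 => Ok(DriverState::Degraded),
        unknown => Err(unknown), // newer sender: surface an error, never UB
    }
}
```

This mirrors the C requirement of a `default:` branch in every switch on a KABI enum.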
11.1.7.5.3 Example: Flag Enums
Enums annotated with @flags represent bitmask sets. Variant values must be
non-overlapping powers of 2. Unknown bits from a newer sender must be silently
masked out by older receivers. The generated Rust type is an integer typedef
with named constants rather than a Rust enum, because Rust's match exhaustiveness
cannot handle arbitrary bitmask combinations.
@version(2)
@repr(u32)
@flags
enum DriverFlags {
@version(1)
SupportsHotplug = 0x0001,
@version(1)
SupportsSuspend = 0x0002,
@version(2)
SupportsMigration = 0x0004,
}
Generated Rust:
/// Bitmask type for DriverFlags. Unknown bits must be ignored for forward
/// compatibility with newer kernel versions.
pub mod DriverFlags {
pub type Type = u32;
pub const SUPPORTS_HOTPLUG: Type = 0x0001;
pub const SUPPORTS_SUSPEND: Type = 0x0002;
/// Version 2+:
pub const SUPPORTS_MIGRATION: Type = 0x0004;
/// Mask of all bits defined through the version this code was compiled against.
pub const KNOWN_BITS: Type = 0x0007;
}
Generated C:
typedef uint32_t kabi_DriverFlags;
#define KABI_DRIVER_FLAGS_SUPPORTS_HOTPLUG ((kabi_DriverFlags)0x0001u)
#define KABI_DRIVER_FLAGS_SUPPORTS_SUSPEND ((kabi_DriverFlags)0x0002u)
/* Version 2+: */
#define KABI_DRIVER_FLAGS_SUPPORTS_MIGRATION ((kabi_DriverFlags)0x0004u)
11.1.7.6 requires and provides Declarations
Every .kabi file that defines a loadable module's interface must declare the
KABI services it provides and the services it requires from other modules. These
declarations are checked at build time (topological sort to reject cycles) and
embedded as metadata in the compiled module binary (see
Section 11.1.9).
// mdio.kabi — MDIO bus framework (Tier 0 loadable)
requires pci_bus; // this module needs the pci_bus interface
provides mdio_service; // this module exports the mdio_service interface
Circular requires/provides graphs are rejected at build time with an error
that identifies the full cycle:
error[KABI-E0021]: circular dependency detected
→ mdio_framework requires pci_bus
→ pci_bus requires mdio_framework
11.1.7.7 Version Compatibility Rules
The following rules apply uniformly to structs, vtables, and enums.
- Append-only fields and methods: Fields and methods may only be added at the end. Removal, reordering, or type changes require a new kabi_version number and a compatibility shim (see Section 11.1.4).
- Size-based version detection: Both sides of a KABI boundary set their own vtable_size to the compile-time sizeof of the vtable. The receiver uses min(sender_size, receiver_size) as the safe access boundary.
- Unknown enum variants: Receivers must not panic or invoke undefined behavior when they receive an unknown discriminant from a newer sender. Rust's #[non_exhaustive] enforces this at compile time. C code must include a default: case in every switch.
- Unknown flag bits: Unknown bits in a @flags value must be silently ignored. Code must not assert on the exact set of bits present.
- Deprecation: @deprecated(since = N) is informational only. Deprecated items remain in the ABI indefinitely with no layout or behavioral change.
- Mandatory presence: The vtable_size field must always be present and always first. Drivers that expose a vtable smaller than the offset of the last mandatory (non-@optional) method are rejected at load time with ENOEXEC.
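The size-based detection rule reduces to a small boundary computation, sketched here with illustrative function names:

```rust
/// Illustrative: the safe access boundary across a KABI boundary is the
/// smaller of the two sides' declared vtable sizes.
pub fn safe_boundary(sender_size: u64, receiver_size: u64) -> u64 {
    core::cmp::min(sender_size, receiver_size)
}

/// Illustrative: a field or method whose last byte sits at `end_offset`
/// may be accessed only if it lies entirely below the boundary.
pub fn member_accessible(end_offset: u64, sender_size: u64, receiver_size: u64) -> bool {
    end_offset <= safe_boundary(sender_size, receiver_size)
}
```

A V2 kernel (vtable of 40 bytes) talking to a V1 driver (vtable of 24 bytes) yields a boundary of 24: the kernel may touch nothing past the driver's V1 layout, and vice versa.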
11.1.7.8 Compiler Invocation
umka-kabi-gen is the single tool for generating KABI bindings and validating
ABI compatibility. The build system invokes it automatically; the forms below
support driver SDK development and manual CI operations.
11.1.7.8.1 Code Generation
# Generate C and Rust bindings from a single .kabi file:
umka-kabi-gen \
--input interfaces/block_device.kabi \
--output-c generated/kabi_block_device.h \
--output-rs generated/kabi_block_device.rs \
--transport ring # direct | ring | ipc (see Section 11.1.8)
# Generate all three transport variants at once (build system default):
umka-kabi-gen \
--input interfaces/block_device.kabi \
--output-dir generated/
# Produces: kabi_block_device_direct.rs, kabi_block_device_ring.rs,
# kabi_block_device_ipc.rs, kabi_block_device.h
The --transport flag selects the call dispatch mechanism
(Section 11.1.8):
| Value | Use case | Dispatch cost |
|---|---|---|
| direct | Core-domain callers, Tier 0 modules | ~2-5 cycles |
| ring | Tier 1 drivers (domain-isolated, Ring 0) | ~150-300 cycles |
| ipc | Tier 2 drivers (process-isolated, Ring 3) | ~1-5 microseconds |
When --output-dir is given without --transport, all three transport
variants are generated and the C header is shared across all three.
11.1.7.8.2 ABI Compatibility Validation
# Validate that new.kabi is backward-compatible with baseline old.kabi:
umka-kabi-gen \
--validate interfaces/block_device.kabi \
--previous baseline/block_device_v1.kabi
# Exit code 0: compatible.
# Non-zero: at least one incompatibility; errors printed to stderr.
The validator rejects any of the following:
- A field or method present in --previous is absent in --validate.
- Fields or methods appear in a different order than in --previous.
- The @repr type of an enum is changed.
- vtable_size is absent or is not the first field of a vtable.
- An enum discriminant value present in --previous is changed in --validate.
- A new field or method in --validate carries a @version annotation not greater than every annotation in --previous.
- A @flags enum variant value is not a power of 2.
- Two enum variants share an integer discriminant value.
Validation runs in CI on every commit that modifies a .kabi file (see
Section 23.3.4, step 4). Failures block merge to master.
11.1.7.8.3 Diagnostic Format
All errors and warnings are reported in a structured format with source locations and stable error codes:
interfaces/block_device.kabi:14:5: error[KABI-E0011]: field 'block_size' removed
block_size: u32,
^~~~~~~~~~
note: this field was present in baseline/block_device_v1.kabi:14:5
help: fields may never be removed; annotate with @deprecated(since = 2) instead
interfaces/block_device.kabi:18:5: error[KABI-E0012]: field 'queue_depth' reordered
queue_depth: u32,
^~~~~~~~~~~
note: expected at position 3 (matching baseline); found at position 4
help: fields may never be reordered; append new fields at the end
error: aborting due to 2 previous errors
Each diagnostic carries a unique error code (KABI-E<NNNN>) for stable
reference in CI logs, changelogs, and issue trackers. umka-kabi-gen exits
with a non-zero status whenever any error is emitted.
11.1.8 KABI Transport Classes
KABI bundles two orthogonal concerns that must be kept distinct: the interface contract (IDL types, vtable layout, version fields) and the transport (how a call physically crosses from caller to callee). The interface contract is universal — it is the stable ABI. The transport is determined by whether the call crosses a hardware-enforced domain boundary.
11.1.8.1 Why Transport Is Separate from Interface
The ring buffer transport (Section 10.6) exists for one reason: to safely cross a hardware memory domain boundary (MPK/POE/DACR). At that boundary the caller cannot directly call into the callee's address range — the domain switch must happen first, and the ring buffer is the handshake. The transport IS the isolation mechanism.
This reasoning does not apply inside the Core domain. Tier 0 loadable modules (Section 10.4.2.2) run in Ring 0 in the same memory domain as the static kernel binary. There is no MPK boundary. Forcing ring buffers between them would add 100–500 cycles per call with zero safety benefit, because:
- No domain boundary exists to enforce — both sides share the same address space
- Ring buffers do not contain crashes when there is no memory isolation behind them
- Synchronous framework APIs (bus register read, SCSI command dispatch, SPI transfer) require a result before the caller can proceed; the async ring model adds roundtrip cost for no latency benefit
- Debugging is harder — async rings split call chains, hiding the source of a bug
The ring buffer's safety guarantee requires the combination of (ring buffer) + (hardware domain isolation). Without the isolation, the ring buffer is overhead with false safety intuition attached.
11.1.8.2 Three Transport Classes
The umka-kabi-gen toolchain generates bindings for three transport classes from a single
.kabi IDL source. The transport class is a parameter to umka-kabi-gen, not a property
of the interface.
Transport T0 — Direct Vtable Call (Core domain)
Used between static Core and Tier 0 loadable modules, and between Tier 0 loadable modules. Both caller and callee are in the same memory domain.
/// Generated by: umka-kabi-gen --transport=direct mdio.kabi
///
/// Caller is in Core domain. Callee (mdio_framework) is a Tier 0 loadable module
/// in the same domain. The call is a direct indirect branch through the vtable
/// function pointer — ~2-5 cycles dispatch overhead.
///
/// # Safety
/// `handle` must be a valid T0 service handle obtained from `KabiServiceRegistry`.
/// The module providing this handle is guaranteed loaded and never unloaded
/// (Tier 0 load_once semantics; see Section 11.1.9.8).
pub unsafe fn mdio_read_reg(handle: &MdioServiceHandleT0, dev: u32, reg: u16) -> u16 {
((*handle.vtable).read_reg)(handle.ctx, dev, reg)
}
Properties:
- Cost: ~2–5 cycles (vtable pointer dereference + indirect call)
- Synchronous: caller blocks until return; no queue management
- Stack: uses caller's stack; no separate consumer thread
- Data: zero-copy — arguments passed in registers or by pointer, same address space
- Crash consequence: a fault in the callee panics the kernel (same as static Core)
- Debugging: full contiguous call stack visible in backtraces and panic dumps
Transport T1 — Ring Buffer + Domain Switch (Cross-domain)
Used at every boundary that crosses hardware domain isolation: Core domain → Tier 1, Tier 1 → Tier 1, Core domain → Tier 2, Tier 1 → Tier 2. This is the existing KABI transport described in Section 10.6. Ring buffers ARE the isolation mechanism at these boundaries.
Properties:
- Cost: ~200–500 cycles minimum (atomic head/tail update, potential cache miss, domain switch, wake-up, dequeue on the far side)
- Async-capable: producer and consumer can run independently; completions delivered via ring notifications
- Data: zero-copy via shared memory ring descriptors
- Crash consequence: contained within the isolated domain; far side survives
- Debugging: split stack traces; correlation requires ring sequence numbers
Transport T2 — Ring Buffer + Syscall (Ring 3 boundary)
Used at the Tier 2 (Ring 3 process) boundary. Structurally identical to T1 but the domain switch is a privilege level change (Ring 0 → Ring 3 or vice versa). The ring buffer crossing also acts as a syscall interception point for capability validation.
11.1.8.3 Call Direction at the Tier 0 Boundary
The transport class is determined by the calling side's domain, not the callee's:
Tier 1 driver → Tier 0 loadable service:
Tier 1 enqueues request into Core-domain ring (T1 transport).
Core domain receives, dispatches to Tier 0 vtable via direct call (T0 transport).
Tier 1 does not know or care that the service is Tier 0 loadable vs static Core.
From Tier 1's perspective: one ring buffer call to "the kernel", same as always.
Tier 0 loadable → Tier 1 driver (callback / event):
Tier 0 module writes into Tier 1 driver's inbound ring (T1 transport, outbound direction).
Same mechanism as static Core → Tier 1 today.
No new ring buffer infrastructure needed.
Static Core → Tier 0 loadable:
Direct vtable call (T0 transport).
~2-5 cycles.
Tier 0 loadable → static Core:
Direct call — both in the same domain, Core exports functions through the
KernelServicesVTable (T0 transport, same direct vtable call mechanism).
The Tier 0 loadable module is transparent to Tier 1 and Tier 2 callers. The domain dispatch inside Core routes the inbound ring buffer request to the appropriate service, whether that service is static or dynamically loaded.
11.1.8.4 IDL Toolchain Transport Parameter
umka-kabi-gen has two distinct jobs, both driven from the same .kabi source:
Job 1 — Caller-side bindings (how a module calls out to a service it depends on).
The transport is determined by the calling module's tier relative to the service's tier.
A Tier 0 module calling a Tier 0 service uses --transport=direct; a Tier 1 module
calling into Core uses --transport=ring:
# A Core-domain module calling the MDIO service (T0→T0: direct)
umka-kabi-gen --transport=direct --input mdio.kabi --output-rs mdio_caller_direct.rs
# A Tier 1 module calling the MDIO service (T1→T0: ring, tunnelled through Core)
umka-kabi-gen --transport=ring --input mdio.kabi --output-rs mdio_caller_ring.rs
Job 2 — Driver-side entry points (the three entry functions the kernel loader calls when loading the driver binary, one per tier). Always generate all three:
# Default: generate all three entry stubs into a directory
umka-kabi-gen --output-dir generated/ --input mdio.kabi
# Produces:
# generated/kabi_entry_direct.rs — T0 entry point
# generated/kabi_entry_ring.rs — T1 entry point
# generated/kabi_entry_ipc.rs — T2 entry point
# generated/kabi_types.h — shared C header
The generated entry files share the same interface types (argument structs, return types, error enums) — the IDL defines these. Only the call dispatch code differs.
11.1.8.5 KabiDriverManifest: Transport Capability Advertisement
Every driver binary embeds a KabiDriverManifest structure in the .kabi_manifest ELF
section. The kernel loader reads this section before resolving any other driver symbols.
It is the single source of truth for which transport entry points the binary implements,
which tier the driver prefers, and which tiers it will accept.
/// ELF-embedded driver transport manifest.
/// Placed in section `.kabi_manifest` by the linker script.
/// Generated by `umka-kabi-gen --output-dir`; linked automatically.
/// Driver authors do not write or modify this struct directly.
#[repr(C)]
pub struct KabiDriverManifest {
/// Magic: 0x4B424944 ("KBID") — identifies a valid manifest.
pub magic: u32,
/// Manifest structure version (currently 1). Loader rejects unknown versions.
pub manifest_version: u32,
/// Transport implementations present in this binary (bitmask):
/// bit 0 = T0 Direct entry point present (entry_direct non-null)
/// bit 1 = T1 Ring Buffer entry point present (entry_ring non-null)
/// bit 2 = T2 Ring+Syscall entry point present (entry_ipc non-null)
/// Default (all drivers, no manifest constraints): 0b111.
pub transport_mask: u8,
/// Driver's preferred tier (0, 1, or 2).
/// The loader assigns this tier if hardware supports it and policy allows.
pub preferred_tier: u8,
/// Minimum tier this binary accepts (0 = any).
/// Loader returns ENOTSUP if assigned tier < minimum_tier.
pub minimum_tier: u8,
/// Maximum tier this binary accepts (2 = any).
/// Loader returns ENOTSUP if assigned tier > maximum_tier.
pub maximum_tier: u8,
/// Null-terminated UTF-8 driver name (max 63 bytes + null).
pub driver_name: [u8; 64],
/// Driver version (major << 16 | minor).
pub driver_version: u32,
pub _reserved: [u32; 3], // must be zero
// Entry points. null = transport not implemented in this binary.
/// T0 Direct entry — called when driver runs as a Tier 0 loadable module.
pub entry_direct: Option<KabiT0EntryFn>,
/// T1 Ring Buffer entry — called when driver runs at Tier 1.
pub entry_ring: Option<KabiT1EntryFn>,
/// T2 IPC entry — called when driver runs at Tier 2 (Ring 3).
pub entry_ipc: Option<KabiT2EntryFn>,
}
/// T0 entry: receives direct-call KernelServicesVTable, returns direct-call DriverVTable.
pub type KabiT0EntryFn = unsafe extern "C" fn(
ksvc: *const KernelServicesVTable,
) -> *const DriverVTable;
/// T1 entry: receives ring-variant KernelServicesVTable and pre-allocated ring pair.
pub type KabiT1EntryFn = unsafe extern "C" fn(
ksvc: *const KernelServicesVTable,
inbound: *mut RingBuffer, // Core → Driver requests
outbound: *mut RingBuffer, // Driver → Core completions
) -> u32; // 0 = success, errno on failure
/// Tier 2 driver entry point function type.
///
/// Called by the kernel when a Tier 2 driver process is started (or restarted after crash).
/// The driver executes entirely within this function; return means the driver is shutting down.
///
/// # Parameters
///
/// ## `outbound_fd` — Command Ring (kernel → driver; FD names are from the kernel's perspective)
/// A kernel-created `umka_ring_fd` (ring buffer file descriptor) for the COMMAND ring.
/// The kernel writes commands and events; the driver reads them.
/// - Ring type: `UmkaRingFd` (shared memory ring, same design as Section 10.6)
/// - Created by the kernel before calling this function, pre-populated with any pending
/// commands from the quiescence buffer (accumulated during crash recovery).
/// - **Read blocking**: `poll(outbound_fd, POLLIN)` blocks until a command is available.
/// - Capacity: `KABI_T2_CMD_RING_CAPACITY = 4096` entries.
///
/// ## `inbound_fd` — Completion Ring (driver → kernel)
/// A kernel-created `umka_ring_fd` for the COMPLETION ring.
/// The driver writes completions and events; the kernel reads them.
/// - **Write non-blocking**: O_NONBLOCK on write side; returns EAGAIN if full.
/// A full completion ring indicates the kernel is not draining it — the driver
/// should back off and retry after a short poll.
/// - Capacity: `KABI_T2_COMPLETION_RING_CAPACITY = 4096` entries.
///
/// # FD Ownership
/// Both FDs are **kernel-owned**. The driver MUST NOT close them. The kernel closes
/// them when the driver process is detached or terminated. Both FDs remain valid for
/// the entire lifetime of the entry function and become invalid immediately after return.
///
/// # Return Value
/// - `0`: Clean shutdown (graceful driver exit, no error).
/// - Non-zero: Error code; triggers the crash recovery protocol in umka-core.
/// The kernel will attempt to restart the driver up to `DRIVER_MAX_RESTART_ATTEMPTS` times.
pub type KabiT2EntryFn = unsafe extern "C" fn(
ksvc: *const KernelServicesVTable,
outbound_fd: i32, // Command ring: kernel → driver (POLLIN to receive commands)
inbound_fd: i32, // Completion ring: driver → kernel (O_NONBLOCK writes)
) -> u32; // 0 = clean shutdown; non-zero = error, triggers crash recovery
pub const KABI_T2_CMD_RING_CAPACITY: usize = 4096;
pub const KABI_T2_COMPLETION_RING_CAPACITY: usize = 4096;
Loader algorithm (in driver_load()):
1. Map driver ELF into memory (read-only staging area).
2. Locate `.kabi_manifest` section. If absent → ENOEXEC.
3. Validate manifest.magic == 0x4B424944. If wrong → ENOEXEC.
4. Check manifest.manifest_version ≤ supported. If newer → ENOTSUP.
5. Determine assigned tier T (from manifest.preferred_tier, operator policy,
hardware capability detection — see Section 10.2.7):
If T < manifest.minimum_tier → ENOTSUP.
If T > manifest.maximum_tier → ENOTSUP.
6. Confirm transport bit for T is set in manifest.transport_mask:
T=0: bit 0 set AND entry_direct non-null → else ENOEXEC.
T=1: bit 1 set AND entry_ring non-null → else ENOEXEC.
T=2: bit 2 set AND entry_ipc non-null → else ENOEXEC.
7. Set up tier resources (MPK domain / Ring 3 process / none).
8. Call the entry point for tier T. On non-zero return → unmap, return errno.
9. Record (driver_name, driver_version, assigned_tier, transport_mask)
in the driver registry. Expose via /sys/umka/drivers/<name>/.
The sysfs record allows operators to inspect any loaded driver's tier and available
transports: /sys/umka/drivers/<name>/assigned_tier,
/sys/umka/drivers/<name>/transport_mask.
11.1.8.6 Default Policy: All Drivers Ship All Three Transports
Every driver binary must include all three transport entry points by default.
The umka-kabi-gen --output-dir invocation (the build system default) generates all
three receiver stubs. The driver's build.rs includes them automatically:
// In driver/build.rs — emitted by the umka-driver-sdk build helper:
println!("cargo:rerun-if-changed=my_driver.kabi");
umka_kabi_gen::build("my_driver.kabi"); // generates all three into OUT_DIR
// In driver/src/lib.rs:
include!(concat!(env!("OUT_DIR"), "/kabi_entry_direct.rs"));
include!(concat!(env!("OUT_DIR"), "/kabi_entry_ring.rs"));
include!(concat!(env!("OUT_DIR"), "/kabi_entry_ipc.rs"));
All three entry points are linked. The manifest's transport_mask = 0b111. The binary
is tier-agnostic at compile time.
Consequence: tier change requires no recompilation. The kernel loader reads
transport_mask, confirms the desired tier's bit is set, and calls the matching entry
point. The driver binary is unchanged. Moving a driver from Tier 1 to Tier 0 (because
the hardware has no fast isolation mechanism, e.g. RISC-V) is an operator action:
update /etc/umka/driver-policy.d/<name>.toml, reload. No kernel rebuild, no driver
rebuild.
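The policy file's schema is defined by the operator tooling, not by this chapter; as a purely hypothetical illustration of the operator action (all key names are invented for this sketch), such a file might look like:

```toml
# /etc/umka/driver-policy.d/ixgbe.toml — hypothetical example.
# Key names are illustrative only; consult the driver-policy schema.
[driver]
name = "ixgbe"

[tier]
# Force Tier 0 (direct call) on hardware without a fast isolation mechanism:
assigned = 0
```

After editing, the operator reloads policy; the loader re-reads the binary's transport_mask and, because the default mask is 0b111, the same binary simply enters through its T0 stub.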
Opting out of a transport — the only legitimate exception — requires an explicit
manifest constraint in the .kabi module declaration:
// In my_rt_audio_driver.kabi:
module my_rt_audio_driver {
provides alsa_driver >= 1.0;
requires alsa_core >= 1.0;
minimum_tier: 0; // real-time audio: cannot tolerate ring buffer latency
maximum_tier: 0; // must run at Tier 0 (direct call)
}
umka-kabi-gen omits the T1 and T2 stubs, sets transport_mask = 0b001, and the
loader enforces the constraint. Drivers without minimum_tier/maximum_tier
declarations default to minimum_tier: 0, maximum_tier: 2, transport_mask: 0b111.
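These defaults compose mechanically: transport_mask is derivable from the tier bounds alone, with bit t set for every tier in [minimum_tier, maximum_tier]. A sketch of the derivation (the function name is illustrative, not a real umka-kabi-gen API):

```rust
/// Illustrative: derive transport_mask from the manifest's tier bounds.
/// Tier t maps to bit t of the mask.
fn derive_transport_mask(minimum_tier: u8, maximum_tier: u8) -> u8 {
    debug_assert!(minimum_tier <= maximum_tier && maximum_tier <= 2);
    let mut mask = 0u8;
    for tier in minimum_tier..=maximum_tier {
        mask |= 1 << tier;
    }
    mask
}
```

The default bounds (0, 2) yield 0b111; the RT-audio constraint (0, 0) yields 0b001, matching the manifest examples above.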
Why this is production-correct: a driver that only ships one transport is a redeployment risk. If hardware changes (e.g., a Tier 1 driver on a RISC-V system where no fast isolation exists, requiring Tier 0 fallback), the system cannot adapt without rebuilding the driver binary. Shipping all three transports eliminates this at the cost of ~2–5 KB of additional binary size per transport stub — negligible for any real driver.
11.1.9 KABI Service Dependency Resolution
11.1.9.1 The Problem
A Tier 1 NIC driver loads. Its probe function calls request_service::<MdioService>().
The MDIO bus framework (a Tier 0 loadable module) is not yet loaded. Without a
resolution mechanism, the driver fails to probe, the device never initialises, and the
administrator has no clear explanation.
This scenario is not exceptional — it is the normal operating condition for Tier 0 loadable framework modules. The SCSI mid-layer must load before any HBA driver. The cfg80211 framework must load before any WiFi driver. The SoundWire bus core must load before any SoundWire audio codec driver. Dependency resolution is a first-class requirement, not an edge case.
11.1.9.2 IDL requires and provides Declarations
Every module's .kabi file declares the KABI services it provides and the services it
requires. These declarations are checked by kabi-gen and embedded in the compiled
module's metadata section.
// mdio.kabi — the MDIO bus framework (Tier 0 loadable)
@version(1)
module mdio_framework {
provides mdio_service >= 1.0;
requires pci_bus >= 2.0; // always in static Core; always satisfied
load_once: true; // Tier 0 module: never unloaded once loaded
load_phase: boot; // load before device enumeration begins
}
// ixgbe.kabi — an Intel 10G NIC driver (Tier 1)
@version(1)
module ixgbe_driver {
provides ethernet_driver >= 4.2;
requires mdio_service >= 1.0; // provided by mdio_framework
requires pci_bus >= 3.0; // always in static Core
load_once: false;
load_phase: on_demand;
}
// mpt3sas.kabi — LSI SAS HBA driver (Tier 1)
@version(1)
module mpt3sas_driver {
provides scsi_host >= 1.0;
requires scsi_midlayer >= 1.0; // provided by scsi_framework (Tier 0 loadable)
requires pci_bus >= 3.0;
load_once: false;
load_phase: on_demand;
}
The requires entries are minimum version constraints: >= 1.0 means any provider
with version ≥ 1.0 satisfies the dependency. The provides entry declares what version
of the service this module exports.
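The `>=` constraint is evaluated under the KABI compatibility rule (Section 11.1.4), not plain numeric ordering: the provider must have the same major version and at least the required minor. A sketch (function name illustrative):

```rust
/// Illustrative check of `requires foo >= M.m` against a provider's version.
/// Same-major rule: `requires mdio_service >= 1.0` is NOT satisfied by a 2.x
/// provider, because a major bump is a breaking change.
fn satisfies(provided_major: u16, provided_minor: u16,
             required_major: u16, required_minor: u16) -> bool {
    provided_major == required_major && provided_minor >= required_minor
}
```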
11.1.9.3 KabiProviderIndex — Boot-Time Service Map
The provider index and all registry types use KabiVersion, a three-field version
triple defined here. Every KABI service carries a KabiVersion that is compared at
bind time to enforce the compatibility rules from
Section 11.1.4.
/// KABI version triple carried in every driver vtable and compared at registration.
///
/// Compatibility rule: a driver compiled against KABI (`major`, `minor`, `patch`) is
/// accepted by a kernel with KABI (`kmajor`, `kminor`, `kpatch`) if and only if:
/// - `kmajor == major` (major version must match exactly — breaking changes)
/// - `kminor >= minor` (kernel minor must be >= driver minor — additive extensions)
/// - `kpatch` is ignored for compatibility (patch = bug-fix only, no ABI change)
///
/// Vtable size (`vtable_size` field) is checked independently of version.
#[repr(C)]
#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd)]
pub struct KabiVersion {
/// Breaking change counter. Incompatible across major versions.
pub major: u16,
/// Additive extension counter. Backwards-compatible within same major.
pub minor: u16,
/// Bug-fix counter. No ABI impact.
pub patch: u16,
/// Reserved; must be zero.
pub _pad: u16,
}
impl KabiVersion {
pub const fn new(major: u16, minor: u16, patch: u16) -> Self {
Self { major, minor, patch, _pad: 0 }
}
/// Returns true if a driver built against `self` (the driver's required version) is
/// compatible with `kernel` (the running kernel's version).
///
/// Compatibility rules:
/// - Same major version required (major version bumps are breaking changes).
/// - Driver minor ≤ kernel minor: the kernel must expose at least the vtable fields
/// the driver was compiled against. A driver requiring minor=3 cannot load on a
/// kernel that only provides minor=2.
///
/// The condition `kernel.minor >= self.minor` is an **asymmetric** check:
/// `self` = driver's required version, `kernel` = running kernel version.
/// These are two different `KabiVersion` values — NOT a self-comparison.
pub const fn is_compatible_with(&self, kernel: KabiVersion) -> bool {
// self.major == kernel.major : breaking change guard
// kernel.minor >= self.minor : kernel must be >= what driver requires
self.major == kernel.major && kernel.minor >= self.minor
}
/// Pack version into a `u64` for atomic storage in vtable headers.
///
/// Layout (most-significant to least-significant byte group):
/// `[major:16][minor:16][patch:16][_pad:16]` in native byte order.
/// This layout ensures that `v1.as_u64() < v2.as_u64()` iff `v1` is an
/// older version than `v2` (within the same major), enabling lock-free
/// version checks via `AtomicU64::compare_exchange`.
pub const fn as_u64(self) -> u64 {
((self.major as u64) << 48)
| ((self.minor as u64) << 32)
| ((self.patch as u64) << 16)
}
/// Unpack a `u64` vtable header word back into a `KabiVersion`.
pub const fn from_u64(v: u64) -> Self {
Self {
major: ((v >> 48) & 0xffff) as u16,
minor: ((v >> 32) & 0xffff) as u16,
patch: ((v >> 16) & 0xffff) as u16,
_pad: 0,
}
}
}
/// Current kernel KABI version. Drivers must be built against a compatible version.
pub const KABI_CURRENT: KabiVersion = KabiVersion::new(1, 0, 0);
/// Entry in the KABI provider index. Populated at boot by scanning module headers
/// in the verified module store. Read-only after boot.
#[derive(Debug)]
pub struct KabiProviderEntry {
/// Stable identifier for the service (e.g., `b"mdio_service\0..."` with NUL padding).
pub service_id: ServiceId,
/// Minimum version of the service that this module provides.
pub min_version: KabiVersion,
/// Maximum version this module's implementation is compatible with.
pub max_version: KabiVersion,
/// Path to the module in the verified module store.
/// Read-only static string; never heap-allocated after boot.
pub module_path: &'static str,
/// When this module must be loaded relative to the boot sequence.
pub load_phase: LoadPhase,
}
/// Service identifier — 64 bytes total: a 60-byte ASCII, NUL-padded name plus
/// a 4-byte major-version namespace. Two services with the same name but
/// different major versions are distinct services (incompatible API change).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ServiceId {
pub name: [u8; 60],
pub major: u32,
}
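A small constructor makes the NUL-padding convention concrete. The `new` helper below is illustrative (real ServiceId values are presumably generated from the IDL); the struct is restated with derives so the snippet stands alone:

```rust
// Restated from the definition above so this sketch compiles standalone.
#[repr(C)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct ServiceId {
    pub name: [u8; 60],
    pub major: u32,
}

impl ServiceId {
    /// Build a ServiceId from an ASCII name, NUL-padding to 60 bytes.
    /// Panics if the name does not fit; service names are short identifiers.
    /// (Illustrative helper, not part of the specified KABI surface.)
    pub fn new(name: &str, major: u32) -> Self {
        assert!(name.is_ascii() && name.len() <= 60);
        let mut buf = [0u8; 60];
        buf[..name.len()].copy_from_slice(name.as_bytes());
        Self { name: buf, major }
    }
}
```

Note that `ServiceId::new("mdio_service", 1)` and `ServiceId::new("mdio_service", 2)` compare unequal: the major version is part of the identity.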
/// Load phase controls when demand-loading is triggered.
#[repr(u32)]
pub enum LoadPhase {
/// Load before device enumeration. Required by bus frameworks (MDIO, SPI,
/// SCSI mid-layer) that must exist before any device driver can bind.
/// Handled by the kernel-internal Tier 0 module loader (no userspace needed).
Boot = 0,
/// Load when first requested by a driver probe. Handled by either the
/// kernel-internal loader (if module is in initramfs) or the userspace
/// `umka-modload` daemon (for post-boot installations).
OnDemand = 1,
}
/// The index is built once at boot and never mutated.
/// Stored in read-only kernel memory after construction.
pub struct KabiProviderIndex {
/// Sorted by service_id for O(log n) lookup.
entries: &'static [KabiProviderEntry],
}
impl KabiProviderIndex {
/// Find the provider for `service_id` that is compatible with `min_version`.
///
/// A provider is compatible when the requested version falls within its
/// supported range: `e.min_version <= min_version <= e.max_version`.
///
/// Returns `None` if no registered provider covers the requested version.
pub fn find(&self, service_id: &ServiceId, min_version: KabiVersion)
-> Option<&KabiProviderEntry>
{
self.entries
.binary_search_by(|e| e.service_id.cmp(service_id))
.ok()
.map(|i| &self.entries[i])
.filter(|e| e.min_version <= min_version && e.max_version >= min_version)
}
}
The KabiProviderIndex is populated during early boot by scanning the initramfs module
store. All entries are verified against the kernel's ML-DSA signing key before being
accepted (Section 8.2). The index is sealed
read-only before any driver loads. A Tier 1 driver cannot add entries to the index —
it can only request services whose entries already exist.
11.1.9.4 KabiServiceRegistry — Runtime Service Map
/// C-ABI stable handle to a KABI service provider.
/// Passed across isolation domain boundaries (Ring 0 driver ↔ UmkaOS Core).
///
/// **Liveness guarantee**: the module providing this service cannot be unloaded
/// while any live KABI service capability references it. This struct does NOT
/// hold an Arc or Rc — liveness is a capability-level invariant, not a per-handle
/// reference count. Users must not retain a raw `KabiServiceHandle` beyond their
/// capability's lifetime.
///
/// **Generation**: `generation` is incremented on driver hot-reload. Callers
/// detect stale handles by comparing against the registry's current generation.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct KabiServiceHandle {
/// Opaque vtable pointer. Callee casts to the service-specific vtable type.
pub vtable: *const (),
/// Opaque context pointer; passed as first argument to all vtable methods.
pub ctx: *const (),
/// Version of the service exposed through this handle. Compared against the
/// caller's minimum in `request_service` and clamped in `negotiate_version`.
pub version: KabiVersion,
/// Generation counter; incremented each time this service provider is reloaded.
/// A stale handle has `generation < registry.current_generation(service_id)`.
pub generation: u64,
}
// Type hierarchy for service handles:
// ServiceHandle (u64, C-ABI token) — crosses KABI boundary, passed by value
// ↓ registry lookup by id
// InternalServiceRef (kernel-private) — kernel's bookkeeping with raw ptr + generation
// ↓ vtable resolution
// KabiServiceHandle (*const (), *const ()) — passed to vtable call site
//
// This separation keeps C-ABI types at boundaries and Rust types in kernel internals.
/// The runtime service registry. Lives in Core, using RCU for read-mostly access.
///
/// The service table is an immutable sorted `Vec<(ServiceId, KabiServiceHandle)>`
/// published under RCU. Lookups acquire an RCU read guard and binary-search the
/// sorted array with no lock contention. Registration is rare and uses a single-
/// writer mutex to clone, update, sort, and publish a new snapshot.
pub struct KabiServiceRegistry {
/// RCU-protected immutable snapshot of the service table.
/// Sorted by `ServiceId` for O(log n) binary search.
services: RcuSnapshot<ServiceTable>,
/// Single-writer mutex for service registration and unregistration.
/// Held only during the clone-update-publish sequence; never held during lookups.
registry_write_mutex: Mutex<()>,
/// Waiters blocked on a not-yet-registered service.
/// Key: service_id. Value: list of waker tokens for deferred probe retry.
waiters: Mutex<BTreeMap<ServiceId, Vec<ProbeWaker>>>,
}
/// Immutable sorted service table published under RCU.
/// Binary search gives O(log n) lookup without any locking.
pub struct ServiceTable {
pub entries: Vec<(ServiceId, KabiServiceHandle)>,
}
impl KabiServiceRegistry {
/// Look up a registered service. Returns None if not yet registered.
/// Does NOT trigger loading — call `request_service` for that.
///
/// Lookup: acquire `rcu_read_lock()`, load the snapshot pointer, binary-search
/// the sorted array (O(log n)), release guard. Zero lock contention; zero
/// cache-line bouncing on multi-core systems.
pub fn get(&self, id: &ServiceId) -> Option<KabiServiceHandle> {
let guard = rcu_read_lock();
let table = self.services.load(&guard);
let result = table.entries
.binary_search_by_key(&id, |(k, _)| k)
.ok()
.map(|idx| table.entries[idx].1.clone());
drop(guard);
result
}
/// Register a service. Called by a Tier 0 module during its init function.
/// Notifies all waiters blocked on this service_id.
///
/// Registration: acquire `registry_write_mutex` (single-writer), clone the
/// existing `ServiceTable`, append the new entry, sort, publish via
/// `rcu_assign_pointer()`, release mutex, defer-free the old table via
/// `rcu_call()`.
pub fn register(&self, id: ServiceId, handle: KabiServiceHandle) {
let _write_guard = self.registry_write_mutex.lock();
let guard = rcu_read_lock();
let old_table = self.services.load(&guard);
let mut new_entries = old_table.entries.clone();
drop(guard);
new_entries.push((id.clone(), handle));
new_entries.sort_by(|(a, _), (b, _)| a.cmp(b));
let new_table = Box::new(ServiceTable { entries: new_entries });
let old_ptr = self.services.swap(new_table);
drop(_write_guard);
// Defer-free the old table after all RCU readers have passed through.
rcu_call(old_ptr, |p| drop(p));
// Wake all deferred probes waiting on this service.
let mut waiters = self.waiters.lock();
if let Some(wakers) = waiters.remove(&id) {
for waker in wakers {
waker.schedule_retry();
}
}
}
/// Unregister a service (called on module unload — Tier 1 only;
/// load_once Tier 0 modules never call this).
pub fn unregister(&self, id: &ServiceId) {
let _write_guard = self.registry_write_mutex.lock();
let guard = rcu_read_lock();
let old_table = self.services.load(&guard);
let mut new_entries = old_table.entries.clone();
drop(guard);
new_entries.retain(|(k, _)| k != id);
let new_table = Box::new(ServiceTable { entries: new_entries });
let old_ptr = self.services.swap(new_table);
drop(_write_guard);
rcu_call(old_ptr, |p| drop(p));
}
}
The service registry uses an RCU-protected immutable snapshot table rather than a
mutable map under a lock. The internal type is RcuSnapshot<ServiceTable>, where
ServiceTable is an immutable sorted Vec<(ServiceId, KabiServiceHandle)>.
- Lookups (request_service()): acquire rcu_read_lock(), load the snapshot
pointer, binary-search the sorted array (O(log n)), release the guard. No lock
contention, no cache-line bouncing on multi-core systems, zero overhead when
the registry is stable.
- Registration (rare): acquire registry_write_mutex (single-writer), clone the
existing ServiceTable, append the new entry, sort, publish via
rcu_assign_pointer(), release the mutex, defer-free the old table via
rcu_call().
This follows the standard UmkaOS pattern for read-mostly shared state: RCU for
readers, a single-writer mutex for updates.
11.1.9.5 Requesting a Service: Probe Deferral
A driver's probe function requests services via request_service. If the service is
not yet registered, the probe is deferred rather than blocking or failing:
/// Request a KABI service with the given minimum version.
///
/// Returns:
/// - `Ok(handle)` — service is registered and version-compatible.
/// - `Err(ProbeError::Deferred)` — service not yet available; probe will be
/// retried automatically when the service is registered. The driver must
/// return `Err(ProbeError::Deferred)` from its probe function immediately
/// after receiving this — no partial initialisation.
/// - `Err(ProbeError::ServiceUnavailable)` — service is not in the
/// KabiProviderIndex at all; it will never become available. The driver
/// should fail permanently and log the missing dependency.
pub fn request_service<S: KabiService>(
registry: &KabiServiceRegistry,
provider_index: &KabiProviderIndex,
device: &DeviceNode,
min_version: KabiVersion,
) -> Result<KabiServiceHandle, ProbeError> {
let id = S::SERVICE_ID;
// Fast path: service already registered.
if let Some(handle) = registry.get(&id) {
// Apply the KABI compatibility rule, not plain ordering: same major,
// provider minor >= required minor (Section 11.1.4).
if min_version.is_compatible_with(handle.version) {
return Ok(handle);
}
// Registered but incompatible version — permanent failure.
return Err(ProbeError::ServiceVersionMismatch {
service: id,
have: handle.version,
need: min_version,
});
}
// Check if a provider exists at all.
let entry = provider_index.find(&id, min_version)
.ok_or(ProbeError::ServiceUnavailable { service: id })?;
// Provider exists but not loaded yet. Register waiter and trigger load.
{
let mut waiters = registry.waiters.lock();
waiters.entry(id.clone()).or_default().push(ProbeWaker::new(device));
}
// Trigger demand loading of the providing module.
// For LoadPhase::Boot modules this is a no-op (already loaded or loading).
// For LoadPhase::OnDemand this schedules the module loader.
schedule_module_load(entry);
Err(ProbeError::Deferred { waiting_for: id })
}
/// Driver probe return type.
pub enum ProbeError {
/// Permanent failure — log and do not retry.
Io(IoError),
NotSupported,
ServiceUnavailable { service: ServiceId },
ServiceVersionMismatch { service: ServiceId, have: KabiVersion, need: KabiVersion },
/// Temporary — the kernel will retry this probe when `waiting_for` is registered.
/// The driver MUST return immediately after receiving Deferred from request_service.
/// Partial initialisation state is not allowed — no allocations, no side effects.
Deferred { waiting_for: ServiceId },
}
Retry semantics: when KabiServiceRegistry::register is called (a Tier 0 module
finishes loading and registers its service), all ProbeWaker entries for that
service_id are dequeued and each deferred driver's probe is re-submitted to the
device registry's probe work queue. The retry is asynchronous — the registering module
is not blocked waiting for all dependent drivers to probe successfully.
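ProbeWaker internals are not specified above; the following minimal model (all names hypothetical) captures the asynchronous hand-off: registering a service only enqueues device re-probes, it never waits on them:

```rust
use std::collections::VecDeque;

/// Hypothetical model of deferred-probe retry: each waker records which device
/// to re-probe; scheduling a retry pushes the device onto the probe work queue.
struct ProbeWaker { device_id: u64 }

struct ProbeQueue { pending: VecDeque<u64> }

impl ProbeQueue {
    fn new() -> Self { Self { pending: VecDeque::new() } }
    /// Called for each waiter dequeued in KabiServiceRegistry::register.
    /// Asynchronous by construction: the registering module only enqueues
    /// work and returns; the probe runs later on the work queue.
    fn schedule_retry(&mut self, waker: ProbeWaker) {
        self.pending.push_back(waker.device_id);
    }
}
```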
No partial initialisation invariant: a driver that returns Deferred must not have
allocated resources, registered character devices, or modified shared state. The device
registry enforces this by checking that the device node remains in the Matching state
(Section 10.5.5.1) after a Deferred return — any device that
has advanced to Probing and returns Deferred triggers a warning and a state reset.
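The enforcement can be modelled as a pure function over the device state (the DeviceState variants here are a hypothetical subset of the Section 10.5.5.1 states):

```rust
/// Hypothetical subset of device registry states relevant to probe deferral.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeviceState { Matching, Probing, Bound }

/// Apply the no-partial-initialisation rule after a probe returned Deferred.
/// Returns the state to leave the node in, plus whether the invariant was
/// violated (the device had advanced past Matching, which warrants a warning).
fn handle_deferred(state: DeviceState) -> (DeviceState, bool) {
    let violated = state != DeviceState::Matching;
    // Always reset to Matching so the deferred probe can be retried cleanly.
    (DeviceState::Matching, violated)
}
```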
11.1.9.6 Demand Loading
Tier 0 loadable modules are loaded by the kernel-internal module loader — a small
ELF loader in static Core that does not require userspace to be running. This covers
both boot-phase loads (before init starts) and on-demand loads of framework modules
referenced in the initramfs module store.
The loader is driven by a bounded work queue. The types below define the queue entries and the reason codes that drive loader policy (signature requirements, timeout budget, error handling).
/// Reason a module is being loaded. Drives loader policy (signature requirements,
/// timeout budget, error handling).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoadReason {
/// Loaded on boot from initramfs or built-in driver list.
Boot,
/// Loaded because a device was hot-plugged and matched this module's alias.
HotPlug,
/// Loaded by explicit userspace request (e.g., `modprobe`).
UserRequest,
/// Loaded as a dependency of another module.
Dependency,
/// Loaded because a running driver declared a KABI `requires` dependency that
/// resolved to an on-demand module (see `schedule_module_load` below).
ServiceDependency,
/// Loaded to replace a crashed Tier 1 driver (live recovery path).
CrashRecovery,
}
/// A pending module load request queued to the module loader worker.
pub struct ModuleLoadRequest {
/// Full path to the module in the verified module store
/// (e.g., `/System/Kernel/drivers/net/ixgbe.kmod`).
pub path: &'static str,
/// Why this module is being loaded (affects policy checks).
pub reason: LoadReason,
/// Completion channel: the loader sends `Ok(())` or an error when done.
/// `None` for fire-and-forget loads such as service-dependency loads, where
/// probe deferral, not this channel, drives the retry.
pub completion: Option<oneshot::Sender<Result<(), ModuleLoadError>>>,
}
/// Module loader worker queue capacity.
/// Bounded to prevent memory exhaustion from a flood of concurrent load requests.
pub const MODULE_LOADER_QUEUE_DEPTH: usize = 64;
/// Schedule loading of the module that provides a KABI service.
/// Non-blocking: queues the load on the module-loader work queue.
fn schedule_module_load(entry: &KabiProviderEntry) {
match entry.load_phase {
LoadPhase::Boot => {
// Boot modules are loaded by the kernel-internal loader
// during init sequence. If called after boot, this is a
// programming error — boot modules must be pre-loaded.
// Log and do nothing; probe deferral handles the retry.
log::warn!("Boot-phase module {} not loaded at boot — \
will retry when loaded manually", entry.module_path);
}
LoadPhase::OnDemand => {
// Queue on the module loader work queue.
// The loader verifies ML-DSA signature, maps into Core domain,
// runs the module's init function.
MODULE_LOADER_QUEUE.push(ModuleLoadRequest {
path: entry.module_path,
reason: LoadReason::ServiceDependency,
// Fire-and-forget: probe deferral drives the retry, so no
// completion channel is attached.
completion: None,
});
}
}
}
The kernel-internal module loader handles all Tier 0 loadable module loads. For
post-boot installation of new modules not present in the initramfs, the umka-modload
userspace daemon submits the signed module blob through a privileged capability
(CAP_MODULE_LOAD) and the kernel verifies and loads it. The userspace path is
additive — it does not replace the kernel-internal loader.
11.1.9.7 Circular Dependency Prohibition
Circular dependencies between Tier 0 loadable modules are statically prohibited.
The kabi-gen toolchain runs a topological sort over the complete requires/provides
graph at build time and rejects cycles with a build error identifying the cycle:
error[KABI-E0021]: circular dependency detected
mdio_framework requires pci_bus (ok)
scsi_framework requires block_layer (ok)
hypothetical_a requires hypothetical_b
hypothetical_b requires hypothetical_a ← cycle here
fix: merge hypothetical_a and hypothetical_b into one module, or
break the cycle by moving shared state into static Core
If two services genuinely need each other, they must either be merged into one module
(which can then provide both services and call between them as direct internal calls)
or their shared state must be factored into a third module that neither depends on the
other. Circular dependencies between a Tier 0 module and static Core are impossible by
construction — static Core has no requires declarations and is always available.
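The check itself is a standard topological sort. A self-contained sketch (Kahn's algorithm over a module-level graph, i.e. after each requires has been resolved to its providing module; all names illustrative):

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Illustrative build-time cycle check. edges[m] = modules that m requires.
/// Kahn's algorithm: repeatedly retire modules whose requirements are all
/// satisfied; anything left unretired sits on (or behind) a cycle.
fn has_cycle(edges: &BTreeMap<&str, BTreeSet<&str>>) -> bool {
    // Requirements outside the graph (static Core services such as pci_bus)
    // are always satisfied and contribute no edges.
    let mut indegree: BTreeMap<&str, usize> = edges.keys()
        .map(|&m| (m, edges[m].iter().filter(|&&r| edges.contains_key(r)).count()))
        .collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&m, reqs) in edges {
        for &r in reqs {
            if edges.contains_key(r) {
                dependents.entry(r).or_default().push(m);
            }
        }
    }
    let mut queue: VecDeque<&str> = indegree.iter()
        .filter(|&(_, &d)| d == 0)
        .map(|(&m, _)| m)
        .collect();
    let mut retired = 0;
    while let Some(m) = queue.pop_front() {
        retired += 1;
        if let Some(deps) = dependents.get(m) {
            for &dep in deps {
                let d = indegree.get_mut(dep).unwrap();
                *d -= 1;
                if *d == 0 {
                    queue.push_back(dep);
                }
            }
        }
    }
    retired < edges.len()
}
```

On the examples from the error message above, the hypothetical_a/hypothetical_b pair is reported as cyclic while the mdio/ixgbe chain passes.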
11.1.9.8 Tier 0 Module Lifecycle (load_once)
Tier 0 loadable modules are never unloaded. Once a Tier 0 module's init function
completes and it registers its services, it remains in Core's memory domain for the
lifetime of the system. This is enforced by the load_once: true declaration in the
module's .kabi file and by the module loader, which never processes an unload request
for a load_once module.
Rationale: Tier 0 modules execute in the same address space as static Core. An interrupt handler, timer callback, or RCU deferred callback anywhere in the kernel might hold a function pointer into a Tier 0 module's code. Reference counting alone cannot guarantee that no stale pointer exists — safe unloading would require auditing every possible execution context. The cost (permanent resident memory) is acceptable because Tier 0 loadable modules are framework code (SCSI mid-layer, MDIO, SPI bus core, etc.) — they are small compared to the hardware they enable, and a system that loads them has implicitly declared a need for them.
Tier 1 modules (which are domain-isolated) can be unloaded safely because the isolation boundary prevents stale intra-kernel pointers. Unloading a Tier 1 module revokes its MPK domain and all ring buffer connections to it; no part of the Core domain retains a callable pointer into Tier 1 code.
11.1.9.9 Version Negotiation
When driver X requests service Y at >= version 1.2, and the registered provider
exports version 2.0, the registry negotiates the binding version:
fn negotiate_version(
service: ServiceId,
handle: &KabiServiceHandle,
caller_min: KabiVersion,
caller_max: KabiVersion,
) -> Result<KabiServiceHandle, ProbeError> {
// Provider is newer than caller expects: caller gets a downgraded view.
// The vtable_size field limits which methods are visible to the caller.
// The provider's vtable is laid out append-only (Section 11.1.3 Rule 1),
// so limiting to caller's known size is always safe.
let effective_version = handle.version.min(caller_max);
if effective_version < caller_min {
return Err(ProbeError::ServiceVersionMismatch {
service,
have: handle.version,
need: caller_min,
});
}
Ok(KabiServiceHandle { version: effective_version, ..*handle })
}
This reuses the existing vtable_size versioning mechanism. No separate version
negotiation protocol is needed beyond what KABI already provides.
11.1.9.10 Security Model
The dependency resolution mechanism is a potential privilege escalation path: a
compromised Tier 1 driver requesting a service could cause the kernel to load a
Tier 0 module. UmkaOS's defence:
- KabiProviderIndex is sealed after boot. The index is populated from
ML-DSA-signed module headers during early init and marked read-only before any
driver loads. A Tier 1 driver cannot add entries to the index.
- Service requests are by opaque ID, not by module path. A Tier 1 driver calls
request_service::<MdioService>() — it cannot specify which module file to load.
The resolution from service ID to module path happens entirely inside Core,
using the pre-verified index.
- A Tier 1 driver can only trigger loading of modules that the system already
trusts. If a module is not in the signed provider index, request_service
returns ServiceUnavailable and no loading occurs.
- All module loads verify the ML-DSA signature (Section 8.2) before the
module's code is executed. A Tier 1 driver cannot cause execution of unsigned
code even if it somehow injected an entry into the provider index (which it
cannot, per point 1).