VERN OS — Study Figures & Visualizations

Governing Stochastic Generation:
Deterministic Runtime Orchestration via VERN OS

Production Dataset: N = 8,374 sessions
Annotated Sample: N = 627

Graceful Exit: 72.0% VERN OS vs. 23.1% Control (No BCM) · OR 8.58× · +48.9 pp

Verified Completion: 52.6% VERN OS vs. 9.2% Control (No BCM) · OR 10.9× · +43.4 pp

Conversational Drift: 0 (zero character breaks across all VERN deployments)

Track B Behavioral Success: 100% (Amber & Christine, multi-directional control)

Section 3 — VERN OS Architecture

Figure 1. VERN OS three-layer runtime architecture. The deterministic orchestration layer sits between the user-facing application and the underlying stochastic LLM. The Behavioral Control Module (BCM) evaluates deployment policy before every inference call. The emotional signal layer tracks user affective state across turns. The audit layer logs session outcomes for post-interaction review and compliance. VERN OS does not replace the LLM — it governs the permissible behavioral space within which it operates.
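The layering described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the VERN OS implementation: the class name `Orchestrator`, the `policy` callable standing in for the BCM, the smoothing rule for the affect estimate, and the fallback message are all hypothetical, chosen only to show a policy check running before every model call, an affect value carried across turns, and an audit record written per turn.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Orchestrator:
    """Hypothetical governed wrapper around a stochastic text generator."""

    llm: Callable[[str], str]             # stand-in for the underlying LLM
    policy: Callable[[str, float], bool]  # stand-in for the BCM policy check
    fallback: str = "I can't continue with that in this session."
    affect: float = 0.0                   # running estimate of user affect
    audit_log: list = field(default_factory=list)

    def turn(self, user_text: str, affect_signal: float) -> str:
        # Emotional signal layer (assumed rule): smooth the per-turn signal.
        self.affect = 0.5 * self.affect + 0.5 * affect_signal
        # Policy is evaluated before the inference call, never after it.
        allowed = self.policy(user_text, self.affect)
        reply = self.llm(user_text) if allowed else self.fallback
        # Audit layer: record the decision for post-interaction review.
        self.audit_log.append(
            {"input": user_text, "allowed": allowed, "affect": self.affect}
        )
        return reply
```

In this sketch the model is never invoked when the policy check fails, which is one simple way to read "governs the permissible behavioral space": the deterministic layer decides whether and when the stochastic component runs.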

Section 5.1 — H₁: BCM Governance & Graceful Exit

Figure 2 — Graceful Exit VERN vs Control

Figure 2. Graceful exit rates across the full production dataset (N = 8,374). VERN OS achieved 72.0% versus 23.1% for the unorchestrated control, a 48.9 percentage-point separation. BCM-governed sessions have 8.6× the odds of reaching a graceful close (OR = 8.58). The analysis excludes Izzy and Maria-HMF, both of which incorporated elements of both VERN OS and unorchestrated control architectures and cannot be cleanly assigned to either cohort. Replicates and updates original Figure 1.
z = 31.31, p < 0.0001 · Cohen's h = 1.024 (large) · OR = 8.58× · 95% CI VERN: 71.0–73.0% · 95% CI Control: 20.6–25.7%
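The effect sizes reported for Figure 2 follow from standard formulas for two proportions. The sketch below recomputes them from the rounded percentages in the caption, so small rounding differences from the reported values are expected (the reported figures were presumably computed from raw counts). The function names are illustrative. The z statistic and confidence intervals are not reproduced because they require the per-group session counts, which are not given here.

```python
import math


def odds_ratio(p_treated: float, p_control: float) -> float:
    """Ratio of the odds p/(1-p) in the treated group to the control group."""
    return (p_treated / (1 - p_treated)) / (p_control / (1 - p_control))


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def pp_gap(p1: float, p2: float) -> float:
    """Absolute difference in percentage points."""
    return (p1 - p2) * 100


vern, control = 0.720, 0.231  # graceful exit rates from the caption
print(f"OR ~ {odds_ratio(vern, control):.2f}")
print(f"h  ~ {cohens_h(vern, control):.3f}")
print(f"gap ~ {pp_gap(vern, control):.1f} pp")
```

An odds ratio compares odds, not probabilities. The ratio of the two rates themselves (72.0 / 23.1, about 3.1) is a different quantity, which is why the result is best phrased in terms of odds.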

Figure 3. Graceful exit rates by deployment archetype. The task/workflow archetype shows the strongest separation (+63.2 pp). Without BCM governance, task-oriented AI Humans fail to deliver lead captures, workflow completions, recommendations, or routing in 91.6% of interactions.

Section 5.3 — Supplemental Analysis: Strict vs. Verified Task Completion

Figure 4. Three-tier task completion analysis (N = 208 annotated task-archetype sessions). At the strict/nominal level, the groups are statistically indistinguishable (NS). The verified filter — requiring on-goal behavior, zero tangents, and graceful exit simultaneously — reveals the full magnitude of BCM governance's contribution. This is the paper's primary new finding.
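The difference between the strict and verified tiers can be expressed as two predicates over an annotated session. The following is a minimal sketch: the field names are hypothetical, the toy records are illustrative and not study data, and it assumes a verified completion must also be a nominal completion, which is consistent with Figure 5 showing verified bars nested inside strict bars.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AnnotatedSession:
    task_completed: bool  # nominal completion (strict tier)
    on_goal: bool         # annotator judged behavior on-goal
    tangent_count: int    # number of off-topic tangents
    graceful_exit: bool   # session closed gracefully


def is_strict(s: AnnotatedSession) -> bool:
    return s.task_completed


def is_verified(s: AnnotatedSession) -> bool:
    # All conditions must hold simultaneously.
    return (
        s.task_completed
        and s.on_goal
        and s.tangent_count == 0
        and s.graceful_exit
    )


def rate(sessions: Sequence[AnnotatedSession],
         pred: Callable[[AnnotatedSession], bool]) -> float:
    return sum(pred(s) for s in sessions) / len(sessions) if sessions else 0.0


# Toy data only: three nominal completions, one of which passes every check.
toy = [
    AnnotatedSession(True, True, 0, True),
    AnnotatedSession(True, True, 2, True),    # completed, but with tangents
    AnnotatedSession(True, False, 0, False),  # completed, off-goal, bad exit
    AnnotatedSession(False, True, 0, True),   # never completed
]
print(rate(toy, is_strict), rate(toy, is_verified))
```

Because the verified predicate is a conjunction, two groups with similar strict rates can diverge sharply once any one of the additional conditions fails often in one group.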

What does "Evaluable" mean?

Not every session is a fair test of an AI Human's capability: sometimes a user disengages within the first exchange, before the system has any opportunity to pursue its goal. The evaluable tier keeps only sessions where the user was sufficiently engaged (engagement score ≥ 0.5) to give the AI Human a genuine attempt at completing the workflow. It answers the question: of the sessions where the user actually showed up, how often did the AI Human succeed? Because passive early drop-offs are excluded, the evaluable rate for VERN OS is higher than the strict rate (80.7% vs. 76.2%). For the control group the evaluable and strict rates are identical, since their completions are already rare and mostly occur in engaged sessions.
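The evaluable tier amounts to a filter applied before the completion rate is computed. Here is a minimal sketch, assuming each session carries an engagement score and a completion flag. The names `Session`, `strict_rate`, and `evaluable_rate` are illustrative, and the toy records are not study data. Only the 0.5 threshold comes from the text above.

```python
from typing import Iterable, NamedTuple


class Session(NamedTuple):
    engagement_score: float
    task_completed: bool


EVALUABLE_THRESHOLD = 0.5  # threshold stated in the text


def completion_rate(sessions: Iterable[Session]) -> float:
    sessions = list(sessions)
    if not sessions:
        return 0.0
    return sum(s.task_completed for s in sessions) / len(sessions)


def strict_rate(sessions: list) -> float:
    # Every session counts, including early drop-offs.
    return completion_rate(sessions)


def evaluable_rate(sessions: list,
                   threshold: float = EVALUABLE_THRESHOLD) -> float:
    # Only sessions with sufficient user engagement count.
    return completion_rate(
        s for s in sessions if s.engagement_score >= threshold
    )


# Toy data only: one disengaged session lowers the strict rate.
toy = [
    Session(0.9, True),
    Session(0.7, True),
    Session(0.6, False),
    Session(0.1, False),  # user left almost immediately
]
print(strict_rate(toy), evaluable_rate(toy))
```

Dropping low-engagement sessions shrinks the denominator without removing completions, so the evaluable rate can only match or exceed the strict rate as long as no completions occur in disengaged sessions.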

Strict: z = 0.37, p = 0.71 (NS) · Verified: z = 5.90, p = 5.2×10⁻⁹, Cohen's h = 1.005, OR = 10.9×

Figure 5 — Per-Persona Strict vs Verified

Figure 5. Per-persona task completion: strict (nominal) rate shown as a light bar behind the verified (dark) bar. Control personas show catastrophic collapse when the enterprise-grade filter is applied: Denise drops 80 pp, Becky 68 pp, Ronnie reaches zero. Craig (VERN OS) is the only persona where strict and verified rates are identical — every BCM-governed completion also exits cleanly. *Dave's lower verified rate relates to a post-task webhook routing issue, not interactive execution failure.

Section 5.4 — Industry Benchmark Positioning

Figure 6 — Industry Benchmark Positioning

Figure 6. VERN OS's 72.0% graceful exit rate matches or exceeds best-in-class e-commerce chatbot completion benchmarks and outperforms the top of Gartner's range for mature RAG deployments. The unorchestrated control group (23.1%) falls within the rule-based bot range, confirming that LLM capability alone does not produce reliable session completion without runtime governance. Industry benchmarks: Gartner (2024); industry aggregates, 2025.

Section 5.6 — Track B: Simulation & Controlled Tension

Figure 7. Track B simulation personas are BCM-governed deployments with an intentionally negative emotional mandate. Amber (entertainment) and Christine (horror) each achieved high graceful exit rates despite being designed for negative emotional outcomes, demonstrating that VERN OS provides multi-directional emotional trajectory control, not merely a bias toward positive affect. Both achieved 100% behavioral success with zero character breaks.

Appendix A — Full Deployment Cohort: Graceful Exit Summary

Figure 8. Graceful exit rates for all 21 personas across groups. Carlos and Nick (Control, Companion) show higher rates because they are companion-only deployments without a task mandate, consistent with the archetype gap shown in Figure 3. Control task personas (Ronnie, Becky, Denise) show dramatically lower rates than even the control companion personas. *Dave's lower rate is attributable to a post-task webhook routing disconnect, not interactive failure. The summary excludes Izzy and Maria-HMF: both deployments incorporated elements of both VERN OS and unorchestrated control architectures and cannot be cleanly assigned to either cohort.
