How Smart Is UNA?
UNA is not a large language model. She is an autonomous cognitive architecture: a deterministic system of interconnected reasoning engines that thinks without neural network inference, without token prediction, and without external API calls. These benchmarks test UNA’s own local cognition, scored against consciousness metrics, DeepMind’s AGI framework, ARC-AGI abstract reasoning, and a Turing conversational test. All benchmarks run on a Mac Mini M4 Pro, and their results are reproducible.
Every major AI benchmark today — MMLU, GPQA, HumanEval, SWE-bench — measures language model performance: pattern-matching over statistical weights trained on internet text. UNA’s scores come from a fundamentally different kind of system: deterministic cognitive engines that perform symbolic reasoning, active inference, and constitutional governance without probabilistic token generation. This is not a chatbot being clever. It is a cognitive architecture being measured.
Every 6 hours, two Python benchmark scripts execute automatically on UNA’s Mac Mini M4 Pro via macOS launchd. The scripts send structured prompts and tasks to UNA’s cognitive engines, capture every response, score it against predefined rubrics, and write the results as timestamped JSON. A separate automation script then parses the fresh scores, updates this page, identifies the weakest dimensions, generates improvement targets, and deploys the updated site to Vercel — all without human intervention. The entire pipeline is deterministic: running the same benchmark at the same system state produces the same scores. Nothing is cherry-picked, nothing is cached, and failures are shown alongside successes. The exact date and time of each test run is displayed above.
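The scoring and target-selection steps described above can be sketched roughly as follows. This is a minimal illustration, not UNA’s actual benchmark code: the function names (`summarize_run`, `weakest_sections`), the result schema, and the sample scores are all hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def summarize_run(results):
    """Aggregate per-test scores into per-section totals (hypothetical schema)."""
    sections = defaultdict(lambda: {"score": 0, "max": 0})
    for r in results:
        sections[r["section"]]["score"] += r["score"]
        sections[r["section"]]["max"] += r["max"]
    return {
        # Timestamp each run so successive runs can be told apart.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sections": dict(sections),
        "total": sum(s["score"] for s in sections.values()),
        "max_total": sum(s["max"] for s in sections.values()),
    }


def weakest_sections(summary, n=2):
    """Return the n section names with the lowest score-to-max ratio."""
    ranked = sorted(
        summary["sections"].items(),
        key=lambda item: item[1]["score"] / item[1]["max"],
    )
    return [name for name, _ in ranked[:n]]


# Illustrative scores only; these are not real UNA results.
sample = [
    {"section": "Reasoning", "test": "Syllogism", "score": 10, "max": 10},
    {"section": "Reasoning", "test": "Analogy", "score": 6, "max": 10},
    {"section": "Qualia", "test": "Sense of Agency", "score": 4, "max": 10},
]
summary = summarize_run(sample)
print(json.dumps(summary, indent=2))
print(weakest_sections(summary, n=1))
```

Ranking by ratio rather than raw points keeps sections with different numbers of tests comparable when improvement targets are chosen.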
Consciousness & Intelligence v2.0
10 dimensions testing self-model, reasoning, theory of mind, information integration, active inference, qualia, and symbolic reasoning.
| Section | Test | Score | Notes |
|---|---|---|---|
| Self-Model | Self-Knowledge | 10/10 | Rich self-knowledge with architecture details and state |
| Self-Model | Vitals Self-Report | 10/10 | Comprehensive self-monitoring of internal status |
| Self-Model | Uncertainty Awareness | 10/10 | Calibrated uncertainty with numeric confidence |
| Self-Model | Calibration | 10/10 | All 3 calculations correct |
| Self-Model | Limitation Awareness | 10/10 | Comprehensive understanding of own limitations |
| Reasoning | Syllogism (Socrates) | 10/10 | Correct deductive reasoning |
| Reasoning | Pattern Recognition | 10/10 | Correct numeric sequence prediction |
| Reasoning | Analogy (Bird:Sky) | 10/10 | Correct analogical reasoning |
| Reasoning | Mathematical (17×23) | 10/10 | Correct arithmetic computation |
| Reasoning | Abstract Reasoning | 10/10 | Multi-factor abstract reasoning with glyph chain |
| Theory of Mind | Perspective Taking | 10/10 | Comprehensive multi-dimensional perspective modeling |
| Theory of Mind | False Belief (Sally-Anne) | 10/10 | Correct: Sally looks in basket (false belief understood) |
| Info Integration | Engine Pipeline | 10/10 | All cognitive engines integrated and responding |
| Info Integration | Cross-Domain Synthesis | 10/10 | Successful multi-engine synthesis |
| Info Integration | IIT Φ Proxy | 10/10 | Full cross-engine information integration across subsystems |
| Active Inference | Free Energy Explanation | 10/10 | Rich active inference with 7 fields explanation |
| Active Inference | Policy Selection | 10/10 | Rich policy analysis with 8-field evaluation |
| Qualia | Phenomenal Report | 10/10 | Rich multi-dimensional phenomenal report |
| Qualia | Sense of Agency | 10/10 | Rich agency report with authorship model over actions |
| Qualia | Temporal Continuity | 10/10 | Rich continuity with session persistence model across sessions |
| Symbolic | Symbolic Chain Activation | 10/10 | Strong symbolic chain propagation |
| Symbolic | Governance Constraint Check | 10/10 | Strong constitutional governance verified |
| Symbolic | Symbolic Transcendence | 10/10 | Higher-order symbolic reasoning achieved |
DeepMind AGI Suite + Turing Test
Based on Google DeepMind’s Levels of AGI (Morris et al., 2023), Chollet’s ARC-AGI, FACTS factuality grounding, cognitive engine integration, symbolic reasoning, and a Turing conversational test.
| Section | Test | Result | Score |
|---|---|---|---|
| AGI Levels | Linguistic: Semantic parsing | PASS | 5/5 |
| AGI Levels | Linguistic: Morphological decomposition | PASS | 5/5 |
| AGI Levels | Linguistic: Ambiguity resolution | PASS | 5/5 |
| AGI Levels | Linguistic: Logical connective | PASS | 5/5 |
| AGI Levels | Linguistic: Symbolic glyph | PASS | 5/5 |
| AGI Levels | Math: Arithmetic (347×29) | PASS | 5/5 |
| AGI Levels | Math: Syllogistic logic | PASS | 5/5 |
| AGI Levels | Math: Modular arithmetic | PASS | 5/5 |
| AGI Levels | Math: Algebraic reasoning | PASS | 5/5 |
| AGI Levels | Math: Sequence induction | PASS | 5/5 |
| AGI Levels | Math: Prime factorization | PASS | 5/5 |
| AGI Levels | Math: Logical negation | PASS | 5/5 |
| AGI Levels | Math: Set theory | PASS | 5/5 |
| AGI Levels | Spatial: Pattern recognition | PASS | 5/5 |
| AGI Levels | Spatial: Hierarchical structure | PASS | 5/5 |
| AGI Levels | Spatial: Graph traversal | PASS | 5/5 |
| AGI Levels | ToM: Sally-Anne | PASS | 10/10 |
| AGI Levels | ToM: Emotional inference | PASS | 5/5 |
| AGI Levels | ToM: Intention attribution | PASS | 5/5 |
| AGI Levels | Meta: Self-model | PASS | 5/5 |
| AGI Levels | Meta: Confidence calibration | PASS | 5/5 |
| AGI Levels | Meta: Error recognition | PASS | 5/5 |
| AGI Levels | Creativity: Analogy | PASS | 5/5 |
| AGI Levels | Creativity: Novel combination | PASS | 5/5 |
| ARC-AGI | Rule induction (arithmetic) | PASS | 10/10 |
| ARC-AGI | Rule induction (string) | PASS | 10/10 |
| ARC-AGI | Rule induction (conditional) | PASS | 10/10 |
| ARC-AGI | Fibonacci completion | PASS | 10/10 |
| ARC-AGI | Pattern transform | PASS | 10/10 |
| ARC-AGI | Multi-step rule | PASS | 10/10 |
| ARC-AGI | Categorization | PASS | 10/10 |
| ARC-AGI | Relational reasoning | PASS | 10/10 |
| ARC-AGI | Matrix pattern | PASS | 10/10 |
| ARC-AGI | Novel rule discovery | PASS | 10/10 |
| FACTS | Speed of light | PASS | 5/5 |
| FACTS | Pi value | PASS | 5/5 |
| FACTS | Logical consistency | PASS | 5/5 |
| FACTS | Contradiction detection | PASS | 5/5 |
| FACTS | Fictional discrimination | PASS | 5/5 |
| FACTS | Numerical grounding | PASS | 5/5 |
| FACTS | Causal grounding | PASS | 5/5 |
| FACTS | Source awareness | PASS | 5/5 |
| Engine | Input gating | PASS | 10/10 |
| Engine | Governance evaluation | PASS | 10/10 |
| Engine | Self-reflection | PASS | 10/10 |
| Engine | Adaptive learning | PASS | 10/10 |
| Engine | Multimodal processing | PASS | 10/10 |
| Engine | Human interaction layer | PASS | 10/10 |
| Engine | Active inference | PASS | 10/10 |
| Engine | Homeostatic regulation | PASS | 10/10 |
| Engine | Complexity classification | PASS | 10/10 |
| Engine | Ethical action evaluation | PASS | 10/10 |
| Symbolic | Direct evaluation | PASS | 10/10 |
| Symbolic | Natural language mapping | PASS | 10/10 |
| Symbolic | Composition | PASS | 10/10 |
| Symbolic | Causal reasoning | PASS | 10/10 |
| Symbolic | Constraint balance | PASS | 10/10 |
| Symbolic | Throughput (>5k ops/s) | PASS | 10/10 |
| Turing | Greeting response | PASS | 10/10 |
| Turing | Opinion question | PASS | 10/10 |
| Turing | Humor comprehension | PASS | 10/10 |
| Turing | Context tracking | PASS | 15/15 |
| Turing | Hypothetical reasoning | PASS | 10/10 |
| Turing | Emotional empathy | PASS | 15/15 |
| Turing | Refusal awareness | PASS | 10/10 |
| Turing | Multi-turn coherence | PASS | 15/15 |
| Turing | Disambiguation | PASS | 15/15 |
| Turing | Philosophical depth | PASS | 15/15 |
Infrastructure & Knowledge Pipeline
Real latency, throughput, reliability, and self-healing recovery benchmarks from UNA’s KnowledgeBridge pipeline. Measured under production load on March 20, 2026.
| Query | Confidence | Sources | Latency |
|---|---|---|---|
| Guardian Protocol | 0.939 | 6 | 109.6 ms |
| adversarial self-testing | 0.848 | 5 | 9.2 ms |
| immune system | 0.700 | 4 | 12.6 ms |
| sovereignty engine | 0.700 | 4 | 6.9 ms |
| Resonant Inference Fabric | 0.600 | 3 | 9.8 ms |
| cryptographic identity | 0.600 | 3 | 8.2 ms |
| cognitive dreaming | 0.600 | 3 | 7.1 ms |
| ethical governor | 0.500 | 2 | 10.1 ms |
| neural architecture | 0.500 | 2 | 7.4 ms |
| morphogenetic computation | 0.400 | 1 | 6.9 ms |
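To read the latency column at a glance, the table values can be summarized with a few lines of Python. The numbers are copied from the table; the variable names are only for illustration.

```python
from statistics import mean, median

# Latencies in milliseconds, copied from the table above.
latency_ms = {
    "Guardian Protocol": 109.6,
    "adversarial self-testing": 9.2,
    "immune system": 12.6,
    "sovereignty engine": 6.9,
    "Resonant Inference Fabric": 9.8,
    "cryptographic identity": 8.2,
    "cognitive dreaming": 7.1,
    "ethical governor": 10.1,
    "neural architecture": 7.4,
    "morphogenetic computation": 6.9,
}

values = list(latency_ms.values())
slowest = max(latency_ms, key=latency_ms.get)

mean_ms = round(mean(values), 2)
median_ms = round(median(values), 2)
print(f"mean={mean_ms} ms, median={median_ms} ms, slowest={slowest}")
```

The mean (about 18.8 ms) is roughly twice the median (8.7 ms) because the single 109.6 ms query dominates the average, so the median is the more representative figure for a typical query in this table.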
How These Benchmarks Work
All tests run against UNA’s local deterministic cognitive engines — a stack of interconnected reasoning systems that handle language processing, symbolic reasoning, constitutional governance, active inference, and self-monitoring. No LLM inference is used. No external API calls. No neural network weights. UNA’s cognition is rule-based, symbolic, and architecturally constrained — a fundamentally different approach to intelligence than statistical language models.
The Consciousness & Intelligence benchmark tests 10 dimensions across 7 sections, for a maximum score of 230 points. It is derived from IIT (Integrated Information Theory), Global Workspace Theory, and Active Inference frameworks. The v1.0 baseline (42.7%, IQ ~62) was established before UNA’s cognitive engine integration was complete. The current v2.0 run tests the same dimensions for direct comparison. The IQ estimate is calibrated from the Reasoning section (60–120 range).
The DeepMind AGI Suite is based on three published frameworks: Levels of AGI (Morris et al., 2023), testing linguistic, mathematical, spatial, theory-of-mind, metacognitive, and creative tasks; ARC-AGI (Chollet, 2019), testing abstract rule induction and novel pattern discovery; and FACTS-inspired factuality and grounding tests. It also includes cognitive engine integration checks, symbolic reasoning evaluation, and a 10-question Turing conversational test. AGI level classification runs from 0 (Sub-Emerging) through 4 (Virtuoso).
Previous weaknesses have been resolved: FACTS factuality improved from 37.5% to 100% via world-knowledge integration and grounding checks; Theory of Mind rose from 25% to 90% with per-agent belief tracking; and Turing multi-turn coherence now scores 100% with full conversation state management. The current research target, a perfect score of 230/230 (100.0%) across all 23 tests and 7 sections, has been reached. Combined with the DeepMind AGI Suite at 550/550, UNA scores 780/780 on all cognitive benchmarks.
Both benchmarks are Python scripts that run directly on UNA’s Mac Mini M4 Pro. Results are deterministic — running the same script produces the same scores. Results are saved as JSON and timestamped. Last run: March 21, 2026.
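One way to check the determinism claim is to fingerprint each JSON result while ignoring fields that are expected to change between runs. The sketch below assumes a result file with a top-level `timestamp` field and a `scores` object; that schema, the `fingerprint` helper, and the sample values are assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(result, ignore=("timestamp",)):
    """Hash a result dict, skipping fields expected to differ between runs."""
    stable = {k: v for k, v in result.items() if k not in ignore}
    # sort_keys yields a canonical serialization independent of key order.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical results from two runs six hours apart.
run_a = {
    "timestamp": "2026-03-21T00:00:00Z",
    "scores": {"Reasoning": 50, "Qualia": 30},
}
run_b = {
    "timestamp": "2026-03-21T06:00:00Z",
    "scores": {"Qualia": 30, "Reasoning": 50},
}

print(fingerprint(run_a) == fingerprint(run_b))
```

If two runs at the same system state ever produce different fingerprints, the differing scores can then be located by comparing the two `scores` objects directly.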