Oracle testing
Every implementation decision in Phasis is gated on one question: does V8 produce the same output?
JavaScript has subtle semantics — NaN !== NaN, -0 === 0, [1] + 1 === "11", the exact order of property enumeration, the timing of Promise microtasks. Re-deriving these by reading the spec is slow and error-prone. Phasis sidesteps that by treating Node.js (V8) as an oracle: write a JS scenario, run it through V8, capture the output as truth; run it through Phasis, capture the output again; diff.
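As an illustration of the kind of behaviour the oracle pins down, here is a small scenario-style program. It is a hypothetical example, not a file from the Phasis repo; the `show` helper and the `lines` array are names invented for this sketch.

```javascript
// Hypothetical scenario source (not taken from the Phasis repo).
const lines = [];
const show = (label, value) => lines.push(`${label}: ${String(value)}`);

show("NaN !== NaN", NaN !== NaN);
show("-0 === 0", -0 === 0);
show("Object.is(-0, 0)", Object.is(-0, 0));
show("[1] + 1", [1] + 1);
// Integer-like keys enumerate first in ascending order,
// then string keys in insertion order.
show("key order", Object.keys({ b: 1, 2: 1, a: 1, 1: 1 }).join(","));
console.log(lines.join("\n"));

// Promise reactions run as microtasks, before any timer callback.
const order = ["sync"];
setTimeout(() => {
  order.push("timeout");
  console.log(order.join(" -> ")); // sync -> microtask -> timeout
}, 0);
Promise.resolve().then(() => order.push("microtask"));
```

Whatever Node.js prints for a program like this becomes the expected output; an engine under test either reproduces it byte for byte or shows up in the diff.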
1. SETUP → a JavaScript source file or snippet
2. ORACLE → Node.js executes it, output captured as truth
3. ACTUAL → Phasis executes it, output captured
4. COMPARE → oracle vs actual, diff measures the gap
This is the same model used by other Inline0 projects:
| Concept | php-browser | pitmaster | greph | phasis |
|---|---|---|---|---|
| Oracle | Chromium | canonical git | grep + rg + sg | Node.js (V8) |
| Actual | PHP renderer | Pitmaster | greph | phasis |
| Test suite | fixture snapshots | git interop | scenario corpus | test262 (50,506 tests) |
| Compliance doc | CSS_COVERAGE.md | compat-report | compat-report | COMPAT.md |
Two levels of testing
Level 1 — Custom scenarios
Each scenario in scenarios/ is a small JS program with known output:
scenarios/operators/arithmetic/
├── scenario.json          # metadata
├── setup/
│   └── test.js            # source
├── oracle/
│   └── output.txt         # Node.js output (committed)
├── actual/
│   └── output.txt         # Phasis output (regenerated)
└── reports/
    └── comparison.json
Run a single scenario:
./bin/test-scenario operators/arithmetic
Run all scenarios:
./bin/test-regression
./bin/test-regression --jobs 4
./bin/test-regression --category expressions
./bin/test-regression --fast
Refresh the oracle (after a Node.js version update):
./bin/oracle --refresh operators/arithmetic
Scenarios are organised by ECMAScript chapter: literals, operators, variables, control-flow, functions, objects, arrays, classes, builtins, errors, interop, edge cases.
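The comparison step for a single scenario can be sketched as a line-by-line diff of the two captured outputs. This is a minimal illustration only: `compareOutputs` is a hypothetical helper, the newline normalisation is an assumption, and the returned object is not the actual format of `reports/comparison.json`.

```javascript
// Compare the oracle output with the actual output, line by line.
// Assumption: CRLF and trailing newlines are not significant.
function compareOutputs(oracleText, actualText) {
  const normalise = (text) =>
    text.replace(/\r\n/g, "\n").replace(/\n+$/, "").split("\n");
  const oracle = normalise(oracleText);
  const actual = normalise(actualText);

  const mismatches = [];
  const length = Math.max(oracle.length, actual.length);
  for (let i = 0; i < length; i++) {
    if (oracle[i] !== actual[i]) {
      // A missing line on either side is reported as null.
      mismatches.push({
        line: i + 1,
        oracle: oracle[i] ?? null,
        actual: actual[i] ?? null,
      });
    }
  }
  return { pass: mismatches.length === 0, mismatches };
}

// Example with inline strings; a real run would read
// oracle/output.txt and actual/output.txt instead.
console.log(compareOutputs("1\n2\n", "1\n3\n"));
```

Reporting the first differing line rather than a bare pass/fail makes it easier to see which semantic detail the engine got wrong.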
Level 2 — test262
The official conformance suite is checked out as a git submodule under test262/. Each test is self-verifying: it passes if it doesn't throw, fails if it does (or the reverse for negative tests). The harness (assert.js, sta.js) is loaded before each test.
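The self-verifying rule can be sketched as follows. This is a simplified illustration, not the Phasis runner: `execute` and `verdict` are hypothetical names, the harness files are not loaded, and the negative-test phase (parse, resolution, or runtime) is ignored.

```javascript
// Run a source string and record whether it threw.
// Indirect eval evaluates the code in the global scope.
function execute(source) {
  try {
    (0, eval)(source);
    return { threw: false, error: undefined };
  } catch (error) {
    return { threw: true, error };
  }
}

// negativeType is the expected error constructor name for a negative
// test (for example "SyntaxError"), or null for an ordinary test.
function verdict(outcome, negativeType = null) {
  if (negativeType === null) {
    return outcome.threw ? "fail" : "pass";
  }
  const name = outcome.error?.constructor?.name;
  return outcome.threw && name === negativeType ? "pass" : "fail";
}

console.log(verdict(execute("1 + 1")));                    // pass
console.log(verdict(execute("var x = ;"), "SyntaxError")); // pass
console.log(verdict(execute("throw new TypeError('x')"))); // fail
```

A negative test that completes without throwing fails, as does one that throws an error of the wrong type.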
./bin/test262 # full suite
./bin/test262 --category built-ins/Array # subset
./bin/test262 --jobs 4 # parallel
./bin/test262 --report # compliance percentage
./bin/compat-report # full COMPAT.md + compat.json
The runner parses each test's YAML frontmatter, checks the features and flags lists against the skip set in config/support.php, executes in strict and/or sloppy mode as the flags dictate, and asserts the expected outcome.
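The frontmatter handling described above might look roughly like this. It is a sketch under stated assumptions: `parseFrontmatter` and `planRun` are hypothetical names, the parser only understands inline `key: [a, b]` lists (a real runner would use a proper YAML parser), and the `unsupported` set stands in for whatever config/support.php contains. The flag names `onlyStrict`, `noStrict`, `raw`, and `module` are the ones test262 itself defines.

```javascript
// Extract the features and flags lists from a /*--- ... ---*/ block.
function parseFrontmatter(source) {
  const meta = { features: [], flags: [] };
  const block = source.match(/\/\*---([\s\S]*?)---\*\//);
  if (!block) return meta;
  for (const key of ["features", "flags"]) {
    const list = block[1].match(new RegExp(`^${key}:\\s*\\[([^\\]]*)\\]`, "m"));
    if (list) {
      meta[key] = list[1].split(",").map((s) => s.trim()).filter(Boolean);
    }
  }
  return meta;
}

// Decide whether to skip the test and, if not, which modes to run.
function planRun(meta, unsupported) {
  const blocked = [...meta.features, ...meta.flags].filter((n) => unsupported.has(n));
  if (blocked.length > 0) return { skip: true, blocked, modes: [] };
  const has = (flag) => meta.flags.includes(flag);
  let modes = ["sloppy", "strict"]; // default: run the test twice
  if (has("module")) modes = ["module"];
  else if (has("onlyStrict")) modes = ["strict"];
  else if (has("noStrict") || has("raw")) modes = ["sloppy"];
  return { skip: false, blocked: [], modes };
}

const sample = `/*---
description: illustrative test, not from test262
features: [BigInt, Symbol]
flags: [onlyStrict]
---*/
1n + 1n;`;

console.log(planRun(parseFrontmatter(sample), new Set(["BigInt"])));
console.log(planRun(parseFrontmatter(sample), new Set()));
```

Checking the skip set before execution keeps unsupported features out of the fail count and in the skip count.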
Compliance documentation
./bin/compat-report is the canonical compliance snapshot generator. It runs the full test262 suite and writes:
- compat.json: machine-readable per-group totals.
- COMPAT.md: human-readable report with category breakdown.
These files are committed to the repo and updated automatically by the compat-matrix.yml GitHub Actions workflow on every push.
./bin/compat-report --jobs 4
The repo always reflects the latest pass/fail/skip numbers without you having to re-run the suite locally.
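To show how per-group totals roll up into a single compliance figure, here is a minimal sketch. The field names (`pass`, `fail`, `skip`), the group keys, and the choice to exclude skipped tests from the denominator are all assumptions; the real compat.json schema may differ, and the numbers below are made-up inputs rather than actual results.

```javascript
// Sum per-group counts and derive a pass percentage.
// Assumption: skipped tests do not count towards the percentage.
function summarise(groups) {
  const total = { pass: 0, fail: 0, skip: 0 };
  for (const counts of Object.values(groups)) {
    total.pass += counts.pass;
    total.fail += counts.fail;
    total.skip += counts.skip;
  }
  const executed = total.pass + total.fail;
  const percent = executed === 0 ? 0 : (100 * total.pass) / executed;
  return { ...total, percent: Number(percent.toFixed(2)) };
}

// Made-up totals, for illustration only.
const exampleGroups = {
  "built-ins/Array": { pass: 90, fail: 10, skip: 5 },
  "language/expressions": { pass: 100, fail: 0, skip: 0 },
};
console.log(summarise(exampleGroups));
```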
CI matrix
compat-matrix.yml shards test262 across 73 parallel workers — one per top-level category prefix (built-ins/A, built-ins/B, …, language/expressions, language/statements, annexB, intl402, staging, plus dedicated shards for the heavy property-escapes generator subdirectories).
Each shard:
- Checks out the repo + test262 submodule.
- Caches the Composer install.
- Runs bin/compat-report --match <prefix> with a 90 s per-chunk timeout (or 60 s for property-escapes shards).
- Uploads its per-shard state.json and per-chunk results.
A merge job downloads all 73 artifacts, deduplicates pending-vs-complete chunks across mismatched chunk sizes, regenerates compat.json + COMPAT.md, and pushes them back to main as [skip ci] commits.
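One way to make such a merge insensitive to chunk boundaries is to key results by test path, so a completed result always replaces a pending placeholder regardless of which chunk it came from. The sketch below illustrates that idea only: `mergeShards` and the artifact shape (`chunks`, `status`, `tests`, `results`) are hypothetical and do not describe the real state.json format.

```javascript
// Assumed artifact shape (hypothetical):
//   { chunks: [{ status: "complete" | "pending",
//                tests: [path, ...],
//                results: { [path]: "pass" | "fail" | "skip" } }] }
function mergeShards(artifacts) {
  const outcomes = new Map(); // test path -> outcome
  for (const { chunks } of artifacts) {
    for (const chunk of chunks) {
      for (const path of chunk.tests) {
        const outcome = chunk.status === "complete" ? chunk.results[path] : "pending";
        // Keep the first completed result; only placeholders get overwritten.
        if (!outcomes.has(path) || outcomes.get(path) === "pending") {
          outcomes.set(path, outcome);
        }
      }
    }
  }
  const totals = { pass: 0, fail: 0, skip: 0, pending: 0 };
  for (const outcome of outcomes.values()) totals[outcome] += 1;
  return totals;
}

// Two artifacts whose chunk sizes differ but whose tests overlap.
const exampleArtifacts = [
  { chunks: [{ status: "pending", tests: ["a.js", "b.js"], results: {} }] },
  {
    chunks: [
      { status: "complete", tests: ["a.js"], results: { "a.js": "pass" } },
      { status: "complete", tests: ["c.js"], results: { "c.js": "fail" } },
    ],
  },
];
console.log(mergeShards(exampleArtifacts));
```

Here `a.js` is counted once as a pass even though it also appears in a pending chunk, while `b.js` remains pending because no completed chunk covers it.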
The full matrix typically completes in 3–4 minutes wall-clock.
verify-all
For local pre-push checks, ./bin/verify-all runs the quality gate that CI enforces:
=== PHPStan === level 6, zero errors
=== Code Standards === PHPCS, zero warnings
=== PHPUnit === 118 / 118 pass
=== Oracle Regression === 12 / 12 scenarios pass
It is not sufficient on its own: test262 must also be sampled for any change touching the parser, interpreter, or built-ins. See CLAUDE.md for the per-change checklist.
Why this works
Treating V8 as the source of truth removes nearly all spec-reading from the development loop. Disagreement with V8 always indicates a Phasis bug — never a spec ambiguity that can be argued. Once a fix lands, the corresponding test262 entries become permanent regression tests.
The current 100 % pass rate is downstream of this discipline: every PR is judged against V8, and CI automatically rejects any regression in compliance.