Capture Requirements
Tasks are expressed as concrete targets, interfaces, budgets, and constraints that compile directly into atopile designs.
Aletheia gives frontier labs a way to evaluate and train electrical-engineering agents against real design constraints: spec compliance, robustness, cost, sourcing, and manufacturability.
Designs are deterministically validated, simulated, and diffed against specs — no manual review in the loop.
BOM analysis, sourcing checks, and manufacturability scoring run directly on the design artifact the agent produced.
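As an illustration of that kind of check (all names, fields, and thresholds here are ours, not the actual Aletheia API), a sourcing pass over a parsed BOM might roll up cost against a budget and flag single-sourced parts:

```python
from dataclasses import dataclass

# Illustrative sketch only: field names and thresholds are assumptions,
# not the real Aletheia/EEBench evaluator schema.
@dataclass
class BomLine:
    mpn: str
    qty: int
    unit_cost: float  # USD at target volume
    sources: int      # number of qualified suppliers

def score_bom(lines, budget):
    """Return (total_cost, issues) for a parsed BOM."""
    total = sum(l.qty * l.unit_cost for l in lines)
    issues = []
    if total > budget:
        issues.append(f"BOM cost {total:.2f} exceeds budget {budget:.2f}")
    issues += [f"{l.mpn}: single-sourced" for l in lines if l.sources < 2]
    return total, issues

bom = [BomLine("TPS54560", 1, 1.80, 3), BomLine("XAL7070-682", 1, 0.95, 1)]
total, issues = score_bom(bom, budget=2.50)
```

Because checks like this run on the design artifact itself, an over-budget or fragile BOM shows up as a concrete, auditable issue list rather than a reviewer's opinion.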
Deterministic checks for task targets like voltage, gain, ripple, noise, latency, or bandwidth.
Tolerance, corner, and operating-condition sweeps that punish brittle designs and overfitting.
Tradeoff scoring across BOM cost, sourcing pressure, and design simplicity instead of raw pass rate alone.
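To make the corner-sweep idea concrete, here is a minimal sketch (the divider model and values are illustrative stand-ins for a real simulation, not Aletheia's evaluator): a deterministic output-voltage check swept across resistor-tolerance corners, so a design that only hits its target at nominal values fails.

```python
import itertools

# Sketch of a deterministic pass/fail check with tolerance corners.
# vout() is an idealized stand-in for a simulation; values are illustrative.
def vout(vin, r_top, r_bot, vref=0.8):
    # Idealized feedback-divider output of a buck regulator
    return vref * (1 + r_top / r_bot)

def check_vout_corners(target=5.0, tol=0.02, r_tol=0.01):
    """Sweep divider resistors across their tolerance corners;
    return the corners where VOUT leaves the target window."""
    failures = []
    for kt, kb in itertools.product((1 - r_tol, 1 + r_tol), repeat=2):
        v = vout(12.0, 52.3e3 * kt, 10.0e3 * kb)
        if abs(v - target) > tol * target:
            failures.append((kt, kb, v))
    return failures
```

With 1% resistors every corner stays inside the 5 V ± 2% window; rerunning the same check with `r_tol=0.05` surfaces failing corners, which is exactly how a sweep punishes a brittle part choice.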
module TPS54560_DV:
    inductor = 6.8uH
    c_out = 4x22uF
    assert vout within 5V +/- 2%

Aletheia starts from a structured task spec with targets, interfaces, budgets, and hidden stress cases.
Agents build a real atopile hardware design artifact that can be reviewed, validated, and manufactured, not just a simulator-only answer.
Pre-simulation checks reject invalid submissions early by validating structure, interfaces, topology, and hard design constraints.
Simulation and design analysis measure behavior under load, corners, tolerances, sourcing pressure, and manufacturability constraints.
EEBench returns a deterministic score package with requirement results, failed checks, sweeps, and reward-ready metrics.
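Hypothetically, a score package of that shape could look like the following (the keys and the all-requirements-pass reward gate are our illustration, not the real EEBench schema):

```python
import json

# Illustrative shape of a deterministic score package; keys and the
# reward rule are assumptions, not the actual EEBench output format.
score_package = {
    "task": "buck_5v_5a",
    "requirements": [
        {"id": "vout_reg", "target": "5V +/- 2%", "measured": "4.96V", "passed": True},
        {"id": "ripple", "target": "<50mV", "measured": "62mV", "passed": False},
    ],
    "sweeps": {"load_step_1a_to_5a": {"vout_min": 4.81}},
    "reward": 0.0,
}
# One possible reward rule: gate on every requirement passing.
score_package["reward"] = float(all(r["passed"] for r in score_package["requirements"]))
payload = json.dumps(score_package, indent=2)
```

Because every requirement carries its measured value alongside its target, the same payload serves both as a training reward and as an audit trail for why a run failed.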
A single-run inspector modeled after the EEBench viewer: task brief, requirement checks, source artifacts, and raw evaluator payloads all stay attached to the same execution record.
Minimum VOUT during the 1 A to 5 A step.
High-signal tasks where efficiency, regulation, protection behavior, and operating envelope all matter at once.
Good surfaces for measuring tradeoffs between gain, noise, stability, bandwidth, and component choice.
Compact tasks with hidden variants that stress topology reasoning, generalization, and evaluator quality.
Version benchmark suites, evaluator logic, and score definitions so labs can compare model runs over time without moving the goalposts.
Publish methodology, task-family coverage, and known limitations so the benchmark is inspectable instead of a black box.
Use hidden tests, robustness sweeps, and operating-condition variation to make benchmark gaming and reward hacking harder.
Return auditable artifacts like plots, failed constraints, and measured outputs so teams can see why a model passed or failed.
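One way hidden variation and versioning can fit together (a sketch under our own assumptions, not Aletheia's mechanism) is to derive hidden operating-condition variants deterministically from the suite version and task id: runs stay reproducible and comparable across a pinned suite version, while the variants themselves never appear in the public task brief.

```python
import hashlib

# Sketch: derive hidden operating-condition variants deterministically
# from (suite_version, task_id), so evaluation is reproducible per suite
# version but the stress cases stay out of the published task brief.
# Ranges and field names are illustrative assumptions.
def hidden_variants(task_id, suite_version, n=3):
    variants = []
    for i in range(n):
        h = hashlib.sha256(f"{suite_version}:{task_id}:{i}".encode()).digest()
        vin = 8.0 + (h[0] / 255) * 10.0    # input voltage across 8-18 V
        temp = -20 + (h[1] / 255) * 105.0  # -20 to 85 C ambient
        variants.append({"vin": round(vin, 2), "temp_c": round(temp, 1)})
    return variants
```

Bumping the suite version regenerates the hidden cases for everyone at once, so cross-run comparisons within a version stay honest without freezing the benchmark forever.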
Software development accelerated when its build-test-iterate loop collapsed from months to minutes. Physical product development still spends weeks or months waiting for validated feedback.
At atopile, we are building toward that tooling layer: hardware as code, deterministic validation, and workflows that competent agents can actually operate. Aletheia sits alongside that effort as evaluation and training infrastructure, because models improve much faster when they can be measured against real design tasks with real feedback.
Today, hardware design still depends on a long chain of human work across design, review, procurement, manufacturing, bring-up, and lab validation. Our goal is to make more of that chain machine-operable, so agents can design, manufacture, and test real hardware instead of training only against neat simulator abstractions.
That is how we ground training in reality. When the big loop is closed on real hardware outcomes, the smaller simulation and validation loops can be refined against measured results over time. That makes them more accurate and makes it possible to give models denser, higher-quality rewards without losing touch with the physical world.