DAM 0.5.0 Release Notes
DAM 0.5.0 turns the thesis evaluation runners into first-class, real measurements, adds guard-checked dataset playback to physical hardware, and makes the failure-event taxonomy a shared, spec-aligned classifier. The console gains a native Experiments workspace, plus inspection of live and recorded robot hardware and multi-camera streams.
Guarded Dataset Replay
- A production `dataset` adapter can now provide recorded observations and actions while a real motor sink receives only guard-validated commands. The implementation reuses the normal runtime, sink, safety pipeline, camera hub, live preview, and MCAP recorder rather than creating a replay-only execution path.
- `dataset_replay_check.yaml` is the focused hardware-validation example: recorded actions, real SO-101 output, and four visible/recordable streams (dataset `replay_top`/`replay_wrist` plus live `top`/`wrist`).
- Image stream namespacing is a composition setting (`image_namespace: replay`) rather than dataset-specific rewriting; colliding streams now fail fast instead of silently overwriting recorded observations.
- Dataset-to-hardware replay runs in strict mode: absent recorded actions fail closed instead of falling back to synthetic actions.
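
The namespacing and strict-mode rules above can be sketched as follows. This is a minimal illustration, not DAM's implementation: `compose_image_streams` and `next_replay_action` are hypothetical names, and joining the namespace to the stream name with an underscore is an assumption based on the `replay_top`/`replay_wrist` stream names.

```python
def compose_image_streams(live, recorded, image_namespace="replay"):
    """Merge live and recorded camera streams into one mapping.

    Recorded stream names get an `<image_namespace>_` prefix (assumed format).
    A name that would collide with an existing stream raises instead of
    silently overwriting it.
    """
    merged = dict(live)
    for name, frame in recorded.items():
        key = f"{image_namespace}_{name}" if image_namespace else name
        if key in merged:
            raise ValueError(f"image stream collision: {key!r}")
        merged[key] = frame
    return merged


def next_replay_action(recorded_actions, step):
    """Strict mode: a missing recorded action fails closed.

    No synthetic action is substituted; the caller must stop the replay.
    """
    if step >= len(recorded_actions) or recorded_actions[step] is None:
        raise RuntimeError(f"no recorded action at step {step}")
    return recorded_actions[step]


streams = compose_image_streams(
    live={"top": b"live-top", "wrist": b"live-wrist"},
    recorded={"top": b"rec-top", "wrist": b"rec-wrist"},
)
print(sorted(streams))
```

With an empty namespace the same call raises, which is the fail-fast behaviour described above.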
Guard Configuration and Inspection
- Risk Log and Cycle Inspector now use one shared cycle I/O payload and render a merged per-joint table for state, proposed target, validated output, and motor temperature/current/voltage where available.
- Risk outcomes are canonical (`reject`, `clamp`, `pass`); filtering a fallback-resolved guard rejection now still finds the rejected cycle.
- Templates emit only guard layers backed by configured boundaries, in layer order. Every preset includes L1 motion and L3 health monitoring; SO-101 alone includes the implemented L2 gripper sequence. L3 presets expose `warn_frames`, the consecutive-cycle reaction threshold.
- The nominal SO-101 voltage safety band is corrected to `10.0-13.0 V` for a 12 V supply. The previous `6.0-8.5 V` example values could trigger an immediate stop on normally powered hardware.
- Task gripper workflow nodes no longer use a one-second timeout as if it measured callback latency; these phases advance explicitly with the task.
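
To illustrate how a consecutive-cycle threshold such as `warn_frames` interacts with the voltage band, here is a small sketch. `VoltageBandMonitor` and its `"ok"`/`"warn"`/`"fault"` return values are hypothetical and are not DAM's guard API or its canonical risk outcomes; only the band values and the `warn_frames` name come from the notes above.

```python
from dataclasses import dataclass


@dataclass
class VoltageBandMonitor:
    """Illustrative L3-style monitor (hypothetical, not DAM's class)."""

    low_v: float = 10.0    # nominal SO-101 band for a 12 V supply
    high_v: float = 13.0
    warn_frames: int = 3   # assumed example value
    _streak: int = 0       # consecutive out-of-band cycles seen so far

    def update(self, voltage_v: float) -> str:
        if self.low_v <= voltage_v <= self.high_v:
            self._streak = 0            # any in-band sample resets the count
            return "ok"
        self._streak += 1
        return "fault" if self._streak >= self.warn_frames else "warn"


monitor = VoltageBandMonitor(warn_frames=3)
readings = [12.0, 7.4, 7.4, 12.0, 7.4, 7.4, 7.4]
print([monitor.update(v) for v in readings])

# With the previous example band, a normally powered 12 V arm is out of band.
old_band = VoltageBandMonitor(low_v=6.0, high_v=8.5, warn_frames=1)
print(old_band.update(12.0))
```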
Experiment Runners
- RQ1–RQ5 are exposed as native runners (`dam.experiments`) usable from the console, the `dam experiment` CLI, and `POST /api/experiments/{id}/run`. Each run writes `results.csv` plus an SVG (and PNG where applicable) artifact, served by `GET /api/experiments/artifact`. RQ1 uses a PNG preview because NLL values can be negative and are not represented correctly by the earlier SVG median-bar preview.
- RQ3 (Normal-Use False Trigger Study) and RQ5 (Failure Record Quality) are now real measurements. They were placeholder constants; they now drive the real L0–L2/L3 guard stack:
    - RQ3 (`scripts/run_usability_study.py`) feeds benign, in-limit frames across legal deployment variations and counts genuine false triggers from real `guard.check()` decisions.
    - RQ5 (`scripts/run_record_quality.py`) drives real violating scenarios across all three event categories, harvests records through the shared production classifier, and scores completeness, classification, layer labels, readable reasons, observation window, and taxonomy coverage.
- RQ2 (`run_boundary_scan.py`) remains a real guard-driven sweep.
- RQ4 now performs isolated Guard profiling for the thesis latency study: runs at 10 Hz, 20 Hz, and 50 Hz are launched sequentially from the console, each comparing No Safety, Rule-based Safety, OOD-only, and Full RSMF over 500 time steps. Console runs use short visual pacing by default, with a full wall-clock `realtime=true` option, and embed the plot directly in the result card. The measurement window starts at action proposal receipt and ends at the validated action decision, excluding image preprocessing and policy inference.
- RQ1 (L0 calibration) is an offline evaluation harness over the shared DAM L0 feature and OOD backend APIs: it trains Real-NVP on normal SO-ARM observations, then compares per-frame NLL across normal, legal-variation, and abnormal-A HuggingFace datasets. RQ1 can optionally compare Welford, MemoryBank, and Real-NVP scoring on the same features, and can fuse pretrained image embeddings when a vision model is selected.
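
The RQ4 measurement window can be pictured with a short timing sketch. It is only an illustration under stated assumptions: `profile_guard`, `no_safety`, and `rule_based` are hypothetical stand-ins, not the DAM runners or guards, and the clamp limit is arbitrary. The point it shows is that the timer wraps the guard decision alone, so image preprocessing and policy inference stay outside the window.

```python
import statistics
import time


def profile_guard(check, proposals):
    """Return per-step latency in ms, from proposal receipt to decision."""
    latencies_ms = []
    for proposal in proposals:
        start = time.perf_counter()   # action proposal received
        check(proposal)               # guard produces the validated action
        latencies_ms.append((time.perf_counter() - start) * 1e3)
    return latencies_ms


def no_safety(action):
    """Baseline condition: pass the proposal through unchanged."""
    return action


def rule_based(action, limit=1.0):
    """Toy rule-based guard: clamp the action to a symmetric limit."""
    return max(-limit, min(limit, action))


def summarize(latencies_ms):
    """Median and 95th-percentile latency for one condition."""
    p95 = statistics.quantiles(latencies_ms, n=20)[18]
    return {"median_ms": statistics.median(latencies_ms), "p95_ms": p95}


proposals = [0.01 * i for i in range(500)]   # 500 time steps, as in RQ4
for name, guard in [("no_safety", no_safety), ("rule_based", rule_based)]:
    print(name, summarize(profile_guard(guard, proposals)))
```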
Event Taxonomy
- Classification now lives in one place, `dam.runtime.failure_classify`, and is used by both the runtime harvester and RQ5.
- Categories are defined purely by guard layer and fault source; the fragile guard-name substring heuristics (`"ood"`/`"hardware"` in the name) were removed. Behaviour for the built-in guards is unchanged.

| Category | `failure_type` | Rule |
|---|---|---|
| Perception anomaly event (感知異常事件) | `ood_only` | Only L0 perception-anomaly guards fired, and nothing else. |
| Action risk event (動作風險事件) | `guard_triggered` | Any non-hardware guard rejected, limited, or fault-arbitrated an action. |
| Hardware risk event (硬體風險事件) | `hardware_triggered` | Any L3 guard, or any guard with a hardware fault source, fired (highest priority). |
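
The three rules in the table can be written out as a small decision function. This is a sketch of the rules as stated, not the code in `dam.runtime.failure_classify`: `GuardResult` and `classify` are hypothetical names, and layers are plain integers here for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GuardResult:
    """Hypothetical per-guard outcome for one cycle."""

    layer: int                          # 0..3, i.e. L0..L3
    fired: bool                         # guard rejected, limited, or faulted
    fault_source: Optional[str] = None  # e.g. "hardware"


def classify(results: Iterable[GuardResult]) -> Optional[str]:
    """Map one cycle's guard results to a failure_type, or None if quiet."""
    fired = [r for r in results if r.fired]
    if not fired:
        return None
    # Hardware risk has the highest priority: any L3 guard or hardware source.
    if any(r.layer == 3 or r.fault_source == "hardware" for r in fired):
        return "hardware_triggered"
    # Perception anomaly only when nothing but L0 guards fired.
    if all(r.layer == 0 for r in fired):
        return "ood_only"
    # Everything else is an action risk.
    return "guard_triggered"


print(classify([GuardResult(0, True)]))
print(classify([GuardResult(0, True), GuardResult(1, True)]))
print(classify([GuardResult(1, True), GuardResult(2, True, "hardware")]))
```

Note that no guard name appears anywhere in the decision, which is the change described above.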
Host Health Boundary
- The `host_health_limit` L3 boundary samples host CPU / GPU / memory / temperature (psutil + load average + `nvidia-smi`) and faults with `fault_source="hardware"`, so a computer-side breach is automatically classified as a hardware risk event (硬體風險事件).
- Wired into the canonical Console template, `examples/stackfiles/demo.yaml`, `so101.yaml`, and the default local config.
- Fixed a crash: `host_health_limit` returned `layer="L3"` as a string, which broke failure harvesting (`int(r.layer)`) on every host-health fault. It now uses the `GuardLayer` enum, and the classifier tolerates enum/int/`"L3"` layer forms.
- `nvidia-smi` rows reporting `[N/A]` no longer raise; those GPUs are skipped.
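
Both fixes come down to tolerant parsing, sketched below. The `GuardLayer` class here is a stand-in whose members are assumed, and `coerce_layer` and `parse_gpu_rows` are hypothetical helpers. The sample text imitates two-column CSV output such as `nvidia-smi --query-gpu=utilization.gpu,temperature.gpu --format=csv,noheader,nounits`; which fields DAM actually queries is not stated in these notes.

```python
from enum import IntEnum


class GuardLayer(IntEnum):
    """Stand-in for DAM's GuardLayer enum (member names assumed)."""

    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3


def coerce_layer(layer) -> int:
    """Accept an enum member, a plain int, or a string such as "L3"."""
    if isinstance(layer, int):          # covers IntEnum members too
        return int(layer)
    text = str(layer).strip().upper()
    return int(text[1:] if text.startswith("L") else text)


def parse_gpu_rows(csv_text: str) -> list:
    """Parse 'util, temp' rows, skipping GPUs that report [N/A]."""
    gpus = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if "[N/A]" in fields:
            continue                    # skip this GPU instead of raising
        util, temp = (float(field) for field in fields)
        gpus.append({"util_pct": util, "temp_c": temp})
    return gpus


print([coerce_layer(x) for x in (GuardLayer.L3, 3, "L3")])
print(parse_gpu_rows("45, 61\n[N/A], [N/A]\n"))
```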
MCAP & Console
- The MCAP player and live stream can display multiple cameras from the shared camera hub; hardware replay now preserves recorded dataset cameras alongside top/wrist hardware streams instead of overwriting them.
- Risk Log can switch from the live in-memory feed to a recorded MCAP session, using the same event filters and Cycle Inspector payload for incident review.
- Guard `metadata` (including `host_health`) is now surfaced through the positional Rust cycle path, not only the dict path, so the MCAP inspector shows it for Rust-recorded sessions too.
- Cycle / observation / action detail use flexible fields instead of a fixed column set; the inspector renders guard metadata and the failure record.
- Replay/session review is less brittle: empty session and empty camera states keep the same framed inspector/player layout, selected-cycle details stay compact, and session deletion now archives MCAP files into `_trash` instead of unlinking them immediately.
- The live camera path is hub-owned end to end. Runtime/adapters publish JPEG frames to the image hub, telemetry sends the camera names in JSON plus JPEG bytes as binary WebSocket payloads, and the legacy `/api/mcap/live/live_images` JSON preview path was removed.
- Dashboard "Cycle Latency" panel gains a Latency / Hardware tab. The Hardware tab subscribes to host + robot health from telemetry and stays calm/idle when no task is running or no health-reporting guard is active.
- Fixed the empty "History (last 60 cycles)" chart after a simulated Stop→Start or episode wrap: the session-scoped latency window now resets and refreshes instead of being blocked by stale cycle ids.
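
Archiving instead of unlinking, as described for session deletion above, can be sketched with the standard library. `archive_session` is a hypothetical helper; placing `_trash` next to the session file and the timestamp suffix used on name clashes are assumptions, not DAM's documented layout.

```python
import shutil
import time
from pathlib import Path


def archive_session(mcap_path: Path) -> Path:
    """Move an MCAP file into a sibling `_trash` directory and return its new path."""
    trash_dir = mcap_path.parent / "_trash"
    trash_dir.mkdir(exist_ok=True)
    target = trash_dir / mcap_path.name
    if target.exists():
        # Keep earlier archives: add a timestamp rather than overwrite.
        target = trash_dir / f"{mcap_path.stem}.{time.time_ns()}{mcap_path.suffix}"
    shutil.move(str(mcap_path), str(target))
    return target
```

A session archived this way can be restored by moving the file back, which an immediate unlink would not allow.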
Breaking Changes
- `version.toml` was removed. The version is now read from `pyproject.toml`; `scripts/sync_version.py` reads the same source.
- Failure classification no longer inspects guard names. Custom guards that relied on a name containing `ood`/`hardware` for classification must use the correct layer or set `fault_source="hardware"`.
- Built-in SO-101 templates use `10.0-13.0 V` as the nominal voltage band. Deployments intentionally powered outside that band must set their own `voltage_limit` parameters.
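
For tooling that used to parse `version.toml`, reading the version from `pyproject.toml` might look like the sketch below. It assumes the version sits under the standard `[project]` table, which these notes do not confirm, and `read_version` is a hypothetical helper rather than the logic in `scripts/sync_version.py`.

```python
try:
    import tomllib                      # standard library on Python 3.11+
except ModuleNotFoundError:             # older interpreters: third-party backport
    import tomli as tomllib


def read_version(pyproject_text: str) -> str:
    """Return the [project].version string from pyproject.toml content."""
    return tomllib.loads(pyproject_text)["project"]["version"]


sample = '[project]\nname = "dam"\nversion = "0.5.0"\n'
print(read_version(sample))
```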