Paired-data pipeline: 4 bugs in CSI recorder + ground-truth aligner corrupt or block camera-supervised training data #1007

@ruvnet

Context

Found during ADR-152 §2.2 measurement (b) (2026-06-10/11), when a fresh 40-minute paired collection initially aligned to zero windows and the trained-model forensics exposed silent data corruption. These bugs also retroactively explain pathologies in earlier sessions (#645, #509). Full forensic record: benchmarks/wiflow-std/RESULTS.md on branch feat/adr-152-wiflow-std-benchmark.

Bug 1 — scripts/record-csi-udp.py stamps local time with a Z (UTC) suffix

parse_csi_packet() builds timestamp via time.strftime('%Y-%m-%dT%H:%M:%S.') + ... + 'Z', i.e. local wall time labeled as UTC. The camera collector writes true-epoch ts_ns. The aligner parses the CSI ISO string as UTC, so camera and CSI disagree by the local UTC offset (−4 h under EDT) and alignment produces 0 pairs. Workaround used: --clock-offset-ms=-14400000. Fix: write datetime.now(timezone.utc).isoformat(), or just use the already-present ts_ns in the aligner (preferred; see Bug 4 note).
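A minimal Python sketch of the recorder-side fix, plus a small reproduction of why the bug appears as a constant skew. The helper names (utc_iso_timestamp, parse_utc_iso, mislabel_skew_ms) are hypothetical and are not taken from scripts/record-csi-udp.py; the fixed UTC−4 offset stands in for EDT.

```python
from datetime import datetime, timedelta, timezone


def utc_iso_timestamp(epoch_s: float) -> str:
    """Format an epoch time as an ISO 8601 string that really is UTC."""
    dt = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
    # isoformat() emits "+00:00"; swap it for "Z" only because the value is UTC.
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def parse_utc_iso(ts: str) -> float:
    """Parse a 'Z'-suffixed ISO string back to epoch seconds."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp()


def mislabel_skew_ms(true_epoch_s: float, local_utc_offset_hours: int) -> float:
    """Reproduce the bug: take local wall time, then read it as if it were UTC."""
    local_tz = timezone(timedelta(hours=local_utc_offset_hours))
    wall = datetime.fromtimestamp(true_epoch_s, tz=local_tz).replace(tzinfo=None)
    mislabeled = wall.replace(tzinfo=timezone.utc)
    return (mislabeled.timestamp() - true_epoch_s) * 1000.0
```

For a UTC−4 clock, mislabel_skew_ms returns −14,400,000 ms, the same magnitude as the --clock-offset-ms workaround above. Reading ts_ns directly in the aligner would avoid string parsing altogether.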

Bug 2 — scripts/align-ground-truth.js dilutes window confidence with non-detection frames

loadGroundTruth() keeps records with keypoints: [] (an empty array is truthy in JavaScript) at confidence 0; window avgConf then averages detections and empties together. At a normal ~27% MediaPipe detection rate, every window's avgConf lands near 0.22 (0.27 × 0.80), below the 0.5 threshold, so all windows are rejected even though the detections themselves average 0.80 confidence. Fix: skip empty-keypoint records at load (treat them as gaps); confidence statistics should be computed over detections only. --min-camera-frames still guards sparse windows.
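The dilution can be reproduced with synthetic numbers. This is a Python sketch only, not the aligner's JavaScript: the confidence field name and both helper names are assumptions, and the window below is made up to mirror the 27% / 0.80 figures quoted above.

```python
def load_detections(records):
    """Keep only records with at least one keypoint; empties become gaps."""
    return [r for r in records if len(r.get("keypoints") or []) > 0]


def mean_confidence(records):
    """Average confidence over the given records, or None if there are none."""
    if not records:
        return None
    return sum(r["confidence"] for r in records) / len(records)


# Synthetic window: 27 detections at 0.80 and 73 empty frames at 0.0.
window = (
    [{"keypoints": [[0.5, 0.5]], "confidence": 0.80}] * 27
    + [{"keypoints": [], "confidence": 0.0}] * 73
)

diluted = mean_confidence(window)                            # about 0.216
detections_only = mean_confidence(load_detections(window))   # about 0.80
```

With the empties filtered at load time, the same window clears a 0.5 threshold, and a separate minimum-frame check remains responsible for rejecting windows with too few detections.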

Bug 3 — heterogeneous csi_shape with silent zero-padding

extractCsiMatrix() stamps the window's subcarrier count from window[0].subcarriers and zero-pads/truncates the other 19 frames to match. Tonight's session: 1,347×[70,20], 284×[134,20], 243×[26,20], 130×[12,20], 42×[20,20] — ~20% of frames inside even native-70 windows were silently zero-padded. Mixed-subcarrier frames come from the ESP32 emitting different packet formats (HT20/HT40/fragments). Fix: either filter frames to the session's modal subcarrier count before windowing, or record the per-frame subcarrier count and reject mixed windows; never silently pad.
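Both proposed fixes can be sketched in a few lines of Python. This is illustrative only: it assumes each frame carries an integer subcarriers field holding its subcarrier count, and the function names are hypothetical rather than taken from scripts/align-ground-truth.js.

```python
from collections import Counter


def modal_subcarrier_count(frames):
    """Return the most common per-frame subcarrier count in the session."""
    counts = Counter(f["subcarriers"] for f in frames)
    return counts.most_common(1)[0][0]


def filter_to_modal(frames):
    """Option 1: drop every frame whose subcarrier count is not the mode."""
    if not frames:
        return []
    modal = modal_subcarrier_count(frames)
    return [f for f in frames if f["subcarriers"] == modal]


def is_homogeneous(window):
    """Option 2: accept a window only if all frames share one count."""
    return len({f["subcarriers"] for f in window}) <= 1
```

Filtering before windowing changes the spacing between consecutive frames, so a 20-frame window may then cover a longer time interval; whether that is acceptable is a separate decision from the padding fix.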

Bug 4 — transposed shape label in extractCsiMatrix

The matrix is filled frame-major (matrix[f * nSc + s]) but declared shape: [nSc, nFrames] (~line 351). Consumers that trust the label transpose the data. Found because the measurement-(b) trainer had to correct it on load. Fix the label or the fill order, and add a round-trip test.
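One possible shape for the round-trip test, sketched in Python rather than the aligner's JavaScript. Here extract_csi_matrix is a simplified stand-in for extractCsiMatrix() that keeps the frame-major fill and corrects the label; unflatten is a hypothetical consumer that trusts the label.

```python
def extract_csi_matrix(window):
    """Flatten a list of frames frame-major and label the shape to match."""
    n_frames = len(window)
    n_sc = len(window[0])
    flat = [0.0] * (n_frames * n_sc)
    for f, frame in enumerate(window):
        for s, value in enumerate(frame):
            flat[f * n_sc + s] = value
    # Frame-major fill means rows are frames, so the label is [n_frames, n_sc].
    return {"data": flat, "shape": [n_frames, n_sc]}


def unflatten(flat, shape):
    """Rebuild rows from a row-major buffer using only the declared shape."""
    rows, cols = shape
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

A test that flattens a small non-square window and unflattens it via the declared shape fails as soon as the label and the fill order disagree, which is the condition described above.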

Acceptance

  • A fresh paired session aligns with zero clock-offset flags needed
  • Window kept-rate ≈ csi_frames/20 × detection_coverage (no silent confidence collapse)
  • No zero-padded frames in output windows; csi_shape homogeneous per file
  • Shape label matches memory layout (tested)
  • Re-run alignment on tonight's raw files (data/recordings/csi-1781143789.csi.jsonl + data/ground-truth/keypoints_20260610_221000.jsonl) reproduces ≥2,046 pairs without workarounds

Related

#645 (paired-data quantity/quality tracking), #509 (external reproducibility), ADR-152 §2.2, the 92.9% retraction (CHANGELOG + PR #535).

🤖 Generated with claude-flow
