Cycle Log 35
Image created with Flux.2 Pro, Gemini 3 Pro, and GPT 5.1
A few months ago I found myself watching the latest humanoid demos — especially Unitree’s videos where the robot loses balance and instinctively begins “stammering” its feet in an attempt to recover. The moment I saw that behavior, something clicked. The robot wasn’t thinking about falling; it was executing a last-ditch stepping routine that only works in a narrow band of conditions. If the disturbance is too strong or comes from the wrong angle, the robot is already past the viability boundary, and those frantic micro-steps become wasted motion. That observation launched me into a deeper analysis: what would a robot do if it understood falling the way a trained human does — redirecting momentum, rolling, and popping back up with intent?
That question led to the framework below. By combining simulation training, multi-IMU sensing, torque control, and deliberate mode switching, we can replace panic-stepping with something closer to judo ukemi: a controlled, deliberate fall that minimizes downtime and protects the robot’s head and sensors. The seed that follows is the full blueprint of that idea, refined into a system a modern humanoid lab could plausibly build.
KG-LLM-SEED: HUMANOID_ROLL_RECOVERY_SYSTEM
VERSION: 1.0
AUTHOR: Cameron T.
META:
overview: |
This seed describes the complete conceptual, physical, algorithmic, and
training architecture required to produce a humanoid robot that does NOT
stammer-step when falling, but instead performs controlled, judo-inspired
roll-recovery from ANY angle with rapid re-uprighting into a stable,
fighter-like stance. The system integrates biomechanical insights, IMU
configuration, torque-controlled actuation, mode-switch logic, RL reward
structuring, simulation curriculum, hardware affordances, and sensing
distribution. It unifies everything into one coherent KG suitable for
future LLM reasoning.
---------------------------------------------------------------------
1. PHYSICS PRINCIPLES
---------------------------------------------------------------------
falling_dynamics:
- Bipedal robots eventually exceed the viability boundary during disturbances.
- Capture point (CP) = dynamic measure of whether stepping can save balance.
- When CP leaves support polygon by threshold δ, stepping is no longer viable.
- Judo-style ukemi rolling dissipates angular momentum safely across a long arc.
- Controlled roll reduces peak decelerations at head/torso and protects hardware.
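The capture-point test above can be sketched in a few lines. This is a minimal illustration, assuming the linear inverted pendulum model (the seed does not name a model), a single axis along the fall direction, and a support polygon reduced to an interval. The function names and the default `delta` are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def capture_point(com_pos, com_vel, com_height):
    """Instantaneous capture point under the (assumed) linear inverted pendulum model."""
    omega0 = math.sqrt(G / com_height)  # natural frequency of the pendulum, 1/s
    return com_pos + com_vel / omega0

def cp_margin(cp, support_min, support_max):
    """Signed distance from the CP to the nearest support edge; negative means outside."""
    return min(cp - support_min, support_max - cp)

def stepping_viable(margin, delta=0.04):
    """Stepping is treated as non-viable once the margin drops below -delta (metres)."""
    return margin >= -delta
```

For example, a COM at 0.9 m height moving at 0.5 m/s over a support interval of ±0.1 m gives a CP about 0.15 m ahead of the COM, roughly 5 cm past the support edge, so `stepping_viable` returns `False` with `delta = 0.04`.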
angular_momentum_management:
- Critical for redirecting fall trajectory.
- Roll sequences naturally convert undesirable rotation into safer axes.
- Momentum shaping via hips/shoulders is more effective than ankle-based recovery.
contact_arcs:
- Safe contact order: forearm → shoulder → back/hip → feet/hands.
- Dangerous: head-first, knee-first, or uncontrolled slamming.
inevitability_argument:
- As humanoids operate dynamically, roll recovery becomes necessary for safety,
reliability, uptime, and hardware preservation.
- Minimizing time-down ensures mission continuity.
- Once rolling is learned, stammer-stepping is a dominated (strictly worse) recovery strategy.
---------------------------------------------------------------------
2. HARDWARE ARCHITECTURE
---------------------------------------------------------------------
actuators:
hips:
- High torque & wide mobility (≥180° combined pitch, ≥120° roll).
- Backdrivable or series-elastic to absorb impact.
shoulders:
- High power for bracing + roll initiation.
ankles:
- Impedance increases during ROLL_MODE to prevent tapping.
joint_speed_requirements:
- Superhuman angular velocities allowed at head/arms during fall.
- Jerks limited; high-rate control required (0.5–2 ms reflex).
sensors:
imu_array:
central_imu:
- Mounted near the COM; primary reference for angular momentum & CP estimation.
auxiliary_imus:
- In head, pelvis, both forearms.
- Gives orientation-rate redundancy; captures distributed rotation vectors.
f_t_sensors:
- In feet + wrists (or joint torque inference).
contact_sensors:
- Shoulder/forearm bumper rings; shins; soft head ring.
environment_affordances:
- Short-range depth/raycast ring (optional) for ropes/walls.
shell_design:
- Rounded shoulders & forearms for smooth roll arcs.
- Grippy palms for tripod/knee-hand pop-up.
- Head protector ring preventing camera damage on roll.
compute:
- Reflex loop: 0.5–2 ms per cycle (as in joint_speed_requirements).
- Whole-body MPC/QP: 5–10 ms.
- Torque loop: 1 kHz preferred.
---------------------------------------------------------------------
3. CONTROL ARCHITECTURE (HIERARCHICAL)
---------------------------------------------------------------------
modes:
NORMAL_MODE:
- Full stepping controller active.
- Viability monitored every cycle.
ROLL_MODE (triggered when fall inevitable):
trigger_conditions:
- CP margin m < -δ (e.g., δ = 3–5 cm).
- OR torso pitch-rate |θ_dot| > ω_fall (120–180°/s) for >20 ms.
effects:
- Disable stepping/foot placement controllers.
- Mask leg DOFs to tuck/brace primitives.
- Increase ankle impedance (remove micro-step).
- Enable roll-oriented torque shaping.
STAND_MODE (post roll, fighter stance acquisition):
- Requirements: torso stabilized, COM inside polygon by +ε,
angular velocity below threshold for 150 ms.
- Stand into wide lateral stance (0.2–0.3 m feet separation).
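The mode logic above can be read as a small state machine with two timers. The sketch below is one possible reading, not a reference implementation. The class name `ModeSwitch`, the calm-rate threshold `omega_calm`, and the specific defaults picked from within the seed's ranges are assumptions. The seed does not say how STAND_MODE hands back to NORMAL_MODE, so that transition is left out.

```python
from dataclasses import dataclass

NORMAL, ROLL, STAND = "NORMAL_MODE", "ROLL_MODE", "STAND_MODE"

@dataclass
class ModeSwitch:
    delta: float = 0.04        # CP margin threshold, m (seed: 3-5 cm)
    omega_fall: float = 150.0  # pitch-rate threshold, deg/s (seed: 120-180)
    rate_hold: float = 0.020   # pitch rate must persist longer than this, s
    calm_hold: float = 0.150   # calm period required before standing, s
    omega_calm: float = 30.0   # ASSUMED value for "angular velocity below threshold", deg/s
    mode: str = NORMAL
    _rate_t: float = 0.0       # time spent above omega_fall
    _calm_t: float = 0.0       # time spent calm while in ROLL_MODE

    def update(self, cp_margin, pitch_rate, com_inside, dt):
        """Advance one control cycle; com_inside means COM is inside the polygon by +eps."""
        if self.mode == NORMAL:
            fast = abs(pitch_rate) > self.omega_fall
            self._rate_t = self._rate_t + dt if fast else 0.0
            if cp_margin < -self.delta or self._rate_t > self.rate_hold:
                self.mode, self._calm_t = ROLL, 0.0
        elif self.mode == ROLL:
            calm = com_inside and abs(pitch_rate) < self.omega_calm
            self._calm_t = self._calm_t + dt if calm else 0.0
            if self._calm_t >= self.calm_hold:
                self.mode = STAND
        return self.mode
```

Resetting each timer to zero whenever its condition lapses means a brief spike in pitch rate, or a momentary calm during the roll, does not trigger a switch on its own.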
reflex_policy:
- Tiny MLP (~64k params).
- Uses IMU-only high-rate data.
- Outputs roll-direction bias + tucking intensity.
- Hands off to whole-body QP.
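To give a feel for what "~64k params" buys, here is a sketch of an MLP at that scale in plain NumPy. The input layout (5 IMUs × 6 channels × 4-frame history), the hidden widths, and the output squashing are all assumptions chosen to land near the stated parameter count. The seed specifies only the size, the IMU-only input, and the two outputs.

```python
import numpy as np

# ASSUMED layout: 5 IMUs x 6 channels (gyro + accel) x 4-frame history = 120 inputs.
# Outputs: 2-D roll-direction bias + 1 tucking intensity.
LAYERS = [120, 256, 128, 3]

def init_params(rng, layers=LAYERS):
    """He-initialised weights and zero biases for each layer."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(layers[:-1], layers[1:])]

def param_count(params):
    return sum(w.size + b.size for w, b in params)

def reflex_forward(params, imu_stack):
    """Map a flattened IMU history to (roll-direction bias, tucking intensity)."""
    h = imu_stack
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)       # ReLU hidden layers
    w, b = params[-1]
    out = h @ w + b
    direction = np.tanh(out[:2])             # each component in [-1, 1]
    tuck = 1.0 / (1.0 + np.exp(-out[2]))     # in (0, 1)
    return direction, tuck
```

With these layer sizes `param_count` gives 64,259 parameters, and the forward pass is three small matrix-vector products.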
whole_body_mpc_qp:
- Tracks centroidal momentum decay.
- Allocates torques for shaping roll trajectory.
- Predicts safe contact sequences.
- Maintains joint limits & avoids self-collisions.
torque_shaping:
- Penalizes spectral energy in 6–12 Hz range.
- Prevents foot jitter & stammer-stepping.
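One way to realise the 6–12 Hz penalty is to measure how much of a joint's recent torque energy falls in that band. The sketch below uses a real FFT over a fixed window. Normalising by total energy (so the result is a fraction between 0 and 1) is a design choice made here, not something the seed prescribes, and the function name is hypothetical.

```python
import numpy as np

def band_energy_penalty(torque, fs, f_lo=6.0, f_hi=12.0):
    """Fraction of mean-removed torque energy inside [f_lo, f_hi] Hz.

    torque: 1-D window of torque samples for one joint
    fs:     sampling rate in Hz
    """
    x = np.asarray(torque, dtype=float)
    x = x - x.mean()                           # drop the DC component
    spec = np.abs(np.fft.rfft(x)) ** 2         # power per frequency bin
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    total = spec.sum()
    if total == 0.0:
        return 0.0                             # constant signal: nothing to penalise
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(spec[band].sum() / total)
```

A 9 Hz ankle oscillation sampled at 1 kHz for one second scores close to 1, while a slow 1 Hz torque sweep scores close to 0, so the term targets jitter without punishing deliberate roll torques.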
---------------------------------------------------------------------
4. ANTI-STAMMERING MECHANISMS
---------------------------------------------------------------------
reward_policies:
- Penalty per foot-ground contact event (c_contact).
- Penalty for stance changes.
- Penalty for COP jitter > threshold.
- Penalty for step cadence > 2 Hz.
- High penalty for micro-taps.
control_masks:
- In ROLL_MODE, step actions are hard-masked at the controller level.
- Leg DOFs repurposed for tucking & bracing.
environmental_curriculum:
- Low-friction floors where stepping is non-viable.
- Ensures tapping becomes a dominated behavior.
torque_spectral_regularization:
- Discourages high-frequency oscillatory control patterns typical of panic-stepping.
---------------------------------------------------------------------
5. EMERGENT RECOVERY BEHAVIORS (DESIRED)
---------------------------------------------------------------------
forward_shoulder_roll:
- Arm sweep → tuck → diagonal roll → hip whip → fighter stance.
back_roll:
- Chin tuck → forearm + upper back contact → redirect → tripod rise.
side_roll:
- Shoulder sweep → long sliding arc.
tripod_pop:
- Bracing with one arm + both feet → explosive hip extension → immediate stance.
kip_up (optional):
- Requires high shoulder/hip power; emerges naturally if allowed.
stance_goal:
- Fighter stance: wide lateral base, small torso pitch/roll, stable COM.
---------------------------------------------------------------------
6. SIMULATION & TRAINING SETUP
---------------------------------------------------------------------
engine:
- MuJoCo or Isaac Gym (PhysX with smaller dt & more substeps).
timestep:
- 0.002–0.005 s; action repeat 2–4 frames.
reset_distribution:
- Random full-orientation R ∈ SO(3).
- Random angular velocity.
- Random COM drift.
- 40% starts with ground contact.
- Varied friction μ ∈ [0.2, 1.3].
- Occasional walls/ropes spawned.
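The reset distribution above might be sampled as follows. The uniform SO(3) draw uses the standard trick of normalising a 4-D Gaussian into a unit quaternion. The velocity ranges and the 10% obstacle rate are assumptions, since the seed only says "random" and "occasional". The 40% contact fraction and the friction range come from the list above.

```python
import numpy as np

def sample_reset(rng):
    """Draw one episode's initial conditions (simulator-agnostic dictionary)."""
    # A normalised 4-D Gaussian is uniform over unit quaternions, hence over SO(3).
    q = rng.standard_normal(4)
    q /= np.linalg.norm(q)
    return {
        "quat_wxyz": q,
        "ang_vel": rng.uniform(-6.0, 6.0, size=3),     # rad/s, ASSUMED range
        "com_vel": rng.uniform(-1.0, 1.0, size=3),     # m/s, ASSUMED range for "COM drift"
        "start_in_contact": bool(rng.random() < 0.4),  # 40% start touching the ground
        "friction": float(rng.uniform(0.2, 1.3)),      # mu in [0.2, 1.3]
        "spawn_obstacle": bool(rng.random() < 0.1),    # ASSUMED rate for walls/ropes
    }
```

A caller would pass a seeded generator such as `np.random.default_rng(0)` and write the resulting fields into the simulator state before each episode.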
observations:
- IMUs (ω,a).
- Joint pos/vel.
- Contact flags.
- COM estimate.
- Short history stack (3–5 frames).
- Optional raycast ring.
actions:
- Joint torques + roll-modifiers (continuous scalars).
asymmetric_training:
actor:
- onboard sensors only.
critic:
- privileged info: true COM, ground-truth contact impulses, friction.
algorithms:
- PPO or SAC with large batches.
- GAE λ=0.95–0.97.
- Entropy regularization for diversity.
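For readers unfamiliar with the λ setting above, generalized advantage estimation can be written as a single backward pass over a rollout. This is the standard recursion, shown as a sketch. The discount `gamma = 0.99` is an assumption, since the seed gives only λ.

```python
import numpy as np

def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    rewards, values, dones: length-T sequences; dones[t] = 1 if the episode ended at step t.
    last_value: critic estimate for the state after the final step.
    Returns (advantages, value targets).
    """
    T = len(rewards)
    adv = np.zeros(T)
    next_value, running = last_value, 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # One-step TD error, with bootstrapping cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        adv[t] = running
        next_value = values[t]
    return adv, adv + np.asarray(values, dtype=float)
```

Lower λ leans on the critic (less variance, more bias), while higher λ leans on observed returns. The 0.95–0.97 range sits near the return-heavy end, which suits short fall episodes with a sparse recovery bonus.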
reward_terms:
minimize_time_down:
- r_ground = -α * I[not standing] * dt (α ~ 1.0–3.0)
fast_recovery_bonus:
- r_recover = +B * (1 - t/T_max) (B ~ 3–8; T_max tightened from 2 s to 1 s during training)
impact_safety:
- penalize head acceleration exceeding a safe threshold.
contact_quality:
- bonus for continuous safe arc; penalty for head/knees-first.
momentum_shaping:
- reward decrease in |L| while COM rises.
stability:
- small bonus for no re-fall for 0.5–1.0 s.
stammer_punish:
- penalty per foot contact, stance change, COP jitter, >2 Hz stepping.
diversity:
- entropy + small BC prior from judo/parkour mocap.
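Three of the reward terms above are concrete enough to write down directly. The sketch below follows the formulas for r_ground and r_recover as given. Clipping the recovery bonus at zero for t > T_max, and every weight and threshold in `r_stammer` other than the 2 Hz cadence limit, are assumptions added for illustration.

```python
def r_ground(standing, dt, alpha=2.0):
    """Per-step time-down cost: -alpha * I[not standing] * dt (alpha ~ 1.0-3.0)."""
    return 0.0 if standing else -alpha * dt

def r_recover(t_recover, t_max=2.0, bonus=5.0):
    """One-time bonus B * (1 - t / T_max) when the robot first stands at time t.

    Clipped at zero for t > T_max (ASSUMED; the seed does not say).
    """
    return bonus * max(0.0, 1.0 - t_recover / t_max)

def r_stammer(foot_contacts, stance_changes, cop_jitter, step_rate_hz,
              c_contact=0.1, c_stance=0.1, c_jitter=1.0, c_cadence=0.5,
              jitter_thresh=0.01, cadence_thresh=2.0):
    """Anti-stammer penalty; all weights and jitter_thresh are ASSUMED values."""
    penalty = c_contact * foot_contacts + c_stance * stance_changes
    penalty += c_jitter * max(0.0, cop_jitter - jitter_thresh)
    penalty += c_cadence * max(0.0, step_rate_hz - cadence_thresh)
    return -penalty
```

A recovery at 0.5 s with `t_max = 2.0` and `bonus = 5.0` earns 3.75, while each 10 ms step spent on the ground with `alpha = 2.0` costs 0.02, so the two terms pull in the same direction.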
curriculum_stages:
1) Mats, slow dynamics, no stepping.
2) Remove slow-mo, add randomness, allow walls/ropes.
3) Enable superhuman joint speeds, tighten head-accel caps.
4) From-gait fall transitions (sampled from locomotion rollouts).
safety_termination:
- Head-first impact.
- Excessive joint violation.
- Prolonged prone.
- Unsafe torso acceleration spikes.
---------------------------------------------------------------------
7. METRICS FOR SUCCESS
---------------------------------------------------------------------
- Steps per fall (median ≤1, 95th percentile ≤2).
- COP path length minimized.
- Foot-contact frequency < 1 Hz during recovery.
- Time-to-upright (TTU) distributions (median <1.0 s).
- Peak head/torso accelerations reduced.
- Contact sequence clustering showing ≥3 distinct roll archetypes.
- No re-fall in stability window.
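The numeric targets in this list can be checked from evaluation logs with a few lines of NumPy. This sketch covers only the three metrics that have explicit thresholds above. The function name, the dictionary keys, and the input format (one entry per fall episode) are hypothetical.

```python
import numpy as np

def recovery_metrics(steps_per_fall, time_to_upright):
    """Summarise per-episode logs against the stated step and TTU targets."""
    steps = np.asarray(steps_per_fall, dtype=float)
    ttu = np.asarray(time_to_upright, dtype=float)
    report = {
        "steps_median": float(np.median(steps)),
        "steps_p95": float(np.percentile(steps, 95)),
        "ttu_median": float(np.median(ttu)),
    }
    # Targets from the list above: median <= 1 step, 95th percentile <= 2, median TTU < 1.0 s.
    report["passed"] = (report["steps_median"] <= 1
                        and report["steps_p95"] <= 2
                        and report["ttu_median"] < 1.0)
    return report
```

Note that `np.percentile` interpolates linearly by default, so a 95th-percentile value such as 1.55 steps is possible even though individual episodes have integer step counts.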
---------------------------------------------------------------------
8. WHY THIS BEHAVIOR IS INEVITABLE
---------------------------------------------------------------------
evolutionary_pressure:
- Dynamic humanoids will increasingly operate in unstructured environments.
- Stepping-based recovery fails under high angular momentum.
- Rolling distributes forces, preserves sensors, and minimizes downtime.
- RL strongly favors strategies that maximize task uptime & safety.
technology_trajectory:
- Distributed IMUs, torque control, and 1 kHz loops are already industry-standard.
- Simulation RL (MuJoCo/Isaac) allows millions of fall episodes quickly.
- Emergent recovery is simpler than emergent locomotion once constraints are set.
convergence:
- All factors (hardware, physics, RL rewards, environment) push toward a
unified behavior: early detection → controlled roll → rapid pop-up →
stable fighter stance.
---------------------------------------------------------------------
9. SYSTEM SUMMARY
---------------------------------------------------------------------
the_system_in_one_sentence: |
Detect instability early using distributed IMUs, immediately switch from
stepping to roll-mode, shape angular momentum with torque-controlled joints
along safe contact arcs (forearm→shoulder→back/hip), penalize any foot
stammering, and use RL in simulation to learn a family of roll-recovery
strategies that reliably return the humanoid to a wide, stable, fighter
stance in under one second from virtually any fall angle.