The Robot That Knows How to Walk Cannot Help You Stand Up

In a research lab at Carnegie Mellon University, Inseung Kang watches a simulated human fall. The fall is deliberate, carefully modeled in a physics simulator, because recruiting actual stroke survivors to fall repeatedly for data collection is, as Kang puts it, something he does not think is feasible. Instead, his team builds digital humans, complete with 50-degree-of-freedom musculoskeletal models and 208 virtual muscles, and lets them stumble inside computers.

Eight hundred miles south, a Unitree G1 humanoid robot is doing something remarkable. Running BFM-Zero, a Behavior Foundation Model, it performs motion tracking, goal reaching, and reward-based inference without any task-specific retraining. It simply receives a prompt and moves. Zero-shot. In simulation, the underlying model learned from massive datasets of human motion capture. On the real robot, asymmetric training bridges the gap between full-state simulation and partial-observation reality.

These two stories sound like they should connect. A foundation model that understands human movement helping a wearable robot assist human movement seems like the obvious next step. It is not. The gap between these two achievements is wider, more technically demanding, and more interesting than almost anyone outside the field appreciates.


The Problem Is the Human

The first thing to understand about applying Behavior Foundation Models to wearable robots is that the core difficulty has nothing to do with the models.

BFM-Zero, Meta Motivo, and the growing family of behavior foundation models all share a crucial assumption: the agent controls the entire body. Every joint torque is the model’s to command. The physics simulator provides clean proprioceptive feedback. The objective function is whatever the researcher designs.

Wearable robots break every one of these assumptions. A hip exoskeleton provides roughly 20% of total joint torque. The human generates the other 80% and can override the robot at any moment. The robot does not sense its user’s intentions directly. It infers them from noisy surface EMG signals that degrade as muscles fatigue, from IMU data that measures consequences rather than causes, or from EEG patterns with spatial resolution too low for practical deployment.

Kang frames the core tension precisely: there is a trade-off between the value added by more sensors and the usability of the system. A research prototype with 50 sensors that takes 30 minutes to put on will never leave the lab.

This creates a problem that BFMs were not designed to solve. The model needs to control a coupled human-robot system where it observes only a fraction of the state, contributes only a fraction of the actuation, and must keep a biological system safe at all times. The challenge is not making the model bigger. It is making the model work in a regime where partial control, partial observation, and absolute safety constraints converge.
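
In rough pseudocode, the regime looks something like the sketch below. Everything in it is an assumption made for illustration: the sensor set, the 20% assistance cap, the actuator limit, and every function name. It is meant only to make the partial-control, partial-observation structure concrete, not to describe any published controller.

    # A minimal sketch of the control regime described above, not any published
    # controller. The 20% assistance share follows the rough figure quoted for
    # hip exoskeletons; all other numbers and names are illustrative.
    import numpy as np

    ASSIST_FRACTION = 0.2   # exo supplies roughly 20% of total joint torque
    TAU_EXO_MAX = 40.0      # hypothetical actuator limit, N*m

    def control_step(policy, imu_window, emg_window, tau_human_estimate):
        # Partial observation: the controller never sees the human's true state
        # or intent, only noisy wearable signals.
        obs = np.concatenate([imu_window.ravel(), emg_window.ravel()])

        # The policy proposes an assistance torque per actuated joint.
        tau_exo = policy(obs)

        # Partial actuation: cap the exo's contribution at roughly 20% of total
        # joint torque, so the human can always overpower the device.
        cap = np.minimum(
            ASSIST_FRACTION / (1.0 - ASSIST_FRACTION) * np.abs(tau_human_estimate),
            TAU_EXO_MAX,
        )
        return np.clip(tau_exo, -cap, cap)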


A Simulation Gap Squared

Every robotics researcher knows about the sim-to-real gap, the difference between physics simulation and physical reality. For humanoid robots, the gap has narrowed considerably. Domain randomization, where the simulator varies link masses, friction coefficients, joint offsets, and center-of-mass positions, has produced policies that transfer reliably from simulation to hardware. ExBody2 demonstrated expressive whole-body tracking across multiple platforms. KungfuBot deployed kungfu and dance moves on a Unitree G1 through a careful motion-processing and adaptive-tracking pipeline.
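
Domain randomization itself is conceptually simple. The sketch below shows the idea schematically: resample the simulated plant's physical parameters every episode so the policy cannot memorize any single body. The parameter ranges and the environment interface are assumptions for illustration, not any specific paper's configuration.

    # Schematic domain randomization: perturb the simulated dynamics each episode.
    # Ranges and the environment API are illustrative assumptions.
    import numpy as np

    def sample_sim_params(rng):
        return {
            "link_mass_scale":  rng.uniform(0.8, 1.2),      # +/-20% on link masses
            "friction_coeff":   rng.uniform(0.5, 1.25),     # ground contact friction
            "joint_offset_rad": rng.normal(0.0, 0.02, 12),  # encoder calibration error
            "com_offset_m":     rng.normal(0.0, 0.01, 3),   # center-of-mass shift
            "motor_strength":   rng.uniform(0.9, 1.1),      # actuator gain scaling
        }

    def training_episode(env, policy, rng):
        # A different plant every episode; the policy must work across all of them.
        obs = env.reset(physics=sample_sim_params(rng))
        done = False
        while not done:
            obs, reward, done = env.step(policy(obs))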

For wearable robots, the sim-to-real gap is not just larger. It is a different kind of problem entirely.

The landmark paper in this space is Luo et al. in Nature (2024), which demonstrated experiment-free exoskeleton assistance by training entirely in simulation. Their approach used a 50-degree-of-freedom musculoskeletal model with 208 skeletal muscles, three neural networks for motion imitation, muscle coordination, and exoskeleton control, and domain randomization over kinematic properties to generalize across users. The results were genuinely impressive: metabolic cost reductions of 24.3% for walking, 13.1% for running, and 15.4% for stair climbing, all without a single human experiment during training.
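
Structurally, the pipeline looks roughly like the sketch below: a motion imitation network supplies target kinematics, a muscle coordination network turns them into activations for the simulated muscles, and an exoskeleton control network adds assistance torques, all inside one closed simulation loop. Every interface and name here is a stand-in; it illustrates the division of labor, not the paper's implementation.

    # A structural sketch of a three-network musculoskeletal training loop.
    # All APIs are hypothetical stand-ins.
    def closed_loop_step(sim, imitation_net, muscle_net, exo_net, motion_clip, t):
        state = sim.musculoskeletal_state()        # 50-DoF joint state

        # 1. Motion imitation: target kinematics drawn from human motion capture.
        target_pose = imitation_net(motion_clip, t)

        # 2. Muscle coordination: activations for the 208 simulated muscles that
        #    drive the body toward the target pose.
        activations = muscle_net(state, target_pose)

        # 3. Exoskeleton control: assistance torques conditioned on the simulated
        #    human's state, trained to reduce metabolic cost.
        assist_torque = exo_net(state, activations)

        sim.step(muscle_activations=activations, exo_torque=assist_torque)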

But that result, significant as it is, illustrates the remaining challenges. The musculoskeletal model represents a healthy adult. It does not capture the pathological muscle activation patterns of a stroke survivor, the rigidity of Parkinson’s disease, or the variable paralysis levels of spinal cord injury. A March 2025 paper on adaptive torque control under spasticity made early progress, incorporating a differentiable spastic reflex model into the digital twin and training a Soft Actor-Critic controller that reduced maximum knee torques by 10.6%. This is a first step toward pathology-aware simulation, not a solution.

The deeper issue is motor adaptation. When humans wear an exoskeleton, they change how they walk. Their nervous system adapts over minutes to hours, redistributing effort across muscles. This adaptation is itself a moving target, and current simulators do not model it. Luo’s domain randomization handles population-level kinematic variation, not individual-level neural strategy shifts. A BFM trained in simulation would encounter a user whose behavior changes in response to the very assistance being provided.

Scherpereel et al. at Georgia Tech offered one workaround. Their Science Robotics paper (November 2025) used CycleGAN to translate data from unassisted walking to exoskeleton-assisted walking, eliminating the need for device-specific datasets. This reduced metabolic costs by 9.5 to 14.6% in eight participants. The approach sidesteps the simulation problem by learning the mapping directly from data. It is clever, but it requires at least some real-world data, which means it does not fully close the experiment-free loop.
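
The mechanics of that translation are worth seeing, because the trick is the cycle-consistency term. The sketch below assumes two unpaired datasets of gait windows, unassisted and assisted; the names and the single-direction loss are simplifications for illustration, not the authors' code.

    # One direction of a CycleGAN-style objective for unpaired gait translation.
    # G_xy maps unassisted windows toward assisted-looking windows, G_yx maps back,
    # and D_y judges whether a window resembles real assisted walking.
    import torch
    import torch.nn.functional as F

    def cyclegan_losses(G_xy, G_yx, D_y, x_unassisted, lam=10.0):
        fake_y = G_xy(x_unassisted)      # unassisted -> synthetic assisted gait
        recon_x = G_yx(fake_y)           # and back again

        # Adversarial term: translated windows should fool the discriminator.
        score = D_y(fake_y)
        adv = F.mse_loss(score, torch.ones_like(score))

        # Cycle-consistency term: the translation must be invertible, which is
        # what lets unpaired, device-free data stand in for assisted-gait data.
        cyc = F.l1_loss(recon_x, x_unassisted)
        return adv + lam * cyc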


The Safety Problem No One Has Solved

There is a question that hangs over every discussion of AI-controlled wearable robots, and it is the one most researchers would rather not answer directly: can you guarantee safety?

Kang does not sugarcoat it. “At the end of the day, ML or any AI model is a statistical approach,” he says. “There is no such thing as 100% guarantee.” Current exoskeletons manage this through physical joint limits and partial assistance. If the robot goes wrong, the human can overpower it. This is a practical safety envelope, not a theoretical one.

The robotics community has been developing more principled approaches. Control Barrier Functions, mathematical constructs that enforce safety constraints in real time, are showing promise. A November 2025 preprint on safe reinforcement learning for human-robot shared control demonstrated a CBF-QP shield running at 250 to 500 Hz, with more than 90% of episodes showing zero safety violations. The system is notable for being the first to combine formal CBF-based guarantees with human-centric evaluation of workload, trust, and usability.
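
For readers who have not met the construct, the condition itself is compact. Stated for a generic control-affine system in textbook notation (not the preprint's specific formulation):

    % Safe set and dynamics:
    \dot{x} = f(x) + g(x)\,u, \qquad \mathcal{C} = \{\, x : h(x) \ge 0 \,\}

    % h is a control barrier function if some admissible input can always keep
    % h from decaying faster than a class-K function \alpha allows:
    \sup_{u \in U} \big[ L_f h(x) + L_g h(x)\,u \big] \;\ge\; -\alpha\big(h(x)\big)

The CBF-QP shield then solves, at every control cycle, for the input closest to the learned policy's proposal that still satisfies this inequality.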

In contact-rich manipulation, ContactRL achieved a safety violation rate of just 0.2% while maintaining 87.7% task success, using a kinetic-energy-based CBF shield. Real-world experiments on a UR3e robot confirmed contact forces consistently below 10 newtons during handovers.

These results are encouraging, but they reveal a gap. No published paper has applied formal safety verification, whether through CBFs, barrier certificates, or constrained MDPs, to a BFM-controlled wearable exoskeleton interacting with a human user. The comprehensive survey on safe reinforcement learning (May 2025) catalogs three families of approaches (control-theoretic, formal-method-based, and constrained-optimization-based) and notes that constrained optimization shows the best practical results for balancing performance and safety. Translating this to the wearable domain, where the “environment” is a human body with pathological conditions, remains an open research problem.

The regulatory landscape adds another layer. The FDA has authorized over 1,250 AI/ML-enabled medical devices as of July 2025, with roughly 100 new approvals per year. The agency’s January 2025 draft guidance on AI-enabled device software functions introduces lifecycle management recommendations, and the concept of Predetermined Change Control Plans allows manufacturers to pre-approve certain model updates. But continuously learning models, the kind a BFM-based exoskeleton would ideally use for personalization, do not fit neatly into these frameworks. An exoskeleton with embedded AI would be classified as Software in a Medical Device (SiMD) under IEC 62304, likely requiring either a 510(k) or PMA pathway. Wandercraft’s Atalante X, a self-balancing exoskeleton for spinal cord injury and multiple sclerosis patients, secured FDA clearance in 2025 through a multi-center clinical study of 547 training sessions. That clearance used traditional control, not machine learning.


Five Bridges to Cross

If the problems are clear, the research community has at least begun sketching the paths forward. Five integration strategies are emerging from the literature, each addressing a different piece of the puzzle.

The first is cross-embodiment transfer, the idea that a foundation model trained on humanoid robots could share useful representations with exoskeleton controllers. HumanoidExo (October 2025) demonstrated that a lightweight wearable exoskeleton could collect whole-body human motion data and use it to train humanoid robot policies, achieving complex skills from as few as five real-robot demonstrations. That pipeline runs in one direction: a human wearing the exoskeleton generates data, and the data trains a humanoid policy. Running it in reverse, from humanoid foundation model to exoskeleton assistance, is theoretically possible but undemonstrated.

The conceptual foundation is strengthening. GR00T N1, NVIDIA’s open foundation model for humanoid robots, uses per-embodiment MLPs to project different robot morphologies into a shared embedding dimension. Embodiment scaling laws research (May 2025) found that increasing the number of training embodiments improves generalization to unseen ones more effectively than increasing data on fixed embodiments. If “human wearing exoskeleton” can be treated as another embodiment, these scaling laws would apply.
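
What a per-embodiment projection looks like in practice is not exotic. The sketch below is a minimal PyTorch rendering of the idea, with dimensions, names, and the inclusion of a hip exoskeleton as an extra embodiment all assumed for illustration; it is not GR00T N1's architecture.

    # A minimal sketch of per-embodiment projection into a shared embedding space.
    # Dimensions and the "hip_exo" embodiment are illustrative assumptions.
    import torch
    import torch.nn as nn

    class MultiEmbodimentEncoder(nn.Module):
        def __init__(self, state_dims, shared_dim=256):
            super().__init__()
            # One small MLP per embodiment, all mapping into the same space.
            self.projections = nn.ModuleDict({
                name: nn.Sequential(nn.Linear(dim, shared_dim), nn.GELU(),
                                    nn.Linear(shared_dim, shared_dim))
                for name, dim in state_dims.items()
            })

        def forward(self, embodiment, state):
            return self.projections[embodiment](state)

    encoder = MultiEmbodimentEncoder({
        "unitree_g1": 74,   # full humanoid proprioception (illustrative)
        "hip_exo":    28,   # IMU + encoder channels on a wearable device (illustrative)
    })
    z = encoder("hip_exo", torch.randn(1, 28))   # embedding a shared policy trunk could consume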

The second bridge is the digital twin pipeline. Rather than transferring from humanoid models, this approach builds patient-specific musculoskeletal simulations and trains BFM-style policies directly in them. Luo et al. demonstrated the viability. A 2025 review in Knee Surgery, Sports Traumatology, Arthroscopy mapped the state of musculoskeletal digital twins, identifying multi-modal data integration and pathology modeling as the primary technical barriers. Yang et al. showed that deep integration of digital twins with rehabilitation exoskeletons could achieve personalized gait trajectory planning.

The third is multimodal intent recognition. Current exoskeletons read EMG or IMU data through dedicated signal processing pipelines. A foundation-model approach would unify these signals. A November 2025 paper in Scientific Reports demonstrated a hybrid EMG-EEG interface where EEG compensated for EMG degradation under muscle fatigue, a practical solution to one of the most persistent problems in myoelectric control. Transformers are rapidly replacing CNNs and LSTMs for EEG classification, and the Pretrained Actigraphy Transformer (PAT), trained on wearable movement data from nearly 30,000 participants, represents the first true foundation model for wearable behavioral signals. Integrating such a model as the intent-recognition front end for a BFM-based exoskeleton controller would create a system that predicts what the user intends to do before they physically begin the movement, because EEG signals precede physical motion by 200 to 500 milliseconds.
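
A fusion front end of that kind is straightforward to sketch, even if making it work on real physiological signals is not. In the minimal version below, per-modality encoders feed a shared transformer whose output could condition a downstream controller; channel counts, window lengths, the intent classes, and the fusion scheme are all illustrative assumptions.

    # A minimal sketch of a multimodal EMG + EEG intent front end.
    # All dimensions and the fusion scheme are illustrative assumptions.
    import torch
    import torch.nn as nn

    class IntentEncoder(nn.Module):
        def __init__(self, emg_ch=8, eeg_ch=32, d_model=128, n_intents=5):
            super().__init__()
            self.emg_proj = nn.Linear(emg_ch, d_model)
            self.eeg_proj = nn.Linear(eeg_ch, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.fusion = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_intents)   # e.g. walk, stand, stairs, sit, stop

        def forward(self, emg, eeg):
            # emg: (batch, T_emg, emg_ch), eeg: (batch, T_eeg, eeg_ch)
            tokens = torch.cat([self.emg_proj(emg), self.eeg_proj(eeg)], dim=1)
            fused = self.fusion(tokens)
            return self.head(fused.mean(dim=1))          # intent logits

    logits = IntentEncoder()(torch.randn(1, 200, 8), torch.randn(1, 125, 32))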

The fourth bridge is tactile and proprioceptive sensing. The exoskeleton-human contact interface is information-rich and largely unexploited. Tactile foundation models are beginning to emerge for humanoid robots, and whole-body tactile compliance control (July 2025) showed that embedding skin sensor data directly into QP-based controllers enables reactive force regulation. Applied to exoskeletons, this means real-time monitoring of pressure distribution at every contact point, detecting abnormal pressures that risk skin breakdown and adjusting assistance before injury occurs.
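
Even a crude version of that monitoring loop shows why the interface matters. The sketch below uses hypothetical pressure thresholds and a simple linear back-off; real skin-safety limits depend on contact site, duration, and the individual user.

    # Illustrative pressure-gated assistance: back off when any cuff or strap
    # pressure approaches a hypothetical skin-safety limit.
    import numpy as np

    P_WARN = 25.0   # kPa, hypothetical "start backing off" pressure
    P_MAX  = 40.0   # kPa, hypothetical hard limit at the contact interface

    def pressure_gated_assistance(tau_exo, contact_pressures_kpa):
        peak = float(np.max(contact_pressures_kpa))
        if peak >= P_MAX:
            return np.zeros_like(tau_exo)                # stop assisting entirely
        if peak > P_WARN:
            scale = (P_MAX - peak) / (P_MAX - P_WARN)    # linear back-off
            return scale * tau_exo
        return tau_exo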

The fifth, and arguably most critical, is the safety shield. Layering a CBF-based safety filter over BFM policy outputs could provide the mathematical guarantees that purely learned policies cannot. The architecture would look something like this: the BFM generates desired assistance torques based on its understanding of movement and user intent, and the CBF filter projects those torques onto the nearest safe action in real time, enforcing joint angle limits, velocity constraints, interaction force bounds, and maximum assistance ratios. A March 2025 paper in Automation in Construction achieved exactly this for construction robots, providing theoretical safety guarantees and zero collisions during human-robot collaboration, without collision penalties during training. Adapting this framework to the more intimate contact regime of wearable robotics is a natural next step.
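
A minimal sketch of that shield layer, assuming the barrier terms have already been computed elsewhere from a model of the coupled human-exoskeleton system, might look like the following. It uses cvxpy purely for readability; a deployed shield running at hundreds of hertz would use a specialized embedded QP solver, and every limit here is a placeholder.

    # Project the BFM's proposed assistance torques onto the nearest action that
    # satisfies the barrier condition and the actuator bounds. Barrier terms
    # (h, L_f h, L_g h) are assumed to come from a coupled human-exo model.
    import numpy as np
    import cvxpy as cp

    def cbf_shield(tau_bfm, h, Lf_h, Lg_h, alpha=5.0, tau_max=40.0):
        """Return the safe torque closest to the BFM's proposal."""
        n = tau_bfm.shape[0]
        tau = cp.Variable(n)
        constraints = [
            Lf_h + Lg_h @ tau >= -alpha * h,   # barrier: keep h(x) >= 0 going forward
            cp.abs(tau) <= tau_max,            # actuator / assistance-ratio limits
        ]
        problem = cp.Problem(cp.Minimize(cp.sum_squares(tau - tau_bfm)), constraints)
        problem.solve()
        return np.asarray(tau.value).ravel()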


The Edge Constraint

There is a practical dimension that the research literature sometimes glosses over. Whatever model ends up controlling a wearable robot must run on hardware strapped to a person’s body. That means battery power, thermal limits, and weight constraints that data center researchers never think about.

The good news: edge AI is maturing fast. NVIDIA’s Jetson Orin platform achieves sub-30-millisecond inference for transformer-based policies with TensorRT optimization. Four-bit NF4 quantization can compress a 2-billion-parameter vision-language model to run on edge hardware while maintaining accuracy. The CLONE framework from USENIX ATC 2025 demonstrated hierarchical optimization that jointly handles latency, accuracy, and energy on a custom 28-nanometer accelerator.
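
For context, 4-bit NF4 loading is already routine in the Hugging Face transformers and bitsandbytes stack, roughly as in the sketch below. The checkpoint name is a placeholder, and whether any particular model still fits a wearable compute and power budget after quantization is exactly the open question.

    # A minimal sketch of loading a small causal-LM backbone with 4-bit NF4 weights.
    # The checkpoint name is a placeholder, not a real model.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # dequantized matmuls run in fp16
        bnb_4bit_use_double_quant=True,        # quantize the quantization constants too
    )

    model = AutoModelForCausalLM.from_pretrained(
        "some-org/some-2b-backbone",           # placeholder checkpoint name
        quantization_config=quant_config,
        device_map="auto",
    )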

Edge AI benchmarks now show 15 to 50 millisecond inference latencies, a 70 to 85% reduction compared to cloud architectures, with 92 to 98% accuracy retention. For wearable robotics, the critical requirement is not raw model size but end-to-end latency from sensor input to motor command. Policy distillation, where a large teacher model trains a compact student model, is the established technique. A December 2025 paper in the Journal of NeuroEngineering and Rehabilitation demonstrated this for lower-limb exoskeletons, distilling a deep RL policy into a minimal-sensor neural network controller that adapts per user.
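
A bare-bones version of that teacher-student setup is easy to write down; the hard part is what the numbers hide. In the sketch below the architectures, the sensor split, and the plain regression objective are all illustrative assumptions, not the paper's method.

    # Illustrative policy distillation: a large frozen teacher with rich (sim-only)
    # observations supervises a small student that sees only minimal wearable sensors.
    import torch
    import torch.nn as nn

    teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(),
                            nn.Linear(512, 512), nn.ReLU(),
                            nn.Linear(512, 2)).eval()   # stands in for a large trained policy
    student = nn.Sequential(nn.Linear(24, 64), nn.ReLU(),
                            nn.Linear(64, 64), nn.ReLU(),
                            nn.Linear(64, 2))           # small enough for embedded inference
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    def distill_step(full_obs, minimal_obs):
        with torch.no_grad():
            target = teacher(full_obs)                  # teacher sees the full state
        loss = nn.functional.mse_loss(student(minimal_obs), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()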

The question is whether a BFM-class model can be distilled to the point where it runs on wearable hardware while retaining its generalist capabilities. CLoSD’s diffusion planner generates motion at 3,500 frames per second, 175 times real-time, on GPU hardware. Whether a distilled version of that planner can run within the 2 to 10 millisecond window that wearable robots demand is an open empirical question.


What This Actually Means

After reviewing dozens of papers across behavior foundation models, wearable robotics, safe reinforcement learning, edge computing, and neural interfaces, here is what I believe the data supports.

The fundamental barrier is not the model. It is the human in the loop. BFMs have proven they can learn generalizable behavior representations from large motion datasets and deploy them zero-shot on physical hardware. The unsolved problem is extending this to coupled human-robot systems where the model controls a fraction of actuation, observes a fraction of state, must personalize to individual pathology, and cannot afford a single unsafe action. This is not a scaling problem. It is a new problem.

The pieces exist, but nobody has assembled them. Musculoskeletal digital twins that simulate 208 muscles. CycleGAN domain adaptation that eliminates device-specific data collection. CBF safety shields that run at 500 Hz with zero violations. Transformer-based EEG that predicts movement 200 milliseconds before it happens. Foundation models for wearable behavioral data. Edge hardware that achieves sub-30ms inference. Each piece has been demonstrated in isolation. The integration, building a system where a BFM-style foundation model uses multimodal neural signals to generate safe, personalized assistance through a certified medical device, has not been attempted.

Cross-embodiment transfer is the most promising and least proven path. The idea that a single foundation model trained across humanoid robots, exoskeletons, and prosthetics could generalize to new embodiments is supported by scaling-law evidence and architectural feasibility from GR00T N1 and HumanoidExo. But the claim that a model trained to control a free-standing humanoid can generate useful representations for assisting a coupled human-exoskeleton system is an extrapolation, not a result. It may turn out that the behavioral representations are too different to share, that “moving freely” and “assisting movement” require fundamentally different learned structures.

Safety certification will be the bottleneck, not model performance. Even if the technical challenges are solved, deploying a continuously learning BFM inside a medical device means navigating FDA SiMD classification, IEC 62304 software lifecycle requirements, and predetermined change control plans that were designed for more predictable systems. Wandercraft needed a 547-session multi-center study to clear a traditional exoskeleton. A BFM-based device would face an order of magnitude more scrutiny. This is not necessarily a bad thing.

And the observation that stays with me: Kang predicts a “GPT moment” for wearable robotics within five to ten years. He may well be right. The conditions are assembling: universal controllers, experiment-free training, massive motion datasets, affordable hardware. But GPT moments are only obvious in retrospect. The researchers building the foundations today are working in a period that will look like the deep learning years of 2012 to 2017, full of important incremental progress that only later reveals itself as the beginning of something transformative.


References

  1. Nature, Luo et al., “Experiment-free exoskeleton assistance via learning in simulation” (2024.06)
  2. arXiv, “BFM-Zero: A Promptable Behavioral Foundation Model” (2025.11)
  3. arXiv, Tirinzoni et al., “Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models” (Meta Motivo) (2025.04)
  4. arXiv, Yuan et al., “A Survey of Behavior Foundation Model” (2025.06)
  5. Science Robotics, Scherpereel et al., “Deep domain adaptation for wearable robotic control” (2025.11)
  6. Science Robotics, van der Kooij et al., “AI in therapeutic and assistive exoskeletons and exosuits” (2025.07)
  7. arXiv, “Adaptive Torque Control of Exoskeletons under Spasticity via RL” (2025.03)
  8. arXiv, “HumanoidExo: Scalable Whole-Body Humanoid Manipulation via Wearable Exoskeleton” (2025.10)
  9. arXiv, “GR00T N1: An Open Foundation Model for Generalist Humanoid Robots” (2025.03)
  10. arXiv, “Towards Embodiment Scaling Laws in Robot Locomotion” (2025.05)
  11. arXiv, “Humanoid Policy ~ Human Policy” (2025.03)
  12. J. NeuroEngineering, “Robust NN controller for lower limb exo via deep RL with policy distillation” (2025.12)
  13. Nature Scientific Reports, Zaim et al., “Hybrid EMG-EEG interface for robust intention detection” (2025.11)
  14. Sensors/PMC, “Transformers in EEG Analysis” (2025.02)
  15. Frontiers, “Cross-domain EEG prediction for lower limb exoskeleton” (2024)
  16. PMC, “Foundation Models for Wearable Movement Data” (2025)
  17. arXiv, “A Survey of Safe RL and Constrained MDPs” (2025.05)
  18. TechRxiv, “Safe RL for Human-Robot Shared Control” (2025.11)
  19. Automation in Construction, “Safety-constrained DRL for human-robot collaboration” (2025.03)
  20. NVIDIA Technical Blog, “Edge AI on Jetson for Robotics” (2025)
  21. arXiv, “Vision-Language Models on Edge for Real-Time Robotic Perception” (2026.01)
  22. USENIX ATC'25, “CLONE: Customizing LLMs for Efficient Latency-Aware Inference” (2025)
  23. FDA, “Comprehensive Draft Guidance for AI-Enabled Medical Devices” (2025.01)
  24. RoboCloud Hub, “Tactile Foundation Models” (2025.12)
  25. Advanced Intelligent Systems, “Whole-Body Tactile Compliance Control for Humanoid” (2025.07)
  26. ICLR 2025, Tevet et al., “CLoSD: Closing the Loop between Simulation and Diffusion” (2025)
  27. Science Robotics, Molinaro et al., “Estimating human joint moments unifies exoskeleton control” (2024.03)