Two Robots Make the Bed by Themselves: the Helix-02 Video Looks Like Science Fiction

Figure AI has released a new video demonstrating the capabilities of the Helix-02 platform: two humanoid robots fully tidying up a bedroom in less than two minutes. The most significant aspect of the demonstration is not the execution of individual tasks but the ability of the two systems to coordinate autonomously, without any central supervisor or shared planner.

According to the company, both humanoids perform the operations using a single learned Vision-Language-Action (VLA) policy. Each robot observes the environment through its own cameras and infers its partner's intentions in real time solely from the partner's movements, much as humans do when working together.
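To make the idea of coordination without a central planner concrete, here is a minimal toy sketch, not Figure's implementation. It assumes a one-dimensional task in which two robots each pull one side of a blanket toward a goal position. Every name (`Observation`, `shared_policy`, `simulate`) and every gain is hypothetical, and a hand-written rule stands in for the learned network. The point is only the structure: both robots call the same policy on their own observation, and the only thing they know about each other is the partner position they observe.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    own_pos: float      # this robot's position along the bed (m)
    partner_pos: float  # partner's position, as seen by this robot's own cameras
    goal_pos: float     # where the blanket edge should end up (m)


def shared_policy(obs: Observation, max_step: float = 0.05) -> float:
    """Hypothetical stand-in for the learned policy: returns a position step."""
    to_goal = obs.goal_pos - obs.own_pos
    lead = obs.own_pos - obs.partner_pos  # positive if this robot is ahead
    # Move toward the goal, but hold back when ahead of the partner
    # so that both sides of the blanket advance together.
    step = 0.5 * to_goal - 0.25 * lead
    return max(-max_step, min(max_step, step))


def simulate(pos_a: float, pos_b: float, goal: float, ticks: int = 200):
    """Run both robots with the same policy; no messages, no central planner."""
    for _ in range(ticks):
        # Each robot builds its observation independently from what it sees.
        step_a = shared_policy(Observation(pos_a, pos_b, goal))
        step_b = shared_policy(Observation(pos_b, pos_a, goal))
        pos_a += step_a
        pos_b += step_b
    return pos_a, pos_b
```

Calling `simulate(0.0, 0.3, 1.0)` starts the robots 30 cm apart; the one that is behind catches up while the one ahead slows down, and both settle at the goal without exchanging any explicit plan.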

In the video released by the company, the robots execute a long sequence of heterogeneous operations: they open doors, hang clothes, pick up trash, close books, arrange headphones on vertical stands, move a chair under a desk, and collaborate in making the bed. The entire operational flow is carried out without rigid divisions into subtasks or control handoffs between specialized systems.

From a technical standpoint, the demonstration highlights the simultaneous integration of locomotion, advanced manipulation, and environmental perception. In several instances, the robots must use their entire bodies to maintain balance and generate controlled forces. For example, opening a door requires locating the handle, pressing the mechanism, maintaining stability, and dynamically repositioning the body as the door moves.

Another particularly significant step involves the manipulation of deformable objects. The bedspread is a very difficult case for robotics: there is no rigid geometry to follow, the shape changes continuously, and each movement of one robot instantly alters the operating conditions of the other. The two humanoids must therefore continuously update their predictions and correct their trajectories dozens of times per second while the fabric bends, slides, and changes tension.
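The value of correcting dozens of times per second can be illustrated with a small toy comparison, offered as a sketch under stated assumptions rather than a description of the actual controller. It assumes a 30 Hz loop, a fabric edge that drifts sinusoidally as the partner pulls on it, and a simple proportional correction; `fabric_edge`, `run`, the rate, and the gain are all hypothetical. One run re-observes the fabric on every tick, the other commits to the position seen at the start.

```python
import math

RATE_HZ = 30          # assumed loop rate ("dozens of times per second")
DT = 1.0 / RATE_HZ


def fabric_edge(t: float) -> float:
    """Hypothetical fabric-edge position (m) that drifts as the partner pulls."""
    return 1.0 + 0.2 * math.sin(2.0 * t)


def run(closed_loop: bool, steps: int = 70, gain: float = 0.3) -> float:
    """Return the final distance between the hand and the fabric edge."""
    hand = 0.0
    target = fabric_edge(0.0)  # both variants observe the fabric once at the start
    for k in range(steps):
        if closed_loop:
            target = fabric_edge(k * DT)  # re-observe on every tick
        hand += gain * (target - hand)    # proportional correction toward target
    return abs(fabric_edge(steps * DT) - hand)


closed_error = run(closed_loop=True)
open_error = run(closed_loop=False)
```

In this toy setup the closed-loop hand ends within a few centimeters of the moving edge, while the open-loop hand ends roughly 20 cm away, at the place where the fabric used to be.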

The company claims that this is one of the first demonstrations of "Cooperative Locomanipulation between humanoids" achieved directly through a single end-to-end neural network, capable of transforming visual input into coordinated actions without specific controllers for each task. The company also emphasizes that these new capabilities have been gained by adding training data, without altering the system's underlying algorithm. The same framework had previously been used for logistical activities, household cleaning, and laundry folding, while the new demonstration expands the focus to collaborative tasks among multiple humanoids operating in the same environment.