Flying Robots: Physical to Embodied Intelligence

Preface

Development Path of Flying-Robot Research and Applications

Current stage: Most flying robots still just fly around and collect information, and their functionality is largely limited to that. Unlike dual-arm robots, they cannot perform a wide range of tasks. Quadrotor UAV development has traditionally followed a modular, function-by-function paradigm: each module is derived from physical principles, a refined mathematical model is built for it, and a dedicated solver is designed to optimize that model. This paradigm can be summarized as physical intelligence.

New approach: embodied intelligence, which has a higher ceiling. The robot actively collects data, extracts the useful information (e.g., by cleaning the data to keep what matters), learns and evolves on its own, and thereby acquires a skill.


Research Ideas

Two shifts: from single-agent perception and decision-making to autonomous swarm coordination, and from mathematics-driven modeling and optimization to data-driven learning and evolution.


Trajectory Planning & Perception: Full-State Trajectory Generation in Complex Environments

Challenge 1: Accurate modeling is hard (e.g., fitting a reasonably accurate geometric shape; the shape may be non-convex, and we want to preserve as much of the original solution space as possible).


Challenge 2: Trajectories are hard to solve for, because many kinds of constraints must be considered at once (nonlinear constraints, coupled constraints, and so on).


Challenge 3 (rotorcraft-specific): position and attitude are hard to decouple. For a rotorcraft UAV, the attitude determines the direction of thrust and hence the acceleration, which in turn changes the position. Position and attitude therefore cannot be handled separately.
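
The coupling can be made concrete with a small sketch. It assumes the common drag-free quadrotor model with world z pointing up, in which collective thrust acts along the body z-axis; the talk itself gives no equations, and the function names and the value of `G` are illustrative. Under that assumption, a desired acceleration fixes the thrust direction, and with it the tilt of the vehicle.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def body_z_from_acceleration(acc):
    """Return the unit body z-axis (thrust direction) for a world-frame acceleration.

    Assumed model: m * a = -m * g * e3 + f * z_B, so z_B is parallel to a + g * e3.
    """
    tx, ty, tz = acc[0], acc[1], acc[2] + G
    norm = math.sqrt(tx * tx + ty * ty + tz * tz)
    if norm < 1e-9:
        raise ValueError("free fall: thrust direction is undefined")
    return (tx / norm, ty / norm, tz / norm)


def tilt_angle_deg(acc):
    """Angle between the body z-axis and the world vertical, in degrees."""
    z_b = body_z_from_acceleration(acc)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, z_b[2]))))


hover_tilt = tilt_angle_deg((0.0, 0.0, 0.0))    # no acceleration: level attitude
forward_tilt = tilt_angle_deg((G, 0.0, 0.0))    # 1 g sideways: 45 degree tilt
```

In this model, hovering needs no tilt, while accelerating horizontally at 1 g forces a 45 degree tilt. A planner that picks positions without regard to attitude is therefore implicitly picking attitudes as well.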


The corresponding approaches to these three challenges are, respectively, a point-mass model, alternating spatiotemporal decoupling, and manually specified trajectories.

Closed-Loop Perception-Planning and Modeling in Dynamic UAV Environments

To close the loop of the whole system, from information collection to planning, in complex situations, the UAV must be able to fly in a wide variety of scenarios.


Representative results: active perception planning and efficient extraction of flight corridors. The UAV does not need to know exactly where each obstacle is; it only needs to know where the free space is, so what is needed is a way to estimate the obstacle distribution.
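
The "free space only" idea can be illustrated with a toy sketch. It is not the team's corridor-extraction method: it assumes a 2D occupancy grid and simply grows an axis-aligned box of free cells around a seed cell. The function name `grow_free_box` and the grid are hypothetical.

```python
def grow_free_box(grid, seed):
    """Grow an axis-aligned box of free cells around seed = (row, col).

    grid[r][c] is 0 for free and 1 for occupied. Returns inclusive bounds
    (r_min, r_max, c_min, c_max). Each pass tries to push every side out
    by one cell and stops when no side can move.
    """
    rows, cols = len(grid), len(grid[0])
    r0, c0 = seed
    if grid[r0][c0] != 0:
        raise ValueError("seed cell must be free")
    r_min = r_max = r0
    c_min = c_max = c0

    def is_free(ra, rb, ca, cb):
        # True if every cell in the inclusive rectangle is free.
        return all(grid[r][c] == 0
                   for r in range(ra, rb + 1)
                   for c in range(ca, cb + 1))

    grew = True
    while grew:
        grew = False
        if r_min > 0 and is_free(r_min - 1, r_min - 1, c_min, c_max):
            r_min -= 1
            grew = True
        if r_max < rows - 1 and is_free(r_max + 1, r_max + 1, c_min, c_max):
            r_max += 1
            grew = True
        if c_min > 0 and is_free(r_min, r_max, c_min - 1, c_min - 1):
            c_min -= 1
            grew = True
        if c_max < cols - 1 and is_free(r_min, r_max, c_max + 1, c_max + 1):
            c_max += 1
            grew = True
    return r_min, r_max, c_min, c_max


occupancy = [
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
box = grow_free_box(occupancy, (1, 1))
```

The planner then only has to keep the trajectory inside such boxes; the individual obstacle cells never enter the optimization.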


Dynamic perception: this depends more on advances in cameras, such as event cameras. However, building an autonomous navigation UAV around an event camera is difficult because of the sensor's inherent limitations, so Gao Fei’s team proposed enhancing object-imaging sensitivity and designed a camera accordingly.


Swarm coordination: without relying on external equipment or pre-programming, multiple UAVs that share a goal can perform distributed computation.

Technology Development Roadmap for Flying Robots

The first idea is to change how flying robots acquire intelligence. Rather than building a refined mathematical model for each specific task, we should design simulation environments, learning policies, pipelines for information collection, data cleaning, and data generation, and sim-to-real deployment methods, so that UAVs can learn skills by themselves.

The second idea is to give flying robots manipulation and interaction capabilities, enabling richer embodied interaction, manipulation, execution, and understanding (e.g., via vision-language-action (VLA) models, or even VTLA).


Some of Gao Fei’s Work Based on These Ideas

From the “flying eye” to the “flying hand”: during deformation, changes in the center of mass or moment of inertia make adaptive control difficult to achieve.


From “math/physics-driven” to “data-driven”: robots acquire skills through neural networks and learning, including reinforcement learning, so that they can navigate autonomously and complete tasks. One issue with traditional path planning is that solving for an entire trajectory is essentially either a discrete combinatorial optimization problem or a numerical optimization problem over a continuous domain.
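
The two formulations can be contrasted in a short sketch. Both functions are illustrative stand-ins chosen for brevity, not methods from the talk: breadth-first search on a grid for the discrete view, and in-place gradient steps on waypoint positions for the continuous view (obstacles are ignored in the second to keep it short).

```python
from collections import deque


def grid_shortest_path(grid, start, goal):
    """Discrete view: breadth-first search over free cells (4-connected)."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None  # goal unreachable


def smooth_waypoints(points, iterations=200, step=0.1):
    """Continuous view: minimize the sum of squared segment lengths.

    Interior waypoints are updated one at a time with a gradient step;
    the two endpoints stay fixed.
    """
    pts = [[float(x), float(y)] for x, y in points]
    for _ in range(iterations):
        for i in range(1, len(pts) - 1):
            for d in range(2):
                # d/dp_i of |p_i - p_{i-1}|^2 + |p_{i+1} - p_i|^2
                grad = 2.0 * (2.0 * pts[i][d] - pts[i - 1][d] - pts[i + 1][d])
                pts[i][d] -= step * grad
    return pts


grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = grid_shortest_path(grid, (0, 0), (2, 0))
smoothed = smooth_waypoints([(0, 0), (1, 2), (2, 0)])
```

The search enumerates cells and returns a sequence of grid moves, whereas the smoother treats waypoint coordinates as real-valued decision variables and pulls the middle point onto the straight line between the endpoints.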


Examples include RL-based autonomous navigation and obstacle avoidance, end-to-end RL with vision, end-to-end RL with LiDAR for obstacle avoidance, and RL for aerobatic flight. For tasks that do not require extremely precise control but demand better real-time performance, handing them to RL can be the better choice. For control, RL can be somewhat interpretable, much like offline sampling-based MPC, though it may fall short on rigorous mathematical completeness proofs.
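
For readers unfamiliar with the comparison, the sketch below shows only the basic sample-and-score loop behind sampling-based MPC, in its simplest random-shooting form. Everything in it is an assumption made for illustration: the 1D double-integrator model, the cost weights, the horizon, and the function names. How such a scheme would be run offline is not specified in the source.

```python
import random


def rollout_cost(state, controls, target, dt=0.1):
    """Simulate a 1D double integrator and accumulate a quadratic cost."""
    p, v = state
    cost = 0.0
    for a in controls:
        v += a * dt
        p += v * dt
        cost += (p - target) ** 2 + 0.1 * v ** 2 + 0.01 * a ** 2
    return cost


def sampling_mpc_step(state, target, rng, horizon=15, samples=200, a_max=2.0):
    """Sample control sequences, score each by rollout, return the best first action."""
    best_cost, best_action = float("inf"), 0.0
    for _ in range(samples):
        controls = [rng.uniform(-a_max, a_max) for _ in range(horizon)]
        cost = rollout_cost(state, controls, target)
        if cost < best_cost:
            best_cost, best_action = cost, controls[0]
    return best_action


def run_closed_loop(target=1.0, steps=60, dt=0.1, seed=0):
    """Apply the first action of the best sample at each step, then replan."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    p, v = 0.0, 0.0
    for _ in range(steps):
        a = sampling_mpc_step((p, v), target, rng)
        v += a * dt
        p += v * dt
    return p


final_position = run_closed_loop()
```

Each action can be traced back to an explicit cost and a set of simulated rollouts, which is the sense in which such controllers are easy to inspect. A learned RL policy replaces the per-step sampling with a single network evaluation, which is where the real-time advantage comes from.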
