

Before We Begin#
Since I’ve had some free time recently, I revisited the papers and technical notes I collected when I first started doing UAV research. I reorganized them systematically and also built my own mind map.
The notes you take while reading papers and running open-source code tend to be fragmented. This time, I tried to take a higher-level view and reconnect them around method paradigms, classic tasks, platform design, and the underlying tech stack, so they can serve as a quick-reference handbook for research and engineering.
Data-driven vs Model-based#
In the UAV field, data-driven methods and model-based/modular methods are still competing head-to-head, each with strengths in different tasks. They are not an either-or choice; rather, each fits a different level of task difficulty.
Why are traditional modular approaches still so strong? Mainly because UAV dynamics models (especially for quadrotors) are not only relatively simple in engineering practice, but also easy to calibrate in the real world. In addition, most UAV tasks are about “moving through the environment” rather than “creating strong physical interaction”. With mature state machines, trajectory planning, and low-level control optimization, you can already achieve excellent flight performance. Most commercial drones we see day to day are still built on this foundation.
Where does end-to-end learning win? Traditional pipelines rely on very accurate state estimation and perception modeling. But on small platforms, constrained by compute, payload, and sensor noise, the whole pipeline is often pushed to its limits. In such cases, learning-based methods (especially using reinforcement learning for perception-driven agile flight) can bypass cumbersome explicit mapping and state derivation, showing reaction speed and robustness beyond the traditional route.
Simulators for UAV RL#
For deep learning/reinforcement learning, a good simulator is essential. In the UAV domain, a few high-frequency “productivity tools” show up again and again:
- AirSim ↗: Based on Unreal Engine (UE4/UE5). Great visuals and very realistic dynamics. However, making low-level changes has a relatively high barrier, and the runtime frame rate is a bit low for large-scale RL training.
- Flightmare ↗: The main feature is speed, very suitable for RL tasks that require massive data sampling.
- AerialGym: Simulation environments purpose-built for reinforcement learning, with massively parallel sampling; especially popular in Sim2Real (simulation-to-reality transfer) research.
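To see why raw stepping throughput matters so much here, consider a toy vectorized environment loop (a schematic sketch of the pattern, not any of these simulators' actual APIs): the sampler advances N environments per tick, so wall-clock sample collection scales directly with how fast, and how parallel, the simulator is.

```python
# A toy vectorized environment (illustrative only) showing the pattern that
# RL-oriented simulators optimize for: stepping many environments per tick,
# since sample throughput dominates RL training time.

class ToyVecEnv:
    """N independent 1-D 'drones' driven by velocity commands."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.pos = [0.0] * num_envs

    def reset(self):
        self.pos = [0.0] * self.num_envs
        return list(self.pos)

    def step(self, actions):
        # One physics tick for all environments at once.
        for i, a in enumerate(actions):
            self.pos[i] += 0.1 * a
        rewards = [-abs(p - 1.0) for p in self.pos]   # goal: hover at x = 1.0
        return list(self.pos), rewards

env = ToyVecEnv(num_envs=4)
obs = env.reset()
for _ in range(10):
    actions = [1.0] * env.num_envs    # a trivial fixed policy
    obs, rewards = env.step(actions)  # 4 transitions collected per call

print([round(p, 3) for p in obs])  # [1.0, 1.0, 1.0, 1.0]
```

A GPU-parallel simulator applies the same idea with thousands of environments per step instead of four.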
Classic Skills and Representative Works#
This section mainly introduces data-driven methods for classic tasks. It is worth noting that some of the works below reduce or eliminate reliance on SLAM systems and odometry. Interestingly, the initial rise of UAV autonomy benefited greatly from the growing maturity of SLAM/odometry systems, so shedding that dependence could become an interesting direction for UAV skill learning.
Obstacle Avoidance in Unknown Environments and Agile Flight#
How can a UAV weave through forests full of unknown obstacles, rubble, or narrow corridors? This is a highly representative challenge. From early days to now, many clever approaches have been proposed.
Inspired by autonomous driving, CMU tried supervised learning as early as ICRA 2013 to map monocular images directly to discrete control commands. Later, UC Berkeley's CAD2RL trained entirely in simulation on monocular RGB images with domain randomization and successfully flew in a real corridor.
Then, work from the University of Zurich (UZH) pushed this direction to a new peak:
- DroNet source code ↗: Cleverly leveraged autonomous-driving datasets to teach UAVs to output velocity commands.
- Agile Autonomy project ↗: Published in Science Robotics. The core idea is to use the DAgger algorithm to distill expert data from a traditional trajectory planner, arguing that the extremely low latency of end-to-end networks can greatly raise the flight-speed limit in unknown environments.
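The DAgger idea can be sketched in a few lines (a toy 1-D illustration with a hypothetical expert_policy and a nearest-neighbor stand-in for the network; not the Agile Autonomy implementation). The key point is that the expert labels the states the *learner* actually visits, so the dataset covers the learner's own state distribution rather than only expert trajectories.

```python
# Schematic DAgger loop in a toy 1-D world (illustration only).

def expert_policy(state):
    # Stand-in for the privileged trajectory planner: steer toward 0.
    return -1.0 if state > 0 else 1.0

def fit(dataset):
    # Stand-in for supervised training: 1-nearest-neighbor lookup.
    def policy(state):
        s, a = min(dataset, key=lambda sa: abs(sa[0] - state))
        return a
    return policy

def rollout(policy, start=5.0, steps=20):
    state, visited = start, []
    for _ in range(steps):
        visited.append(state)
        state += policy(state)
    return visited

dataset = [(5.0, expert_policy(5.0)), (-5.0, expert_policy(-5.0))]  # seed demos
for _ in range(3):                             # DAgger iterations
    learner = fit(dataset)
    for s in rollout(learner):                 # run the *current learner*
        dataset.append((s, expert_policy(s)))  # expert relabels visited states

learner = fit(dataset)  # final policy now covers states the learner reaches
```

In the paper's setting the expert is a full trajectory planner with privileged state, and the learner is a network consuming onboard sensing; the aggregation loop is the same.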
Chinese universities have also produced very impressive work in this direction. For example, Shanghai Jiao Tong University's team (Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics) proposed using a differentiable physics model to provide first-order gradients for policy optimization, removing dependence on explicit position/velocity estimation. Using only low-resolution depth images, it trains more efficiently than RL for obstacle avoidance and achieves high-speed flight.
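The first-order-gradient idea can be illustrated with a toy point-mass model (my own sketch, not the paper's pipeline): because the rollout is differentiable, the position error back-propagates analytically into the control parameter instead of being estimated from sampled returns as in RL.

```python
# Toy differentiable-physics optimization (illustration only): roll out a
# point mass under a constant acceleration command and descend the analytic
# gradient of the final-position error -- no sampled policy gradient needed.

def rollout(accel, steps=10, dt=0.1):
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        vel += accel * dt
        pos += vel * dt
    return pos

def rollout_grad(steps=10, dt=0.1):
    # d(pos)/d(accel), derived by hand: pos = accel * dt^2 * steps*(steps+1)/2
    return dt * dt * steps * (steps + 1) / 2

target = 2.0                     # desired final position (meters)
accel, lr = 0.0, 0.5
for _ in range(100):
    err = rollout(accel) - target
    accel -= lr * err * rollout_grad()   # first-order gradient step

print(round(rollout(accel), 3))  # 2.0 -- converges to the target
```

In the paper the "rollout" is a full quadrotor model and the gradient flows through an autodiff framework rather than a hand-derived formula, but the contrast with zeroth-order RL gradients is the same.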
Similarly, Zhejiang University's FAST Lab combined reinforcement learning with onboard LiDAR to achieve extreme autonomous obstacle avoidance: their latest Flying on Point Clouds with Reinforcement Learning ↗ uses sim-to-real RL directly on LiDAR point clouds. Although learning-based methods are advancing rapidly, if you walk around real engineering projects you will find that traditional trajectory-planning approaches such as Ego-Planner ↗ are still the backbone. The reason is simple: they are reliable enough in most scenarios and make debugging straightforward, while the data closed loop and verification cost of end-to-end approaches remain a significant hurdle.
Representative Works for Other Classic Tasks#
- UAV target recognition and pursuit
- HOLA-Drone: Hypergraphic Open-ended Learning for Zero-Shot Multi-Drone Cooperative Pursuit ↗. arXiv 2024, University of Manchester.
- Multi-UAV Pursuit-Evasion with Online Planning in Unknown Environments by Deep Reinforcement Learning ↗. arXiv 2024, THU.
- Autonomous exploration without a prior map
- Deep Reinforcement Learning-based Large-scale Robot Exploration ↗. arXiv 2024, National University of Singapore (NUS). Uses attention mechanisms to learn dependencies across different spatial scales, implicitly predicts unknown regions, and optimizes exploration strategies in known space to improve exploration efficiency.
- ARiADNE: A Reinforcement learning approach using Attention-based Deep Networks for Exploration ↗. arXiv 2023, National University of Singapore (NUS). Learns interdependencies among known regions across multiple spatial scales and implicitly predicts the potential gains from exploring them, letting the agent balance the natural trade-off between exploiting/refining the map in known regions and exploring new ones.
- DARE: Diffusion Policy for Autonomous Robot Exploration ↗. arXiv 2024, National University of Singapore (NUS). Uses self-attention to learn spatial information on the map and generates trajectories into unknown regions via diffusion to improve exploration efficiency.
- UAV racing and high-maneuver / aerobatic flight
- Whole-Body Control Through Narrow Gaps From Pixels to Action ↗. ICRA 2025, ZJU. Uses reinforcement learning for end-to-end vision-based flight through narrow gaps, without explicit position or velocity estimation, surpassing traditional methods.
Novel UAV Configuration Design#
Beyond making algorithms smarter, many people are also trying to combine UAVs with manipulators or give them morphing capabilities. With these hardware innovations, the task boundaries of UAVs are significantly expanded.
Aerial Manipulator#
An aerial manipulator (also called an aerial manipulation UAV) combines a UAV's fast spatial mobility with a manipulator's precise manipulation ability, making it an ideal carrier for embodied intelligence: it can fly while also grasping and manipulating objects.
- Past, Present, and Future of Aerial Robotic Manipulators. ↗ TRO 2022. One of the most comprehensive survey papers in the aerial manipulator field; essential for getting started.
- Millimeter-Level Pick and Peg-in-Hole Task Achieved by Aerial Manipulator ↗ TRO 2023, BHU. Achieves millimeter-level peg-in-hole tasks using a quadrotor with a serial manipulator.
- NDOB-Based Control of a UAV with Delta-Arm Considering Manipulator Dynamics ↗ ICRA 2025, SYU. Achieves millimeter-level grasping using a quadrotor with a parallel manipulator.
- A Compact Aerial Manipulator: Design and Control for Dexterous Operations ↗ JIRS 2024, BHU. Uses aerial manipulators for interesting applications such as grasping eggs and opening doors.
Fully-Actuated UAV#
Common quadrotor UAVs are underactuated: position and attitude are coupled, so the vehicle must tilt to produce lateral motion. A fully-actuated UAV with decoupled position and attitude control is theoretically better suited as a flight platform for aerial manipulation.
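The coupling can be seen with a back-of-the-envelope thrust-summation sketch (hypothetical geometry and numbers): when all rotor axes are parallel to body z, the net body-frame force is always along z no matter how the four thrusts are mixed, whereas fixed-tilt rotors can command a lateral force directly.

```python
# Illustrative thrust summation for under- vs full actuation (toy numbers).
import math

def net_force(axes, thrusts):
    """Sum of thrust vectors: each rotor contributes thrust * unit axis."""
    return tuple(sum(t * a[i] for a, t in zip(axes, thrusts)) for i in range(3))

# Conventional quadrotor: four rotors, all pointing along body z.
parallel = [(0.0, 0.0, 1.0)] * 4
print(net_force(parallel, [1.0, 2.0, 1.5, 0.5]))   # (0.0, 0.0, 5.0): x, y always 0

# Fixed-tilt rotors (tilted by angle a about the x-axis, alternating signs):
a = math.radians(20)
tilted = [(0.0, math.sin(s * a), math.cos(s * a)) for s in (1, -1, 1, -1)]
# Unequal thrusts on the two tilt groups now produce a net lateral force:
print(net_force(tilted, [2.0, 1.0, 2.0, 1.0]))     # nonzero y component
```

Torque allocation works analogously; the point is only that parallel rotor axes leave the lateral force directions unreachable, which is exactly what the fixed-tilt and variable-tilt designs below address.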
- Fully Actuated Multirotor UAVs: A Literature Review ↗ RAM 2020. One of the most comprehensive surveys in the fully-actuated UAV field; essential for getting started.
- Design, modeling and control of an omni-directional aerial vehicle ↗ ICRA 2016, ETH. The first fixed-tilt fully-actuated UAV that enabled omnidirectional flight.
- The Voliro Omniorientational Hexacopter: An Agile and Maneuverable Tiltable-Rotor Aerial Vehicle ↗ RAM 2018, ETH. The first variable-tilt fully-actuated UAV that enabled omnidirectional flight.
- FLOAT Drone: A Fully-actuated Coaxial Aerial Robot for Close-Proximity Operations ↗ arXiv 2025, ZJU. A small fully-actuated UAV suitable for close-proximity operations.
Deformable UAV#
- Design, Modeling, and Control of an Aerial Robot DRAGON: A Dual-Rotor-Embedded Multilink Robot With the Ability of Multi-Degree-of-Freedom Aerial Transformation ↗. RAL 2018, University of Tokyo. Winner of the ICRA 2018 Best Paper Award on Unmanned Aerial Vehicles; a multi-joint deformable UAV.
- The Foldable Drone: A Morphing Quadrotor That Can Squeeze and Fly ↗. RAL 2019, UZH. Installs a servo on each quadrotor arm to enable morphing flight.
- Ring-Rotor: A Novel Retractable Ring-Shaped Quadrotor With Aerial Grasping and Transportation Capability ↗. RAL 2023, ZJU. A deformable ring-shaped quadrotor for grasping and transportation tasks.
- Design and Control of a Passively Morphing Quadcopter ↗. ICRA 2019, UCB. A passively morphing quadrotor UAV.
Multi-Modal UAV#
This area focuses on configuration design, motion control, and autonomous navigation for multi-modal UAVs, which can operate across multiple domains such as air, ground, and underwater. This not only mitigates endurance limitations but also expands their application potential.
- A bipedal walking robot that can fly, slackline, and skateboard ↗. SR 2021, Caltech. A multi-modal terrestrial-aerial legged robot.
- Multi-Modal Mobility Morphobot (M4) with appendage repurposing for locomotion plasticity enhancement ↗. NC 2023, Northeastern University. A multi-modal robot with many locomotion modes.
- Skater: A Novel Bi-Modal Bi-Copter Robot for Adaptive Locomotion in Air and Diverse Terrain ↗. RAL 2024, ZJU. A bi-modal terrestrial-aerial bicopter robot that adapts to diverse terrain.
- Autonomous and Adaptive Navigation for Terrestrial-Aerial Bimodal Vehicles ↗. RAL 2022, ZJU. Autonomous navigation for terrestrial-aerial bimodal vehicles.
Key Technical Solutions#
For any UAV, how fast and how stably it can fly is ultimately determined by its state estimation (localization) system. This brings us to the familiar territory of odometry and SLAM (Simultaneous Localization and Mapping).
Odometry provides real-time localization for robots. It is often implemented with an Extended Kalman Filter (EKF), fusing observations from IMUs, cameras, LiDAR, encoders, millimeter-wave radar, optical-flow sensors, and other sensors commonly used for robot pose perception, to estimate the robot's pose at high frequency.
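The filtering loop can be illustrated with a 1-D constant-velocity Kalman filter, the linear special case of the EKF (a heavily simplified sketch with illustrative numbers; real implementations propagate the full F P Fᵀ + Q covariance, which is reduced here to additive noise for brevity): high-rate prediction from an acceleration measurement, low-rate correction from a position fix.

```python
# Minimal 1-D Kalman filter sketch: state is (position x, velocity v).

def predict(x, v, P, accel, dt, q):
    # Propagate state with the measured acceleration; inflate uncertainty.
    x += v * dt + 0.5 * accel * dt * dt
    v += accel * dt
    P = [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1] + q]]
    return x, v, P

def update(x, v, P, z, r):
    # Fuse a position measurement z (variance r) with H = [1, 0].
    s = P[0][0] + r                      # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
    innov = z - x
    x, v = x + k0 * innov, v + k1 * innov
    P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
         [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return x, v, P

x, v, P = 0.0, 0.0, [[1.0, 0.0], [0.0, 1.0]]
for step in range(100):                  # 100 Hz IMU-style prediction
    x, v, P = predict(x, v, P, accel=0.0, dt=0.01, q=0.01)
    if step % 10 == 9:                   # 10 Hz position fixes at 1.0 m
        x, v, P = update(x, v, P, z=1.0, r=0.1)

print(round(x, 2))  # estimate converges toward the 1.0 m fixes
```

A real VIO/LIO EKF has the same predict/update skeleton, just with nonlinear models (hence the "extended") and much larger state vectors.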
In the Visual-Inertial Odometry (VIO) field, one of the most classic representatives is HKUST’s VINS-Mono / VINS-Fusion project ↗.
For LiDAR-Inertial Odometry (LIO), the lineage runs from CMU's classic LOAM ↗ to HKU MARS Lab's widely popular FAST-LIO ↗ and then FAST-LIVO2 ↗, each pushing real-time mapping and localization efficiency to new heights.
In addition, for long-term flight in large environments, a SLAM system with loop closure is also indispensable infrastructure.
SLAM (Simultaneous Localization and Mapping) builds a map while localizing, which makes loop-closure detection possible: when the robot revisits a location, it can correct part of the accumulated error and keep localization accurate over long-duration operation. SLAM comes in two main styles, filter-based and optimization-based. In practice a system is typically split into a front end and a back end, and SLAM built on different sensors has its own characteristics.
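What loop closure buys can be shown with a toy 1-D pose graph (my own illustrative numbers, not a real SLAM back end): odometry accumulates a biased estimate around a closed loop, and the loop-closure constraint lets a least-squares correction spread the residual over the odometry edges.

```python
# Toy 1-D pose-graph correction: each odometry hop is measured with a +0.1 m
# bias, and a loop closure says the robot ends where it started. With equal
# edge weights, the least-squares fix simply splits the residual evenly.

odom = [1.1, 1.1, -0.9, -0.9]      # measured hops; true motion is +1,+1,-1,-1
poses = [0.0]
for d in odom:
    poses.append(poses[-1] + d)     # dead reckoning drifts to ~0.4 m, not 0

residual = poses[-1] - 0.0          # loop-closure constraint: end == start
corrected = [d - residual / len(odom) for d in odom]

print([round(c, 6) for c in corrected])  # [1.0, 1.0, -1.0, -1.0]
```

Real back ends solve the same kind of least-squares problem over thousands of 6-DoF poses with per-edge covariances, but the effect is identical: the closure residual is distributed over the loop instead of piling up at the end.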
Beyond mapping algorithms, some general robot development tools are also must-have staples for UAV R&D:
- The classic ROS / ROS2 ecosystem, especially timestamp alignment across multiple sensors (e.g., the message_filters TimeSynchronizer).
- In high-dynamics scenarios such as aerial manipulation, solver libraries such as NVIDIA's cuRobo ↗ (CUDA-accelerated collision checking and planning), IKFast, or mplib from the ManiSkill ecosystem ↗.
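What approximate-time synchronization does can be sketched as a toy greedy pairing routine (an illustration of the idea, not the ROS message_filters API): pair messages from two sensor streams whose stamps are within a tolerance, so downstream fusion sees consistent snapshots.

```python
# Toy approximate-time message pairing (illustration only).

def approx_sync(stream_a, stream_b, slop=0.02):
    """Greedily pair (stamp, msg) tuples whose stamps differ by <= slop."""
    pairs, j = [], 0
    for ta, ma in stream_a:
        # Advance in stream_b to the not-yet-consumed message closest to ta.
        while (j + 1 < len(stream_b)
               and abs(stream_b[j + 1][0] - ta) < abs(stream_b[j][0] - ta)):
            j += 1
        tb, mb = stream_b[j]
        if abs(tb - ta) <= slop:
            pairs.append((ma, mb))
            j += 1                      # consume the matched message
            if j == len(stream_b):
                break
    return pairs

camera = [(0.00, "img0"), (0.10, "img1"), (0.20, "img2")]
imu    = [(0.005, "imu0"), (0.055, "imu1"), (0.101, "imu2"), (0.195, "imu3")]
print(approx_sync(camera, imu))
# [('img0', 'imu0'), ('img1', 'imu2'), ('img2', 'imu3')]
```

The ROS implementation handles arbitrary numbers of streams, queues, and callbacks, but the core job is this stamp-matching step; the slop tolerance plays the same role as its configurable time window.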
Outlook#
After organizing all of this, my core takeaway is that Sim2Real (simulation-to-reality transfer) may be one of the most accessible routes for individual developers or small teams to enter the field and obtain tangible results at this stage. In other words, UAVs too need to make the transition from physical intelligence to embodied intelligence. I will keep digging into this area and summarize it in a follow-up post.