- End-to-End RL Methods for UAVs
  An overview from a broader research perspective.
  5 min read
- UAV Review
  A quick-reference notebook for the UAV domain.
  10 min read
- RL Notes (12): DDPG
  6 min read
- RL Notes (11): Imitation Learning
  4 min read
- RL Notes (10): Sparse Rewards
  3 min read
- RL Notes (9): Actor-Critic
  3 min read
- RL Notes (8): Q Methods for Continuous Actions
  Why DQN struggles with continuous actions, common workarounds, and why many switch to Actor-Critic.
  3 min read
- RL Notes (7): Advanced DQN
  3 min read
- RL Notes (6): DQN
  3 min read
- RL Notes (5): PPO
  4 min read
- RL Notes (4): Policy Gradient
  4 min read
- RL Notes (3): From MC and TD(0) to Sarsa / Q-learning
  MC vs TD(0) basics, then Sarsa vs Q-learning; on-policy vs off-policy control intuition.
  4 min read
- RL Notes (2): MDP, MRP, and the Bellman Equation
  Markov property, MRP/MDP, Bellman equation, prediction vs control, and DP (policy/value iteration).
  5 min read
- RL Notes (1): What Does RL Learn?
  Intro to RL: what it learns, I/O, exploration vs exploitation, and state vs observation.
  7 min read
- RL Practice (1): Framework + Q-Learning & Sarsa
  A minimal, runnable practice of Q-Learning and Sarsa, epsilon-greedy, an episode-based training loop, and saving/loading.
  8 min read