- RL Practice (2): DQN and Improvements
  From Q-table to deep reinforcement learning.
  8 min read
- RL Practice (3): Policy Gradient + Actor-Critic
  Policy distribution (Softmax / Gaussian) design, return accumulation, and parallel sampling.
  8 min read
- RL Practice (4): Continuous Control (DDPG/TD3/SAC)
  Actor/Critic inputs and outputs, replay buffer, exploration noise, and the key differences in each update.
  7 min read