Hello, I am currently using deep reinforcement learning (DRL) to train Bolt to track a commanded velocity. The main idea follows ‘Controlling the Solo12 Quadruped Robot with Deep Reinforcement Learning’. The environment is in MuJoCo and the observation is similar to the one in the article; however, I use the velocity directly instead of using a network to compute it. The reward includes a tracking term for the x velocity (as for Solo12) and penalties on the y velocity, torque usage, and the contact forces of the two feet.

I train with PPO (50M total timesteps) and find that the agent often forgets: it learns to walk quickly but then forgets how to walk slowly, or it learns to walk slowly but then forgets how to walk quickly. The PPO rollout episode reward can also drop quickly, and it seems that the agent cannot balance low and high velocities. I would like to know whether PPO, or DRL in general, can help Bolt master this skill, and whether there are any tricks for training.
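
To make the reward description above concrete, here is a rough sketch of its shape. The weights, the kernel width, the exponential form of the tracking term, the squared penalties, and the function name `tracking_reward` are all placeholders chosen for illustration. They are not the values or the exact form used in my training, nor necessarily those of the Solo12 article.

```python
import math

# Hypothetical weights and kernel width: illustrative placeholders, not tuned values.
W_TRACK = 1.0      # weight of the x-velocity tracking term
W_LATERAL = 0.5    # weight of the y-velocity penalty
W_TORQUE = 1e-3    # weight of the torque-usage penalty
W_CONTACT = 1e-4   # weight of the foot contact-force penalty
TRACK_SIGMA = 0.25 # width of the tracking kernel (assumed)


def tracking_reward(cmd_vx, vx, vy, torques, foot_forces):
    """Return a scalar reward for one control step.

    cmd_vx      -- commanded forward velocity
    vx, vy      -- measured base velocity along x and y
    torques     -- iterable of joint torques
    foot_forces -- iterable of contact-force magnitudes, one per foot
    """
    # Tracking term: equals 1 when vx matches the command, decays with the error.
    track = math.exp(-((cmd_vx - vx) ** 2) / TRACK_SIGMA)
    # Penalties: squared lateral velocity, squared torques, squared contact forces.
    lateral = vy ** 2
    torque = sum(t * t for t in torques)
    contact = sum(f * f for f in foot_forces)
    return (W_TRACK * track
            - W_LATERAL * lateral
            - W_TORQUE * torque
            - W_CONTACT * contact)
```

With this shape, the tracking term is bounded in (0, 1] regardless of the commanded speed, while the torque and contact penalties tend to grow with speed, so the relative weights may affect how low and high commands are traded off.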