(1. College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; 2. National Engineering Research Center of Robot Visual Perception & Control Technology, Hunan University, Changsha 410082, China)
Traditional deep reinforcement learning suffers from low training efficiency and weak adaptability to changing environments when applied to autonomous obstacle avoidance and target tracking for unmanned aerial vehicles (UAVs). To overcome these problems, this paper incorporates Model-Agnostic Meta-Learning (MAML) into the Deep Deterministic Policy Gradient (DDPG) algorithm, designs an internal and external meta-parameter update rule, and proposes a Meta-Deep Deterministic Policy Gradient (Meta-DDPG) algorithm in order to improve the convergence speed and generalization ability of the model. Furthermore, basic meta-task sets are constructed in the model's pre-training stage to improve the efficiency of pre-training in practical engineering. Finally, the proposed algorithm is simulated and verified in various testing environments. The results show that introducing the basic meta-task sets makes the model's pre-training more efficient, and that the Meta-DDPG algorithm has better convergence characteristics and environmental adaptability than the DDPG algorithm. Furthermore, the meta-learning approach and the basic meta-task sets are generally applicable to deterministic policy reinforcement learning.
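As a rough illustration of the internal and external meta-parameter updates mentioned in the abstract, the following is a minimal first-order MAML-style sketch on a toy problem. It is not the paper's Meta-DDPG implementation: the quadratic per-task loss stands in for a DDPG actor or critic loss on one meta-task, and the names `grad`, `inner_update`, `meta_train`, `alpha`, and `beta` are hypothetical, as is the use of the first-order approximation.

```python
# Toy first-order MAML-style loop (assumed setup, not the paper's Meta-DDPG).
# Each "task" is a quadratic loss L(theta) = sum((theta - target)^2), used here
# as a stand-in for the loss of a DDPG network on one meta-task.

def grad(theta, target):
    """Gradient of sum((theta - target)^2) with respect to theta."""
    return [2.0 * (t - c) for t, c in zip(theta, target)]

def inner_update(theta, target, alpha):
    """Internal update: one task-specific gradient step from the shared parameters."""
    g = grad(theta, target)
    return [t - alpha * gi for t, gi in zip(theta, g)]

def meta_train(tasks, theta, alpha=0.1, beta=0.1, steps=200):
    """External update: move theta along the average post-adaptation gradient."""
    for _ in range(steps):
        meta_grad = [0.0] * len(theta)
        for target in tasks:
            adapted = inner_update(theta, target, alpha)
            # First-order approximation: ignore the dependence of `adapted` on theta.
            g = grad(adapted, target)
            meta_grad = [m + gi / len(tasks) for m, gi in zip(meta_grad, g)]
        theta = [t - beta * m for t, m in zip(theta, meta_grad)]
    return theta

# Three hypothetical meta-tasks, each defined only by its target vector.
tasks = [[-1.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
theta = meta_train(tasks, [0.0, 0.0])
```

In this toy setting the meta-parameters settle near the mean of the task targets, a starting point from which a single internal step moves toward any one task. In a DDPG setting the same two-level structure would presumably be applied to the actor and critic network weights, with gradients computed by an automatic differentiation library.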