Autonomous Obstacle Avoidance and Target Tracking of UAV Based on Meta-Reinforcement Learning
    Abstract:

    Traditional deep reinforcement learning suffers from low training efficiency and poor adaptability to changing environments when applied to autonomous obstacle avoidance and target tracking for unmanned aerial vehicles (UAVs). To address these problems, this paper incorporates Model-Agnostic Meta-Learning (MAML) into the Deep Deterministic Policy Gradient (DDPG) algorithm, designs an internal and external meta-parameter update rule, and proposes a Meta-Deep Deterministic Policy Gradient (Meta-DDPG) algorithm to improve the convergence speed and generalization ability of the model. Furthermore, basic meta-task sets are constructed in the pre-training stage to improve pre-training efficiency in practical engineering. Finally, the proposed algorithm is verified by simulation in a variety of test environments. The results show that introducing the basic meta-task sets improves the model's pre-training, and that the Meta-DDPG algorithm has better convergence characteristics and environmental adaptability than the DDPG algorithm. Moreover, the meta-learning method and the basic meta-task sets are generally applicable to deterministic policy reinforcement learning.
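
    For orientation, the generic MAML update that the abstract builds on can be written as an inner (per-task) step followed by an outer (meta) step. This is only a sketch of the standard formulation: the symbols \theta, \alpha, \beta, \mathcal{T}_i and \mathcal{L}_{\mathcal{T}_i} are assumed notation, and the abstract does not state how Meta-DDPG adapts these updates to the DDPG actor and critic networks.

    ```latex
    % Inner update: adapt the meta-parameters \theta to task \mathcal{T}_i
    % with step size \alpha (assumed notation, standard MAML).
    \theta_i' = \theta - \alpha \, \nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}(\theta)

    % Outer update: improve \theta using the post-adaptation losses,
    % summed over the sampled tasks, with step size \beta.
    \theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i} \mathcal{L}_{\mathcal{T}_i}(\theta_i')
    ```

    In this reading, the "internal" rule would correspond to the per-task adaptation and the "external" rule to the meta-parameter update across tasks, although the exact correspondence is an assumption here.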

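Cite this article

Jiang Weilai, Wu Jun, Wang Yaonan. Autonomous Obstacle Avoidance and Target Tracking of UAV Based on Meta-Reinforcement Learning[J]. Journal of Hunan University (Natural Sciences), 2022, 49(6): 101-109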

History
  • Published online: 2022-06-23