Earth Observation Satellite Scheduling Algorithm Based on DQN

Abstract: Satellite mission planning for land resource census is highly challenging: continuously adjustable satellite slewing angles and a very large number of time windows produce a high-dimensional nonlinear solution space, which is compounded by strongly coupled resource constraints. This paper constructs a discrete decision model based on "observation opportunities" that decouples the original problem into two subproblems: ordering the observations in time and selecting the best imaging strip. Existing algorithms handle this coupling of sequence optimization and parameter selection poorly, because their rules are myopic and their search is inefficient. To address these limitations, the paper proposes a variable neighborhood search (VNS) scheduling algorithm with embedded deep reinforcement learning. The method builds a hierarchical scheduling model. At the macro level, it uses the multiple neighborhood structures of VNS to optimize the observation sequence and escape local optima. At the micro level, it introduces a deep Q-network (DQN) over a multi-dimensional state feature space that adaptively evaluates the value of each strip selection, replacing manually designed rules. Simulation experiments show that the proposed method combines fast convergence with high solution quality: in 99% of the test samples, the gap between the schedule's score and the theoretical total score is less than 15%.
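The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: the toy `TARGETS` data, the fixed `TRANSITION` time, the two neighborhoods (`swap`, `reinsert`), and above all `q_value`, a hand-written scoring function standing in for the trained DQN. The search loop is a reduced VNS (random neighborhood moves without a separate local-search phase).

```python
import random

# Hypothetical toy data: each target has candidate strips (start, duration, score).
TARGETS = {
    "A": [(0, 3, 5.0), (4, 3, 4.0)],
    "B": [(2, 2, 3.0), (6, 2, 3.5)],
    "C": [(1, 4, 6.0), (7, 3, 5.5)],
    "D": [(5, 2, 2.0), (9, 2, 2.5)],
}
TRANSITION = 1  # assumed fixed transition time between consecutive strips


def q_value(current_time, strip):
    """Stand-in for a trained DQN: value of a strip given a one-feature state."""
    start, _duration, score = strip
    idle = max(0, start - current_time)
    return score - 0.5 * idle  # hand-written placeholder, not a learned value


def evaluate(sequence):
    """Micro level: walk the sequence, picking the feasible strip with the highest Q."""
    t, total = 0, 0.0
    for name in sequence:
        feasible = [s for s in TARGETS[name] if s[0] >= t]
        if not feasible:
            continue  # this target cannot be observed in this ordering
        best = max(feasible, key=lambda s: q_value(t, s))
        total += best[2]
        t = best[0] + best[1] + TRANSITION
    return total


def swap(seq, rng):
    """Neighborhood 1: exchange two positions in the sequence."""
    i, j = rng.sample(range(len(seq)), 2)
    new = list(seq)
    new[i], new[j] = new[j], new[i]
    return new


def reinsert(seq, rng):
    """Neighborhood 2: remove one target and insert it at a random position."""
    new = list(seq)
    item = new.pop(rng.randrange(len(new)))
    new.insert(rng.randrange(len(new) + 1), item)
    return new


def vns(initial, iterations=200, seed=0):
    """Macro level: reduced VNS over the observation sequence."""
    rng = random.Random(seed)
    neighborhoods = [swap, reinsert]
    best, best_score = list(initial), evaluate(initial)
    for _ in range(iterations):
        k = 0
        while k < len(neighborhoods):
            candidate = neighborhoods[k](best, rng)
            score = evaluate(candidate)
            if score > best_score:
                best, best_score = candidate, score
                k = 0  # improvement: restart from the first neighborhood
            else:
                k += 1  # no improvement: move to the next neighborhood
    return best, best_score


if __name__ == "__main__":
    order, score = vns(["D", "C", "B", "A"])
    print(order, score)
```

In a fuller version, `q_value` would be replaced by a forward pass through a network trained on richer state features, while the sequence-level search could stay unchanged.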

     
