TANG Chang-cheng, YE Zuo-chang. Parametric Circuit Optimization with Reinforcement Learning[J]. Microelectronics & Computer, 2019, 36(1): 46-50.

Parametric Circuit Optimization with Reinforcement Learning

  • Abstract: In this paper we focus on solving parametric optimization problems, i.e. min_θ f(θ, w), where θ is the variable to be optimized and w is a vector that parameterizes the optimization problem; in practice one often needs to solve a whole family of such problems under different parameters. For a given problem structure, we propose an efficient method to train a single model that maps the parameters to the solution, thereby solving all problems with the same structure but different parameters at once. During training, instead of independently solving a series of optimization problems with randomly sampled w, we adopt reinforcement learning to accelerate the process. Two networks are trained alternately: a value network, trained to fit the target loss function, and a policy network, whose output is fed into the θ input of the value network and which is trained to minimize the value network's output. Experiments on mathematical test functions and circuit optimization examples demonstrate the effectiveness of the proposed method.
