Peer-Reviewed

Timing and Parameter Optimization for One-time Motion Problem Based on Reinforcement Learning

Received: 17 February 2020     Accepted: 3 March 2020     Published: 24 March 2020
Abstract

Many tasks, such as hitting a baseball, swinging a swatter, or catching a football, can be viewed as one-time actions whose goal is to control the timing and the parameters of the action so as to achieve the best result. For many one-time motion problems it is difficult to obtain the optimal policy by solving a model, and model-free reinforcement learning has advantages for such problems. However, although reinforcement learning has developed rapidly, there is currently no universal algorithm architecture for the one-time motion problem. We decompose the one-time motion problem into an action timing problem and an action parameter problem, and construct a suitable reinforcement learning method for each of them. We design a combination mechanism that allows the two modules to learn simultaneously by passing estimated values between them while interacting with the environment. We use REINFORCE + DPG for continuous action parameter spaces and REINFORCE + Q-learning for discrete action parameter spaces. To test the algorithm, we designed and implemented an aircraft bombing simulation environment. The test results show that the algorithm converges quickly and stably, and is robust to different time steps and observation errors.
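The abstract only names the building blocks (REINFORCE for the timing decision, Q-learning or DPG for the action parameter, and a value estimate passed between the two modules), so the following is a minimal sketch of the discrete-parameter variant rather than the authors' implementation. The toy 1-D interception task, the feature vector, the parameter set PARAMS, and all step sizes are assumptions made purely for illustration; the paper instead evaluates the combination on an aircraft bombing simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STEPS = 20                         # maximum episode length
N_BINS = 16                          # coarse discretisation of the target position
PARAMS = np.linspace(-1.0, 1.0, 5)   # hypothetical discrete set of action parameters

def features(t, x):
    """State features for the timing policy: bias, normalised time, target position."""
    return np.array([1.0, t / N_STEPS, x])

def bin_of(x):
    """Map a target position in roughly [-1, 2] to a row of the Q-table."""
    return int(np.clip((x + 1.0) / 3.0 * N_BINS, 0, N_BINS - 1))

def reward(x, p):
    """Hypothetical one-shot reward: best when the parameter p cancels the position x."""
    return 1.0 - min(1.0, abs(x + p))

theta = np.zeros(3)                   # timing policy weights (REINFORCE)
Q = np.zeros((N_BINS, len(PARAMS)))   # parameter values (tabular Q-learning)
alpha_pi, alpha_q, eps = 0.05, 0.2, 0.1

for episode in range(5000):
    x = rng.uniform(1.0, 2.0)         # initial target position
    grads, ret = [], 0.0
    for t in range(N_STEPS):
        phi = features(t, x)
        p_act = 1.0 / (1.0 + np.exp(-theta @ phi))   # P(act now | state)
        act = rng.random() < p_act
        # score-function gradient of the Bernoulli timing decision
        grads.append(((1.0 if act else 0.0) - p_act) * phi)
        if act:
            s = bin_of(x)
            # parameter module: epsilon-greedy choice over the discrete parameter set
            k = rng.integers(len(PARAMS)) if rng.random() < eps else int(np.argmax(Q[s]))
            r = reward(x, PARAMS[k])
            # terminal Q-learning update (a one-time action has no successor state)
            Q[s, k] += alpha_q * (r - Q[s, k])
            # pass the parameter module's value estimate back to the timing module
            ret = Q[s].max()
            break
        x -= 0.1                      # the target keeps drifting
    # REINFORCE update of every timing decision made this episode
    for g in grads:
        theta += alpha_pi * ret * g
```

In this sketch the timing module is updated with the parameter module's estimate max_a Q(s, a) rather than the raw reward, which mirrors the value-passing mechanism described in the abstract; a REINFORCE baseline and the continuous-parameter (DPG) variant are omitted for brevity.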

Published in Machine Learning Research (Volume 5, Issue 1)
DOI 10.11648/j.mlr.20200501.12
Page(s) 10-17
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2020. Published by Science Publishing Group

Keywords

One-time Motion, Reinforcement Learning, Motion Control

Cite This Article
  • APA Style

    Boxuan Fan, Guiming Chen, Hongtao Lin. (2020). Timing and Parameter Optimization for One-time Motion Problem Based on Reinforcement Learning. Machine Learning Research, 5(1), 10-17. https://doi.org/10.11648/j.mlr.20200501.12


  • ACS Style

    Boxuan Fan; Guiming Chen; Hongtao Lin. Timing and Parameter Optimization for One-time Motion Problem Based on Reinforcement Learning. Mach. Learn. Res. 2020, 5(1), 10-17. doi: 10.11648/j.mlr.20200501.12


  • AMA Style

    Boxuan Fan, Guiming Chen, Hongtao Lin. Timing and Parameter Optimization for One-time Motion Problem Based on Reinforcement Learning. Mach Learn Res. 2020;5(1):10-17. doi: 10.11648/j.mlr.20200501.12

Author Information
  • Boxuan Fan, Xi’an Research Inst. of High-tech, Xi’an, China

  • Guiming Chen, Xi’an Research Inst. of High-tech, Xi’an, China

  • Hongtao Lin, Xi’an Research Inst. of High-tech, Xi’an, China
