Many tasks, such as hitting a baseball, swinging a fly swatter, or catching a football, can be viewed as one-time motions: the goal is to control the timing and the parameters of a single action so as to achieve an optimal result. For many one-time motion problems the optimal policy is difficult to obtain by solving an explicit model, so model-free reinforcement learning is well suited to them. However, although reinforcement learning has developed rapidly, there is currently no general algorithm architecture for one-time motion problems. We decompose the one-time motion problem into an action-timing subproblem and an action-parameter subproblem and construct a suitable reinforcement learning method for each. We design a combination mechanism that lets the two modules learn simultaneously by passing estimated values between them while interacting with the environment: REINFORCE combined with DPG handles continuous action-parameter spaces, and REINFORCE combined with Q-learning handles discrete action-parameter spaces. To test the algorithm, we designed and implemented an aircraft bombing simulation environment. The results show that the algorithm converges quickly and stably and is robust to different time steps and observation errors.
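To make the two-module architecture concrete, the following is a minimal sketch of one possible REINFORCE + DPG combination for a continuous action-parameter space. It is not the authors' implementation: the abstract does not specify the coupling details, so the linear function approximators, the logistic "fire now?" timing policy, and the use of the parameter module's Q-estimate as the return signal for the timing update are all illustrative assumptions, and the class name OneTimeMotionAgent is hypothetical.

```python
# Minimal sketch (not the authors' code) of a REINFORCE timing module coupled
# with a DPG parameter module, as described in the abstract.
# Assumptions: linear approximators, Bernoulli "fire/wait" timing policy,
# and the critic's value estimate passed to the timing module as its return.
import numpy as np

class OneTimeMotionAgent:
    def __init__(self, state_dim, param_dim, lr=1e-2):
        self.theta = np.zeros(state_dim)            # timing policy weights (REINFORCE)
        self.W = np.zeros((param_dim, state_dim))   # deterministic parameter policy mu(s) = W s (DPG)
        self.w_s = np.zeros(state_dim)              # critic weights for the state
        self.w_u = np.zeros(param_dim)              # critic weights for the action parameters
        self.lr = lr

    def fire_prob(self, s):
        # Probability of triggering the one-time action in state s.
        return 1.0 / (1.0 + np.exp(-self.theta @ s))

    def params(self, s):
        # Continuous action parameters chosen at the firing moment.
        return self.W @ s

    def q_value(self, s, u):
        # Linear critic Q(s, u); this estimate is what gets passed to the timing module.
        return self.w_s @ s + self.w_u @ u

    def update(self, trajectory, s_fire, u, reward):
        # Critic: regress Q(s_fire, u) toward the terminal reward of the one-time action.
        td = reward - self.q_value(s_fire, u)
        self.w_s += self.lr * td * s_fire
        self.w_u += self.lr * td * u

        # DPG: ascend grad_W mu(s) * grad_u Q(s, u); with this critic grad_u Q = w_u.
        self.W += self.lr * np.outer(self.w_u, s_fire)

        # REINFORCE on every fire/wait decision, using the critic's estimate as the return.
        G = self.q_value(s_fire, self.params(s_fire))
        for s, fired in trajectory:                 # trajectory: list of (state, fired?) pairs
            p = self.fire_prob(s)
            grad_logp = (1.0 - p) * s if fired else -p * s   # grad of log Bernoulli likelihood
            self.theta += self.lr * grad_logp * G
```

In an episode under these assumptions, the agent samples the fire/wait decision from fire_prob at every step, calls params once at the firing moment, receives a single terminal reward, and then calls update with the visited states; the discrete-parameter variant would replace the DPG module with a Q-learning table or network over candidate parameter values.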
Published in: Machine Learning Research (Volume 5, Issue 1)
DOI: 10.11648/j.mlr.20200501.12
Page(s): 10-17
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2020. Published by Science Publishing Group
Keywords: One-time Motion, Reinforcement Learning, Motion Control
APA Style
Boxuan Fan, Guiming Chen, Hongtao Lin. (2020). Timing and Parameter Optimization for One-time Motion Problem Based on Reinforcement Learning. Machine Learning Research, 5(1), 10-17. https://doi.org/10.11648/j.mlr.20200501.12
ACS Style
Boxuan Fan; Guiming Chen; Hongtao Lin. Timing and Parameter Optimization for One-time Motion Problem Based on Reinforcement Learning. Mach. Learn. Res. 2020, 5(1), 10-17. doi: 10.11648/j.mlr.20200501.12
AMA Style
Boxuan Fan, Guiming Chen, Hongtao Lin. Timing and Parameter Optimization for One-time Motion Problem Based on Reinforcement Learning. Mach Learn Res. 2020;5(1):10-17. doi: 10.11648/j.mlr.20200501.12
BibTeX
@article{10.11648/j.mlr.20200501.12,
  author  = {Boxuan Fan and Guiming Chen and Hongtao Lin},
  title   = {Timing and Parameter Optimization for One-time Motion Problem Based on Reinforcement Learning},
  journal = {Machine Learning Research},
  volume  = {5},
  number  = {1},
  pages   = {10-17},
  year    = {2020},
  doi     = {10.11648/j.mlr.20200501.12},
  url     = {https://doi.org/10.11648/j.mlr.20200501.12},
  eprint  = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20200501.12}
}
RIS
TY  - JOUR
T1  - Timing and Parameter Optimization for One-time Motion Problem Based on Reinforcement Learning
AU  - Boxuan Fan
AU  - Guiming Chen
AU  - Hongtao Lin
Y1  - 2020/03/24
PY  - 2020
N1  - https://doi.org/10.11648/j.mlr.20200501.12
DO  - 10.11648/j.mlr.20200501.12
T2  - Machine Learning Research
JF  - Machine Learning Research
JO  - Machine Learning Research
SP  - 10
EP  - 17
VL  - 5
IS  - 1
PB  - Science Publishing Group
SN  - 2637-5680
UR  - https://doi.org/10.11648/j.mlr.20200501.12
ER  -