Abstract: This paper proposes a Q-learning algorithm for solving the infinite-horizon stochastic linear quadratic optimal tracking control problem for stochastic linear discrete-time systems. First, the reference signal to be tracked is assumed to be generated by a command generator, and an augmented system consisting of the original stochastic system and the reference trajectory system is constructed, which transforms the optimal tracking problem into an optimal regulation problem. Second, to solve the optimal tracking problem online, the stochastic system is transformed into a deterministic one, the Q-function of the stochastic linear quadratic optimal tracking control problem is defined in terms of the augmented system, and the augmented stochastic algebraic equation is solved online without knowledge of the system model parameters. Third, the equivalence between the Q-learning algorithm and the augmented stochastic algebraic equation is proved, and the implementation steps of the Q-learning algorithm are given. Finally, a simulation example illustrates the effectiveness of the proposed Q-learning algorithm.
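
As a rough illustration of the augmentation step summarized in the abstract, the block below sketches one common way such a construction is written. Every symbol in it is an assumption made for illustration and is not taken from the paper: the matrices A, B, C, D, F, the scalar noise w_k, the weights Q and R, and the discount factor γ (often introduced when the command generator is not asymptotically stable) are hypothetical names.

```latex
% Assumed plant with multiplicative noise w_k, and assumed command generator
x_{k+1} = A x_k + B u_k + (C x_k + D u_k)\, w_k, \qquad r_{k+1} = F r_k

% Augmented state and augmented dynamics
X_k = \begin{bmatrix} x_k \\ r_k \end{bmatrix}, \qquad
X_{k+1} = \bar{A} X_k + \bar{B} u_k + (\bar{C} X_k + \bar{D} u_k)\, w_k

\bar{A} = \begin{bmatrix} A & 0 \\ 0 & F \end{bmatrix}, \quad
\bar{B} = \begin{bmatrix} B \\ 0 \end{bmatrix}, \quad
\bar{C} = \begin{bmatrix} C & 0 \\ 0 & 0 \end{bmatrix}, \quad
\bar{D} = \begin{bmatrix} D \\ 0 \end{bmatrix}

% Tracking cost rewritten as a regulation cost in X_k,
% using x_k - r_k = \begin{bmatrix} I & -I \end{bmatrix} X_k
J = \mathbb{E} \sum_{k=0}^{\infty} \gamma^{k}
    \left( X_k^{\top} \bar{Q} X_k + u_k^{\top} R\, u_k \right), \qquad
\bar{Q} = \begin{bmatrix} Q & -Q \\ -Q & Q \end{bmatrix}
```

Under these assumptions the tracking error no longer appears explicitly: the cost is quadratic in the augmented state X_k alone, which is what allows the tracking problem to be treated as a regulation problem.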