Abstract: A mathematical model is formulated for the multi-objective software project scheduling problem, with the aim of minimizing project duration and maximizing employee satisfaction. The model accounts for practical factors such as the classification of employees by skill level and the importance of tasks, and it assigns important tasks to highly skilled employees. A hyper-heuristic algorithm based on Q-learning is proposed to solve the model. A global search of the task-employee matrix is performed using a matrix crossover operator and a Jaya operator with random jitter. Local search strategies that exploit problem information are designed to shorten the project duration and increase employee satisfaction. The global search operators, the neighborhood parameter values, and the local search strategies are combined to form eight low-level heuristics. A high-level strategy based on Q-learning adaptively selects an appropriate low-level heuristic for each evolutionary state of the population, according to the historical performance of the low-level heuristics. Experimental results show that the proposed algorithm outperforms representative algorithms in terms of HVR and IGD in most cases.
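The high-level selection mechanism described above can be illustrated with a minimal sketch of tabular Q-learning choosing among eight low-level heuristics. It is not the paper's implementation: the class name `QLearningSelector`, the number of evolutionary states (three), the epsilon-greedy policy, the reward value, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration.

```python
import random


class QLearningSelector:
    """Hypothetical tabular Q-learning selector over low-level heuristics."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=None):
        # One Q-value per (evolutionary state, low-level heuristic) pair.
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)

    def select(self, state):
        """Pick a heuristic index with an epsilon-greedy policy."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        row = self.q[state]
        best = max(row)
        # Break ties between equally good heuristics at random.
        return self.rng.choice([a for a, v in enumerate(row) if v == best])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update from the observed reward."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Usage: 3 assumed evolutionary states, 8 low-level heuristics.
selector = QLearningSelector(n_states=3, n_actions=8, seed=42)
state = 0
heuristic = selector.select(state)
# The reward would come from the heuristic's measured improvement of the
# population; here a placeholder value stands in for that measurement.
selector.update(state, heuristic, reward=1.0, next_state=1)
```

In this sketch the reward is a placeholder. In a scheduler of the kind described, it would be derived from how much the chosen heuristic improved the population, which is what lets the table reflect each heuristic's historical performance.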