Abstract: Sequential recommendation aims to model the dynamics of user-item interactions and to predict how user interests change, thereby improving recommendation accuracy and user experience. However, user-item interaction sequences are not always strictly ordered; their order is often flexible, and they may contain noise. To address this problem, this paper stores the user's historical interactions in a memory network and uses a strategy network to classify the user's current behavior pattern as short-term preference, long-term preference, or global preference. An attention mechanism then generates the corresponding user-memory vector, and a deep reinforcement learning algorithm identifies items with greater future benefit. As users interact with items, the strategy of the reinforcement learning network is continually updated to improve recommendation accuracy. Experiments on two commonly used data sets show that the proposed model outperforms advanced baselines.
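As an illustration of the attention step summarized above, the following minimal sketch shows one way a user-memory vector could be formed from stored item embeddings. It is not the paper's implementation: the function names (`select_memory`, `attend`), the dot-product scoring, and the way the three behavior patterns map to memory slots (most recent `short_window` items, earlier items, all items) are assumptions made only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_memory(memory, pattern, short_window=2):
    """Pick the memory slots for a behavior pattern.

    Assumption for illustration only: 'short' uses the most recent
    items, 'long' uses the earlier items, 'global' uses everything.
    """
    if pattern == "short":
        return memory[-short_window:]
    if pattern == "long":
        # Fall back to the full memory if nothing precedes the window.
        return memory[:-short_window] or memory
    if pattern == "global":
        return memory
    raise ValueError(f"unknown pattern: {pattern}")


def attend(memory, query):
    """Dot-product attention (an assumed scoring function).

    Returns the weighted sum of memory slots and the attention weights.
    """
    scores = [sum(q * x for q, x in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(query)
    vector = [
        sum(w * slot[d] for w, slot in zip(weights, memory))
        for d in range(dim)
    ]
    return vector, weights


# Hypothetical 2-dimensional item embeddings, oldest first.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slots = select_memory(history, "short", short_window=2)
user_memory_vector, attention = attend(slots, query=[1.0, 0.0])
```

In the approach the abstract describes, the pattern would be chosen by the strategy network and the resulting vector passed to the reinforcement learning component; here both are replaced by fixed inputs to keep the sketch self-contained.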