Crowd flow prediction is of great importance in a wide range of applications from urban planning, traffic control to public safety. It aims to predict the inflow (the traffic of crowds entering a region in a given time interval) and outflow (the traffic of crowds leaving a region for other places) of each region in the city with knowing the historical flow data. In this paper, we propose DeepSTN+, a deep learning-based convolutional model, to predict crowd flows in the metropolis. First, DeepSTN+ employs the ConvPlus structure to model the long-range spatial dependence among crowd flows in different regions. Further, PoI distributions and time factor are combined to express the effect of location attributes to introduce prior knowledge of the crowd movements. Finally, we propose an effective fusion mechanism to stabilize the training process, which further improves the performance. Extensive experimental results based on two real-life datasets demonstrate the superiority of our model, i.e., DeepSTN+ reduces the error of the crowd flow prediction by approximately 8%∼13% compared with the state-of-the-art baselines.