The task of travel time estimation (TTE), which estimates the travel time for a given route and departure time, plays an important role in intelligent transportation systems such as navigation, route planning, and ride-hailing services. This task is challenging because of many essential aspects, such as traffic prediction and contextual information. First, the accuracy of traffic prediction is strongly correlated with the traffic speed of the road segments in a route. Existing work mainly adopts spatial-temporal graph neural networks to improve the accuracy of traffic prediction, where spatial and temporal information is used separately. However, one drawback is that the spatial and temporal correlations are not fully exploited to obtain better accuracy. Second, contextual information of a route, i.e., the connections of adjacent road segments in the route, is an essential factor that impacts the driving speed. Previous work mainly uses sequential encoding models to address this issue. However, it is difficult to scale up sequential models to large-scale real-world services. In this paper, we propose an end-to-end neural framework named ConSTGAT, which integrates traffic prediction and contextual information to address these two problems. Specifically, we first propose a spatial-temporal graph neural network that adopts a novel graph attention mechanism, which is designed to fully exploit the joint relations of spatial and temporal information. Then, in order to efficiently take advantage of the contextual information, we design a computationally efficient model that applies convolutions over local windows to capture a route’s contextual information and further employs multi-task learning to improve the performance. In this way, the travel time of each road segment can be computed in parallel and in advance. Extensive experiments conducted on large-scale real-world datasets demonstrate the superiority of ConSTGAT. In addition, ConSTGAT has already been deployed in production at Baidu Maps, and it successfully keeps serving tens of billions of requests every day. This confirms that ConSTGAT is a practical and robust solution for large-scale real-world TTE services