The disturbance storm time (Dst) index is an important and useful measurement in space weather research. It has been used to characterize the size and intensity of a geomagnetic storm. A negative Dst value means that the Earth's magnetic field is weakened, which happens during storms. In this paper, we present a novel deep learning method, called the Dst Transformer, to perform short-term, 1-6 hour ahead, forecasting of the Dst index based on the solar wind parameters provided by the NASA Space Science Data Coordinated Archive. The Dst Transformer combines a multi-head attention layer with Bayesian inference, allowing it to quantify both aleatoric and epistemic uncertainty when making Dst predictions. Experimental results show that the proposed Dst Transformer outperforms related machine learning methods in terms of the root mean square error and R-squared.

In this study, we present a method that combines a Long Short-Term Memory (LSTM) recurrent neural network with a Gaussian Process (GP) model to provide up to 6-hour ahead probabilistic forecasts of the Dst geomagnetic index. The proposed approach brings together the sequence modelling capabilities of a recurrent neural network with the error bars and confidence bounds provided by a Gaussian process. Our model is trained using the hourly OMNI and GPS databases, both of which are publicly available. We first develop an LSTM network to obtain a single-point prediction of Dst. This model yields great accuracy in forecasting the Dst index from 1 h to 6 h ahead, with a correlation coefficient always higher than 0.873 and a root mean square error lower than 9.86. However, even though global metrics show excellent performance, the model remains poor at predicting intense storms (Dst < -250 nT) 6 hours in advance. To improve it and to obtain probabilistic forecasts, we combine the trained LSTM model with a Gaussian process and evaluate the hybrid predictor using the Receiver Operating Characteristic curve and the reliability diagram. We conclude that this hybrid methodology improves the forecasting of geomagnetic storms from 1 hour to 6 hours ahead.
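
The point-forecast scores referred to in these abstracts (root mean square error, correlation coefficient, and R-squared) can be illustrated with a minimal sketch. The function names and the sample Dst values below are hypothetical and chosen only for illustration; they are not taken from either study, and the definitions used are the standard textbook ones rather than a confirmed description of the authors' evaluation code.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(observed, predicted):
    """Pearson linear correlation coefficient."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Hypothetical hourly Dst values in nT (made up for this example).
    dst_observed = [-10.0, -35.0, -80.0, -120.0, -60.0]
    dst_predicted = [-12.0, -30.0, -85.0, -110.0, -65.0]
    print("RMSE:", rmse(dst_observed, dst_predicted))
    print("R-squared:", r_squared(dst_observed, dst_predicted))
    print("Correlation:", pearson_r(dst_observed, dst_predicted))
```

Because all three scores average over the whole test period, they can look strong even when rare, intense storms are forecast poorly, which is consistent with the limitation noted for events with Dst < -250 nT.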