We propose a novel supervised dimension reduction method, called supervised t-distributed stochastic neighbor embedding (St-SNE), which achieves dimension reduction by preserving the similarities of data points in both the feature and outcome spaces. The proposed method can be used for both prediction and visualization tasks and can handle high-dimensional data. Across a variety of datasets, we show that, compared with a comprehensive list of existing methods, St-SNE achieves superior prediction performance in the ultra-high-dimensional setting, where the number of features p exceeds the sample size n, and competitive performance in the p ≤ n setting. We also show that St-SNE is a competitive visualization tool capable of capturing within-cluster variation. In addition, we propose a penalized Kullback-Leibler divergence criterion to automatically select the reduced dimension k for St-SNE.