This paper studies the compression of partial differential operators using neural networks. We consider a family of operators, parameterized by a potentially high-dimensional space of coefficients that may vary on a large range of scales. Based on the existing methods that compress such a multiscale operator to a finite-dimensional sparse surrogate model on a given target scale, we propose to directly approximate the coefficient-to-surrogate map with a neural network. We emulate local assembly structures of the surrogates and thus only require a moderately sized network that can be trained efficiently in an offline phase. This enables large compression ratios and the online computation of a surrogate based on simple forward passes through the network is substantially accelerated compared to classical numerical upscaling approaches. We apply the abstract framework to a family of prototypical second-order elliptic heterogeneous diffusion operators as a demonstrating example.