Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates


In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia et al. (2018) and recently extended by Bu et al. (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics (SGLD), obtained via data-dependent estimates. Our approach builds on the variational characterization of mutual information and uses data-dependent priors that estimate the mini-batch gradient from a subset of the training samples. It is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. Empirical investigations show that the terms in our bounds are orders of magnitude smaller than those in other bounds that depend on the squared norms of gradients.
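
For orientation, the two standard facts the abstract leans on can be written out as follows. This is a sketch in assumed notation, not the paper's own statement: S is the training sample of size n, W the learned weights, the loss is taken to be σ-subgaussian, and Q is any prior chosen without reference to S. The block assumes the amsmath and amssymb packages.

```latex
% Assumed notation (not taken from the abstract):
%   S = (Z_1, ..., Z_n)  training sample,  W  learned weights,
%   gen(W, S)            population risk minus empirical risk,
%   sigma                subgaussian parameter of the loss.
\begin{align*}
  % Mutual information bound on expected generalization error
  \bigl| \mathbb{E}[\mathrm{gen}(W, S)] \bigr|
    &\le \sqrt{\frac{2\sigma^2}{n}\, I(W; S)}, \\
  % Variational characterization: any prior Q yields an upper bound
  I(W; S)
    &= \min_{Q'} \mathbb{E}_{S}\bigl[ \mathrm{KL}\bigl( P_{W \mid S} \,\|\, Q' \bigr) \bigr]
    \le \mathbb{E}_{S}\bigl[ \mathrm{KL}\bigl( P_{W \mid S} \,\|\, Q \bigr) \bigr].
\end{align*}
```

The minimum is attained at the marginal Q' = P_W, so any other prior gives a valid but looser bound, and a prior that tracks the posterior closely keeps the KL term small. The data-dependent priors described above let the prior depend on a subset of the training samples; the conditional form of the bound needed for that case is not reproduced here.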

Advances in Neural Information Processing Systems (NeurIPS)