Testing Generalization Bounds: How Good Is Theory in Practice?


Project Overview

Researchers have long sought provable guarantees on how well a model will generalize, that is, how closely its test accuracy will track its training accuracy. Several principled tools have been developed to bound this gap, such as the Vapnik-Chervonenkis (VC) dimension and the Rademacher complexity. The math behind these bounds is quite involved. Interestingly, we can derive a bound of similar quality with a much simpler approach using Hoeffding's inequality.

Going over the details quickly: Hoeffding's inequality says that for a single hypothesis \( h \), \[ P\big(|\epsilon_{\text{gen}}[h]| \geq \epsilon\big) \leq 2\cdot e^{-2n\epsilon^2}. \] Applying the union bound over a finite hypothesis class \( \mathcal{H} \), the probability that any member of the class has a gap of at least \( \epsilon \) is at most \( |\mathcal{H}| \cdot 2\cdot e^{-2n\epsilon^2} \). Setting this failure probability to a constant and solving for \( \epsilon \) gives \( \epsilon = O\!\left(\sqrt{\frac{\ln|\mathcal{H}|}{n}}\right) \); since a model with \( d \) parameters stored at fixed precision satisfies \( \ln|\mathcal{H}| = O(d) \), we can show that for any element of our hypothesis class \( \epsilon_{\text{gen}} = O\!\left(\sqrt{\frac{d}{n}}\right) \), where \( d \) is the parameter count and \( n \) is the number of training examples. Given this bound, it is only natural to ask how well it holds in practice. This project seeks to answer that question.
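To make the bound concrete, here is a minimal numeric sketch that inverts the union-bounded inequality for \( \epsilon \). The function name, the choice of \( \delta \), and the assumption that \( \ln|\mathcal{H}| \approx d \cdot \text{bits} \cdot \ln 2 \) (parameters stored at fixed precision) are illustrative, not part of the original derivation.

```python
import numpy as np

def hoeffding_gap_bound(n, d, delta=0.05, bits_per_param=32):
    """Gap eps such that P(some h in H has |gap| >= eps) <= delta.

    Inverts  delta = 2 * |H| * exp(-2 * n * eps^2),
    taking ln|H| ~= d * bits_per_param * ln(2)  (finite-precision parameters).
    """
    ln_H = d * bits_per_param * np.log(2.0)
    return np.sqrt((ln_H + np.log(2.0 / delta)) / (2.0 * n))

# Example: a 10-class linear classifier on 784 inputs (d = 7850) with n = 10,000.
# The resulting bound is well above 1, i.e. vacuous at this scale.
print(hoeffding_gap_bound(n=10_000, d=7_850))
```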

Empirical Setup

To explore how well this generalization bound holds in practice, I trained a linear binary classifier, a linear multi-class classifier, and a neural network with a single hidden layer on the MNIST dataset (the binary classifier was trained only on the 0s and 1s of MNIST). First, I separated the dataset into train and test data. Then, for each classifier, I trained on subsets of the training data containing \(n\) data points, tested on the fixed test data, and recorded the training and test loss as well as the generalization gap (train accuracy - test accuracy). To ensure that the results were not due to the specific train-test split, I repeated the experiment 5 times with the data randomly shuffled each time. A sketch of this protocol is shown below, and the results for each model follow.
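For reference, here is a minimal sketch of the protocol for the multi-class linear case using scikit-learn's LogisticRegression; the subset sizes, solver settings, and the 10,000-example test split are illustrative assumptions rather than the exact configuration used in the experiments.

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load MNIST (70,000 28x28 images flattened to 784 features).
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0

subset_sizes = [100, 500, 1000, 5000, 10000]   # illustrative values of n
n_trials = 5
gaps = {n: [] for n in subset_sizes}

for trial in range(n_trials):
    # Fresh random shuffle/split each trial so results are not split-specific.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=10_000, random_state=trial, shuffle=True
    )
    for n in subset_sizes:
        clf = LogisticRegression(max_iter=200)   # multi-class linear classifier
        clf.fit(X_tr[:n], y_tr[:n])
        train_acc = clf.score(X_tr[:n], y_tr[:n])
        test_acc = clf.score(X_te, y_te)
        gaps[n].append(train_acc - test_acc)     # generalization gap

for n in subset_sizes:
    print(f"n={n:>6}: mean gap = {np.mean(gaps[n]):.3f} +/- {np.std(gaps[n]):.3f}")
```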

Single Layer NN

Single Layer NN Generalization Gap
Single Layer NN Train-Test Loss

Multi-class Linear Classifier

Multi-class Linear Classifier Generalization Gap
Multi-class Linear Classifier Train-Test Loss

Binary Class Linear Classifier

Binary Class Linear Classifier Generalization Gap
Binary Class Linear Classifier Train-Test Loss