arXiv:2605.05209
Representation Demystifies Flat Minima
The same function can appear sharp or flat.
Core claim
Flat minima are not necessarily an intrinsic property of a learned computation. Reparameterizations can dramatically change measured sharpness while preserving the function itself. This suggests that representation shapes the measured geometry, rather than the geometry explaining generalization.
\( H = \nabla^2 L(\theta) \)
Representation
The same neural network function can be expressed through multiple parameterizations. Predictions remain unchanged even when the geometry of parameter space changes substantially.
Observation
Across many datasets and architectures, flatter minima often correlate with stronger generalization. This empirical correlation underlies one of the most widely cited explanations for why neural networks generalize.
Realization
If sharpness can be altered without changing the underlying function, then flatness may be a property of representation rather than a causal explanation for why a model generalizes.
Repository extension
The companion repo extends the report into a sequence of small computational notebooks. The notebooks start with the classical flat-minima hypothesis, introduce Hessian geometry, demonstrate reparameterization effects, and then separate correlation from causal explanation.
The central result is deliberately simple: equivalent computations can preserve the same function, predictions, and test behavior while changing measured Hessian sharpness.
Notebook roadmap
| Notebook | Question | Links |
|---|---|---|
| 00 | What is the paper about? | 📓 |
| 07 | What is a flat minimum? | 📓 |
| 13 | How is flatness measured? | 📓 |
| 17 | Does sharpness survive reparameterization? | 📓 |
| 23 | Can the same function have different sharpness? | 📓 |
| 29 | Does correlation imply causation? | 📓 |
| 37 | What survives representation changes? | 📓 |
Hessian geometry
The notebooks show how flatness is usually measured through local curvature, especially Hessian eigenvalues such as \( \lambda_{\max}(H) \). This makes sharpness a geometric property of parameter space.
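As a rough illustration of how \( \lambda_{\max}(H) \) is obtained from \( H = \nabla^2 L(\theta) \), here is a minimal NumPy sketch. It assumes a toy linear model with mean-squared-error loss and estimates the Hessian by central finite differences; the function names `loss`, `hessian_fd`, and `sharpness` are hypothetical and are not taken from the companion notebooks.

```python
import numpy as np

def loss(theta, X, y):
    # Mean squared error of a linear model (toy assumption): L(theta) = mean((X theta - y)^2)
    r = X @ theta - y
    return np.mean(r ** 2)

def hessian_fd(f, theta, eps=1e-4):
    """Central finite-difference estimate of the Hessian of scalar f at theta."""
    d = theta.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i = np.zeros(d); e_i[i] = eps
            e_j = np.zeros(d); e_j[j] = eps
            H[i, j] = (f(theta + e_i + e_j) - f(theta + e_i - e_j)
                       - f(theta - e_i + e_j) + f(theta - e_i - e_j)) / (4 * eps ** 2)
    return 0.5 * (H + H.T)  # symmetrize to remove round-off asymmetry

def sharpness(H):
    """lambda_max(H): eigvalsh returns eigenvalues in ascending order."""
    return np.linalg.eigvalsh(H)[-1]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star  # noiseless targets, so theta_star is an exact minimizer

H = hessian_fd(lambda t: loss(t, X, y), theta_star)
# For MSE on a linear model the exact Hessian is (2/n) X^T X: a sanity check.
H_exact = 2.0 / len(X) * X.T @ X
print("lambda_max (finite diff):", sharpness(H))
print("lambda_max (exact):      ", sharpness(H_exact))
```

Finite differences keep the sketch dependency-free; for real networks one would normally use automatic differentiation and Hessian-vector products with power iteration instead of materializing \( H \).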
Same function
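A simple reparameterization \( w = su \) preserves the effective function: if the original minimizer is \( w^* = 1 \), the reparameterized minimizer \( u^* = 1/s \) computes exactly the same function, while by the chain rule the measured curvature becomes \( H_u = s^2 H_w \). The computation is fixed, but the geometry moves.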
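The scaling argument can be checked numerically. The sketch below assumes a hypothetical one-parameter model \( \hat{y} = w x \) fit to data generated with slope 1, so that \( w^* = 1 \); it then evaluates the loss in the \( u \) coordinate for a few values of \( s \) and estimates the second derivative at \( u^* = 1/s \) by finite differences. The names and data are illustrative, not from the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 * x  # hypothetical data chosen so the minimizer in w is w* = 1

def loss_w(w):
    """Squared-error loss in the original coordinate w (model: y_hat = w * x)."""
    return np.mean((w * x - y) ** 2)

def loss_u(u, s):
    """The same loss expressed in the reparameterized coordinate, w = s * u."""
    return loss_w(s * u)

def second_derivative(f, t, eps=1e-4):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + eps) - 2.0 * f(t) + f(t - eps)) / eps ** 2

H_w = second_derivative(loss_w, 1.0)  # curvature at w* = 1

results = []
for s in [0.1, 1.0, 10.0]:
    u_star = 1.0 / s
    # Effective weight s * u* equals w* = 1, so predictions match on every input.
    same_fn = np.allclose((s * u_star) * x, y)
    H_u = second_derivative(lambda u: loss_u(u, s), u_star)
    results.append((s, same_fn, H_u))
    print(f"s={s:5.1f}  u*={u_star:6.2f}  same_fn={same_fn}  "
          f"H_u/H_w={H_u / H_w:8.3f}  s^2={s ** 2:8.3f}")
```

The printed ratio \( H_u / H_w \) should track \( s^2 \) up to finite-difference error, while the predictions are identical for every \( s \).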
Surviving properties
Function values, predictions, and test error survive equivalent representations. Parameter coordinates, parameter norms, and Hessian sharpness may not. That distinction is the central engineering lesson.
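The same split between surviving and non-surviving quantities can be sketched for a small two-layer ReLU network. This example swaps in the standard positive-homogeneity rescaling \( W \to \alpha W,\ v \to v/\alpha \) (valid because \( \mathrm{relu}(\alpha z) = \alpha\,\mathrm{relu}(z) \) for \( \alpha > 0 \)); it is an illustrative assumption, not necessarily the construction used in the notebooks. Targets are generated by the network itself, so the parameters sit at an exact zero-loss minimum, where the MSE Hessian reduces to the Gauss-Newton matrix \( \tfrac{2}{n} J^\top J \). All function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def predict(W, v, X):
    """Two-layer ReLU network f(x) = v . relu(W x), applied to each row of X."""
    return relu(X @ W.T) @ v

def hessian_at_zero_loss(W, v, X):
    """Hessian of mean((f(x_i) - y_i)^2) at a point where all residuals are 0.

    With zero residuals the second-derivative term of the MSE vanishes, leaving
    (2/n) J^T J, where J is the Jacobian of predictions w.r.t. [v, vec(W)].
    """
    pre = X @ W.T                           # (n, hidden) pre-activations
    dv = relu(pre)                          # df/dv_k = relu(w_k . x)
    gate = (pre > 0) * v                    # v_k * relu'(w_k . x)
    dW = gate[:, :, None] * X[:, None, :]   # df/dW_kj = v_k relu'(w_k . x) x_j
    J = np.concatenate([dv, dW.reshape(len(X), -1)], axis=1)
    return 2.0 / len(X) * J.T @ J

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
v = rng.normal(size=4)
X_train = rng.normal(size=(32, 3))
X_test = rng.normal(size=(16, 3))
# Hypothetical teacher-student setup: (W, v) is an exact zero-loss minimum.
y_train = predict(W, v, X_train)
y_test = predict(W, v, X_test)

report = {}
for alpha in [1.0, 10.0]:
    # Function-preserving rescaling of the two layers.
    W_a, v_a = alpha * W, v / alpha
    preds = predict(W_a, v_a, X_test)
    test_mse = np.mean((preds - y_test) ** 2)
    param_norm = np.sqrt(np.sum(W_a ** 2) + np.sum(v_a ** 2))
    lam_max = np.linalg.eigvalsh(hessian_at_zero_loss(W_a, v_a, X_train))[-1]
    report[alpha] = (preds, test_mse, param_norm, lam_max)
    print(f"alpha={alpha:5.1f}  test_mse={test_mse:.2e}  "
          f"||theta||={param_norm:7.3f}  lambda_max={lam_max:9.3f}")
```

Predictions and test error agree across both rows (up to floating-point round-off), while the parameter norm and \( \lambda_{\max}(H) \) differ, matching the distinction drawn above.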
Engineering statement
Scientific explanations should remain meaningful under equivalent representations. If a proposed explanatory quantity changes while the underlying computation remains the same, the explanation requires additional justification. Curvature measures geometry. Geometry depends on representation. Computation survives representation. Scientific explanations should identify what survives.