arXiv:2605.05209

Representation Demystifies Flat Minima

The same function can appear sharp or flat.

Figure: infographic titled "Representation Demystifies Flat Minima", showing that reparameterization can change measured sharpness and Hessian geometry while preserving the underlying function, predictions, and observed generalization.
Flatness correlates with generalization across many settings, but equivalent representations can exhibit different sharpness while preserving the same computation.

Core claim

Flat minima are not necessarily an intrinsic property of a learned computation. Reparameterizations can dramatically change measured sharpness while preserving the function itself, suggesting that representation influences geometry more than geometry explains generalization.

\( H = \nabla^2 L(\theta) \)

Representation

The same neural network function can be expressed through multiple parameterizations. Predictions remain unchanged even when the geometry of parameter space changes substantially.
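
A familiar instance of this idea, offered as an illustrative sketch and not as an example taken from the report, is the positive rescaling symmetry of ReLU networks. For a hypothetical one-hidden-unit model \( f(x) = a \cdot \mathrm{relu}(bx) \), replacing \( (a, b) \) with \( (a/\alpha, \alpha b) \) for any \( \alpha > 0 \) leaves every prediction unchanged. The names `predict` and `rescale` and all parameter values below are assumptions chosen for illustration.

```python
# Illustrative toy model (an assumption, not the report's code):
# f(x) = a * relu(b * x) with scalar parameters a and b.

def relu(z):
    return z if z > 0.0 else 0.0

def predict(a, b, x):
    return a * relu(b * x)

def rescale(a, b, alpha):
    """Return an equivalent parameterization (a / alpha, alpha * b), alpha > 0."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive for the ReLU symmetry to hold")
    return a / alpha, alpha * b

inputs = [-2.0, -0.5, 0.0, 0.75, 3.0]
a, b = 1.5, -0.8
a2, b2 = rescale(a, b, alpha=10.0)  # very different coordinates

original = [predict(a, b, x) for x in inputs]
rescaled = [predict(a2, b2, x) for x in inputs]

# Largest disagreement between the two parameterizations over the inputs.
max_gap = max(abs(p - q) for p, q in zip(original, rescaled))
print("parameters:", (a, b), "vs", (a2, b2))
print("max prediction gap:", max_gap)
```

The two parameter vectors sit far apart in parameter space, yet the predictions agree up to floating-point rounding.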

Observation

Across many datasets and architectures, flatter minima often correlate with stronger generalization. This empirical correlation underpins one of the most widely cited explanations for neural network performance.

Realization

If sharpness can be altered without changing the underlying function, then flatness may be a property of representation rather than a causal explanation for why a model generalizes.

Repository extension

The companion repo extends the report into a sequence of small computational notebooks. The notebooks start with the classical flat-minima hypothesis, introduce Hessian geometry, demonstrate reparameterization effects, and then separate correlation from causal explanation.

The central result is deliberately simple: equivalent computations can preserve the same function, predictions, and test behavior while changing measured Hessian sharpness.

Notebook roadmap

Notebook  Question                                         Links
00        What is the paper about?                         📓
07        What is a flat minimum?                          📓
13        How is flatness measured?                        📓
17        Does sharpness survive reparameterization?       📓
23        Can the same function have different sharpness?  📓
29        Does correlation imply causation?                📓
37        What survives representation changes?            📓

Hessian geometry

The notebooks show how flatness is usually measured through local curvature, especially Hessian eigenvalues such as \( \lambda_{\max}(H) \). This makes sharpness a geometric property of parameter space.
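
As a rough sketch of what such a measurement can look like, the snippet below estimates the Hessian of a toy loss by central finite differences and then approximates \( \lambda_{\max}(H) \) with power iteration. The loss, the step size `h`, and the helper names `hessian` and `lambda_max` are assumptions made for illustration. The notebooks may well use automatic differentiation instead.

```python
import math

def loss(theta):
    # Hypothetical quadratic loss with its minimum at (0, 0).
    # Its exact Hessian is [[4, 1], [1, 1]].
    x, y = theta
    return 2.0 * x * x + 0.5 * y * y + x * y

def hessian(f, theta, h=1e-3):
    """Central finite-difference estimate of the Hessian of f at theta."""
    n = len(theta)

    def shifted(i, si, j, sj):
        p = list(theta)
        p[i] += si * h
        p[j] += sj * h
        return f(p)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4.0 * h * h)
             for j in range(n)] for i in range(n)]

def matvec(H, v):
    return [sum(H[i][j] * v[j] for j in range(len(v))) for i in range(len(H))]

def lambda_max(H, iters=200):
    """Power iteration. It returns the largest-magnitude eigenvalue, which
    equals lambda_max when H is positive semidefinite (as at a minimum)."""
    v = [1.0] * len(H)
    for _ in range(iters):
        w = matvec(H, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Hv = matvec(H, v)
    return sum(a * b for a, b in zip(v, Hv))  # Rayleigh quotient

H = hessian(loss, [0.0, 0.0])
sharpness = lambda_max(H)
print("Hessian estimate:", H)
print("lambda_max estimate:", sharpness)
```

For this toy loss the exact value is \( (5 + \sqrt{13})/2 \), approximately 4.30, so the estimate can be checked by hand.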

Same function

Under a simple reparameterization \( w = su \) with \( s \neq 0 \), a minimum at \( w^* = 1 \) corresponds to \( u^* = 1/s \), and the effective function there is unchanged. The measured Hessian, however, scales as \( H_u \propto s^2 \). The computation is fixed, but the geometry moves.
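
The scaling follows from the chain rule. As a short worked derivation, write the loss in the new coordinates as \( \tilde{L}(u) = L(su) \), where the symbols \( \tilde{L} \) and \( H_w \) are notation introduced here for illustration, and differentiate twice:

\[
\nabla_u \tilde{L}(u) = s \, \nabla_w L(su), \qquad
H_u = \nabla_u^2 \tilde{L}(u) = s^2 \, \nabla_w^2 L(su) = s^2 H_w .
\]

Every Hessian eigenvalue is multiplied by \( s^2 \), so \( \lambda_{\max}(H_u) = s^2 \lambda_{\max}(H_w) \), while \( \tilde{L}(u^*) = L(w^*) \) is untouched.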

Surviving properties

Function values, predictions, and test error survive equivalent representations. Parameter coordinates, parameter norms, and Hessian sharpness may not. That distinction is the central engineering lesson.
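
The distinction can be checked numerically on a one-dimensional toy problem. The sketch below assumes a hypothetical loss \( L(w) = (w - 1)^2 \), which is not taken from the report, reparameterizes it as \( w = su \), and compares what stays fixed with what changes. The helper names `second_derivative` and `compare` are illustrative.

```python
def loss_w(w):
    # Hypothetical 1-D loss with its minimum at w* = 1 and curvature 2.
    return (w - 1.0) ** 2

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def compare(s):
    """Evaluate the same minimum in u-coordinates, where w = s * u."""
    def loss_u(u):
        return loss_w(s * u)

    u_star = 1.0 / s
    return {
        "u_star": u_star,                        # coordinate: changes with s
        "w_at_minimum": s * u_star,              # effective parameter: fixed
        "loss_at_minimum": loss_u(u_star),       # function value: fixed
        "sharpness_w": second_derivative(loss_w, 1.0),
        "sharpness_u": second_derivative(loss_u, u_star),  # scales with s**2
    }

results = {s: compare(s) for s in (0.1, 1.0, 10.0)}
for s, r in results.items():
    print(f"s={s}: u*={r['u_star']}, loss={r['loss_at_minimum']}, "
          f"sharpness_u={r['sharpness_u']}")
```

Across the three values of `s`, the effective parameter and the loss at the minimum agree, while the measured curvature in \( u \) spans four orders of magnitude.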

Engineering statement

Scientific explanations should remain meaningful under equivalent representations. If a proposed explanatory quantity changes while the underlying computation remains the same, the explanation requires additional justification. Curvature measures geometry. Geometry depends on representation. Computation survives representation. Scientific explanations should identify what survives.