A survey of open technical problems and partial solutions across theories of data, learning, and learned representations.
A mature basic science should enable several ambitious mechanistic interpretability (AMI) objectives, principally: constructing unsupervised tools for reverse engineering present-day neural networks, designing interpretable-by-design architectures, and steering the learning process toward safer representational regimes.
The goal is to develop theoretical frameworks that make increasingly accurate predictions about models trained on natural data, including their internal representations and inference algorithms. This requires building up a basic science of how data structure, learning algorithms, and learned representations interact.
A recurring tension throughout the literature is between deterministic and stochastic models of data and learned representations. A central question is how to tease apart epistemic uncertainty (reducible error introduced by theoretical approximations) from aleatoric uncertainty (irreducible structured noise). Critically, we suspect the line between them is not fixed: structured noise at one scale may become reducible nuisance at a coarser scale.
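One standard formalization of this split, stated here as an assumed baseline rather than a claim made above, is the law of total variance applied to a posterior over models $\theta$:

\[
\underbrace{\operatorname{Var}[y \mid x]}_{\text{total}}
\;=\;
\underbrace{\mathbb{E}_{\theta}\big[\operatorname{Var}[y \mid x, \theta]\big]}_{\text{aleatoric}}
\;+\;
\underbrace{\operatorname{Var}_{\theta}\big[\mathbb{E}[y \mid x, \theta]\big]}_{\text{epistemic}},
\]

where the epistemic term shrinks as the posterior over $\theta$ concentrates while the aleatoric term does not. The scale-dependence suspected above would show up as this split shifting when the inputs $x$ are coarse-grained.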
3.1 Structure of Data
Can we construct idealized models rich enough to capture intrinsic properties of data structure (hierarchical, compositional, sparse, sequential) while remaining tractable enough to make quantitative predictions? What observables — governed by scaling laws — do these models share with natural data, and how do they constrain which representations are learned?
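As a purely illustrative sketch of such an idealized model (the dictionary construction, the Zipf exponent, and every parameter below are hypothetical choices, not drawn from the text), consider a sparse, compositional generator whose latent feature frequencies follow a power law; observables such as the empirical covariance spectrum of its samples give one concrete quantity to compare against natural data.

```python
import numpy as np

# Hypothetical idealized data model: sparse, compositional samples built from a
# latent feature dictionary whose usage frequencies follow a Zipfian power law.
rng = np.random.default_rng(0)

n_features = 1000   # size of the latent feature dictionary
n_active = 5        # sparsity: features active per sample
dim = 128           # ambient dimension of each data point
alpha = 1.1         # Zipf exponent (assumed, not from the text)

# Power-law feature frequencies: p(k) proportional to 1 / k^alpha.
freqs = 1.0 / np.arange(1, n_features + 1) ** alpha
freqs /= freqs.sum()

# Random dictionary mapping latent features to ambient directions.
dictionary = rng.normal(size=(n_features, dim)) / np.sqrt(dim)

def sample(n_samples):
    """Draw sparse compositional data: each point is a sum of a few dictionary atoms."""
    data = np.zeros((n_samples, dim))
    for i in range(n_samples):
        active = rng.choice(n_features, size=n_active, replace=False, p=freqs)
        coeffs = rng.normal(size=n_active)
        data[i] = coeffs @ dictionary[active]
    return data

X = sample(4096)
# One observable to compare against natural data: the eigenvalue spectrum of the
# empirical covariance, whose shape reflects the latent feature-frequency distribution.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
print(eigvals[:10])
```

The point of such a toy is not realism but tractability: every structural assumption (sparsity level, frequency exponent, dictionary geometry) is explicit and can be varied to test which observables, and which learned representations, it controls.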