Generative modelling of physical fields
Generative modelling broadly refers to the generation of synthetic observables that approximate authentic observables. A diverse range of generative models exists, with varying motivations, although many are motivated by the manifold hypothesis (Bengio et al. 2013).
Simulation
Classically, physicists have attempted to encode the dynamics of a physical system, evolving some initial conditions over time to late universe observables. Provided the physics is faithfully codified, the resulting observables will faithfully represent authentic observables. The primary drawback of simulation is computational cost: as we become interested in increasingly fine-scale effects, the computational requirements of such simulations become infeasible.
Latent emulation
Recently, researchers have instead attempted to map directly from cosmological parameters to late universe observables, circumventing the need for large-scale simulations. Such complex mappings would normally be modelled with machine learning techniques; however, in many cases the data needed to train such models do not exist, which makes conventional machine learning methods impractical.
Instead, we adopt bespoke rather than learned statistical representations, from which field-level realisations may readily be synthesised (Price et al. 2023). One such class of representations is the wavelet scattering transform (Mallat et al. 2012, Allys et al. 2019, Mousset et al. 2023), which captures significant complex non-Gaussian structural information of signals.
Latent emulation is a multistep process:
- Compute the scattering representation of a true (or faithfully simulated) field. These statistics will be our target.
- Generate a random noise field to ensure we begin in the maximum entropy state.
- Leverage automatic differentiation to iteratively update this field until its statistics match our desired target.
In this way, one may rapidly generate many different realisations of physical fields from a single input, or perhaps a limited ensemble of inputs. In essence, this could be considered extreme data augmentation.
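The three steps above can be sketched in a few lines of JAX. This is a minimal illustration under loudly stated assumptions, not the implementation used by any particular package: `toy_statistics` is a hypothetical stand-in (means of absolute increments at dyadic lags) and is not a wavelet scattering transform, the "simulated" field is just smoothed noise, and the optimiser is plain gradient descent with a crude step-halving safeguard. The names `toy_statistics`, `loss_fn` and `synthesise` are invented for this example.

```python
import jax
import jax.numpy as jnp


def toy_statistics(field):
    """Hypothetical stand-in for a scattering representation (NOT a real one)."""
    stats = [jnp.mean(field), jnp.mean(field**2)]
    for axis in (0, 1):
        for lag in (1, 2, 4):
            # Mean absolute increment at a dyadic lag: a crude multiscale summary.
            increment = field - jnp.roll(field, lag, axis=axis)
            stats.append(jnp.mean(jnp.abs(increment)))
    return jnp.stack(stats)


def loss_fn(field, target):
    """Squared distance between the field's statistics and the target statistics."""
    return jnp.sum((toy_statistics(field) - target) ** 2)


def synthesise(target, initial_field, steps=300, step_size=1.0):
    """Gradient descent on the field itself, with gradients from autodiff."""
    value_and_grad = jax.jit(jax.value_and_grad(loss_fn))
    loss_jit = jax.jit(loss_fn)
    field = initial_field
    for _ in range(steps):
        loss, grad = value_and_grad(field, target)
        # Per-pixel gradients scale as 1/N, so the step is rescaled by the pixel count.
        trial = field - step_size * field.size * grad
        if loss_jit(trial, target) < loss:
            field = trial  # accept only steps that reduce the loss
        else:
            step_size *= 0.5  # otherwise shrink the step and try again
    return field


key_sim, key_noise = jax.random.split(jax.random.PRNGKey(0))

# Step 1: compute target statistics from a stand-in "simulated" field.
noise = jax.random.normal(key_sim, (32, 32))
simulated = sum(jnp.roll(noise, shift, axis=0) for shift in range(4)) / 2.0
target = toy_statistics(simulated)

# Step 2: start from a random Gaussian noise field.
initial = jax.random.normal(key_noise, simulated.shape)

# Step 3: update the field until its statistics approach the target.
emulated = synthesise(target, initial)
print(float(loss_fn(initial, target)), float(loss_fn(emulated, target)))
```

Drawing a different `key_noise` gives a different starting field and hence a different emulated realisation for the same target. In practice one would presumably swap `toy_statistics` for an actual scattering transform and use a more capable optimiser.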
An overview of the process by which a small ensemble of simulated observables can be greatly augmented with emulated observables. In this case we consider cosmic string induced CMB anisotropies. In step 1 (compression) we draw a field uniformly from our simulated ensemble and calculate from it a target statistical representation (latent vector) z. In step 2 (synthesis) we exploit automatic differentiation to iteratively recover emulated realisations.