Improving Hamiltonian Monte Carlo samplers with more Hamiltonian mechanics techniques

David Hendriks ¹ , Payel Das ¹ , Yunpeng Li ¹ , Simon Hadfield ¹

Abstract

Hamiltonian Monte Carlo (HMC) samplers are gradient-based sampling methods that have favourable scaling properties and can efficiently sample a wide range of probability distributions, and they are the preferred method for problems with high dimensionality and strong correlation. HMC samplers are different from Markov Chain samplers (MCMC) in that the proposed next state is determined by the integration of a trajectory using Hamiltonian Mechanics, where the negative log of the posterior density acts as the potential for the Hamiltonian. The `vanilla` HMC requires manual input of the step size and integration length tuning parameters, which can be time-consuming to configure correctly. While there are solutions to this, we want to explore a new avenue, based on recent developments in the machine learning field and some well established Hamiltonian mechanics.

We use Hamiltonian mechanics to analytically sample from a normal distribution, calculate optimal stopping locations for the trajectories, and use those to improve our sampling efficiency. These samples are then transformed via transformation maps to the real posterior to do the actual sampling.

In this talk I want to compare our new method to several other known and tested MC samplers and talk about the possibility of using this technique to improve HMC. Moreover, I will give some examples of use cases of this technique in Astrophysics.

Introduction

Hamiltonian Monte-Carlo

Hamiltonian Monte-Carlo (HMC) (1) is a gradient-based Monte-Carlo technique that uses Hamiltonian mechanics to inform the Monte-Carlo sampler about the posterior distribution by integrating the orbit of the walkers as if they were particles in a potential. HMC calculates the acceptance-rejection probability based on the difference in energy.

HMC has favorable dimensional scaling over MCMC, and is widely used in machine learning inference problems.

Sampling through transformations

Recent studies (2) have shown that transforming a complicated posterior distribution to a simpler, more gaussian-like base distribution can benefit the performance of MCMC samplers when the shape of the target distribution is complex. In this approach the samples are generated on the base distribution and transformed to effectively sample the target distribution.

This was later looked at by (3) in the case of the HMC sampler, where they provided a method based on normalising flows to generate the transformation. These normalising flows themselves are symplectic transformations since they are volume preserving, and they are commonly used in data science techniques. Normalising flows transform a given distribution to a standard normal distribution, which means now we can rely on the base distribution being the same in any situation.

Action angle coordinates

Since we know the exact shape of the base distribution, we can make use of a technique in Hamiltonian mechanics called action-angle coordinate transformations.

Action-angle coordinates are coordinates in which the Hamiltonian is fully described by integrals of motion (action). The action-angle transformations contain information about the orbit of a particle, and we can make use of them to integrate the particle on the base distribution by transforming to action-angle coordinates, shifting the angle and transforming back to the position-momentum coordinates.

Using action-angle transformations to integrate on the base distribution effectively also integrates the particle in the target distribution, and as long as the quality of the transformation is sufficient, the energy is conserved in the target distribution as well.

The analytic description for Hamiltonian systems with a standard normal as the potential can be written as a set of independent harmonic oscillators. The action-angle transformation for harmonic oscillators is a well-known transformation, which has a simple analytic form.

We use the transformations between the target distribution and the base distribution, and the transformation to and from action-angle coordinates as a new sampling method called Action-Angle Hamiltonian Monte-Carlo (AAHMC).

References

1: Neal 2011

2: Parno & Marzouk 2018

3: Hoffman & Gellman 2019

Methods

Learning the transformation

To construct the transformation between the target distribution (posterior) and the base distribution (standard normal), we use the NEUTRA Implementation in the PYRO framework.

NEUTRA makes use of normalising flows to learn the transformation, and is optimised by minimising the KL-divergence between the target and the base distribution.

reference:

1: Hoffman et al 2019

Flowchart for learning the map:

a) construct the transformation from the target distribution (right) to the base distribution (left)

Sampling through the map

Flow chart of the action-angle sampling.

a) take the sample on the base distribution (initial position) and transform it to action-angle coordinates

b) Shift the angles

c) Inverse transform the sample back to the base distribution (final position)

d) Inverse transform both positions to the target distribution

e) Calculate the energies for both positions and reject or accept

After training the map we use our action-angle sampling technique to sample the standard normal. We take the initial position and momentum of the current sample, transform these to the action-angle coordinates and increase the angles of each action-angle pair. Since the action-angle transformation describes the orbit of particles, this is the same as integrating the particle through the orbit. After the angle increase we transform back to the position-momentum coordinates and use those as the final position and momentum.

The acceptance-rejection probability is calculated with the total energies in the target space, by transforming both the initial and the final position-momentum pairs.

Comparison measure

We measure the correctness by comparing the samples taken with the action-angle sampler through the map, as well as those taken with the NUTS sampler, to samples directly generated from the target distribution. The measure we use for this is the wasserstein distance, which is a way to calculate the distance between distributions.

References

1: Hoffman & Gellman 2019

Results: Action angle sampling with transformations

We apply our action-angle sampling technique on several controlled multivariate normal distributions, and compare the performance to the direct NUTS sampler and the NUTS sampler that uses the same transformations technique.

Scaled and shifted 5-d distribution

We perform the sampling on a scaled and shifted 5-d multivariate normal distribution, generating 25 samples of samples size 5000 and calculating the mean and confidence intervals.

In the top panel of the figure on the right we see the sample comparison evolution, again comparing the distance between the samples generated by the various samplers (red: NUTS direct, blue: NUTS through map, purple: AAHMC through map) and the samples generated directly from the true distribution. The thick line shows the mean of all the realizations, and the transparent areas indicate the 95% confidence interval.

The bottom panel shows the runtime with growing sample size for all the samplers. The AAHMC sampler is on average an order of magnitude faster than the NUTS sampling through the map and about 5 times faster than the direct NUTS sampler.

We see that the purple region and the yellow region do not overlap as much as in the standard normal case (see other result article), indicating that the AAHMC sampler is less accurate in this case. The correctness of the AAHMC sampler is sensitive to the quality of the transformation, and training this map longer generally gives a better accuracy.

Results: Sampling the standard normal

Testing the implementation

We test our action-angle sampling implementation by sampling a standard normal distribution, comparing the samples to samples taken by the NUTS sampler and samples generated directly from the standard normal distribution. This can be done without any mapping because the action-angle transformation assumes that the base distribution is a standard normal distribution.

The figure on the left shows the (Wasserstein-2) distance between samples directly taken from the standard normal distribution and those those generated by the NUTS sampler and our AAHMC sampler.

The overlap and general similar decrease of distance shows that our AAHMC sampler correctly generates samples from the standard distribution, confirming the correctness of our implementation of the action-angle transformation, and the choice of the angle-shift method.

Next steps

Use real science cases

With our controlled experiments we have knowledge of the actual shape of the posterior distribution, but in real situations this is not the case.

Extend analysis

While we see the AAHMC sampler perform well in recovering the target distributions, we will extend our analysis to include more measures commonly used in MCMC analyses, like convergence rate FINISH

Increase performance

The samplers that we are comparing to have been implemented into the PYRO framework and have received many optimisations. Our current action-angle sampler is not yet fully optimised and we are still improving the runtime performance.

Make public package

Our intention is to create a user-friendly package that performs the calculation of the transformation and the action angle sampling for a given posterior distribution. We are currently working on finalising the code to do this, and hopefully we can provide a public code soon.

Introduction

Methods

Results: Action angle sampling with transformations

Conclusions

Results: Sampling the standard normal

Next steps