kombine package

Submodules

kombine.clustered_kde module

The kernel density estimators.

class kombine.clustered_kde.ClusteredKDE(data, k=1)[source]

Bases: object

Run a k-means clustering algorithm, estimating the distibution of each identified cluster with an independent kernel density estimate. The full distibution is then estimated by combining the individual KDE’s, weighted by the fraction of samples assigned to each cluster.

Parameters:
  • data – An (N, ndim)-shaped array, containing N samples from the target distribution.
  • k – The number of clusters for k-means clustering.
assignments

Cluster assignments from k-means.

bic(pool=None)[source]

Evaluate Bayes Information Criterion for the KDE’s estimate of the distribution

\[\mathrm{BIC} = \mathrm{ln}\mathcal{L}_\mathrm{max} - \frac{d_m}{2} \mathrm{ln} N\]

where \(d_m\) is the number of dimensions of the KDE model (\(n_\mathrm{clusters} d\) centroid location parameters, \(n_\mathrm{clusters} - 1\) normalized weights, and \(n_\mathrm{clusters} (d+1)*d/2\) kernel covariance parameters, one matrix for each of \(n_\mathrm{clusters}\) clusters), and \(N\) is the number of samples used to build the KDE.

centroids

Cluster centroids from k-means.

data

Samples used to build the KDE.

draw(size=1)[source]

Draw size samples from the KDE.

logpdf(pts, pool=None)[source]

Evaluate the logpdf of the KDE at pts.

nclusters

The number of clusters used for k-means.

ndim

The number of dimensions of the KDE.

size

The number of samples used to build the KDE.

class kombine.clustered_kde.KDE(data)[source]

Bases: object

A Gaussian kernel density estimator that provides a means for evaluating the estimated probability density function, and drawing additional samples from the estimated distribution. Cholesky decomposition of the covariance matrix makes this class a bit more stable than the scipy’s Gaussian KDE.

Parameters:data – An (N, ndim)-shaped array, containing N samples from the target distribution.
data

Samples used to build the KDE.

draw(size=1)[source]

Draw samples from the estimated distribution.

logpdf(pts, pool=None)[source]

Evaluate the logpdf at pts as estimated by the KDE.

ndim

The number of dimensions of the KDE.

size

The number of samples used to build the KDE.

class kombine.clustered_kde.TransdimensionalKDE(data, kde=None, max_samples=None, pool=None)[source]

Bases: object

A generalized Gaussian kernel density estimator that reads masked arrays, constructs a ClusteredKDE using optimized_kde() for each unique parameter space, then weighs the KDEs based on the number of samples in each parameter space.

Parameters:
  • data – An (N, max_dim)-shaped masked array, containing N samples from the the target distribution.
  • kde – (optional) An old trans-dimensional KDE to inherit samples from.
  • max_samples – (optional) The maximum number of samples to use for constructing or updating the kde in each unique parameter space. If a KDE is supplied and adding the samples from data will go over this, old samples are thinned by factors of two until under the limit in each parameter space.
draw(size=1, spaces=None)[source]

Draw samples from the transdimensional distribution.

kdes

List of fixed-dimension ClusteredKDE() s

logpdf(pts, pool=None)[source]

Evaluate the log-transdimensional-pdf at pts as estimated by the KDE.

spaces

Unique sets of dimensions, usable as selectors.

kombine.clustered_kde.oas_cov(pts)[source]

Estimate the covariance matrix using the Oracle Approximating Shrinkage algorithm

\[(1 - s)\Sigma + s \mu \mathcal{I}_d\]

where \(\mu = \mathrm{tr}(\Sigma) / d\). This ensures the covariance matrix estimate is well behaved for small sample sizes.

Parameters:pts – An (N, ndim)-shaped array, containing N samples from the target distribution.

This follows the implementation in scikit-learn.

kombine.clustered_kde.optimized_kde(data, pool=None, kde=None, max_samples=None, **kwargs)[source]

Iteratively run a k-means clustering algorithm, estimating the distibution of each identified cluster with an independent kernel density estimate. Starting with k = 1, the distribution is estimated and the Bayes Information criterion (BIC) is calculated. k is increased until the BIC stops increasing.

Parameters:
  • data – An (N, ndim)-shaped array, containing N samples from the target distribution.
  • pool – (optional) A pool of processes with a map() function to use.
  • kde – (optional) An old KDE to inherit samples from.
  • max_samples – (optional) The maximum number of samples to use for constructing or updating the KDE. If a KDE is supplied and adding the samples from it will go over this, old samples are thinned by factors of two until under the limit.
  • kwargs – (optional) Keyword arguments to pass to ClusteredKDE.
Returns:

ClusteredKDE() that maximizes the BIC.

kombine.clustered_kde.unique_spaces(mask)[source]

Determine the unique sets of dimensions based on a mask. Inverted 1D masks are returned to use as selectors.

kombine.interruptible_pool module

This is a drop-in replacement for multiprocessing’s pool that plays better with keyboard interrupts. This implimentation is a modified version of one originally written by Peter K. G. Williams <peter@newton.cx> for emcee:

which was an adaptation of a method written by John Reese, shared as

class kombine.interruptible_pool.Pool(processes=None, initializer=None, initargs=(), **kwargs)[source]

Bases: multiprocessing.pool.Pool

A modified multiprocessing.pool.Pool that handles KeyboardInterrupts in the map() method more gracefully.

Parameters:
  • processes – (optional) The number of processes to use (defaults to number of CPUs).
  • initializer – (optional) A callable to be called by each process when it starts.
  • initargs – (optional) Arguments for initializer; called as initializer(*initargs).
  • kwargs – (optional) Extra arguments. Python 2.7 supports a maxtasksperchild parameter.
map(func, items, chunksize=None)[source]

A replacement for map() that handles KeyboardInterrupt.

Parameters:
  • func – Function to apply to the items.
  • items – Iterable of items to have func applied to.

kombine.sampler module

A kernel-density-based, embarrassingly parallel ensemble sampler.

class kombine.sampler.Sampler(nwalkers, ndim, lnpostfn, transd=False, processes=None, pool=None, args=[])[source]

Bases: object

An Ensemble sampler.

The chain member of this object has the shape: (nsteps, nwalkers, ndim) where nsteps is the stored number of steps taken thus far.

Parameters:
  • nwalkers – The number of individual MCMC chains to include in the ensemble.
  • ndim – Number of dimensions in the parameter space. If transd is True this is the number of unique dimensions across the parameter spaces.
  • lnpostfn – A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.
  • transd – If True, the sampler will operate across parameter spaces using a clustered_kde.TransdimensionalKDE proposal distribution. In this mode a masked array with samples in each of the possible sets of dimensions must be given for the initial ensemble distribution.
  • processes – (optional) The number of processes to use with multiprocessing. If None, all available cores are used.
  • pool – (optional) A pre-constructed pool with a map method. If None a pool will be created using multiprocessing.
acceptance

Boolean array of ensemble’s past acceptances, with shape (iterations, nwalkers).

acceptance_fraction

A 1-D array of length stored_iterations of the fraction of walkers that accepted each step.

acceptance_rate

An (nwalkers, )-shaped array of the windowed acceptance rate for each walker. The size of the window is chosen automatically based on the fraction of acceptances in the ensembles last step. See windowed_acceptance_rate() if you want more control.

autocorrelation_times

An nwalkers-long vector of the estimated autocorrelation time of each walker, estimated using the number of step acceptances over the last lookback steps.

This function leverages the convenience that the proposal density doesn’t functionally depend on walkers’ current locations, which means an accepted step must be independent. This allows for an analytic estimate of the autocorrelation time :math: au that depends only on the acceptance rate

\[au = \]

rac{2}{r_mathrm{acc}} - 1

NOTE: This method can only be used if the chain is being stored.

param lookback:Number of previous steps to use for the autocorrelation time estimates. If None, all steps since last proposal update will be used.
blobs

Ensemble’s past metadata.

burnin(p0=None, lnpost0=None, lnprop0=None, blob0=None, test_steps=16, critical_pval=0.05, max_steps=None, verbose=False, callback=None, **kwargs)[source]

Evolve an ensemble until the acceptance rate becomes roughly constant. This is done by splitting acceptances in half and checking for statistical consistency. This isn’t guaranteed to return a fully burned-in ensemble, but usually does.

Parameters:
  • p0 – (optional) A list of the initial walker positions. It should have the shape (nwalkers, ndim). If None and the sampler has been run previously, it’ll pick up where it left off.
  • lnpost0 – (optional) The list of log posterior probabilities for the walkers at positions p0. If lnpost0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
  • lnprop0 – (optional) List of log proposal densities for walkers at positions p0. If lnprop0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
  • blob0 – (optional) The list of blob data for walkers at positions p0.
  • test_steps – (optional) The (rough) number of accepted steps over which to check for acceptance rate consistency. If you find burnin repeatedly ending prematurely try increasing this.
  • critical_pval – (optional) The critial p-value for considering the acceptance distribution consistent. If the calculated p-value is over this, then the acceptance rate is considered to be stationary. Lower this if you want burnin criteria to be less strict.
  • max_steps – (optional) An absolute maximum number of steps to take, in case burnin is too painful.
  • verbose – (optional) Print status messages each time a milestone is reached in the burnin.
  • kwargs – (optional) The rest is passed to run_mcmc().

After burnin…

Returns:
  • p - A list of the current walker positions with shape (nwalkers, ndim).
  • lnpost - Array of log posterior probabilities for walkers at positions p; has shape (nwalkers, ndim).
  • lnprop - Array of log proposal densities for walkers at positions p; has shape (nwalkers, ndim).
  • blob - (if lnprobfn returns blobs) The list of blob data for the walkers at positions p.
burnin_length

If burnin() was not used, and burnin_length is None, the iteration of the last proposal update will be used.

chain

Ensemble’s past samples, with shape (iterations, nwalkers, ndim).

consistent_acceptance_rate(window_size=None, critical_pval=0.05)[source]

A convenience function for burnin() and trigger_update(). Returns True if the number of acceptances each step are consistent with the acceptance rate of the last step. This is done using a chi-squared test.

Parameters:
  • window_size – Number of iterations to look back for acceptances. If None, the iteration of the last proposal update (from updates) is used.
  • critical_pval – The critial p-value for considering the distribution consistent. If the calculated p-value is over this, then True is returned.
draw(size, spaces=None)[source]

Draw size samples from the current proposal distribution.

Parameters:
  • size – Number of samples to draw.
  • spaces – If not None while transdimensional sampling, draws are confined to the requested spaces. Such a thing is useful for burnin (e.g. spaces = ~p.mask).
Returns:

size draws from the proposal distribution.

failed_p

Sample that caused the last exception.

get_samples(burnin_length=None)[source]

Extract the independent samples collected after burnin. If burnin() was not used, and burnin_length is None, the iteration of the last proposal update will be used.

Parameters:burnin_length – (optional) The step after which to extract samples from.
Returns:An (nsamples, ndim) array of independent samples.
ln_ev(ndraws)[source]

Produces a Monte-Carlo estimate of the evidence integral using the current propasal.

Parameters:ndraws – The number of draws to make from the proposal for the evidence estimate.
Returns:(lnZ, dlnZ). Evidence estimate and associated uncertainty.
lnpost

Ensemble’s past posterior probabilities, with shape (iterations, nwalkers).

lnprop

Ensemble’s past proposal probabilities, with shape (iterations, nwalkers).

rollback(iteration)[source]

Shrink internal arrays down to a length of iteration and reset the pool if there is one. This is helpful for keeping things consistent after a KeyboardInterrupt.

run_mcmc(N, p0=None, lnpost0=None, lnprop0=None, blob0=None, **kwargs)[source]

Iterate sample() for N iterations and return the result.

Parameters:
  • N – The number of steps to take.
  • p0 – (optional) A list of the initial walker positions. It should have the shape (nwalkers, ndim). If None and the sampler has been run previously, it’ll pick up where it left off.
  • lnpost0 – (optional) The list of log posterior probabilities for the walkers at positions p0. If lnpost0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
  • lnprop0 – (optional) List of log proposal densities for walkers at positions p0. If lnprop0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
  • blob0 – (optional) The list of blob data for walkers at positions p0.
  • kwargs – (optional) The rest is passed to sample().

After N steps…

Returns:
  • p - An array of current walker positions with shape (nwalkers, ndim).
  • lnpost - The list of log posterior probabilities for the walkers at positions p, with shape (nwalkers, ndim).
  • lnprop - The list of log proposal densities for the walkers at positions p, with shape (nwalkers, ndim).
  • blob - (if lnprobfn returns blobs) The list of blob data for the walkers at positions p.
sample(p0=None, lnpost0=None, lnprop0=None, blob0=None, iterations=1, kde=None, update_interval=None, kde_size=None, freeze_transd=False, spaces=None, storechain=True, **kwargs)[source]

Advance the ensemble iterations steps as a generator.

Parameters:
  • p0 – (optional) A list of the initial walker positions. It should have the shape (nwalkers, ndim). If None and a proposal distribution exists, walker positions will be drawn from the proposal.
  • lnpost0 – (optional) The list of log posterior probabilities for the walkers at positions p0. If lnpost0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
  • lnprop0 – (optional) List of log proposal densities for walkers at positions p0. If lnprop0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
  • blob0 – (optional) The list of blob data for walkers at positions p0.
  • iterations – (optional) The number of steps to run.
  • kde – (optional) An already-constucted, evaluatable KDE with a draw method.
  • update_interval – (optional) Number of steps between proposal updates.
  • kde_size – (optional) Maximum sample size for KDE construction. When the KDE is updated, existing samples are thinned by factors of two until there’s enough room for nwalkers new samples. The default is nwalkers, and must be greater than \(\geq``nwalkers\) if specified.
  • freeze_transd – (optional) If True when transdimensional sampling, walkers are confined to their parameter space. This is helpful during burnin, and allows fox fixed-D burnin before transdimensional sampling.
  • spaces – (optional) Confine walkers to the requested parameter spaces. Expects an inverted mask with shape (nwalkers, ndim).
  • storechain – (optional) Flag for disabling chain and probability density storage in chain, lnpost, and lnprop.
  • kwargs – (optional) The rest is passed to update_proposal().

After each iteration…

Yields:
  • p - An array of current walker positions with shape (nwalkers, ndim).
  • lnpost - The list of log posterior probabilities for the walkers at positions p, with shape (nwalkers, ndim).
  • lnprop - The list of log proposal densities for the walkers at positions p, with shape (nwalkers, ndim).
  • blob - (if lnprobfn returns blobs) The list of blob data for the walkers at positions p.
trigger_update(interval=None)[source]

Decide whether to trigger a proposal update.

Parameters:interval – Interval between proposal updates. If None, no updates will be done. If "auto", acceptances are split in half and checked for consistency (see consistent_acceptance_rate()). If an int, the proposal will be updated every interval iterations.
Returns:bool indicating whether a proposal update is due.
update_proposal(p, max_samples=None, **kwargs)[source]

Update the proposal density with points p.

Parameters:
  • p – Samples to update the proposal with.
  • max_samples – (optional) The maximum number of samples to use for constructing or updating the kde. If a KDE is supplied and adding the samples from it will go over this, old samples are thinned by factors of two until under the limit.
  • kwargs – (optional) The rest is passed to the KDE constructor.
updates

Step numbers where the proposal density was updated.

windowed_acceptance_rate(window=None)[source]

An (nwalkers, -1)-shaped array of the windowed acceptance rate for each walker.

Parameters:window – Number of iterations to calculate acceptance rate over. If None, the fraction of accepances across the ensemble’s last step are used for scale.

kombine.serialpool module

Utility class for a uniform Pool API for serial processing

class kombine.serialpool.SerialPool[source]

Bases: object

close()[source]
map(function, tasks, callback=None)[source]