kombine package¶
Submodules¶
kombine.clustered_kde module¶
The kernel density estimators.
- class kombine.clustered_kde.ClusteredKDE(data, k=1)[source]¶
Bases: object
Run a k-means clustering algorithm, estimating the distribution of each identified cluster with an independent kernel density estimate. The full distribution is then estimated by combining the individual KDEs, weighted by the fraction of samples assigned to each cluster.
Parameters:
- data – An (N, ndim)-shaped array, containing N samples from the target distribution.
- k – The number of clusters for k-means clustering.
- assignments¶ Cluster assignments from k-means.
- bic(pool=None)[source]¶ Evaluate the Bayes Information Criterion for the KDE’s estimate of the distribution:
\[\mathrm{BIC} = \ln\mathcal{L}_\mathrm{max} - \frac{d_m}{2} \ln N\]
where \(d_m\) is the number of dimensions of the KDE model (\(n_\mathrm{clusters} d\) centroid location parameters, \(n_\mathrm{clusters} - 1\) normalized weights, and \(n_\mathrm{clusters} (d+1)d/2\) kernel covariance parameters, one matrix for each of the \(n_\mathrm{clusters}\) clusters), and \(N\) is the number of samples used to build the KDE.
- centroids¶ Cluster centroids from k-means.
- data¶ Samples used to build the KDE.
- nclusters¶ The number of clusters used for k-means.
- ndim¶ The number of dimensions of the KDE.
- size¶ The number of samples used to build the KDE.
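A minimal usage sketch on hypothetical bimodal data, using only the constructor, attributes, and bic() documented above:

    import numpy as np
    from kombine.clustered_kde import ClusteredKDE

    # Toy target: 1000 samples from a 2-D bimodal distribution (illustrative only).
    data = np.concatenate([np.random.randn(500, 2) - 3.,
                           np.random.randn(500, 2) + 3.])

    kde = ClusteredKDE(data, k=2)
    print(kde.nclusters, kde.ndim, kde.size)   # 2, 2, 1000
    print(kde.centroids)                       # cluster centroids from k-means

    # Compare the Bayes Information Criterion across cluster counts.
    print(ClusteredKDE(data, k=1).bic(), kde.bic())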
- class kombine.clustered_kde.KDE(data)[source]¶
Bases: object
A Gaussian kernel density estimator that provides a means for evaluating the estimated probability density function and drawing additional samples from the estimated distribution. A Cholesky decomposition of the covariance matrix makes this class a bit more stable than scipy’s Gaussian KDE.
Parameters: data – An (N, ndim)-shaped array, containing N samples from the target distribution.
- data¶ Samples used to build the KDE.
- ndim¶ The number of dimensions of the KDE.
- size¶ The number of samples used to build the KDE.
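A short construction sketch; only the constructor and attributes above are documented here, so the density-evaluation and sampling calls are left as commented-out assumptions:

    import numpy as np
    from kombine.clustered_kde import KDE

    samples = np.random.randn(1000, 3)   # hypothetical (N, ndim) training samples
    kde = KDE(samples)
    print(kde.ndim, kde.size)            # 3, 1000

    # The class is described as supporting pdf evaluation and drawing new samples;
    # the method names below are assumptions, not confirmed by this page.
    # logp = kde.logpdf(np.zeros((5, 3)))
    # new_samples = kde.draw(100)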
- class kombine.clustered_kde.TransdimensionalKDE(data, kde=None, max_samples=None, pool=None)[source]¶
Bases: object
A generalized Gaussian kernel density estimator that reads masked arrays, constructs a ClusteredKDE using optimized_kde() for each unique parameter space, then weights the KDEs based on the number of samples in each parameter space.
Parameters:
- data – An (N, max_dim)-shaped masked array, containing N samples from the target distribution.
- kde – (optional) An old trans-dimensional KDE to inherit samples from.
- max_samples – (optional) The maximum number of samples to use for constructing or updating the KDE in each unique parameter space. If a KDE is supplied and adding the samples from data would exceed this, old samples are thinned by factors of two until under the limit in each parameter space.
- kdes¶ List of fixed-dimension ClusteredKDE()s.
- logpdf(pts, pool=None)[source]¶ Evaluate the log-transdimensional-pdf at pts as estimated by the KDE.
- spaces¶ Unique sets of dimensions, usable as selectors.
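A sketch of the masked-array input this class expects, assuming a toy setup with a 1-D and a 2-D parameter space; only the constructor, logpdf(), and the attributes above are used:

    import numpy as np
    import numpy.ma as ma
    from kombine.clustered_kde import TransdimensionalKDE

    # Hypothetical samples in two parameter spaces: the first half live in a
    # 1-D space (second dimension masked), the second half in the full 2-D space.
    max_dim = 2
    data = ma.masked_all((1000, max_dim))
    data[:500, 0] = np.random.randn(500)
    data[500:, :] = np.random.randn(500, max_dim)

    kde = TransdimensionalKDE(data)
    print(kde.spaces)     # unique sets of dimensions, usable as selectors
    print(len(kde.kdes))  # one fixed-dimension ClusteredKDE per space

    logp = kde.logpdf(data[:10])  # log-transdimensional-pdf at (masked) points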
- kombine.clustered_kde.oas_cov(pts)[source]¶ Estimate the covariance matrix using the Oracle Approximating Shrinkage (OAS) algorithm:
\[(1 - s)\Sigma + s \mu \mathcal{I}_d\]
where \(\mu = \mathrm{tr}(\Sigma) / d\). This ensures the covariance matrix estimate is well behaved for small sample sizes, and follows the implementation in scikit-learn.
Parameters: pts – An (N, ndim)-shaped array, containing N samples from the target distribution.
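A quick sketch of the shrinkage behaviour, assuming oas_cov returns the regularized (ndim, ndim) covariance matrix (the return value isn’t spelled out above); scikit-learn’s OAS estimator is shown only as a familiar point of comparison:

    import numpy as np
    from kombine.clustered_kde import oas_cov
    from sklearn.covariance import OAS

    pts = np.random.randn(20, 5)   # deliberately few samples for a 5-D space

    sigma = oas_cov(pts)           # assumed to return an (ndim, ndim) matrix
    print(sigma.shape)

    # scikit-learn's OAS shrinkage of the same samples, for comparison.
    print(OAS().fit(pts).covariance_)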
- kombine.clustered_kde.optimized_kde(data, pool=None, kde=None, max_samples=None, **kwargs)[source]¶ Iteratively run a k-means clustering algorithm, estimating the distribution of each identified cluster with an independent kernel density estimate. Starting with k = 1, the distribution is estimated and the Bayes Information Criterion (BIC) is calculated. k is increased until the BIC stops increasing.
Parameters:
- data – An (N, ndim)-shaped array, containing N samples from the target distribution.
- pool – (optional) A pool of processes with a map() function to use.
- kde – (optional) An old KDE to inherit samples from.
- max_samples – (optional) The maximum number of samples to use for constructing or updating the KDE. If a KDE is supplied and adding the samples from it would exceed this, old samples are thinned by factors of two until under the limit.
- kwargs – (optional) Keyword arguments to pass to ClusteredKDE.
Returns: The ClusteredKDE() that maximizes the BIC.
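A sketch of the BIC-driven model selection described above, on hypothetical bimodal data:

    import numpy as np
    from kombine.clustered_kde import optimized_kde

    data = np.concatenate([np.random.randn(500, 2) - 3.,
                           np.random.randn(500, 2) + 3.])

    # k is increased from 1 until the BIC stops improving.
    kde = optimized_kde(data)
    print(kde.nclusters)   # cluster count chosen by the BIC
    print(kde.bic())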
kombine.interruptible_pool module¶
This is a drop-in replacement for multiprocessing’s pool that plays better with keyboard interrupts. This implementation is a modified version of one originally written by Peter K. G. Williams <peter@newton.cx> for emcee, which was in turn an adaptation of a method written by John Reese.
- class kombine.interruptible_pool.Pool(processes=None, initializer=None, initargs=(), **kwargs)[source]¶
Bases: multiprocessing.pool.Pool
A modified multiprocessing.pool.Pool that handles KeyboardInterrupts in the map() method more gracefully.
Parameters:
- processes – (optional) The number of processes to use (defaults to the number of CPUs).
- initializer – (optional) A callable to be called by each process when it starts.
- initargs – (optional) Arguments for initializer; called as initializer(*initargs).
- kwargs – (optional) Extra arguments. Python 2.7 supports a maxtasksperchild parameter.
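A minimal drop-in usage sketch; the work function is hypothetical, and map() comes from the multiprocessing.pool.Pool base class:

    from kombine.interruptible_pool import Pool

    def work(x):
        """Hypothetical task to distribute across processes."""
        return x ** 2

    if __name__ == "__main__":
        pool = Pool(processes=4)
        try:
            # A KeyboardInterrupt during map() is handled more gracefully than
            # with the stock multiprocessing pool.
            results = pool.map(work, range(100))
        finally:
            pool.close()
            pool.join()
        print(results[:5])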
kombine.sampler module¶
A kernel-density-based, embarrassingly parallel ensemble sampler.
- class kombine.sampler.Sampler(nwalkers, ndim, lnpostfn, transd=False, processes=None, pool=None, args=[])[source]¶
Bases: object
An ensemble sampler.
The chain member of this object has the shape (nsteps, nwalkers, ndim), where nsteps is the stored number of steps taken thus far.
Parameters:
- nwalkers – The number of individual MCMC chains to include in the ensemble.
- ndim – Number of dimensions in the parameter space. If transd is True, this is the number of unique dimensions across the parameter spaces.
- lnpostfn – A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.
- transd – If True, the sampler will operate across parameter spaces using a clustered_kde.TransdimensionalKDE proposal distribution. In this mode, a masked array with samples in each of the possible sets of dimensions must be given for the initial ensemble distribution.
- processes – (optional) The number of processes to use with multiprocessing. If None, all available cores are used.
- pool – (optional) A pre-constructed pool with a map method. If None, a pool will be created using multiprocessing.
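A sketch of constructing a sampler for a toy 2-D Gaussian posterior; the log-posterior function, ensemble size, and starting positions are illustrative choices reused by the method sketches below:

    import numpy as np
    from kombine.sampler import Sampler

    ndim, nwalkers = 2, 100

    def lnpostfn(x):
        """Toy log-posterior: a standard 2-D Gaussian."""
        return -0.5 * np.sum(x ** 2)

    sampler = Sampler(nwalkers, ndim, lnpostfn)

    # Initial walker positions, shaped (nwalkers, ndim).
    p0 = np.random.uniform(-10, 10, size=(nwalkers, ndim))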
- acceptance¶ Boolean array of the ensemble’s past acceptances, with shape (iterations, nwalkers).
- acceptance_fraction¶ A 1-D array of length stored_iterations of the fraction of walkers that accepted each step.
- acceptance_rate¶ An (nwalkers,)-shaped array of the windowed acceptance rate for each walker. The size of the window is chosen automatically based on the fraction of acceptances in the ensemble’s last step. See windowed_acceptance_rate() if you want more control.
- autocorrelation_times¶ An nwalkers-long vector of the estimated autocorrelation time of each walker, estimated using the number of step acceptances over the last lookback steps.
This function leverages the convenience that the proposal density doesn’t functionally depend on the walkers’ current locations, which means an accepted step must be independent. This allows for an analytic estimate of the autocorrelation time \(\tau\) that depends only on the acceptance rate:
\[\tau = \frac{2}{r_\mathrm{acc}} - 1\]
NOTE: This method can only be used if the chain is being stored.
Parameters: lookback – Number of previous steps to use for the autocorrelation time estimates. If None, all steps since the last proposal update will be used.
- blobs¶ Ensemble’s past metadata.
- burnin(p0=None, lnpost0=None, lnprop0=None, blob0=None, test_steps=16, critical_pval=0.05, max_steps=None, verbose=False, callback=None, **kwargs)[source]¶ Evolve an ensemble until the acceptance rate becomes roughly constant. This is done by splitting acceptances in half and checking for statistical consistency. This isn’t guaranteed to return a fully burned-in ensemble, but usually does.
Parameters:
- p0 – (optional) A list of the initial walker positions. It should have the shape (nwalkers, ndim). If None and the sampler has been run previously, it’ll pick up where it left off.
- lnpost0 – (optional) The list of log posterior probabilities for the walkers at positions p0. If lnpost0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
- lnprop0 – (optional) List of log proposal densities for walkers at positions p0. If lnprop0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
- blob0 – (optional) The list of blob data for walkers at positions p0.
- test_steps – (optional) The (rough) number of accepted steps over which to check for acceptance rate consistency. If you find burnin repeatedly ending prematurely, try increasing this.
- critical_pval – (optional) The critical p-value for considering the acceptance distribution consistent. If the calculated p-value is above this, the acceptance rate is considered stationary. Lower this if you want the burnin criteria to be less strict.
- max_steps – (optional) An absolute maximum number of steps to take, in case burnin is too painful.
- verbose – (optional) Print status messages each time a milestone is reached in the burnin.
- kwargs – (optional) The rest is passed to run_mcmc().
After burnin…
Returns:
- p – A list of the current walker positions with shape (nwalkers, ndim).
- lnpost – Array of log posterior probabilities for walkers at positions p; has shape (nwalkers, ndim).
- lnprop – Array of log proposal densities for walkers at positions p; has shape (nwalkers, ndim).
- blob – (if lnprobfn returns blobs) The list of blob data for the walkers at positions p.
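Continuing the toy setup above, a sketch of the burnin call; unpacking three values assumes lnpostfn returns no blobs, per the Returns list:

    # Evolve the ensemble until the acceptance rate looks stationary.
    p, lnpost, lnprop = sampler.burnin(p0, test_steps=16, critical_pval=0.05,
                                       verbose=True)
    print(p.shape)                           # (nwalkers, ndim)
    print(sampler.acceptance_fraction[-5:])  # recent per-step acceptance fractions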
- burnin_length¶ If burnin() was not used, and burnin_length is None, the iteration of the last proposal update will be used.
- chain¶ Ensemble’s past samples, with shape (iterations, nwalkers, ndim).
- consistent_acceptance_rate(window_size=None, critical_pval=0.05)[source]¶ A convenience function for burnin() and trigger_update(). Returns True if the number of acceptances at each step is consistent with the acceptance rate of the last step. This is done using a chi-squared test.
Parameters:
- window_size – Number of iterations to look back for acceptances. If None, the iteration of the last proposal update (from updates) is used.
- critical_pval – The critical p-value for considering the distribution consistent. If the calculated p-value is above this, then True is returned.
- draw(size, spaces=None)[source]¶ Draw size samples from the current proposal distribution.
Parameters:
- size – Number of samples to draw.
- spaces – If not None while transdimensional sampling, draws are confined to the requested spaces. This is useful for burnin (e.g. spaces = ~p.mask).
Returns: size draws from the proposal distribution.
- failed_p¶ Sample that caused the last exception.
- get_samples(burnin_length=None)[source]¶ Extract the independent samples collected after burnin. If burnin() was not used, and burnin_length is None, the iteration of the last proposal update will be used.
Parameters: burnin_length – (optional) The step after which to extract samples.
Returns: An (nsamples, ndim) array of independent samples.
- ln_ev(ndraws)[source]¶ Produce a Monte Carlo estimate of the evidence integral using the current proposal.
Parameters: ndraws – The number of draws to make from the proposal for the evidence estimate.
Returns: (lnZ, dlnZ). The evidence estimate and its associated uncertainty.
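A one-line sketch, assuming the sampler’s proposal has already been built (e.g. after burnin):

    # Monte Carlo evidence estimate from the current KDE proposal.
    lnZ, dlnZ = sampler.ln_ev(10000)
    print(lnZ, dlnZ)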
- lnpost¶ Ensemble’s past posterior probabilities, with shape (iterations, nwalkers).
- lnprop¶ Ensemble’s past proposal probabilities, with shape (iterations, nwalkers).
- rollback(iteration)[source]¶ Shrink internal arrays down to a length of iteration and reset the pool if there is one. This is helpful for keeping things consistent after a KeyboardInterrupt.
- run_mcmc(N, p0=None, lnpost0=None, lnprop0=None, blob0=None, **kwargs)[source]¶ Iterate sample() for N iterations and return the result.
Parameters:
- N – The number of steps to take.
- p0 – (optional) A list of the initial walker positions. It should have the shape (nwalkers, ndim). If None and the sampler has been run previously, it’ll pick up where it left off.
- lnpost0 – (optional) The list of log posterior probabilities for the walkers at positions p0. If lnpost0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
- lnprop0 – (optional) List of log proposal densities for walkers at positions p0. If lnprop0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
- blob0 – (optional) The list of blob data for walkers at positions p0.
- kwargs – (optional) The rest is passed to sample().
After N steps…
Returns:
- p – An array of current walker positions with shape (nwalkers, ndim).
- lnpost – The list of log posterior probabilities for the walkers at positions p, with shape (nwalkers, ndim).
- lnprop – The list of log proposal densities for the walkers at positions p, with shape (nwalkers, ndim).
- blob – (if lnprobfn returns blobs) The list of blob data for the walkers at positions p.
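Continuing the toy setup, a sketch of taking production steps and reading the stored results:

    # Take 500 more steps; with p0=None the sampler picks up where it left off.
    sampler.run_mcmc(500)

    print(sampler.chain.shape)       # (stored_iterations, nwalkers, ndim)
    samples = sampler.get_samples()  # (nsamples, ndim) independent samples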
- sample(p0=None, lnpost0=None, lnprop0=None, blob0=None, iterations=1, kde=None, update_interval=None, kde_size=None, freeze_transd=False, spaces=None, storechain=True, **kwargs)[source]¶ Advance the ensemble by iterations steps as a generator.
Parameters:
- p0 – (optional) A list of the initial walker positions. It should have the shape (nwalkers, ndim). If None and a proposal distribution exists, walker positions will be drawn from the proposal.
- lnpost0 – (optional) The list of log posterior probabilities for the walkers at positions p0. If lnpost0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
- lnprop0 – (optional) List of log proposal densities for walkers at positions p0. If lnprop0 is None, the initial values are calculated. It should have the shape (nwalkers, ndim).
- blob0 – (optional) The list of blob data for walkers at positions p0.
- iterations – (optional) The number of steps to run.
- kde – (optional) An already-constructed, evaluatable KDE with a draw method.
- update_interval – (optional) Number of steps between proposal updates.
- kde_size – (optional) Maximum sample size for KDE construction. When the KDE is updated, existing samples are thinned by factors of two until there’s enough room for nwalkers new samples. The default is nwalkers, and it must be \(\geq\) nwalkers if specified.
- freeze_transd – (optional) If True when transdimensional sampling, walkers are confined to their parameter space. This is helpful during burnin, and allows for fixed-D burnin before transdimensional sampling.
- spaces – (optional) Confine walkers to the requested parameter spaces. Expects an inverted mask with shape (nwalkers, ndim).
- storechain – (optional) Flag for disabling chain and probability density storage in chain, lnpost, and lnprop.
- kwargs – (optional) The rest is passed to update_proposal().
After each iteration…
Yields:
- p – An array of current walker positions with shape (nwalkers, ndim).
- lnpost – The list of log posterior probabilities for the walkers at positions p, with shape (nwalkers, ndim).
- lnprop – The list of log proposal densities for the walkers at positions p, with shape (nwalkers, ndim).
- blob – (if lnprobfn returns blobs) The list of blob data for the walkers at positions p.
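Because sample() is a generator, the ensemble can be stepped under manual control; this sketch assumes the toy sampler above and reads only the first yielded value:

    # Step the ensemble one iteration at a time; each yield carries the
    # documented (p, lnpost, lnprop[, blob]) values.
    for step, result in enumerate(sampler.sample(p0, iterations=100)):
        p = result[0]  # current walker positions, (nwalkers, ndim)
        if step % 10 == 0:
            print(step, sampler.acceptance_fraction[-1])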
- trigger_update(interval=None)[source]¶ Decide whether to trigger a proposal update.
Parameters: interval – Interval between proposal updates. If None, no updates will be done. If "auto", acceptances are split in half and checked for consistency (see consistent_acceptance_rate()). If an int, the proposal will be updated every interval iterations.
Returns: bool indicating whether a proposal update is due.
- update_proposal(p, max_samples=None, **kwargs)[source]¶ Update the proposal density with points p.
Parameters:
- p – Samples to update the proposal with.
- max_samples – (optional) The maximum number of samples to use for constructing or updating the KDE. If a KDE is supplied and adding the samples from it will go over this, old samples are thinned by factors of two until under the limit.
- kwargs – (optional) The rest is passed to the KDE constructor.
- updates¶ Step numbers where the proposal density was updated.