sklearn.mixture.DPGMM
class sklearn.mixture.DPGMM(n_components=1, covariance_type='diag', alpha=1.0, random_state=None, thresh=None, tol=0.001, verbose=0, min_covar=None, n_iter=10, params='wmc', init_params='wmc')

Variational Inference for the Infinite Gaussian Mixture Model.
DPGMM stands for Dirichlet Process Gaussian Mixture Model, and it is an infinite mixture model with the Dirichlet Process as a prior distribution on the number of clusters. In practice the approximate inference algorithm uses a truncated distribution with a fixed maximum number of components, but almost always the number of components actually used depends on the data.
Stick-breaking representation of a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a Gaussian mixture model with a variable number of components (smaller than the truncation parameter n_components).
Initialization is with normally-distributed means and identity covariance, for proper convergence.
Read more in the User Guide.
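A minimal usage sketch follows. DPGMM was deprecated in scikit-learn 0.18 and removed in 0.20, so this assumes an older installation; the data, truncation level, and weight threshold are illustrative assumptions:

    import numpy as np
    from sklearn.mixture import DPGMM

    # Two well-separated Gaussian blobs in 2-D.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 10])

    # n_components=5 is only the truncation level; the fitted model is
    # free to leave most components with negligible weight.
    model = DPGMM(n_components=5, covariance_type='diag', alpha=1.0,
                  n_iter=100, random_state=0)
    model.fit(X)

    # Components the data actually uses carry non-negligible weight.
    print(np.round(model.weights_, 3))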
Parameters:

n_components : int, default 1
    Number of mixture components.

covariance_type : string, default 'diag'
    String describing the type of covariance parameters to use. Must be one of 'spherical', 'tied', 'diag', 'full'.

alpha : float, default 1
    Real number representing the concentration parameter of the Dirichlet process. Intuitively, the Dirichlet process is as likely to start a new cluster for a point as it is to add that point to a cluster with alpha elements. A higher alpha means more clusters, as the expected number of clusters is alpha * log(N) (see the sketch after this list).

tol : float, default 1e-3
    Convergence threshold.

n_iter : int, default 10
    Maximum number of iterations to perform before convergence.

params : string, default 'wmc'
    Controls which parameters are updated in the training process. Can contain any combination of 'w' for weights, 'm' for means, and 'c' for covars.

init_params : string, default 'wmc'
    Controls which parameters are updated in the initialization process. Can contain any combination of 'w' for weights, 'm' for means, and 'c' for covars.

verbose : int, default 0
    Controls output verbosity.
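As noted for alpha above, larger concentration values favor more active components. A rough illustration (the data and the 1e-2 weight cutoff are assumptions, not part of the API):

    import numpy as np
    from sklearn.mixture import DPGMM

    rng = np.random.RandomState(1)
    X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 6])

    # Count components whose mixing weight stays above an arbitrary cutoff.
    for alpha in (0.01, 1.0, 100.0):
        m = DPGMM(n_components=10, alpha=alpha, n_iter=100,
                  random_state=0).fit(X)
        print(alpha, int(np.sum(m.weights_ > 1e-2)))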
Attributes:

covariance_type : string
    String describing the type of covariance parameters used by the DP-GMM. Must be one of 'spherical', 'tied', 'diag', 'full'.

n_components : int
    Number of mixture components.

weights_ : array, shape (n_components,)
    Mixing weights for each mixture component.

means_ : array, shape (n_components, n_features)
    Mean parameters for each mixture component.

precs_ : array
    Precision (inverse covariance) parameters for each mixture component. The shape depends on covariance_type:
        (n_components, n_features)             if 'spherical'
        (n_features, n_features)               if 'tied'
        (n_components, n_features)             if 'diag'
        (n_components, n_features, n_features) if 'full'

converged_ : bool
    True when convergence was reached in fit(), False otherwise.
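A short sketch of inspecting the fitted attributes; the shapes follow the table above (data and settings are illustrative):

    import numpy as np
    from sklearn.mixture import DPGMM

    rng = np.random.RandomState(2)
    X = rng.randn(200, 3)

    m = DPGMM(n_components=4, covariance_type='diag', n_iter=50,
              random_state=0).fit(X)
    print(m.weights_.shape)    # (4,)
    print(m.means_.shape)      # (4, 3)
    print(np.shape(m.precs_))  # (4, 3) for covariance_type='diag'
    print(m.converged_)        # True if fit() reached the tol threshold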
Methods

aic(X) : Akaike information criterion for the current model fit.
bic(X) : Bayesian information criterion for the current model fit.
fit(X[, y]) : Estimate model parameters with the EM algorithm.
fit_predict(X[, y]) : Fit and then predict labels for data.
get_params([deep]) : Get parameters for this estimator.
lower_bound(X, z) : Return a lower bound on model evidence based on X and membership.
predict(X) : Predict label for data.
predict_proba(X) : Predict posterior probability of data under each Gaussian in the model.
sample([n_samples, random_state]) : Generate random samples from the model.
score(X[, y]) : Compute the log probability under the model.
score_samples(X) : Return the likelihood of the data under the model.
set_params(**params) : Set the parameters of this estimator.

__init__(n_components=1, covariance_type='diag', alpha=1.0, random_state=None, thresh=None, tol=0.001, verbose=0, min_covar=None, n_iter=10, params='wmc', init_params='wmc')
aic(X)

Akaike information criterion for the current model fit and the proposed data.

Parameters: X : array of shape (n_samples, n_dimensions)
Returns: aic : float (the lower the better)
bic(X)

Bayesian information criterion for the current model fit and the proposed data.

Parameters: X : array of shape (n_samples, n_dimensions)
Returns: bic : float (the lower the better)
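Both criteria trade goodness of fit against model complexity, so lower values indicate a preferable model. A comparison sketch (the candidate truncation levels and data are assumptions):

    import numpy as np
    from sklearn.mixture import DPGMM

    rng = np.random.RandomState(3)
    X = np.vstack([rng.randn(150, 2), rng.randn(150, 2) + 5])

    # Lower AIC/BIC is better; both penalize the number of parameters.
    for k in (2, 5, 10):
        m = DPGMM(n_components=k, n_iter=100, random_state=0).fit(X)
        print(k, m.aic(X), m.bic(X))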
fit(X, y=None)

Estimate model parameters with the EM algorithm.

An initialization step is performed before entering the expectation-maximization (EM) algorithm. If you want to avoid this step, set the keyword argument init_params to the empty string '' when creating the GMM object. Likewise, if you would like just to do an initialization, set n_iter=0 (both options are sketched below).

Parameters: X : array_like, shape (n, n_features)
    List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns: self
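The two options mentioned above, sketched with illustrative data (fitting with init_params='' is only meaningful on an object whose parameters were already set, e.g. by a previous fit):

    import numpy as np
    from sklearn.mixture import DPGMM

    rng = np.random.RandomState(4)
    X = rng.randn(100, 2)

    # n_iter=0: run only the initialization step, no variational updates.
    init_only = DPGMM(n_components=3, n_iter=0, random_state=0).fit(X)
    print(init_only.converged_)  # expected False: no EM iterations ran

    # init_params='' would instead make fit() skip initialization and
    # resume from the parameters the object already holds (a warm start).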
fit_predict(X, y=None)

Fit and then predict labels for data.

Warning: due to the final maximization step in the EM algorithm, with a low number of iterations the prediction may not be 100% accurate.

New in version 0.17: fit_predict method in the Gaussian Mixture Model.

Parameters: X : array-like, shape = [n_samples, n_features]
Returns: C : array, shape = (n_samples,), component memberships
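A sketch (illustrative data); per the warning above, use enough iterations that the labels reflect a converged model:

    import numpy as np
    from sklearn.mixture import DPGMM

    rng = np.random.RandomState(5)
    X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 8])

    # Equivalent to fit(X) followed by predict(X) on the fitted model.
    labels = DPGMM(n_components=5, n_iter=100,
                   random_state=0).fit_predict(X)
    print(np.bincount(labels, minlength=5))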
get_params(deep=True)

Get parameters for this estimator.

Parameters: deep : boolean, optional
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
    Parameter names mapped to their values.
predict(X)

Predict label for data.

Parameters: X : array-like, shape = [n_samples, n_features]
Returns: C : array, shape = (n_samples,), component memberships
predict_proba(X)

Predict posterior probability of data under each Gaussian in the model.

Parameters: X : array-like, shape = [n_samples, n_features]
Returns: responsibilities : array-like, shape = (n_samples, n_components)
    Returns the probability of the sample for each Gaussian (state) in the model.
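A sketch showing hard labels next to the responsibilities they are derived from (illustrative data; the argmax relationship holds because predict() picks the most probable component):

    import numpy as np
    from sklearn.mixture import DPGMM

    rng = np.random.RandomState(6)
    X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 8])

    m = DPGMM(n_components=5, n_iter=100, random_state=0).fit(X)
    labels = m.predict(X)      # shape (n_samples,)
    resp = m.predict_proba(X)  # shape (n_samples, n_components)

    # The hard label is the argmax of the per-component posteriors.
    print(np.all(labels == resp.argmax(axis=1)))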
sample(n_samples=1, random_state=None)

Generate random samples from the model.

Parameters: n_samples : int, optional
    Number of samples to generate. Defaults to 1.
Returns: X : array_like, shape (n_samples, n_features)
    List of samples.
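A sketch of drawing new points from a fitted model (illustrative data; this assumes the inherited sampling path supports the chosen covariance type):

    import numpy as np
    from sklearn.mixture import DPGMM

    rng = np.random.RandomState(7)
    X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 10])

    m = DPGMM(n_components=5, n_iter=100, random_state=0).fit(X)
    new_points = m.sample(n_samples=20, random_state=0)
    print(new_points.shape)  # (20, 2)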
score(X, y=None)

Compute the log probability under the model.

Parameters: X : array_like, shape (n_samples, n_features)
    List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns: logprob : array_like, shape (n_samples,)
    Log probabilities of each data point in X.
score_samples(X)

Return the likelihood of the data under the model.

Compute the bound on log probability of X under the model and return the posterior distribution (responsibilities) of each mixture component for each element of X.

This is done by computing the parameters for the mean-field of z for each observation.

Parameters: X : array_like, shape (n_samples, n_features)
    List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns: logprob : array_like, shape (n_samples,)
    Log probabilities of each data point in X.
responsibilities : array_like, shape (n_samples, n_components)
    Posterior probabilities of each mixture component for each observation.
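Unlike the newer mixture API, score_samples here returns a tuple. A sketch (illustrative data):

    import numpy as np
    from sklearn.mixture import DPGMM

    rng = np.random.RandomState(8)
    X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 8])

    m = DPGMM(n_components=5, n_iter=100, random_state=0).fit(X)
    logprob, resp = m.score_samples(X)
    print(logprob.shape)  # (n_samples,)
    print(resp.shape)     # (n_samples, n_components)

    # score(X) returns the same per-sample log probabilities.
    print(np.allclose(logprob, m.score(X)))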

