
This documentation is for scikit-learn version 0.17.

If you use the software, please consider citing scikit-learn and Jiancheng Li.

`sklearn.feature_selection.chi2`

sklearn.feature_selection.chi2(X, y)

Compute chi-squared stats between each non-negative feature and class.

This score can be used to select the n_features features of X with the highest values of the chi-squared test statistic relative to the classes. X must contain only non-negative features, such as booleans or frequencies (e.g., term counts in document classification).

Recall that the chi-square test measures dependence between stochastic variables, so using this function “weeds out” the features that are the most likely to be independent of class and therefore irrelevant for classification.
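To make the description above concrete, here is a minimal pure-Python sketch of a per-feature chi-squared score, written for illustration rather than taken from scikit-learn's source. It assumes X is a dense list of rows with non-negative values and y is a list of class labels; `chi2_scores` is a hypothetical helper name. For each class it compares the observed sum of a feature within that class against the sum expected if the feature were independent of the class (the class's share of samples times the feature's total), then adds up the squared differences divided by the expected values.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Hypothetical helper: chi-squared score per feature (dense, non-negative X)."""
    n_samples = len(X)
    n_features = len(X[0])

    # Observed: sum of each feature's values within each class.
    observed = defaultdict(lambda: [0.0] * n_features)
    class_counts = defaultdict(int)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            observed[label][j] += value

    # Total of each feature over all samples.
    feature_totals = [sum(row[j] for row in X) for j in range(n_features)]

    scores = [0.0] * n_features
    for label, count in class_counts.items():
        class_prob = count / n_samples
        for j in range(n_features):
            # Expected sum if feature j were independent of the class.
            expected = class_prob * feature_totals[j]
            # Assumption of this sketch: all-zero features contribute 0.
            if expected > 0:
                scores[j] += (observed[label][j] - expected) ** 2 / expected
    return scores


# Toy term counts, invented for illustration.
X = [[1, 0, 1],
     [2, 0, 1],
     [0, 3, 1],
     [0, 1, 1]]
y = [0, 0, 1, 1]
print(chi2_scores(X, y))  # [3.0, 4.0, 0.0]
```

The third feature has the same value in every sample, so its observed and expected sums coincide and it scores 0, which is the kind of class-independent feature the test weeds out. Giving all-zero features a score of 0 is a choice of this sketch, not documented library behavior.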

Read more in the User Guide.

Parameters:

X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Sample vectors.

y : array-like, shape = (n_samples,)

Target vector (class labels).

Returns:

chi2 : array, shape = (n_features,)

chi2 statistics of each feature.

pval : array, shape = (n_features,)

p-values of each feature.
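A minimal usage sketch, assuming a scikit-learn installation where `chi2` and `SelectKBest` are importable from `sklearn.feature_selection`. The toy counts are invented for illustration. `chi2` returns the two arrays described above, and `SelectKBest` can take it as its scoring function to keep the highest-scoring features.

```python
from sklearn.feature_selection import SelectKBest, chi2

# Toy non-negative data (e.g. term counts), invented for illustration.
X = [[1, 0, 1],
     [2, 0, 1],
     [0, 3, 1],
     [0, 1, 1]]
y = [0, 0, 1, 1]

# One chi-squared statistic and one p-value per feature.
scores, pvalues = chi2(X, y)
print(scores)
print(pvalues)

# Keep the two features with the highest chi-squared scores.
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)
print(X_new.shape)             # (4, 2)
print(selector.get_support())  # mask of the kept features
```

The constant third feature gets the lowest score, so `SelectKBest` drops it and keeps the first two columns.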

See also

f_classif: ANOVA F-value between label/feature for classification tasks.
f_regression: F-value between label/feature for regression tasks.

Notes

The complexity of this algorithm is O(n_classes * n_features).
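As a rough guide to where the n_classes * n_features term comes from, the statistic can be written in the usual observed-versus-expected form below. This is a sketch under stated assumptions rather than a transcription of the implementation: X_{ij} is the value of feature j in sample i, n_c is the number of samples in class c, and n is the total number of samples. Once the per-class sums O_{cj} are available, evaluating the final sum touches each (class, feature) pair exactly once, and the p-value would then come from the chi-squared distribution with n_classes - 1 degrees of freedom.

$$
O_{cj} = \sum_{i:\,y_i = c} X_{ij}, \qquad
E_{cj} = \frac{n_c}{n} \sum_{i=1}^{n} X_{ij}, \qquad
\chi^2_j = \sum_{c} \frac{\left(O_{cj} - E_{cj}\right)^2}{E_{cj}}
$$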

Examples using `sklearn.feature_selection.chi2`¶

Classification of text documents using sparse features