API Reference¶
RithML contains four main modules, each corresponding to a category of machine learning algorithms. These categories and modules are listed below:
Classification (rithml.classification)
Regression (rithml.regression)
Dimensionality reduction (rithml.dimred)
Clustering (rithml.clustering)
Additionally, all classes in these modules inherit from the rithml.base.BaseModel class, which implements the rithml.base.BaseModel.get_params() and rithml.base.BaseModel.set_params() methods.
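As a rough illustration of what a get_params()/set_params() pair usually does, the sketch below reads parameter names from the `__init__` signature. It is not RithML's actual implementation: `SketchBaseModel` and `ToyModel` are hypothetical names, the optional `deep` argument is left out, and returning `self` from `set_params` is an assumption.

```python
import inspect


class SketchBaseModel:
    """Hypothetical stand-in for a base class; not RithML's actual code."""

    def get_params(self):
        # Read the __init__ signature and look up each parameter on the instance.
        names = [
            name
            for name in inspect.signature(type(self).__init__).parameters
            if name != "self"
        ]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        # Only names that appear in __init__ are accepted.
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name!r}")
            setattr(self, name, value)
        return self


class ToyModel(SketchBaseModel):
    """Hypothetical model whose __init__ stores its arguments unchanged."""

    def __init__(self, n_neighbors=5, *, weights="uniform"):
        self.n_neighbors = n_neighbors
        self.weights = weights


model = ToyModel()
model.set_params(n_neighbors=3)
params = model.get_params()
```

The pattern relies on every `__init__` argument being stored under an attribute of the same name.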
Classification¶
The rithml.classification module implements various machine learning algorithms for classification:
AdaBoost (rithml.classification.AdaBoostClassifier)
Decision tree (rithml.classification.DecisionTreeClassifier)
Linear/quadratic discriminant analysis (rithml.classification.DiscriminantAnalysis)
Gaussian naive Bayes (rithml.classification.GaussianNBClassifier)
Gradient boosting classification trees (rithml.classification.GradientBoostingClassifier)
K-nearest neighbors (rithml.classification.KNNClassifier)
Logistic regression (rithml.classification.LogisticRegression)
Random forest (rithml.classification.RandomForestClassifier)
Support vector machine (rithml.classification.SupportVectorClassifier)
- class rithml.classification.AdaBoostClassifier(n_estimators=20, *, max_depth=1, impurity='entropy', algorithm='SAMME.R', verbose=None)¶
Class for performing classification via AdaBoost (with classification trees as estimators).
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
n_classes: Number of classes in the training data.
Source: https://hastie.su.domains/Papers/samme.pdf
- Parameters
n_estimators (int, default 20) – Number of estimators used by the model.
max_depth (int, default 1) – Maximum depth of individual estimators (classification trees).
impurity ({'entropy', 'gini'}, default 'entropy') – Impurity function for assessing split quality.
algorithm ({'SAMME.R', 'SAMME'}, default 'SAMME.R') – Algorithm used by the model during fitting. The SAMME algorithm uses classifications to fit the model, whereas the SAMME.R algorithm uses weighted class probability estimates. The latter typically converges more quickly.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator has been fitted.
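For orientation, the discrete SAMME update from the paper linked above can be sketched as follows. This shows the published formula only (estimator weight log((1 - err) / err) + log(K - 1), followed by re-weighting of misclassified samples); the function names are illustrative and RithML's internal code may be organized differently.

```python
import math


def samme_estimator_weight(weighted_error, n_classes):
    """Estimator weight from the SAMME paper: log((1 - err) / err) + log(K - 1)."""
    return math.log((1.0 - weighted_error) / weighted_error) + math.log(n_classes - 1)


def samme_reweight(sample_weights, misclassified, alpha):
    """Scale misclassified samples by exp(alpha), then renormalize to sum to 1."""
    raw = [
        w * math.exp(alpha) if miss else w
        for w, miss in zip(sample_weights, misclassified)
    ]
    total = sum(raw)
    return [w / total for w in raw]


weights = [0.25, 0.25, 0.25, 0.25]
misclassified = [False, False, False, True]  # weighted error = 0.25
alpha = samme_estimator_weight(0.25, n_classes=3)
new_weights = samme_reweight(weights, misclassified, alpha)
```

With three classes, an estimator only needs a weighted error below 2/3 to receive a positive weight, which is the point of the extra log(K - 1) term.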
- classes_¶
Array of classes used by the model.
- Type
numpy.ndarray of shape (n_classes,)
- estimators_¶
List of all estimators used by the model.
- Type
- estimator_weights_¶
Array of weights of estimators used by the model.
- Type
numpy.ndarray of shape (n_estimators,)
Methods
fit(X, y)Fits an AdaBoost classifier to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits an AdaBoost classifier to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted AdaBoost classifier.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.classification.DecisionTreeClassifier(*, max_depth=None, impurity='entropy', class_weight=None)¶
Class for performing classification via decision tree.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
max_depth (int, default None) – Maximum depth of tree.
impurity ({'entropy', 'gini'}, default 'entropy') – Impurity function for assessing split quality.
class_weight (dict, default None) – Dictionary of certain classes (keys) and associated weights (values). For each specified class, the associated weight is applied to all corresponding training samples during fitting.
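The two impurity options can be sketched in a few lines. This is a minimal illustration, assuming base-2 logarithms for entropy and assuming that class weights simply scale each sample's contribution to its class proportion; the helper names are not part of RithML.

```python
import math
from collections import Counter


def class_proportions(labels, class_weight=None):
    """Weighted share of each class among the labels in a node."""
    class_weight = class_weight or {}
    totals = Counter()
    for label in labels:
        # Classes missing from class_weight get a weight of 1 (assumption).
        totals[label] += class_weight.get(label, 1.0)
    grand_total = sum(totals.values())
    return [t / grand_total for t in totals.values()]


def entropy(labels, class_weight=None):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    props = class_proportions(labels, class_weight)
    return -sum(p * math.log2(p) for p in props if p > 0)


def gini(labels, class_weight=None):
    """Gini impurity: 1 - sum(p ** 2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels, class_weight))


node = ["a", "a", "b", "b"]
balanced_entropy = entropy(node)
balanced_gini = gini(node)
weighted_gini = gini(node, class_weight={"b": 3.0})
```

Both functions are 0 for a pure node and largest when classes are evenly mixed, so a split is better when it lowers the weighted impurity of the child nodes.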
- classes_¶
Array of all classes assumed by the model, where n_classes is the number of classes.
- Type
numpy.ndarray of shape (n_classes,)
- root_¶
Root node of underlying decision tree.
- Type
_DTCNode
Methods
fit(X, y[, weights])Fits a decision tree classifier to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X[, return_probabilities])Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y, weights=None)¶
Fits a decision tree classifier to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
weights (numpy.ndarray of shape (n_samples,), default None) – Weights for training samples. (This is different from the class_weight attribute, which applies weights by class instead of sample.) If None, then samples are weighted uniformly. These weights are combined with class weights to influence calculations of node impurity and probability estimates (if applicable).
- Returns
self – Fitted decision tree classifier.
- Return type
- predict(X, return_probabilities=False)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
return_probabilities (bool, default False) – If True, also return the probability estimates for the predictions.
- Returns
y_pred (numpy.ndarray of shape (n_test_samples,)) – Predicted labels.
probabilities (numpy.ndarray of shape (n_test_samples,), optional) – Probability estimates for predictions. That is, an array of dictionaries, where each dictionary contains classes (keys) and probability estimates (values) for a particular sample.
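Because the probability estimates are described as one dictionary per sample, a little post-processing is usually needed to work with them. The sketch below uses made-up values in that format (a plain list stands in for the array) and extracts the most probable class for each sample.

```python
# Hypothetical output in the documented format: one dict per test sample.
probabilities = [
    {"cat": 0.7, "dog": 0.3},
    {"cat": 0.1, "dog": 0.9},
]

# Most probable class and its estimate for each sample.
top = [max(p.items(), key=lambda item: item[1]) for p in probabilities]
```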
- class rithml.classification.DiscriminantAnalysis(kind='linear')¶
Class for performing classification via linear/quadratic discriminant analysis.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
n_classes: Number of classes in the training data.
- Parameters
kind ({'linear', 'quadratic'}, default 'linear') – Type of discriminant analysis performed by the model. If ‘linear’, the model assumes equal covariance matrices for each class. If ‘quadratic’, the model assumes distinct covariance matrices for each class.
- classes_¶
Array of all classes assumed by the model.
- Type
numpy.ndarray of shape (n_classes,)
- probabilities_¶
Array of probabilities (priors) of each class.
- Type
numpy.ndarray of shape (n_classes,)
- means_¶
Array of means of each class.
- Type
numpy.ndarray of shape (n_classes, n_features)
- covariances_¶
Covariance matrices used by the model, depending on kind. If kind == ‘linear’, then all covariance matrices are the same (by assumption).
- Type
numpy.ndarray of shape (n_classes, n_features, n_features)
Methods
fit(X, y)Fits a discriminant analysis model to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a discriminant analysis model to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted discriminant analysis model.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.classification.GaussianNBClassifier(*, equal_variances=False)¶
Class for performing classification via Gaussian naive Bayes.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
n_classes: Number of classes in the training data.
- Parameters
equal_variances (bool, default False) – If True, assume that each feature has the same variance for all classes.
- classes_¶
Array of all classes assumed by the model.
- Type
numpy.ndarray of shape (n_classes,)
- probabilities_¶
Array of probabilities (priors) of each class.
- Type
numpy.ndarray of shape (n_classes,)
- means_¶
Array of means of each class.
- Type
numpy.ndarray of shape (n_classes, n_features)
- variances_¶
Array of feature variances of each class, depending on equal_variances. If equal_variances is True, then feature variances are the same across all classes (by assumption).
- Type
numpy.ndarray of shape (n_classes, n_features)
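To show how priors, means, and variances like the attributes above combine into a prediction, here is a minimal sketch of the standard Gaussian naive Bayes decision rule. It uses plain lists in place of arrays and illustrative names; it is not taken from RithML's source.

```python
import math


def gaussian_log_pdf(x, mean, variance):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)


def predict_one(x, classes, priors, means, variances):
    """Return the class with the highest log posterior for one sample x."""
    best_class, best_score = None, -math.inf
    for k, label in enumerate(classes):
        # Naive assumption: features are independent given the class,
        # so per-feature log densities are simply added.
        score = math.log(priors[k]) + sum(
            gaussian_log_pdf(xj, means[k][j], variances[k][j])
            for j, xj in enumerate(x)
        )
        if score > best_score:
            best_class, best_score = label, score
    return best_class


classes = ["low", "high"]
priors = [0.5, 0.5]                    # plays the role of probabilities_
means = [[0.0, 0.0], [4.0, 4.0]]       # plays the role of means_
variances = [[1.0, 1.0], [1.0, 1.0]]   # plays the role of variances_

near_origin = predict_one([0.5, -0.2], classes, priors, means, variances)
far_away = predict_one([3.0, 5.0], classes, priors, means, variances)
```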
Methods
fit(X, y)Fits a Gaussian naive Bayes classifier to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a Gaussian naive Bayes classifier to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted Gaussian naive Bayes classifier.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.classification.GradientBoostingClassifier(n_estimators=100, *, learning_rate=0.1, max_depth=3, impurity='entropy', verbose=None)¶
Class for performing classification via gradient boosting (with classification trees as estimators).
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
Adapted from: https://sefiks.com/2018/10/29/a-step-by-step-gradient-boosting-example-for-classification/
- Parameters
n_estimators (int, default 100) – Number of estimators used by the model.
learning_rate (float, default 0.1) – Rate at which each additional estimator contributes to the model.
max_depth (int, default 3) – Maximum depth of individual estimators.
impurity ({'entropy', 'gini'}, default 'entropy') – Impurity function for assessing split quality.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator is being fitted (for each class).
- classes_¶
Array of all classes assumed by the model, where n_classes is the number of classes.
- Type
numpy.ndarray of shape (n_classes,)
- estimators_¶
Dictionary of all classes (keys) and estimator lists (values).
- Type
dict
Methods
fit(X, y)Fits a gradient boosting classifier to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a gradient boosting classifier to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted gradient boosting classifier.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.classification.KNNClassifier(n_neighbors=5, *, weights='uniform')¶
Class for performing classification via k-nearest neighbors (k-NN).
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
n_neighbors (int, default 5) – Number of nearest neighbors to consider.
weights ({'uniform', 'distance'}, default 'uniform') – Weights assigned to nearest neighbors, either uniform (‘uniform’) or based on inverse distance (‘distance’).
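The difference between the two weighting schemes is easiest to see on a small example. The sketch below assumes Euclidean distance and a vote of 1/distance per neighbor in 'distance' mode (with an exact match dominating the vote); these details are assumptions, and the function is not RithML's implementation.

```python
import math
from collections import defaultdict


def knn_predict(X_train, y_train, x, n_neighbors=5, weights="uniform"):
    """Predict the label of x by voting among its nearest training samples."""
    # Sort training samples by Euclidean distance to the query point.
    neighbors = sorted(
        (math.dist(row, x), label) for row, label in zip(X_train, y_train)
    )[:n_neighbors]
    votes = defaultdict(float)
    for distance, label in neighbors:
        if weights == "uniform":
            votes[label] += 1.0
        else:  # 'distance': closer neighbors count more
            votes[label] += math.inf if distance == 0 else 1.0 / distance
    return max(votes, key=votes.get)


X_train = [[0.0], [2.0], [2.1]]
y_train = ["b", "a", "a"]
query = [0.1]

uniform_label = knn_predict(X_train, y_train, query, n_neighbors=3)
distance_label = knn_predict(X_train, y_train, query, n_neighbors=3, weights="distance")
```

With uniform weights the two distant "a" samples outvote the single nearby "b" sample; with inverse-distance weights the nearby sample wins.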
- classes_¶
Array of all classes assumed by the model, where n_classes is the number of classes.
- Type
numpy.ndarray of shape (n_classes,)
Methods
fit(X, y)Fits a k-NN classifier to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a k-NN classifier to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted k-NN classifier.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.classification.LogisticRegression(*, alpha=0, tol=0.01, verbose=False)¶
Class for performing logistic regression.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
n_classes: Number of classes in the training data.
- Parameters
alpha (float, default 0) – Regularization coefficient (strength). The regularization type is L2.
tol (float, default 0.01) – Tolerance argument for logistic loss minimization. (A smaller value will increase runtime but may improve model performance.)
verbose (bool, default False) – If True, output details about progress and time elapsed during fitting.
- classes_¶
Array of all classes assumed by the model, where n_classes is the number of classes.
- Type
numpy.ndarray of shape (n_classes,)
- weight_¶
Feature weights used in the decision function. If the labels are not binary (i.e. n_classes > 2), then this is a 2-D array with weights for each class. Otherwise, its shape is (n_features,).
- Type
numpy.ndarray of shape (n_classes, n_features) or (n_features,)
- bias_¶
Bias(es) used in the decision function. If the labels are not binary (i.e. n_classes > 2), then this is an array with a bias for each class. Otherwise, it is a single bias (float).
- Type
numpy.ndarray of shape (n_classes,) or float
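The two documented shapes of the weights and bias suggest two decision rules, sketched below with plain lists. The details are assumptions rather than RithML's code: the binary case thresholds a sigmoid at 0.5 and treats the second class as the positive one, and the multiclass case picks the class with the largest linear score.

```python
import math


def predict_binary(x, weight, bias, classes):
    """Binary case: weight has shape (n_features,), bias is a float."""
    z = sum(w * xi for w, xi in zip(weight, x)) + bias
    probability = 1.0 / (1.0 + math.exp(-z))  # sigmoid of the linear score
    return classes[1] if probability >= 0.5 else classes[0]


def predict_multiclass(x, weight, bias, classes):
    """Multiclass case: weight has one row per class, bias one entry per class."""
    scores = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]
    return classes[scores.index(max(scores))]


binary_yes = predict_binary([2.0, 0.0], [1.0, -1.0], -1.0, ["no", "yes"])
binary_no = predict_binary([0.0, 0.0], [1.0, -1.0], -1.0, ["no", "yes"])
multi = predict_multiclass(
    [0.2, 3.0],
    [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    [0.0, 0.0, 0.0],
    ["red", "green", "blue"],
)
```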
Methods
fit(X, y)Fits a logistic regression model to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a logistic regression model to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted logistic regression model.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.classification.RandomForestClassifier(n_estimators=100, *, max_depth=None, impurity='entropy', random_state=None, max_samples=None, max_features='sqrt', bootstrap=True, verbose=None)¶
Class for performing classification via random forest.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
n_estimators (int, default 100) – Number of estimators used by the model.
max_depth (int, default None) – Maximum depth of individual estimators.
impurity ({'entropy', 'gini'}, default 'entropy') – Impurity function for assessing split quality.
bootstrap (bool, default True) – If True, use bootstrapping, i.e. re-sample new datasets for each estimator. If False, use the original dataset to fit each estimator (ignoring max_samples).
random_state (int, numpy.random.RandomState, or numpy.random.Generator, default None) –
Object used for random processes during fitting, i.e.
(1) drawing samples with replacement to create n_estimators new datasets, based on max_samples (when bootstrap == True)
(2) selecting a subset of features for each such dataset, based on max_features
If None, then a new Generator object is created (i.e. with a fresh seed).
If int, then a new Generator object is created with the specified int as the seed.
If RandomState or Generator, then that object is directly used.
max_samples (callable or int, default None) –
The number of samples to draw from the original dataset X to create each new dataset during bootstrapping (when bootstrap == True), one for each estimator.
If None, then n_samples samples are drawn.
If callable, then max_samples(n_samples) samples are drawn. (For example, this can be used to draw a specified proportion of n_samples samples.)
If int, then max_samples samples are drawn.
max_features ({'sqrt', 'log2'}, callable, int, or None, default 'sqrt') –
The number of features used by each estimator.
If None, then n_features features are used.
If ‘sqrt’, then sqrt(n_features) features are used.
If ‘log2’, then log2(n_features) features are used.
If callable, then max_features(n_features) features are used. (For example, this can be used to use a specified proportion of n_features features.)
If int, then max_features features are used.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator has been fitted.
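The rules for max_samples and max_features can be summarized in a small helper. The sketch below is an illustration under stated assumptions: fractional results are truncated with `int()`, features are drawn without replacement, and Python's `random.Random` stands in for the NumPy generator objects that random_state accepts. None of the names are RithML's.

```python
import math
import random


def resolve_count(spec, n):
    """Turn a max_samples / max_features style argument into a count."""
    if spec is None:
        return n
    if spec == "sqrt":
        return max(1, int(math.sqrt(n)))
    if spec == "log2":
        return max(1, int(math.log2(n)))
    if callable(spec):
        return int(spec(n))
    return int(spec)


def draw_bootstrap(n_samples, n_features, max_samples, max_features, rng):
    """Pick row indices (with replacement) and feature indices for one estimator."""
    n_rows = resolve_count(max_samples, n_samples)
    n_cols = resolve_count(max_features, n_features)
    rows = [rng.randrange(n_samples) for _ in range(n_rows)]  # with replacement
    cols = sorted(rng.sample(range(n_features), n_cols))      # without replacement
    return rows, cols


rng = random.Random(0)  # fixed seed so the draw is repeatable
rows, cols = draw_bootstrap(100, 16, lambda n: n // 2, "sqrt", rng)
```

Passing a callable such as `lambda n: n // 2` is how a proportion of the samples or features can be requested.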
- classes_¶
Array of all classes assumed by the model, where n_classes is the number of classes.
- Type
numpy.ndarray of shape (n_classes,)
- estimators_¶
Dictionary of all DecisionTreeClassifier estimators (keys) and arrays of features used (values).
- Type
dict of rithml.classification.DecisionTreeClassifier to numpy.ndarray
Methods
fit(X, y)Fits a random forest classifier to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a random forest classifier to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted random forest classifier.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.classification.SupportVectorClassifier(*, kind='ovr', C=1.0, kernel='rbf', degree=3, gamma=1.0, coef0=1.0, error_coef=1e-06, verbose=False)¶
Class for performing classification via support vector machine (SVM).
Note: The default parameter values may result in a poor model. If so, it is advised to change these values from their defaults, especially C or gamma.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
n_classes: Number of classes in the training data.
- Parameters
kind ({'ovr', 'ovo'}, default 'ovr') –
Specifies how to create the model’s underlying binary classifier(s).
If ‘ovr’, then the model uses one-vs-rest classification. That is, for each class, the model transforms the labels to binary data based on that class and fits a binary classifier to the new data, resulting in a total of n_classes underlying binary classifiers.
If ‘ovo’, then the model uses one-vs-one classification. That is, for each pair of classes, the model fits a binary classifier to the subset of input data corresponding to those two classes, resulting in a total of n_classes * (n_classes - 1) / 2 underlying binary classifiers. Note that this may still be faster than ‘ovr’ due to smaller training sets.
If the labels are binary (i.e. n_classes == 2), then this parameter is ignored, and the model fits a single binary classifier.
C (float, default 1.0) – Regularization constant. Must be positive; a lower value means more regularization. Used by all underlying binary classifiers.
kernel ({'rbf', 'linear', 'poly'} or callable, default 'rbf') – Determines kernel function used by all underlying binary classifiers. If a function is provided, then it must take in two arrays of feature vectors and compute an array of floats.
degree (int, default 3) – Degree of polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
gamma (float, default 1.0) – Gamma parameter for polynomial and radial basis function (RBF) kernels. If kernel is not ‘poly’ or ‘rbf’, then this parameter is ignored.
coef0 (float, default 1.0) – Constant term used in polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
error_coef (float, default 1e-6) – Coefficient for the margin of error (C * error_coef) used to determine support vectors from coefficient values. A smaller value represents a stricter threshold and may result in fewer support vectors.
verbose (bool, default False) – If True, output details about progress and time elapsed during fitting.
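The two multiclass strategies selected by kind differ in how the training data is split into binary problems. The sketch below only builds those problems (it does not fit any classifier); the +1/-1 label encoding and the function names are assumptions for illustration.

```python
from itertools import combinations


def ovr_problems(y):
    """One-vs-rest: one binary label vector per class, over all samples."""
    classes = sorted(set(y))
    return {c: [1 if label == c else -1 for label in y] for c in classes}


def ovo_problems(y):
    """One-vs-one: for each pair of classes, the indices of samples in either class."""
    classes = sorted(set(y))
    return {
        (a, b): [i for i, label in enumerate(y) if label in (a, b)]
        for a, b in combinations(classes, 2)
    }


y = ["a", "b", "c", "a", "c", "d"]
ovr = ovr_problems(y)  # n_classes problems
ovo = ovo_problems(y)  # n_classes * (n_classes - 1) / 2 problems
```

Each one-vs-one problem sees only the samples of its two classes, which is why it can train faster despite the larger number of classifiers.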
- classes_¶
Array of all classes assumed by the model.
- Type
numpy.ndarray of shape (n_classes,)
Methods
fit(X, y)Fits a support vector classifier to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a support vector classifier to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted support vector classifier.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
Regression¶
The rithml.regression module implements various machine
learning algorithms for regression:
Decision tree (rithml.regression.DecisionTreeRegressor)
Gradient boosting regression trees (rithml.regression.GradientBoostingRegressor)
Kernel (ridge) regression (rithml.regression.KernelRegression)
K-nearest neighbors (rithml.regression.KNNRegressor)
Linear (ridge) regression (rithml.regression.LinearRegression)
Random forest (rithml.regression.RandomForestRegressor)
Support vector machine (rithml.regression.SupportVectorRegressor)
- class rithml.regression.DecisionTreeRegressor(max_depth=None, error='squared')¶
Class for performing regression via decision tree.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
max_depth (int, default None) – Maximum depth of tree.
error ({'squared', 'absolute'}, default 'squared') – Error function for assessing split quality.
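A common way to define the two error options is sketched below: squared error measured around the mean of a node and absolute error measured around its median, with a split scored by the size-weighted error of its children. Whether RithML uses exactly these definitions is an assumption; the helper names are illustrative.

```python
from statistics import mean, median


def squared_error(values):
    """Mean squared deviation from the node mean."""
    center = mean(values)
    return mean((v - center) ** 2 for v in values)


def absolute_error(values):
    """Mean absolute deviation from the node median."""
    center = median(values)
    return mean(abs(v - center) for v in values)


def split_error(left, right, error_fn):
    """Size-weighted error of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return (len(left) * error_fn(left) + len(right) * error_fn(right)) / n


targets = [1.0, 2.0, 3.0, 10.0]
parent_squared = squared_error(targets)
parent_absolute = absolute_error(targets)
after_split = split_error([1.0, 2.0, 3.0], [10.0], squared_error)
```

The absolute option is less sensitive to the outlying target (10.0), which is the usual reason to prefer it on noisy data.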
- root_¶
Root node of underlying decision tree.
- Type
_DTRNode
Methods
fit(X, y[, weights])Fits a decision tree regressor to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y, weights=None)¶
Fits a decision tree regressor to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
weights (numpy.ndarray of shape (n_samples,), default None) – Weights for training samples. If None, then samples are weighted uniformly.
- Returns
self – Fitted decision tree regressor.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.regression.GradientBoostingRegressor(n_estimators=100, *, learning_rate=0.1, max_depth=3, error='squared', verbose=None)¶
Class for performing regression via gradient boosting (with regression trees as estimators).
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
n_estimators (int, default 100) – Number of estimators used by the model.
learning_rate (float, default 0.1) – Rate at which each additional estimator contributes to the model.
max_depth (int, default 3) – Maximum depth of individual estimators.
error ({'squared', 'absolute'}, default 'squared') – Error function for assessing split quality.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator has been fitted.
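The role of learning_rate is easiest to see with the simplest possible estimator. The sketch below boosts constants (a depth-0 stand-in for regression trees) on squared-error residuals, starting from a prediction of 0; the starting point and all names are assumptions, not RithML's implementation.

```python
def fit_boosted_constants(y, n_estimators=100, learning_rate=0.1):
    """Boost depth-0 "trees" (constants) on squared-error residuals."""
    prediction = 0.0
    estimators = []
    for _ in range(n_estimators):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [target - prediction for target in y]
        constant = sum(residuals) / len(residuals)  # best constant fit to residuals
        estimators.append(constant)
        prediction += learning_rate * constant      # shrunken contribution
    return estimators, prediction


y = [1.0, 2.0, 3.0]
estimators, slow = fit_boosted_constants(y, n_estimators=100, learning_rate=0.1)
_, fast = fit_boosted_constants(y, n_estimators=1, learning_rate=1.0)
```

Each round closes a fraction learning_rate of the remaining gap, so a smaller rate needs more estimators to reach the same fit.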
- estimators_¶
List of estimators used by the model.
- Type
Methods
fit(X, y)Fits a gradient boosting regressor to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a gradient boosting regressor to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted gradient boosting regressor.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.regression.KNNRegressor(n_neighbors=5, *, weights='uniform')¶
Class for performing regression via k-nearest neighbors (k-NN).
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
n_neighbors (int, default 5) – Number of nearest neighbors to consider.
weights ({'uniform', 'distance'}, default 'uniform') – Weights assigned to nearest neighbors, either uniform (‘uniform’) or based on inverse distance (‘distance’).
Methods
fit(X, y)Fits a k-NN regressor to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a k-NN regressor to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted k-NN regressor.
- Return type
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.regression.KernelRegression(*, alpha=1.0, kernel='rbf', degree=3, gamma=1.0, coef0=1.0, nonzero_bias=True)¶
Class for performing kernel regression.
This class supports both ordinary regression (no regularization) and ridge regression (L2 regularization).
Note: The default parameter values may result in a poor model. If so, it is advised to change these values from their defaults, especially alpha or gamma.
Note that attempting to perform ordinary regression (with alpha=0) may result in a singular matrix error during fitting. For this reason, it may be better to use a very low alpha value instead.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
Adapted from: https://statinfer.wordpress.com/2013/08/05/undocumented-machine-learning-ii-kernel-regression/
- Parameters
alpha (float, default 1.0) – Regularization coefficient (strength). The regularization type is L2 (ridge regression).
kernel ({'rbf', 'linear', 'poly'} or callable, default 'rbf') – Determines kernel function used by the model. If a function is provided, then it must take in two feature vectors and compute a float.
degree (int, default 3) – Degree of polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
gamma (float, default 1.0) – Gamma parameter for polynomial and radial basis function (RBF) kernels. If kernel is not ‘poly’ or ‘rbf’, then this parameter is ignored.
coef0 (float, default 1.0) – Constant term used in polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
nonzero_bias (bool, default True) – If False, the model assumes a bias of 0.
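For reference, the built-in kernel names usually correspond to the formulas sketched below, and a custom kernel for this class is any callable that maps two feature vectors to a float. The exact formulas inside RithML are not confirmed here; these follow the common convention, and `laplacian_kernel` is a hypothetical custom kernel.

```python
import math


def linear_kernel(x, z):
    """Plain dot product <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def poly_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, z> + coef0) ** degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def laplacian_kernel(x, z):
    """Hypothetical custom kernel: exp(-||x - z||_1)."""
    return math.exp(-sum(abs(a - b) for a, b in zip(x, z)))


u, v = [1.0, 2.0], [3.0, 4.0]
k_linear = linear_kernel(u, v)
k_poly = poly_kernel(u, v, degree=2)
k_rbf_same = rbf_kernel(u, u)
k_custom = laplacian_kernel(u, v)
```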
- weight_¶
Weight term associated with model.
- Type
numpy.ndarray of shape (n_samples,)
- bias_¶
Bias term associated with model.
- Type
float
- mean_¶
Mean of the training data.
- Type
numpy.ndarray of shape (n_features,)
Methods
fit(X, y[, K])Fits a kernel regression model to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X[, K_pred])Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y, K=None)¶
Fits a kernel regression model to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
K (numpy.ndarray of shape (n_samples, n_samples), default None) – Kernel matrix, i.e. kernel result for every pair of training samples. If nonzero_bias is True, then these training samples should be mean-centered before applying the kernel function to them. If None, the model computes the kernel matrix itself.
- Returns
self – Fitted kernel regression model.
- Return type
- predict(X, K_pred=None)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
K_pred (numpy.ndarray of shape (n_test_samples, n_samples), default None) – Kernel matrix, i.e. kernel result for every test sample with every training sample. If nonzero_bias is True, then all samples should be mean-centered (based on the training mean, i.e. the mean_ attribute) before applying the kernel function to them. If None, the model computes the kernel matrix itself.
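When kernel matrices are computed outside the model, the centering described above has to be done by hand. The sketch below builds both matrices with plain lists and a dot-product kernel, centering the test samples with the training mean; the helper names are illustrative and nothing here calls RithML.

```python
def column_means(X):
    """Per-feature mean of the rows of X."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def center(X, mean):
    """Subtract the given mean from every row."""
    return [[v - m for v, m in zip(row, mean)] for row in X]


def kernel_matrix(A, B, kernel):
    """Kernel value for every row of A against every row of B."""
    return [[kernel(a, b) for b in B] for a in A]


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


X_train = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
X_test = [[1.0, 1.0]]

train_mean = column_means(X_train)      # plays the role of mean_
Xc_train = center(X_train, train_mean)
Xc_test = center(X_test, train_mean)    # centered with the training mean

K = kernel_matrix(Xc_train, Xc_train, dot)      # (n_samples, n_samples)
K_pred = kernel_matrix(Xc_test, Xc_train, dot)  # (n_test_samples, n_samples)
```

Note the orientation of K_pred: rows index test samples and columns index training samples.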
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.regression.LinearRegression(alpha=0)¶
Class for performing linear (least squares) regression.
This class supports both ordinary regression (no regularization) and ridge regression (L2 regularization).
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
alpha (float, default 0) – Regularization coefficient (strength). The regularization type is L2 (ridge regression).
- weight_¶
Weight term associated with model.
- Type
numpy.ndarray of shape (n_features,)
- bias_¶
Bias term associated with model.
- Type
float
Methods
fit(X, y)Fits a linear regression model to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a linear regression model to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted linear regression model.
- Return type
rithml.regression.LinearRegression
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
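To make the role of alpha concrete, the following sketch solves the L2-regularized normal equations in closed form with NumPy. It is not RithML's implementation: the helper name ridge_fit is hypothetical, and leaving the bias unpenalized (by centering the data first) is an assumption.

```python
import numpy as np

def ridge_fit(X, y, alpha=0.0):
    """Illustrative closed-form least squares with an L2 penalty on the weights."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector.
    n_features = X.shape[1]
    weight = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    # Assumption: the bias is recovered from the means and is not penalized.
    bias = y_mean - x_mean @ weight
    return weight, bias

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0        # noiseless linear target

w_ols, b_ols = ridge_fit(X, y, alpha=0.0)       # alpha=0: ordinary least squares
w_ridge, b_ridge = ridge_fit(X, y, alpha=10.0)  # alpha>0: weights shrink toward zero
```

With alpha=0 the sketch recovers the generating coefficients. A larger alpha trades some fit for smaller weights.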
- class rithml.regression.RandomForestRegressor(n_estimators=100, *, max_depth=None, error='squared', random_state=None, max_samples=None, max_features=None, bootstrap=True, verbose=None)¶
Class for performing regression via random forest.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
n_estimators (int, default 100) – Number of estimators used by the model.
max_depth (int, default None) – Maximum depth of individual estimators.
error ({'squared', 'absolute'}, default 'squared') – Error function for assessing split quality.
bootstrap (bool, default True) – If True, use bootstrapping, i.e. re-sample new datasets for each estimator. If False, use the original dataset to fit each estimator (ignoring max_samples).
random_state (int, numpy.random.RandomState, or numpy.random.Generator, default None) –
Object used for random processes during fitting, i.e.
(1) drawing samples with replacement to create n_estimators new datasets, based on max_samples (when bootstrap == True)
(2) selecting a subset of features for each such dataset, based on max_features
If None, then a new Generator object is created (i.e. with a fresh seed).
If int, then a new Generator object is created with the specified int as the seed.
If RandomState or Generator, then that object is directly used.
max_samples (callable or int, default None) –
The number of samples to draw from the original dataset X to create each new dataset during bootstrapping (when bootstrap == True), one for each estimator.
If None, then n_samples samples are drawn.
If callable, then max_samples(n_samples) samples are drawn. (For example, this can be used to draw a specified proportion of n_samples samples.)
If int, then max_samples samples are drawn.
max_features ({'sqrt', 'log2'}, callable, or int, default None) –
The number of features used by each estimator.
If None, then n_features features are used.
If ‘sqrt’, then sqrt(n_features) features are used.
If ‘log2’, then log2(n_features) features are used.
If callable, then max_features(n_features) features are used. (For example, this can be used to use a specified proportion of n_features features.)
If int, then max_features features are used.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator has been fitted.
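The accepted forms of max_samples and max_features can be summarized in a small sketch. The helper resolve_count is hypothetical, and rounding sqrt and log2 down to an integer (with a floor of 1) is an assumption; the documentation above does not state how RithML rounds.

```python
import math

def resolve_count(spec, total):
    """Hypothetical helper: turn a max_samples/max_features-style value into an int."""
    if spec is None:
        return total                              # use everything
    if spec == 'sqrt':
        return max(1, int(math.sqrt(total)))      # assumed: round down
    if spec == 'log2':
        return max(1, int(math.log2(total)))      # assumed: round down
    if callable(spec):
        return int(spec(total))                   # e.g. a proportion of the total
    return int(spec)                              # plain integer

n_samples, n_features = 200, 16

def half_of(n):
    """Example callable: draw half of the available samples."""
    return n // 2

samples_per_tree = resolve_count(half_of, n_samples)
features_sqrt = resolve_count('sqrt', n_features)
features_log2 = resolve_count('log2', n_features)
features_all = resolve_count(None, n_features)
features_int = resolve_count(5, n_features)
```

A callable such as half_of is the natural way to request a proportion, since an int always means an absolute count.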
- estimators_¶
Dictionary of all DecisionTreeRegressor estimators (keys) and arrays of features used (values).
- Type
dict of rithml.regression.DecisionTreeRegressor to numpy.ndarray
Methods
fit(X, y)Fits a random forest regressor to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a random forest regressor to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted random forest regressor.
- Return type
rithml.regression.RandomForestRegressor
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
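The random_state rules described for this class can be sketched as follows. The helper resolve_rng is hypothetical and only mirrors the documented behavior (None, int, or an existing RandomState/Generator). The bootstrap draw afterwards shows why a fixed integer seed makes the resampling reproducible.

```python
import numpy as np

def resolve_rng(random_state=None):
    """Hypothetical helper mirroring the documented random_state rules."""
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state                    # use the object directly
    # None -> Generator with a fresh seed; int -> Generator with that seed.
    return np.random.default_rng(random_state)

n_samples = 10
rng_a = resolve_rng(42)
rng_b = resolve_rng(42)

# One bootstrap sample: row indices drawn with replacement.
idx_a = rng_a.choice(n_samples, size=n_samples, replace=True)
idx_b = rng_b.choice(n_samples, size=n_samples, replace=True)

# An existing RandomState is passed through unchanged.
legacy = np.random.RandomState(0)
same_object = resolve_rng(legacy) is legacy
```

Two generators built from the same integer seed produce the same indices, so two forests fitted with the same random_state should see the same resampled datasets.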
- class rithml.regression.SupportVectorRegressor(*, C=1.0, kernel='rbf', degree=3, gamma=1.0, coef0=1.0, epsilon=1.0, error_coef=1e-06)¶
Class for performing regression via support vector machine (SVM). Note: The default parameter values may result in a poor model. If so, it is advised to change these values from their defaults, especially C or gamma.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
Adapted from: Smola, Alex J.; Scholkopf, Bernhard (2004). “A tutorial on support vector regression” (PDF). Statistics and Computing. 14 (3): 199-222.
(https://alex.smola.org/papers/2004/SmoSch04.pdf)
- Parameters
C (float, default 1.0) – Regularization constant. Must be positive; a lower value means more regularization.
kernel ({'rbf', 'linear', 'poly'} or callable, default 'rbf') – Determines kernel function used by the model. If a function is provided, then it must take in two feature vectors and compute a float.
degree (int, default 3) – Degree of polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
gamma (float, default 1.0) – Gamma parameter for polynomial and radial basis function (RBF) kernels. If kernel is not ‘poly’ or ‘rbf’, then this parameter is ignored.
coef0 (float, default 1.0) – Constant term used in polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
epsilon (float, default 1.0) – Size of margin used by the model. Penalties are based only on the errors of predictions for training samples outside this margin, i.e. the support vectors.
error_coef (float, default 1e-6) – Coefficient for margin for error (C * error_coef) in determining support vectors when assessing coefficient values. A smaller value represents a stricter threshold and may result in fewer support vectors.
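The kernel, degree, gamma and coef0 parameters interact as sketched below. The formulas are the common textbook conventions and are assumed here, not taken from RithML's source: the polynomial kernel is (gamma * <a, b> + coef0) ** degree and the RBF kernel is exp(-gamma * ||a - b||^2).

```python
import math

def linear_kernel(a, b):
    """Dot product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def poly_kernel(a, b, degree=3, gamma=1.0, coef0=1.0):
    """Assumed form: (gamma * <a, b> + coef0) ** degree."""
    return (gamma * linear_kernel(a, b) + coef0) ** degree

def rbf_kernel(a, b, gamma=1.0):
    """Assumed form: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

u, v = [1.0, 2.0], [3.0, 0.5]
k_lin = linear_kernel(u, v)            # 1*3 + 2*0.5
k_poly = poly_kernel(u, v, degree=2)   # (k_lin + 1) squared
k_rbf_same = rbf_kernel(u, u)          # identical vectors
k_rbf_diff = rbf_kernel(u, v)          # decays with squared distance
```

Each function takes two feature vectors and returns a float, which is the contract stated above for a callable kernel.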
- weight_¶
Function for computing the weight term of a prediction (via the kernel trick). Takes in a feature vector and outputs a float.
- Type
callable
- bias_¶
Bias term associated with the model.
- Type
float
Methods
fit(X, y)Fits a support vector regressor to data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels given input data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X, y)¶
Fits a support vector regressor to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
- Returns
self – Fitted support vector regressor.
- Return type
rithml.regression.SupportVectorRegressor
- predict(X)¶
Predicts labels given input data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
- Returns
y_pred – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
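The effect of the epsilon margin on training errors can be illustrated with the epsilon-insensitive loss from the tutorial cited for this class. This is a standalone sketch with made-up numbers, not a call into RithML.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=1.0):
    """Zero inside the epsilon margin, growing linearly outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 5.0, 2.0]
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=1.0)

# Absolute residuals are 0.5, 0.0, 2.0, 2.0, so only the last two
# samples fall outside the margin and contribute to the penalty.
outside_margin = [i for i, loss in enumerate(losses) if loss > 0]
```

Increasing epsilon widens the margin, so fewer samples are penalized and the fitted function tends to be flatter.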
Dimensionality Reduction¶
The rithml.dimred module implements various machine learning
algorithms for dimensionality reduction:
Kernel principal components analysis (rithml.dimred.KernelPCA)
Principal components analysis (rithml.dimred.PCA)
- class rithml.dimred.KernelPCA(n_components=None, *, kernel='rbf', degree=3, gamma=1.0, coef0=1.0, fit_inverse_transform=False, alpha=1.0)¶
Class for performing principal components analysis (PCA) using kernel methods.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
n_components (int, default None) – Number of principal components kept and used by the model. If None, then all components are kept.
kernel ({'rbf', 'linear', 'poly'} or callable, default 'rbf') – Determines kernel function used by the model. If a function is provided, then it must take in two feature vectors and compute a float.
degree (int, default 3) – Degree of polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
gamma (float, default 1.0) – Gamma parameter for polynomial and radial basis function (RBF) kernels. If kernel is not ‘poly’ or ‘rbf’, then this parameter is ignored.
coef0 (float, default 1.0) – Constant term used in polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
fit_inverse_transform (bool, default False) – If True, fit the regressors for inverse transformation during fitting. Note that this takes additional time.
alpha (float, default 1.0) – Regularization coefficient (strength) for regressors used for inverse transformation. If fit_inverse_transform == False, then this is ignored.
- components_¶
Array of components used by the model, where n_components is the number of components (specified in the constructor).
- Type
numpy.ndarray of shape (n_components, n_samples)
- regressors_¶
List of regressors (see rithml.regression.KernelRegression) used for inverse transformation. Only created if fit_inverse_transform == True.
- Type
list of rithml.regression.KernelRegression
Methods
fit(X)Fits a kernel PCA model to data.
fit_transform(X)Fits the model to data and then reduces their dimension.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
inverse_transform(Z)Reconstructs transformed data into their original dimension, if supported (i.e. if fit_inverse_transform is set to True).
set_params(**params)Sets the specified __init__ parameters to the specified values.
transform(X)Reduces the dimension of data using the model.
- fit(X)¶
Fits a kernel PCA model to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
- Returns
self – Fitted kernel PCA model.
- Return type
rithml.dimred.KernelPCA
- fit_transform(X)¶
Fits the model to data and then reduces their dimension.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Data to fit to and reduce dimension of.
- Returns
Z – Transformed data.
- Return type
numpy.ndarray of shape (n_samples, n_components)
- inverse_transform(Z)¶
Reconstructs transformed data into their original dimension, if supported (i.e. if fit_inverse_transform is set to True).
This is performed via kernel regression (see rithml.regression.KernelRegression), where the regressors are fitted to the original training data using the transformed training data as features.
n_test_samples refers to the number of samples in the input data.
- Parameters
Z (numpy.ndarray of shape (n_test_samples, n_components)) – Transformed data to reconstruct.
- Returns
X – Reconstructed data.
- Return type
numpy.ndarray of shape (n_test_samples, n_features)
- Raises
RuntimeError – If fit_inverse_transform is not set to True.
- transform(X)¶
Reduces the dimension of data using the model.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to reduce dimension of.
- Returns
Z – Transformed data.
- Return type
numpy.ndarray of shape (n_test_samples, n_components)
- class rithml.dimred.PCA(n_components=None)¶
Class for performing principal components analysis (PCA).
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
n_components (int, default None) – Number of principal components kept and used by the model. If None, then all components are kept.
- components_¶
Array of components used by the model, where n_components is the number of components (specified in the constructor).
- Type
numpy.ndarray of shape (n_components, n_features)
Methods
fit(X)Fits a PCA model to data.
fit_transform(X)Fits the model to data and then reduces their dimension.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
inverse_transform(Z)Reconstructs transformed data into their original dimension.
set_params(**params)Sets the specified __init__ parameters to the specified values.
transform(X)Reduces the dimension of data using the model.
- fit(X)¶
Fits a PCA model to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
- Returns
self – Fitted PCA model.
- Return type
rithml.dimred.PCA
- fit_transform(X)¶
Fits the model to data and then reduces their dimension.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Data to fit to and reduce dimension of.
- Returns
Z – Transformed data.
- Return type
numpy.ndarray of shape (n_samples, n_components)
- inverse_transform(Z)¶
Reconstructs transformed data into their original dimension.
n_test_samples refers to the number of samples in the input data.
- Parameters
Z (numpy.ndarray of shape (n_test_samples, n_components)) – Transformed data to reconstruct.
- Returns
X – Reconstructed data.
- Return type
numpy.ndarray of shape (n_test_samples, n_features)
- transform(X)¶
Reduces the dimension of data using the model.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to reduce dimension of.
- Returns
Z – Transformed data.
- Return type
numpy.ndarray of shape (n_test_samples, n_components)
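The shapes documented for components_, transform and inverse_transform can be reproduced with a short NumPy sketch. It computes the components from a singular value decomposition of the centered data, which is one standard way to obtain them. Whether RithML uses an SVD or an eigendecomposition internally is not stated here, so treat this as an illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated features
n_components = 2

# Center the data, then take the top right-singular vectors as components.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:n_components]     # shape (n_components, n_features)

Z = (X - mean) @ components.T      # like transform: (n_samples, n_components)
X_rec = Z @ components + mean      # like inverse_transform: (n_samples, n_features)

# Keeping every component makes the reconstruction exact up to rounding.
Z_full = (X - mean) @ Vt.T
X_full = Z_full @ Vt + mean
```

With fewer components than features, X_rec is only an approximation of X. The discarded directions are those with the least variance.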
Clustering¶
The rithml.clustering module implements various machine
learning algorithms for clustering:
Gaussian mixture model (rithml.clustering.GaussianMixture)
K-means clustering (rithml.clustering.KMeans)
- class rithml.clustering.GaussianMixture(n_components=3, *, covariance_type='full', tol=0.1, reg=1e-06, max_iter=100, init='k-means', weights_init=None, means_init=None, covariances_init=None, random_state=None, verbose=None)¶
Class for performing clustering via a Gaussian mixture model (GMM).
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
n_components (int, default 3) – Number of components assumed by the model.
covariance_type ({'full', 'tied', 'diag', 'tied_diag', 'spherical', 'tied_spherical'}, default 'full') –
Specifies assumptions about component covariance matrices.
If ‘full’, each component has its own covariance matrix.
If ‘tied’, all components share the same covariance matrix.
If ‘diag’, each component has its own covariance matrix, which is assumed to be diagonal.
If ‘tied_diag’, all components share the same covariance matrix, which is assumed to be diagonal.
If ‘spherical’, each component has its own covariance matrix, which is assumed to be a multiple of the identity matrix.
If ‘tied_spherical’, all components share the same covariance matrix, which is assumed to be a multiple of the identity matrix.
tol (float, default 0.1) – Tolerance level for assessing convergence. That is, iterations of the EM algorithm stop once the increase in log-likelihood is no longer above this level.
reg (float, default 1e-6) – Regularization constant added to the diagonal of all component covariance matrices to ensure nonsingularity.
max_iter (int, default 100) – Maximum number of iterations for the model to take before stopping. If None, then no maximum is used.
init ({'k-means', 'k-means++', 'random_from_data', 'random'}, default 'k-means') –
Specifies how to initialize the means of the components.
If ‘k-means’, then the k-means algorithm is used to cluster the data, and the resulting cluster centers are used as the means.
If ‘k-means++’, then the k-means++ algorithm is used to initialize the means.
If ‘random_from_data’, then a random sample of size n_components is selected without replacement from the training data.
If ‘random’, then a random sample of size n_components is selected from a multivariate Gaussian distribution fitted to the training data.
weights_init (numpy.ndarray of shape (n_components,), default None) – Array of initial component weights for the model to use. If None, the weights are initialized based on init.
means_init (numpy.ndarray of shape (n_components, n_features), default None) – Array of initial component means for the model to use. If None, the means are initialized based on init.
covariances_init (numpy.ndarray of shape (n_components, n_features, n_features), default None) – Array of initial component covariance matrices for the model to use. If None, the covariances are initialized based on init.
random_state (int, numpy.random.RandomState, or numpy.random.Generator, default None) –
Object used for random processes during fitting, i.e. randomly drawing samples from the training data during initialization.
If None, then a new Generator object is created (i.e. with a fresh seed).
If int, then a new Generator object is created with the specified int as the seed.
If RandomState or Generator, then that object is directly used.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output a progress message after every verbose iterations.
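The six covariance_type options differ mainly in how many free parameters the covariance matrices contribute. The sketch below uses the standard counting argument (a symmetric matrix has n_features * (n_features + 1) / 2 free entries, and the weights sum to 1). The helper names are hypothetical, and it is an assumption that the n_params_ attribute follows exactly this count.

```python
def n_covariance_params(covariance_type, n_components, n_features):
    """Free covariance parameters under each covariance_type (standard counting)."""
    per_matrix = {
        'full': n_features * (n_features + 1) // 2,   # symmetric matrix
        'diag': n_features,                           # one variance per feature
        'spherical': 1,                               # a single shared variance
    }
    if covariance_type.startswith('tied'):
        # 'tied' -> 'full', 'tied_diag' -> 'diag', 'tied_spherical' -> 'spherical'
        base = covariance_type[len('tied_'):] or 'full'
        return per_matrix[base]                       # one matrix shared by all
    return n_components * per_matrix[covariance_type]

def n_free_params(covariance_type, n_components, n_features):
    """Means + weights + covariances; weights lose one degree of freedom."""
    n_means = n_components * n_features
    n_weights = n_components - 1
    return n_means + n_weights + n_covariance_params(
        covariance_type, n_components, n_features)

types = ('full', 'tied', 'diag', 'tied_diag', 'spherical', 'tied_spherical')
counts = {ct: n_free_params(ct, n_components=3, n_features=2) for ct in types}
```

More restrictive covariance types have fewer parameters, which lowers the complexity penalty in criteria such as AIC and BIC.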
- weights_¶
Array of component weights.
- Type
numpy.ndarray of shape (n_components,)
- means_¶
Array of component means.
- Type
numpy.ndarray of shape (n_components, n_features)
- covariances_¶
Array of component covariance matrices.
- Type
numpy.ndarray of shape (n_components, n_features, n_features)
- labels_¶
Labels assigned by the fitted model to the training data.
- Type
numpy.ndarray of shape (n_samples,)
- log_likelihood_¶
Log-likelihood of the training data based on the fitted model.
- Type
float
- n_iter_¶
Number of iterations taken by the model before stopping.
- Type
int
- n_params_¶
Number of free parameters in the model. Depends on covariance_type. Used in calculation of AIC and BIC.
- Type
int
- aic_fit_¶
Akaike information criterion (AIC) of the model on the training data.
- Type
float
- bic_fit_¶
Bayesian information criterion (BIC) of the model on the training data.
- Type
float
Methods
aic(X)Computes the Akaike information criterion (AIC) of the model on the specified data.
bic(X)Computes the Bayesian information criterion (BIC) of the model on the specified data.
fit(X)Fits a Gaussian mixture model (GMM) to data using the expectation-maximization (EM) algorithm.
fit_predict(X)Fits the model to data and then predicts their labels, i.e. clusters the data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels for the input data, i.e. clusters the data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- aic(X)¶
Computes the Akaike information criterion (AIC) of the model on the specified data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to predict labels of and compute the AIC on, where n_test_samples is the number of samples in the input data.
- Returns
aic – AIC of the model on the data. A lower value suggests a stronger model.
- Return type
float
- bic(X)¶
Computes the Bayesian information criterion (BIC) of the model on the specified data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to predict labels of and compute the BIC on, where n_test_samples is the number of samples in the input data.
- Returns
bic – BIC of the model on the data. A lower value suggests a stronger model.
- Return type
float
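Both criteria combine the log-likelihood with a penalty on the number of free parameters. The sketch below uses the standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The log-likelihoods and parameter counts are made-up numbers for illustration, not output from RithML.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_samples) - 2 * log_likelihood

# Hypothetical fits of two models to the same 500 samples.
small = {'log_likelihood': -1210.0, 'n_params': 11}
large = {'log_likelihood': -1200.0, 'n_params': 17}

aic_small = aic(**small)
aic_large = aic(**large)
bic_small = bic(**small, n_samples=500)
bic_large = bic(**large, n_samples=500)
```

In this made-up case the larger model has the lower AIC while the smaller model has the lower BIC, because the BIC penalty per parameter, ln(500), is greater than 2.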
- fit(X)¶
Fits a Gaussian mixture model (GMM) to data using the expectation-maximization (EM) algorithm.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
- Returns
self – Fitted GMM.
- Return type
rithml.clustering.GaussianMixture
- fit_predict(X)¶
Fits the model to data and then predicts their labels, i.e. clusters the data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Data to fit to and predict labels of.
- Returns
labels – Predicted labels.
- Return type
numpy.ndarray of shape (n_samples,)
- predict(X)¶
Predicts labels for the input data, i.e. clusters the data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to predict labels of.
- Returns
labels – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
- class rithml.clustering.KMeans(n_clusters=3, *, init='k-means++', max_iter=100, random_state=None, verbose=None)¶
Class for performing k-means clustering.
The following variable names are used in this class’s documentation:
n_samples: Number of samples in the training data.
n_features: Number of features in the training data.
- Parameters
n_clusters (int, default 3) – Number of clusters assumed by the model.
init ({'k-means++', 'random'} or numpy.ndarray of shape (n_clusters, n_features), default 'k-means++') –
Specifies how to initialize the means (cluster centers) for the model.
If ‘k-means++’, then the k-means++ algorithm is used.
If ‘random’, then a random sample of size n_clusters is selected without replacement from the training data.
If numpy.ndarray, then the specified array (if the shape is correct) is used, i.e. assumed to be the array of means.
max_iter (int, default 100) – Maximum number of iterations for the model to take before stopping. If None, then no maximum is used.
random_state (int, numpy.random.RandomState, or numpy.random.Generator, default None) –
Object used for random processes during fitting, i.e. randomly drawing samples from the training data during initialization, if init is ‘k-means++’ or ‘random’. (If init is an array, then random_state is not used.)
If None, then a new Generator object is created (i.e. with a fresh seed).
If int, then a new Generator object is created with the specified int as the seed.
If RandomState or Generator, then that object is directly used.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output a progress message after every verbose iterations.
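The 'k-means++' option refers to a seeding scheme that spreads the initial centers apart. A minimal pure-Python sketch of the usual procedure follows. The function name and the use of the standard random module are choices made for this illustration and do not reflect RithML's implementation.

```python
import random

def kmeans_pp_init(points, n_clusters, seed=None):
    """Illustrative k-means++ seeding over a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]           # first center: uniform at random
    while len(centers) < n_clusters:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        # Next center: sampled with probability proportional to that distance,
        # so points that are already centers (distance 0) are never re-picked.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
centers = kmeans_pp_init(points, n_clusters=3, seed=0)
```

By contrast, the 'random' option as documented draws n_clusters training samples uniformly without replacement, which can place several initial centers in the same dense region.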
- centers_¶
Array of all cluster centers of the fitted model.
- Type
numpy.ndarray of shape (n_clusters, n_features)
- labels_¶
Labels assigned by the fitted model to the training data.
- Type
numpy.ndarray of shape (n_samples,)
- n_iter_¶
Number of iterations taken by the model before stopping.
- Type
int
- distortion_¶
Distortion of the training data, i.e. sum of all squared distances from corresponding cluster centers.
- Type
float
Methods
fit(X)Fits a k-means clustering model to data.
fit_predict(X)Fits the model to data and then predicts their labels, i.e. clusters the data.
get_params([deep])Gets __init__ parameter names and corresponding arguments.
predict(X)Predicts labels for the input data, i.e. clusters the data.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- fit(X)¶
Fits a k-means clustering model to data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
- Returns
self – Fitted k-means clustering model.
- Return type
rithml.clustering.KMeans
- fit_predict(X)¶
Fits the model to data and then predicts their labels, i.e. clusters the data.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features)) – Data to fit to and predict labels of.
- Returns
labels – Predicted labels.
- Return type
numpy.ndarray of shape (n_samples,)
- predict(X)¶
Predicts labels for the input data, i.e. clusters the data.
n_test_samples refers to the number of samples in the input data.
- Parameters
X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to predict labels of.
- Returns
labels – Predicted labels.
- Return type
numpy.ndarray of shape (n_test_samples,)
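The relationship between centers_, labels_ and distortion_ can be shown with a small pure-Python sketch: each point is labeled with its nearest center, and the distortion is the sum of the squared distances to those centers. The function name and the toy data are hypothetical.

```python
def assign_and_distortion(points, centers):
    """Label each point with its nearest center and sum the squared distances."""
    labels, distortion = [], 0.0
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        nearest = d2.index(min(d2))
        labels.append(nearest)
        distortion += d2[nearest]
    return labels, distortion

points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
centers = [(1.0, 0.0), (11.0, 0.0)]
labels, distortion = assign_and_distortion(points, centers)
# Every point lies exactly 1 unit from its nearest center.
```

A lower distortion for the same number of clusters indicates tighter clusters, which makes it a convenient quantity for comparing runs started from different initializations.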
Base Model¶
The rithml.base module contains the base model class
(rithml.base.BaseModel) from which all other model classes
inherit.
- class rithml.base.BaseModel(**params)¶
Class for the base model from which all other model classes inherit.
- Parameters
**params (dict) – Model parameters.
Methods
get_params([deep])Gets __init__ parameter names and corresponding arguments.
set_params(**params)Sets the specified __init__ parameters to the specified values.
- get_params(deep=True)¶
Gets __init__ parameter names and corresponding arguments.
- Parameters
deep (bool, default True) – If True, return parameter dictionary as a deep copy. Otherwise, return a shallow copy.
- Returns
params – Dictionary of __init__ parameter names (keys) and corresponding arguments (values).
- Return type
dict
- set_params(**params)¶
Sets the specified __init__ parameters to the specified values.
- Parameters
**params (dict) – Model parameters, i.e. dictionary of __init__ parameter names (keys) and corresponding arguments (values).
- Returns
self – Model object.
- Return type
rithml.base.BaseModel
- Raises
ValueError – If an invalid parameter name is provided.
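The get_params and set_params contract can be sketched with a small stand-in class. SketchModel and its two parameters are hypothetical. The sketch only mirrors the documented behavior (deep or shallow copy, ValueError on an unknown name, the model returned from set_params) and is not RithML's BaseModel.

```python
import copy

class SketchModel:
    """Hypothetical stand-in showing the get_params/set_params contract."""

    def __init__(self, alpha=0.0, kernel='rbf'):
        self.alpha = alpha
        self.kernel = kernel

    def get_params(self, deep=True):
        params = {'alpha': self.alpha, 'kernel': self.kernel}
        return copy.deepcopy(params) if deep else dict(params)

    def set_params(self, **params):
        valid = self.get_params(deep=False)
        unknown = [name for name in params if name not in valid]
        if unknown:
            raise ValueError(f'Invalid parameter name(s): {unknown}')
        for name, value in params.items():
            setattr(self, name, value)
        return self   # returning the model allows call chaining

model = SketchModel().set_params(alpha=0.5)
params = model.get_params()

try:
    model.set_params(gamma=2.0)   # not an __init__ parameter of SketchModel
    rejected = False
except ValueError:
    rejected = True
```

Because set_params returns the model, configuration and fitting could be chained in one expression for any class that follows this contract.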