API Reference¶

RithML contains four main modules, each corresponding to a category of machine learning algorithms. These categories and modules are listed below:

Classification (rithml.classification)
Regression (rithml.regression)
Dimensionality reduction (rithml.dimred)
Clustering (rithml.clustering)

Additionally, all classes in these modules inherit from the rithml.base.BaseModel class, which implements the rithml.base.BaseModel.get_params() and rithml.base.BaseModel.set_params() methods.
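
As a rough illustration of the get_params()/set_params() contract, the sketch below reimplements it on a stand-in class. SketchModel and ToyClassifier are hypothetical names invented for this example; the real rithml.base.BaseModel may differ, for instance in how it treats the deep argument.

```python
import inspect


class SketchModel:
    """Hypothetical stand-in for a base model class (not RithML's code)."""

    def get_params(self, deep=True):
        # Read the __init__ parameter names and look up the attribute of the
        # same name on the instance. `deep` is accepted but unused here.
        names = inspect.signature(type(self).__init__).parameters
        return {name: getattr(self, name) for name in names if name != "self"}

    def set_params(self, **params):
        # Reject names that are not __init__ parameters, then assign.
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name!r}")
            setattr(self, name, value)
        return self


class ToyClassifier(SketchModel):
    def __init__(self, n_neighbors=5, *, weights="uniform"):
        self.n_neighbors = n_neighbors
        self.weights = weights


model = ToyClassifier(n_neighbors=3)
model.set_params(weights="distance")
params = model.get_params()
```

The pattern relies on each __init__ storing its arguments under attributes with the same names, which is an assumption of this sketch.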

Classification¶

The rithml.classification module implements various machine learning algorithms for classification:

AdaBoost (rithml.classification.AdaBoostClassifier)
Decision tree (rithml.classification.DecisionTreeClassifier)
Linear/quadratic discriminant analysis (rithml.classification.DiscriminantAnalysis)
Gaussian naive Bayes (rithml.classification.GaussianNBClassifier)
Gradient boosting classification trees (rithml.classification.GradientBoostingClassifier)
K-nearest neighbors (rithml.classification.KNNClassifier)
Logistic regression (rithml.classification.LogisticRegression)
Random forest (rithml.classification.RandomForestClassifier)
Support vector machine (rithml.classification.SupportVectorClassifier)

class rithml.classification.AdaBoostClassifier(n_estimators=20, *, max_depth=1, impurity='entropy', algorithm='SAMME.R', verbose=None)¶

Class for performing classification via AdaBoost (with classification trees as estimators).

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

n_classes: Number of classes in the training data.

Source: https://hastie.su.domains/Papers/samme.pdf

Parameters

n_estimators (int, default 20) – Number of estimators used by the model.
max_depth (int, default 1) – Maximum depth of individual estimators (classification trees).
impurity ({'entropy', 'gini'}, default 'entropy') – Impurity function for assessing split quality.
algorithm ({'SAMME.R', 'SAMME'}, default 'SAMME.R') – Algorithm used by the model during fitting. The SAMME algorithm uses classifications to fit the model, whereas the SAMME.R algorithm uses weighted class probability estimates. The latter typically converges more quickly.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator has been fitted.
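
To make the 'SAMME' option more concrete, here is a minimal sketch of the discrete SAMME update as described in the paper linked above. The function names are hypothetical, and whether RithML's fitting loop matches this step for step is an assumption; the 'SAMME.R' variant, which works with class probability estimates, is not shown.

```python
import math


def samme_estimator_weight(weighted_error, n_classes):
    """Estimator weight from the discrete SAMME update."""
    # The log(n_classes - 1) term is what extends binary AdaBoost to
    # multiple classes; it vanishes when n_classes == 2.
    return math.log((1.0 - weighted_error) / weighted_error) + math.log(n_classes - 1)


def reweight_samples(sample_weights, misclassified, estimator_weight):
    """Up-weight misclassified samples, then renormalize to sum to 1."""
    updated = [
        w * math.exp(estimator_weight) if wrong else w
        for w, wrong in zip(sample_weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated]


alpha = samme_estimator_weight(weighted_error=0.25, n_classes=3)
new_weights = reweight_samples([0.25] * 4, [True, False, False, False], alpha)
```

In this sketch, an estimator only receives a positive weight when its weighted error is below 1 - 1/n_classes, i.e. when it does better than random guessing.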

classes_¶

Array of classes used by the model.

Type: numpy.ndarray of shape (n_classes,)

estimators_¶

List of all estimators used by the model.

Type: list of rithml.classification.DecisionTreeClassifier

estimator_weights_¶

Array of weights of estimators used by the model.

Type: numpy.ndarray of shape (n_estimators,)

Methods

`fit`(X, y)	Fits an AdaBoost classifier to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits an AdaBoost classifier to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted AdaBoost classifier.

Return type

AdaBoostClassifier

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.classification.DecisionTreeClassifier(*, max_depth=None, impurity='entropy', class_weight=None)¶

Class for performing classification via decision tree.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

max_depth (int, default None) – Maximum depth of tree.
impurity ({'entropy', 'gini'}, default 'entropy') – Impurity function for assessing split quality.
class_weight (dict, default None) – Dictionary of certain classes (keys) and associated weights (values). For each specified class, the associated weight is applied to all corresponding training samples during fitting.
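
The two impurity options can be sketched as follows. This is an unweighted illustration with hypothetical function names; the log base used for entropy (bits here) is an assumption, and RithML's version also accounts for sample and class weights.

```python
import math
from collections import Counter


def class_proportions(labels):
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    # Shannon entropy in bits: -sum(p * log2(p)).
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def gini(labels):
    # Gini impurity: 1 - sum(p^2).
    return 1.0 - sum(p * p for p in class_proportions(labels))


mixed = ["a", "a", "b", "b"]
pure = ["a", "a", "a", "a"]
mixed_scores = (entropy(mixed), gini(mixed))
pure_scores = (entropy(pure), gini(pure))
```

Both functions are 0 for a pure node and largest for an evenly mixed one, so a split is better when it lowers the impurity of the resulting child nodes.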

classes_¶

Array of all classes assumed by the model, where n_classes is the number of classes.

Type: numpy.ndarray of shape (n_classes,)

root_¶

Root node of underlying decision tree.

Type: _DTCNode

Methods

`fit`(X, y[, weights])	Fits a decision tree classifier to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X[, return_probabilities])	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y, weights=None)¶

Fits a decision tree classifier to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
weights (numpy.ndarray of shape (n_samples,), default None) – Weights for training samples. (This is different from the class_weight parameter, which applies weights by class instead of by sample.) If None, then samples are weighted uniformly. These weights are combined with class weights to influence calculations of node impurity and probability estimates (if applicable).
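
One plausible way to combine sample weights with class weights is sketched below. Multiplying the two is an assumption of this example, since the text above only says they are "combined", and both function names are hypothetical.

```python
def effective_weights(y, weights=None, class_weight=None):
    """Combine per-sample weights with per-class weights by multiplication."""
    if weights is None:
        weights = [1.0] * len(y)  # uniform sample weights
    class_weight = class_weight or {}
    # Classes missing from class_weight keep a multiplier of 1.
    return [w * class_weight.get(label, 1.0) for w, label in zip(weights, y)]


def weighted_class_probabilities(y, sample_weights):
    """Share of the total weight carried by each class."""
    total = sum(sample_weights)
    shares = {}
    for label, w in zip(y, sample_weights):
        shares[label] = shares.get(label, 0.0) + w / total
    return shares


y = ["cat", "cat", "dog", "dog"]
combined = effective_weights(y, weights=[1.0, 1.0, 2.0, 2.0], class_weight={"cat": 3.0})
probs = weighted_class_probabilities(y, combined)
```

Here the class weight of 3 on "cat" outweighs the doubled sample weights on "dog", so "cat" ends up with the larger weighted share.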

Returns

self – Fitted decision tree classifier.

Return type

DecisionTreeClassifier

predict(X, return_probabilities=False)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters

X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
return_probabilities (bool, default False) – If True, also return the probability estimates for the predictions.

Returns

y_pred (numpy.ndarray of shape (n_test_samples,)) – Predicted labels.
probabilities (numpy.ndarray of shape (n_test_samples,), optional) – Probability estimates for predictions, returned only if return_probabilities is True. This is an array of dictionaries, where each dictionary contains classes (keys) and probability estimates (values) for a particular sample.

class rithml.classification.DiscriminantAnalysis(kind='linear')¶

Class for performing classification via linear/quadratic discriminant analysis.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

n_classes: Number of classes in the training data.

Parameters: kind ({'linear', 'quadratic'}, default 'linear') – Type of discriminant analysis performed by the model. If ‘linear’, the model assumes equal covariance matrices for each class. If ‘quadratic’, the model assumes distinct covariance matrices for each class.

classes_¶

Array of all classes assumed by the model.

Type: numpy.ndarray of shape (n_classes,)

probabilities_¶

Array of probabilities (priors) of each class.

Type: numpy.ndarray of shape (n_classes,)

means_¶

Array of means of each class.

Type: numpy.ndarray of shape (n_classes, n_features)

covariances_¶

Covariance matrices used by the model, depending on kind. If kind == ‘linear’, then all covariance matrices are the same (by assumption).

Type: numpy.ndarray of shape (n_classes, n_features, n_features)

Methods

`fit`(X, y)	Fits a discriminant analysis model to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a discriminant analysis model to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted discriminant analysis model.

Return type

DiscriminantAnalysis

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.classification.GaussianNBClassifier(*, equal_variances=False)¶

Class for performing classification via Gaussian naive Bayes.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

n_classes: Number of classes in the training data.

Parameters: equal_variances (bool, default False) – If True, assume that each feature has the same variance for all classes.

classes_¶

Array of all classes assumed by the model.

Type: numpy.ndarray of shape (n_classes,)

probabilities_¶

Array of probabilities (priors) of each class.

Type: numpy.ndarray of shape (n_classes,)

means_¶

Array of means of each class.

Type: numpy.ndarray of shape (n_classes, n_features)

variances_¶

Array of feature variances of each class, depending on equal_variances. If equal_variances is True, then feature variances are the same across all classes (by assumption).

Type: numpy.ndarray of shape (n_classes, n_features)
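
The sketch below shows how priors, per-class means, and per-class variances like those stored in the attributes above are typically combined at prediction time in Gaussian naive Bayes. It uses plain lists and hypothetical function names; it is not RithML's implementation.

```python
import math


def gaussian_log_pdf(x, mean, variance):
    return -0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance)


def nb_log_posterior(x, prior, means, variances):
    """Unnormalized log posterior of one class under the naive assumption."""
    # Features are treated as independent, so per-feature log densities add up.
    return math.log(prior) + sum(
        gaussian_log_pdf(xi, m, v) for xi, m, v in zip(x, means, variances)
    )


# Two classes, two features; class "b" has a much larger spread.
params = {
    "a": {"prior": 0.5, "means": [0.0, 0.0], "variances": [1.0, 1.0]},
    "b": {"prior": 0.5, "means": [3.0, 3.0], "variances": [4.0, 4.0]},
}
x = [1.0, 1.0]
scores = {c: nb_log_posterior(x, **p) for c, p in params.items()}
predicted = max(scores, key=scores.get)
```

With equal_variances=True, every class would share the same variance for a given feature, so only the means and priors would separate the classes.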

Methods

`fit`(X, y)	Fits a Gaussian naive Bayes classifier to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a Gaussian naive Bayes classifier to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted Gaussian naive Bayes classifier.

Return type

GaussianNBClassifier

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.classification.GradientBoostingClassifier(n_estimators=100, *, learning_rate=0.1, max_depth=3, impurity='entropy', verbose=None)¶

Class for performing classification via gradient boosting (with classification trees as estimators).

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Adapted from: https://sefiks.com/2018/10/29/a-step-by-step-gradient-boosting-example-for-classification/

Parameters

n_estimators (int, default 100) – Number of estimators used by the model.
learning_rate (float, default 0.1) – Rate at which each additional estimator contributes to the model.
max_depth (int, default 3) – Maximum depth of individual estimators.
impurity ({'entropy', 'gini'}, default 'entropy') – Impurity function for assessing split quality.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator is being fitted (for each class).

classes_¶

Array of all classes assumed by the model, where n_classes is the number of classes.

Type: numpy.ndarray of shape (n_classes,)

estimators_¶

Dictionary of all classes (keys) and estimator lists (values).

Type: dict

Methods

`fit`(X, y)	Fits a gradient boosting classifier to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a gradient boosting classifier to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted gradient boosting classifier.

Return type

GradientBoostingClassifier

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.classification.KNNClassifier(n_neighbors=5, *, weights='uniform')¶

Class for performing classification via k-nearest neighbors (k-NN).

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

n_neighbors (int, default 5) – Number of nearest neighbors to consider.
weights ({'uniform', 'distance'}, default 'uniform') – Weights assigned to nearest neighbors, either uniform (‘uniform’) or based on inverse distance (‘distance’).
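
The difference between the two weights options can be seen in a small voting sketch. The helper knn_vote is hypothetical, and the epsilon used to avoid dividing by zero is an assumption of this example, not something the documentation specifies.

```python
from collections import defaultdict


def knn_vote(neighbor_labels, neighbor_distances, weights="uniform"):
    """Pick the label with the largest total weight among the neighbors."""
    totals = defaultdict(float)
    for label, dist in zip(neighbor_labels, neighbor_distances):
        if weights == "uniform":
            totals[label] += 1.0
        else:
            # Inverse-distance weight; the small epsilon guards against
            # division by zero for exact matches.
            totals[label] += 1.0 / (dist + 1e-12)
    return max(totals, key=totals.get)


labels = ["red", "blue", "blue"]
distances = [0.1, 2.0, 4.0]
uniform_pick = knn_vote(labels, distances, weights="uniform")
distance_pick = knn_vote(labels, distances, weights="distance")
```

With uniform weights the two "blue" neighbors win by count, while inverse-distance weights let the single very close "red" neighbor dominate.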

classes_¶

Array of all classes assumed by the model, where n_classes is the number of classes.

Type: numpy.ndarray of shape (n_classes,)

Methods

`fit`(X, y)	Fits a k-NN classifier to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a k-NN classifier to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted k-NN classifier.

Return type

KNNClassifier

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.classification.LogisticRegression(*, alpha=0, tol=0.01, verbose=False)¶

Class for performing logistic regression.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

n_classes: Number of classes in the training data.

Parameters

alpha (float, default 0) – Regularization coefficient (strength). The regularization type is L2.
tol (float, default 0.01) – Tolerance argument for logistic loss minimization. (A smaller value will increase runtime but may improve model performance.)
verbose (bool, default False) – If True, output details about progress and time elapsed during fitting.
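
For intuition about alpha, the following sketch evaluates a binary logistic loss with an L2 penalty. The exact scaling (mean loss plus alpha times the squared weight norm, with the bias left unpenalized) is an assumption of this example; RithML's objective may be scaled differently.

```python
import math


def regularized_logistic_loss(weight, bias, X, y, alpha=0.0):
    """Mean binary log loss plus an L2 penalty on the weights (not the bias)."""
    total = 0.0
    for features, label in zip(X, y):
        z = bias + sum(w * x for w, x in zip(weight, features))
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    penalty = alpha * sum(w * w for w in weight)
    return total / len(X) + penalty


X = [[0.0], [1.0]]
y = [0, 1]
plain = regularized_logistic_loss([0.0], 0.0, X, y)
unpenalized = regularized_logistic_loss([2.0], 0.0, X, y)
penalized = regularized_logistic_loss([2.0], 0.0, X, y, alpha=0.5)
```

A larger alpha makes large weights more costly, which pulls the fitted weights toward zero.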

classes_¶

Array of all classes assumed by the model, where n_classes is the number of classes.

Type: numpy.ndarray of shape (n_classes,)

weight_¶

Feature weights used in the decision function. If the labels are not binary (i.e. n_classes > 2), then this is a 2-D array with weights for each class. Otherwise, its shape is (n_features,).

Type: numpy.ndarray of shape (n_classes, n_features) or (n_features,)

bias_¶

Bias(es) used in the decision function. If the labels are not binary (i.e. n_classes > 2), then this is an array with a bias for each class. Otherwise, it is a single bias (float).

Type: numpy.ndarray of shape (n_classes,) or float

Methods

`fit`(X, y)	Fits a logistic regression model to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a logistic regression model to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted logistic regression model.

Return type

LogisticRegression

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.classification.RandomForestClassifier(n_estimators=100, *, max_depth=None, impurity='entropy', random_state=None, max_samples=None, max_features='sqrt', bootstrap=True, verbose=None)¶

Class for performing classification via random forest.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

n_estimators (int, default 100) – Number of estimators used by the model.
max_depth (int, default None) – Maximum depth of individual estimators.
impurity ({'entropy', 'gini'}, default 'entropy') – Impurity function for assessing split quality.
bootstrap (bool, default True) – If True, use bootstrapping, i.e. re-sample new datasets for each estimator. If False, use the original dataset to fit each estimator (ignoring max_samples).
random_state (int, numpy.random.RandomState, or numpy.random.Generator, default None) –
Object used for random processes during fitting, i.e.

(1) drawing samples with replacement to create n_estimators new datasets, based on max_samples (when bootstrap == True)

(2) selecting a subset of features for each such dataset, based on max_features

If None, then a new Generator object is created (i.e. with a fresh seed).

If int, then a new Generator object is created with the specified int as the seed.

If RandomState or Generator, then that object is directly used.
max_samples (callable or int, default None) –
The number of samples to draw from the original dataset X to create each new dataset during bootstrapping (when bootstrap == True), one for each estimator.

If None, then n_samples samples are drawn.

If callable, then max_samples(n_samples) samples are drawn. (For example, this can be used to draw a specified proportion of n_samples samples.)

If int, then max_samples samples are drawn.
max_features ({'sqrt', 'log2'}, callable, int, or None, default 'sqrt') –
The number of features used by each estimator.

If None, then n_features features are used.

If ‘sqrt’, then sqrt(n_features) features are used.

If ‘log2’, then log2(n_features) features are used.

If callable, then max_features(n_features) features are used. (For example, this can be used to use a specified proportion of n_features features.)

If int, then max_features features are used.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator has been fitted.
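
The rules for max_samples and max_features can be summarized by one small resolver. The helper resolve_count is hypothetical, and rounding down for 'sqrt' and 'log2' (with a minimum of 1) is an assumption; the documentation does not say how non-integer results are rounded.

```python
import math


def resolve_count(spec, n_total):
    """Turn a max_samples / max_features style argument into an integer count."""
    if spec is None:
        return n_total
    if spec == "sqrt":
        return max(1, int(math.sqrt(n_total)))  # rounding down is an assumption
    if spec == "log2":
        return max(1, int(math.log2(n_total)))  # rounding down is an assumption
    if callable(spec):
        return int(spec(n_total))
    return int(spec)


n_features_used = resolve_count("sqrt", 16)
half_of_samples = resolve_count(lambda n: n // 2, 10)
all_samples = resolve_count(None, 7)
```

Passing a callable such as `lambda n: n // 2` is how a proportion of the samples or features would be requested under these rules.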

classes_¶

Array of all classes assumed by the model, where n_classes is the number of classes.

Type: numpy.ndarray of shape (n_classes,)

estimators_¶

Dictionary of all DecisionTreeClassifier estimators (keys) and arrays of features used (values).

Type: dict of rithml.classification.DecisionTreeClassifier to numpy.ndarray

Methods

`fit`(X, y)	Fits a random forest classifier to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a random forest classifier to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted random forest classifier.

Return type

RandomForestClassifier

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.classification.SupportVectorClassifier(*, kind='ovr', C=1.0, kernel='rbf', degree=3, gamma=1.0, coef0=1.0, error_coef=1e-06, verbose=False)¶

Class for performing classification via support vector machine (SVM).

Note: The default parameter values may result in a poor model. If so, try changing them, especially C or gamma.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

n_classes: Number of classes in the training data.

Parameters

kind ({'ovr', 'ovo'}, default 'ovr') –
Specifies how to create the model’s underlying binary classifier(s).

If ‘ovr’, then the model uses one-vs-rest classification. That is, for each class, the model transforms the labels to binary data based on that class and fits a binary classifier to the new data, resulting in a total of n_classes underlying binary classifiers.

If ‘ovo’, then the model uses one-vs-one classification. That is, for each pair of classes, the model fits a binary classifier to the subset of input data corresponding to those two classes, resulting in a total of n_classes * (n_classes - 1) / 2 underlying binary classifiers. Note that this may still be faster than ‘ovr’ due to smaller training sets.

If the labels are binary (i.e. n_classes == 2), then this parameter is ignored, and the model fits a single binary classifier.
C (float, default 1.0) – Regularization constant. Must be positive; a lower value means more regularization. Used by all underlying binary classifiers.
kernel ({'rbf', 'linear', 'poly'} or callable, default 'rbf') – Determines kernel function used by all underlying binary classifiers. If a function is provided, then it must take in two arrays of feature vectors and compute an array of floats.
degree (int, default 3) – Degree of polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
gamma (float, default 1.0) – Gamma parameter for polynomial and radial basis function (RBF) kernels. If kernel is not ‘poly’ or ‘rbf’, then this parameter is ignored.
coef0 (float, default 1.0) – Constant term used in polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
error_coef (float, default 1e-6) – Coefficient for margin for error (C * error_coef) in determining support vectors when assessing coefficient values. A smaller value represents a stricter threshold and may result in fewer support vectors.
verbose (bool, default False) – If True, output details about progress and time elapsed during fitting.
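
The classifier counts given for 'ovr' and 'ovo' can be checked by enumerating the binary sub-problems each strategy creates. The helper below is a hypothetical illustration and says nothing about how RithML stores its underlying classifiers.

```python
from itertools import combinations


def binary_problems(classes, kind="ovr"):
    """List the binary sub-problems implied by a multiclass strategy."""
    classes = list(classes)
    if len(classes) == 2:
        # Binary labels: a single classifier, regardless of `kind`.
        return [(classes[0], classes[1])]
    if kind == "ovr":
        return [(c, "rest") for c in classes]
    if kind == "ovo":
        return list(combinations(classes, 2))
    raise ValueError(f"Unknown kind: {kind!r}")


ovr = binary_problems(["a", "b", "c", "d"], kind="ovr")
ovo = binary_problems(["a", "b", "c", "d"], kind="ovo")
```

For four classes this gives 4 one-vs-rest problems and 4 * 3 / 2 = 6 one-vs-one problems, though each one-vs-one problem only sees the samples of its two classes.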

classes_¶

Array of all classes assumed by the model.

Type: numpy.ndarray of shape (n_classes,)

Methods

`fit`(X, y)	Fits a support vector classifier to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a support vector classifier to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted support vector classifier.

Return type

SupportVectorClassifier

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

Regression¶

The rithml.regression module implements various machine learning algorithms for regression:

Decision tree (rithml.regression.DecisionTreeRegressor)
Gradient boosting regression trees (rithml.regression.GradientBoostingRegressor)
Kernel (ridge) regression (rithml.regression.KernelRegression)
K-nearest neighbors (rithml.regression.KNNRegressor)
Linear (ridge) regression (rithml.regression.LinearRegression)
Random forest (rithml.regression.RandomForestRegressor)
Support vector machine (rithml.regression.SupportVectorRegressor)

class rithml.regression.DecisionTreeRegressor(max_depth=None, error='squared')¶

Class for performing regression via decision tree.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

max_depth (int, default None) – Maximum depth of tree.
error ({'squared', 'absolute'}, default 'squared') – Error function for assessing split quality.
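
As a sketch of how an error function can score a candidate split, the example below measures the total error of predicting one constant per node. Using the mean for 'squared' and the median for 'absolute' is an assumption based on which constant minimizes each error; the function names are hypothetical.

```python
from statistics import mean, median


def node_error(targets, error="squared"):
    """Total error of predicting one constant for every target in a node."""
    if error == "squared":
        center = mean(targets)  # the mean minimizes squared error
        return sum((t - center) ** 2 for t in targets)
    center = median(targets)  # the median minimizes absolute error
    return sum(abs(t - center) for t in targets)


def split_error(left, right, error="squared"):
    return node_error(left, error) + node_error(right, error)


targets = [1.0, 2.0, 9.0, 10.0]
before = node_error(targets)
after = split_error(targets[:2], targets[2:])
```

Separating the low targets from the high ones cuts the squared error sharply, which is the kind of split a regression tree would prefer.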

root_¶

Root node of underlying decision tree.

Type: _DTRNode

Methods

`fit`(X, y[, weights])	Fits a decision tree regressor to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y, weights=None)¶

Fits a decision tree regressor to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
weights (numpy.ndarray of shape (n_samples,), default None) – Weights for training samples. If None, then samples are weighted uniformly.

Returns

self – Fitted decision tree regressor.

Return type

DecisionTreeRegressor

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.regression.GradientBoostingRegressor(n_estimators=100, *, learning_rate=0.1, max_depth=3, error='squared', verbose=None)¶

Class for performing regression via gradient boosting (with regression trees as estimators).

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

n_estimators (int, default 100) – Number of estimators used by the model.
learning_rate (float, default 0.1) – Rate at which each additional estimator contributes to the model.
max_depth (int, default 3) – Maximum depth of individual estimators.
error ({'squared', 'absolute'}, default 'squared') – Error function for assessing split quality.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator has been fitted.
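
The role of learning_rate is easiest to see in a stripped-down boosting loop. In this sketch each "estimator" is just the mean residual rather than a regression tree, and the initial prediction of 0 is an assumption; it illustrates the shrinkage idea, not RithML's implementation.

```python
def boost_constant_model(y, n_estimators, learning_rate):
    """Gradient boosting for squared error with constant-valued estimators."""
    prediction = 0.0
    for _ in range(n_estimators):
        # For squared error, the negative gradient is the residual y - F(x).
        residuals = [target - prediction for target in y]
        # A real estimator would be a regression tree; here each "estimator"
        # just predicts the mean residual.
        estimator_output = sum(residuals) / len(residuals)
        prediction += learning_rate * estimator_output
    return prediction


y = [2.0, 4.0, 6.0]
slow = boost_constant_model(y, n_estimators=10, learning_rate=0.1)
fast = boost_constant_model(y, n_estimators=10, learning_rate=0.5)
```

A smaller learning rate moves toward the target more slowly, so it generally needs more estimators to reach the same fit.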

estimators_¶

List of estimators used by the model.

Type: list of rithml.regression.DecisionTreeRegressor

Methods

`fit`(X, y)	Fits a gradient boosting regressor to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a gradient boosting regressor to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted gradient boosting regressor.

Return type

GradientBoostingRegressor

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.regression.KNNRegressor(n_neighbors=5, *, weights='uniform')¶

Class for performing regression via k-nearest neighbors (k-NN).

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

n_neighbors (int, default 5) – Number of nearest neighbors to consider.
weights ({'uniform', 'distance'}, default 'uniform') – Weights assigned to nearest neighbors, either uniform (‘uniform’) or based on inverse distance (‘distance’).

Methods

`fit`(X, y)	Fits a k-NN regressor to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a k-NN regressor to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted k-NN regressor.

Return type

KNNRegressor

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.regression.KernelRegression(*, alpha=1.0, kernel='rbf', degree=3, gamma=1.0, coef0=1.0, nonzero_bias=True)¶

Class for performing kernel regression.

This class supports both ordinary regression (no regularization) and ridge regression (L2 regularization).

Note: The default parameter values may result in a poor model. If so, try changing them, especially alpha or gamma.

Note that attempting to perform ordinary regression (with alpha=0) may result in a singular matrix error during fitting. For this reason, it may be better to use a very low alpha value instead.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Adapted from: https://statinfer.wordpress.com/2013/08/05/undocumented-machine-learning-ii-kernel-regression/

Parameters

alpha (float, default 1.0) – Regularization coefficient (strength). The regularization type is L2 (ridge regression).
kernel ({'rbf', 'linear', 'poly'} or callable, default 'rbf') – Determines kernel function used by the model. If a function is provided, then it must take in two feature vectors and compute a float.
degree (int, default 3) – Degree of polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
gamma (float, default 1.0) – Gamma parameter for polynomial and radial basis function (RBF) kernels. If kernel is not ‘poly’ or ‘rbf’, then this parameter is ignored.
coef0 (float, default 1.0) – Constant term used in polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
nonzero_bias (bool, default True) – If False, the model assumes a bias of 0.
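
The kernel options can be written out as plain functions of two feature vectors. The formulas below follow the common convention (also used by scikit-learn) for how degree, gamma, and coef0 enter; that RithML uses exactly these forms is an assumption, and the helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def poly_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def kernel_matrix(kernel, A, B):
    """Kernel value for every row of A against every row of B."""
    return [[kernel(a, b) for b in B] for a in A]


X_train = [[0.0, 0.0], [1.0, 0.0]]
K = kernel_matrix(rbf_kernel, X_train, X_train)
poly_value = poly_kernel([1.0, 2.0], [3.0, 4.0])
```

Any function with the same two-vectors-in, float-out shape as these could serve as a custom kernel callable.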

weight_¶

Weight term associated with model.

Type: numpy.ndarray of shape (n_samples,)

bias_¶

Bias term associated with model.

Type: float

mean_¶

Mean of the training data.

Type: numpy.ndarray of shape (n_features,)

Methods

`fit`(X, y[, K])	Fits a kernel regression model to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X[, K_pred])	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y, K=None)¶

Fits a kernel regression model to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.
K (numpy.ndarray of shape (n_samples, n_samples), default None) – Kernel matrix, i.e. kernel result for every pair of training samples. If nonzero_bias is True, then these training samples should be mean-centered before applying the kernel function to them. If None, the model computes the kernel matrix itself.

Returns

self – Fitted kernel regression model.

Return type

KernelRegression

predict(X, K_pred=None)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters

X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
K_pred (numpy.ndarray of shape (n_test_samples, n_samples), default None) – Kernel matrix, i.e. kernel result for every test sample with every training sample. If nonzero_bias is True, then all samples should be mean-centered (based on the training mean, i.e. the mean_ attribute) before applying the kernel function to them. If None, the model computes the kernel matrix itself.

Returns

y_pred – Predicted labels.

Return type

numpy.ndarray of shape (n_test_samples,)
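
The shapes and mean-centering described for K and K_pred can be sketched with NumPy alone. This is an illustration under stated assumptions: the polynomial kernel form and the helper names are hypothetical, and the sketch does not call RithML.

```python
import numpy as np


def poly(a, b, degree=3, gamma=1.0, coef0=1.0):
    """Assumed polynomial kernel on two feature vectors."""
    return float((gamma * np.dot(a, b) + coef0) ** degree)


def kernel_matrix(A, B, kernel):
    """Return the (len(A), len(B)) matrix of kernel(a, b) values."""
    return np.array([[kernel(a, b) for b in B] for a in A])


rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 2))
X_test = rng.normal(size=(3, 2))

# Center both sets with the training mean (the role mean_ plays above).
mean = X_train.mean(axis=0)
Xc_train = X_train - mean
Xc_test = X_test - mean

K = kernel_matrix(Xc_train, Xc_train, poly)      # (n_samples, n_samples)
K_pred = kernel_matrix(Xc_test, Xc_train, poly)  # (n_test_samples, n_samples)
```

Arrays built this way have the shapes documented for the K argument of fit and the K_pred argument of predict when nonzero_bias is True.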

class rithml.regression.LinearRegression(alpha=0)¶

Class for performing linear (least squares) regression.

This class supports both ordinary regression (no regularization) and ridge regression (L2 regularization).

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters: alpha (float, default 0) – Regularization coefficient (strength). The regularization type is L2 (ridge regression).

weight_¶

Weight term associated with model.

Type: numpy.ndarray of shape (n_features,)

bias_¶

Bias term associated with model.

Type: float

Methods

`fit`(X, y)	Fits a linear regression model to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a linear regression model to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted linear regression model.

Return type

LinearRegression

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)
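
The relationship between alpha, weight_ and bias_ can be sketched with the closed-form ridge solution. This is a minimal sketch, assuming the bias is not regularized and that the problem is solved on mean-centered data; the solver RithML actually uses is not stated in this reference, and the function names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha=0.0):
    """Closed-form least squares with an L2 penalty on the weights only."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weight = np.linalg.solve(A, Xc.T @ yc)   # shape (n_features,)
    bias = float(y_mean - x_mean @ weight)
    return weight, bias


def ridge_predict(X, weight, bias):
    return X @ weight + bias


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0     # noiseless linear target

w0, b0 = ridge_fit(X, y, alpha=0.0)          # ordinary least squares
w1, b1 = ridge_fit(X, y, alpha=10.0)         # ridge: weights are shrunk
```

With alpha=0 the sketch recovers the generating weights and bias; a positive alpha shrinks the weight vector toward zero.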

class rithml.regression.RandomForestRegressor(n_estimators=100, *, max_depth=None, error='squared', random_state=None, max_samples=None, max_features=None, bootstrap=True, verbose=None)¶

Class for performing regression via random forest.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

n_estimators (int, default 100) – Number of estimators used by the model.
max_depth (int, default None) – Maximum depth of individual estimators.
error ({'squared', 'absolute'}, default 'squared') – Error function for assessing split quality.
bootstrap (bool, default True) – If True, use bootstrapping, i.e. re-sample new datasets for each estimator. If False, use the original dataset to fit each estimator (ignoring max_samples).
random_state (int, numpy.random.RandomState, or numpy.random.Generator, default None) –
Object used for random processes during fitting, i.e.

(1) drawing samples with replacement to create n_estimators new datasets, based on max_samples (when bootstrap == True)

(2) selecting a subset of features for each such dataset, based on max_features

If None, then a new Generator object is created (i.e. with a fresh seed).

If int, then a new Generator object is created with the specified int as the seed.

If RandomState or Generator, then that object is directly used.
max_samples (callable or int, default None) –
The number of samples to draw from the original dataset X to create each new dataset during bootstrapping (when bootstrap == True), one for each estimator.

If None, then n_samples samples are drawn.

If callable, then max_samples(n_samples) samples are drawn. (For example, this can be used to draw a specified proportion of n_samples samples.)

If int, then max_samples samples are drawn.
max_features ({'sqrt', 'log2'}, callable, or int, default None) –
The number of features used by each estimator.

If None, then n_features features are used.

If ‘sqrt’, then sqrt(n_features) features are used.

If ‘log2’, then log2(n_features) features are used.

If callable, then max_features(n_features) features are used. (For example, this can be used to use a specified proportion of n_features features.)

If int, then max_features features are used.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output when every `verbose`th estimator has been fitted.
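
The way max_samples, max_features, bootstrap and random_state interact can be sketched as follows. This is an illustration, not RithML's code: the helper names are hypothetical, rounding 'sqrt' and 'log2' down and clamping counts to [1, n] are assumptions, and the sketch handles only None, int or Generator seeds.

```python
import math

import numpy as np


def resolve_count(spec, n):
    """Turn a max_samples / max_features style spec into a count in [1, n]."""
    if spec is None:
        count = n
    elif callable(spec):
        count = int(spec(n))
    elif spec == "sqrt":
        count = int(math.sqrt(n))
    elif spec == "log2":
        count = int(math.log2(n))
    else:
        count = int(spec)
    return max(1, min(count, n))


def draw_dataset(n_samples, n_features, *, max_samples=None,
                 max_features=None, bootstrap=True, random_state=None):
    """Return row and column indices for one estimator's dataset."""
    rng = np.random.default_rng(random_state)
    if bootstrap:
        m = resolve_count(max_samples, n_samples)
        rows = rng.choice(n_samples, size=m, replace=True)
    else:
        rows = np.arange(n_samples)          # max_samples is ignored
    k = resolve_count(max_features, n_features)
    cols = np.sort(rng.choice(n_features, size=k, replace=False))
    return rows, cols


rows, cols = draw_dataset(100, 16, max_samples=lambda n: n // 2,
                          max_features="sqrt", random_state=0)
```

Passing a callable such as `lambda n: n // 2` draws half of the samples, which is the "specified proportion" use case mentioned for max_samples.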

estimators_¶

Dictionary of all DecisionTreeRegressor estimators (keys) and arrays of features used (values).

Type: dict of rithml.regression.DecisionTreeRegressor to numpy.ndarray

Methods

`fit`(X, y)	Fits a random forest regressor to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a random forest regressor to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted random forest regressor.

Return type

RandomForestRegressor

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.regression.SupportVectorRegressor(*, C=1.0, kernel='rbf', degree=3, gamma=1.0, coef0=1.0, epsilon=1.0, error_coef=1e-06)¶

Class for performing regression via support vector machine (SVM). Note: The default parameter values may produce a poor model. If so, try changing them, especially C or gamma.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Adapted from: Smola, Alex J.; Schölkopf, Bernhard (2004). “A tutorial on support vector regression”. Statistics and Computing. 14 (3): 199–222.

(https://alex.smola.org/papers/2004/SmoSch04.pdf)

Parameters

C (float, default 1.0) – Regularization constant. Must be positive; a lower value means more regularization.
kernel ({'rbf', 'linear', 'poly'} or callable, default 'rbf') – Determines kernel function used by the model. If a function is provided, then it must take in two feature vectors and compute a float.
degree (int, default 3) – Degree of polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
gamma (float, default 1.0) – Gamma parameter for polynomial and radial basis function (RBF) kernels. If kernel is not ‘poly’ or ‘rbf’, then this parameter is ignored.
coef0 (float, default 1.0) – Constant term used in polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
epsilon (float, default 1.0) – Size of margin used by the model. Penalties are based only on the errors of predictions for training samples outside this margin, i.e. the support vectors.
error_coef (float, default 1e-6) – Coefficient for margin for error (C * error_coef) in determining support vectors when assessing coefficient values. A smaller value represents a stricter threshold and may result in fewer support vectors.
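
The role of the epsilon margin can be illustrated with the epsilon-insensitive loss from the Smola and Schölkopf tutorial cited above: errors inside the margin cost nothing, and errors outside it are penalized linearly. The function name is hypothetical, and this sketch only evaluates the loss; it does not fit a model.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return max(0, |y - f(x)| - epsilon) for each sample."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 5.0, 2.5]

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=1.0)
# Indices of samples whose prediction error falls outside the margin.
outside_margin = [i for i, loss in enumerate(losses) if loss > 0]
```

Only the last two samples contribute to the penalty here, because their absolute errors (2.0 and 1.5) exceed the margin of 1.0.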

weight_¶

Function for computing the weight term of a prediction (via the kernel trick). Takes in a feature vector and outputs a float.

Type: callable

bias_¶

Bias term associated with the model.

Type: float

Methods

`fit`(X, y)	Fits a support vector regressor to data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels given input data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X, y)¶

Fits a support vector regressor to data.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
y (numpy.ndarray of shape (n_samples,)) – Training labels.

Returns

self – Fitted support vector regressor.

Return type

SupportVectorRegressor

predict(X)¶

Predicts labels given input data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Predictors to predict labels for.
Returns: y_pred – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

Dimensionality Reduction¶

The rithml.dimred module implements various machine learning algorithms for dimensionality reduction:

Kernel principal components analysis (rithml.dimred.KernelPCA)
Principal components analysis (rithml.dimred.PCA)

class rithml.dimred.KernelPCA(n_components=None, *, kernel='rbf', degree=3, gamma=1.0, coef0=1.0, fit_inverse_transform=False, alpha=1.0)¶

Class for performing principal components analysis (PCA) using kernel methods.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

n_components (int, default None) – Number of principal components kept and used by the model. If None, then all components are kept.
kernel ({'rbf', 'linear', 'poly'} or callable, default 'rbf') – Determines kernel function used by the model. If a function is provided, then it must take in two feature vectors and compute a float.
degree (int, default 3) – Degree of polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
gamma (float, default 1.0) – Gamma parameter for polynomial and radial basis function (RBF) kernels. If kernel is not ‘poly’ or ‘rbf’, then this parameter is ignored.
coef0 (float, default 1.0) – Constant term used in polynomial kernel. If kernel is not ‘poly’, then this parameter is ignored.
fit_inverse_transform (bool, default False) – If True, fit the regressors for inverse transformation during fitting. Note that this takes additional time.
alpha (float, default 1.0) – Regularization coefficient (strength) for regressors used for inverse transformation. If fit_inverse_transform == False, then this is ignored.

components_¶

Array of components used by the model, where n_components is the number of components (specified in the constructor).

Type: numpy.ndarray of shape (n_components, n_samples)

regressors_¶

List of regressors (see rithml.regression.KernelRegression) used for inverse transformation. Only created if fit_inverse_transform == True.

Type: list of rithml.regression.KernelRegression

Methods

`fit`(X)	Fits a kernel PCA model to data.
`fit_transform`(X)	Fits the model to data and then reduces their dimension.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`inverse_transform`(Z)	Reconstructs transformed data into their original dimension, if supported (i.e. if fit_inverse_transform is set to True).
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.
`transform`(X)	Reduces the dimension of data using the model.

fit(X)¶

Fits a kernel PCA model to data.

Parameters: X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
Returns: self – Fitted kernel PCA model.
Return type: KernelPCA

fit_transform(X)¶

Fits the model to data and then reduces their dimension.

Parameters: X (numpy.ndarray of shape (n_samples, n_features)) – Data to fit to and reduce dimension of.
Returns: Z – Transformed data.
Return type: numpy.ndarray of shape (n_samples, n_components)

inverse_transform(Z)¶

Reconstructs transformed data into their original dimension, if supported (i.e. if fit_inverse_transform is set to True).

This is performed via kernel regression (see rithml.regression.KernelRegression), where the regressors are fitted to the original training data using the transformed training data as features.

n_test_samples refers to the number of samples in the input data.

Parameters: Z (numpy.ndarray of shape (n_test_samples, n_components)) – Transformed data to reconstruct.
Returns: X – Reconstructed data.
Return type: numpy.ndarray of shape (n_test_samples, n_features)
Raises: RuntimeError – If fit_inverse_transform is not set to True.

transform(X)¶

Reduces the dimension of data using the model.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to reduce dimension of.
Returns: Z – Transformed data.
Return type: numpy.ndarray of shape (n_test_samples, n_components)
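
The computation behind fit and transform can be sketched from a kernel matrix. This is the textbook kernel PCA procedure under stated assumptions (double-centering of the kernel matrix, eigenvectors scaled by the inverse square root of their eigenvalues); RithML's normalization may differ, and the function names are hypothetical.

```python
import numpy as np


def center_kernel(K):
    """Double-center a square kernel matrix in feature space."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one


def kernel_pca(K, n_components):
    """Return (scores, components) for the training kernel matrix K."""
    Kc = center_kernel(K)
    eigvals, eigvecs = np.linalg.eigh(Kc)            # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, order] / np.sqrt(eigvals[order])
    scores = Kc @ alphas                             # (n_samples, n_components)
    return scores, alphas.T                          # (n_components, n_samples)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K_linear = X @ X.T                                   # linear kernel
Z, components = kernel_pca(K_linear, n_components=2)

# Reference: ordinary PCA scores on the same data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z_pca = Xc @ Vt[:2].T
```

With a linear kernel the scores agree with ordinary PCA up to the sign of each component, which is a convenient sanity check for any kernel PCA implementation.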

class rithml.dimred.PCA(n_components=None)¶

Class for performing principal components analysis (PCA).

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters: n_components (int, default None) – Number of principal components kept and used by the model. If None, then all components are kept.

components_¶

Array of components used by the model, where n_components is the number of components (specified in the constructor).

Type: numpy.ndarray of shape (n_components, n_features)

Methods

`fit`(X)	Fits a PCA model to data.
`fit_transform`(X)	Fits the model to data and then reduces their dimension.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`inverse_transform`(Z)	Reconstructs transformed data into their original dimension.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.
`transform`(X)	Reduces the dimension of data using the model.

fit(X)¶

Fits a PCA model to data.

Parameters: X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
Returns: self – Fitted PCA model.
Return type: PCA

fit_transform(X)¶

Fits the model to data and then reduces their dimension.

Parameters: X (numpy.ndarray of shape (n_samples, n_features)) – Data to fit to and reduce dimension of.
Returns: Z – Transformed data.
Return type: numpy.ndarray of shape (n_samples, n_components)

inverse_transform(Z)¶

Reconstructs transformed data into their original dimension.

n_test_samples refers to the number of samples in the input data.

Parameters: Z (numpy.ndarray of shape (n_test_samples, n_components)) – Transformed data to reconstruct.
Returns: X – Reconstructed data.
Return type: numpy.ndarray of shape (n_test_samples, n_features)

transform(X)¶

Reduces the dimension of data using the model.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to reduce dimension of.
Returns: Z – Transformed data.
Return type: numpy.ndarray of shape (n_test_samples, n_components)
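
The fit, transform and inverse_transform steps can be sketched with a singular value decomposition. This is a minimal sketch, assuming the data are mean-centered before projection; whether RithML uses an SVD or an eigendecomposition is not stated here, and the function names are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components=None):
    """Return the training mean and the top principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt if n_components is None else Vt[:n_components]
    return mean, components                  # (n_components, n_features)


def pca_transform(X, mean, components):
    return (X - mean) @ components.T         # (n_test_samples, n_components)


def pca_inverse_transform(Z, mean, components):
    return Z @ components + mean             # (n_test_samples, n_features)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))

mean, comps_all = pca_fit(X)                 # keep every component
X_back = pca_inverse_transform(pca_transform(X, mean, comps_all), mean, comps_all)

mean, comps_2 = pca_fit(X, n_components=2)   # keep two components
Z = pca_transform(X, mean, comps_2)
X_approx = pca_inverse_transform(Z, mean, comps_2)
```

Keeping all components makes the round trip exact up to floating-point error; keeping fewer gives an approximation of the original data.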

Clustering¶

The rithml.clustering module implements various machine learning algorithms for clustering:

Gaussian mixture model (rithml.clustering.GaussianMixture)
K-means clustering (rithml.clustering.KMeans)

class rithml.clustering.GaussianMixture(n_components=3, *, covariance_type='full', tol=0.1, reg=1e-06, max_iter=100, init='k-means', weights_init=None, means_init=None, covariances_init=None, random_state=None, verbose=None)¶

Class for performing clustering via a Gaussian mixture model (GMM).

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

n_components (int, default 3) – Number of components assumed by the model.
covariance_type ({'full', 'tied', 'diag', 'tied_diag', 'spherical', 'tied_spherical'}, default 'full') –
Specifies assumptions about component covariance matrices.

If ‘full’, each component has its own covariance matrix.

If ‘tied’, all components share the same covariance matrix.

If ‘diag’, each component has its own covariance matrix, which is assumed to be diagonal.

If ‘tied_diag’, all components share the same covariance matrix, which is assumed to be diagonal.

If ‘spherical’, each component has its own covariance matrix, which is assumed to be a multiple of the identity matrix.

If ‘tied_spherical’, all components share the same covariance matrix, which is assumed to be a multiple of the identity matrix.
tol (float, default 0.1) – Tolerance level for assessing convergence. That is, iterations of the EM algorithm stop once the increase in log-likelihood is no longer above this level.
reg (float, default 1e-6) – Regularization constant added to the diagonal of all component covariance matrices to ensure nonsingularity.
max_iter (int, default 100) – Maximum number of iterations for the model to take before stopping. If None, then no maximum is used.
init ({'k-means', 'k-means++', 'random_from_data', 'random'}, default 'k-means') –
Specifies how to initialize the means of the components.

If ‘k-means’, then the k-means algorithm is used to cluster the data, and the resulting cluster centers are used as the means.

If ‘k-means++’, then the k-means++ algorithm is used to initialize the means.

If ‘random_from_data’, then a random sample of size n_components is selected without replacement from the training data.

If ‘random’, then a random sample of size n_components is selected from a multivariate Gaussian distribution fitted to the training data.
weights_init (numpy.ndarray of shape (n_components,), default None) – Array of initial component weights for the model to use. If None, the weights are initialized based on init.
means_init (numpy.ndarray of shape (n_components, n_features), default None) – Array of initial component means for the model to use. If None, the means are initialized based on init.
covariances_init (numpy.ndarray of shape (n_components, n_features, n_features), default None) – Array of initial component covariance matrices for the model to use. If None, the covariances are initialized based on init.
random_state (int, numpy.random.RandomState, or numpy.random.Generator, default None) –
Object used for random processes during fitting, i.e. randomly drawing samples from the training data during initialization.

If None, then a new Generator object is created (i.e. with a fresh seed).

If int, then a new Generator object is created with the specified int as the seed.

If RandomState or Generator, then that object is directly used.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output a progress message after every verbose iterations.
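
The purpose of the reg parameter can be illustrated with a covariance matrix that is singular without it. This is a sketch of the general technique (adding a small constant to the diagonal); the data and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
col = rng.normal(size=(20, 1))
X = np.hstack([col, 2.0 * col])        # second feature is a multiple of the first

cov = np.cov(X, rowvar=False)          # rank-deficient 2 x 2 covariance
reg = 1e-6
cov_reg = cov + reg * np.eye(cov.shape[0])

rank_before = np.linalg.matrix_rank(cov)
min_eig_after = np.linalg.eigvalsh(cov_reg).min()
```

The regularized matrix has strictly positive eigenvalues, so its inverse and determinant, both needed to evaluate a Gaussian density, are well defined.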

weights_¶

Array of component weights.

Type: numpy.ndarray of shape (n_components,)

means_¶

Array of component means.

Type: numpy.ndarray of shape (n_components, n_features)

covariances_¶

Array of component covariance matrices.

Type: numpy.ndarray of shape (n_components, n_features, n_features)

labels_¶

Labels assigned by the fitted model to the training data.

Type: numpy.ndarray of shape (n_samples,)

log_likelihood_¶

Log-likelihood of the training data based on the fitted model.

Type: float

n_iter_¶

Number of iterations taken by the model before stopping.

Type: int

n_params_¶

Number of free parameters in the model. Depends on covariance_type. Used in calculation of AIC and BIC.

Type: int

aic_fit_¶

Akaike information criterion (AIC) of the model on the training data.

Type: float

bic_fit_¶

Bayesian information criterion (BIC) of the model on the training data.

Type: float
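
How the number of free parameters depends on covariance_type can be sketched with the conventional count for a Gaussian mixture: n_components - 1 weights, n_components * n_features mean entries, plus the covariance terms. This is an assumption about the counting convention; RithML's n_params_ may be computed differently, and the function name is hypothetical.

```python
def gmm_n_params(n_components, n_features, covariance_type="full"):
    """Conventional free-parameter count for a Gaussian mixture model."""
    k, d = n_components, n_features
    sym = d * (d + 1) // 2             # free entries of one symmetric matrix
    cov_params = {
        "full": k * sym,
        "tied": sym,
        "diag": k * d,
        "tied_diag": d,
        "spherical": k,
        "tied_spherical": 1,
    }[covariance_type]
    return (k - 1) + k * d + cov_params


counts = {
    name: gmm_n_params(3, 2, name)
    for name in ("full", "tied", "diag", "tied_diag", "spherical", "tied_spherical")
}
```

Tied and more restricted covariance types reduce the count, which lowers the complexity penalty in AIC and BIC.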

Methods

`aic`(X)	Computes the Akaike information criterion (AIC) of the model on the specified data.
`bic`(X)	Computes the Bayesian information criterion (BIC) of the model on the specified data.
`fit`(X)	Fits a Gaussian mixture model (GMM) to data using the expectation-maximization (EM) algorithm.
`fit_predict`(X)	Fits the model to data and then predicts their labels, i.e. clusters the data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels for the input data, i.e. clusters the data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

aic(X)¶

Computes the Akaike information criterion (AIC) of the model on the specified data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to predict labels of and compute the AIC on, where n_test_samples is the number of samples in the input data.
Returns: aic – AIC of the model on the data. A lower value suggests a stronger model.
Return type: float

bic(X)¶

Computes the Bayesian information criterion (BIC) of the model on the specified data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to predict labels of and compute the BIC on, where n_test_samples is the number of samples in the input data.
Returns: bic – BIC of the model on the data. A lower value suggests a stronger model.
Return type: float
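
Both criteria follow the standard definitions, AIC = 2p - 2 ln L and BIC = p ln(n) - 2 ln L, where p is the number of free parameters, n the number of samples and ln L the log-likelihood. The sketch below evaluates them from those three quantities; the function names are hypothetical and it is assumed, not confirmed, that RithML uses these exact forms.

```python
import math


def aic_from(log_likelihood, n_params):
    """Akaike information criterion: 2p - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic_from(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: p ln(n) - 2 ln L."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


aic_value = aic_from(-100.0, 17)
bic_value = bic_from(-100.0, 17, 100)
```

For the same fit, BIC penalizes each parameter more heavily than AIC once n exceeds about 7 samples (ln(n) > 2), so it tends to favor smaller models.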

fit(X)¶

Fits a Gaussian mixture model (GMM) to data using the expectation-maximization (EM) algorithm.

Parameters: X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
Returns: self – Fitted GMM.
Return type: GaussianMixture

fit_predict(X)¶

Fits the model to data and then predicts their labels, i.e. clusters the data.

Parameters: X (numpy.ndarray of shape (n_samples, n_features)) – Data to fit to and predict labels of.
Returns: labels – Predicted labels.
Return type: numpy.ndarray of shape (n_samples,)

predict(X)¶

Predicts labels for the input data, i.e. clusters the data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to predict labels of.
Returns: labels – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)

class rithml.clustering.KMeans(n_clusters=3, *, init='k-means++', max_iter=100, random_state=None, verbose=None)¶

Class for performing k-means clustering.

The following variable names are used in this class’s documentation:

n_samples: Number of samples in the training data.

n_features: Number of features in the training data.

Parameters

n_clusters (int, default 3) – Number of clusters assumed by the model.
init ({‘k-means++’, ‘random’} or numpy.ndarray of shape (n_clusters, n_features), default ‘k-means++’) –
Specifies how to initialize the means (cluster centers) for the model.

If ‘k-means++’, then the k-means++ algorithm is used.

If ‘random’, then a random sample of size n_clusters is selected without replacement from the training data.

If numpy.ndarray, then the specified array (if the shape is correct) is used, i.e. assumed to be the array of means.
max_iter (int, default 100) – Maximum number of iterations for the model to take before stopping. If None, then no maximum is used.
random_state (int, numpy.random.RandomState, or numpy.random.Generator, default None) –
Object used for random processes during fitting, i.e. randomly drawing samples from the training data during initialization, if init is ‘k-means++’ or ‘random’. (If init is an array, then random_state is not used.)

If None, then a new Generator object is created (i.e. with a fresh seed).

If int, then a new Generator object is created with the specified int as the seed.

If RandomState or Generator, then that object is directly used.
verbose (int, default None) – If not None, output details about progress and time elapsed during fitting. Additionally, if >0, then output a progress message after every verbose iterations.
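
The 'k-means++' option refers to the standard seeding procedure: the first center is drawn uniformly from the data, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The sketch below implements that procedure with NumPy; the function name is hypothetical and it is not taken from RithML.

```python
import numpy as np


def kmeans_pp_init(X, n_clusters, random_state=None):
    """Pick n_clusters rows of X as initial centers via k-means++ seeding."""
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]
    centers = [X[rng.integers(n_samples)]]
    for _ in range(n_clusters - 1):
        # Squared distance from every sample to its nearest chosen center.
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        sq_dist = np.min(np.sum(diffs ** 2, axis=2), axis=1)
        probs = sq_dist / sq_dist.sum()
        centers.append(X[rng.choice(n_samples, p=probs)])
    return np.array(centers)           # (n_clusters, n_features)


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
centers = kmeans_pp_init(X, n_clusters=3, random_state=42)
```

An array of this shape is also what the init parameter is documented to accept when explicit starting means are supplied.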

centers_¶

Array of all cluster centers of the fitted model.

Type: numpy.ndarray of shape (n_clusters, n_features)

labels_¶

Labels assigned by the fitted model to the training data.

Type: numpy.ndarray of shape (n_samples,)

n_iter_¶

Number of iterations taken by the model before stopping.

Type: int

distortion_¶

Distortion of the training data, i.e. sum of all squared distances from corresponding cluster centers.

Type: float

Methods

`fit`(X)	Fits a k-means clustering model to data.
`fit_predict`(X)	Fits the model to data and then predicts their labels, i.e. clusters the data.
`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`predict`(X)	Predicts labels for the input data, i.e. clusters the data.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

fit(X)¶

Fits a k-means clustering model to data.

Parameters: X (numpy.ndarray of shape (n_samples, n_features)) – Training predictors.
Returns: self – Fitted k-means clustering model.
Return type: KMeans

fit_predict(X)¶

Fits the model to data and then predicts their labels, i.e. clusters the data.

Parameters: X (numpy.ndarray of shape (n_samples, n_features)) – Data to fit to and predict labels of.
Returns: labels – Predicted labels.
Return type: numpy.ndarray of shape (n_samples,)

predict(X)¶

Predicts labels for the input data, i.e. clusters the data.

n_test_samples refers to the number of samples in the input data.

Parameters: X (numpy.ndarray of shape (n_test_samples, n_features)) – Data to predict labels of.
Returns: labels – Predicted labels.
Return type: numpy.ndarray of shape (n_test_samples,)
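
Label prediction and the distortion described for distortion_ can be sketched as a nearest-center assignment followed by a sum of squared distances. The function name and the toy data are hypothetical.

```python
import numpy as np


def assign_labels(X, centers):
    """Return the nearest-center index per sample and all squared distances."""
    sq_dist = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.argmin(sq_dist, axis=1), sq_dist


centers = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[0.0, 1.0], [9.0, 10.0], [1.0, 0.0]])

labels, sq_dist = assign_labels(X, centers)
# Sum of squared distances from each sample to its assigned center.
distortion = float(sq_dist[np.arange(len(X)), labels].sum())
```

Each of the three samples lies at squared distance 1 from its nearest center, so the labels are [0, 1, 0] and the distortion is 3.0.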

Base Model¶

The rithml.base module contains the base model class (rithml.base.BaseModel) from which all other model classes inherit.

class rithml.base.BaseModel(**params)¶

Class for the base model from which all other model classes inherit.

Parameters: **params (dict) – Model parameters.

Methods

`get_params`([deep])	Gets __init__ parameter names and corresponding arguments.
`set_params`(**params)	Sets the specified __init__ parameters to the specified values.

get_params(deep=True)¶

Gets __init__ parameter names and corresponding arguments.

Parameters: deep (bool, default True) – If True, return parameter dictionary as a deep copy. Otherwise, return a shallow copy.
Returns: params – Dictionary of __init__ parameter names (keys) and corresponding arguments (values).
Return type: dict

set_params(**params)¶

Sets the specified __init__ parameters to the specified values.

Parameters: params (dict) – Model parameters, i.e. dictionary of __init__ parameter names (keys) and corresponding arguments (values).
Returns: self – Model object.
Return type: BaseModel
Raises: ValueError – If an invalid parameter name is provided.
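
The documented behavior of get_params and set_params (deep versus shallow copies, returning the model, and raising ValueError for unknown names) can be sketched with a stand-in class. SketchModel and its parameters are hypothetical; this is not RithML's implementation.

```python
import copy


class SketchModel:
    """Hypothetical stand-in for a model class with two __init__ parameters."""

    _param_names = ("alpha", "options")

    def __init__(self, alpha=0.0, options=None):
        self.alpha = alpha
        self.options = options if options is not None else {}

    def get_params(self, deep=True):
        params = {name: getattr(self, name) for name in self._param_names}
        # A deep copy also duplicates nested objects; a shallow copy shares them.
        return copy.deepcopy(params) if deep else dict(params)

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self._param_names:
                raise ValueError(f"Invalid parameter name: {name!r}")
            setattr(self, name, value)
        return self


model = SketchModel(alpha=0.5, options={"tags": ["a"]})
deep = model.get_params(deep=True)
shallow = model.get_params(deep=False)
deep["options"]["tags"].append("b")      # does not touch the model
shallow["options"]["tags"].append("c")   # shares the nested list with the model
```

Because set_params returns the model itself, calls can be chained, for example `model.set_params(alpha=2.0).get_params()`.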