pydl85.unsupervised.clustering.DL85Cluster

class pydl85.unsupervised.clustering.DL85Cluster(max_depth=1, min_sup=1, error_function=None, max_error=0, stop_after_better=False, time_limit=0, verbose=False, desc=False, asc=False, repeat_sort=False, leaf_value_function=None, print_output=False, cache_type=Cache_Type.Cache_TrieItemset, maxcachesize=0, wipe_type=Wipe_Type.Subnodes, wipe_factor=0.5, use_cache=True, use_ub=True, dynamic_branch=True)

An optimal binary decision tree for clustering.

Parameters:
max_depth : int, default=1

Maximum depth of the tree to be found

min_sup : int, default=1

Minimum number of examples per leaf

error_function : function, default=None

Function used to evaluate the quality of each node. The function must take at least one argument, the list of instances covered by the node. It should return a float value representing the error of the node. In case of supervised learning, it should additionally return a label. If no error function is provided, the default one is used.

max_error : int, default=0

Maximum allowed error. The default value of 0 means no bound. If no tree strictly better than this bound can be found, the model remains empty.

stop_after_better : bool, default=False

Whether the search stops as soon as it finds a tree better than max_error.

time_limit : int, default=0

Time allocated to the search, in seconds. The default value of 0 means no limit. The best tree found within the time limit is stored, provided it is better than max_error.

verbose : bool, default=False

Whether to print what happens during the search.

desc : function, default=False

Heuristic function used to sort the items in descending order.

asc : function, default=False

Heuristic function used to sort the items in ascending order.

repeat_sort : bool, default=False

Whether the heuristic sort is applied at each level of the lattice or only at the root.

leaf_value_function : function, default=None

Function used to assign a label to a leaf in case of unsupervised learning. The function must take at least one argument, the list of instances covered by the leaf. It should return the desired label. If no function is provided, no label is assigned to the leaves.

print_output : bool, default=False

Whether the search output is printed.

cache_type : Cache_Type, default=Cache_Type.Cache_TrieItemset

Type of cache used when DL85Predictor.use_cache is set to True.

maxcachesize : int, default=0

Maximum size of the cache. When this size is reached, the cache is wiped according to the DL85Predictor.wipe_type and DL85Predictor.wipe_factor parameters. The default value of 0 means no limit.

wipe_type : Wipe_Type, default=Wipe_Type.Subnodes

Type of wipe applied to the cache when DL85Predictor.maxcachesize is reached.

wipe_factor : float, default=0.5

Fraction of elements to delete from the cache when DL85Predictor.maxcachesize is reached.

use_cache : bool, default=True

Whether a cache is used.

use_ub : bool, default=True

Whether the hierarchical upper bound is used.

dynamic_branch : bool, default=True

Whether dynamic branching is used to decide the order in which the decisions on an attribute are explored.
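The error_function and leaf_value_function callbacks described above can be sketched as follows. This is a minimal illustration, not the library's default implementation: it assumes each callback receives the covered instances as a list of row indices into the training data, and the helper names centroid, make_sse_error and make_centroid_leaf_value are hypothetical. The error is the sum of squared Euclidean distances to the centroid, and the leaf value is the centroid itself.

```python
from typing import Callable, List, Sequence


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty collection of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def make_sse_error(data: Sequence[Sequence[float]]) -> Callable[[Sequence[int]], float]:
    """Build an error function bound to `data` (hypothetical helper).

    Assumption: the solver passes the indices of the rows covered by a node.
    """
    def error(tids: Sequence[int]) -> float:
        rows = [data[i] for i in tids]
        center = centroid(rows)
        # Sum of squared Euclidean distances from each row to the centroid.
        return float(sum((v - c) ** 2 for row in rows for v, c in zip(row, center)))
    return error


def make_centroid_leaf_value(data: Sequence[Sequence[float]]) -> Callable[[Sequence[int]], List[float]]:
    """Build a leaf value function that labels a leaf with its centroid."""
    def leaf_value(tids: Sequence[int]) -> List[float]:
        return centroid([data[i] for i in tids])
    return leaf_value


# Toy data: two nearby points and one far away.
data = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
sse = make_sse_error(data)
leaf = make_centroid_leaf_value(data)
```

Under the same assumption, such callables would be passed as DL85Cluster(error_function=sse, leaf_value_function=leaf); the exact calling convention should be checked against the installed version.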

Attributes:
tree_ : str

Outputted tree in serialized form; remains empty as long as no model is learned.

base_tree_ : str

Basic outputted tree without any additional data (transactions, proba, etc.)

size_ : int

The size of the outputted tree

depth_ : int

Depth of the found tree

error_ : float

Error of the found tree

accuracy_ : float

Accuracy of the found tree on the training set

lattice_size_ : int

The number of nodes explored before finding the optimal tree

runtime_ : float

Time of the optimal decision tree search

timeout_ : bool

Whether the search reached timeout or not

classes_ : ndarray, shape (n_classes,)

The classes seen at fit().

is_fitted_ : bool

Whether the classifier is fitted or not

fit(X, X_error=None)

Implements the standard fitting function for a DL8.5 classifier.

Parameters:
X : array-like, shape (n_samples, n_features)

The training input samples. If X_error is provided, X serves as the explanatory input.

X_error : array-like, shape (n_samples, n_features_1)

The training input used to compute the error. If it is not provided, X is used to compute the error.

Returns:
self : object

Returns self.
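One way to read the split between X and X_error is that X holds the binary features the tree may test, while X_error holds the original values on which the error is computed. The sketch below derives binary columns from numeric data with fixed thresholds. The helper names binarize and fit_cluster_tree and the thresholds are hypothetical, and the need for binary input in X is an assumption based on the class being a binary decision tree. The import path follows the class name at the top of this page.

```python
from typing import List, Sequence


def binarize(rows: Sequence[Sequence[float]],
             thresholds: Sequence[Sequence[float]]) -> List[List[int]]:
    """One 0/1 column per (feature, threshold) pair: 1 if value <= threshold."""
    return [
        [int(value <= t) for value, ts in zip(row, thresholds) for t in ts]
        for row in rows
    ]


def fit_cluster_tree(X_bin, X_num, max_depth=2, time_limit=10):
    """Fit on binary features while computing the error on the numeric data.

    pydl85 is imported lazily so the binarization part runs without it.
    """
    from pydl85.unsupervised.clustering import DL85Cluster

    model = DL85Cluster(max_depth=max_depth, time_limit=time_limit)
    model.fit(X_bin, X_error=X_num)
    return model


# Numeric data used for the error, and its binary view used for the splits.
X_num = [[1.0, 5.0], [2.0, 7.0], [9.0, 6.0]]
X_bin = binarize(X_num, thresholds=[[1.5, 5.0], [5.5]])
```

Plain lists are used here for brevity; converting both inputs to NumPy arrays before calling fit_cluster_tree may be necessary.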

fit_predict(X, y=None)

Perform clustering on X and return cluster labels.

Parameters:
X : array-like of shape (n_samples, n_features)

Input data.

y : Ignored

Not used, present for API consistency by convention.

Returns:
labels : ndarray of shape (n_samples,), dtype=np.int64

Cluster labels.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Implements the standard predict function for a DL8.5 classifier.

Parameters:
X : array-like, shape (n_samples, n_features)

The input samples.

Returns:
y : ndarray, shape (n_samples,)

The label for each sample is the label of the closest sample seen during fit.

predict_proba(X)

Implements the standard predict_proba function for a DL8.5 classifier.

Parameters:
X : array-like, shape (n_samples, n_features)

The input samples.

Returns:
y : ndarray, shape (n_samples,)

The label for each sample is the label of the closest sample seen during fit.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
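The <component>__<parameter> naming used by set_params can be illustrated without any estimator. The sketch below groups keys by the text before the first double underscore, which is how a nested object would route each value to the right component. The function name split_nested_params and the component name "tree" are hypothetical, and this is an illustration of the convention rather than the library's code.

```python
from collections import defaultdict
from typing import Any, Dict, Tuple


def split_nested_params(
    params: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Separate plain parameters from `<component>__<parameter>` ones."""
    own: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for key, value in params.items():
        # Split on the first double underscore only.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested[name][sub_key] = value
        else:
            own[name] = value
    return own, dict(nested)


own, nested = split_nested_params(
    {"max_depth": 3, "tree__min_sup": 5, "tree__time_limit": 60}
)
```

Here own holds the parameters that apply to the object itself, and nested holds one dictionary per component, keyed by the component name.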