pydl85.supervised.classifiers.DL85Booster
- class pydl85.supervised.classifiers.DL85Booster(base_estimator=None, max_depth=1, min_sup=1, max_iterations=0, model=Boosting_Model.MODEL_LP_DEMIRIZ, gamma=None, error_function=None, fast_error_function=None, opti_gap=0.01, max_error=0, regulator=-1, stop_after_better=False, time_limit=0, verbose=False, desc=False, asc=False, repeat_sort=False, quiet=True, print_output=False, cache_type=Cache_Type.Cache_TrieItemset, maxcachesize=0, wipe_type=Wipe_Type.Subnodes, wipe_factor=0.5, use_cache=True, use_ub=True, dynamic_branch=True)
A boosting classifier that learns a weighted ensemble (forest) of optimal binary decision trees.
- Parameters:
- base_estimator : object, default=None
The base estimator, implementing fit/predict/predict_proba, to be learned at each step of the boosting process.
- max_depth : int, default=1
Maximum depth of each tree to be found.
- min_sup : int, default=1
Minimum number of examples per leaf.
- max_iterations : int, default=0
Maximum number of boosting iterations. The default value 0 means no bound.
- model : Boosting_Model, default=Boosting_Model.MODEL_LP_DEMIRIZ
The mathematical model used to solve the boosting problem.
- gamma : float, default=None
Regularization parameter used in the MDBOOST model. If None, it is set automatically.
- error_function : function, default=None
Function used to evaluate the quality of each node. It must take at least one argument, the list of instances covered by the node, and return a float representing the error of the node; in supervised learning it must additionally return a label. If no error function is provided, the default one is used.
- fast_error_function : function, default=None
Function used to evaluate the quality of each node from class counts. It must take at least one argument, the list containing the number of instances of each class in the node, and return a float representing the error of the node together with the predicted label. If no function is provided, the default one is used.
- opti_gap : float, default=0.01
The optimality gap used in the optimization model.
- max_error : int, default=0
Maximum allowed error. The default value 0 means no bound. If no tree strictly better than max_error can be found, the model remains empty.
- regulator : float, default=-1
The regulator used in the optimization model.
- stop_after_better : bool, default=False
Whether the search stops as soon as it finds a tree better than max_error.
- time_limit : int, default=0
Time allocated to the search, in seconds. The default value 0 means no limit. The best tree found within the time limit is stored if it is better than max_error.
- verbose : bool, default=False
Whether to print what happens during the search.
- desc : function, default=None
Heuristic function used to sort the items in descending order.
- asc : function, default=None
Heuristic function used to sort the items in ascending order.
- repeat_sort : bool, default=False
Whether the heuristic sort is applied at each level of the lattice or only at the root.
- quiet : bool, default=True
Whether the boosting log is suppressed. If False, the log is printed.
- print_output : bool, default=False
Whether the search output is printed.
- cache_type : Cache_Type, default=Cache_Type.Cache_TrieItemset
The type of cache used when use_cache is set to True.
- maxcachesize : int, default=0
Maximum size of the cache. When this size is reached, the cache is wiped according to the wipe_type and wipe_factor parameters. The default value 0 means no limit.
- wipe_type : Wipe_Type, default=Wipe_Type.Subnodes
The wiping strategy applied to the cache when maxcachesize is reached.
- wipe_factor : float, default=0.5
The fraction of cache entries to delete when maxcachesize is reached.
- use_cache : bool, default=True
Whether a cache is used.
- use_ub : bool, default=True
Whether the hierarchical upper bound is used.
- dynamic_branch : bool, default=True
Whether dynamic branching is used to decide in which order the decisions on an attribute are explored.
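The two callback contracts described for error_function and fast_error_function can be sketched in plain Python. This is a minimal sketch under stated assumptions: the names misclassification_fast_error and make_misclassification_error are hypothetical, the return value is assumed to be an (error, label) tuple, the class counts are assumed to be indexed by class label, error_function is assumed to receive instance indices into the training labels, and covered nodes are assumed to be non-empty. How DL85Booster combines such functions with its boosting weights is not described on this page.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def misclassification_fast_error(class_counts: Sequence[int]) -> Tuple[float, int]:
    """Hypothetical fast_error_function: per-class counts -> (error, label).

    Assumes class_counts[c] is the number of instances of class c in the node.
    The predicted label is the majority class (lowest index on ties), and the
    error is the number of instances that do not belong to it.
    """
    label = max(range(len(class_counts)), key=lambda c: class_counts[c])
    error = float(sum(class_counts) - class_counts[label])
    return error, label


def make_misclassification_error(
    y: Sequence[int],
) -> Callable[[List[int]], Tuple[float, int]]:
    """Build a hypothetical error_function closed over the training labels y.

    The returned function receives the indices of the instances covered by a
    node (assumed non-empty) and returns (error, label).
    """

    def error_function(tids: List[int]) -> Tuple[float, int]:
        counts = Counter(y[i] for i in tids)
        label, majority = counts.most_common(1)[0]
        return float(len(tids) - majority), label

    return error_function
```

If these assumptions hold, such a function would be passed as DL85Booster(fast_error_function=misclassification_fast_error); check the pydl85 sources before relying on the exact calling convention.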
- Attributes:
- optimal_ : bool
Whether the found forest is optimal.
- estimators_ : list
List of DL85Classifier in the forest.
- estimator_weights_ : list
Weights of the estimators in the forest.
- accuracy_ : float
Accuracy of the found forest on the training set.
- duration_ : float
Time taken to learn the forest.
- n_estimators_ : int
Number of estimators in the forest.
- n_iterations_ : int
Number of iterations performed while learning the forest.
- margins_ : list
Margin of each instance in the training set.
- margins_norm_ : list
Normalized margin of each instance in the training set.
- classes_ : ndarray, shape (n_classes,)
The classes seen in fit().
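The attributes estimators_, estimator_weights_, margins_ and margins_norm_ suggest the usual weighted-vote reading of a boosted forest. The sketch below assumes that reading: each tree's vote counts with its weight, and, for binary labels encoded as -1/+1, the normalized margin of an instance is its label times the weighted sum of tree outputs, divided by the total weight. This page does not state pydl85's exact definitions; the helper names weighted_vote and normalized_margins are hypothetical, and tree predictions are given as plain lists instead of fitted DL85Classifier objects.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def weighted_vote(predictions: Sequence[Sequence[Hashable]],
                  weights: Sequence[float]) -> List[Hashable]:
    """Combine per-tree predictions by weighted majority vote.

    predictions[t][i] is the label tree t predicts for instance i, and
    weights[t] plays the role of estimator_weights_[t].
    """
    n_instances = len(predictions[0])
    combined = []
    for i in range(n_instances):
        scores: Dict[Hashable, float] = defaultdict(float)
        for tree_preds, w in zip(predictions, weights):
            scores[tree_preds[i]] += w
        # The label with the largest accumulated weight wins.
        combined.append(max(scores, key=scores.get))
    return combined


def normalized_margins(predictions: Sequence[Sequence[int]],
                       weights: Sequence[float],
                       y: Sequence[int]) -> List[float]:
    """Binary margins (labels in {-1, +1}) divided by the total weight."""
    total = sum(weights)
    margins = []
    for i, label in enumerate(y):
        raw = sum(w * tree_preds[i] for tree_preds, w in zip(predictions, weights))
        margins.append(label * raw / total)
    return margins
```

Under this reading, a positive margin means the weighted vote classifies the instance correctly, and normalized margins lie in [-1, 1].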
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
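With deep=True, parameters of nested estimators (such as base_estimator) are reported under <component>__<parameter> keys. The sketch below mimics that scikit-learn convention on a hypothetical ToyEstimator class; it is an illustration, not the pydl85 or scikit-learn implementation.

```python
from typing import Any, Dict


class ToyEstimator:
    """Hypothetical estimator that stores its constructor parameters."""

    def __init__(self, **params: Any) -> None:
        self._param_names = sorted(params)
        for name, value in params.items():
            setattr(self, name, value)

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for name in self._param_names:
            value = getattr(self, name)
            # With deep=True, parameters of nested estimators are also
            # exposed under "<component>__<parameter>" keys.
            if deep and hasattr(value, "get_params") and not isinstance(value, type):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    out[f"{name}__{sub_name}"] = sub_value
            out[name] = value
        return out
```

For example, an outer ToyEstimator(base_estimator=ToyEstimator(max_depth=2), max_iterations=10) reports base_estimator__max_depth only when deep is True.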
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the whole label set of each sample to be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
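What score returns can be sketched in plain Python, assuming it follows scikit-learn's ClassifierMixin.score, which computes accuracy_score(y, self.predict(X), sample_weight=sample_weight). The hypothetical mean_accuracy helper takes predictions directly instead of calling predict, and treats a multi-output sample (a tuple of labels) as correct only when every label matches, which is the subset accuracy mentioned above.

```python
from typing import Any, Optional, Sequence


def mean_accuracy(y_true: Sequence[Any], y_pred: Sequence[Any],
                  sample_weight: Optional[Sequence[float]] = None) -> float:
    """Weighted fraction of samples whose prediction matches exactly."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # Tuples compare element-wise, so a multi-output sample only counts
    # as a hit when all of its labels are correct.
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)
```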
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
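The <component>__<parameter> convention amounts to splitting each key at its first double underscore and forwarding the remainder to the named component. The hypothetical split_params helper below sketches only that routing step; scikit-learn's BaseEstimator.set_params additionally validates the keys and calls set_params on each component.

```python
from typing import Any, Dict, Tuple


def split_params(
    params: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Separate an estimator's own parameters from nested ones."""
    own: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        # Split at the first "__" only; deeper levels stay in sub_key and
        # are resolved by the component itself.
        component, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested
```

Single underscores, as in max_depth, are left alone because only the double-underscore separator is split.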
- softmax(X, copy=True)
Calculate the softmax function. The softmax function is computed row-wise as np.exp(X) / np.sum(np.exp(X), axis=1, keepdims=True). This can overflow when large values are exponentiated, so the largest value in each row is first subtracted from every element of that row, which leaves the result unchanged.
- Parameters:
- X : array-like of float of shape (M, N)
Argument to the softmax function.
- copy : bool, default=True
Whether to copy X or operate on it in place.
- Returns:
- out : ndarray of shape (M, N)
Softmax function evaluated at every point in X.
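The max-subtraction trick described for softmax can be sketched in pure Python. softmax_rows is a hypothetical name; it always returns new lists, so it has no counterpart to the copy flag.

```python
import math
from typing import List, Sequence


def softmax_rows(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Row-wise softmax with the row maximum subtracted before exp()."""
    out = []
    for row in X:
        m = max(row)
        # Shifting by the row maximum keeps every exponent <= 0, so
        # math.exp cannot overflow; the shift cancels in the ratio.
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```

Without the shift, a row such as [1000.0, 1000.0] would make math.exp raise OverflowError; with it, the row maps to [0.5, 0.5].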