teex.featureImportance package#

teex.featureImportance.data module#

Module for synthetic and real datasets with available ground truth feature importance explanations. Also contains methods and classes for feature importance data manipulation.

All of the datasets must be instantiated first. Then, when sliced, they return the observations, labels and ground truth explanations, respectively.

class teex.featureImportance.data.SenecaFI(nSamples: int = 200, nFeatures: int = 3, featureNames=None, randomState: int = 888)[source]#

Bases: _SyntheticDataset

Generate synthetic binary classification tabular data with ground truth feature importance explanations. This method was presented in [Evaluating local explanation methods on ground truth, Riccardo Guidotti, 2021].

From this class one can also obtain a trained transparent model (instance of TransparentLinearClassifier). When sliced, this object will return

  • X (ndarray) of shape (nSamples, nFeatures) or (nFeatures). Generated data.

  • y (ndarray) of shape (nSamples,) or int. Generated binary data labels.

  • explanations (ndarray) of shape (nSamples, nFeatures) or (nFeatures). Generated ground truth feature importance explanations. For each explanation, the values are normalised to the [-1, 1] range.

Parameters:
  • nSamples – (int) number of samples to be generated.

  • nFeatures – (int) total number of features in the generated data.

  • featureNames – (array-like) names of the generated features. If not provided, feature names are generated automatically.

  • randomState – (int) random state seed.

class teex.featureImportance.data.TransparentLinearClassifier(randomState: int = 888)[source]#

Bases: _BaseClassifier

Used by the higher-level data generation class SenecaFI (preferably, use that class and obtain the classifier from it).

Transparent, linear classifier with feature importances as explanations. This class also generates labeled data according to the generated random linear expression. Presented in [Evaluating local explanation methods on ground truth, Riccardo Guidotti, 2021].

explain(data, newLabels=None)[source]#

Get the feature importance explanation as the gradient of the generated expression f evaluated at a ‘training’ observation that has the same class as the observation to explain and lies close to the decision boundary f = 0.

The procedure is as follows: for each data observation x to explain, get the observation z from the ‘training’ data that is closest to the decision boundary and is of a different class than x. Then, get the observation t from the ‘training’ data that is closest to z but of the same class as x. Finally, return the explanation for x as the gradient vector of f evaluated at t.

Parameters:
  • data – (ndarray) array of k observations and m features, shape (k, m).

  • newLabels – (ndarray, optional) precomputed data labels (binary ints) for ‘data’, shape (k,).

Returns:

(ndarray) (k, m) array of feature importance explanations.
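The selection procedure above can be sketched in plain Python. The toy decision function f, its gradient and all names below are illustrative stand-ins, not teex's actual internals:

```python
import math

# Toy nonlinear decision function and its gradient (stand-ins for the
# classifier's generated expression).
def f(p):
    return p[0] ** 2 + p[1] - 1.0

def grad_f(p):
    return (2.0 * p[0], 1.0)

def explain_point(x, train, labels):
    """Sketch of the documented procedure for a single observation x."""
    x_label = 1 if f(x) > 0 else 0
    # z: training point of the *other* class closest to the boundary f = 0
    z = min((p for p, l in zip(train, labels) if l != x_label),
            key=lambda p: abs(f(p)))
    # t: training point of the *same* class as x that is closest to z
    t = min((p for p, l in zip(train, labels) if l == x_label),
            key=lambda p: math.dist(p, z))
    # the explanation for x is the gradient of f evaluated at t
    return grad_f(t)

train = [(0.0, 0.0), (0.5, 0.4), (1.0, 1.0), (2.0, 0.5)]
labels = [1 if f(p) > 0 else 0 for p in train]
explanation = explain_point((1.5, 0.2), train, labels)
```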

fit(nFeatures=None, featureNames=None, nSamples=100) → None[source]#

Generates a random linear expression and random data labeled by the linear expression as a binary dataset.

Parameters:
  • nFeatures – (int) number of features in the data.

  • featureNames – (array-like) names of the features in the data.

  • nSamples – (int) number of samples for the generated data.

Returns:

(ndarray, ndarray) data of shape (n, m) and their respective labels of shape (n,).

predict(data)[source]#

Predicts labels for observations: class 1 if f(x) > 0 and class 0 otherwise, where x is a point to label and f is the generated classification expression.

Parameters:

data – (ndarray) observations to label, shape (k, m).

Returns:

(ndarray) array of length k with binary labels.

predict_proba(data)[source]#

Get class probabilities by evaluating the expression f at ‘data’, normalizing the result and setting the probabilities as 1 - norm(f(data)), norm(f(data)).

Parameters:

data – (ndarray) observations for which to obtain probabilities, shape (k, m).

Returns:

(ndarray) array of shape (k, 2) with predicted class probabilities.
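Minimal sketches of both prediction methods in plain Python, operating directly on precomputed values f(x). The min-max normalization in the probability sketch is an assumption for illustration, not necessarily the normalization teex applies:

```python
def predict_sketch(f_values):
    # class 1 if f(x) > 0, class 0 otherwise
    return [1 if v > 0 else 0 for v in f_values]

def predict_proba_sketch(f_values):
    # Normalize f over the batch (min-max here, as an assumption), then
    # set P(class 0) = 1 - norm(f(x)) and P(class 1) = norm(f(x)).
    lo, hi = min(f_values), max(f_values)
    span = hi - lo if hi != lo else 1.0
    return [((hi - v) / span, (v - lo) / span) for v in f_values]
```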

teex.featureImportance.data.lime_to_feature_importance(exp, nFeatures, label=1)[source]#

Convert from a lime.explanation.Explanation object to a np.array feature importance vector.

Parameters:
  • exp (lime.explanation.Explanation) – explanation to convert to vector.

  • nFeatures (int) – number of features in the explanation.

  • label – (int, str) label of the lime explanation. If lime explanations are generated by default, it will be 1.

Returns:

feature importance vector

Return type:

np.ndarray

teex.featureImportance.data.scale_fi_bounds(x: ndarray, verbose: bool = False)[source]#

Scale the values of a 1D or 2D np.ndarray under certain conditions. The mapping is done on a per-column basis; that is, each column is scaled independently:

(for each column in ``x``)
if values in the range [-1, 1] or [0, 1]       -> do nothing
else:
    case 1: if values in the [0, inf] range    -> map to [0, 1]
    case 2: if values in the [-inf, 0] range   -> map to [-1, 1]
    case 3: if values in the [-inf, inf] range -> map to [-1, 1] 
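The per-column mapping above can be sketched in plain Python. The division by the maximum magnitude is an assumption for illustration; the exact scaling teex applies may differ:

```python
def scale_column(col):
    lo, hi = min(col), max(col)
    if -1 <= lo and hi <= 1:       # already within [-1, 1] (covers [0, 1]): do nothing
        return list(col)
    if lo >= 0:                    # values in the [0, inf] range: map to [0, 1]
        return [v / hi for v in col]
    m = max(abs(lo), abs(hi))      # negatives present: map to [-1, 1]
    return [v / m for v in col]

def scale_fi_bounds_sketch(x):
    """Apply scale_column to every column of a 2D list."""
    cols = [scale_column(c) for c in zip(*x)]
    return [list(row) for row in zip(*cols)]
```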

teex.featureImportance.eval module#

Module for evaluation of feature importance explanations.

teex.featureImportance.eval.cosine_similarity(u, v, bounding: str = 'abs') → float[source]#

Computes the cosine similarity between two real-valued arrays. Negative values are bounded into [0, 1] according to the bounding parameter.

Parameters:
  • u – (array-like), real valued array of dimension n.

  • v – (array-like), real valued array of dimension n.

  • bounding (str) – if the cosine similarity is < 0, bound it in [0, 1] via its absolute value (‘abs’) or via max(0, value) (‘max’).

Return float:

[0, 1] cosine similarity.
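A self-contained sketch of this computation in plain Python (the function name is illustrative):

```python
import math

def cosine_similarity_sketch(u, v, bounding="abs"):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cs = dot / (norm_u * norm_v)
    if cs < 0:  # bound negative similarities into [0, 1]
        cs = abs(cs) if bounding == "abs" else max(0.0, cs)
    return cs
```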

teex.featureImportance.eval.feature_importance_scores(gts, preds, metrics=None, average: bool = True, thresholdType: str = 'abs', binThreshold: float = 0.5, verbose: bool = True)[source]#

Computes quality metrics between one or more feature importance vectors. The values in the vectors must be bounded in [0, 1] or [-1, 1] (to indicate negative importances in the second case). If they are not, the values will be mapped.

For the computation of the precision, recall and F-score, the vectors are binarized to simulate a classification setting depending on the thresholdType parameter. In the case of ROC AUC, the ground truth feature importance vector is binarized as in the case of ‘precision’, ‘recall’ and ‘FScore’, and the entries of the predicted feature importance vector are considered as prediction scores. If the predicted vectors contain negative values, these will either be mapped to 0 or have their absolute value taken (depending on the option chosen in the thresholdType parameter).

Edge cases: cases where the metrics are not defined have been accounted for:

  • When computing classification scores (‘fscore’, ‘prec’, ‘rec’), if there is only one class in the ground truth and / or the prediction, one random feature will be flipped (same feature in both). Note that some metrics such as ‘auc’ may still be undefined in this case if there is only 1 feature per data observation.

  • For ‘auc’, although the ground truth is binarized, the prediction vector represents scores, and so, if both contain only one value, a feature will be flipped only in the ground truth. In the prediction, a small amount (\(10^{-4}\)) will be added to a random feature if no value is != 0.

  • When computing cosine similarity, if there is no value != 0 in the ground truth and / or prediction, 1e-4 will be added to one random feature.

On vector ranges: If the ground truth array or the predicted array have values that are not bounded in \([-1, 1]\) or \([0, 1]\), they will be mapped accordingly. Note that if the values lie within \([-1, 1]\) or \([0, 1]\) no mapping will be performed, so it is assumed that the scores represent feature importances in those ranges. These are the cases considered for the mapping:

  • if values in the \([0, \infty]\) range: map to \([0, 1]\)

  • if values in the \([-\infty, 0]\) range: map to \([-1, 1]\)

  • if values in the \([-\infty, \infty]\) range: map to \([-1, 1]\)

Parameters:
  • gts (np.ndarray) – (1d np.ndarray or 2d np.ndarray of shape (n_features, n_samples)) ground truth feature importance vectors.

  • preds (np.ndarray) – (1d np.ndarray or 2d np.ndarray of shape (n_features, n_samples)) predicted feature importance vectors.

  • metrics

    (str or array-like of str) metric/s to be computed. Available metrics are

    • ’fscore’: Computes the F1 Score between the ground truths and the predicted vectors.

    • ’prec’: Computes the Precision Score between the ground truths and the predicted vectors.

    • ’rec’: Computes the Recall Score between the ground truths and the predicted vectors.

    • ’auc’: Computes the ROC AUC Score between the ground truths and the predicted vectors.

    • ’cs’: Computes the Cosine Similarity between the ground truths and the predicted vectors.

    The vectors are automatically binarized for computing recall, precision and fscore.

  • average (bool) – (default True) used only if gts and preds contain multiple observations. Should the computed metrics be averaged across all the samples?

  • thresholdType (str) –

    Options for the binarization of the features for the computation of ‘fscore’, ‘prec’, ‘rec’ and ‘auc’.

    • ’abs’: features with absolute value <= binThreshold will be set to 0, and to 1 otherwise. For the predicted feature importances in the case of ‘auc’, their absolute value will be taken.

    • ’thres’: features <= binThreshold will be set to 0, and to 1 otherwise. For the predicted feature importances in the case of ‘auc’, negative values will be cast to 0 and the others left as-is.

  • binThreshold (float) – (in [-1, 1]) Threshold for the binarization of the features for the computation of ‘fscore’, ‘prec’, ‘rec’ and ‘auc’. The binarization depends on both this parameter and thresholdType. If thresholdType = 'abs', binThreshold cannot be negative.

  • verbose (bool) – verbosity of warnings. True will report warnings, False will not.

Returns:

(ndarray of shape (n_metrics,) or (n_samples, n_metrics)) specified metric/s in the indicated order.
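The binarization controlled by thresholdType and binThreshold can be sketched as follows (the function and parameter names below are illustrative):

```python
def binarize(v, bin_threshold=0.5, threshold_type="abs"):
    # 'abs':   features with |value| <= bin_threshold -> 0, else 1
    # 'thres': features with  value  <= bin_threshold -> 0, else 1
    if threshold_type == "abs":
        return [0 if abs(x) <= bin_threshold else 1 for x in v]
    return [0 if x <= bin_threshold else 1 for x in v]
```

Note that under ‘abs’ a strongly negative importance such as -0.9 counts as important (1), while under ‘thres’ it does not.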