teex.wordImportance package#

teex.wordImportance.data module#

Module for real datasets with available ground truth word importance explanations. Also contains methods and classes for word importance data manipulation.

class teex.wordImportance.data.Newsgroup[source]#

Bases: _ClassificationDataset

20 Newsgroup dataset. Contains 188 human-annotated newsgroup texts belonging to two categories. From

Sina Mohseni, Jeremy E Block, and Eric Ragan. 2021. Quantitative Evaluation of Machine Learning Explanations: A Human-Grounded Benchmark. https://doi.org/10.1145/3397481.3450689

Example:

>>> nDataset = Newsgroup()
>>> obs, label, exp = nDataset[1]

where obs is a str, label is an int and exp is a dict containing a score for each important word in obs. When a slice is performed, obs, label and exp are lists of the objects described above.
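A minimal slicing sketch, following the indexing behaviour described above:

>>> from teex.wordImportance.data import Newsgroup
>>> nDataset = Newsgroup()
>>> obs, labels, exps = nDataset[:2]
>>> # obs, labels and exps are now lists of str, int and dict, respectively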

teex.wordImportance.eval module#

Module for evaluation of word importance explanations.

teex.wordImportance.eval.word_importance_scores(gts: Union[Dict[str, float], List[Dict[str, float]]], preds: Union[Dict[str, float], List[Dict[str, float]]], vocabWords: Optional[Union[List[str], List[List[str]]]] = None, metrics: Optional[Union[str, List[str]]] = None, binThreshold: float = 0.5, average: bool = True, verbose: bool = False) ndarray[source]#

Quality metrics for word importance explanations, where each word is considered as a feature. An example of an explanation:

>>> {'skate': 0.7, 'to': 0.2, 'me': 0.5}

Parameters:
  • gts – (dict, array-like of dicts) ground truth word importance/s, where each BOW is represented as a dictionary with words as keys and floats as importances. Importances must be in \([0, 1]\) or \([-1, 1]\).

  • preds – (dict, array-like of dicts) predicted word importance/s, where each BOW is represented as a dictionary with words as keys and floats as importances. Importances must be on the same scale as gts.

  • vocabWords – (array-like of str, 1D or 2D for multiple reference vocabularies, default None) Vocabulary words. If None, the union of the words in each ground truth and predicted explanation will be interpreted as the vocabulary words. This is needed when explanations are converted to feature importance vectors. If this parameter is provided as a 1D list, the same vocabulary words will be used for all explanations; if it is not provided or is given as a 2D array-like (one reference vocabulary per explanation), different vocabulary words will be considered for each explanation.

  • metrics – (str / array-like of str, default=['prec']) Quality metric/s to compute. Available metrics include 'prec', 'rec', 'fscore' and 'auc'.

  • binThreshold (float) – (in [0, 1], default .5) word importance scores bigger than this value will be set to 1 and to 0 otherwise when binarizing for the computation of 'fscore', 'prec', 'rec' and 'auc'.

  • average (bool) – (default True) Used only if gts and preds contain multiple observations. Should the computed metrics be averaged across all samples?

  • verbose (bool) – Will the call be verbose?

Returns:

specified metric/s in the original order. Can be of shape:

  • (n_metrics,) if only one observation has been provided in both gts and preds, or when both contain multiple observations and average=True.

  • (n_metrics, n_samples) if gts and preds contain multiple observations and average=False.

Return type:

np.ndarray
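A minimal usage sketch, assuming ground truth and predicted explanations in the dictionary format shown above (the metric selection here is illustrative):

>>> from teex.wordImportance.eval import word_importance_scores
>>> gt = {'skate': 0.7, 'to': 0.2, 'me': 0.5}
>>> pred = {'skate': 0.9, 'to': 0.1, 'me': 0.4}
>>> scores = word_importance_scores(gt, pred, metrics=['prec', 'rec', 'fscore'])
>>> # scores is a np.ndarray of shape (n_metrics,), following the order of 'metrics'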

teex.wordImportance.eval.word_to_feature_importance(wordImportances, vocabWords) list[source]#

Maps words with importance weights into a feature importance vector.

Parameters:
  • wordImportances – (dict or array-like of dicts) dictionaries with words as keys and importances as values, in the same format as described in the method word_importance_scores().

  • vocabWords – (array-like of str, 1D or 2D for multiple reference vocabularies) \(m\) words that should be taken into account when transforming into vector representations. Their order will be followed.

Returns:

Word importances as feature importance vectors. Return types:

  • list of np.ndarray, if multiple vocabularies are used, since the reference vocabularies of each explanation may differ in size.

  • np.ndarray, if only one vocabulary is used.

Example:

>>> word_to_feature_importance({'a': 1, 'b': .5}, ['a', 'b', 'c'])
[1, .5, 0]
>>> word_to_feature_importance([{'a': 1, 'b': .5}, {'b': .5, 'c': .9}], ['a', 'b', 'c'])
[[1, .5, 0], [0, .5, .9]]
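
A further sketch, assuming a single shared vocabulary: when the same 1D vocabWords is used for the ground truth and predicted explanations, the resulting vectors are aligned position by position, so they can be compared directly as feature importance vectors.

>>> from teex.wordImportance.eval import word_to_feature_importance
>>> vocab = ['skate', 'to', 'me']
>>> gtVec = word_to_feature_importance({'skate': 0.7, 'me': 0.5}, vocab)
>>> predVec = word_to_feature_importance({'skate': 0.9, 'to': 0.1}, vocab)
>>> # both vectors follow the order of vocab, so index i refers to the same word in each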