teex.wordImportance package#
teex.wordImportance.data module#
Module for real datasets with available ground truth word importance explanations. Also contains methods and classes for word importance data manipulation.
- class teex.wordImportance.data.Newsgroup[source]#
Bases:
_ClassificationDataset
20 Newsgroup dataset. Contains 188 human-annotated newsgroup texts belonging to two categories. From
Sina Mohseni, Jeremy E Block, and Eric Ragan. 2021. Quantitative Evaluation of Machine Learning Explanations: A Human-Grounded Benchmark. https://doi.org/10.1145/3397481.3450689
- Example:
>>> nDataset = Newsgroup()
>>> obs, label, exp = nDataset[1]
where obs is a str, label is an int and exp is a dict containing a score for each important word in obs. When a slice is performed, obs, label and exp are lists of the objects described above.
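The indexing contract above can be sketched with a toy stand-in. The class and data below are illustrative only, not part of teex; the real Newsgroup dataset returns actual annotated newsgroup texts:

```python
# Toy stand-in mimicking the (obs, label, exp) indexing contract described
# above: a single index yields one (str, int, dict) triple, while a slice
# yields lists of each.
class ToyWordImportanceDataset:
    def __init__(self, data):
        self._data = data  # list of (obs, label, exp) triples

    def __getitem__(self, i):
        if isinstance(i, slice):
            obs, labels, exps = zip(*self._data[i])
            return list(obs), list(labels), list(exps)
        return self._data[i]

data = [("skate to me", 0, {'skate': 0.7, 'me': 0.5}),
        ("hello world", 1, {'hello': 0.9})]
ds = ToyWordImportanceDataset(data)

obs, label, exp = ds[1]               # single item: str, int, dict
allObs, allLabels, allExps = ds[0:2]  # slice: lists of the above
```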
teex.wordImportance.eval module#
Module for evaluation of word importance explanations.
- teex.wordImportance.eval.word_importance_scores(gts: Union[Dict[str, float], List[Dict[str, float]]], preds: Union[Dict[str, float], List[Dict[str, float]]], vocabWords: Optional[Union[List[str], List[List[str]]]] = None, metrics: Optional[Union[str, List[str]]] = None, binThreshold: float = 0.5, average: bool = True, verbose: bool = False) → ndarray [source]#
Quality metrics for word importance explanations, where each word is considered as a feature. An example of an explanation:
>>> {'skate': 0.7, 'to': 0.2, 'me': 0.5}
- Parameters:
gts – (dict, array-like of dicts) ground truth word importance/s, where each BOW is represented as a dictionary with words as keys and floats as importances. Importances must be in \([0, 1]\) or \([-1, 1]\).
preds – (dict, array-like of dicts) predicted word importance/s, where each BOW is represented as a dictionary with words as keys and floats as importances. Importances must be in the same scale as param gts.
vocabWords – (array-like of str, 1D or 2D for multiple reference vocabularies, default None) Vocabulary words. If None, the union of the words in each ground truth and predicted explanation will be interpreted as the vocabulary words. This is needed when explanations are converted to feature importance vectors. If this parameter is provided as a 1D list, the same vocabulary words will be used for all explanations; if not provided or given as a 2D array-like (same number of reference vocabularies as there are explanations), different vocabulary words will be considered for each explanation.
metrics – (str / array-like of str, default=['prec']) Quality metric/s to compute. Available: all metrics in teex.featureImportance.eval.feature_importance_scores().
binThreshold (float) – (in [0, 1], default .5) word importances greater than this value will be set to 1 and to 0 otherwise when binarizing for the computation of 'fscore', 'prec', 'rec' and 'auc'.
average (bool) – (default True) Used only if gts and preds contain multiple observations. Should the computed metrics be averaged across all samples?
verbose (bool) – Will the call be verbose?
- Returns:
specified metric/s in the original order. Can be of shape:
(n_metrics,) if only one observation has been provided in both gts and preds, or when both contain multiple observations and average=True.
(n_metrics, n_samples) if gts and preds contain multiple observations and average=False.
- Return type:
np.ndarray
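The metric computation above can be illustrated with a minimal sketch: map both explanation dicts onto a shared vocabulary (here the union of their words, as happens when vocabWords is None), binarize at binThreshold, and compute precision by hand. This is an illustration of the idea only; the actual teex implementation delegates to teex.featureImportance.eval.feature_importance_scores() and may differ in detail:

```python
# Ground truth and predicted word importance explanations (BOW dicts).
gt = {'skate': 0.7, 'to': 0.2, 'me': 0.5}
pred = {'skate': 0.9, 'to': 0.1, 'me': 0.6}

# Shared vocabulary: union of words in both explanations, as when
# vocabWords is None. Missing words get importance 0.
vocab = sorted(set(gt) | set(pred))
gt_vec = [gt.get(w, 0.0) for w in vocab]
pred_vec = [pred.get(w, 0.0) for w in vocab]

# Binarize at binThreshold for threshold-based metrics such as 'prec'.
binThreshold = 0.5
gt_bin = [1 if v > binThreshold else 0 for v in gt_vec]
pred_bin = [1 if v > binThreshold else 0 for v in pred_vec]

# Precision: true positives over predicted positives.
tp = sum(g and p for g, p in zip(gt_bin, pred_bin))
precision = tp / max(sum(pred_bin), 1)
```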
- teex.wordImportance.eval.word_to_feature_importance(wordImportances, vocabWords) → list [source]#
Maps words with importance weights into a feature importance vector.
- Parameters:
wordImportances – (dict or array-like of dicts) words with feature importances as values, with the same format as described in the method word_importance_scores().
vocabWords – (array-like of str, 1D or 2D for multiple reference vocabularies) \(m\) words that should be taken into account when transforming into vector representations. Their order will be followed.
- Returns:
Word importances as feature importance vectors. Return types:
list of np.ndarray if multiple vocabularies are given, because the reference vocabularies may differ in size across explanations.
np.ndarray if only 1 vocabulary is given.
- Example:
>>> word_to_feature_importance({'a': 1, 'b': .5}, ['a', 'b', 'c'])
[1, .5, 0]
>>> word_to_feature_importance([{'a': 1, 'b': .5}, {'b': .5, 'c': .9}], ['a', 'b', 'c'])
[[1, .5, 0], [0, .5, .9]]
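The mapping above amounts to a vocabulary-ordered lookup with a default of 0 for absent words. A minimal re-implementation sketch (the helper name word_to_feature_vector is hypothetical, and the real function returns np.ndarray rather than plain lists):

```python
# Sketch of the word-to-vector mapping: each vocabulary word takes its
# importance from the explanation dict if present, 0 otherwise, preserving
# the order of vocabWords.
def word_to_feature_vector(wordImportances, vocabWords):
    return [wordImportances.get(w, 0) for w in vocabWords]

single = word_to_feature_vector({'a': 1, 'b': .5}, ['a', 'b', 'c'])
multi = [word_to_feature_vector(e, ['a', 'b', 'c'])
         for e in [{'a': 1, 'b': .5}, {'b': .5, 'c': .9}]]
```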