Evaluation of explanation quality: decision rules

In this notebook we explore how teex can be used to evaluate decision rule explanations.

[14]:
from teex.decisionRule.data import SenecaDR
from teex.decisionRule.eval import rule_scores

# SkopeRules still imports 'sklearn.externals.six', which was removed from
# recent scikit-learn releases; aliasing the standalone 'six' package under
# that name fixes the import
import six
import sys
sys.modules['sklearn.externals.six'] = six

from skrules import SkopeRules

The first step is to gather data with available ground truth decision rule explanations. teex makes it simple:

[33]:
dataGen = SenecaDR(nSamples=500, nFeatures=5, randomState=88)
X, y, exps = dataGen[:]
[34]:
X[:5]
[34]:
array([[-2.02711717, -1.32958987, -0.77103092,  0.99843625, -2.27314715],
       [-0.64076447,  1.62339205,  1.75445611, -1.00969545,  1.83765661],
       [ 1.50354713, -1.27483644, -2.19842768, -1.05181378,  1.07449273],
       [ 0.06917376, -0.45268848, -1.05498443,  0.00318232, -0.65430449],
       [ 1.04850317,  2.69542922,  2.05851293, -0.06200245, -1.50837284]])
[35]:
y[:5]
[35]:
array([1, 1, 0, 1, 0])
[36]:
for e in exps[:5]:
    print(e)
IF 'a' <= -0.648, 'e' <= 0.125, 'c' <= -0.638, -1.473 < 'd' THEN 'Class' = 1
IF 'a' <= 0.962, 0.278 < 'e', -1.018 < 'b', -1.441 < 'c', 'd' <= 1.025 THEN 'Class' = 1
IF 0.962 < 'a', -2.876 < 'b' <= -0.656, 'd' <= -0.766, 'c' <= -2.147, -0.739 < 'e' THEN 'Class' = 0
IF -0.467 < 'a' <= 0.962, 'e' <= 0.125, 'c' <= -0.638, -1.473 < 'd' THEN 'Class' = 1
IF 0.962 < 'a', -0.095 < 'b', -1.843 < 'd', -2.64 < 'e' THEN 'Class' = 0

The second step is to train an estimator and generate predicted explanations. Any method can be used here: the choice of model and explanation technique is independent of teex and up to the user. We skip this step and instead use the ground truth explanations as if they were the predicted ones.
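
For reference, this is roughly how predicted rules could be obtained with the SkopeRules estimator imported above (a minimal sketch, not executed in this notebook; it relies on SkopeRules' documented fit/rules_ interface, and the resulting rule strings would still need to be parsed into teex DecisionRule objects before scoring):

clf = SkopeRules(feature_names=dataGen.featureNames, random_state=88)
clf.fit(X, y)

# rules_ holds the learned rules sorted by performance; each entry pairs a
# rule string such as 'a <= 0.96 and c <= -0.63' with its performance stats
for rule in clf.rules_[:3]:
    print(rule)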

We then compare the predicted explanations against the ground truth ones.
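
A note on the metrics: 'crq' is teex's complete rule quality, a rule-specific similarity score, while 'prec', 'rec' and 'fscore' appear to be computed by representing each rule as a binary vector that marks which features it uses and comparing those vectors (the warnings below originate in teex's featureImportance module, which supports this reading). The following sketch, with hypothetical vectors, illustrates the idea:

import numpy as np

# hypothetical feature-presence vectors for features (a, b, c, d, e)
gtVec = np.array([1, 0, 1, 1, 1])    # ground truth rule uses a, c, d, e
predVec = np.array([1, 0, 1, 0, 1])  # predicted rule uses a, c, e

tp = np.sum((gtVec == 1) & (predVec == 1))  # features in both rules
fp = np.sum((gtVec == 0) & (predVec == 1))  # features only in the prediction
fn = np.sum((gtVec == 1) & (predVec == 0))  # features only in the ground truth

precision = tp / (tp + fp)                  # 1.0
recall = tp / (tp + fn)                     # 0.75
fscore = 2 * precision * recall / (precision + recall)  # ~0.857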

[39]:
metrics = ['crq', 'prec', 'rec', 'fscore']
scores = rule_scores(exps, exps, dataGen.featureNames, metrics)
/usr/local/lib/python3.8/site-packages/teex/featureImportance/eval.py:77: UserWarning: A binary ground truth contains uniform values, so one entry has been randomly flipped for the metrics to be defined.
  warnings.warn('A binary ground truth contains uniform values, so one entry has been randomly flipped '
/usr/local/lib/python3.8/site-packages/teex/featureImportance/eval.py:80: UserWarning: A binary prediction contains uniform values, so one entry has been randomly flipped for the metrics to be defined.
  warnings.warn('A binary prediction contains uniform values, so one entry has been randomly flipped '
[40]:
for i, metric in enumerate(metrics):
    print(f'{metric}: {scores[i]}')
crq: 1.0
prec: 1.0
rec: 1.0
fscore: 1.0

We obtain perfect scores, as the ground truths are exactly the same as the predictions. The warnings above are expected: some of the rules use all five available features, so their binary feature vectors are uniform and the metrics would be undefined; teex randomly flips one entry so that they can still be computed.
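
To see non-trivial scores, one could deliberately misalign predictions and ground truths, for example by shifting the predicted explanations by one position (a hypothetical experiment, assuming exps behaves like a Python list):

# pair each ground truth with the next sample's rule instead of its own
shifted = list(exps[1:]) + list(exps[:1])
shiftedScores = rule_scores(exps, shifted, dataGen.featureNames, metrics)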