Generating data with available g.t. decision rule explanations#

In this notebook we go over the available options for generating data with ground truth (g.t.) decision rule explanations, along with the related methods.

[68]:
from teex.decisionRule.data import Statement, DecisionRule, SenecaDR, str_to_decision_rule, rulefit_to_decision_rule

from rulefit import RuleFit

import pandas as pd

1. DecisionRule objects in teex#

To represent decision rules, teex provides a custom class. In short, the atomic structure of a rule is a Statement, which represents an ‘if’ clause. A DecisionRule object then comprises a collection of Statement objects which, if all hold true, imply a result, also represented as a Statement.

For example, given the Statements:

  • ‘white_feathers’ == true

  • ‘quacks’ == true

we can build the decision rule that says:

  • if (white_feathers == true) and (quacks == true) then (is_duck == true)

In code, we can build this exact example:

[69]:
s1 = Statement('white_feathers', True)
s2 = Statement('quacks', True)
s3 = Statement('is_duck', True)

dr = DecisionRule([s1, s2], s3)
print(dr)
IF 'white_feathers' = True, 'quacks' = True THEN 'is_duck' = True

or, in a more human-readable way, from a string:

[70]:
strRule = 'white_feathers = True & quacks = True -> is_duck = True'
dr = str_to_decision_rule(strRule, ruleType='unary')

print(repr(dr), '\n', dr)
<teex.decisionRule.data.DecisionRule object at 0x128cb9970>
 IF 'white_feathers' = True, 'quacks' = True THEN 'is_duck' = True

Statements are flexible: they can represent multiple operators ({'=', '!=', '>', '<', '>=', '<='}) and can be binary for numeric features (0.5 < feature < 1, for example). The Statement and DecisionRule classes also provide methods for easy manipulation, such as insertion, deletion or upsertion of new statements into a decision rule object (see the sketches below). We urge the keen user to take a look at the API documentation for more on this.
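For instance, a rule with a binary numeric statement can be parsed from a string. A minimal sketch, assuming str_to_decision_rule also accepts ruleType='binary' (as the parameter seen above suggests; check the exact string syntax in the API documentation):

# hedged sketch: the 'binary' ruleType value and the exact string syntax for
# bounded statements are assumptions based on the API documentation
strRule = '0.5 < feature < 1 -> class = 1'
dr = str_to_decision_rule(strRule, ruleType='binary')
print(dr)

Statements can also be inserted into, deleted from or upserted into an existing rule. The method names below follow the operations named above, but their exact signatures should be verified in the API documentation:

# hedged sketch: insert_statement / upsert_statement / delete_statement are
# assumed method names matching the operations described above
rule = DecisionRule([Statement('quacks', True)], Statement('is_duck', True))
rule.insert_statement(Statement('white_feathers', True))  # add a statement
rule.upsert_statement(Statement('quacks', False))         # update in place
rule.delete_statement('white_feathers')                   # remove by feature
print(rule)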

The DecisionRule class provides a unified way of dealing with this kind of data, which allows for easier implementation of related methods, be it data generation or evaluation. So, all DecisionRule metrics work only with DecisionRule objects. Not to worry, because teex provides methods for transforming from common decision rule representations to DecisionRule objects.

2. Generating artificial data with SenecaDR#

Note: this method in particular was not originally conceived as a data generation procedure, but rather as a way to generate transparent classifiers (i.e., classifiers with available ground truth explanations). We use that generated classifier and some artificially generated data to return a dataset with observations, labels and ground truth explanations. The generated dataset contains numerical features and binary labels.

As with all data generation procedures in teex, an object first needs to be instantiated, and the data can then be retrieved. We can adjust the number of samples, the number of features per observation, the feature names and the random seed.

[71]:
dataGen = SenecaDR(nSamples=1000, nFeatures=3)
X, y, exps = dataGen[:]

print(f'Observation: {X[0]} \nLabel: {y[0]} \nExplanation: {exps[0]}')
Observation: [1.25824083 1.37756901 0.4123272 ]
Label: 0
Explanation: IF 0.111 < 'c', -0.015 < 'a', 0.901 < 'b' <= 2.31 THEN 'Class' = 0
[72]:
dataGen.featureNames
[72]:
['a', 'b', 'c']

See how the generated explanations are actually DecisionRule objects, with Statements for each feature (though not in all cases).

[73]:
exps[:5]
[73]:
[<teex.decisionRule.data.DecisionRule at 0x128cc7940>,
 <teex.decisionRule.data.DecisionRule at 0x128cc7ac0>,
 <teex.decisionRule.data.DecisionRule at 0x128cc7be0>,
 <teex.decisionRule.data.DecisionRule at 0x128cc7d00>,
 <teex.decisionRule.data.DecisionRule at 0x128cc7e20>]
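These are the raw reprs; printing them, as we did earlier, yields the readable rules:

for e in exps[:3]:
    print(e)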

Note that we can also specify the feature names instead of letting them be automatically generated (see the sketch below). As with all of teex’s Seneca methods, the underlying data generation procedure is carried out by a transparent model that follows the sklearn API (it has .predict, .predict_proba and .fit methods). In this case, the model is a Decision Tree classifier, and the explanations are the decision paths that the trained model takes when predicting, as the following cells demonstrate.
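For example, a minimal sketch of specifying the feature names and the seed: featureNames matches the attribute shown earlier, while randomState is an assumed name for the seed parameter (check the API reference):

# hedged sketch: the featureNames and randomState parameter names are
# assumptions based on the API documentation
dataGen = SenecaDR(nSamples=1000, nFeatures=3, featureNames=['f1', 'f2', 'f3'],
                   randomState=42)
X, y, exps = dataGen[:]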

[74]:
from teex.decisionRule.data import TransparentRuleClassifier

model = TransparentRuleClassifier()

# it can fit any binary classification data, not just this example
model.fit(X, y, featureNames=['f1', 'f2', 'f3'])
[75]:
print(model.predict(X[:5]))
[0 1 1 1 1]
[76]:
model.predict_proba(X[:5])
[76]:
array([[1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [0., 1.]])
[77]:
model.explain(X[:5])
[77]:
[<teex.decisionRule.data.DecisionRule at 0x128cb90a0>,
 <teex.decisionRule.data.DecisionRule at 0x128cb92e0>,
 <teex.decisionRule.data.DecisionRule at 0x128cb9ee0>,
 <teex.decisionRule.data.DecisionRule at 0x128eaf9a0>,
 <teex.decisionRule.data.DecisionRule at 0x128eaffa0>]
[78]:
for dr in model.explain(X[:5]):
    print(dr)
IF 0.111 < 'f3', -0.015 < 'f1', 0.901 < 'f2' <= 2.31 THEN 'Class' = 0
IF 'f3' <= -0.324, 0.672 < 'f1', 'f2' <= -0.37 THEN 'Class' = 1
IF 'f3' <= -0.324, 0.672 < 'f1', 'f2' <= -0.37 THEN 'Class' = 1
IF -1.705 < 'f3' <= 0.111, 'f1' <= -0.041, 0.428 < 'f2' <= 0.63 THEN 'Class' = 1
IF -1.705 < 'f3' <= 0.111, 'f1' <= -0.041, 0.635 < 'f2' THEN 'Class' = 1

For more information on the transparent model, please see the notebook on Feature Importance data generation or visit teex’s API documentation.

3. Transforming common representations into DecisionRule objects#

teex’s decision rule evaluation methods work only with DecisionRule objects, so to evaluate common decision rule explanation methods we need ways of transforming their representations. We have already seen how to transform string representations with str_to_decision_rule; another useful method is rulefit_to_decision_rule, which transforms the rules computed by the RuleFit algorithm:

[79]:
# first, fetch some data (Boston housing)
boston_data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns  # feature names, excluding the target 'medv'
X = X.values
[99]:
# instance a rule fit object and get explanations
rf = RuleFit()
rf.fit(X, y, feature_names=features)
/Users/master/Google Drive/U/4t/TFG/teex/venv/lib/python3.8/site-packages/sklearn/linear_model/_coordinate_descent.py:530: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2.20433295631139, tolerance: 2.1169160949554895
  model = cd_fast.enet_coordinate_descent(
/Users/master/Google Drive/U/4t/TFG/teex/venv/lib/python3.8/site-packages/sklearn/linear_model/_coordinate_descent.py:530: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2.268052878131016, tolerance: 2.1169160949554895
  model = cd_fast.enet_coordinate_descent(
[99]:
RuleFit(tree_generator=GradientBoostingRegressor(learning_rate=0.01,
                                                 max_depth=100,
                                                 max_leaf_nodes=5,
                                                 n_estimators=560,
                                                 random_state=559,
                                                 subsample=0.46436099318265595))

The rules learned by RuleFit can be extracted as follows:

[102]:
rf.get_rules()
[102]:
                                                    rule    type      coef   support  importance
0                                                   crim  linear -0.000000  1.000000    0.000000
1                                                     zn  linear  0.002153  1.000000    0.048604
2                                                  indus  linear -0.000000  1.000000    0.000000
3                                                   chas  linear  0.000000  1.000000    0.000000
4                                                    nox  linear -0.000000  1.000000    0.000000
...                                                  ...     ...       ...       ...         ...
1720          ptratio <= 18.75 & rm <= 7.452499866485596    rule -0.000000  0.401709    0.000000
1721                            dis > 6.341400146484375      rule -0.000000  0.145299    0.000000
1722  lstat > 5.184999942779541 & ptratio > 13.84999...    rule -0.000000  0.829060    0.000000
1723                                        tax <= 298.0    rule  0.000000  0.333333    0.000000
1724                           crim > 18.737899780273438    rule -0.000000  0.029915    0.000000

1725 rows × 5 columns

and we can convert them into DecisionRule objects with a single line. Note that only the rules are transformed, not the base coefficients (type = linear). The method also provides parameters for the minimum support and importance a rule needs in order to be transformed (see the sketch at the end of this section).

[103]:
# and transform into decision rule objects
rules = rf.get_rules()  # the rules extracted above
dRules, skippedRows = rulefit_to_decision_rule(rules)
[104]:
dRules[:5]
[104]:
[<teex.decisionRule.data.DecisionRule at 0x12efcbfa0>,
 <teex.decisionRule.data.DecisionRule at 0x12ef1b100>,
 <teex.decisionRule.data.DecisionRule at 0x12a32ad00>,
 <teex.decisionRule.data.DecisionRule at 0x12a32a970>,
 <teex.decisionRule.data.DecisionRule at 0x12a32a940>]
[105]:
for rule in dRules[:5]:
    print(rule)
IF 'nox' <= 0.6694999933242798, 'dis' <= 1.3980499505996704 THEN None
IF 'ptratio' <= 18.65000057220459, 7.423499822616577 < 'rm' THEN None
IF 1.1736000180244446 < 'dis', 21.489999771118164 < 'lstat', 'rm' <= 7.423500061035156 THEN None
IF 7.433000087738037 < 'rm', 'lstat' <= 14.805000305175781 THEN None
IF 20.19499969482422 < 'lstat' THEN None
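
Note that the consequents above are None: RuleFit rules are antecedents used as features in a sparse linear model, so they carry no explicit conclusion. Finally, as mentioned earlier, rules can be filtered during transformation by minimum support and importance. A minimal sketch, assuming the thresholds are named minSupport and minImportance (verify in the API documentation):

# hedged sketch: the minSupport and minImportance parameter names are
# assumptions based on the API documentation
dRulesFiltered, skipped = rulefit_to_decision_rule(rules, minSupport=0.1,
                                                   minImportance=0.5)
print(len(dRulesFiltered), 'rules kept')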