Generating data with available g.t. decision rule explanations#
This notebook covers the available options for generating data with ground truth (g.t.) decision rule explanations, along with related methods.
[68]:
from teex.decisionRule.data import Statement, DecisionRule, SenecaDR, str_to_decision_rule, rulefit_to_decision_rule
from rulefit import RuleFit
1. DecisionRule objects in teex#
To represent decision rules, teex provides a custom class. In short, the atomic unit of a rule is a Statement, which represents a single 'if' clause. A DecisionRule object is a collection of Statement objects that, if they all hold true, imply a result, which is itself represented as a Statement.
For example, given the Statements:
‘white_feathers’ == true
‘quacks’ == true
we can build the decision rule that says:
if (white_feathers == true) and (quacks == true) then (is_duck == true)
In code, we can build this exact example:
[69]:
s1 = Statement('white_feathers', True)
s2 = Statement('quacks', True)
s3 = Statement('is_duck', True)
dr = DecisionRule([s1, s2], s3)
print(dr)
IF 'white_feathers' = True, 'quacks' = True THEN 'is_duck' = True
or, in a more human-readable form:
[70]:
strRule = 'white_feathers = True & quacks = True -> is_duck = True'
dr = str_to_decision_rule(strRule, ruleType='unary')
print(repr(dr), '\n', dr)
<teex.decisionRule.data.DecisionRule object at 0x128cb9970>
IF 'white_feathers' = True, 'quacks' = True THEN 'is_duck' = True
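To make the semantics of a rule concrete, the following is a minimal sketch that does not use teex at all. `SimpleStatement` and `SimpleRule` are hypothetical stand-ins written for this illustration, not teex's classes; they only show the idea that a rule applies when every premise holds for an observation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SimpleStatement:
    """Hypothetical stand-in for one 'feature == value' clause (not teex's Statement)."""
    feature: str
    value: Any

    def holds(self, observation: Dict[str, Any]) -> bool:
        # A feature that is missing from the observation never satisfies the clause.
        return self.feature in observation and observation[self.feature] == self.value


@dataclass
class SimpleRule:
    """Hypothetical stand-in for a rule: a conjunction of premises implying a result."""
    premises: List[SimpleStatement]
    result: SimpleStatement

    def applies_to(self, observation: Dict[str, Any]) -> bool:
        # The rule fires only if all premises hold at the same time.
        return all(s.holds(observation) for s in self.premises)

    def __str__(self) -> str:
        ifs = ", ".join(f"'{s.feature}' = {s.value}" for s in self.premises)
        return f"IF {ifs} THEN '{self.result.feature}' = {self.result.value}"


duck_rule = SimpleRule(
    premises=[SimpleStatement("white_feathers", True), SimpleStatement("quacks", True)],
    result=SimpleStatement("is_duck", True),
)

print(duck_rule)
print(duck_rule.applies_to({"white_feathers": True, "quacks": True}))
print(duck_rule.applies_to({"white_feathers": True, "quacks": False}))
```

The string format mirrors the output printed above only for readability; the real classes offer more functionality, as described next.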
Statements are flexible: they support multiple operators ({'=', '!=', '>', '<', '>=', '<='}) and can be binary for numeric features (0.5 < feature < 1, for example). Both teex and the classes themselves provide methods for easy manipulation of Statement and DecisionRule objects, such as inserting, deleting or upserting statements in a decision rule object. See the API documentation for more on this.
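As an illustration of what unary and binary statements mean, here is a small sketch built on Python's standard `operator` module. The helper names `unary_holds` and `binary_holds` are assumptions made for this example and are not part of teex.

```python
import operator
from typing import Callable, Dict

# Map each operator symbol listed above to its Python implementation.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def unary_holds(value: float, op: str, threshold: float) -> bool:
    """Check a one-sided statement such as `value <= threshold`."""
    return OPERATORS[op](value, threshold)


def binary_holds(lower: float, lower_op: str, value: float,
                 upper_op: str, upper: float) -> bool:
    """Check a two-sided statement such as `lower < value <= upper`."""
    return OPERATORS[lower_op](lower, value) and OPERATORS[upper_op](value, upper)


print(unary_holds(0.3, "<=", 0.5))
print(binary_holds(0.5, "<", 0.75, "<", 1))
print(binary_holds(0.5, "<", 1.0, "<", 1))
```

A binary statement is simply two unary comparisons on the same feature that must both hold, which is why it only makes sense for numeric features.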
The DecisionRule class provides a unified way of dealing with this kind of data, which makes related methods easier to implement, be it data generation or evaluation. As a consequence, all decision rule metrics in teex work only with DecisionRule objects. This is not a limitation in practice, because teex provides methods for transforming common decision rule representations into DecisionRule objects.
2. Generating artificial data with SenecaDR#
Note: this method was not originally conceived as a data generation procedure, but rather as a way to generate transparent classifiers (i.e. classifiers with available ground truth explanations). We use the generated classifier and some artificially generated data to return a dataset with observations, labels and ground truth explanations. The generated dataset contains numerical features and binary classification labels.
As with all data generation procedures in teex, an object first needs to be instantiated, and then the data can be retrieved. We can adjust the number of samples we want, the number of features in the observations, the feature names and the random seed.
[71]:
dataGen = SenecaDR(nSamples=1000, nFeatures=3)
X, y, exps = dataGen[:]
print(f'Observation: {X[0]} \nLabel: {y[0]} \nExplanation: {exps[0]}')
Observation: [1.25824083 1.37756901 0.4123272 ]
Label: 0
Explanation: IF 0.111 < 'c', -0.015 < 'a', 0.901 < 'b' <= 2.31 THEN 'Class' = 0
[72]:
dataGen.featureNames
[72]:
['a', 'b', 'c']
See how the generated explanations are actually DecisionRule objects, with a Statement for each feature (not in all cases, though).
[73]:
exps[:5]
[73]:
[<teex.decisionRule.data.DecisionRule at 0x128cc7940>,
<teex.decisionRule.data.DecisionRule at 0x128cc7ac0>,
<teex.decisionRule.data.DecisionRule at 0x128cc7be0>,
<teex.decisionRule.data.DecisionRule at 0x128cc7d00>,
<teex.decisionRule.data.DecisionRule at 0x128cc7e20>]
Note that we can also specify the feature names instead of letting them be automatically generated. As with all of teex's Seneca methods, the underlying data generation procedure is carried out by a transparent model that follows the sklearn API (it has .predict, .predict_proba and .fit methods). In this case, the model is a Decision Tree classifier, and the explanations are the decision paths that the trained model takes when performing predictions. This class can also be useful on its own for easily extracting explanations.
[74]:
from teex.decisionRule.data import TransparentRuleClassifier
model = TransparentRuleClassifier()
# it can fit any binary classification data, not just this example
model.fit(X, y, featureNames=['f1', 'f2', 'f3'])
[75]:
print(model.predict(X[:5]))
[0 1 1 1 1]
[76]:
model.predict_proba(X[:5])
[76]:
array([[1., 0.],
[0., 1.],
[0., 1.],
[0., 1.],
[0., 1.]])
[77]:
model.explain(X[:5])
[77]:
[<teex.decisionRule.data.DecisionRule at 0x128cb90a0>,
<teex.decisionRule.data.DecisionRule at 0x128cb92e0>,
<teex.decisionRule.data.DecisionRule at 0x128cb9ee0>,
<teex.decisionRule.data.DecisionRule at 0x128eaf9a0>,
<teex.decisionRule.data.DecisionRule at 0x128eaffa0>]
[78]:
for dr in model.explain(X[:5]):
    print(dr)
IF 0.111 < 'f3', -0.015 < 'f1', 0.901 < 'f2' <= 2.31 THEN 'Class' = 0
IF 'f3' <= -0.324, 0.672 < 'f1', 'f2' <= -0.37 THEN 'Class' = 1
IF 'f3' <= -0.324, 0.672 < 'f1', 'f2' <= -0.37 THEN 'Class' = 1
IF -1.705 < 'f3' <= 0.111, 'f1' <= -0.041, 0.428 < 'f2' <= 0.63 THEN 'Class' = 1
IF -1.705 < 'f3' <= 0.111, 'f1' <= -0.041, 0.635 < 'f2' THEN 'Class' = 1
For more information on the transparent model, please see the notebook on Feature Importance data generation or visit teex’s API documentation.
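The idea that an explanation is the decision path taken by a tree can be sketched in plain Python. The tree below is hand-built and hypothetical (its structure and thresholds are made up for illustration), and `decision_path` and `explain` are helper names assumed for this example; this is not how teex or sklearn store or traverse trees internally.

```python
from typing import Any, Dict, List, Tuple

# A node is either a leaf (an int class label) or a split written as
# (feature name, threshold, subtree if value <= threshold, subtree if value > threshold).
# This toy tree is invented for illustration only.
TOY_TREE: Any = (
    "f3", 0.111,
    ("f1", -0.041, 1, 0),
    ("f2", 0.901, 1, 0),
)


def decision_path(tree: Any, observation: Dict[str, float]) -> Tuple[List[str], int]:
    """Walk the tree for one observation, recording each condition that was met."""
    conditions: List[str] = []
    node = tree
    while not isinstance(node, int):
        feature, threshold, left, right = node
        if observation[feature] <= threshold:
            conditions.append(f"'{feature}' <= {threshold}")
            node = left
        else:
            conditions.append(f"{threshold} < '{feature}'")
            node = right
    return conditions, node


def explain(tree: Any, observation: Dict[str, float]) -> str:
    """Format the decision path as a rule whose result is the predicted class."""
    conditions, label = decision_path(tree, observation)
    return f"IF {', '.join(conditions)} THEN 'Class' = {label}"


obs = {"f1": 1.25824083, "f2": 1.37756901, "f3": 0.4123272}
print(explain(TOY_TREE, obs))
```

Because the rule is read directly off the path that produced the prediction, it is a ground truth explanation of that prediction by construction.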
3. Transforming common representations into DecisionRule objects#
Since the evaluation methods in teex work only with DecisionRule objects, evaluating common decision rule explanation methods requires transforming their output. We have seen how to transform string representations with str_to_decision_rule, but another useful method is rulefit_to_decision_rule, which transforms the rules computed by the RuleFit algorithm:
[79]:
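# first, find some data
import pandas as pd

boston_data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
y = boston_data.medv.values
# take the feature names after dropping the target so they match the columns of X
features = boston_data.drop("medv", axis=1).columns
X = boston_data.drop("medv", axis=1).values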
[99]:
# instance a rule fit object and get explanations
rf = RuleFit()
rf.fit(X, y, feature_names=features)
/Users/master/Google Drive/U/4t/TFG/teex/venv/lib/python3.8/site-packages/sklearn/linear_model/_coordinate_descent.py:530: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2.20433295631139, tolerance: 2.1169160949554895
model = cd_fast.enet_coordinate_descent(
/Users/master/Google Drive/U/4t/TFG/teex/venv/lib/python3.8/site-packages/sklearn/linear_model/_coordinate_descent.py:530: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2.268052878131016, tolerance: 2.1169160949554895
model = cd_fast.enet_coordinate_descent(
[99]:
RuleFit(tree_generator=GradientBoostingRegressor(learning_rate=0.01,
max_depth=100,
max_leaf_nodes=5,
n_estimators=560,
random_state=559,
subsample=0.46436099318265595))
The rules from RuleFit can be extracted from here:
[102]:
rf.get_rules()
[102]:
index | rule | type | coef | support | importance |
---|---|---|---|---|---|
0 | crim | linear | -0.000000 | 1.000000 | 0.000000 |
1 | zn | linear | 0.002153 | 1.000000 | 0.048604 |
2 | indus | linear | -0.000000 | 1.000000 | 0.000000 |
3 | chas | linear | 0.000000 | 1.000000 | 0.000000 |
4 | nox | linear | -0.000000 | 1.000000 | 0.000000 |
... | ... | ... | ... | ... | ... |
1720 | ptratio <= 18.75 & rm <= 7.452499866485596 | rule | -0.000000 | 0.401709 | 0.000000 |
1721 | dis > 6.341400146484375 | rule | -0.000000 | 0.145299 | 0.000000 |
1722 | lstat > 5.184999942779541 & ptratio > 13.84999... | rule | -0.000000 | 0.829060 | 0.000000 |
1723 | tax <= 298.0 | rule | 0.000000 | 0.333333 | 0.000000 |
1724 | crim > 18.737899780273438 | rule | -0.000000 | 0.029915 | 0.000000 |
1725 rows × 5 columns
and we can convert them into DecisionRule objects with a single line. Note that only the rules are transformed, not the base coefficients (type = linear). The method also provides parameters for the minimum support and importance a rule needs in order to be transformed.
[103]:
# and transform into decision rule objects
rules = rf.get_rules()
dRules, skippedRows = rulefit_to_decision_rule(rules)
[104]:
dRules[:5]
[104]:
[<teex.decisionRule.data.DecisionRule at 0x12efcbfa0>,
<teex.decisionRule.data.DecisionRule at 0x12ef1b100>,
<teex.decisionRule.data.DecisionRule at 0x12a32ad00>,
<teex.decisionRule.data.DecisionRule at 0x12a32a970>,
<teex.decisionRule.data.DecisionRule at 0x12a32a940>]
[105]:
for rule in dRules[:5]:
    print(rule)
IF 'nox' <= 0.6694999933242798, 'dis' <= 1.3980499505996704 THEN None
IF 'ptratio' <= 18.65000057220459, 7.423499822616577 < 'rm' THEN None
IF 1.1736000180244446 < 'dis', 21.489999771118164 < 'lstat', 'rm' <= 7.423500061035156 THEN None
IF 7.433000087738037 < 'rm', 'lstat' <= 14.805000305175781 THEN None
IF 20.19499969482422 < 'lstat' THEN None
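To give an intuition for what such a conversion involves, here is a rough, teex-independent sketch that parses RuleFit-style rule strings into (feature, operator, value) triples and filters rows by type, support and importance. The names `parse_rule` and `select_rules`, the plain-dict row format, and the meaning given to the skipped list are all assumptions made for this example; the actual behaviour of rulefit_to_decision_rule may differ.

```python
import re
from typing import Dict, List, Tuple

# One condition looks like "ptratio <= 18.75"; two-character operators are tried first.
CONDITION = re.compile(
    r"^\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)

Condition = Tuple[str, str, float]


def parse_rule(rule: str) -> List[Condition]:
    """Split a rule on '&' and parse each condition into (feature, operator, value)."""
    parsed: List[Condition] = []
    for cond in rule.split("&"):
        match = CONDITION.match(cond)
        if match is None:
            raise ValueError(f"Cannot parse condition: {cond!r}")
        feature, op, value = match.groups()
        parsed.append((feature, op, float(value)))
    return parsed


def select_rules(rows: List[Dict], min_support: float = 0.0,
                 min_importance: float = 0.0) -> Tuple[List[List[Condition]], List[int]]:
    """Keep rows of type 'rule' that meet both thresholds; record the other row indices."""
    kept: List[List[Condition]] = []
    skipped: List[int] = []
    for i, row in enumerate(rows):
        if (row["type"] != "rule" or row["support"] < min_support
                or row["importance"] < min_importance):
            skipped.append(i)  # assumption: 'skipped' means filtered out in this sketch
            continue
        kept.append(parse_rule(row["rule"]))
    return kept, skipped


# A few rows copied from the table above, written as plain dictionaries.
rows = [
    {"rule": "crim", "type": "linear", "support": 1.0, "importance": 0.0},
    {"rule": "ptratio <= 18.75 & rm <= 7.452499866485596", "type": "rule",
     "support": 0.401709, "importance": 0.0},
    {"rule": "tax <= 298.0", "type": "rule", "support": 0.333333, "importance": 0.0},
]
kept, skipped = select_rules(rows, min_support=0.35)
print(kept)
print(skipped)
```

Note that the rules printed above end in `THEN None`: a RuleFit rule only describes a region of the feature space, so there is no class or value to place in the result.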