Generating data with available ground truth word importance explanations

This example shows how to generate data that comes with ground truth (g.t.) word importance explanations.

[25]:
from teex.wordImportance.data import Newsgroup

import numpy as np

In teex, word importance explanations are represented as dictionaries. The keys are the words of a text (or at least the relevant ones, that is, the ones that have been scored) and the values are their scores. Let’s see an example:

[4]:
dataGen = Newsgroup()
X, y, exps = dataGen[:]

The Newsgroup dataset contains texts from emails, each of which belongs to either the “electronics” or the “medicine” class:

[5]:
dataGen.classMap
[5]:
{0: 'electronics', 1: 'medicine'}

Each text is represented as a byte string:

[9]:
X[3]
[9]:
b"From: cfb@fc.hp.com (Charlie Brett)\nSubject: Re: Hi Volt from battery\nNntp-Posting-Host: hpfcmgw.fc.hp.com\nOrganization: Hewlett-Packard Fort Collins Site\nX-Newsreader: TIN [version 1.1 PL8.5]\nLines: 7\n\nYou might want to get a disposible flash camera, shoot the roll of film,\nthen take it apart (they're snapped together). We used a bunch of them\nat my wedding, but instead of sending the whole camera in, I just took\nthe film out (it's a standard 35mm canister), and kept the batteries\n(they use one AA battery). Sorry, I didn't keep any of the flash electronics.\n\n          Charlie Brett - Ft. Collins, CO\n"

corresponds to a specific class:

[10]:
dataGen.classMap[y[3]]
[10]:
'electronics'

and has a ground truth explanation with the format explained above:

[11]:
exps[3]
[11]:
{'volt': 1.0,
 'battery': 1.0,
 'batteries': 0.5,
 'electronics': 1.0,
 'flash': 0.5}
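
As a minimal sketch of how this dictionary format relates to the raw text, the snippet below pairs every token of a shortened copy of the text above with its score, using 0.0 for words that were not scored. It does not use teex: `token_scores` is a hypothetical helper, and both the regular-expression tokenisation and the case-insensitive matching are assumptions made for illustration, not the library's documented behaviour.

```python
import re

# Shortened copy of the X[3] text shown above (bytes, as returned by the dataset).
text = (b"Subject: Re: Hi Volt from battery\n\n"
        b"(they use one AA battery). Sorry, I didn't keep any of the flash electronics.\n")

# Ground truth explanation for X[3], copied from the output above.
exp = {'volt': 1.0, 'battery': 1.0, 'batteries': 0.5, 'electronics': 1.0, 'flash': 0.5}


def token_scores(raw, explanation):
    """Hypothetical helper: pair each lowercased token with its score (0.0 if unscored)."""
    # Assumption: decode as UTF-8 and split on anything that is not a letter or apostrophe.
    decoded = raw.decode("utf-8", errors="replace").lower()
    tokens = re.findall(r"[a-z']+", decoded)
    return [(tok, explanation.get(tok, 0.0)) for tok in tokens]


scored = token_scores(text, exp)

# Keep only the tokens that carry a non-zero importance score.
relevant = [(tok, score) for tok, score in scored if score > 0]
print(relevant)
```

A token-aligned list like this can be convenient when an explanation has to be compared position by position with the text, whereas the dictionary form is more compact because each word is stored only once.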

For this instance (X[3]), the words in the explanation are the ones that characterize the text as belonging to the “electronics” class. A medical example could be:

[38]:
X[23]
[38]:
b"From: oldman@coos.dartmouth.edu (Prakash Das)\nSubject: Re: Is MSG sensitivity superstition?\nArticle-I.D.: dartvax.C60KrL.59t\nOrganization: Dartmouth College, Hanover, NH\nLines: 19\n\nIn article <1993Apr20.173019.11903@llyene.jpl.nasa.gov> julie@eddie.jpl.nasa.gov (Julie Kangas) writes:\n>\n>As for how foods taste:  If I'm not allergic to MSG and I like\n>the taste of it, why shouldn't I use it?  Saying I shouldn't use\n>it is like saying I shouldn't eat spicy food because my neighbor\n>has an ulcer.\n\nJulie, it doesn't necessarily follow that you should use it (MSG or\nsomething else for that matter) simply because you are not allergic\nto it. For example you might not be allergic to (animal) fats, and\nlike their taste, yet it doesn't follow that you should be using them\n(regularly). MSG might have other bad (or good, I am not up on \nknowledge of MSG) effects on your body in the long run, maybe that's\nreason enough not to use it. \n\nAltho' your example of the ulcer is funny, it isn't an\nappropriate comparison at all.\n\n-Prakash Das\n"
[39]:
dataGen.classMap[y[23]]
[39]:
'medicine'
[40]:
exps[23]
[40]:
{'msg': 0.5, 'ulcer': 0.5, 'allergic': 1.0, 'sensitivity': 0.5}
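
Because explanations are plain dictionaries, they can be inspected with standard Python. The following sketch, which copies the explanation above instead of loading the dataset, ranks the words by score and extracts the highest-scoring ones. The variable names and the tie-breaking rule (alphabetical order) are illustrative choices, not part of teex.

```python
# Ground truth explanation for X[23], copied from the output above.
exp = {'msg': 0.5, 'ulcer': 0.5, 'allergic': 1.0, 'sensitivity': 0.5}

# Rank words from most to least important; ties are broken alphabetically.
ranked = sorted(exp.items(), key=lambda item: (-item[1], item[0]))

# Words that share the maximum score in this explanation.
max_score = max(exp.values())
top_words = [word for word, score in exp.items() if score == max_score]

print(ranked)
print(top_words)
```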