Tutorial
For this tutorial, suppose you are a teacher who has given an exam and is now comparing the results on different grading scales. The exam scores are numbers from 0 to 100, and you are experimenting with different cutoffs for letter grades and with different definitions of "passing" letter grades.
Letter Grades
To start out, let's write a simple data map that defines a resolver to derive "exam_letter_grades" from "exam_scores" and "letter_grade_cutoffs".
exam_scores = [98, 73, 65, 95, 88, 58, 40, 94]
default_letter_grade_cutoffs = {90: "A", 80: "B", 70: "C", 60: "D", 0: "F"}
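data_map = {
    "exam_scores": exam_scores,
    "letter_grade_cutoffs": default_letter_grade_cutoffs,
    # Each score gets the label of the highest cutoff it meets or exceeds
    # (<= so that a score of exactly 90 earns an "A" and a score of 0 earns an "F")
    "exam_letter_grades": lambda exam_scores, letter_grade_cutoffs: [
        letter_grade_cutoffs[max(cut for cut in letter_grade_cutoffs if cut <= score)]
        for score in exam_scores
    ],
}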
Now we can import datajet and get the letter grades for our exam scores:
import datajet
datajet.execute(data_map, fields=["exam_letter_grades"])
# Result
# {'exam_letter_grades': ['A', 'C', 'D', 'A', 'B', 'F', 'F', 'A']}
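The lookup inside the "exam_letter_grades" resolver can also be read as a standalone function, which makes the boundary behavior easier to check. This is a minimal sketch in plain Python, assuming a score that lands exactly on a cutoff should earn that cutoff's grade; `letter_grade` is a hypothetical helper name and is not part of datajet.

```python
def letter_grade(score, cutoffs):
    """Return the label of the highest cutoff that the score meets or exceeds."""
    eligible = [cut for cut in cutoffs if cut <= score]
    if not eligible:
        # No cutoff is low enough, e.g. a negative score with a lowest cutoff of 0.
        raise ValueError(f"no cutoff at or below score {score}")
    return cutoffs[max(eligible)]


cutoffs = {90: "A", 80: "B", 70: "C", 60: "D", 0: "F"}
scores = [98, 73, 65, 95, 88, 58, 40, 94]

grades = [letter_grade(score, cutoffs) for score in scores]
print(grades)                     # ['A', 'C', 'D', 'A', 'B', 'F', 'F', 'A']
print(letter_grade(90, cutoffs))  # A (a score exactly on the cutoff)
```

Note that a strict `<` comparison would instead give a score of exactly 90 a "B" and would find no eligible cutoff at all for a score of 0.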
Great, easy enough. Let's add a pass/fail component to the datamap now, and find how many are passing:
data_map = {
    "exam_scores": exam_scores,
    "letter_grade_cutoffs": default_letter_grade_cutoffs,
    # Each score gets the label of the highest cutoff it meets or exceeds
    "exam_letter_grades": lambda exam_scores, letter_grade_cutoffs: [
        letter_grade_cutoffs[max(cut for cut in letter_grade_cutoffs if cut <= score)]
        for score in exam_scores
    ],
    # We define passing as having an "A", "B", "C" or "D" grade
    "passing_grades": {"A", "B", "C", "D"},
    "exam_pass_fail_grades": lambda passing_grades, exam_letter_grades: [
        grade in passing_grades for grade in exam_letter_grades
    ],
    # sum counts the True values in exam_pass_fail_grades
    "n_passing": {"in": ["exam_pass_fail_grades"], "f": sum},
}
Let's see how many students passed:
datajet.execute(data_map, fields=["n_passing"])
# Result
{'n_passing': 6}
Note we can also return several different fields:
datajet.execute(data_map, fields=["n_passing", "exam_letter_grades", "exam_pass_fail_grades"])
# Result
{'exam_letter_grades': ['A', 'C', 'D', 'A', 'B', 'F', 'F', 'A'],
'exam_pass_fail_grades': [True, True, True, True, True, False, False, True],
'n_passing': 6}
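The data map above uses two resolver styles: a lambda whose parameter names match other fields, and a dictionary with "in" and "f" keys that lists its inputs explicitly, which suits a function like `sum` whose own parameter names are not field names. To build intuition for how such a map could be evaluated, here is a toy resolver in plain Python. It is an illustrative sketch only and makes no claim about how datajet is implemented; `resolve` and `toy_map` are hypothetical names.

```python
import inspect


def resolve(data_map, field, cache=None):
    """Toy dependency resolver for illustration; not datajet's implementation."""
    cache = {} if cache is None else cache
    if field in cache:
        return cache[field]
    node = data_map[field]
    if isinstance(node, dict) and callable(node.get("f")):
        # Explicit form: input field names are listed under "in".
        inputs, func = node.get("in", []), node["f"]
    elif callable(node):
        # Shorthand form: the function's parameter names are the input fields.
        inputs, func = list(inspect.signature(node).parameters), node
    else:
        # Anything else is treated as a constant.
        cache[field] = node
        return node
    args = [resolve(data_map, name, cache) for name in inputs]
    cache[field] = func(*args)
    return cache[field]


toy_map = {
    "exam_letter_grades": ["A", "C", "D", "A", "B", "F", "F", "A"],
    "passing_grades": {"A", "B", "C", "D"},
    "exam_pass_fail_grades": lambda passing_grades, exam_letter_grades: [
        grade in passing_grades for grade in exam_letter_grades
    ],
    "n_passing": {"in": ["exam_pass_fail_grades"], "f": sum},
}

print(resolve(toy_map, "n_passing"))  # 6
```

The sketch resolves only the fields that the requested field depends on, and caches each one so that it is computed at most once per call.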
Overwrite DataMap at execute time
Say you want to calculate n_passing on a different grading scale, this time with a single pass/fail cutoff of 75:
datajet.execute(
    data_map,
    context={
        "letter_grade_cutoffs": {75: "Pass", 0: "Fail"},
        "passing_grades": ["Pass"],
    },
    fields=["n_passing", "exam_letter_grades", "exam_pass_fail_grades"],
)
# Result
{'exam_letter_grades': ['Pass', 'Fail', 'Fail', 'Pass', 'Pass', 'Fail', 'Fail', 'Pass'],
'exam_pass_fail_grades': [True, False, False, True, True, False, False, True],
'n_passing': 4}
Explanation
In this example, the context argument overrides the default values we declared in our original data map for "letter_grade_cutoffs" and "passing_grades". We set "letter_grade_cutoffs" to {75: "Pass", 0: "Fail"}, which, according to the logic we originally declared in the DataPoint "exam_letter_grades", means that any exam with a score >= 75 is given the "letter grade" "Pass", while all others receive the "letter grade" "Fail" (from the 0 cutoff in the same dictionary). We also tell datajet via the context parameter that we accept only "Pass" as a passing grade.
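The effect of the override can be reproduced by hand. The sketch below is plain Python, not datajet code, and assumes only what the result above shows: entries in the context take precedence over same-named entries in the data map. The dictionary merge is an illustration of that precedence, not a description of datajet's internals, and the variable names are hypothetical.

```python
exam_scores = [98, 73, 65, 95, 88, 58, 40, 94]

# Defaults as declared in the data map.
defaults = {
    "letter_grade_cutoffs": {90: "A", 80: "B", 70: "C", 60: "D", 0: "F"},
    "passing_grades": {"A", "B", "C", "D"},
}

# Values supplied at execute time.
context = {
    "letter_grade_cutoffs": {75: "Pass", 0: "Fail"},
    "passing_grades": ["Pass"],
}

# Later keys win, so the context replaces the defaults.
effective = {**defaults, **context}

cutoffs = effective["letter_grade_cutoffs"]
grades = [cutoffs[max(cut for cut in cutoffs if cut <= score)] for score in exam_scores]
pass_fail = [grade in effective["passing_grades"] for grade in grades]
n_passing = sum(pass_fail)

print(grades)     # ['Pass', 'Fail', 'Fail', 'Pass', 'Pass', 'Fail', 'Fail', 'Pass']
print(n_passing)  # 4
```

The resolver logic is unchanged; only its inputs differ, which is why the same data map can serve several grading scales.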
You can see the logic a little better if you also ask for the exam scores:
datajet.execute(
    data_map,
    context={
        "letter_grade_cutoffs": {75: "Pass", 0: "Fail"},
        "passing_grades": ["Pass"],
    },
    fields=["n_passing", "exam_letter_grades", "exam_pass_fail_grades", "exam_scores"],
)
# Result
{'exam_scores': [98, 73, 65, 95, 88, 58, 40, 94],
'exam_letter_grades': ['Pass', 'Fail', 'Fail', 'Pass', 'Pass', 'Fail', 'Fail', 'Pass'],
'exam_pass_fail_grades': [True, False, False, True, True, False, False, True],
'n_passing': 4}
Find number of passing exams when you only have letter grades to start
Say you started with a set of letter grades and wanted to know the number passing, counting As, Bs, and Cs as "passing".
datajet.execute(
    data_map,
    context={
        # One character per exam: 8 As, 18 Bs, 14 Cs, 13 Ds, 5 Fs
        "exam_letter_grades": ("A"*8)+("B"*18)+("C"*14)+("D"*13)+("F"*5),
        "passing_grades": "ABC",
    },
    fields=["n_passing"],
)
# Result
{'n_passing': 40}
In this case, datajet bypasses calculating the "exam_letter_grades" from "exam_scores" because we gave it the letter grades as constants, and thus removed the dependency of "exam_letter_grades" on "exam_scores" that existed in our original DataMap declaration. DataJet takes the "exam_letter_grades" as-is from our context, compares those grades with the "passing_grades" we also specified in the context, then uses the logic originally declared in the DataMap for "exam_pass_fail_grades" and "n_passing" to tell us that 40 of the exams had passing grades.
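The count of 40 can be checked without datajet. The plain-Python sketch below applies the same membership test as the "exam_pass_fail_grades" resolver to the string of letter grades; it relies on the fact that iterating over a Python string yields one character at a time and that `in` on a string tests for a substring.

```python
from collections import Counter

# One character per exam, as in the context above.
exam_letter_grades = ("A" * 8) + ("B" * 18) + ("C" * 14) + ("D" * 13) + ("F" * 5)
passing_grades = "ABC"

exam_pass_fail_grades = [grade in passing_grades for grade in exam_letter_grades]
n_passing = sum(exam_pass_fail_grades)

counts = Counter(exam_letter_grades)
print(len(exam_letter_grades))  # 58 exams in total
print(counts["A"] + counts["B"] + counts["C"])  # 40
print(n_passing)  # 40
```

Using strings this way works only for single-character grades. With labels such as "A-" or "Pass", a list or set of grades is the safer choice, because the substring test on a string would also accept "AB" as a member of "ABC".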