Parameters
The key part of a BKT application is estimating the best parameters.
The four parameters are P(L_0) (initial knowledge), P(T) (learning transition), P(G) (guess), and P(S) (slip).
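For reference, the standard BKT equations (as in Corbett and Anderson's formulation; the subscripted notation here is introduced only so that it can be referred to later) are

P(correct_i) = P(L_i) (1 - P(S)) + (1 - P(L_i)) P(G),
P(L_{i+1}) = P(L_i | obs_i) + (1 - P(L_i | obs_i)) P(T),

where P(L_1) = P(L_0) and P(L_i | obs_i) is the Bayesian update of P(L_i) given the observed response at step i.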
Slip parameter
This parameter, P(S), is the probability that a student who has mastered the skill nonetheless gives an incorrect response.
Important as it is, it is also open to some interpretation.
Here is some discussion [Baker2010]:
• Recently, there has been work towards contextualizing the guess and slip
parameters (Baker, Corbett, & Aleven, 2008a, 2008b)
• Do we really think the chance that an incorrect response was a slip is
equal when
– Student has never gotten action right; spends 78 seconds thinking; answers;
gets it wrong
– Student has gotten action right 3 times in a row; spends 1.2 seconds
thinking; answers; gets it wrong
Also, in [Gobert2013], two interesting points are made about the slip parameter. First, slips seem to occur more readily for students who initially struggled and then attained mastery. Second, the slip parameter may be, in part, a reflection of the students' own perception of their knowledge, even when mastery has been declared by the learning software.
For the future
In an ideal model, one would expect the slip tendency to decrease as more time is spent on a task. Perhaps an exponential function of time is appropriate.
However, one cannot rule out a distraction factor. So, after a certain time, the slip parameter may bottom out, or even go up slightly.
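As an illustration only (the functional form and the names slip0, slip_floor, tau_think, and t_distract are assumptions of this sketch, not taken from the references), such a time dependence could be written in Python as an exponential decay toward a floor, with a small distraction term after a long pause:

import math

def slip_vs_time(seconds, slip0=0.10, slip_floor=0.02, tau_think=30.0,
                 t_distract=120.0, distraction_rate=0.0005):
    # Illustrative only: slip decays exponentially toward a floor as more
    # time is spent thinking about the task.
    s = slip_floor + (slip0 - slip_floor) * math.exp(-seconds / tau_think)
    # After a long pause, a distraction factor lets the slip creep back up,
    # capped here at the original slip value.
    if seconds > t_distract:
        s = min(s + distraction_rate * (seconds - t_distract), slip0)
    return s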
If the lesson is continued after mastery is acquired, then P(L) might not be changing much, but P(S) may be decreasing; one might call this a hardening of knowledge, or a tempering of knowledge. It is not enough to acquire knowledge. It is necessary to apply the acquired knowledge to different problems, gain experience, and make it more well rounded. To do this is, I think, mainly to reduce P(S).
These thoughts suggest that P(S) should not be treated as a single constant, but should be allowed to depend on the time spent on a task and on the amount of practice after mastery.
Optimization
Now, it has been stated already that the parameters (four, or more, if the slip parameter is modeled further) must be optimized. In the literature, various approaches seem to have been tried, including a brute force approach of making four-dimensional grids and evaluating all grid points.
It appears that much effort has gone into this front; this is understandable, since a multi-parameter least-squares fit is always a rather ill-defined process because of local minima. The brute force approach should work fine, as long as the model is reasonable and the parameter ranges are narrow enough that the answers are already roughly clear from the beginning.
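For concreteness, a minimal brute force sketch in Python is given below. The BKT forward pass uses the standard prediction and update equations; the grid resolution, parameter ranges, and the toy response sequence are arbitrary choices for illustration.

import itertools
import numpy as np

def bkt_sse(params, y):
    # Sum of squared errors between observed grades y (0 or 1) and the
    # BKT-predicted probability of a correct response at each step.
    p_l0, p_t, p_g, p_s = params
    p_know = p_l0
    sse = 0.0
    for obs in y:
        p_correct = p_know * (1 - p_s) + (1 - p_know) * p_g
        sse += (obs - p_correct) ** 2
        # Bayesian update given the observation, then the learning transition.
        if obs == 1:
            cond = p_know * (1 - p_s) / max(p_correct, 1e-12)
        else:
            cond = p_know * p_s / max(1 - p_correct, 1e-12)
        p_know = cond + (1 - cond) * p_t
    return sse

y = [0, 0, 1, 0, 1, 1, 1]                       # toy response sequence
grid = np.linspace(0.05, 0.95, 10)              # coarse grid in each dimension
best = min(itertools.product(grid, repeat=4), key=lambda p: bkt_sse(p, y))
print("best (P(L0), P(T), P(G), P(S)):", best)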
In any case, the following approach might be tried as an improvement on the brute force approach of [BakerWWW].
Define y_i as the data to be fit. This is the grade given to student response i; in the simple BKT model the value of y_i is either 1 or 0. However, it could have a value ranging from 0 to 1, end points included. Now, the theory function f_i(p) can be calculated as a function of the parameters p = (p_1, p_2, p_3, p_4). This function will give a value ranging from 0 to 1, end points included. Here, the p_k's are fit parameters with k = 1, ..., 4: they are P(L_0), P(T), P(G), and P(S), respectively. Note that f_i depends not only on i and p, but also on the earlier responses y_1, ..., y_{i-1}, the data itself. So it is not a conventional theory function; it is a functional of y. Does the Levenberg-Marquardt theory continue to work in this case? Let us assume that the standard Levenberg-Marquardt theory works fine; in fact, we may not even worry about the theory aspect, to some extent, since all we want to achieve is the minimization of the sum of squared residuals, sum_i (y_i - f_i)^2. Then we can call the Levenberg-Marquardt algorithm to fit p; the algorithm finds a local minimum by interpolating between steepest-descent and Gauss-Newton steps. Try random initial values for the parameters and make a map of the converged results.
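A minimal sketch of this procedure in Python, using scipy's Levenberg-Marquardt option, might look as follows; the names bkt_theory and residuals, the toy data, and the number of restarts are choices made for this sketch. Note that f_i is computed sequentially from the earlier observations, which is exactly the functional-of-y aspect discussed above.

import numpy as np
from scipy.optimize import least_squares

def bkt_theory(params, y):
    # f_i: predicted probability of a correct response at step i, computed
    # from the parameters and from the earlier observations y_1 ... y_{i-1}.
    p_l0, p_t, p_g, p_s = params
    p_know = p_l0
    f = []
    for obs in y:
        p_correct = p_know * (1 - p_s) + (1 - p_know) * p_g
        f.append(p_correct)
        # Bayesian update given the observed grade, then the learning step.
        if obs == 1:
            cond = p_know * (1 - p_s) / max(p_correct, 1e-12)
        else:
            cond = p_know * p_s / max(1 - p_correct, 1e-12)
        p_know = cond + (1 - cond) * p_t
    return np.array(f)

def residuals(params, y):
    # y_i - f_i; Levenberg-Marquardt minimizes the sum of their squares.
    return np.asarray(y, dtype=float) - bkt_theory(params, y)

y = [0, 0, 1, 0, 1, 1, 1]                 # toy response sequence
rng = np.random.default_rng(0)
results = []
for _ in range(20):                       # random restarts to map the minima
    x0 = rng.uniform(0.05, 0.95, size=4)  # P(L0), P(T), P(G), P(S)
    # Note: method="lm" is unconstrained, so fitted values can drift
    # outside [0, 1]; this is a sketch only.
    fit = least_squares(residuals, x0, args=(y,), method="lm")
    results.append((fit.cost, tuple(np.round(fit.x, 3))))
print(sorted(results)[0])                 # best converged result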
For the future
The above approach may be modified to include the anti-slip hardening. In this case, P(S) is no longer a single constant but varies from response to response. The fitting procedure above will not change; only the computation of the theory function f_i changes. If the thought-invoked hardening (the more time a student spends, the less slip) is parameterized by one time-scale parameter, the exercise-driven hardening (the more problems a student solves, the less slip) is parameterized by one scale parameter, and the threshold for P(L) is given by some number close to 1 (hardening kicks in only if mastery is nearly achieved), then we will have seven parameters in total, not four. Within the Levenberg-Marquardt algorithm this is perfectly doable, while the brute force method will suffer greatly as the number of parameters increases.
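As a sketch of how the step-dependent slip could enter the theory function, consider the following; the names tau_think, n_scale, and mastery_threshold are hypothetical labels for the three extra parameters, and the multiplicative exponential form is just one plausible choice.

import math

def hardened_slip(base_slip, p_know, time_spent, n_solved,
                  tau_think, n_scale, mastery_threshold):
    # Hardening applies only once the knowledge estimate is close to 1.
    if p_know < mastery_threshold:
        return base_slip
    # Thought-invoked hardening: more time spent -> less slip.
    # Exercise-driven hardening: more problems solved after mastery -> less slip.
    return (base_slip
            * math.exp(-time_spent / tau_think)
            * math.exp(-n_solved / n_scale))

Inside the theory function, this value would replace the constant P(S) at each step, and the Levenberg-Marquardt fit would then run over seven parameters: the four BKT parameters plus tau_think, n_scale, and mastery_threshold.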