ICOTS9 (2014) Invited Paper - Refereed Lock Morgan, R. Lock, P. F. Lock, E. Lock & D. Lock
STATKEY: ONLINE TOOLS FOR BOOTSTRAP INTERVALS
AND RANDOMIZATION TESTS
Kari Lock Morgan¹, Robin H. Lock², Patti Frazer Lock², Eric F. Lock¹, Dennis F. Lock³
¹Department of Statistical Science, Duke University, Durham, NC, USA
²St. Lawrence University, USA
³Iowa State University, USA
email@example.com
StatKey (www.lock5stat.com/StatKey) is free online technology created by the Lock family, designed to help introductory students understand and easily implement bootstrap intervals and randomization tests. Randomization-based methods make the fundamental concepts of statistical inference more visual and intuitive, free professors to cover inference earlier in the course, and help students see connections that are otherwise lost with numerous different formulae. To make these methods accessible to all introductory students, our goal was to create technology that is free, widely available (StatKey works in any common web browser), very easy to use, and which helps build conceptual understanding. Although particularly designed for randomization-based methods, StatKey can also be used for illustrating descriptive statistics, sampling distributions, confidence intervals, simple linear regression, and as a replacement for paper distribution tables.
INTRODUCTION
StatKey is a set of free online tools designed specifically for teaching bootstrap intervals and randomization tests in introductory statistics. In addition to making simulation-based methods easier to understand and implement, StatKey also provides most of the functionality you would want from software for an introductory course, such as summary statistics, data visualization, and theoretical distributions. It was designed by us (the Lock family), and implemented by computer scientists Rich Sharp, Ed Harcourt, and Kevin Angstadt. This paper is meant to be a guide to StatKey and a lens into the pedagogical reasons for its design.
Figure 1 is the homepage for StatKey, and shows the available functionalities. The top set of features is arranged with parameter type by row (One Quantitative, One Categorical, etc.), and statistical method by column (Descriptive Statistics and Graphs, Bootstrap Confidence Intervals, and Randomization Hypothesis Tests). Below there are three rows: Sampling Distributions, Theoretical Distributions, and More Advanced Randomization Tests. Sampling distributions illustrate the concept of sampling distributions, and connect this concept with confidence intervals.
Theoretical distributions replace the conventional distribution tables often found in the back of textbooks with more visual, easy-to-use online applets. More Advanced Randomization Tests are appropriate for chi-square and ANOVA analyses. You can return to this menu at any point by clicking on the “StatKey” box in the top left corner of every page.
Figure 1. Menu for StatKey: www.lock5stat.com/StatKey
In K. Makar, B. de Sousa, & R. Gould (Eds.), Sustainability in statistics education. Proceedings of the Ninth International Conference on Teaching Statistics (ICOTS9, July, 2014), Flagstaff, Arizona, USA. Voorburg, The Netherlands: International Statistical Institute. iase-web.org [© 2014 ISI/IASE]
Simulation-based methods help students understand the fundamental concepts of inference (sampling variability, confidence intervals, p-values) by using a procedure directly related to the concept at hand (Cobb, 2007). For example, consider a hypothesis test and the concept of a p-value. Traditional methods involve plugging numbers into the appropriate formula and comparing the result to a theoretical distribution to find a p-value, offering little to no intuition for the concept of a p-value. In contrast, a randomization test involves simulating what types of statistics would be observed if the null hypothesis were true, and seeing how extreme the observed statistic is compared with these simulated statistics. This gets directly at the notoriously difficult concept of a p-value that we so want our students to understand! We strongly believe in the benefits of randomization-based methods for helping students understand inference, and StatKey emerged to make these methods more easily accessible to everyone. For more information on teaching with this approach, see Lock (2014).
Although StatKey can be used with any textbook, it was designed to accompany our book, Statistics: Unlocking the Power of Data (Lock5, 2012), and all of the built-in datasets come from this text. Clicking the dataset name in the top left corner within any StatKey procedure opens a drop-down menu of these datasets. You can also import your own data into StatKey by choosing “Edit Data” and then either copying and pasting from a spreadsheet or other electronic source, or typing values directly, taking care to match the format that StatKey expects. Categorical information for proportions can also be entered directly as counts. See the help menu in StatKey for more detailed instructions.
StatKey is available at lock5stat.com/statkey, and works on any common web browser.
StatKey is also available freely as a Google Chrome app, which allows it to be used even without an internet connection. We encourage you to visit StatKey now, and click along!
BOOTSTRAP CONFIDENCE INTERVALS
Click on “CI for Single Mean, Median, Std. Dev.,” and a new simulation page opens.
Everything in blue is clickable. The default dataset is “Ottawa Senators (penalty minutes)”; click on it for a drop-down menu of other datasets, and choose “Florida Lakes (Mercury in Fish),” which gives data on the average mercury level of fish (large mouth bass) for 53 lakes in Florida (Lange et al., 2004).
The relevant summary statistics and visualization for the data appear below Original Sample. For this quantitative variable, we see the sample size, mean, median, standard deviation, and a dotplot. We start with this to encourage students to first look at the summary statistics and plot of the sample data before doing inference. This is shown in the top right of Figure 2. The sample mean is 0.527 ppm (the FDA action level in the USA is 1 ppm; in Canada the limit is 0.5 ppm). How much might this mean vary from sample to sample? Let’s bootstrap to find out!
Generating a Bootstrap Distribution
By clicking on “Generate 1 Sample,” we generate one bootstrap sample, a sample of the same size as the original sample selected by sampling from the original sample with replacement.
The summary statistics and visualization of this bootstrap sample are displayed under “Bootstrap Sample.” This is a nice place to stop and ensure that students actually understand the process of bootstrapping. It can help to focus on particular units in this discussion. For example, in the particular bootstrap sample shown in Figure 2, the lake with the highest mercury level happened to be sampled twice, and the lake with the lowest mercury level was not sampled at all. By moving your cursor over a single dot in the dotplot, you can see the actual data value, which can be helpful for this discussion.
The bootstrap sample mean of 0.629 ppm is higher than the actual sample mean. This value appears under “Bootstrap Sample” and also as one dot in the bootstrap distribution. Before simulating more samples, this is another good point to stop and make sure students see the connection. The idea that each dot in the bootstrap distribution is one statistic from an entire bootstrap sample, not a single data point, can be difficult for students to grasp and is worth spending time on. We think the ability to generate just one sample at a time is important for helping students see the connection between the statistic under “Bootstrap Sample” and the corresponding dot in the bootstrap distribution. Also, moving the cursor over any dot in the bootstrap distribution will display the bootstrap sample that yielded that bootstrap statistic, another feature that facilitates understanding.
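For readers who want to see the resampling step outside of StatKey, the following is a minimal Python sketch of drawing one bootstrap sample. It is not StatKey's implementation. The lake labels and mercury values are hypothetical toy numbers, not the Florida Lakes data, and the fixed seed is only there to make the run repeatable.

```python
import random
from collections import Counter

# Hypothetical lake labels and mercury values (ppm); NOT the Florida Lakes data.
lakes = {"A": 0.12, "B": 0.34, "C": 0.45, "D": 0.50, "E": 0.73, "F": 1.10}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
labels = list(lakes)

# One bootstrap sample: draw as many units as the original sample has,
# sampling WITH replacement from the original sample.
drawn = rng.choices(labels, k=len(labels))
times_drawn = Counter(drawn)

# Some units may appear more than once and others not at all.
for label in labels:
    print(label, lakes[label], "drawn", times_drawn[label], "time(s)")

# The statistic is computed from the whole bootstrap sample;
# it corresponds to ONE dot in the bootstrap distribution.
boot_mean = sum(lakes[label] for label in drawn) / len(drawn)
print("bootstrap sample mean:", round(boot_mean, 3))
```

Printing how often each unit was drawn mirrors the classroom discussion above: a given lake can be sampled twice or left out entirely, and the single mean at the end is what gets plotted.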
Once students understand how a bootstrap sample is achieved, and what each dot in the bootstrap distribution represents, we can click on “Generate 1000 Samples,” which gives us a bootstrap distribution. We can click this repeatedly for more simulations (intervals and p-values get more precise as the number of simulated samples increases). This distribution is shown in Figure 2. We like bootstrap distributions because they focus on a key idea of inference, how much statistics vary from sample to sample, in a way that is more intuitive and meaningful than formulas.
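The idea of building up a bootstrap distribution 1000 samples at a time can be sketched in Python as follows. This is an illustration under stated assumptions, not StatKey's code: the function name `bootstrap_distribution` is hypothetical, and the data are toy values rather than the Florida Lakes data.

```python
import random
import statistics

def bootstrap_distribution(data, statistic, n_samples, rng):
    """Return n_samples bootstrap statistics, one per resample of data."""
    return [statistic(rng.choices(data, k=len(data))) for _ in range(n_samples)]

# Hypothetical toy values (ppm); NOT the Florida Lakes data.
original = [0.12, 0.34, 0.45, 0.50, 0.73, 1.10]
rng = random.Random(2014)  # fixed seed so the sketch is repeatable

# "Generate 1000 Samples", then once more to add another 1000 dots.
boot_means = bootstrap_distribution(original, statistics.mean, 1000, rng)
boot_means += bootstrap_distribution(original, statistics.mean, 1000, rng)

print("number of dots:", len(boot_means))
print("center of bootstrap distribution:", round(statistics.mean(boot_means), 3))
print("spread (standard error estimate):", round(statistics.stdev(boot_means), 3))
```

Each element of `boot_means` plays the role of one dot in the dotplot, and the spread of those values is the sample-to-sample variability that the bootstrap distribution is meant to show.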
Each of the numbered arrows in Figure 2 is described below:
1. Display of the original sample.
2. Display of a particular bootstrap or randomization sample.
3. Dotplot of bootstrap/randomization statistics. Mouse over any dot to see the sample that produced it!
4. Choose one of the built-in datasets from the text.
5. Display or edit a table of the data. If importing your own data, paste it in here.
6. Generate 1, 10, 100, or 1000 samples at a time.
7. Change the statistic, null hypothesis, or randomization method (when enabled).
8. Summary statistics for the original sample and a particular bootstrap/randomization sample.
9. Summary statistics for the bootstrap/randomization distribution.
10. Select regions in either or both tails of the bootstrap/randomization distribution. Two-tail gives equal proportions.
11. Editable values for proportions in different regions of the bootstrap/randomization distribution.
12. Editable endpoints for tail regions of the bootstrap/randomization distribution.
13. Mean for a bootstrap distribution, null parameter value for a randomization distribution.
14. Start over with a new simulation.
15. Get help (including videos) on StatKey features.
16. Return to the main StatKey menu.
Finding a Confidence Interval from a Bootstrap Distribution
One way of creating a 95% confidence interval from a bootstrap distribution, if the distribution is approximately bell-shaped, is to use statistic ± 2⋅SE, with the standard error estimated as the standard deviation of the bootstrap distribution. This facilitates the transition to normal and t-based methods, and also helps students understand the meaning of the standard error (another potentially difficult concept). The summary statistics for the bootstrap distribution are given in the top right corner of the dotplot, and from Figure 2 we see the standard deviation of the bootstrap statistics for the mercury example is 0.046 (note that this is very close to the theoretical s/√n = 0.341/√53 = 0.0468, and much more intuitive). Therefore, a 95% confidence interval is statistic ± 2⋅SE = 0.527 ± 2(0.046) = (0.435, 0.619). We are 95% confident that the average mercury level of large mouth bass in Florida lakes is between 0.435 and 0.619 ppm.
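The arithmetic in this paragraph can be checked with a few lines of Python. The numbers are the ones quoted above for the mercury example; the variable names are our own.

```python
import math

sample_mean = 0.527  # ppm, mean of the original sample
boot_se = 0.046      # standard deviation of the bootstrap means (Figure 2)
s, n = 0.341, 53     # sample standard deviation and sample size

# 95% interval: statistic +/- 2 * SE
lower = sample_mean - 2 * boot_se
upper = sample_mean + 2 * boot_se

# Formula-based standard error of the mean, for comparison
theoretical_se = s / math.sqrt(n)

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print(f"s/sqrt(n) = {theoretical_se:.4f}")
```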
We can also create a confidence interval via the percentiles of the bootstrap distribution; if the bootstrap distribution is roughly symmetric, an approximate C% confidence interval contains the middle C% of bootstrap statistics. This formula-free approach helps build intuition for the meaning of a confidence level. Click the checkbox next to “two-tail” to bring up several editable blue boxes, including the proportion in the middle of the distribution and in each of the tails. By default, 95% of the bootstrap statistics are contained in the middle, so the two corresponding endpoints give a 95% confidence interval. For a 90% interval, we simply click on the box in the middle and change 0.95 to 0.90. The endpoints adjust to contain the middle 90% of bootstrap statistics, as shown in Figure 2, giving a 90% confidence interval of 0.455 to 0.604 ppm. We feel it is a pedagogical advantage that students have to know how to interact with the bootstrap distribution.
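A percentile interval of this kind can be sketched in Python as follows. The function `percentile_interval` is a hypothetical helper that simply trims an equal number of bootstrap statistics from each tail; the exact percentile rule StatKey uses is not described here, so this is one simple convention rather than a reproduction of it. The data are toy values, not the Florida Lakes data.

```python
import random
import statistics

def percentile_interval(boot_stats, level):
    """Endpoints enclosing the middle `level` proportion of boot_stats."""
    ordered = sorted(boot_stats)
    tail = (1 - level) / 2
    # Number of statistics trimmed from each tail (round() avoids
    # floating-point results such as 49.999... being truncated to 49).
    k = round(tail * len(ordered))
    return ordered[k], ordered[len(ordered) - 1 - k]

# Hypothetical toy values (ppm); NOT the Florida Lakes data.
original = [0.12, 0.34, 0.45, 0.50, 0.73, 1.10]
rng = random.Random(9)  # fixed seed so the sketch is repeatable
boot_means = [
    statistics.mean(rng.choices(original, k=len(original)))
    for _ in range(1000)
]

ci95 = percentile_interval(boot_means, 0.95)
ci90 = percentile_interval(boot_means, 0.90)
print("95% percentile interval:", ci95)
print("90% percentile interval:", ci90)
```

Changing `level` from 0.95 to 0.90 corresponds to editing the middle proportion box: the endpoints move inward, so the 90% interval sits inside the 95% interval.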
RANDOMIZATION HYPOTHESIS TESTS
Let’s click on “Test for Difference in Means” from the main menu, and choose the dataset “Mindset Matters (WgtChange by Informed)”. This dataset is from a study (Crum and Langer, 2007) in which hotel maids were randomly divided into two groups; one group was informed that the work they do satisfies the surgeon general’s recommendations for an active lifestyle (which it does), and the other group was not informed of this. The variable Informed is whether the information was given, and WgtChange represents weight change over four weeks. In the original sample, we see that the informed maids lost 1.59 more pounds, on average, than the non-informed.
Is this statistically significant? Let’s conduct a randomization test to find out!
Generating a Randomization Distribution
Generating a randomization distribution in StatKey is similar to generating a bootstrap distribution, except that randomization samples are simulated in a way that is consistent with the null hypothesis.
The null hypothesis for the Mindset Matters experiment is that Informed status has no effect on weight gain, so each maid would have gained the same amount regardless of the group she was assigned to. We can simulate other potential datasets we could have obtained by random chance if the null hypothesis were true. The “random chance” is the random assignment to either be informed or not, so we reallocate the maids to the two groups, keeping their response values (weight change) fixed. Click “Generate 1 Sample” to do this, and the results are displayed under Randomization Sample. Again, this process is good to discuss with students. It can help to focus on a single unit (outliers are easy to spot), and watch that particular value to see which group it is randomized to. As with bootstrap distributions, each randomization statistic is plotted with a dot in the randomization distribution, and once the process is understood, we can generate thousands of samples to create a randomization distribution. This distribution is shown in Figure 3.
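The reallocation procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not StatKey's implementation: the weight changes are hypothetical toy values rather than the Mindset Matters data, the helper `diff_in_means` is our own, and the lower-tail p-value is an assumed choice of alternative made only for this sketch.

```python
import random
import statistics

def diff_in_means(values, groups):
    """Mean of the 'informed' group minus mean of the 'uninformed' group."""
    informed = [v for v, g in zip(values, groups) if g == "informed"]
    uninformed = [v for v, g in zip(values, groups) if g == "uninformed"]
    return statistics.mean(informed) - statistics.mean(uninformed)

# Hypothetical toy weight changes (lb); NOT the Mindset Matters data.
values = [-3.0, -2.5, -1.0, -4.0, 0.5, -0.5, 1.0, -1.5, 0.0, 0.5]
groups = ["informed"] * 5 + ["uninformed"] * 5

observed = diff_in_means(values, groups)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
rand_stats = []
for _ in range(1000):
    shuffled = groups[:]   # copy the group labels...
    rng.shuffle(shuffled)  # ...and reallocate them at random,
    # keeping every response value fixed, as the null hypothesis implies.
    rand_stats.append(diff_in_means(values, shuffled))

# Assumed lower-tail alternative: proportion of randomization statistics
# at least as far below zero as the observed difference.
p_value = sum(stat <= observed for stat in rand_stats) / len(rand_stats)

print("observed difference:", round(observed, 2))
print("center of randomization distribution:", round(statistics.mean(rand_stats), 3))
print("lower-tail p-value:", p_value)
```

Because only the group labels are shuffled, the randomization statistics center near zero, the value of the difference in means under the null hypothesis.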