mirror of
https://github.com/saitohirga/WSJT-X.git
synced 2024-11-18 01:52:05 -05:00
782 lines
32 KiB
Plaintext
[section:st_eg Student's t Distribution Examples]
|
||
|
|
||
|
[section:tut_mean_intervals Calculating confidence intervals on the mean with the Students-t distribution]
|
||
|
|
||
|
Let's say you have a sample mean: you may wish to know what confidence intervals
|
||
|
you can place on that mean. Colloquially: "I want an interval that I can be
|
||
|
P% sure contains the true mean". (On a technical point, note that
|
||
|
the interval either contains the true mean or it does not: the
|
||
|
meaning of the confidence level is subtly
|
||
|
different from this colloquialism. More background information can be found on the
|
||
|
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm NIST site]).
|
||
|
|
||
|
The formula for the interval can be expressed as:
|
||
|
|
||
|
[equation dist_tutorial4]
|
||
|
|
||
|
Where, ['Y[sub s]] is the sample mean, /s/ is the sample standard deviation,
|
||
|
/N/ is the sample size, /[alpha]/ is the desired significance level and
|
||
|
['t[sub ([alpha]/2,N-1)]] is the upper critical value of the Students-t
|
||
|
distribution with /N-1/ degrees of freedom.
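
Written out in LaTeX, using the symbols just defined, the interval is as follows. This is a transcription based on the definitions in the text, offered as a reading aid where the equation image is not available:

```latex
\[
  Y_s \;\pm\; t_{(\alpha/2,\,N-1)} \, \frac{s}{\sqrt{N}}
\]
```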
|
||
|
|
||
|
[note
|
||
|
The quantity [alpha][space] is the maximum acceptable risk of falsely rejecting
|
||
|
the null-hypothesis. The smaller the value of [alpha] the greater the
|
||
|
strength of the test.
|
||
|
|
||
|
The confidence level of the test is defined as 1 - [alpha], and often expressed
|
||
|
as a percentage. So, for example, a significance level of 0.05 is equivalent
|
||
|
to a 95% confidence level. Refer to
|
||
|
[@http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
|
||
|
"What are confidence intervals?"] in __handbook for more information.
|
||
|
] [/ Note]
|
||
|
|
||
|
[note
|
||
|
The usual assumptions of
|
||
|
[@http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables independent and identically distributed (i.i.d.)]
|
||
|
variables and [@http://en.wikipedia.org/wiki/Normal_distribution normal distribution]
|
||
|
of course apply here, as they do in other examples.
|
||
|
]
|
||
|
|
||
|
From the formula, it should be clear that:
|
||
|
|
||
|
* The width of the confidence interval decreases as the sample size increases.
|
||
|
* The width increases as the standard deviation increases.
|
||
|
* The width increases as the ['confidence level increases] (0.5 towards 0.99999 - stronger).
|
||
|
* The width increases as the ['significance level decreases] (0.5 towards 0.00000...01 - stronger).
|
||
|
|
||
|
The following example code is taken from the example program
|
||
|
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].
|
||
|
|
||
|
We'll begin by defining a procedure to calculate intervals for
|
||
|
various confidence levels; the procedure will print these out
|
||
|
as a table:
|
||
|
|
||
|
// Needed includes:
|
||
|
#include <boost/math/distributions/students_t.hpp>
|
||
|
#include <iostream>
|
||
|
#include <iomanip>
|
||
|
// Bring everything into global namespace for ease of use:
|
||
|
using namespace boost::math;
|
||
|
using namespace std;
|
||
|
|
||
|
void confidence_limits_on_mean(
|
||
|
double Sm, // Sm = Sample Mean.
|
||
|
double Sd, // Sd = Sample Standard Deviation.
|
||
|
unsigned Sn) // Sn = Sample Size.
|
||
|
{
|
||
|
using namespace std;
|
||
|
using namespace boost::math;
|
||
|
|
||
|
// Print out general info:
|
||
|
cout <<
|
||
|
"__________________________________\n"
|
||
|
"2-Sided Confidence Limits For Mean\n"
|
||
|
"__________________________________\n\n";
|
||
|
cout << setprecision(7);
|
||
|
cout << setw(40) << left << "Number of Observations" << "= " << Sn << "\n";
|
||
|
cout << setw(40) << left << "Mean" << "= " << Sm << "\n";
|
||
|
cout << setw(40) << left << "Standard Deviation" << "= " << Sd << "\n";
|
||
|
|
||
|
We'll define a table of significance/risk levels for which we'll compute intervals:
|
||
|
|
||
|
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
|
||
|
|
||
|
Note that these are the complements of the confidence/probability levels: 0.5, 0.75, 0.9 .. 0.99999.
|
||
|
|
||
|
Next we'll declare the distribution object we'll need; note that
|
||
|
the /degrees of freedom/ parameter is the sample size less one:
|
||
|
|
||
|
students_t dist(Sn - 1);
|
||
|
|
||
|
Most of what follows in the program is pretty printing, so let's focus
|
||
|
on the calculation of the interval. First we need the t-statistic,
|
||
|
computed using the /quantile/ function and our significance level. Note
|
||
|
that since the significance levels are the complement of the probability,
|
||
|
we have to wrap the arguments in a call to /complement(...)/:
|
||
|
|
||
|
double T = quantile(complement(dist, alpha[i] / 2));
|
||
|
|
||
|
Note that alpha was divided by two, since we'll be calculating
|
||
|
both the upper and lower bounds: had we been interested in a single
|
||
|
sided interval then we would have omitted this step.
|
||
|
|
||
|
Now to complete the picture, we'll get the (one-sided) width of the
|
||
|
interval from the t-statistic
|
||
|
by multiplying by the standard deviation, and dividing by the square
|
||
|
root of the sample size:
|
||
|
|
||
|
double w = T * Sd / sqrt(double(Sn));
|
||
|
|
||
|
The two-sided interval is then the sample mean plus and minus this width.
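
The last two steps can be gathered into one small function. The following is a minimal sketch rather than code from the example program: `mean_interval` and `interval_from_critical_value` are hypothetical names, and the critical value is passed in as an argument (in the real program it comes from the `quantile` call shown above), so the sketch needs only the standard library.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type (not part of Boost.Math): a two-sided interval
// on the mean.
struct mean_interval
{
   double half_width; // The one-sided width, T * Sd / sqrt(Sn).
   double lower;      // Sm - half_width.
   double upper;      // Sm + half_width.
};

// Hypothetical helper: build the interval from a critical value that has
// already been obtained from the Student's t distribution.
mean_interval interval_from_critical_value(
   double Sm,     // Sample mean.
   double Sd,     // Sample standard deviation.
   unsigned Sn,   // Sample size.
   double t_crit) // Upper critical value for alpha / 2 and Sn - 1 degrees of freedom.
{
   assert(Sn > 1); // At least one degree of freedom is required.
   double w = t_crit * Sd / std::sqrt(double(Sn));
   mean_interval result = { w, Sm - w, Sm + w };
   return result;
}
```

With the heat flow figures quoted in the sample output (mean 9.26146, standard deviation 0.02278881, 195 observations) and the 95% critical value rounded to 1.972, `interval_from_critical_value(9.26146, 0.02278881, 195, 1.972)` gives a half width of roughly 3.22e-3 and limits of roughly 9.2582 and 9.2647.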
|
||
|
|
||
|
And apart from some more pretty-printing that completes the procedure.
|
||
|
|
||
|
Let's take a look at some sample output, first using the
|
||
|
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
|
||
|
Heat flow data] from the NIST site. The data set was collected
|
||
|
by Bob Zarr of NIST in January, 1990 from a heat flow meter
|
||
|
calibration and stability analysis.
|
||
|
The corresponding dataplot
|
||
|
output for this test can be found in
|
||
|
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
|
||
|
section 3.5.2] of the __handbook.
|
||
|
|
||
|
[pre'''
|
||
|
__________________________________
|
||
|
2-Sided Confidence Limits For Mean
|
||
|
__________________________________
|
||
|
|
||
|
Number of Observations = 195
|
||
|
Mean = 9.26146
|
||
|
Standard Deviation = 0.02278881
|
||
|
|
||
|
|
||
|
___________________________________________________________________
|
||
|
Confidence T Interval Lower Upper
|
||
|
Value (%) Value Width Limit Limit
|
||
|
___________________________________________________________________
|
||
|
50.000 0.676 1.103e-003 9.26036 9.26256
|
||
|
75.000 1.154 1.883e-003 9.25958 9.26334
|
||
|
90.000 1.653 2.697e-003 9.25876 9.26416
|
||
|
95.000 1.972 3.219e-003 9.25824 9.26468
|
||
|
99.000 2.601 4.245e-003 9.25721 9.26571
|
||
|
99.900 3.341 5.453e-003 9.25601 9.26691
|
||
|
99.990 3.973 6.484e-003 9.25498 9.26794
|
||
|
99.999 4.537 7.404e-003 9.25406 9.26886
|
||
|
''']
|
||
|
|
||
|
As you can see, the large sample size (195) and small standard deviation (0.023)
have combined to give very narrow intervals: even at the 99.999% level the limits
are 9.25406 and 9.26886, so we can be very confident that the true mean is close to 9.26.
|
||
|
|
||
|
For comparison the next example data output is taken from
|
||
|
['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
|
||
|
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
|
||
|
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
|
||
|
The values result from the determination of mercury by cold-vapour
|
||
|
atomic absorption.
|
||
|
|
||
|
[pre'''
|
||
|
__________________________________
|
||
|
2-Sided Confidence Limits For Mean
|
||
|
__________________________________
|
||
|
|
||
|
Number of Observations = 3
|
||
|
Mean = 37.8000000
|
||
|
Standard Deviation = 0.9643650
|
||
|
|
||
|
|
||
|
___________________________________________________________________
|
||
|
Confidence T Interval Lower Upper
|
||
|
Value (%) Value Width Limit Limit
|
||
|
___________________________________________________________________
|
||
|
50.000 0.816 0.455 37.34539 38.25461
|
||
|
75.000 1.604 0.893 36.90717 38.69283
|
||
|
90.000 2.920 1.626 36.17422 39.42578
|
||
|
95.000 4.303 2.396 35.40438 40.19562
|
||
|
99.000 9.925 5.526 32.27408 43.32592
|
||
|
99.900 31.599 17.594 20.20639 55.39361
|
||
|
99.990 99.992 55.673 -17.87346 93.47346
|
||
|
99.999 316.225 176.067 -138.26683 213.86683
|
||
|
''']
|
||
|
|
||
|
This time the fact that there are only three measurements leads to
|
||
|
much wider intervals, indeed such large intervals that it's hard
|
||
|
to be very confident in the location of the mean.
|
||
|
|
||
|
[endsect]
|
||
|
|
||
|
[section:tut_mean_test Testing a sample mean for difference from a "true" mean]
|
||
|
|
||
|
When calibrating or comparing a scientific instrument or measurement method of some kind,
|
||
|
we want to be able to answer the question "Does an observed sample mean differ from the
|
||
|
"true" mean in any significant way?". If it does, then we have evidence of
|
||
|
a systematic difference. This question can be answered with a Students-t test:
|
||
|
more information can be found
|
||
|
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
|
||
|
on the NIST site].
|
||
|
|
||
|
Of course, the assignment of "true" to one mean may be quite arbitrary:
|
||
|
often this is simply a "traditional" method of measurement.
|
||
|
|
||
|
The following example code is taken from the example program
|
||
|
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].
|
||
|
|
||
|
We'll begin by defining a procedure to determine which of the
|
||
|
possible hypotheses are rejected or not-rejected
|
||
|
at a given significance level:
|
||
|
|
||
|
[note
|
||
|
Non-statisticians might say 'not-rejected' means 'accepted',
|
||
|
(often of the null-hypothesis) implying, wrongly, that there really *IS* no difference,
|
||
|
but statisticians eschew this to avoid implying that there is positive evidence of 'no difference'.
|
||
|
'Not-rejected' here means there is *no evidence* of difference, but there still might well be a difference.
|
||
|
For example, see [@http://en.wikipedia.org/wiki/Argument_from_ignorance argument from ignorance] and
|
||
|
[@http://www.bmj.com/cgi/content/full/311/7003/485 Absence of evidence does not constitute evidence of absence.]
|
||
|
] [/ note]
|
||
|
|
||
|
|
||
|
// Needed includes:
|
||
|
#include <boost/math/distributions/students_t.hpp>
|
||
|
#include <iostream>
|
||
|
#include <iomanip>
|
||
|
// Bring everything into global namespace for ease of use:
|
||
|
using namespace boost::math;
|
||
|
using namespace std;
|
||
|
|
||
|
void single_sample_t_test(double M, double Sm, double Sd, unsigned Sn, double alpha)
|
||
|
{
|
||
|
//
|
||
|
// M = true mean.
|
||
|
// Sm = Sample Mean.
|
||
|
// Sd = Sample Standard Deviation.
|
||
|
// Sn = Sample Size.
|
||
|
// alpha = Significance Level.
|
||
|
|
||
|
Most of the procedure is pretty-printing, so let's just focus on the
|
||
|
calculation. We begin by calculating the t-statistic:
|
||
|
|
||
|
// Difference in means:
|
||
|
double diff = Sm - M;
|
||
|
// Degrees of freedom:
|
||
|
unsigned v = Sn - 1;
|
||
|
// t-statistic:
|
||
|
double t_stat = diff * sqrt(double(Sn)) / Sd;
|
||
|
|
||
|
Finally calculate the probability from the t-statistic. If we're interested
|
||
|
in simply whether there is a difference (either less or greater) or not,
|
||
|
we don't care about the sign of the t-statistic,
|
||
|
and we take the complement of the probability for comparison
|
||
|
to the significance level:
|
||
|
|
||
|
students_t dist(v);
|
||
|
double q = cdf(complement(dist, fabs(t_stat)));
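
As a check on the arithmetic, the t-statistic and tail probability can be reproduced with the standard library alone in one special case. The sketch below is not taken from the example program: `t_statistic`, `cdf_two_df` and `upper_tail_two_df` are hypothetical helper names, and `cdf_two_df` relies on the closed form of the Student's t CDF that holds only for exactly two degrees of freedom, 1/2 + t / (2 sqrt(t^2 + 2)). For any other number of degrees of freedom the library's `cdf` should be used, as in the code above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: t-statistic for a single sample against a "true" mean M.
double t_statistic(double M, double Sm, double Sd, unsigned Sn)
{
   assert(Sn > 1 && Sd > 0);
   return (Sm - M) * std::sqrt(double(Sn)) / Sd;
}

// Hypothetical helper: CDF of the Student's t distribution for exactly
// two degrees of freedom, using the closed form 1/2 + t / (2 * sqrt(t^2 + 2)).
double cdf_two_df(double t)
{
   return 0.5 + t / (2 * std::sqrt(t * t + 2));
}

// Complement of the CDF at |t| (two degrees of freedom only): the upper
// tail probability that is compared with alpha / 2 in a two-sided test.
double upper_tail_two_df(double t)
{
   return 1 - cdf_two_df(std::fabs(t));
}
```

For the mercury data used later in this section (true mean 38.9, sample mean 37.8, standard deviation 0.9643650, three observations), `t_statistic` gives approximately -1.9757 and `upper_tail_two_df` approximately 0.0934. Twice that value, about 0.187, agrees with the 1.869e-001 printed as the "probability that difference is due to chance" in the sample output.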
|
||
|
|
||
|
The procedure then prints out the results of the various tests
|
||
|
that can be done; these
|
||
|
can be summarised in the following table:
|
||
|
|
||
|
[table
|
||
|
[[Hypothesis][Test]]
|
||
|
[[The Null-hypothesis: there is
|
||
|
*no difference* in means]
|
||
|
[Reject if complement of CDF for |t| < significance level / 2:
|
||
|
|
||
|
`cdf(complement(dist, fabs(t))) < alpha / 2`]]
|
||
|
|
||
|
[[The Alternative-hypothesis: there
|
||
|
*is difference* in means]
|
||
|
[Reject if complement of CDF for |t| > significance level / 2:
|
||
|
|
||
|
`cdf(complement(dist, fabs(t))) > alpha / 2`]]
|
||
|
|
||
|
[[The Alternative-hypothesis: the sample mean *is less* than
|
||
|
the true mean.]
|
||
|
[Reject if CDF of t > 1 - significance level:
|
||
|
|
||
|
`cdf(complement(dist, t)) < alpha`]]
|
||
|
|
||
|
[[The Alternative-hypothesis: the sample mean *is greater* than
|
||
|
the true mean.]
|
||
|
[Reject if CDF of t < significance level:
|
||
|
|
||
|
`cdf(dist, t) < alpha`]]
|
||
|
]
|
||
|
|
||
|
[note
|
||
|
Notice that the comparisons are against `alpha / 2` for a two-sided test
|
||
|
and against `alpha` for a one-sided test]
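
The comparisons in the table can be collected into a small helper. This is a sketch, not code from the example program: the struct and function names are invented for illustration, and the function takes the value of `cdf(dist, t)` as an argument so that it does not depend on any particular library. It also forms the complement as `1 - cdf_t`, which loses precision for extremely small tail probabilities; the library's `complement` mechanism avoids that.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: true means the hypothesis is rejected.
struct t_test_decisions
{
   bool null_rejected;        // "There is no difference in means."
   bool alt_differs_rejected; // "There is a difference in means."
   bool alt_less_rejected;    // "The sample mean is less than the true mean."
   bool alt_greater_rejected; // "The sample mean is greater than the true mean."
};

// cdf_t is the value of cdf(dist, t) for the observed t-statistic.
t_test_decisions decide(double cdf_t, double alpha)
{
   assert(cdf_t >= 0 && cdf_t <= 1 && alpha > 0 && alpha < 1);
   double complement_t = 1 - cdf_t;          // cdf(complement(dist, t))
   // By symmetry of the distribution, the smaller tail equals
   // cdf(complement(dist, fabs(t))).
   double q = std::min(cdf_t, complement_t);
   t_test_decisions d;
   d.null_rejected = q < alpha / 2;
   d.alt_differs_rejected = q > alpha / 2;
   d.alt_less_rejected = complement_t < alpha;
   d.alt_greater_rejected = cdf_t < alpha;
   return d;
}
```

For the mercury data discussed below, `cdf_t` is about 0.0934 (half of the 1.869e-001 printed in the sample output). `decide(0.0934, 0.05)` rejects only the "there is a difference" alternative, whereas `decide(0.0934, 0.1)` also rejects the "greater than" alternative, in line with the two sample outputs for that data.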
|
||
|
|
||
|
Now that we have all the parts in place, let's take a look at some
|
||
|
sample output, first using the
|
||
|
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
|
||
|
Heat flow data] from the NIST site. The data set was collected
|
||
|
by Bob Zarr of NIST in January, 1990 from a heat flow meter
|
||
|
calibration and stability analysis. The corresponding dataplot
|
||
|
output for this test can be found in
|
||
|
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
|
||
|
section 3.5.2] of the __handbook.
|
||
|
|
||
|
[pre
|
||
|
__________________________________
|
||
|
Student t test for a single sample
|
||
|
__________________________________
|
||
|
|
||
|
Number of Observations = 195
|
||
|
Sample Mean = 9.26146
|
||
|
Sample Standard Deviation = 0.02279
|
||
|
Expected True Mean = 5.00000
|
||
|
|
||
|
Sample Mean - Expected Test Mean = 4.26146
|
||
|
Degrees of Freedom = 194
|
||
|
T Statistic = 2611.28380
|
||
|
Probability that difference is due to chance = 0.000e+000
|
||
|
|
||
|
Results for Alternative Hypothesis and alpha = 0.0500
|
||
|
|
||
|
Alternative Hypothesis Conclusion
|
||
|
Mean != 5.000 NOT REJECTED
|
||
|
Mean < 5.000 REJECTED
|
||
|
Mean > 5.000 NOT REJECTED
|
||
|
]
|
||
|
|
||
|
You will note the line that says the probability that the difference is
|
||
|
due to chance is zero. From a philosophical point of view, of course,
|
||
|
the probability can never reach zero. However, in this case the calculated
|
||
|
probability is smaller than the smallest representable double precision number,
|
||
|
hence the appearance of a zero here. Whatever its "true" value is, we know it
|
||
|
must be extraordinarily small, so the alternative hypothesis - that there is
|
||
|
a difference in means - is not rejected.
|
||
|
|
||
|
For comparison the next example data output is taken from
|
||
|
['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
|
||
|
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
|
||
|
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
|
||
|
The values result from the determination of mercury by cold-vapour
|
||
|
atomic absorption.
|
||
|
|
||
|
[pre
|
||
|
__________________________________
|
||
|
Student t test for a single sample
|
||
|
__________________________________
|
||
|
|
||
|
Number of Observations = 3
|
||
|
Sample Mean = 37.80000
|
||
|
Sample Standard Deviation = 0.96437
|
||
|
Expected True Mean = 38.90000
|
||
|
|
||
|
Sample Mean - Expected Test Mean = -1.10000
|
||
|
Degrees of Freedom = 2
|
||
|
T Statistic = -1.97566
|
||
|
Probability that difference is due to chance = 1.869e-001
|
||
|
|
||
|
Results for Alternative Hypothesis and alpha = 0.0500
|
||
|
|
||
|
Alternative Hypothesis Conclusion
|
||
|
Mean != 38.900 REJECTED
|
||
|
Mean < 38.900 NOT REJECTED
|
||
|
Mean > 38.900 NOT REJECTED
|
||
|
]
|
||
|
|
||
|
As you can see the small number of measurements (3) has led to a large uncertainty
|
||
|
in the location of the true mean. So even though there appears to be a difference
|
||
|
between the sample mean and the expected true mean, we conclude that there
|
||
|
is no significant difference, and are unable to reject the null hypothesis.
|
||
|
However, if we were to lower the bar for acceptance down to alpha = 0.1
|
||
|
(a 90% confidence level) we see a different output:
|
||
|
|
||
|
[pre
|
||
|
__________________________________
|
||
|
Student t test for a single sample
|
||
|
__________________________________
|
||
|
|
||
|
Number of Observations = 3
|
||
|
Sample Mean = 37.80000
|
||
|
Sample Standard Deviation = 0.96437
|
||
|
Expected True Mean = 38.90000
|
||
|
|
||
|
Sample Mean - Expected Test Mean = -1.10000
|
||
|
Degrees of Freedom = 2
|
||
|
T Statistic = -1.97566
|
||
|
Probability that difference is due to chance = 1.869e-001
|
||
|
|
||
|
Results for Alternative Hypothesis and alpha = 0.1000
|
||
|
|
||
|
Alternative Hypothesis Conclusion
|
||
|
Mean != 38.900 REJECTED
|
||
|
Mean < 38.900 NOT REJECTED
|
||
|
Mean > 38.900 REJECTED
|
||
|
]
|
||
|
|
||
|
In this case, we really have a borderline result,
|
||
|
and more data (and/or more accurate data)
|
||
|
is needed for a more convincing conclusion.
|
||
|
|
||
|
[endsect]
|
||
|
|
||
|
[section:tut_mean_size Estimating how large a sample size would have to become
|
||
|
in order to give a significant Students-t test result with a single sample test]
|
||
|
|
||
|
Imagine you have conducted a Students-t test on a single sample in order
|
||
|
to check for systematic errors in your measurements. Imagine that the
|
||
|
result is borderline. At this point one might go off and collect more data,
|
||
|
but it might be prudent to first ask the question "How much more?".
|
||
|
The parameter estimators of the students_t_distribution class
|
||
|
can provide this information.
|
||
|
|
||
|
This section is based on the example code in
|
||
|
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp]
|
||
|
and we begin by defining a procedure that will print out a table of
|
||
|
estimated sample sizes for various confidence levels:
|
||
|
|
||
|
// Needed includes:
|
||
|
#include <boost/math/distributions/students_t.hpp>
|
||
|
#include <iostream>
|
||
|
#include <iomanip>
|
||
|
// Bring everything into global namespace for ease of use:
|
||
|
using namespace boost::math;
|
||
|
using namespace std;
|
||
|
|
||
|
void single_sample_find_df(
|
||
|
double M, // M = true mean.
|
||
|
double Sm, // Sm = Sample Mean.
|
||
|
double Sd) // Sd = Sample Standard Deviation.
|
||
|
{
|
||
|
|
||
|
Next we define a table of significance levels:
|
||
|
|
||
|
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
|
||
|
|
||
|
Printing out the table of sample sizes required for various confidence levels
|
||
|
begins with the table header:
|
||
|
|
||
|
cout << "\n\n"
|
||
|
"_______________________________________________________________\n"
|
||
|
"Confidence Estimated Estimated\n"
|
||
|
" Value (%) Sample Size Sample Size\n"
|
||
|
" (one sided test) (two sided test)\n"
|
||
|
"_______________________________________________________________\n";
|
||
|
|
||
|
|
||
|
And now the important part: the sample sizes required. Class
|
||
|
`students_t_distribution` has a static member function
|
||
|
`find_degrees_of_freedom` that will calculate how large
|
||
|
a sample size needs to be in order to give a definitive result.
|
||
|
|
||
|
The first argument is the difference between the means that you
|
||
|
wish to be able to detect; here it's the absolute value of the
difference between the sample mean and the true mean.
|
||
|
|
||
|
Then come two probability values: alpha and beta. Alpha is the
|
||
|
maximum acceptable risk of rejecting the null-hypothesis when it is
|
||
|
in fact true. Beta is the maximum acceptable risk of failing to reject
|
||
|
the null-hypothesis when in fact it is false.
|
||
|
Also note that for a two-sided test, alpha must be divided by 2.
|
||
|
|
||
|
The final parameter of the function is the standard deviation of the sample.
|
||
|
|
||
|
In this example, we assume that alpha and beta are the same, and call
|
||
|
`find_degrees_of_freedom` twice: once with alpha for a one-sided test,
|
||
|
and once with alpha/2 for a two-sided test.
|
||
|
|
||
|
for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
|
||
|
{
|
||
|
// Confidence value:
|
||
|
cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
|
||
|
// calculate df for single sided test:
|
||
|
double df = students_t::find_degrees_of_freedom(
|
||
|
fabs(M - Sm), alpha[i], alpha[i], Sd);
|
||
|
// convert to sample size:
|
||
|
double size = ceil(df) + 1;
|
||
|
// Print size:
|
||
|
cout << fixed << setprecision(0) << setw(16) << right << size;
|
||
|
// calculate df for two sided test:
|
||
|
df = students_t::find_degrees_of_freedom(
|
||
|
fabs(M - Sm), alpha[i]/2, alpha[i], Sd);
|
||
|
// convert to sample size:
|
||
|
size = ceil(df) + 1;
|
||
|
// Print size:
|
||
|
cout << fixed << setprecision(0) << setw(16) << right << size << endl;
|
||
|
}
|
||
|
cout << endl;
|
||
|
}
|
||
|
|
||
|
Let's now look at some sample output using data taken from
|
||
|
['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
|
||
|
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
|
||
|
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
|
||
|
The values result from the determination of mercury by cold-vapour
|
||
|
atomic absorption.
|
||
|
|
||
|
Only three measurements were made, and the Students-t test above
|
||
|
gave a borderline result, so this example
|
||
|
will show us how many samples would need to be collected:
|
||
|
|
||
|
[pre'''
|
||
|
_____________________________________________________________
|
||
|
Estimated sample sizes required for various confidence levels
|
||
|
_____________________________________________________________
|
||
|
|
||
|
True Mean = 38.90000
|
||
|
Sample Mean = 37.80000
|
||
|
Sample Standard Deviation = 0.96437
|
||
|
|
||
|
|
||
|
_______________________________________________________________
|
||
|
Confidence Estimated Estimated
|
||
|
Value (%) Sample Size Sample Size
|
||
|
(one sided test) (two sided test)
|
||
|
_______________________________________________________________
|
||
|
75.000 3 4
|
||
|
90.000 7 9
|
||
|
95.000 11 13
|
||
|
99.000 20 22
|
||
|
99.900 35 37
|
||
|
99.990 50 53
|
||
|
99.999 66 68
|
||
|
''']
|
||
|
|
||
|
So in this case, many more measurements would have had to be made,
|
||
|
for example at the 95% level, 13 measurements in total for a two-sided test.
|
||
|
|
||
|
[endsect]
|
||
|
[section:two_sample_students_t Comparing the means of two samples with the Students-t test]
|
||
|
|
||
|
Imagine that we have two samples, and we wish to determine whether
|
||
|
their means are different or not. This situation often arises when
|
||
|
determining whether a new process or treatment is better than an old one.
|
||
|
|
||
|
In this example, we'll be using the
|
||
|
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm
|
||
|
Car Mileage sample data] from the
|
||
|
[@http://www.itl.nist.gov NIST website]. The data compares
|
||
|
miles per gallon of US cars with miles per gallon of Japanese cars.
|
||
|
|
||
|
The sample code is in
|
||
|
[@../../example/students_t_two_samples.cpp students_t_two_samples.cpp].
|
||
|
|
||
|
There are two ways in which this test can be conducted: we can assume
|
||
|
that the true standard deviations of the two samples are equal or not.
|
||
|
If the standard deviations are assumed to be equal, then the calculation
|
||
|
of the t-statistic is greatly simplified, so we'll examine that case first.
|
||
|
In real life we should verify whether this assumption is valid with a
|
||
|
Chi-Squared test for equal variances.
|
||
|
|
||
|
We begin by defining a procedure that will conduct our test assuming equal
|
||
|
variances:
|
||
|
|
||
|
// Needed headers:
|
||
|
#include <boost/math/distributions/students_t.hpp>
|
||
|
#include <iostream>
|
||
|
#include <iomanip>
|
||
|
// Simplify usage:
|
||
|
using namespace boost::math;
|
||
|
using namespace std;
|
||
|
|
||
|
void two_samples_t_test_equal_sd(
|
||
|
double Sm1, // Sm1 = Sample 1 Mean.
|
||
|
double Sd1, // Sd1 = Sample 1 Standard Deviation.
|
||
|
unsigned Sn1, // Sn1 = Sample 1 Size.
|
||
|
double Sm2, // Sm2 = Sample 2 Mean.
|
||
|
double Sd2, // Sd2 = Sample 2 Standard Deviation.
|
||
|
unsigned Sn2, // Sn2 = Sample 2 Size.
|
||
|
double alpha) // alpha = Significance Level.
|
||
|
{
|
||
|
|
||
|
|
||
|
Our procedure will begin by calculating the t-statistic. Assuming
equal variances, the needed formulae are:
|
||
|
|
||
|
[equation dist_tutorial1]
|
||
|
|
||
|
where Sp is the "pooled" standard deviation of the two samples,
|
||
|
and /v/ is the number of degrees of freedom of the two combined
|
||
|
samples. We can now write the code to calculate the t-statistic:
|
||
|
|
||
|
// Degrees of freedom:
|
||
|
double v = Sn1 + Sn2 - 2;
|
||
|
cout << setw(55) << left << "Degrees of Freedom" << "= " << v << "\n";
|
||
|
// Pooled standard deviation:
|
||
|
double sp = sqrt(((Sn1-1) * Sd1 * Sd1 + (Sn2-1) * Sd2 * Sd2) / v);
|
||
|
cout << setw(55) << left << "Pooled Standard Deviation" << "= " << sp << "\n";
|
||
|
// t-statistic:
|
||
|
double t_stat = (Sm1 - Sm2) / (sp * sqrt(1.0 / Sn1 + 1.0 / Sn2));
|
||
|
cout << setw(55) << left << "T Statistic" << "= " << t_stat << "\n";
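
The same calculation can be packaged as two free functions, which makes it easy to check against the sample output. This is a minimal sketch using only the standard library; `pooled_sd` and `t_equal_sd` are hypothetical names and do not appear in the example program.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: pooled standard deviation of two samples.
double pooled_sd(double Sd1, unsigned Sn1, double Sd2, unsigned Sn2)
{
   assert(Sn1 > 1 && Sn2 > 1);
   double v = double(Sn1) + double(Sn2) - 2; // Degrees of freedom.
   return std::sqrt(((Sn1 - 1) * Sd1 * Sd1 + (Sn2 - 1) * Sd2 * Sd2) / v);
}

// Hypothetical helper: two-sample t-statistic assuming equal variances.
double t_equal_sd(double Sm1, double Sd1, unsigned Sn1,
                  double Sm2, double Sd2, unsigned Sn2)
{
   double sp = pooled_sd(Sd1, Sn1, Sd2, Sn2);
   return (Sm1 - Sm2) / (sp * std::sqrt(1.0 / Sn1 + 1.0 / Sn2));
}
```

Fed the rounded car mileage figures from the sample output (249 observations with mean 20.145 and standard deviation 6.4147; 79 observations with mean 30.481 and standard deviation 6.1077), these functions give a pooled standard deviation of about 6.3426 and a t-statistic of about -12.62, which agrees with the printed values to the precision of the rounded inputs.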
|
||
|
|
||
|
The next step is to define our distribution object, and calculate the
|
||
|
complement of the probability:
|
||
|
|
||
|
students_t dist(v);
|
||
|
double q = cdf(complement(dist, fabs(t_stat)));
|
||
|
cout << setw(55) << left << "Probability that difference is due to chance" << "= "
|
||
|
<< setprecision(3) << scientific << 2 * q << "\n\n";
|
||
|
|
||
|
Here we've used the absolute value of the t-statistic, because we initially
|
||
|
want to know simply whether there is a difference or not (a two-sided test).
|
||
|
However, we can also test whether the mean of the second sample is greater
|
||
|
or is less (one-sided test) than that of the first:
|
||
|
all the possible tests are summed up in the following table:
|
||
|
|
||
|
[table
|
||
|
[[Hypothesis][Test]]
|
||
|
[[The Null-hypothesis: there is
|
||
|
*no difference* in means]
|
||
|
[Reject if complement of CDF for |t| < significance level / 2:
|
||
|
|
||
|
`cdf(complement(dist, fabs(t))) < alpha / 2`]]
|
||
|
|
||
|
[[The Alternative-hypothesis: there is a
|
||
|
*difference* in means]
|
||
|
[Reject if complement of CDF for |t| > significance level / 2:
|
||
|
|
||
|
`cdf(complement(dist, fabs(t))) > alpha / 2`]]
|
||
|
|
||
|
[[The Alternative-hypothesis: Sample 1 Mean is *less* than
|
||
|
Sample 2 Mean.]
|
||
|
[Reject if CDF of t > significance level:
|
||
|
|
||
|
`cdf(dist, t) > alpha`]]
|
||
|
|
||
|
[[The Alternative-hypothesis: Sample 1 Mean is *greater* than
|
||
|
Sample 2 Mean.]
|
||
|
|
||
|
[Reject if complement of CDF of t > significance level:
|
||
|
|
||
|
`cdf(complement(dist, t)) > alpha`]]
|
||
|
]
|
||
|
|
||
|
[note
|
||
|
For a two-sided test we must compare against alpha / 2 and not alpha.]
|
||
|
|
||
|
Most of the rest of the sample program is pretty-printing, so we'll
|
||
|
skip over that, and take a look at the sample output for alpha=0.05
|
||
|
(a 95% probability level). For comparison the dataplot output
|
||
|
for the same data is in
|
||
|
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm
|
||
|
section 1.3.5.3] of the __handbook.
|
||
|
|
||
|
[pre'''
|
||
|
________________________________________________
|
||
|
Student t test for two samples (equal variances)
|
||
|
________________________________________________
|
||
|
|
||
|
Number of Observations (Sample 1) = 249
|
||
|
Sample 1 Mean = 20.145
|
||
|
Sample 1 Standard Deviation = 6.4147
|
||
|
Number of Observations (Sample 2) = 79
|
||
|
Sample 2 Mean = 30.481
|
||
|
Sample 2 Standard Deviation = 6.1077
|
||
|
Degrees of Freedom = 326
|
||
|
Pooled Standard Deviation = 6.3426
|
||
|
T Statistic = -12.621
|
||
|
Probability that difference is due to chance = 5.273e-030
|
||
|
|
||
|
Results for Alternative Hypothesis and alpha = 0.0500'''
|
||
|
|
||
|
Alternative Hypothesis Conclusion
|
||
|
Sample 1 Mean != Sample 2 Mean NOT REJECTED
|
||
|
Sample 1 Mean < Sample 2 Mean NOT REJECTED
|
||
|
Sample 1 Mean > Sample 2 Mean REJECTED
|
||
|
]
|
||
|
|
||
|
So with a probability that the difference is due to chance of just
|
||
|
5.273e-030, we can safely conclude that there is indeed a difference.
|
||
|
|
||
|
The tests on the alternative hypothesis show that we must
|
||
|
also reject the hypothesis that Sample 1 Mean is
|
||
|
greater than that for Sample 2: in this case Sample 1 represents the
miles per gallon for US cars, and Sample 2 the miles per gallon for
Japanese cars, so we conclude that Japanese cars are on average more
|
||
|
fuel efficient.
|
||
|
|
||
|
Now that we have the simple case out of the way, let's look for a moment
|
||
|
at the more complex one: that the standard deviations of the two samples
|
||
|
are not equal. In this case the formula for the t-statistic becomes:
|
||
|
|
||
|
[equation dist_tutorial2]
|
||
|
|
||
|
And for the combined degrees of freedom we use the
|
||
|
[@http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation Welch-Satterthwaite]
|
||
|
approximation:
|
||
|
|
||
|
[equation dist_tutorial3]
|
||
|
|
||
|
Note that this is one of the rare situations where the degrees-of-freedom
|
||
|
parameter to the Student's t distribution is a real number, and not an
|
||
|
integer value.
|
||
|
|
||
|
[note
|
||
|
Some statistical packages truncate the effective degrees of freedom to
|
||
|
an integer value: this may be necessary if you are relying on lookup tables,
|
||
|
but since our code fully supports non-integer degrees of freedom there is no
|
||
|
need to truncate in this case. Also note that when the degrees of freedom
|
||
|
is small then the Welch-Satterthwaite approximation may be a significant
|
||
|
source of error.]
|
||
|
|
||
|
Putting these formulae into code we get:
|
||
|
|
||
|
// Degrees of freedom:
|
||
|
double v = Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2;
|
||
|
v *= v;
|
||
|
double t1 = Sd1 * Sd1 / Sn1;
|
||
|
t1 *= t1;
|
||
|
t1 /= (Sn1 - 1);
|
||
|
double t2 = Sd2 * Sd2 / Sn2;
|
||
|
t2 *= t2;
|
||
|
t2 /= (Sn2 - 1);
|
||
|
v /= (t1 + t2);
|
||
|
cout << setw(55) << left << "Degrees of Freedom" << "= " << v << "\n";
|
||
|
// t-statistic:
|
||
|
double t_stat = (Sm1 - Sm2) / sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2);
|
||
|
cout << setw(55) << left << "T Statistic" << "= " << t_stat << "\n";
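
For reuse outside the example program, the unequal-variance formulae can also be written as two free functions. This is a sketch under the same definitions as above; `welch_df` and `t_unequal_sd` are hypothetical names chosen for illustration, and only the standard library is used.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: Welch-Satterthwaite effective degrees of freedom.
double welch_df(double Sd1, unsigned Sn1, double Sd2, unsigned Sn2)
{
   assert(Sn1 > 1 && Sn2 > 1);
   double a = Sd1 * Sd1 / Sn1; // Estimated variance of the first sample mean.
   double b = Sd2 * Sd2 / Sn2; // Estimated variance of the second sample mean.
   return (a + b) * (a + b) / (a * a / (Sn1 - 1) + b * b / (Sn2 - 1));
}

// Hypothetical helper: two-sample t-statistic without assuming equal variances.
double t_unequal_sd(double Sm1, double Sd1, unsigned Sn1,
                    double Sm2, double Sd2, unsigned Sn2)
{
   assert(Sn1 > 0 && Sn2 > 0);
   return (Sm1 - Sm2) / std::sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2);
}
```

With the rounded car mileage figures (249 observations, mean 20.145, standard deviation 6.4147; 79 observations, mean 30.481, standard deviation 6.1077) `welch_df` returns about 136.87, a non-integer value as noted above, and `t_unequal_sd` returns about -12.946.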
|
||
|
|
||
|
Thereafter the code and the tests are performed the same as before. Using
|
||
|
our car mileage data again, here's what the output looks like:
|
||
|
|
||
|
[pre'''
|
||
|
__________________________________________________
|
||
|
Student t test for two samples (unequal variances)
|
||
|
__________________________________________________
|
||
|
|
||
|
Number of Observations (Sample 1) = 249
|
||
|
Sample 1 Mean = 20.145
|
||
|
Sample 1 Standard Deviation = 6.4147
|
||
|
Number of Observations (Sample 2) = 79
|
||
|
Sample 2 Mean = 30.481
|
||
|
Sample 2 Standard Deviation = 6.1077
|
||
|
Degrees of Freedom = 136.87
|
||
|
T Statistic = -12.946
|
||
|
Probability that difference is due to chance = 1.571e-025
|
||
|
|
||
|
Results for Alternative Hypothesis and alpha = 0.0500'''
|
||
|
|
||
|
Alternative Hypothesis Conclusion
|
||
|
Sample 1 Mean != Sample 2 Mean NOT REJECTED
|
||
|
Sample 1 Mean < Sample 2 Mean NOT REJECTED
|
||
|
Sample 1 Mean > Sample 2 Mean REJECTED
|
||
|
]
|
||
|
|
||
|
This time allowing the variances in the two samples to differ has yielded
|
||
|
a higher likelihood that the observed difference is down to chance alone
|
||
|
(1.571e-025 compared to 5.273e-030 when equal variances were assumed).
|
||
|
However, the conclusion remains the same: US cars are less fuel efficient
|
||
|
than Japanese models.
|
||
|
|
||
|
[endsect]
|
||
|
[section:paired_st Comparing two paired samples with the Student's t distribution]
|
||
|
|
||
|
Imagine that we have a before and after reading for each item in the sample:
|
||
|
for example we might have measured blood pressure before and after administration
|
||
|
of a new drug. We can't pool the results and compare the means before and after
|
||
|
the change, because each patient will have a different baseline reading.
|
||
|
Instead we calculate the difference between before and after measurements
|
||
|
in each patient, and calculate the mean and standard deviation of the differences.
|
||
|
To test whether a significant change has taken place, we can then test
|
||
|
the null-hypothesis that the true mean is zero using the same procedure
|
||
|
we used in the single sample cases previously discussed.
|
||
|
|
||
|
That means we can:
|
||
|
|
||
|
* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_intervals Calculate confidence intervals of the mean].
|
||
|
If the endpoints of the interval differ in sign then we are unable to reject
|
||
|
the null-hypothesis that there is no change.
|
||
|
* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_test Test whether the true mean is zero]. If the
|
||
|
result is consistent with a true mean of zero, then we are unable to reject the
|
||
|
null-hypothesis that there is no change.
|
||
|
* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_size Calculate how many pairs of readings we would need
|
||
|
in order to obtain a significant result].
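
The reduction from paired readings to a single sample of differences can be sketched as follows. This is an illustrative helper, not part of Boost.Math or its example programs: the names `paired_summary` and `summarise_pairs` are hypothetical, and the resulting mean, standard deviation and sample size are what would then be passed to the single-sample procedures with a "true" mean of zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for a paired comparison.
struct paired_summary
{
   double mean;   // Mean of the differences (after - before).
   double sd;     // Sample standard deviation of the differences.
   double t_stat; // t-statistic for the null-hypothesis of zero mean difference.
   unsigned df;   // Degrees of freedom: number of pairs less one.
};

paired_summary summarise_pairs(const std::vector<double>& before,
                               const std::vector<double>& after)
{
   assert(before.size() == after.size() && before.size() > 1);
   const std::size_t n = before.size();
   double sum = 0;
   for(std::size_t i = 0; i < n; ++i)
      sum += after[i] - before[i];
   const double mean = sum / n;
   double ss = 0; // Sum of squared deviations from the mean difference.
   for(std::size_t i = 0; i < n; ++i)
   {
      const double d = (after[i] - before[i]) - mean;
      ss += d * d;
   }
   const double sd = std::sqrt(ss / (n - 1));
   paired_summary r;
   r.mean = mean;
   r.sd = sd;
   // Not meaningful if every difference is identical (sd == 0).
   r.t_stat = mean * std::sqrt(double(n)) / sd;
   r.df = static_cast<unsigned>(n - 1);
   return r;
}
```

As a made-up illustration (these numbers are not real measurements), before readings of 120, 130, 125, 140 and after readings of 115, 128, 120, 133 give differences of -5, -2, -5 and -7, hence a mean difference of -4.75, a standard deviation of about 2.06, and a t-statistic of about -4.61 on 3 degrees of freedom.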
|
||
|
|
||
|
[endsect]
|
||
|
|
||
|
[endsect][/section:st_eg Student's t]
|
||
|
|
||
|
[/
|
||
|
Copyright 2006, 2012 John Maddock and Paul A. Bristow.
|
||
|
Distributed under the Boost Software License, Version 1.0.
|
||
|
(See accompanying file LICENSE_1_0.txt or copy at
|
||
|
http://www.boost.org/LICENSE_1_0.txt).
|
||
|
]
|
||
|
|