RUNTEST <data>,<variable>,L
tests whether a given sequence of 0's (zeros) and 1's (non-zeros) is a random sequence, i.e. a sequence of Bernoulli trials where the trials are independent and the probability of 1 is a constant p (0<p<1) whose value is unknown. The sequence to be tested is saved as <variable> in <data>.

Besides testing the randomness of a sequence, RUNTEST may be applied to comparing two samples of a continuous variable in order to test whether the samples are drawn from the same population. In this case the user has to create an indicator variable with values 0 for the first sample and 1 for the second one, and the combined sample is sorted according to the continuous variable. If the samples are from the same distribution, the sequence of the values of the indicator variable must be a random sequence of 0's and 1's after sorting. This was the application for which the classical Wald-Wolfowitz run test was originally planned. (See also COMPARE2?)

RUNTEST performs simultaneously several tests based mainly on runs of 0's and 1's in the given sequence.

Wald-Wolfowitz test:
This test is based simply on the number of runs. For n<=1000 the exact conditional distribution of the number of runs for given n0 and n1 (i.e. the observed numbers of 0's and 1's, respectively) is computed. For n>1000 the standard normal approximation is used. Low P values indicate excessive lengths of runs (a low number of runs), which is typical for samples drawn from different populations.

Geometric distribution test (by SM 2001):
The run length of 0's has the geometric distribution with parameter p and similarly the run length of 1's has the geometric distribution with parameter q=1-p. Thus p is estimated from the sequence (p=n1/n) and the observed run lengths are compared to the expected ones from the geometric distribution by the common Chi-square test.
For example, in the sequence

 0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1
 1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0

we have n=68, n1=33 and the number of 1-runs is r1=34. Please note that there are 18 "1-runs of length 0" between consecutive 0's. Since the estimated value of p is now n1/n=33/68, the expected frequencies r1*p^i*q, i=0,1,2,... listed with the observed ones are:

 length (i)      0       1     2 or more
 observed       18       4       12        X^2=4.38 P=0.036
 expected       17.5     8.49     8.01

The reason for the departure from the null hypothesis that the sequence is random is the fact that the second half of the sequence is almost identical with the first half. The Wald-Wolfowitz test does not detect this kind of systematic or almost systematic effect.

Geometric distribution test (continuation):
RUNTEST performs a similar test also for 0-runs. To make the test more powerful these tests are combined, but this cannot be done directly since the X^2 test statistics for 0-runs and 1-runs are not independent. They can be made independent for large samples by leaving out "the runs of length 0". Simulation experiments (made by SM) have shown that by fitting the run length distribution to truncated geometric distributions without the zero values, the X^2 statistic of 0-runs is Chi-square distributed with k0-q degrees of freedom, where k0 is the number of classes (for values 1,2,...,k0-1,>=k0). The corresponding X^2 statistic of 1-runs has the Chi-square distribution with k1-p degrees of freedom. The sum of these two statistics is Chi-square distributed with k0+k1-1 degrees of freedom under the null hypothesis.

The numbers of classes k0, k1 are determined by the condition that the expected frequency of the tail class is at least a value given by the specification MINF. Default is MINF=5.
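The worked example for 1-runs (zero-length runs included) can be retraced with a short Python sketch. This is only an illustration and not Survo's own code: the function names are hypothetical, the three classes 0, 1 and "2 or more" are fixed by hand instead of being derived from MINF, and a 1-run is taken to be the gap between two consecutive 0's, which is an assumption that happens to fit this sequence because it starts and ends with 0.

```python
from math import erfc, sqrt

# Example sequence from the text above.
SEQ = [int(t) for t in """
0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1
1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0
""".split()]

def one_run_lengths(seq):
    """Lengths of 1-runs between consecutive 0's, zero lengths included.

    Assumption: leading or trailing 1's (outside the first and last 0)
    are ignored; the example sequence has none.
    """
    zeros = [i for i, x in enumerate(seq) if x == 0]
    return [b - a - 1 for a, b in zip(zeros, zeros[1:])]

def geometric_fit(seq, tail=2):
    """Observed and expected class frequencies and the X^2 statistic.

    Classes are lengths 0, 1, ..., tail-1 and a tail class ">= tail".
    """
    n, n1 = len(seq), sum(seq)
    p = n1 / n            # estimated probability of 1
    q = 1 - p
    lengths = one_run_lengths(seq)
    r1 = len(lengths)     # number of 1-runs, zero-length runs included
    observed = [sum(1 for m in lengths if m == i) for i in range(tail)]
    observed.append(sum(1 for m in lengths if m >= tail))
    expected = [r1 * p**i * q for i in range(tail)]
    expected.append(r1 * p**tail)   # tail class: sum of r1*p^i*q for i >= tail
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, x2

observed, expected, x2 = geometric_fit(SEQ)
# Upper tail of the Chi-square distribution; this closed form holds for df=1 only
# (3 classes - 1 - 1 estimated parameter).
p_value = erfc(sqrt(x2 / 2))
print(observed, [round(e, 2) for e in expected], round(x2, 2), round(p_value, 3))
```

For the example sequence the sketch gives the observed frequencies 18, 4, 12 and an X^2 of about 4.38, in agreement with the table above.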
Chi-square test for triples (SM 2001):
This test studies the behaviour of three consecutive observations U1,U2,U3 by making a table of frequencies

                   U3
                 0     1       Example (below):
 U1 U2  **
  0  0         F000  F001      11   7    X2=8.37124 df=3 P=0.0389311
  0  1         F010  F011       4  12
  1  0         F100  F101       6   9
  1  1         F110  F111      12   5

Under the null hypothesis (U3 is independent of U1 and U2) the common Chi-square statistic has the Chi-square distribution with 3 degrees of freedom. Simulation experiments (made by SM) show that this is a valid test, although consecutive triples are partially dependent on each other.

For example, the sequence

 0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1
 1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0

giving triples 001,011,110,101,... yields the table shown above.

O'Brien-Dyck test: (Biometrics 41, 237-244, 1985)
In this test the variances of the run lengths of 0-runs (V0) and 1-runs (V1) are computed and pooled together in such a way that the test statistic will have a Chi-square distribution under the null hypothesis. Simulation experiments (by SM) have shown that this test is usually giving too small P values when the sequence is not random and thus it rejects the null hypothesis too easily. Therefore the P value is evaluated also by simulation.

Example:

DATA K:
 0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1
 1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0

RUNTEST K,K,CUR+1
Run tests for K in data K: N=68 N0=35 N1=33 p=0.485294
run0=17 run1=16
Wald-Wolfowitz test: P=0.359033
Geometric distribution test: X2=5.20952 df=1 P=0.0224635
   for 0-runs separately: X20=0.730043 df=1 P=0.392869
   for 1-runs separately: X21=4.38174 df=1 P=0.0363259
Chi_square test for triples: X2=8.37124 df=3 P=0.0389311
O'Brien-Dyck test: X2=17.7686 df=21.1611 P=0.67298
    1000000 0.68589000     0.68498026 lower limit (O'Brien-Dyck test)
       s.e. 0.00046416     0.68679974 upper limit (conf.level=0.95)

T = More information on statistical tests
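The Chi-square test for triples described above can be sketched in Python as follows. The function names are hypothetical and the code is only an illustration under stated assumptions, not Survo's implementation: overlapping triples are tabulated into the 4x2 table, the ordinary Chi-square statistic for independence is formed, and the P value comes from a closed-form expression that is valid for 3 degrees of freedom only.

```python
from math import erfc, exp, pi, sqrt

# Example sequence K from the text above.
SEQ = [int(t) for t in """
0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1
1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0
""".split()]

def triple_table(seq):
    """4x2 frequency table of overlapping triples (U1,U2,U3).

    Row index is 2*U1+U2 (00, 01, 10, 11), column index is U3.
    """
    table = [[0, 0] for _ in range(4)]
    for u1, u2, u3 in zip(seq, seq[1:], seq[2:]):
        table[2 * u1 + u2][u3] += 1
    return table

def chi_square_independence(table):
    """Ordinary Chi-square statistic for independence of rows and columns."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            e = row_tot[i] * col_tot[j] / total   # expected frequency
            x2 += (obs - e) ** 2 / e
    return x2

def chi2_upper_tail_df3(x):
    """Upper tail probability of Chi-square with exactly 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

table = triple_table(SEQ)
x2 = chi_square_independence(table)
p_value = chi2_upper_tail_df3(x2)
print(table, round(x2, 5), round(p_value, 4))
```

For the sequence K this gives the same 4x2 table as shown in the example and an X2 of about 8.371, in line with the RUNTEST output.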

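The exact conditional distribution of the number of runs used by the Wald-Wolfowitz test for n<=1000 can be sketched with the classical combinatorial formulas. The function names below are hypothetical, and it is an assumption (based on the remark that low P values indicate a low number of runs) that the reported P value is the lower tail probability P(R <= observed number of runs); Survo's actual computation may differ.

```python
from math import comb

# Example sequence K from the text above.
SEQ = [int(t) for t in """
0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1
1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0
""".split()]

def count_runs(seq):
    """Total number of runs: one more than the number of value changes."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))

def runs_lower_tail(n0, n1, r_obs):
    """P(R <= r_obs) for the number of runs R, given n0 zeros and n1 ones.

    Assumes n0 >= 1 and n1 >= 1. Uses exact integer arithmetic:
      R = 2k   : 2 * C(n0-1,k-1) * C(n1-1,k-1) arrangements
      R = 2k+1 : C(n0-1,k)*C(n1-1,k-1) + C(n0-1,k-1)*C(n1-1,k) arrangements
    out of C(n0+n1, n0) equally likely arrangements.
    """
    total = comb(n0 + n1, n0)
    ways = 0
    for r in range(2, r_obs + 1):
        k = r // 2
        if r % 2 == 0:
            ways += 2 * comb(n0 - 1, k - 1) * comb(n1 - 1, k - 1)
        else:
            ways += (comb(n0 - 1, k) * comb(n1 - 1, k - 1)
                     + comb(n0 - 1, k - 1) * comb(n1 - 1, k))
    return ways / total

n0, n1 = SEQ.count(0), SEQ.count(1)
runs = count_runs(SEQ)
p_value = runs_lower_tail(n0, n1, runs)
print(n0, n1, runs, p_value)
```

For the sequence K the sketch finds N0=35, N1=33 and 33 runs (run0+run1 in the RUNTEST output); the lower tail probability can then be compared with the P value reported for the Wald-Wolfowitz test.
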
More information on Survo from www.survo.fi

Copyright © Survo Systems 2001-2012.

webmaster'at'survo.fi
