Help System (web edition)

RUNTEST <data>,<variable>,L
tests whether a given sequence of 0's (zeros) and 1's (non-zeros)
is a random sequence, i.e. a sequence of Bernoulli trials where
the trials are independent and the probability of 1 is a constant p,
0<p<1, where p is unknown.
The sequence to be tested is saved as <variable> in <data>.

In addition to testing the randomness of a sequence, RUNTEST may be
used to compare two samples of a continuous variable, i.e. to test
whether the samples are drawn from the same population. In this
case the user creates an indicator variable with value 0 for the
first sample and 1 for the second one, and the combined sample is
sorted according to the continuous variable. If the samples are from
the same distribution, the values of the indicator variable form a
random sequence of 0's and 1's after sorting. This is the application
for which the classical Wald-Wolfowitz run test was originally
intended. (See also COMPARE2?)
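
The construction of the indicator sequence can be sketched in Python as
follows. This is only an illustration: the function name and the sample
values are made up, and ties between the two samples are assumed not to
occur because the variable is continuous.

```python
def indicator_sequence(sample_a, sample_b):
    """Pool two samples, sort by value, and return the 0/1 sample labels."""
    # Label the first sample 0 and the second sample 1.
    pooled = [(x, 0) for x in sample_a] + [(x, 1) for x in sample_b]
    # Sort the combined sample according to the continuous variable.
    pooled.sort(key=lambda pair: pair[0])
    return [label for _, label in pooled]

# Hypothetical data, not taken from the help text.
first = [1.2, 2.9, 3.1, 4.8]
second = [2.5, 5.0, 6.3, 7.7]
seq = indicator_sequence(first, second)
print(seq)  # [0, 1, 0, 0, 0, 1, 1, 1]
```

The resulting 0/1 sequence is what would be saved as <variable> and
given to RUNTEST.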

RUNTEST performs several tests simultaneously, based mainly on runs of
0's and 1's in the given sequence.

Wald-Wolfowitz test:
This test is simply based on the number of runs.
For n<=1000 the exact conditional distribution of runs for given
n0 and n1 (i.e. observed numbers of 0's and 1's, respectively) is
computed. For n>1000 the standard normal approximation is used.
Low P values indicate excessively long runs (i.e. a low number of runs),
which is typical for samples drawn from different populations.
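
A minimal Python sketch of the exact conditional distribution is given
below. It uses the standard combinatorial formulas for the number of
runs given n0 and n1. The function names are illustrative, and taking
the P value as the lower-tail probability P(R <= observed) is an
assumption based on the remark about low P values above, not a
description of the RUNTEST code. The sequence is the example sequence
K used later in this text, for which RUNTEST reports P=0.359033.

```python
from math import comb

def count_runs(seq):
    """Total number of runs = 1 + number of changes between neighbours."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def runs_pmf(n0, n1):
    """Exact distribution of the number of runs R given n0 zeros, n1 ones."""
    total = comb(n0 + n1, n0)
    pmf = {}
    for r in range(2, n0 + n1 + 1):
        k = r // 2
        if r % 2 == 0:   # k runs of each symbol, either symbol may start
            ways = 2 * comb(n0 - 1, k - 1) * comb(n1 - 1, k - 1)
        else:            # one symbol has k+1 runs, the other has k runs
            ways = (comb(n0 - 1, k) * comb(n1 - 1, k - 1)
                    + comb(n0 - 1, k - 1) * comb(n1 - 1, k))
        if ways:
            pmf[r] = ways / total
    return pmf

K = [int(c) for c in (
    "0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1 "
    "1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0"
).split()]

n1 = sum(K)
n0 = len(K) - n1
observed = count_runs(K)
pmf = runs_pmf(n0, n1)
# Assumed one-sided P value: probability of at most the observed number of runs.
p_lower = sum(prob for r, prob in pmf.items() if r <= observed)
print(n0, n1, observed, p_lower)
```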

Geometric distribution test (by SM 2001):
The run length of 0's has the geometric distribution with parameter p
and similarly the run length of 1's has the geometric distribution with
parameter q=1-p. Thus p is estimated from the sequence (p=n1/n) and
the observed lengths are compared to the expected ones from the
geometric distribution by the common Chi-square test.
For example, in the sequence
0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1
1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0
we have n=68 and n1=33, and the number of 1-runs is r1=34. This count
includes the 18 "1-runs of length 0" between consecutive 0's.
Since the estimated value of p is now n1/n=33/68, the expected
frequencies r1*p^i*q, i=0,1,2,..., listed with the observed ones, are:

length (i)   0      1      2 or more
observed    18      4     12
expected    17.5    8.49   8.01
X^2=4.38 P=0.036
The departure from the null hypothesis of randomness arises because
the second half of the sequence is almost identical to the first half.
The Wald-Wolfowitz test does not detect this kind of systematic or
almost systematic effect.
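
The figures of this example can be checked with a short Python sketch.
Two assumptions are made here: the 1-runs are counted as the gaps
between consecutive 0's (which matches r1=34 for this sequence, as it
starts and ends with 0), and the P value uses 1 degree of freedom, as
shown in the RUNTEST output for 1-runs in the example at the end of this
text. All names are illustrative.

```python
from math import erfc, sqrt

K = [int(c) for c in (
    "0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1 "
    "1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0"
).split()]

# Lengths of the 1-runs between consecutive 0's (0 when two 0's are adjacent).
zero_pos = [i for i, v in enumerate(K) if v == 0]
lengths = [b - a - 1 for a, b in zip(zero_pos, zero_pos[1:])]

r1 = len(lengths)
p = sum(K) / len(K)          # estimate of p: n1/n
q = 1 - p

# Classes: length 0, length 1, length 2 or more.
observed = [sum(1 for x in lengths if x == 0),
            sum(1 for x in lengths if x == 1),
            sum(1 for x in lengths if x >= 2)]
expected = [r1 * q, r1 * p * q, r1 * p * p]   # tail class: r1*p^2

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Chi-square upper tail probability for 1 degree of freedom (assumed df).
p_value = erfc(sqrt(x2 / 2))

print(observed)                          # [18, 4, 12]
print([round(e, 2) for e in expected])   # [17.5, 8.49, 8.01]
print(round(x2, 2), round(p_value, 3))   # 4.38 0.036
```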

Geometric distribution test (continuation):
RUNTEST performs a similar test also for 0-runs.
To make the test more powerful these tests are combined, but it
cannot be done directly since the X^2 test statistics for 0-runs and
1-runs are not independent. They can be made independent for large
samples by leaving out "the runs of length 0".
Simulation experiments (made by SM) have shown that when the run
length distribution is fitted to a truncated geometric distribution
without the zero values, the X^2 statistic of 0-runs is Chi-square
distributed with k0-q degrees of freedom, where k0 is the number
of classes (for values 1,2,...,k0-1,>=k0).
The corresponding X^2 statistic of 1-runs has the Chi-square
distribution with k1-p degrees of freedom.
The sum of these two statistics is Chi-square distributed with
k0+k1-1 degrees of freedom under the null hypothesis.
The numbers of classes k0, k1 are determined by the condition that
the expected frequency of the tail class is at least the value
given by the specification MINF. The default is MINF=5.

Chi-square test for triples (SM 2001):
This test studies the behaviour of three consecutive observations
U1,U2,U3 by making a table of frequencies
        U3  0     1           Example (below):
  U1 U2 **
   0  0     F000  F001        11    7    X2=8.37124 df=3 P=0.0389311
   0  1     F010  F011         4   12
   1  0     F100  F101         6    9
   1  1     F110  F111        12    5

Under the null hypothesis (U3 is independent of U1 and U2) the common
Chi-square statistic has the Chi-square distribution with 3 degrees
of freedom. Simulation experiments (made by SM) show that this is a
valid test, although consecutive triples are partially dependent on
each other.
For example, the sequence
0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1
1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0
giving triples 001,011,110,101,... yields the table shown above.
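
The table and the test statistic for this sequence can be reproduced
with the following Python sketch. It counts overlapping triples and
applies the ordinary Chi-square statistic of a 4x2 contingency table.
The variable names are illustrative, and the closed-form tail
probability used here is specific to 3 degrees of freedom.

```python
from collections import Counter
from math import erfc, exp, pi, sqrt

K = [int(c) for c in (
    "0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1 "
    "1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0"
).split()]

# Frequencies of overlapping triples (U1,U2,U3).
triples = Counter(zip(K, K[1:], K[2:]))

# Rows: (U1,U2) = 00, 01, 10, 11; columns: U3 = 0, 1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
table = [[triples[(u1, u2, 0)], triples[(u1, u2, 1)]] for u1, u2 in rows]

total = sum(sum(row) for row in table)
col_sums = [sum(row[j] for row in table) for j in (0, 1)]

# Ordinary Chi-square statistic for independence of rows and columns.
x2 = 0.0
for row in table:
    row_sum = sum(row)
    for j in (0, 1):
        e = row_sum * col_sums[j] / total
        x2 += (row[j] - e) ** 2 / e

# Upper tail probability of the Chi-square distribution with df=3.
p_value = erfc(sqrt(x2 / 2)) + sqrt(2 * x2 / pi) * exp(-x2 / 2)

print(table)          # [[11, 7], [4, 12], [6, 9], [12, 5]]
print(round(x2, 5))   # 8.37124
print(round(p_value, 4))
```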

O'Brien-Dyck test: (Biometrics 41, 237-244, 1985)
In this test the variances of the run lengths of 0-runs (V0) and
1-runs (V1) are computed and pooled in such a way that the test
statistic has a Chi-square distribution under the null hypothesis.
Simulation experiments (by SM) have shown that this test usually
gives P values that are too small when the sequence is not random,
and thus it rejects the null hypothesis too easily.
Therefore the P value is also evaluated by simulation.

Example:
DATA K:
0 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1
1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0

RUNTEST K,K,CUR+1
Run tests for K in data K:
N=68 N0=35 N1=33 p=0.485294 run0=17 run1=16
Wald-Wolfowitz test: P=0.359033
Geometric distribution test: X2=5.20952 df=1 P=0.0224635
     for 0-runs separately: X20=0.730043 df=1 P=0.392869
     for 1-runs separately: X21=4.38174 df=1 P=0.0363259
Chi_square test for triples: X2=8.37124 df=3 P=0.0389311
O'Brien-Dyck test: X2=17.7686 df=21.1611 P=0.67298
   1000000 0.68589000 0.68498026 lower limit (O'Brien-Dyck test)
      s.e. 0.00046416 0.68679974 upper limit (conf.level=0.95)
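
The last two lines of the output appear to be consistent with a binomial
standard error and a normal-approximation confidence interval for the
simulated P value. The Python sketch below reproduces the printed
figures under that assumption. The reading of 1000000 as the number of
replicates and of 0.68589 as the simulated P value is likewise an
interpretation of the listing, not a description of the RUNTEST code.

```python
from math import sqrt

n_sim = 1_000_000     # number of simulated sequences (first figure in the listing)
p_hat = 0.68589       # simulated P value (second figure in the listing)

# Assumed binomial standard error of a simulated proportion.
se = sqrt(p_hat * (1 - p_hat) / n_sim)

# Assumed two-sided 95 % limits from the normal approximation.
z = 1.959964
lower = p_hat - z * se
upper = p_hat + z * se

print(f"{se:.8f}")     # 0.00046416
print(f"{lower:.8f}")  # 0.68498026
print(f"{upper:.8f}")  # 0.68679974
```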

T = More information on statistical tests