The idea and practice of making SURVO 84C modules are first illustrated by an example. To save space and to highlight the main principles, we shall describe the coding of a simple module for calculating weighted means from statistical data.
Usually it is good to start by making a synopsis from the user's point of view and imagining how things should look if we already had the new operation. In this case we could type the following text in the edit field:
13 1 SURVO 84C EDITOR Wed Feb 15 11:46:19 1989 D:\C\PROG\ 100 100 0
1 *SAVE TEST1
2 *
3 *Here is our data set:
4 *DATA TEST
5 *Name Sex Test1 Test2 Test3
6 *Karen F 1.45 3.46 5
7 *Charles M 3.22 2.43 3
8 *Anthony M 5.00 3.27 2
9 *Lisa F -0.76 4.03 3
10 *Mike M 1.37 1.88 3
11 *William M 4.65 - 2
12 *Ann F 2.16 4.98 2
13 *
14 *MASK=--AAW / to indicate selection of variables (columns)
15 *CASES=Sex:M / to indicate selection of observations (lines)
16 *
17 *MEAN TEST,19_
18 *
19 * Means of variables in TEST N=4 Weight=Test3
20 * Variable Mean N(missing)
21 * Test1 3.307000 0
22 * Test2 2.433750 1
23 *
Here we have a small application where the data set is on edit lines 4-12, the MEAN operation on line 17 and results (which we hope to receive after activation of the MEAN line) on lines 19-22.
We assume that the MEAN operation has the following syntax:
MEAN <SURVO_84C_data>,<first_line_for_the_results>
To select variables and observations, we have used two extra specifications (on lines 14-15). There MASK=--AAW (one mask character per column) selects only columns #3 and #4 (Test1, Test2) for the analysis, and column #5 (Test3) is used as a weight variable. CASES=Sex:M indicates that only observations with Sex=M are selected.
We shall see that even more options become available if the MEAN module is written according to the standards of SURVO 84C, and all this is achieved with minimal effort by using the ready-made tools of the SURVO 84C libraries.
It should also be noted that the structure of more complicated modules does not differ from that of this example.
The !MEAN module has only one compiland, which is listed below in several parts, beginning with the main function. The line numbers have been added for easier reference.
1 /* !mean.c 21.2.1986/SM (19.3.1989)
2 */
3
4 #include <stdio.h>
5 #include <stdlib.h>
6 #include <conio.h>
7 #include <malloc.h>
8 #include "survo.h"
9 #include "survoext.h"
10 #include "survodat.h"
11
12 SURVO_DATA d;
13 double *sum; /* sums of active variables */
14 long *f; /* frequencies */
15 double *w; /* sums of weights */
16
17 long n;
18 int weight_variable;
19 int results_line;
20
21 main(argc,argv)
22 int argc; char *argv[];
23 {
24 int i;
25
26 if (argc==1)
27 {
28 printf("This program can be used as a SURVO 84C module only.");
29 return;
30 }
31 s_init(argv[1]);
32 if (g<2)
33 {
34 init_remarks();
35 rem_pr("MEAN <data>,<output_line> / S.Mustonen 4.3.1989");
36 rem_pr("computes means of active variables. Cases can be limited");
37 rem_pr("by IND and CASES specifications. The observations can be");
38 rem_pr("weighted by a variable activated by 'W'.");
39 wait_remarks(2);
40 return;
41 }
42 results_line=0;
43 if (g>2)
44 {
45 results_line=edline2(word[2],1,1);
46 if (results_line==0) return;
47 }
48 i=data_open(word[1],&d); if (i<0) return;
49 i=sp_init(r1+r-1); if (i<0) return;
50 i=mask(&d); if (i<0) return;
51 weight_variable=activated(&d,'W');
52 i=test_scaletypes(); if (i<0) return;
53 i=conditions(&d); if (i<0) return; /* permitted only once */
54 i=space_allocation(); if (i<0) return;
55 compute_sums();
56 printout();
57 free(sum); free(f); free(w);
58 data_close(&d);
59 }
Among the include lines, lines 8-10 refer to special SURVO 84C include files. Lines 8-9 should always be present in modules. Line 10 (survodat.h) is needed especially in those modules where SURVO 84C data sets and data files are employed.
Line 12 declares the SURVO_DATA structure d, which may represent either a data set in the edit field (as DATA TEST in our example), a SURVO 84C data file or part of it, or even a matrix file. The writer of the module has no need to know the actual form of the data set. By using the tools provided by the SURVO 84C library (like data_open on line 48), all these alternatives can be handled similarly. In rare cases where a distinction has to be made, the d.type member of the SURVO_DATA structure d gives the type of the data set at hand.
On lines 13-15, pointers to various arrays used in MEAN are declared. In order to make the modules general and flexible, we avoid fixed limits in arrays. Therefore all arrays whose sizes depend on the application (like the number of variables in the analysis) should be allocated dynamically. This is done by using the standard space allocation function malloc. Here it is employed for all space reservations through the space_allocation call on line 54.
Finally, before the main function starts, certain global variables are declared on lines 17-19. To keep function calls short (fewer arguments), we usually prefer using such static (global) variables.
When calling the !MEAN module as a child process, the main program of SURVO 84C passes only one parameter (the address of the pointer to the array of system pointers, as a string). In the main function of !MEAN this parameter (argv[1]) is needed in the s_init call (line 31), which sets up all important SURVO 84C system parameters and variables for !MEAN. Thereafter, writing code in !MEAN is like writing further functions for the main program.
However, before the s_init call, lines 26-30 are given in order to prevent misuse of !MEAN (a direct call of !MEAN from the MS-DOS level).
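After the s_init call we have, for example, r = current line on the screen and r1 = first visible edit line on the screen. Hence r1+r-1 is the current (activated) edit line; for instance, if the first visible edit line is r1=1 and the cursor is on screen line r=17, the activated edit line is 1+17-1=17, the MEAN line of our example. See the library reference of s_init for the complete list of system variables which are initialized by s_init.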
The s_init function also analyzes the edit line (MEAN TEST,19) which was activated by the user and splits it into parts word[0]="MEAN", word[1]="TEST" and word[2]="19", giving the total number of `words' found as g. (In this case g=3.)
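The splitting step can be pictured with a small stand-alone sketch. It is not the s_init code: the function split_words and the constants MAX_WORDS and MAX_WORD_LEN are hypothetical, and the sketch assumes that blanks and commas are the only word separators.

```c
#include <string.h>

#define MAX_WORDS    10   /* hypothetical limit on words per line */
#define MAX_WORD_LEN 64   /* hypothetical limit on word length */

/* Hypothetical sketch of splitting an activated line into words,
   in the spirit of what s_init provides as word[] and g.
   Assumes blanks and commas are the only separators; longer words
   are truncated. Returns the number of words g. */
int split_words(const char *line, char word[][MAX_WORD_LEN], int max_words)
{
    int g = 0;
    const char *p = line;

    while (g < max_words)
    {
        size_t len;

        p += strspn(p, " ,");            /* skip separators */
        if (*p == '\0') break;           /* end of line reached */
        len = strcspn(p, " ,");          /* length of this word */
        if (len >= MAX_WORD_LEN) len = MAX_WORD_LEN - 1;
        memcpy(word[g], p, len);
        word[g][len] = '\0';
        ++g;
        p += strcspn(p, " ,");           /* advance past the whole word */
    }
    return g;
}
```

For the line MEAN TEST,19 the sketch gives g=3 with word[0]="MEAN", word[1]="TEST" and word[2]="19"; for a mere MEAN it gives g=1, the case caught by the test on line 32.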
Lines 32-41 are for testing the completeness of the user's call. Observe that MEAN TEST without an edit line for the results is allowed, and thus only the case g<2 (mere MEAN activated) leads to an error message.
In such a case, the standard modules typically give a short notice of
their usage like
"Usage: MEAN <data>, L"
and the user can get more information by consulting
the inquiry system of SURVO 84C.
For a new module written by the user, the inquiry system cannot provide any information. Therefore it is important to give longer explanations covering all essential features. This should be done with the functions init_remarks, rem_pr and wait_remarks as shown on lines 32-41. These functions emulate the behaviour of the inquiry system; for example, the user can load the explanations appearing on the screen into the edit field.
The next section in the main function (lines 42-47) deals with output in the edit field. As pointed out earlier, the line label (or number) for the results in the edit field may be omitted (case results_line=0). If the line for the results is given (i.e. g>2), it is found by the SURVO 84C library function edline2 (line 45). If no edit line corresponding to the user's command is found, edline2 gives an error message and returns 0 instead of the line number.
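The "return 0 on failure" convention can be illustrated with a simplified stand-in. The function resolve_results_line and its parameter edit_lines (the assumed number of lines in the edit field) are hypothetical; the sketch accepts only decimal line numbers, whereas the real edline2 also accepts line labels and takes further arguments not described here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for edline2: accepts only a decimal line
   number and checks it against edit_lines. Like edline2, it prints
   an error message and returns 0 when no such line exists. */
int resolve_results_line(const char *s, int edit_lines)
{
    char *end;
    long line = strtol(s, &end, 10);

    /* reject empty input, trailing garbage and out-of-range numbers */
    if (end == s || *end != '\0' || line < 1 || line > edit_lines)
    {
        fprintf(stderr, "Line %s not found!\n", s);
        return 0;
    }
    return (int)line;
}
```

Because 0 is never a valid edit line, the caller can use a single test such as the one on line 46 to detect the error.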
Line 48
i=data_open(word[1],&d); if (i<0) return;
opens the data set and initializes several variables (members of the structure SURVO_DATA d) describing the size and the structure of the data set. For example, we have the following information readily available for the subsequent processing:
| Member | Meaning (type) |
| d.m | number of variables in data (int) |
| d.m_act | number of active variables (int) |
| d.n | number of observations in data (long) |
| d.l1 | first active observation (long) |
| d.l2 | last active observation (long) |
| d.varname[0], ..., d.varname[d.m-1] | names of variables (char **) |
| d.vartype[0], ..., d.vartype[d.m-1] | types of variables (char **) |
| d.v[0], ..., d.v[d.m_act-1] | indices of the active variables (int *) |
If the data set is not available, data_open displays an error message and returns -1. In that case there is an immediate return to the main program of SURVO 84C.
In SURVO 84C, the operations are controlled not only by parameters written on the activated line (like TEST and 19 in our example); the modules can also be guided by various specifications written around the activated line anywhere in the edit field. In our example, such specifications are MASK=--AAW and CASES=Sex:M.
To take their effects into consideration, we must first read all the specifications written in the current edit field. This happens by calling the sp_init function once (line 49: sp_init(r1+r-1);), where the argument refers to the line currently activated. This makes sp_init look for specifications primarily around that line. Later, the spfind function is called repeatedly to find specifications from the list generated by sp_init.
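The idea of looking up a named specification can be sketched as follows. This is not how sp_init and spfind are implemented: the function find_spec is hypothetical, it scans a plain array of lines, and it assumes that a specification is any blank-delimited token of the form NAME=value (the search order around the activated line is not modelled).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical specification lookup: finds the first token NAME=value
   on the given lines and copies value into value[size].
   Returns 1 if found, 0 otherwise. */
int find_spec(const char *lines[], int nlines, const char *name,
              char *value, size_t size)
{
    size_t nlen = strlen(name);
    int i;

    for (i = 0; i < nlines; ++i)
    {
        const char *p = lines[i];

        while (*p != '\0')
        {
            size_t len;

            p += strspn(p, " ");              /* skip blanks */
            len = strcspn(p, " ");            /* length of this token */
            if (len > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=')
            {
                /* copy the part after '=' */
                snprintf(value, size, "%.*s", (int)(len - nlen - 1), p + nlen + 1);
                return 1;
            }
            p += len;
        }
    }
    return 0;
}
```

Applied to lines 14-15 of the example edit field, the sketch returns "--AAW" for MASK and "Sex:M" for CASES, and reports that VARS is not given.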
The mask function (on line 50) has the task of analysing the VARS specification (or, if it does not appear, the MASK specification) through the spfind function. If VARS or MASK exists, mask corrects the activation status of each variable accordingly. If VARS (MASK) is not given, the status of the data set itself determines which variables are active.
Line 51 checks (using the activated function) whether any of the variables in the data set have been activated by `W'. If such a variable is found (as Test3 in our example), the index of that variable is returned, and it serves as a weight variable in the computations. Otherwise activated returns -1.
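The combined effect of mask and activated can be sketched on plain arrays. The functions apply_mask and find_activated are hypothetical, not the library code. The sketch assumes that status[] holds the data set's own activation character for each of the m variables, that character k of an optional MASK string overrides it, and that '-' means "not active".

```c
#include <string.h>

/* Hypothetical sketch of mask: writes the effective activation
   character of each variable into act[] and the indices of the
   active variables into v[]. Returns the number of active
   variables (m_act). mask may be NULL (no VARS/MASK given). */
int apply_mask(const char *status, const char *mask, int m,
               char act[], int v[])
{
    int k, m_act = 0;
    size_t mlen = (mask != NULL) ? strlen(mask) : 0;

    for (k = 0; k < m; ++k)
    {
        /* the mask overrides the data set's own status where given */
        act[k] = ((size_t)k < mlen) ? mask[k] : status[k];
        if (act[k] != '-')
            v[m_act++] = k;
    }
    return m_act;
}

/* Hypothetical counterpart of activated(&d,'W'): index of the first
   variable activated by ch, or -1 if there is none. */
int find_activated(const char act[], int m, char ch)
{
    int k;

    for (k = 0; k < m; ++k)
        if (act[k] == ch) return k;
    return -1;
}
```

With MASK=--AAW and five variables, the sketch gives m_act=3 with v = {2, 3, 4} and weight variable 4 (Test3); note that the weight variable is itself one of the active variables, which is why compute_sums later skips it explicitly.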
One of the unique features of SURVO 84C is the possibility to assess the validity of various statistical methods by checking the scale types of variables. Scale types can be declared for variables in data files only. The user has the freedom to use or not to use this facility. When it is used, the test_scaletypes call on line 52 performs the checks.
The observations may be restricted by the CASES and IND specifications. The conditions function (called on line 53) tests that those specifications, if used at all, are written correctly and initializes system variables which are used for scanning data during the computation (through a function called unsuitable).
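A simple condition such as CASES=Sex:M can be pictured with the following sketch. The structure case_condition and the functions parse_cases and case_unsuitable are hypothetical; only the form Var:value with a single accepted value is handled, while the real CASES and IND specifications allow more.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of a CASES=Var:value condition. */
struct case_condition
{
    char var[32];     /* name of the condition variable, e.g. "Sex" */
    char value[32];   /* accepted value, e.g. "M" */
};

/* Parses a CASES value such as "Sex:M"; returns 1 on success and
   -1 if the variable name or the ':' is missing. */
int parse_cases(const char *spec, struct case_condition *c)
{
    const char *colon = strchr(spec, ':');

    if (colon == NULL || colon == spec) return -1;
    snprintf(c->var, sizeof c->var, "%.*s", (int)(colon - spec), spec);
    snprintf(c->value, sizeof c->value, "%s", colon + 1);
    return 1;
}

/* In the spirit of unsuitable(): 1 if the observation must be
   skipped, 0 if its value of the condition variable is accepted. */
int case_unsuitable(const struct case_condition *c, const char *obs_value)
{
    return strcmp(obs_value, c->value) != 0;
}
```

Applied to the Sex column of DATA TEST, the sketch accepts exactly four observations, which matches N=4 in the expected results.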
After these preliminary checks, we are ready to allocate space for frequencies, sums of weights and weighted sums of observations. The dimension of these arrays must be d.m_act. This happens by calling space_allocation (line 54).
If the space is successfully allocated (there is no negative response), the actual computations can start (compute_sums) and the results are printed (printout).
Finally (on lines 57-58), the allocated space is freed and the data
set closed before returning to the main program of SURVO 84C and to the
normal editing mode.
Most of the functions called by the main function of !MEAN are either
in the Microsoft C run-time library or in the SURVO 84C libraries. The
descriptions of the SURVO 84C library functions will be given later in
this paper.
There are only 4 functions called in the main function that are specific to the !MEAN module, namely test_scaletypes, space_allocation, compute_sums and printout. Since !MEAN is a very small module, all of them are in the same compiland together with the main function.
The test_scaletypes function has the following form:
61 test_scaletypes()
62 {
63 int i,scale_error;
64
65 scales(&d);
66 if (weight_variable>=0)
67 {
68 if (!scale_ok(&d,weight_variable,RATIO_SCALE))
69 {
70 sprintf(sbuf,"\nWeight variable %.8s must have ratio scale!",
71 d.varname[weight_variable]); sur_print(sbuf);
72 WAIT; if (scale_check==SCALE_INTERRUPT) return(-1);
73 }
74 }
75 scale_error=0;
76 for (i=0; i<d.m_act; ++i)
77 {
78 if (!scale_ok(&d,d.v[i],SCORE_SCALE))
79 {
80 if (!scale_error)
81 sur_print("\nInvalid scale in variables: ");
82 scale_error=1;
83 sprintf(sbuf,"%.8s ",d.varname[d.v[i]]); sur_print(sbuf);
84 }
85 }
86 if (scale_error)
87 {
88 sur_print("\nIn MEAN score scale at least is expected!");
89 WAIT; if (scale_check==SCALE_INTERRUPT) return(-1);
90 }
91 return(1);
92 }
The task of this function is to check the scale types of the variables selected for the analysis. In small data sets written in the edit field, the scale types of the variables (columns) cannot be given, and then no checks are performed; test_scaletypes will simply return 1, which means that everything is OK. However, in data sets saved in SURVO 84C data files, each variable can be labelled with a one-character label (mask column #3) which tells the scale type. For example, variables with a ratio scale are labelled with `R' (discrete), `r' (continuous) or `F' (the variable is a frequency). If the user omits these labels (each scale label is then ` '), SURVO 84C will skip all scale checks.
In any case, at first the scales function is called to remove variables which have the scale type label `-', meaning that the variable in question has no scale at all. For example, `names' and `addresses' are typically variables (fields) without a scale. Of course, a careful user does not select such variables for computations, but it is safer to have an extra check by the scales function in order to avoid harmful consequences.
On lines 66-74 the program tests the scale of the weight variable (if it is used). It is done by using the scale_ok function, which is set to require RATIO_SCALE for the weight variable. RATIO_SCALE is a predefined (in survodat.h) string constant "RrF" telling the permitted scale type alternatives.
If the scale is not OK, an error message is displayed (on lines 70-71). The continuation depends on the value of the SURVO 84C system parameter scale_check. This parameter can be set to 0, 1 or 2 by the user, where 0 means that scale_ok always returns 1 and no warnings or error messages are given, i.e. everything is accepted. The value scale_check=1 implies that messages are given as warnings, but the analysis can be continued. At the strictest level (value SCALE_INTERRUPT=2) the process is actually interrupted, as we can see on line 72.
The remaining lines of test_scaletypes are devoted to corresponding checks for active variables, which now should have at least a SCORE_SCALE. See how the d.v[] array selects the d.m_act variables from all d.m variables. (In our example d.m=5, d.m_act=3 and d.v[0]=2, d.v[1]=3, d.v[2]=4.)
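The decision logic described above can be summarized in a small sketch. The functions scale_label_ok and scale_error_action are hypothetical and operate on a single label character; RATIO_SCALE and SCALE_INTERRUPT are given the values stated in the text, but the real scale_ok takes the data set and a variable index instead.

```c
#include <string.h>

#define RATIO_SCALE     "RrF"  /* value stated in the text (survodat.h) */
#define SCALE_INTERRUPT 2      /* strictest value of scale_check */

/* Hypothetical sketch of the test made by scale_ok for one variable.
   Following the text: scale_check==0 accepts everything, and a blank
   label (no scale declared) skips the check. */
int scale_label_ok(char label, const char *permitted, int scale_check)
{
    if (scale_check == 0) return 1;
    if (label == ' ') return 1;
    return label != '\0' && strchr(permitted, label) != NULL;
}

/* Reaction to a failed check, as on lines 72 and 89: interrupt (-1)
   only at the strictest level, otherwise continue after a warning. */
int scale_error_action(int scale_check)
{
    return (scale_check == SCALE_INTERRUPT) ? -1 : 1;
}
```

A variable labelled `-' (no scale) fails the ratio-scale test unless scale_check is 0, and only scale_check=2 stops the run.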
The error messages and warnings are given by producing an output string with the standard sprintf function (usually into a global buffer sbuf of max. 256 characters) and then displaying it with sur_print(sbuf).
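Because sbuf has a fixed size, long messages should not be allowed to overflow it. The following sketch builds the "Invalid scale in variables:" message of lines 80-83 with bounded snprintf calls; the function build_scale_message and the constant SBUF_SIZE are hypothetical, and the listing itself prints each name separately instead.

```c
#include <stdio.h>

#define SBUF_SIZE 256   /* buffer size mentioned in the text */

/* Hypothetical sketch: writes the message into buf[size] without
   overflowing it. %.8s cuts each name to 8 characters, as in the
   listing. size must be at least 1. Returns the stored length. */
size_t build_scale_message(char *buf, size_t size, const char *names[], int count)
{
    size_t used;
    int i, k;

    k = snprintf(buf, size, "Invalid scale in variables: ");
    if (k < 0) { buf[0] = '\0'; return 0; }
    used = ((size_t)k < size) ? (size_t)k : size - 1;   /* text actually stored */
    for (i = 0; i < count && used < size - 1; ++i)
    {
        k = snprintf(buf + used, size - used, "%.8s ", names[i]);
        if (k < 0) break;
        /* snprintf truncates; count only what really fits */
        used += ((size_t)k < size - used) ? (size_t)k : size - used - 1;
    }
    return used;
}
```

If the buffer becomes full, the message is simply cut off instead of overwriting adjacent memory.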
The next function to be introduced is space_allocation:
94 space_allocation()
95 {
96 sum=(double *)malloc(d.m_act*sizeof(double));
97 if (sum==NULL) { not_enough_memory(); return(-1); }
98 f=(long *)malloc(d.m_act*sizeof(long));
99 if (f==NULL) { not_enough_memory(); return(-1); }
100 w=(double *)malloc(d.m_act*sizeof(double));
101 if (w==NULL) { not_enough_memory(); return(-1); }
102 return(1);
103 }
104
105 not_enough_memory()
106 {
107 sur_print("\nNot enough memory! (MEAN)");
108 WAIT;
109 }
This function allocates memory for the arrays sum, f and w, which all should have d.m_act elements.
It is strongly recommended to use dynamic memory allocation for all working space which depends on the size of the data set. Then no theoretical limits appear for the number of variables, etc. In practice there are always some limits. On 16-bit micros we typically still have the 64KB limit for a single array unless the huge memory model is used.
Since errors in memory allocation may have very surprising consequences, it is, of course, possible to start with fixed dimensions and establish dynamic arrays later, when all the space requirements are clear.
For example, the lines 13-16 in the main function could read:
13 #define MAX 100
14 double sum[MAX]; /* sums of active variables */
15 long f[MAX]; /* frequencies */
16 double w[MAX]; /* sums of weights */
and space_allocation is not needed at all, but this should be a temporary arrangement only.
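In the listing, a failure on line 98 or 100 returns without freeing the arrays already obtained; this is presumably harmless there, since main returns at once and the child process ends. In larger modules a pattern that releases partial allocations is safer. The following sketch shows one such variant; the structure work_space and the functions alloc_work_space and free_work_space are hypothetical, and the arrays are passed explicitly instead of being globals.

```c
#include <stdlib.h>

/* Hypothetical container for the three work arrays of MEAN. */
struct work_space
{
    double *sum;   /* weighted sums of active variables */
    long   *f;     /* frequencies */
    double *w;     /* sums of weights */
};

void free_work_space(struct work_space *ws)
{
    free(ws->sum); free(ws->f); free(ws->w);   /* free(NULL) is harmless */
    ws->sum = NULL; ws->f = NULL; ws->w = NULL;
}

/* Allocates all three arrays of m_act elements; on any failure,
   releases whatever was obtained and returns -1. */
int alloc_work_space(struct work_space *ws, int m_act)
{
    ws->sum = NULL; ws->f = NULL; ws->w = NULL;
    if (m_act <= 0) return -1;                  /* nothing to analyse */
    ws->sum = malloc((size_t)m_act * sizeof *ws->sum);
    ws->f   = malloc((size_t)m_act * sizeof *ws->f);
    ws->w   = malloc((size_t)m_act * sizeof *ws->w);
    if (ws->sum == NULL || ws->f == NULL || ws->w == NULL)
    {
        free_work_space(ws);
        return -1;
    }
    return 1;
}
```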
The data set will be scanned by the compute_sums function:
111 compute_sums()
112 {
113 int i;
114 long l;
115
116 n=0L;
117 for (i=0; i<d.m_act; ++i)
118 { f[i]=0L; w[i]=0.0; sum[i]=0.0; }
119
120 sur_print("\n");
121 for (l=d.l1; l<=d.l2; ++l)
122 {
123 double weight;
124
125 if (unsuitable(&d,l)) continue;
126 if (weight_variable==-1) weight=1.0;
127 else
128 {
129 data_load(&d,l,weight_variable,&weight);
130 if (weight==MISSING8) continue;
131 }
132 ++n;
133 sprintf(sbuf,"%ld ",l); sur_print(sbuf);
134 for (i=0; i<d.m_act; ++i)
135 {
136 double x;
137
138 if (d.v[i]==weight_variable) continue;
139 data_load(&d,l,d.v[i],&x);
140 if (x==MISSING8) continue;
141 ++f[i]; w[i]+=weight; sum[i]+=weight*x;
142 }
143 }
144 }
At first, the work space is cleared (lines 116-118), and the rest of the function consists of a loop over the active observations (from d.l1 to d.l2). In this loop the function unsuitable checks (line 125) whether the conditions (set by conditions in the main function) are met in the current observation l. If not, the rest of the loop is skipped.
If the observation is accepted, first the value of the possible weight variable is read by the data_load function (line 129). If the weight is missing (line 130), the rest of the loop is skipped. If there is no weight variable, weight=1.0 is selected (line 126).
Thereafter the number of cases n is increased by one, and the number of the current observation is displayed on the screen to indicate that the run is going on (lines 132-133).
In the inner loop (lines 134-142) all the active variables are scanned and the cumulative sums updated. However, the weight variable is skipped (on line 138). Similarly, possible missing values of active variables are omitted. By comparing n to f[i] we can see the number of missing observations in each variable separately; the difference n-f[i] is printed as N(missing) in the results.
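The accumulation for a single variable can be checked against the expected results of the synopsis with a stand-alone sketch. The function weighted_mean is hypothetical, and MISSING merely stands in for MISSING8, whose actual value is not given in the text. Observations with a missing value are skipped, and *f counts the values actually used, as f[i] does in compute_sums.

```c
/* Hypothetical missing-value code standing in for MISSING8. */
#define MISSING (-1.0e300)

/* Hypothetical sketch of the accumulation in compute_sums for one
   active variable: x[] holds its values and wt[] the weights of the
   nobs accepted observations. Returns the weighted mean, or MISSING
   if the sum of weights is zero (cf. the test on line 166). */
double weighted_mean(const double x[], const double wt[], int nobs, long *f)
{
    double sum = 0.0, w = 0.0;
    int l;

    *f = 0L;
    for (l = 0; l < nobs; ++l)
    {
        if (wt[l] == MISSING || x[l] == MISSING) continue;   /* skip missing */
        ++*f;
        w += wt[l];
        sum += wt[l] * x[l];
    }
    return (w == 0.0) ? MISSING : sum / w;
}
```

For the four male cases (Charles, Anthony, Mike, William) with Test3 as weight, Test1 gives 33.07/10 = 3.307 with f=4, and Test2 gives 19.47/8 = 2.43375 with f=3, so N(missing) is 4-4=0 and 4-3=1, as in the expected output on edit lines 19-22.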
The final task of the !MEAN module is to give the results by calling the printout function:
146 printout()
147 {
148 int i;
149 char line[LLENGTH];
150 char mean[32];
151
152 output_open(eout);
153 sprintf(line," Means of variables in %s N=%ld%c",
154 word[1],n,EOS);
155 if (weight_variable>=0)
156 {
157 strcat(line," Weight=");
158 strncat(line,d.varname[weight_variable],8);
159 }
160 print_line(line);
161 strcpy(line," Variable Mean N(missing)");
162 print_line(line);
163 for (i=0; i<d.m_act; ++i)
164 {
165 if (d.v[i]==weight_variable) continue;
166 if (w[i]==0.0)
167 sprintf(line," %-8.8s - %6ld",d.varname[d.v[i]],
168 n-f[i]);
169 else
170 {
171 fnconv(sum[i]/w[i],accuracy+2,mean);
172 sprintf(line," %-8.8s %s %6ld",d.varname[d.v[i]],
173 mean,n-f[i]);
174 }
175 print_line(line);
176 }
177 output_close(eout);
178 }
179
180 print_line(line)
181 char *line;
182 {
183 output_line(line,eout,results_line);
184 if (results_line) ++results_line;
185 }
At first the output file/device eout is opened by the output_open function. Thereafter lines can be written to eout by the output_line function (called in the function print_line on line 183). The lines are appended to the file, so no previous results are overwritten.
The SURVO 84C library function output_line also writes lines in the current edit field, provided that the third argument (here results_line) gives a valid line number. Remember that the first line for the results was optional in the MEAN operation, and we set results_line=0 (on line 42) if that line label was missing.
print_line (lines 180-185) is only an auxiliary function to keep track of the current output line in the edit field.
It is a practice in SURVO 84C that the numerical accuracy of the printed numbers can be controlled by the user. This happens through the system parameter accuracy (typically set to 7 in SURVO.APU), which gives the desired number of significant digits in printed results. The writers of modules must take the current value of accuracy into account when selecting the printout parameters. The library function fnconv is often useful in this task. Here (on line 171) it formats the means: accuracy+2 gives the total length of the resulting string mean, since we must have one extra place for the sign and one for the decimal point.
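The field-width reasoning can be tried out with a rough approximation built on snprintf. The function format_mean is hypothetical and does not reproduce the actual rules of fnconv; it prints x in a field of accuracy+2 characters with about accuracy significant digits, and numbers with more integer digits than accuracy simply get no decimals.

```c
#include <stdio.h>

/* Hypothetical approximation of fnconv(x, accuracy+2, buf). */
void format_mean(double x, int accuracy, char *buf, size_t size)
{
    double a = (x < 0.0) ? -x : x;
    int int_digits = 1, decimals;

    /* count digits before the decimal point (bounded so that an
       infinite x cannot loop forever) */
    while (a >= 10.0 && int_digits < 400) { a /= 10.0; ++int_digits; }
    decimals = accuracy - int_digits;
    if (decimals < 0) decimals = 0;
    /* width accuracy+2: room for the sign and the decimal point */
    snprintf(buf, size, "%*.*f", accuracy + 2, decimals, x);
}
```

With accuracy 7 the sketch turns the two means into " 3.307000" and " 2.433750", the digits shown in the expected output, and a negative value such as -0.76 fills the nine places exactly as "-0.760000".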
These 185 lines constitute the whole !MEAN module in its source form. Since several library functions were employed and many `hidden' or optional properties are included, the total amount of code after compiling and linking is about 60KB. However, as a module grows, its code size does not grow proportionally. For example, !MEAN can be considered a tiny special case of the !CORR module, which computes standard deviations and correlations in addition to means, but the size of !CORR is only 6KB more than the size of !MEAN. Thus it is profitable to create modules with several tasks and options.
All compilands of SURVO 84C modules have to be compiled in the large memory model because the SURVO 84C libraries (SURVO.LIB, SURVOMAT.LIB, etc.) are available in this model only. Thus, the !MEAN.C file is compiled by the command
CL /c /AL !MEAN.C
and it is linked by
LINK !MEAN,,NUL.MAP,SURVO /STACK:4000 /NOE
!MEAN was made and presented only for illustration. Source codes for selected true SURVO 84C modules are available separately.
Each module (as an .EXE file) is normally saved in the SURVO 84C system directory (typically C:\E) and activated by the user as MEAN. During the testing stage, it can be activated from any disk or path. For example, if !MEAN.EXE is on the disk A:,
A:!MEAN DATA1,11
is a valid command in SURVO 84C.