A structure is a heterogeneous container object, i.e., it is an object with elements whose values do not have to be of the same data type. The elements or fields of a structure are named, and one accesses a particular field of the structure via the field name. This should be contrasted with an array whose values are of the same type, and whose elements are accessed via array indices.
A user-defined data type is a structure with a fixed set of fields defined by the user.
The struct
keyword is used to define a structure. The syntax
for this operation is:
struct {field-name-1, field-name-2, ... field-name-N};
This creates and returns a structure with N fields whose names
are specified by field-name-1, field-name-2, ...,
field-name-N. When a structure is created, the values of its
fields are initialized to NULL
.
For example,
variable t = struct { city_name, population, next };
creates a structure with three fields and assigns it to the
variable t
.
Alternatively, a structure may be created by dereferencing
Struct_Type
. Using this technique, the above structure may
be created using one of the two forms:
t = @Struct_Type ("city_name", "population", "next");
t = @Struct_Type (["city_name", "population", "next"]);
This approach is useful when creating structures dynamically where
one does not know the name of the fields until run-time.
Like arrays, structures are passed around by reference. Thus,
in the above example, the value of t
is a reference to the
structure. This means that after execution of
u = t;
both t
and u
refer to the same underlying
structure, since only the reference was copied by the assignment. To
actually create a new copy of the structure, use the
dereference operator, e.g.,
variable u = @t;
It create new structure whose field names are identical to the old
and copies the field values to the new structure. If any of the
values are objects that are passed by reference, then only the
references will be copied. In other words,
t = struct{a};
t.a = [1:10];
u = @t;
will produce a structure u
that references the same array as
t
.
The dot (.
) operator is used to specify the particular
field of structure. If s
is a structure and field_name
is a field of the structure, then s.field_name
specifies
that field of s
. This specification can be used in
expressions just like ordinary variables. Again, consider
t = struct { city_name, population, next };
described in the last section. Then,
t.city_name = "New York";
t.population = 13000000;
if (t.population > 200) t = t.next;
are all valid statements involving the fields of t
.
One of the most important uses of structures is the creation of dynamic data structures such as linked-lists. A linked-list is simply a chain of structures that are linked together such that one structure in the chain is the value of a field of the previous structure in the chain. To be concrete, consider the structure discussed earlier:
t = struct { city_name, population, next };
and suppose that it is desired to create a linked-list of such
objects to store population data.
The purpose of the next
field is to provide the link to the
next structure in the chain. Suppose that there exists a function,
read_next_city
, that reads city names and populations from a
file. Then the list may be created using:
define create_population_list ()
{
variable city_name, population, list_root, list_tail;
variable next;
list_root = NULL;
while (read_next_city (&city_name, &population))
{
next = struct {city_name, population, next };
next.city_name = city_name;
next.population = population;
next.next = NULL;
if (list_root == NULL)
list_root = next;
else
list_tail.next = next;
list_tail = next;
}
return list_root;
}
In this function, the variables list_root
and list_tail
represent the beginning and end of the list, respectively. As long
as read_next_city
returns a non-zero value, a new structure is
created, initialized, and then appended to the list via the
next
field of the list_tail
structure. On the first
time through the loop, the list is created via the assignment to the
list_root
variable.
This function may be used as follows:
Population_List = create_population_list ();
if (Population_List == NULL)
throw RunTimeError, "List is empty";
Other functions may be created that manipulate the list. Here is one
that finds the city with the largest population:
define get_largest_city (list)
{
variable largest;
largest = list;
while (list != NULL)
{
if (list.population > largest.population)
largest = list;
list = list.next;
}
return largest.city_name;
}
vmessage ("%s is the largest city in the list",
get_largest_city (Population_List));
The get_largest_city
is a typical example of how one traverses
a linear linked-list by starting at the head of the list and
successively moves to the next element of the list via the
next
field.
In the previous example, a while
loop was used to traverse the
linked list. It is also possible to use a foreach
loop for this:
define get_largest_city (list)
{
variable largest, elem;
largest = list;
foreach elem (list)
{
if (elem.population > largest.population)
largest = elem;
}
return largest.city_name;
}
Here a foreach
loop has been used to walk the list via its
next
field. If the field name linking the elements was not
called next
, then it would have been necessary to use the
using
form of the foreach
statement. For example, if the
field name implementing the linked list was next_item
, then
foreach item (list) using ("next_item")
{
.
.
}
would have been used. In other words, unless otherwise indicated
via the using
clause, foreach
walks the list using a field
named next
by default.
Now consider a function that sorts the list according to population. To illustrate the technique, a bubble-sort will be used, not because it is efficient (it is not), but because it is simple, intuitive, and provides another example of structure manipulation:
define sort_population_list (list)
{
variable changed;
variable node, next_node, last_node;
do
{
changed = 0;
node = list;
next_node = node.next;
last_node = NULL;
while (next_node != NULL)
{
if (node.population < next_node.population)
{
% swap node and next_node
node.next = next_node.next;
next_node.next = node;
if (last_node != NULL)
last_node.next = next_node;
if (list == node) list = next_node;
node = next_node;
next_node = node.next;
changed++;
}
last_node = node;
node = next_node;
next_node = next_node.next;
}
}
while (changed);
return list;
}
Note the test for equality between list
and node
, i.e.,
if (list == node) list = next_node;
It is important to appreciate the fact that the values of these
variables are references to structures, and that the
comparison only compares the references and not the actual
structures they reference. If it were not for this, the algorithm
would fail.
A user-defined data type may be defined using the typedef
keyword. In the current implementation, a user-defined data type
is essentially a structure with a user-defined set of fields. For
example, in the previous section a structure was used to represent
a city/population pair. We can define a data type called
Population_Type
to represent the same information:
typedef struct
{
city_name,
population
} Population_Type;
This data type can be used like all other data types. For example,
an array of Population_Type types can be created,
variable a = Population_Type[10];
and `populated' via expressions such as
a[0].city_name = "Boston";
a[0].population = 2500000;
The new type Population_Type
may also be used with the
typeof
function:
if (Population_Type == typeof (a))
city = a.city_name;
The dereference @
may be used to create an instance of the
new type:
a = @Population_Type;
a.city_name = "Calcutta";
a.population = 13000000;
Another feature that user-defined types possess is that the action of the binary and unary operations may be defined for them. This idea is discussed in more detail below.
The binary and unary operators may be extended to user-defined types. To illustrate how this works, consider a data type that represents a vector in 3-space:
typedef struct { x, y, z } Vector_Type;
and a function that instantiates such an object:
define vector_new (x, y, z)
{
variable v = @Vector_Type;
v.x = double(x); v.y = double(y); v.z = double(z);
return v;
}
This function may be used to define a function that adds two vectors
together:
define vector_add (v1, v2)
{
return vector_new (v1.x+v2.x, v1.y+v2.y, v1.z+v2.z);
}
Using these functions, three vectors representing the points
(2,3,4)
, (6,2,1)
, and (-3,1,-6)
may be created using
V1 = vector_new (2,3,4);
V2 = vector_new (6,2,1);
V3 = vector_new (-3,1,-6);
and then added together via
V4 = vector_add (V1, vector_add (V2, V3));
The problem with the last statement is that it is not a very natural
way to express the addition of three vectors. It would be far better
to extend the action of the binary +
operator to the
Vector_Type
objects and then write the above sum more simply as
V4 = V1 + V2 + V3;
The __add_binary
function defines the result of a binary
operation between two data types:
__add_binary (op, result-type, funct, typeA,typeB);
Here, op is a string representing any one of the binary operators
("+"
, "-"
, "*"
, "/"
, "=="
,...),
and funct is reference to a function that carries out the binary
operation between objects of types typeA and typeB to
produce an object of type result-type.
This function may be used to extend the +
operator to
Vector_Type objects:
__add_binary ("+", Vector_Type, &vector_add, Vector_Type, Vector_Type);
Similarly the subtraction and equality operators may be extended to
Vector_Type
via
define vector_minus (v1, v2)
{
return vector_new (v1.x-v2.x, v1.y-v2.y, v1.z-v2.z);
}
__add_binary ("-", Vector_Type, &vector_minus, Vector_Type, Vector_Type);
define vector_eqs (v1, v2)
{
return (v1.x==v2.x) and (v1.y==v2.y) and (v1.z==v2.z);
}
__add_binary ("==", Char_Type, &vector_eqs, Vector_Type, Vector_Type);
permitting a statement such as
if (V2 != V1) V3 = V2 - V1;
The -
operator is also an unary operator that is customarily
used to change the sign of an object. Unary operations may be
extended to Vector_Type
objects using the __add_unary
function:
define vector_chs (v)
{
return vector_new (-v.x, -v.y, -v.z);
}
__add_unary ("-", Vector_Type, &vector_chs, Vector_Type);
A trivial example of the use of the unary minus is V4 = -V2
.
It is interesting to consider the extension of the multiplication
operator *
to Vector_Type
. A vector may be multiplied
by a scalar to produce another vector. This can happen in two ways as
reflected by the following functions:
define vector_scalar_mul (v, a)
{
return vector_new (a*v.x, a*v.y, a*v.z);
}
define scalar_vector_mul (a, v)
{
return vector_new (a*v.x, a*v.y, a*v.z);
}
Here a
represents the scalar, which can be any object that may
be multiplied by a Double_Type
, e.g., Int_Type
,
Float_Type
, etc. Instead of using multiple statements
involving __add_binary
to define the action of
Int_Type+Vector_Type
, Float_Type+Vector_Type
, etc, a
single statement using Any_Type
to represent a ``wildcard''
type may be used:
__add_binary ("*", Vector_Type, &vector_scalar_mul, Vector_Type, Any_Type);
__add_binary ("*", Vector_Type, &scalar_vector_mul, Any_Type, Vector_Type);
There are a couple of natural possibilities for
Vector_Type*Vector_Type
: The cross-product defined by
define crossprod (v1, v2)
{
return vector_new (v1.y*v2.z-v1.z*v2.y,
v1.z*v2.x-v1.x*v2.z,
v1.x*v2.y-v1.y*v2.x);
}
and the dot-product:
define dotprod (v1, v2)
{
return v1.x*v2.x + v1.y*v2.y + v1.z*v2.z;
}
The binary *
operator between two vector types may be defined
to be just one of these functions--- it cannot be extended to both.
If the dot-product is chosen then one would use
__add_binary ("*", Double_Type, &dotprod, Vector_Type_Type, Vector_Type);
Just because it is possible to define the action of a binary or unary operator on an user-defined type, it is not always wise to do so. A useful rule of thumb is to ask whether defining a particular operation leads to more readable and maintainable code. For example, simply looking at
c = a + b;
in isolation one can easily overlook the fact that a function such as
vector_add
may be getting executed. Moreover, in cases where
the action is ambiguous such as Vector_Type*Vector_Type
it may
not be clear what
c = a*b;
means unless one knows exactly what choice was made when extending
the *
operator to the types. For this reason it may
be wise to leave Vector_Type*Vector_Type
undefined and use
``old-fashioned'' function calls such as
c = dotprod (a, b);
d = crossprod (a, b);
to avoid the ambiguity altogether.
Finally, the __add_string
function may be used to define the
string representation of an object. Examples involving the string
representation include:
message ("The value is " + string (V));
vmessage ("The result of %S+%S is %S", V1, V1, V1+V2);
str = "The value of V is $V"$;
For the Vector_Type
one might want to use the string
represention generated by
define vector_string (v)
{
return sprintf ("(%S,%S,%S)", v.x, v.y, v.z);
}
__add_string (Vector_Type, &vector_string);