An array is a container object that can contain many values of one data type. Arrays are very useful objects and are indispensable for certain types of programming. The purpose of this chapter is to describe how arrays are defined and used in the S-Lang language.
The S-Lang language supports multi-dimensional arrays of all data
types. Since the Array_Type
is a data type, one can even
have arrays of arrays. To create a multi-dimensional array of
SomeType and assign to some variable, use:
a = SomeType [dim0, dim1, ..., dimN];
Here dim0, dim1, ... dimN specify the size of
the individual dimensions of the array. The current implementation
permits arrays to contain as many as 7 dimensions. When a
numeric array is created, all its elements are initialized to zero.
The initialization of other array types depend upon the data type,
e.g., the elements in String_Type
and
Struct_Type
arrays are
initialized to NULL
.
As a concrete example, consider
a = Integer_Type [10];
which creates a one-dimensional array of 10 integers and
assigns it to a
.
Similarly,
b = Double_Type [10, 3];
creates a 30
element array of double precision numbers
arranged in 10
rows and 3
columns, and assigns it to
b
.
There is a more convenient syntax for creating and initializing 1-d arrays. For example, to create an array of ten integers whose elements run from 1 through 10, one may simply use:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
Similarly,
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
specifies an array of ten doubles.
An even more compact way of specifying a numeric array is to use a range-array. For example,
a = [0:9];
specifies an array of 10 integers whose elements range from 0
through 9. The syntax for the most general form of range array is
given by
[first-value : last-value : increment]
where the increment is optional and defaults to 1. This
creates an array whose first element is first-value and whose
successive values differ by increment. last-value sets
an upper limit upon the last value of the array as described below.
If the range array [a:b:c]
is integer valued, then the
interval specified by a
and b
is closed. That is, the
kth element of the array x_k is given by x_k=a+kc and
satisfies a<=x_k<=b. Hence, the number of elements in an
integer range array is given by the expression 1 + (b-a)/c.
The situation is somewhat more complicated for floating point range
arrays. The interval specified by a floating point range array
[a:b:c]
is semi-open such that b
is not contained in
the interval. In particular, the kth element of [a:b:c]
is
given by x_k=a+kc such that a<=x_k<b when
c>=0, and b<x_k<=a otherwise. The number of elements
in the array is one greater than the largest k that
satisfies the open interval constraint.
In contrast, a range-array expressed in the form [a:b:#n]
represents an array of exactly n elements running from a
to
b
inclusive. It is equivalent to
a+[0:n-1]*(b-a)/(n-1)
.
Here are a few examples that illustrate the above comments:
[1:5:1] ==> [1,2,3,4,5]
[1.0:5.0:1.0] ==> [1.0, 2.0, 3.0, 4.0]
[5:1:-1] ==> [5,4,3,2,1]
[5.0:1.0:-1.0] ==> [5.0, 4.0, 3.0, 2.0];
[1:1] ==> [1]
[1.0:1.0] ==> []
[1.0:1.0001] ==> [1.0]
[1:-3] ==> []
[0:1:#5] ==> [0.0, 0.25, 0.50, 0.75, 1.0]
[0:-1:#3] ==> [0.0, -0.5, -1.0]
Currently Int_Type
is the only integer type supported
by range arrays--- arbitrary integer types will be supported in a
future version. This means that [1h:5h]
will not produce
an array of Short_Type
, rather it will produce an
Int_Type
array. However, [1h,2h,3h,4h,5h]
will
produce an array of Short_Type
integers.
Another way to create an array is to apply the dereference operator
@
to the DataType_Type
literal Array_Type
. The
actual syntax for this operation resembles a function call
variable a = @Array_Type (data-type, integer-array);
where data-type is of type DataType_Type
and
integer-array is a 1-d array of integers that specify the size
of each dimension. For example,
variable a = @Array_Type (Double_Type, [10, 20]);
will create a 10 by 20 array of doubles and assign it
to a
. This method of creating arrays derives its power from
the fact that it is more flexible than the methods discussed in this
section. It is particularly useful for creating arrays during
run-time in situations where the data-type can vary.
It is sometimes useful to change the `shape' of an array using
the reshape
function. For example, a 1-d 10 element array
may be reshaped into a 2-d array consisting of 5 rows and 2
columns. The only restriction on the operation is that the arrays
must be commensurate. The reshape
function follows the
syntax
reshape (array-name, integer-array);
where array-name specifies the array to be reshaped to
the dimensions given by integer-array
, a 1-dimensional array of
integers. It is important to note that this does not create a
new array, it simply reshapes the existing array. Thus,
variable a = Double_Type [100];
reshape (a, [10, 10]);
turns a
into a 10 by 10 array, as well as any
other variables attached to the array.
The _reshape
function works like reshape
except that
it creates a new array instead of changing the shape of an existing
array:
new_a = _reshape (a, [10,10]);
An individual element of an array may be referred to by its
index. For example, a[0]
specifies the zeroth element
of the one dimensional array a
, and b[3,2]
specifies
the element in the third row and second column of the two
dimensional array b
. As in C, array indices are numbered
from 0. Thus if a
is a one-dimensional array of ten
integers, the last element of the array is given by a[9]
.
Using a[10]
would result in an IndexError
exception.
A negative index may be used to index from the end of the array,
with a[-1]
referring to the last element of a
.
Similarly, a[-2]
refers to the next to the last element, and
so on.
One may use the indexed value like any other variable. For example, to set the third element of an integer array to 6, use
a[2] = 6;
Similarly, that element may be used in an expression, such as
y = a[2] + 7;
Unlike other S-Lang variables which inherit a type upon assignment,
array elements already have a type and any attempt to assign a
value with an incompatible type will result in a
TypeMismatchError
exception. For example, it is illegal to
assign a string value to an integer array.
One may use any integer expression to index an array. A simple example that computes the sum of the elements of a 10 element 1-d array is
variable i, s;
s = 0;
for (i = 0; i < 10; i++) s += a[i];
(In practice, do not carry out sums this way--- use the
sum
function instead, which is much simpler and faster, i.e.,
s=sum(a)
).
Unlike many other languages, S-Lang permits arrays to be indexed by
other integer arrays. Suppose that a
is a 1-d array of 10
doubles. Now consider:
i = [6:8];
b = a[i];
Here, i
is a 1-dimensional range array of three integers with
i[0]
equal to 6, i[1]
equal to 7,
and i[2]
equal to 8. The statement b = a[i];
will create a 1-d array of three doubles and assign it to b
.
The zeroth element of b
, b[0]
will be set to the sixth
element of a
, or a[6]
, and so on. In fact, these two simple
statements are equivalent to
b = Double_Type [3];
b[0] = a[6];
b[1] = a[7];
b[2] = a[8];
except that using an array of indices is not only much more
convenient, but executes much faster.
More generally, one may use an index array to specify which elements are to participate in a calculation. For example, consider
a = Double_Type [1000];
i = [0:499];
j = [500:999];
a[i] = -1.0;
a[j] = 1.0;
This creates an array of 1000 doubles and sets the first
500 elements to -1.0
and the last 500 to
1.0
. Actually, one may do away with the i
and j
variables altogether and use
a = Double_Type [1000];
a[[0:499]] = -1.0;
a[[500:999]] = 1.0;
It is important to note that the syntax requires the use of the
double square brackets, and in particular that a[[0:499]]
is
not the same as a[0:499]
. In fact, the latter will
generate a syntax error.
Index-arrays are not contrained to be one-dimensional arrays. Suppose
that I
represents a multidimensional index array, and that A
is the array to be indexed. Then what does A[I]
represent?
Its value will be an array of the same type as A
, but with
the dimensionality of I. For example,
a = 1.0*[1:10];
i = _reshape ([4,5,6,7,8,9], [2,3]);
defines a
to be a 10 element array of doubles, and i
to be 2x3
array of integers. Then a[i]
will be a
2x3
array of doubles with elements:
a[4] a[5] a[6]
a[7] a[8] a[9]
Often, it is convenient to use a ``rubber'' range to specify
indices. For example, a[[500:]]
specifies all elements of
a
whose index is greater than or equal to 500
. Similarly,
a[[:499]]
specifies the first 500 elements of a
.
Finally, a[[:]]
specifies all the elements of a
. The
latter form may also be written as a[*]
.
One should be careful when using index arrays with negative
elements. As pointed out above, a negative index is used to index
from the end of the array. That is, a[-1]
refers to the
last element of a
. How should a[[[0:-1]]
be
interpreted?
In version 1 of the interpreter, when used in an
array indexing context, a construct such as [0:-1]
was taken
to mean from the first element through the last. While this might
seem like a convenient shorthand, in retrospect it was a bad idea.
For this reason, the meaning of a ranges over negative valued
indices was changed in version 2 of the interpreter as follows:
First the index-range gets expanded to an array of indices
according to the rules for range arrays described above. Then if
any of the resulting indices are negative, they are interpreted as
indices from the end of the array. For example, if a
is an array of 10 elements, then a[[-2:3]]
is first expanded
to a[[-2,-1,0,1,2,3]]
, and then to the 6 element array
[ a[8], a[9], a[0], a[1], a[2], a[3] ]
So, what does a[[0:-1]]
represent in the new interpretation?
Since [0:-1]
expands to an empty array, a[[0:-1]]
will
also produce an empty array.
Indexing of multidimensional arrays using ranges works similarly.
Suppose a
is a 100 by 100 array of doubles. Then
the expression a[0, *]
specifies all elements in the zeroth
row. Similarly, a[*, 7]
specifies all elements in the
seventh column. Finally, a[[3:5],[6:12]]
specifies the
3 by 7 region consisting of rows 3, 4,
and 5, and columns 6 through 12 of a
.
Before leaving this section, a few examples are presented to illustrate some of these points.
The ``trace'' of a matrix is an important concept that occurs frequently in linear algebra. The trace of a 2d matrix is given by the sum of its diagonal elements. Consider the creation of a function that computes the trace of such a matrix.
The most straightforward implementation of such a function uses an explicit loop:
define array_trace (a, n)
{
variable s = 0, i;
for (i = 0; i < n; i++) s += a[i, i];
return s;
}
Better yet is to recognize that the diagonal elements of an
n
by n
array are given by an index array I
with elements 0, n+1
, 2*n+2
, ..., n*n-1
, or
more precisely as
[0:n*n-1:n+1]
Hence the above may be written more simply as
define array_trace (a, n)
{
return sum (a[[0:n*n-1:n+1]]);
}
The following example creates a 10 by 10 integer array, sets its diagonal elements to 5, and then computes the trace of the array:
a = Integer_Type [10, 10];
a[[0:99:11]] = 5;
the_trace = array_trace(a, 10);
In the previous examples, the size of the array was passed as an
additional argument. This is unnecessary because the size may be
obtained from array itself by using the array_shape
function. For example, the following function may be used to
obtain the indices of the diagonal element of an array:
define diag_indices (a)
{
variable dims = array_shape (a);
if (length (dims) != 2)
throw InvalidParmError, "Expecting a 2d array";
if (dims[0] != dims[1])
throw InvalidParmError, "Expecting a square array";
variable n = dims[0];
return [0:n*(n-1):n+1];
}
Using this function, the trace function may be written more simply as
define array_trace (a)
{
return sum (a[diag_indices(a)]);
}
Another example of this technique is a function that creates an
n
by n
unit matrix:
define unit_matrix (n)
{
variable a = Int_Type[n, n];
a[diag_indices(a)] = 1;
return a;
}
When an array is created and assigned to a variable, the interpreter allocates the proper amount of space for the array, initializes it, and then assigns to the variable a reference to the array. So, a variable that represents an array has a value that is really a reference to the array. This has several consequences, most good and some bad. It is believed that the advantages of this representation outweigh the disadvantages. First, we shall look at the positive aspects.
When a variable is passed to a function, it is always the value of the variable that gets passed. Since the value of a variable representing an array is a reference, a reference to the array gets passed. One major advantage of this is rather obvious: it is a fast and efficient way to pass the array. This also has another consequence that is illustrated by the function
define init_array (a)
{
variable i;
variable n = length(a);
_for i (0, n-1, 1)
a[i] = some_function (i);
}
where some_function
is a function that generates a scalar
value to initialize the ith element. This function can be
used in the following way:
variable X = Double_Type [100000];
init_array (X);
Since the array is passed to the function by reference, there is no
need to make a separate copy of the 100000
element array. As
pointed out above, this saves both execution time and memory. The
other salient feature to note is that any changes made to the
elements of the array within the function will be manifested in the
array outside the function. Of course, in this case this is a
desirable side-effect.
To see the downside of this representation, consider:
a = Double_Type [10];
b = a;
a[0] = 7;
What will be the value of b[0]
? Since the value of a
is really a reference to the array of ten doubles, and that
reference was assigned to b
, b
also refers to the same
array. Thus any changes made to the elements of a
, will also
be made implicitly to b
.
This begs the question: If the assignment of a variable attached to
an an array to another variable results in the assignment of the
same array, then how does one make separate copies
of the array? There are several answers including using an index
array, e.g., b = a[*]
; however, the most natural method is
to use the dereference operator:
a = Double_Type [10];
b = @a;
a[0] = 7;
In this example, a separate copy of a
will be created and
assigned to b
. It is very important to note that S-Lang
never implicitly dereferences an object. So, one must explicitly use
the dereference operator. This means that the elements of a
dereferenced array are not themselves dereferenced. For example,
consider dereferencing an array of arrays, e.g.,
a = Array_Type [2];
a[0] = Double_Type [10];
a[1] = Double_Type [10];
b = @a;
In this example, b[0]
will be a reference to the array that
a[0]
references because a[0]
was not explicitly
dereferenced.
Many functions and operations work transparently with arrays.
For example, if a
and b
are arrays, then the sum
a + b
is an array whose elements are formed from the sum of
the corresponding elements of a
and b
. A similar
statement holds for all other binary and unary operations.
Let's consider a simple example. Suppose, that we wish to solve a
set of n
quadratic equations whose coefficients are given by
the 1-d arrays a
, b
, and c
. In general, the
solution of a quadratic equation will be two complex numbers. For
simplicity, suppose that all we really want is to know what subset of
the coefficients, a
, b
, c
, correspond to
real-valued solutions. In terms of for
loops, we can write:
index_array = Char_Type [n];
_for i (0, n-1, 1)
{
d = b[i]^2 - 4 * a[i] * c[i];
index_array [i] = (d >= 0.0);
}
In this example, the array index_array
will contain a
non-zero value if the corresponding set of coefficients has a
real-valued solution. This code may be written much more compactly
and with more clarity as follows:
index_array = ((b^2 - 4 * a * c) >= 0.0);
Moreover, it executes about 20 times faster than the version using
an explicit loop.
S-Lang has a powerful built-in function called where
. This
function takes an array of boolean values and returns an array of
indices that correspond to where the elements of the input array
are non-zero. The utility of this simple operation cannot be
overstated. For example, suppose a
is a 1-d array of n
doubles, and it is desired to set all elements of the array whose
value is less than zero to zero. One way is to use a for
loop:
_for i (0, n-1, 1)
if (a[i] < 0.0) a[i] = 0.0;
If n
is a large number, this statement can take some time to
execute. The optimal way to achieve the same result is to use the
where
function:
a[where (a < 0.0)] = 0;
Here, the expression (a < 0.0)
returns a boolean array whose
dimensions are the same size as a
but whose elements are
either 1 or 0, according to whether or not the corresponding
element of a
is less than zero. This array of zeros and ones
is then passed to the where
function, which returns a 1-d
integer array of indices that indicate where the elements of
a
are less than zero. Finally, those elements of a
are
set to zero.
Consider once more the example involving the set of n
quadratic equations presented above. Suppose that we wish to get
rid of the coefficients of the previous example that generated
non-real solutions. Using an explicit for
loop requires code
such as:
nn = 0;
_for i (0, n-1, 1)
if (index_array [i]) nn++;
tmp_a = Double_Type [nn];
tmp_b = Double_Type [nn];
tmp_c = Double_Type [nn];
j = 0;
_for i (0, n-1, 1)
{
if (index_array [i])
{
tmp_a [j] = a[i];
tmp_b [j] = b[i];
tmp_c [j] = c[i];
j++;
}
}
a = tmp_a;
b = tmp_b;
c = tmp_c;
Not only is this a lot of code, making it hard to digest, but it is
also clumsy and error-prone. Using the where
function, this
task is trivial and executes in a fraction of the time:
i = where (index_array != 0);
a = a[i];
b = b[i];
c = c[i];
Most of the examples up till now assumed that the dimensions of the
array were known. Although the intrinsic function length
may be used to get the total number of elements of an array, it
cannot be used to get the individual dimensions of a
multi-dimensional array. The array_shape
function may
be used to determine the dimensionality of an array. It may be used
to determine the number of rows of an array as follows:
define num_rows (a)
{
return array_shape (a)[0];
}
The number of columns may be obtained in a similar manner:
define num_cols (a)
{
variable dims = array_shape (a);
if (length(dims) > 1) return dims[1];
return 1;
}
The array_shape
function may also be used to create an array
that has the same number of dimensions as another array:
define make_int_array (a)
{
return @Array_Type (Int_Type, array_shape (a));
}
Finally, the array_info
function may be used to get
additional information about an array, such as its data type and
size.
Sometimes it is desirable to create an array of arrays. For example,
a = Array_Type[3];
a[0] = [1:10];
a[1] = [1:100];
a[2] = [1:1000];
will produce an array of the 3 arrays [1:10]
, [1:100]
,
and [1:1000]
. Index arrays may be used to access elements of
an array of arrays: a[[1,2]] will produce an array of arrays that
consists of the elements a[1] and a[2]. However, it is important to
note that setting the elements of an array of arrays via an index
array does not work as one might naively expect. Consider the
following:
b = Array_Type[3];
b[*] = a[[2,1,0]];
where a
is the array of arrays given in the previous example.
The reader might expect b
to have elements
b[0]=a[2]
, b[1]=a[1]
, and b[2]=a[0]
, and be
surprised to learn that b[0]=b[1]=b[2]=a[[2,1,0]]
. The reason
for this is that, by definition, b
is an array of arrays, and
even though a[[2,1,0]]
is an array of arrays, it is
first and foremost an array, and it is that array that is assigned
to the elements of b
.