The current implementation of the S-Lang language permits up to 65535
distinct data types, including predefined data types such as integer and
floating point, as well as specialized application-specific data
types. It is also possible to create new data types in the
language using the typedef mechanism.
Literal constants are objects such as the integer 3 or the
string "hello". The actual data type given to a literal
constant depends upon the syntax of the constant. The following
sections describe the syntax of literals of specific data types.
The current version of S-Lang defines integer, floating point,
complex, and string types. It also defines special purpose data
types such as Null_Type, DataType_Type, and Ref_Type. These types
are discussed below.
The S-Lang language supports both signed and unsigned character, short integer, long integer, and long long integer types. On most 32 bit systems, there is no difference between an integer and a long integer; however, they may differ on 16 and 64 bit systems. Generally speaking, on a 16 bit system, plain integers are 16 bit quantities with a range of -32768 to 32767. On a 32 bit system, plain integers range from -2147483648 to 2147483647.
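A plain integer literal can be specified in one of several ways:
As a decimal (base 10) integer consisting of the digits 0 through 9,
e.g., 127.
In hexadecimal (base 16) notation consisting of the digits 0 through 9
and the letters A through F. The hexadecimal number must be preceded
by the characters 0x. For example, 0x7F specifies an integer using
hexadecimal notation and has the same value as decimal 127.
In octal (base 8) notation consisting of the digits 0 through 7. The
octal number must begin with a leading 0. For example, 0177 and 127
represent the same integer.
In binary notation consisting of the digits 0 and 1 with the 0b
prefix. For example, 21 may be expressed in binary using 0b10101.
Short, long, long long, and unsigned types may be specified by
using the proper suffixes: L indicates that the integer is a
long integer, LL indicates a long long integer, h indicates that
the integer is a short integer, and U indicates that it is
unsigned. For example, 1UL specifies an unsigned long integer.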
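As a minimal sketch of the notations above, the following statements compare several spellings of the same value and print the types selected by two suffixes. The variable names are illustrative only, and the type names reported for suffixed literals may depend on the platform.

```slang
% The same value, 127, written in decimal, hexadecimal, and binary.
variable dec = 127;
variable hex = 0x7F;
variable bin = 0b1111111;

if ((dec == hex) and (hex == bin))
  message ("127, 0x7F, and 0b1111111 denote the same integer");

% Suffixes select the integer type; %S formats any object as a string.
vmessage ("1h has type %S", typeof (1h));
vmessage ("1UL has type %S", typeof (1UL));
```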
Finally, a character literal may be specified using a notation
containing a character enclosed in single quotes, as in 'a'.
The value of the character specified this way will lie in the
range 0 to 255 and will be determined by the ASCII value of the
character in quotes. For example,
i = '0';
assigns to i the value 48 since the '0' character has an ASCII
value of 48.
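Because a character literal is simply an integer, it can take part in arithmetic. The sketch below, with illustrative variable names, converts a digit character to its numeric value by subtracting '0'.

```slang
variable i = '0';             % ASCII value 48
variable digit = '7' - '0';   % 55 - 48

vmessage ("'0' has the value %d, and '7' - '0' is %d", i, digit);
```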
A "wide" character (Unicode) may be specified using the form
'\x{y...y}' where y...y are hexadecimal digits. For example,
'\x{12F}' % Latin Small Letter I With Ogonek
'\x{1D7BC}' % Mathematical Sans-Serif Bold Italic Small Sigma
Any integer may be preceded by a minus sign to indicate that it is a negative integer.
Single and double precision floating point literals must contain either a decimal point or an exponent (or both). Here are examples of specifying the same double precision floating point number:
12. 12.0 12e0 1.2e1 120e-1 .12e2 0.12e2
Note that 12 is not a floating point number since it
contains neither a decimal point nor an exponent. In fact,
12 is an integer.
One may append the f character to the end of the number to
indicate that the number is a single precision literal. The
following are all single precision values:
12.f 12.0f 12e0f 1.2e1f 120e-1f .12e2f 0.12e2f
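The distinction can be checked with typeof. The following is a small sketch using illustrative variable names; it prints the type chosen for each spelling and confirms that the double precision and integer forms still compare equal.

```slang
variable d = 1.2e1;     % exponent present: double precision
variable f = 1.2e1f;    % f suffix: single precision
variable n = 12;        % neither decimal point nor exponent: integer

vmessage ("%S %S %S", typeof (d), typeof (f), typeof (n));

% The values compare equal even though the types differ.
if ((d == n) and (d == 12.0))
  message ("1.2e1, 12.0, and 12 have the same value");
```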
The language implements complex numbers as a pair of double precision floating point numbers. The first number in the pair forms the real part, while the second number forms the imaginary part. That is, a complex number may be regarded as the sum of a real number and an imaginary number.
Strictly speaking, the current implementation of S-Lang does not support generic complex literals. However, it does support imaginary literals, which permit a more generic complex number with a non-zero real part to be constructed by adding a real number to an imaginary literal.
An imaginary literal is specified in the same way as a floating
point literal except that i or j is appended. For example,
12i 12.0i 12e0j
all represent the same imaginary number.
A more generic complex number may be constructed from an imaginary literal via addition, e.g.,
3.0 + 4.0i
produces a complex number whose real part is 3.0 and whose
imaginary part is 4.0.
The intrinsic functions Real and Imag may be used to
retrieve the real and imaginary parts of a complex number,
respectively.
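A short sketch of these functions follows. The variable names are illustrative, and the second part relies only on ordinary complex multiplication.

```slang
variable z = 3.0 + 4.0i;
vmessage ("real part: %g, imaginary part: %g", Real (z), Imag (z));

% Multiplying two imaginary literals yields a complex value whose
% real part is -1 and whose imaginary part is 0.
variable w = 1i * 1i;
vmessage ("1i*1i: real part %g, imaginary part %g", Real (w), Imag (w));
```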
A string literal must be enclosed in double quotes as in:
"This is a string".
As described below, the string literal may contain a suffix that
specifies how the string is to be interpreted, e.g., a string
literal such as
"$HOME/.jedrc"$
with the '$' suffix will be subject to variable name expansion.
Although there is no imposed limit on the length of a string, single-line string literals must be less than 256 characters in length. It is possible to construct strings longer than this by string concatenation, e.g.,
"This is the first part of a long string"
+ " and this is the second part"
S-Lang version 2.2 introduced support for multi-line string literals. Two basic variants are supported. The first makes use of the backslash at the end of a line to indicate that the string is continued onto the next line:
"This is a \
multi-line string. \
Note the presence of the \
backslash character at the end \
of each of the lines."
The second form of multiline string is delimited by the backquote
character (`) and does not require backslashes:
`This form does not
require backslash characters.
In fact, here the backslash
character \ has no special
meaning (unless given the ``Q' suffix).`
Note that if a backquote is to appear in such a string, then it
must be doubled, as illustrated in the above example.
Any character except a newline (ASCII 10) or the null character (ASCII 0) may appear explicitly in a string literal. However, these characters may be embedded implicitly using the mechanism described below.
The backslash character is a special character and is used to include other special characters (such as a newline character) in the string. The special characters recognized are:
\" -- double quote
\' -- single quote
\\ -- backslash
\a -- bell character (ASCII 7)
\t -- tab character (ASCII 9)
\n -- newline character (ASCII 10)
\e -- escape character (ASCII 27)
\xhh -- byte expressed in HEXADECIMAL notation
\ooo -- byte expressed in OCTAL notation
\dnnn -- byte expressed in DECIMAL
\u{h..h} -- the Unicode character U+h..h
\x{h..h} -- the Unicode character U+h..h [modal]
In the above table, h represents one of the HEXADECIMAL
characters from the set [0-9A-Fa-f]. It is important to
understand the distinction between the \x{h..h} and \u{h..h}
forms. When used in a string, the \u form always expands to the
corresponding UTF-8 sequence regardless of the UTF-8 mode. In
contrast, when in non-UTF-8 mode, the \x form expands to a byte
when given two hex characters, or to the corresponding UTF-8
sequence when used with three or more hex characters.
For example, to include the double quote character as part of the string, it must be preceded by a backslash character, e.g.,
"This is a \"quote\"."
Similarly, the next example illustrates how a newline character
may be included:
"This is the first line\nand this is the second."
Alternatively, slang-2.2 or newer permits
`This is a "quote".`
`This is the first line
and this is the second.`
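The difference between the two quoting styles can be seen by comparing strings and their lengths. The following is a minimal sketch that assumes slang-2.2 or newer, since it uses backquoted strings; the variable names are illustrative.

```slang
variable a = "This is a \"quote\".";
variable b = `This is a "quote".`;
if (a == b)
  message ("both forms produce the same string");

% \n is a single newline character in a double-quoted string...
vmessage ("%d", strlen ("a\nb"));    % 3 characters
% ...but a backslash followed by n in a backquoted string.
vmessage ("%d", strlen (`a\nb`));    % 4 characters
```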
A string literal may contain a suffix that specifies how the string is to be interpreted. The suffix may consist of one or more of the following characters:
R -- Backslash substitution will not be performed on the string. This is the default for back-quoted strings.
Q -- Backslash substitution will be performed on the string. This is the default for double-quoted strings.
B -- If this suffix is present, the string will be interpreted as a binary string (BString_Type).
$ -- Variable name substitution will be performed on the string.
Not all combinations of the above control characters are
supported, nor do all of them make sense. For example, a string
with the suffix QR will cause a parse-error because Q and R have
opposing meanings.
The Q and R suffixes turn backslash expansion on and off. Unless
the R suffix is present, all double-quoted string literals will
have backslash substitution performed. By default, backslash
expansion is turned off for backquoted strings.
Sometimes it is desirable to turn off backslash expansion for
double-quoted strings. For example, pathnames on an MSDOS or
Windows system use the backslash character as a path separator. The
R suffix turns off backslash expansion, and as a result the
following statements are equivalent:
file = "C:\\windows\\apps\\slrn.rc";
file = "C:\\windows\\apps\\slrn.rc"Q;
file = "C:\windows\apps\slrn.rc"R;
file = `C:\windows\apps\slrn.rc`; % slang-2.2 and above
The only exception is that a backslash character is not permitted
as the last character of a string with the R suffix. That is,
string = "This is illegal\"R;
is not permitted. Without this exception, a string such as
string = "Some characters: \"R, S, T\"";
would not be parsed properly.
If the string contains the $ suffix, then variable name
expansion will be performed upon names prefixed by a $
character occurring within the string, e.g.,
"The value of X is $X and the value of Y is $Y"$
with variable name substitution to be performed on the
names X and Y. Such strings may be used as a
convenient alternative to the sprintf function.
Name expansion is carried out according to the following rules: If the string literal occurs in a function, and the name corresponds to a variable local to the function, then the string representation of the value of that variable will be substituted. Otherwise, if the name corresponds to a variable that is local to the compilation unit (i.e., is declared as static or private), then its value's string representation will be used. Otherwise, if the name corresponds to a variable that exists as a global (public) then its value's string representation will be substituted. If the above searches fail and the name exists in the environment, then the value of the corresponding environment variable will be used. Otherwise, the variable will expand to the empty string.
Consider the following example:
private variable bar = "two";
putenv ("MYHOME=/home/baz");
define funct (foo)
{
   variable bar = 1;
   message ("file: $MYHOME/foo: garage=$MYGARAGE,bar=$bar"$);
}
When executed, this will produce the message:
file: /home/baz/foo: garage=,bar=1
assuming that MYGARAGE is not defined anywhere.
A name may be enclosed in braces. For example,
"${MYHOME}/foo: bar=${bar}"$
This is useful in cases when the name is followed immediately by
other characters that may be interpreted as part of the name, e.g.,
variable HELLO="Hello ";
message ("${HELLO}World"$);
will produce the message "Hello World".
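The expansion rules and the brace notation can be combined inside a function, where the names refer to the function's local variables. The following is a sketch only; describe and its parameters are hypothetical names chosen for illustration.

```slang
% Hypothetical helper: builds a label from its two arguments.
define describe (name, count)
{
   % The braces keep the trailing "s" from being read as part of
   % the variable name.
   return "$name has ${count}s"$;
}

message (describe ("timer", 30));
```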
Objects of type Null_Type can have only one value: NULL. About
the only thing that you can do with this data
type is to assign it to variables and test for equality with
other objects. Nevertheless, Null_Type is an important and
extremely useful data type. Its main use stems from the fact that
since it can be compared for equality with any other data type, it
is ideal to represent the value of an object which does not yet
have a value, or has an illegal value.
As a trivial example of its use, consider
define add_numbers (a, b)
{
   if (a == NULL) a = 0;
   if (b == NULL) b = 0;
   return a + b;
}
variable c = add_numbers (1, 2);
variable d = add_numbers (1, NULL);
variable e = add_numbers (1,);
variable f = add_numbers (,);
It should be clear that after these statements have been executed,
c will have a value of 3. It should also be clear that d will have
a value of 1 because NULL has been passed as the second parameter.
One feature of the language is that if a parameter has been omitted
from a function call, the variable associated with that parameter
will be set to NULL. Hence, e and f will be set to 1 and 0,
respectively.
The Null_Type data type also plays an important role in the
context of structures.
Objects of Ref_Type are created using the unary reference
operator &. Such objects may be dereferenced using the dereference
operator @. For example,
sin_ref = &sin;
y = (@sin_ref) (1.0);
creates a reference to the sin function and assigns it to
sin_ref. The second statement uses the dereference operator
to call the function that sin_ref references.
The Ref_Type is useful for passing functions as arguments to
other functions, or for returning information from a function via
its parameter list. The dereference operator may also be used to
create an instance of a structure. For these reasons, further
discussion of this important type can be found in the section on
Referencing Variables.
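Returning information through the parameter list might look like the following sketch. The function safe_divide and its return convention are hypothetical and serve only to illustrate assignment through a reference.

```slang
% Hypothetical helper: stores the quotient through a reference and
% returns 0 on success or -1 if the divisor is zero.
define safe_divide (a, b, result_ref)
{
   if (b == 0) return -1;
   @result_ref = a / b;
   return 0;
}

variable q;
if (0 == safe_divide (10.0, 4.0, &q))
  vmessage ("quotient = %g", q);
```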
Variables of type Array_Type, Assoc_Type, List_Type, and
Struct_Type are known as container objects. They are more
complicated than the simple data types discussed so far and each
obeys a special syntax. For these reasons they are discussed in
separate chapters.
S-Lang defines a type called DataType_Type. Objects of
this type have values that are type names. For example, an integer
is an object of type Integer_Type. The literals of
DataType_Type include:
Char_Type (signed character)
UChar_Type (unsigned character)
Short_Type (short integer)
UShort_Type (unsigned short integer)
Integer_Type (plain integer)
UInteger_Type (plain unsigned integer)
Long_Type (long integer)
ULong_Type (unsigned long integer)
LLong_Type (long long integer)
ULLong_Type (unsigned long long integer)
Float_Type (single precision real)
Double_Type (double precision real)
Complex_Type (complex numbers)
String_Type (strings, C strings)
BString_Type (binary strings)
Struct_Type (structures)
Ref_Type (references)
Null_Type (NULL)
Array_Type (arrays)
Assoc_Type (associative arrays/hashes)
List_Type (lists)
DataType_Type (data types)
as well as the names of any other types that an application
defines.
The built-in function typeof returns the data type of
its argument, i.e., a DataType_Type. For instance
typeof(7) returns Integer_Type and typeof(Integer_Type)
returns DataType_Type. One can use this
function as in the following example:
if (Integer_Type == typeof (x)) message ("x is an integer");
The literals of DataType_Type have other uses as well. One
of the most common uses of these literals is to create arrays, e.g.,
x = Complex_Type [100];
creates an array of 100 complex numbers and assigns it to x.
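The result can be inspected as in the sketch below. It uses the intrinsic _typeof, which for an array reports the element type, whereas typeof reports the type of the container itself.

```slang
variable x = Complex_Type [100];

vmessage ("%d elements of type %S", length (x), _typeof (x));
vmessage ("typeof (x) is %S", typeof (x));
```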
Strictly speaking, S-Lang has no separate boolean type; rather it
represents boolean values as Char_Type objects. In
particular, boolean FALSE is equivalent to the Char_Type value 0,
and TRUE to any non-zero Char_Type value. Since the
exact value of TRUE is unspecified, it is unnecessary and even
pointless to define TRUE and FALSE literals in S-Lang.
Occasionally, it is necessary to convert from one data type to
another. For example, if you need to print an object as a string,
it may be necessary to convert it to a String_Type. The
typecast function may be used to perform such conversions.
For example, consider
variable x = 10, y;
y = typecast (x, Double_Type);
After execution of these statements, x will have the integer
value 10 and y will have the double precision floating
point value 10.0. If the object to be converted is an
array, the typecast function will act upon all elements of
the array. For example,
x = [1:10]; % Array of integers
y = typecast (x, Double_Type);
will create an array of 10 double precision values and assign it to
y. One should also realize that it is not always possible
to perform a typecast. For example, any attempt to convert an
Integer_Type to a Null_Type will result in a
run-time error. Typecasting works only when datatypes are similar.
Often the interpreter will perform implicit type conversions as necessary
to complete calculations. For example, when multiplying an
Integer_Type with a Double_Type, it will convert the
Integer_Type to a Double_Type for the purpose of the
calculation. Thus, the example involving the conversion of an
array of integers to an array of doubles could have been performed
by multiplication by 1.0, i.e.,
x = [1:10]; % Array of integers
y = 1.0 * x;
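That the explicit and implicit conversions agree can be checked with a short sketch such as the following. The variable names are illustrative, and _typeof is used to report the element type of each array.

```slang
variable x = [1:10];                      % array of integers
variable y = typecast (x, Double_Type);   % explicit conversion
variable z = 1.0 * x;                     % implicit conversion

vmessage ("%S %S %S", _typeof (x), _typeof (y), _typeof (z));

% where returns the indices at which the two arrays differ.
if (0 == length (where (y != z)))
  message ("both conversions give the same values");
```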
The string intrinsic function should be used whenever a
string representation is needed. Using the typecast function
for this purpose will usually fail unless the object to be
converted is similar to a string; most are not. Moreover, when
typecasting an array to String_Type, the typecast
function acts on each element of the array to produce another
array, whereas the string function will produce a string.
One use of the string function is to print the value of an
object. This use is illustrated in the following simple example:
define print_object (x)
{
   message (string (x));
}
Here, the message function has been used because it writes a
string to the display. If the string function was not used
and the message function was passed an integer, a
type-mismatch error would have resulted.