S-Lang provides built-in support for two different I/O facilities.
The simpler interface is modeled upon the C language stdio
interface and consists of functions such as fopen, fgets, etc.
The other interface is modeled on a lower level POSIX interface
consisting of functions such as open, read, etc. In addition to
permitting more control, the lower level interface permits one to
access network objects as well as disk files.
When reading data formatted in text files, e.g., columns of numbers,
do not overlook the high-level routines in the slsh library. In
particular, the readascii function is quite flexible and can
read data from text files that are formatted in a variety of ways.
For data stored in a standard binary format such as HDF or FITS,
the corresponding modules should be used.
The stdio interface consists of the following functions:
fopen: opens a file for reading or writing.
fclose: closes a file opened by fopen.
fgets: reads a line from a file.
fputs: writes text to a file.
fprintf: writes formatted text to a file.
fwrite: writes one or more objects to a file.
fread: reads a specified number of objects from a file.
fread_bytes: reads a specified number of bytes from a file and returns them as a string.
feof: tests if a file pointer is at the end of the file.
ferror: tests whether or not the stream associated with a file has an error.
clearerr: clears the end-of-file and error indicators for a stream.
fflush: forces all buffered data associated with a stream to be written out.
ftell: queries the file position indicator of the stream.
fseek: sets the file position indicator of the stream.
fgetslines: reads all the lines from a text file and returns them as an array of strings.
In addition, the interface supports the popen and pclose
functions on systems where the corresponding C functions are available.
Before reading or writing to a file, it must first be opened using
the fopen function. The only exceptions to this rule involve
use of the pre-opened streams: stdin, stdout, and stderr. fopen
accepts two arguments: a file name and a string argument that
indicates how the file is to be opened, e.g., for reading, writing,
update, etc. It returns a File_Type stream object that is used as
an argument to all other functions of the stdio interface. Upon
failure, it returns NULL. See the reference manual for more
information about fopen.
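As a minimal sketch of the write side of the interface, the following
function opens a file for writing and checks each return value. The
function name write_report and the text it writes are hypothetical and
serve only as illustration; the sketch assumes that fputs, fprintf, and
fclose each return -1 upon failure.

```slang
% write_report is a hypothetical name used only for this sketch.
define write_report (file, nlines)
{
   variable fp = fopen (file, "w");   % "w" creates or truncates the file
   if (fp == NULL)
     throw OpenError, "Unable to open $file for writing"$;

   % Each call below is assumed to return -1 upon failure.
   if (-1 == fputs ("Report\n", fp))
     throw WriteError, "fputs failed on $file"$;

   if (-1 == fprintf (fp, "%d lines counted\n", nlines))
     throw WriteError, "fprintf failed on $file"$;

   if (-1 == fclose (fp))
     throw WriteError, "fclose failed on $file"$;
}
```

Checking the value returned by fclose matters more for files opened
for writing, since buffered data may only reach the disk at that point.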
In this section, some simple examples of the use of the stdio interface are presented. It is important to realize that all the functions of the interface return something, and that return value must be handled in some way by the caller.
The first example involves writing a function to count the number of lines in a text file. To do this, we shall read in the lines, one by one, and count them:
define count_lines_in_file (file)
{
   variable fp, line, count;

   fp = fopen (file, "r");   % Open the file for reading
   if (fp == NULL)
     throw OpenError, "$file failed to open"$;

   count = 0;
   while (-1 != fgets (&line, fp))
     count++;

   () = fclose (fp);
   return count;
}
Note that &line was passed to the fgets function. When fgets
returns, line will contain the line of text read in from the file.
Also note how the return value from fclose was handled (discarded
in this case).
Although the preceding example closed the file via fclose, there is
no need to explicitly close a file because the interpreter will
automatically close a file when it is no longer referenced. Since
the only variable to reference the file is fp, it would have
automatically been closed when the function returned.
Suppose that it is desired to count the number of characters in the
file instead of the number of lines. To do this, the while
loop could be modified to count the characters as follows:
while (-1 != fgets (&line, fp))
  count += strlen (line);
The main difficulty with this approach is that it will not work for
binary files, i.e., files that contain null characters. For such
files, the file should be opened in binary mode via
fp = fopen (file, "rb");
and then the data read using the fread function:
while (-1 != fread (&line, Char_Type, 1024, fp))
  count += length (line);
The fread function requires two additional arguments: the type
of object to read (Char_Type in this case), and the number of
such objects to be read. The function returns the number of objects
actually read, or -1 upon failure. The objects themselves are
assigned to the referenced variable as an array of the specified type.
Sometimes it is more convenient to obtain the data from a file in the form
of a character string instead of an array of characters. The
fread_bytes function may be used in such situations. Using
this function, the equivalent of the above loop is
while (-1 != fread_bytes (&line, 1024, fp))
  count += bstrlen (line);
The foreach construct also works with File_Type objects.
For example, the number of characters in a file may be counted via
foreach ch (fp) using ("char")
  count++;
Similarly, one can count both the lines and the characters using:
foreach line (fp) using ("line")
  {
     num_lines++;
     count += strlen (line);
  }
Often one is not interested in trailing whitespace in the lines of a
file. To have trailing whitespace automatically stripped from the
lines as they are read in, use the "wsline" form, e.g.,
foreach line (fp) using ("wsline")
{
.
.
}
Finally, it should be mentioned that none of these examples should
be used to count the number of bytes in a file when that
information is more readily accessible by another means. For
example, it is preferable to get this information via the
stat_file function:
define count_chars_in_file (file)
{
   variable st;

   st = stat_file (file);
   if (st == NULL)
     throw IOError, "stat_file failed";
   return st.st_size;
}
The previous examples illustrate how to read and write objects of a single data-type from a file, e.g.,
num = fread (&a, Double_Type, 20, fp);
would result in a Double_Type[num] array being assigned to a
if successful. However, suppose that the binary data file
consists of numbers in a specified byte-order. How can one read
such objects with the proper byte swapping? The answer is to use
the fread_bytes function to read the objects as a (binary)
character string and then unpack the resulting string into the
specified data type, or types. This process is facilitated using
the pack and unpack functions.
The pack function follows the syntax
BString_Type pack (format-string, item-list);
and combines the objects in the item-list according to
format-string into a binary string and returns the result.
Likewise, the unpack function may be used to convert a binary
string into separate data objects:
(variable-list) = unpack (format-string, binary-string);
The format string consists of one or more data-type specification characters, and each may be followed by an optional decimal length specifier. Specifically, the data-types are specified according to the following table:
c     char
C     unsigned char
h     short
H     unsigned short
i     int
I     unsigned int
l     long
L     unsigned long
j     16 bit int
J     16 bit unsigned int
k     32 bit int
K     32 bit unsigned int
f     float
d     double
F     32 bit float
D     64 bit float
s     character string, null padded
S     character string, space padded
z     character string, null padded
x     a null pad character
A decimal length specifier may follow the data-type specifier. With
the exception of the s and S specifiers, the length
specifier indicates how many objects of that data type are to be
packed or unpacked from the string. When used with the s or
S specifiers, it indicates the field width to be used. If the
length specifier is not present, the length defaults to one.
With the exception of c, C, s, S, z, and x, each of these may be
prefixed by a character that indicates the byte-order of the object:
> big-endian order (network order)
< little-endian order
= native byte-order
The default is to use the native byte order.
Here are a few examples that should make this more clear:
a = pack ("cc", 'A', 'B'); % ==> a = "AB";
a = pack ("c2", 'A', 'B'); % ==> a = "AB";
a = pack ("xxcxxc", 'A', 'B'); % ==> a = "\0\0A\0\0B";
a = pack ("h2", 'A', 'B'); % ==> a = "\0A\0B" or "A\0B\0"
a = pack (">h2", 'A', 'B'); % ==> a = "\0A\0B"
a = pack ("<h2", 'A', 'B'); % ==> a = "A\0B\0"
a = pack ("s4", "AB", "CD"); % ==> a = "AB\0\0"
a = pack ("s4s2", "AB", "CD"); % ==> a = "AB\0\0CD"
a = pack ("S4", "AB", "CD"); % ==> a = "AB  "
a = pack ("S4S2", "AB", "CD"); % ==> a = "AB  CD"
When unpacking, if the length specifier is greater than one, then an
array of that length will be returned. In addition, trailing
whitespace and null characters are stripped when unpacking an object
given by the S specifier. Here are a few examples:
(x,y) = unpack ("cc", "AB"); % ==> x = 'A', y = 'B'
x = unpack ("c2", "AB"); % ==> x = ['A', 'B']
x = unpack ("x<H", "\0\xAB\xCD"); % ==> x = 0xCDABuh
x = unpack ("xxs4", "a b c\0d e f"); % ==> x = "b c\0"
x = unpack ("xxS4", "a b c\0d e f"); % ==> x = "b c"
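The effect of the byte-order prefixes can also be seen with a round
trip through pack and unpack. The following is a small sketch, not
taken from the reference manual; the value 0x01020304 was chosen only
because each of its bytes is distinct.

```slang
variable s, x, y;

s = pack (">k", 0x01020304);   % 32 bit int, most significant byte first
x = unpack (">k", s);          % read back with the same byte-order
y = unpack ("<k", s);          % read back with the opposite byte-order

% x recovers the original value; y sees the four bytes reversed.
if ((bstrlen (s) != 4) or (x != 0x01020304) or (y != 0x04030201))
  throw RunTimeError, "unexpected pack/unpack result";
```

Because the format names the byte-order explicitly, the outcome should
not depend on the byte-order of the machine running the code.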
Consider the task of reading the Unix system file /var/log/utmp,
which contains login records about who logged onto the system. This
file format is documented in section 5 of the online Unix man pages,
and consists of a sequence of entries formatted according to the C
structure utmp defined in the utmp.h C header file. The actual
details of the structure may vary from one version of Unix to
another. For the purposes of this example, consider its definition
under the Linux operating system running on an Intel 32 bit processor:
struct utmp {
   short ut_type;      /* type of login */
   pid_t ut_pid;       /* pid of process */
   char ut_line[12];   /* device name of tty - "/dev/" */
   char ut_id[2];      /* init id or abbrev. ttyname */
   time_t ut_time;     /* login time */
   char ut_user[8];    /* user name */
   char ut_host[16];   /* host name for remote login */
   long ut_addr;       /* IP addr of remote host */
};
On this system, pid_t is defined to be an int and time_t is a
long. Hence, a format specifier for the pack and unpack
functions is easily constructed to be:
"h i S12 S2 l S8 S16 l"
However, this particular definition is naive because it does not
allow for structure padding performed by the C compiler in order to
align the data types on suitable word boundaries. Fortunately, the
intrinsic function pad_pack_format may be used to modify a
format by adding the correct amount of padding in the right places.
In fact, pad_pack_format applied to the above format on an
Intel-based Linux system produces the result:
"h x2 i S12 S2 x2 l S8 S16 l"
Here we see that 4 bytes of padding were added.
The other missing piece of information is the size of the structure.
This is useful because we would like to read in one structure at a
time using the fread function. Knowing the size of the
various data types makes this easy; however it is even easier to use
the sizeof_pack intrinsic function, which returns the size (in
bytes) of the structure described by the pack format.
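One way to see what the padding costs is to print both formats
together with their sizes. This is only a sketch: the variable names
naive and padded are illustrative, and the sizes reported will depend
upon the platform, so no particular output is shown.

```slang
variable naive = "h i S12 S2 l S8 S16 l";
variable padded = pad_pack_format (naive);   % adds "x" pad bytes as needed

% %S formats any object, so it works whatever integer type is returned.
vmessage ("naive:  \"%s\" occupies %S bytes", naive, sizeof_pack (naive));
vmessage ("padded: \"%s\" occupies %S bytes", padded, sizeof_pack (padded));
```

The difference between the two sizes is the number of pad bytes that
the C compiler is expected to have inserted into the structure.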
So, with all the pieces in place, it is rather straightforward to write the code:
variable format, size, fp, buf;

typedef struct
{
   ut_type, ut_pid, ut_line, ut_id,
   ut_time, ut_user, ut_host, ut_addr
} UTMP_Type;

format = pad_pack_format ("h i S12 S2 l S8 S16 l");
size = sizeof_pack (format);

define print_utmp (u)
{
   () = fprintf (stdout, "%-16s %-12s %-16s %s\n",
                 u.ut_user, u.ut_line, u.ut_host, ctime (u.ut_time));
}

fp = fopen ("/var/log/utmp", "rb");
if (fp == NULL)
  throw OpenError, "Unable to open utmp file";

() = fprintf (stdout, "%-16s %-12s %-16s %s\n",
              "USER", "TTY", "FROM", "LOGIN@");

variable U = @UTMP_Type;
while (-1 != fread (&buf, Char_Type, size, fp))
  {
     set_struct_fields (U, unpack (format, buf));
     print_utmp (U);
  }
() = fclose (fp);
A few comments about this example are in order. First of all, note
that a new data type called UTMP_Type was created, although
this was not really necessary. The file was opened in binary mode,
but this too was optional because, for example, on a Unix system
there is no distinction between binary and text modes. The
print_utmp function does not print all of the structure
fields. Finally, the return values from fprintf and fclose
were handled by discarding them.
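The loop in the example assumes that every read delivers a complete
record. A more defensive variant, sketched below, reads each record
with fread_bytes and refuses a truncated one. The function name
read_records and its callback argument are hypothetical and not part
of the library; the sketch relies on fread_bytes returning the number
of bytes read, or -1 upon failure.

```slang
% read_records is a hypothetical helper, shown only as a sketch.
% It calls the function referenced by callback once per record,
% passing the unpacked fields as arguments, and returns the number
% of records processed.
define read_records (fp, format, callback)
{
   variable size = sizeof_pack (format);
   variable buf, n, nrecords = 0;

   forever
     {
        n = fread_bytes (&buf, size, fp);
        if (n == -1)
          break;                   % nothing more could be read
        if (n != size)
          throw ReadError, "truncated record";

        (@callback) (unpack (format, buf));
        nrecords++;
     }
   return nrecords;
}
```

For the utmp file, the callback would be a function taking the eight
unpacked fields in the order in which they appear in the format.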