- Subject: [jed-users] Re: Concatenating arrays
- From: Morten Bo Johansen <mbj@xxxxxxxxx>
- Date: Mon, 15 Jun 2020 11:44:35 +0200
John E. Davis <jed@xxxxxxxxxxx> wrote:
> Morten Bo Johansen <mbj@xxxxxxxxx> wrote:
> > Concatenating arrays can be quite slow. In casu, I want to
> > concatenate arrays with file contents obtained with fgetslines
> > () - which returns the file as an array of lines. I wonder if
> > converting these arrays to lists and then concatenate them
> > would be more efficient?
> >
> > But if so, I am looking for an "array_to_list" function. There is
> > one such function in datutils.sl on jedmodes, but it fails with
> > a buffer overflow error.
>
> My feeling is that array_to_list would result in a performace hit that
> would defeat the purpose of using it.
>
> > To describe my problem in detail, then I want to hash a
> > spellchecking wordlist of ~500.000 words, each on a line by
> > themselves, to complete from but also include other files to
> > complete from, e.g. in markup modes. Obtaining the array (a) of
> > the 500.000 lines with fgetslines and hashing them takes about
> > 1.5 seconds, but if I add just a tiny array (b) to (a) like in
> > c = [a, b], then the execution time more than doubles.
>
> How do you intend to use the result? For example, could your code use
> a list of arrays? That is, instead of c = [a, b], could you code use
> c = {a, b}, which is a list of 2 arrays?
Hi all,
John provided some assistance to me with the above problem,
off-list. So to conclude this thread properly - dangling
threads are not good for list morale ;) - I will give you
a run-down of our discussion.
To recapitulate, my goal was to concatenate arrays of lines
from different files and hash the concatenated array for use
with a word completion function.
As it turned out the concatenation of arrays - c = (a, b, ..) -
was only very partially the problem in the slow execution time,
even if concatenating or joining lists is still much faster.
By using the slang profiler, slprof, which is part of slsh,
many hotspots in my code could be pinpointed and improved for
better speed. Using tic/toc does not provide the same accuracy
as slprof, as it does not compensate for additional overhead.
I attach here two test files, "test0.sl" and "test1.sl". The
former is my original code and the latter is my code improved
by John following the analysis from slprof's output. I also
attach the outputs, "prof-test0.dat" and "prof-test1.dat"
from the application of slprof on the two test files.
You can run slprof like this:
slprof -l -o <out.dat> <infile.sl>
The input files to the "make_completions_hash()" function was a
large text file of 471.000 lines and a couple of smaller files.
From the data of the two profiles, one can readily see that in
my original code I ran the "strchop2()" function ~471.000
times, one for each line in the input files even if it was only
needed 101 times. That shaved off 2 seconds.
The "read_file_lines()" function in test1.sl was an alternative
to "fgetslines()" that stripped the "\n" characters so I could
avoid running strtrim() on all the lines, which further reduced
the execution time.
In a line like this
lines = lines [where (0 != strncmp ("#", lines, 1))];
I always created a new array, even if some files did not
contain any comments ("#").
As you can see the execution time was brought down quite
remarkably - from 6.5324 secs to 0.5960 secs.
You might of course attribute the fact that the code could be
improved so much to my rather modest skill level in slang
programming, so the real object of this post probably lies more
in drawing attention to the slang profiler. The profiler is not
yet incorporated into jed, but the test files show some
examples of using it with jed specific code. I hope that
someone might find it useful.
Morten
% Use a string delimiter instead of a character delimiter to chop up a
% string - from a JED post on jed-users
private define strchop2 (str, delim)
{
variable list = {};
variable len = strbytelen (delim);
variable i0 = 1, i;
while (i = is_substrbytes (str, delim, i0), i != 0)
{
list_append (list, substrbytes (str, i0, i-i0));
i0 = i + len;
}
list_append (list, substrbytes (str, i0, -1));
return list_to_array (list);
}
#ifexists SLSH_PATH_ENV
private define flush (x)
{
message (x);
}
#endif
private variable F = NULL;
private variable Words;
%% Create the hash of the completions file to complete from
private define make_completions_hash (Completions_File)
{
#ifnexists SLSH_PATH_ENV
% In slang_mode, if no completions file, then generate one
if ((0 == file_status (Completions_File)) && ("slang" == detect_mode ()))
{
variable funs = _apropos ("Global", ".", 15);
Words = funs[array_sort (funs)];
() = write_string_to_file (strjoin (Words, "\n"), Completions_File);
}
#endif
variable
fp = fopen (Completions_File, "r"),
include_files = [""],
line_array = [""],
inc_lines = [""],
val_array = [""],
line_arr = [""],
lines = [""],
include_file = "",
all_lines = {},
line_arrays,
line = "",
key = "";
if (fp == NULL)
throw OpenError, "could not open completions file, $Completions_File"$;
if (length (F)) % purge a possibly existing hash
F = Assoc_Type[Array_Type];
% tic ();
flush ("hashing $Completions_File ..."$);
lines = fgetslines (fp);
() = fclose (fp);
include_files = lines [where (0 == strncmp ("#INC", lines, 4))]; % include other files
if (length (include_files))
{
foreach include_file (include_files)
{
include_file = strtok (include_file)[1];
include_file = strtrim (include_file, "\"");
fp = fopen (include_file, "r");
if (fp == NULL)
{
flush ("Could not open \"$include_file\""$);
sleep (2);
continue;
}
inc_lines = fgetslines (fp);
lines = [lines, inc_lines];
() = fclose (fp);
}
}
lines = strtrim (lines);
lines = lines [where (0 != strncmp ("#", lines, 1))];
if (any (is_substr (lines, "::")))
{
line_arrays = array_map (Array_Type, &strchop2, lines, "::");
foreach line_array (line_arrays)
{
line_array = strtrim (line_array);
key = line_array[0];
val_array = line_array[[1:]];
F[key] = val_array;
}
Words = assoc_get_keys (F);
}
else
Words = lines;
#ifnexists SLSH_PATH_ENV
define_blocal_var ("Words", Words);
#endif
flush ("completing from $Completions_File ..."$);
% vmessage ("%f", toc ());
}
define slsh_main ()
{
make_completions_hash (".completions_text");
% Note: Words contains all of the words defined in the input files
% F contains only those words with non-empty values.
% But, F has been defined such that F[ANYTHING] will produce
% an array.
vmessage ("Length of Words: %u, length of F: %u",
length(Words), length(F));
}
% Use a string delimiter instead of a character delimiter to chop up a
% string - from a JED post on jed-users
private define strchop2 (str, delim)
{
variable list = {};
variable len = strbytelen (delim);
variable i0 = 1, i;
while (i = is_substrbytes (str, delim, i0), i != 0)
{
list_append (list, substrbytes (str, i0, i-i0));
i0 = i + len;
}
list_append (list, substrbytes (str, i0, -1));
return list_to_array (list);
}
#ifexists SLSH_PATH_ENV
private define flush (x)
{
message (x);
}
#endif
private define read_file_lines (file)
{
variable st = stat_file (file);
if (st == NULL) return NULL;
variable bytes;
() = fread_bytes (&bytes, st.st_size, fopen (file, "r"));
return strtok (bytes, "\n"); % use strtok remove sequences of \n chars
}
private variable F = NULL;
private variable Words;
%% Create the hash of the completions file to complete from
private define make_completions_hash (Completions_File)
{
#ifnexists SLSH_PATH_ENV
% In slang_mode, if no completions file, then generate one
if ((0 == file_status (Completions_File)) && ("slang" == detect_mode ()))
{
variable funs = _apropos ("Global", ".", 15);
Words = funs[array_sort (funs)];
() = write_string_to_file (strjoin (Words, "\n"), Completions_File);
}
#endif
variable
include_files,
inc_lines,
lines,
include_file,
i, j;
if ((F == NULL) || length (F)) % purge a possibly existing hash
F = Assoc_Type[Array_Type, String_Type[0]];
% tic ();
flush ("hashing $Completions_File ..."$);
lines = read_file_lines (Completions_File);
if (lines == NULL)
throw OpenError, "could not open completions file, $Completions_File"$;
include_files = lines [where (0 == strncmp ("#INC", lines, 4), &i)]; % include other files
lines = lines[i]; % don't include #INC lines
variable lines_list = {lines};
foreach include_file (include_files)
{
include_file = strtok (include_file)[1];
include_file = strtrim (include_file, "\"");
lines = read_file_lines (include_file);
if (lines == NULL)
{
flush ("Could not open \"$include_file\""$);
sleep (2);
continue;
}
i = where (strncmp (lines, "#", 1));
if (length(i) != length(lines)) lines = lines [i];
list_append (lines_list, lines);
}
lines = [__push_list(lines_list)];
% lines = strtrim (lines);
foreach i (where (is_substr (lines, "::"), &j))
{
variable line_array = strtrim (strchop2 (lines[i], "::"));
variable key = line_array[0];
F[key] = line_array[[1:]];
lines[i] = key;
}
Words = lines;
#ifnexists SLSH_PATH_ENV
define_blocal_var ("Words", Words);
#endif
flush ("completing from $Completions_File ..."$);
% vmessage ("%f", toc ());
}
define slsh_main ()
{
tic();
make_completions_hash (".completions_text");
% Note: Words contains all of the words defined in the input files
% F contains only those words with non-empty values.
% But, F has been defined such that F[ANYTHING] will produce
% an array.
vmessage ("T=%S, Length of Words: %u, length of F: %u", toc(),
length(Words), length(F));
% print(Words[array_sort(Words)], "/tmp/4.dat");
}
# slprof profile report for:
# /home/mojo/bin/slprof -l -o prof-test0.dat ./test0.sl
#----------------------------------------------------------------
# Function Call Profile Report
#----------------------------------------------------------------
#function ncalls ms/call totalselfms totalsecs Function File
make_completions_hash 1 6531.9493 4501.3176 6.5319 471898 1887625 /home/mojo/./test0.sl
strchop2 471896 0.0043 2030.6072 2.0306 0 2831979 /home/mojo/./test0.sl
slsh_main 1 6532.4024 0.4531 6.5324 1 2 /home/mojo/./test0.sl
flush 2 0.0123 0.0246 0.0000 0 2 /home/mojo/./test0.sl
#----------------------------------------------------------------
# Line by Line Profile Report
#----------------------------------------------------------------
#ncalls ms/call totalselfms totalsecs Fcalls Scalls File:line
1 631.06179 631.06179 0.63106 0 0 /home/mojo/./test0.sl:89
2 268.62279 537.24558 0.53725 0 0 /home/mojo/./test0.sl:84
471896 0.00099 466.88256 0.46688 0 0 /home/mojo/./test0.sl:97
1 391.84379 391.84379 0.39184 0 0 /home/mojo/./test0.sl:90
471896 0.00080 379.15456 0.37915 0 0 /home/mojo/./test0.sl:99
472097 0.00079 372.81917 0.37282 0 0 /home/mojo/./test0.sl:9
471896 0.00078 369.86056 0.36986 0 0 /home/mojo/./test0.sl:15
1 2386.15895 355.55177 2.38616 471896 2831979 /home/mojo/./test0.sl:94
2 172.14379 344.28758 0.34429 0 0 /home/mojo/./test0.sl:83
471896 0.00052 244.31156 0.24431 0 0 /home/mojo/./test0.sl:16
471896 0.00047 219.77956 0.21978 0 0 /home/mojo/./test0.sl:100
1 142.46779 142.46779 0.14247 0 0 /home/mojo/./test0.sl:103
471896 0.00019 88.75056 0.08875 0 0 /home/mojo/./test0.sl:98
471896 0.00015 70.91356 0.07091 0 0 /home/mojo/./test0.sl:6
471896 0.00012 56.07456 0.05607 0 0 /home/mojo/./test0.sl:5
471896 0.00005 22.29756 0.02230 0 0 /home/mojo/./test0.sl:7
1 5.83379 5.83379 0.00583 0 0 /home/mojo/./test0.sl:92
1 0.44979 0.44979 0.00045 0 0 /home/mojo/./test0.sl:124
201 0.00080 0.16161 0.00016 0 0 /home/mojo/./test0.sl:11
2 0.01229 0.02458 0.00002 0 0 /home/mojo/./test0.sl:22
1 0.02379 0.02379 0.00002 0 0 /home/mojo/./test0.sl:64
201 0.00011 0.02161 0.00002 0 0 /home/mojo/./test0.sl:12
2 0.00779 0.01558 0.00002 0 0 /home/mojo/./test0.sl:74
2 0.00729 0.01458 0.00001 0 0 /home/mojo/./test0.sl:85
1 0.03007 0.01429 0.00003 1 1 /home/mojo/./test0.sl:112
2 0.00479 0.00958 0.00001 0 0 /home/mojo/./test0.sl:72
1 0.00879 0.00879 0.00001 0 0 /home/mojo/./test0.sl:66
1 0.00679 0.00679 0.00001 0 0 /home/mojo/./test0.sl:60
1 0.00579 0.00579 0.00001 0 0 /home/mojo/./test0.sl:43
2 0.00229 0.00458 0.00000 0 0 /home/mojo/./test0.sl:73
1 0.01307 0.00429 0.00001 1 1 /home/mojo/./test0.sl:63
1 0.00279 0.00279 0.00000 0 0 /home/mojo/./test0.sl:65
1 0.00179 0.00179 0.00000 0 0 /home/mojo/./test0.sl:44
1 0.00079 0.00079 0.00000 0 0 /home/mojo/./test0.sl:95
1 0.00079 0.00079 0.00000 0 0 /home/mojo/./test0.sl:51
1 0.00079 0.00079 0.00000 0 0 /home/mojo/./test0.sl:56
1 0.00079 0.00079 0.00000 0 0 /home/mojo/./test0.sl:54
1 0.00079 0.00079 0.00000 0 0 /home/mojo/./test0.sl:68
1 0.00079 0.00079 0.00000 0 0 /home/mojo/./test0.sl:49
1 0.00079 0.00079 0.00000 0 0 /home/mojo/./test0.sl:45
1 0.00079 0.00079 0.00000 0 0 /home/mojo/./test0.sl:46
1 0.00079 0.00079 0.00000 0 0 /home/mojo/./test0.sl:26
2 0.00029 0.00058 0.00000 0 0 /home/mojo/./test0.sl:76
1 6531.94961 0.00029 6.53195 471899 4719606 /home/mojo/./test0.sl:118
1 -0.00021 -0.00021 -0.00000 0 0 /home/mojo/./test0.sl:70
1 -0.00021 -0.00021 -0.00000 0 0 /home/mojo/./test0.sl:50
1 -0.00021 -0.00021 -0.00000 0 0 /home/mojo/./test0.sl:47
1 -0.00021 -0.00021 -0.00000 0 0 /home/mojo/./test0.sl:48
1 -0.00021 -0.00021 -0.00000 0 0 /home/mojo/./test0.sl:53
1 -0.00021 -0.00021 -0.00000 0 0 /home/mojo/./test0.sl:59
# slprof profile report for:
# /home/mojo/bin/slprof -l -o prof-test1.dat ./test1.sl
#----------------------------------------------------------------
# Function Call Profile Report
#----------------------------------------------------------------
#function ncalls ms/call totalselfms totalsecs Function File
make_completions_hash 1 596.2284 400.5660 0.5962 106 431 /home/mojo/./test1.sl
read_file_lines 3 64.8555 194.5666 0.1946 0 12 /home/mojo/./test1.sl
strchop2 101 0.0105 1.0613 0.0011 0 1209 /home/mojo/./test1.sl
flush 2 0.0173 0.0346 0.0000 0 2 /home/mojo/./test1.sl
slsh_main 1 595.9904 -0.2380 0.5960 1 3 /home/mojo/./test1.sl
#----------------------------------------------------------------
# Line by Line Profile Report
#----------------------------------------------------------------
#ncalls ms/call totalselfms totalsecs Fcalls Scalls File:line
3 64.25180 192.75539 0.19276 0 0 /home/mojo/./test1.sl:32
1 179.83680 179.83680 0.17984 0 0 /home/mojo/./test1.sl:89
2 14.14130 28.28260 0.02828 0 0 /home/mojo/./test1.sl:83
1 11.18180 11.18180 0.01118 0 0 /home/mojo/./test1.sl:92
3 0.55046 1.65139 0.00165 0 0 /home/mojo/./test1.sl:31
101 0.01338 0.28997 0.00135 101 1209 /home/mojo/./test1.sl:94
302 0.00087 0.26387 0.00026 0 0 /home/mojo/./test1.sl:9
201 0.00088 0.17731 0.00018 0 0 /home/mojo/./test1.sl:11
101 0.00113 0.11456 0.00011 0 0 /home/mojo/./test1.sl:96
101 0.00112 0.11356 0.00011 0 0 /home/mojo/./test1.sl:16
101 0.00101 0.10156 0.00010 0 0 /home/mojo/./test1.sl:15
3 0.01246 0.03739 0.00004 0 0 /home/mojo/./test1.sl:28
101 0.00033 0.03356 0.00003 0 0 /home/mojo/./test1.sl:97
1 0.03180 0.03180 0.00003 0 0 /home/mojo/./test1.sl:119
101 0.00027 0.02756 0.00003 0 0 /home/mojo/./test1.sl:95
101 0.00024 0.02456 0.00002 0 0 /home/mojo/./test1.sl:5
201 0.00012 0.02331 0.00002 0 0 /home/mojo/./test1.sl:12
2 0.00930 0.01860 0.00002 0 0 /home/mojo/./test1.sl:22
1 0.04115 0.01636 0.00004 1 1 /home/mojo/./test1.sl:105
101 0.00014 0.01456 0.00001 0 0 /home/mojo/./test1.sl:6
1 0.00880 0.00880 0.00001 0 0 /home/mojo/./test1.sl:67
1 0.00880 0.00880 0.00001 0 0 /home/mojo/./test1.sl:59
2 0.00380 0.00760 0.00001 0 0 /home/mojo/./test1.sl:73
1 0.01515 0.00536 0.00002 1 1 /home/mojo/./test1.sl:62
2 0.00230 0.00460 0.00000 0 0 /home/mojo/./test1.sl:74
101 0.00005 0.00456 0.00000 0 0 /home/mojo/./test1.sl:7
2 97.25805 0.00371 0.19452 2 8 /home/mojo/./test1.sl:75
2 0.00130 0.00260 0.00000 0 0 /home/mojo/./test1.sl:86
3 0.00080 0.00239 0.00000 0 0 /home/mojo/./test1.sl:29
1 0.00180 0.00180 0.00000 0 0 /home/mojo/./test1.sl:35
2 0.00080 0.00160 0.00000 0 0 /home/mojo/./test1.sl:76
1 0.00080 0.00080 0.00000 0 0 /home/mojo/./test1.sl:100
1 0.00080 0.00080 0.00000 0 0 /home/mojo/./test1.sl:58
1 0.00080 0.00080 0.00000 0 0 /home/mojo/./test1.sl:70
1 0.00080 0.00080 0.00000 0 0 /home/mojo/./test1.sl:64
1 0.00080 0.00080 0.00000 0 0 /home/mojo/./test1.sl:68
2 0.00030 0.00060 0.00000 0 0 /home/mojo/./test1.sl:84
1 596.22877 0.00036 0.59623 107 1654 /home/mojo/./test1.sl:113
1 0.05455 0.00036 0.00005 1 4 /home/mojo/./test1.sl:63
1 -0.00020 -0.00020 -0.00000 0 0 /home/mojo/./test1.sl:71
1 -0.27220 -0.27220 -0.00027 0 0 /home/mojo/./test1.sl:111
[2020 date index]
[2020 thread index]
[Thread Prev] [Thread Next]
[Date Prev] [Date Next]