slang-users mailing list


[slang-users] PCRE and word boundary matching with non-US-ascii chars

Subject: [slang-users] PCRE and word boundary matching with non-US-ascii chars
From: Morten Bo Johansen <mbj@xxxxxxxxxxx>
Date: Sun, 3 Jan 2016 14:25:43 +0100

Hi,

I have this PCRE search-and-replace function, shown in the
little test script below, but it fails: the 'ä' is not
replaced. With US-ASCII characters the word-boundary matching
works. I have tried giving the options PCRE_UTF8 and PCRE_UCP
(for the latter I patched the pcre module and compiled it with
this option enabled), but it makes no difference. GNU sed has no
problem, and it is compiled against the same version of libpcre3
as the slang pcre module.

Is there something wrong with my pcre_replace function, or am I
overlooking something else?

Thanks,
Morten

                     ----------------------

  import ("pcre");
  
  % Replace every match of the PCRE pattern PAT in STR by REP.
  private define pcre_replace (str, pat, rep)
  {
     variable begstr = "", endstr = "", match, pos = 0;
     
     % pat = pcre_compile (pat, PCRE_UCP|PCRE_UTF8);
     pat = pcre_compile (pat);
  
     while (pcre_exec (pat, str, pos))
       {
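           % match[0] is the byte offset where the match starts,
           % match[1] the byte offset just past its end.
           match = pcre_nth_match (pat, 0);
           % Splice by bytes (substrbytes is 1-based; -1 = rest of string),
           % which also works when the match starts at offset 0.
           begstr = substrbytes (str, 1, match[0]);
           endstr = substrbytes (str, match[1] + 1, -1);
           str = strcat (begstr, rep, endstr);
           % resume searching just past the inserted replacement
           pos = match[0] + strbytelen (rep);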
       }
     
     return str;
  }
  
  define slsh_main ()
  {
     variable str = pcre_replace ("a ä a", "\\bä\\b", "a");
     () = printf ("%s\n", str);
  } 
_______________________________________________
For list information, visit <http://jedsoft.org/slang/mailinglists.html>.

Follow-Ups:
- Re: [slang-users] PCRE and word boundary matching with non-US-ascii chars
  - From: John E. Davis
