Erm... the problem with this code is that it is specific to UTF-8
only... but there are other multibyte encodings which are still in
common use, like ja_JP.PCK/ShiftJIS on Solaris/Linux (both are used
and mandated by government customers) and GB18030, which is a) "modern"
(i.e. not "legacy", as some people call the EUC encodings) and b)
mandatory in the PRC[1] (China).

AFAIK a possible fix would be to pass the data through the libc
multibyte functions (mbrtowc() and friends) and filter out anything
that looks like an ASCII control character (since more or less all of
these multibyte encodings are built on ASCII), plus anything that
matches |iswcntrl()|.

[1]=Erm... does anyone know how "Red Flag Linux" solved this?



