[patch] scp + UTF-8
Ingo Schwarze
schwarze at usta.de
Wed Jan 20 10:54:41 AEDT 2016
Hi Darren,
Darren Tucker wrote on Wed, Jan 20, 2016 at 09:39:33AM +1100:
> On Wed, Jan 20, 2016 at 8:48 AM, Ingo Schwarze <schwarze at usta.de> wrote:
>> Martijn sent the following patch to me in private and agreed that i
>> post it here.
>>
>> In any other program in OpenBSD base, i'd probably agree with the
>> basic approach. Regarding OpenSSH, however, i worry whether wcwidth(3)
>> can be used. While wcwidth(3) is POSIX, it is not ISO C. Does
>> OpenSSH target platforms that don't provide wcwidth(3)?
> OpenSSH nominally targets POSIX, but it builds on a wide enough range
> of platforms that it's likely at least some don't have it.
>
> Our general approach is to target POSIX then implement any needed
> missing bits either by stealing the implementation from OpenBSD, some
> other BSD licensed source or writing from scratch. If we have to
> we'll ifdef stuff but prefer not to.
Sure, that's what i expected.
>> If so, do you think the problem can be solved by simply providing
>> US-ASCII support only on such platforms, but no UTF-8 support at all?
> Yes. That's what I did with mblen when we picked up a need for that
> via libedit for platforms with no wide character support.
>
> $ grep -i mblen openbsd-compat/*.h
> openbsd-compat/openbsd-compat.h:#ifndef HAVE_MBLEN
> openbsd-compat/openbsd-compat.h:# define mblen(x, y) (1)
Uh oh. I'm not quite sure what consequences that might entail in
libedit for sftp(1), which does use setlocale(LC_CTYPE, "")?
Did you audit those consequences?
> Is there any reason the same approach would not work with wcwidth?
#define wcwidth(x) (1) /* NO!! */
would be a security risk. One purpose of wcwidth(3) is to weed out
non-printable characters. Whatever replacement we use, we have to
make sure it returns -1 for every non-printable character on every
platform. We MUST NOT let the scp(1) progressmeter spew random
Unicode characters taken from the network at the user's terminal.
They might be control codes.
#define wcwidth(x) (-1) /* not really */
is not a security issue, but it would completely break filename
display even with the C/POSIX locale on those platforms. I briefly
considered
int
wcwidth(wchar_t wc) /* might break? */
{
if (wc < 0x20 || wc > 0x7e)
return -1;
return isprint((unsigned char)wc) ? 1 : -1;
}
But that isn't ideal either because as far as i know, ISO C
doesn't require that wchar_t is internally represented in a way
that puts ASCII in the range 0x00 to 0x7f. Using iswprint(3)
is not a very good idea either because that is C99, not C89,
and may not be available either.
So if we can't get a real implementation of wcwidth(3) on some
platform, it's better to completely disable UTF-8 and only allow
US-ASCII.
A real replacement implementation of wcwidth(3) is MUCH harder
than a real replacement implementation of mbtowc(3) and mblen(3).
It needs a big table of character ranges (see tmux(1)), while
the latter can be done in 50 lines (see mandoc(1)).
That's why i said: If we want full UTF-8 support on all platforms
no matter what and must have a complete replacement wcwidth(3), we
should complete djm@'s character filtering first.
>> P.S.
>> This patch also uses mbtowc(3), but i assume that's no problem
>> because that's ANSI C.
> I would not assume that its existence in the standard is equal to its
> existence in all deployments :-) That said it looks like we can
> implement it in libcompat if needed.
Yes, mbtowc(3) and mblen(3), certainly, but even those only for UTF-8,
not for any other locale.
Yours,
Ingo
More information about the openssh-unix-dev
mailing list