[patch] scp + UTF-8

Roland Mainz roland.mainz at nrubsig.org
Wed Jan 20 11:13:10 AEDT 2016


On Tue, Jan 19, 2016 at 10:48 PM, Ingo Schwarze <schwarze at usta.de> wrote:
> Martijn sent the following patch to me in private and agreed that i post
> it here.
>
> In any other program in OpenBSD base, i'd probably agree with the
> basic approach.  Regarding OpenSSH, however, i worry whether wcwidth(3)
> can be used.  While wcwidth(3) is POSIX, it is not ISO C.  Does
> OpenSSH target platforms that don't provide wcwidth(3)?  If so,
> do you think the problem can be solved by simply providing US-ASCII
> support only on such platforms, but no UTF-8 support at all?
>
> If you think we can require wcwidth(3), or we can ditch UTF-8 support
> where wcwidth(3) it isn't available, i will work with Martijn to
> iron out a few style issues such that we can submit a patch that
> is ready for commit.

Some generic portability comments:
1. Other modern encodings such as GB18030 are in current use (support
is even mandatory for software sold to the government in the PRC), as
are many "legacy" ones, so the current locale may be multibyte without
using UTF-8 as its encoding.
2. |wcwidth()| counts terminal cells, not characters (and one
character may occupy one or more bytes), e.g. there are characters
which occupy anywhere from zero to four terminal cells (the actual
number of cells is slightly OS-specific).
3. I am not sure whether any of the standards specifies a byte limit
for a UTF-8 character, so the patch description's "- To support
terminals larger then
MAX_WINSIZE and still be properly indented I increased the buf size to
4x the size of MAX_WINSIZE, since the maximum size of an UTF-8 char
<should> be 4 bytes." might not be a portable assumption, and I would
at least safeguard it.
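
To illustrate points 2 and 3, here is a minimal sketch (not taken from
the patch) of how a line could be filled while checking both the cell
budget and the byte budget for every character, instead of relying on
a fixed bytes-per-character multiplier. The name fit_prefix() and its
parameters are my own invention for this example. It assumes the
caller has already selected the locale and that mbrtowc(3) and
wcwidth(3) are available, which is exactly the open question above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* expose wcwidth(3) on POSIX systems */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * fit_prefix() is a hypothetical helper, not part of the patch.
 * It returns the number of leading bytes of s that fit into a buffer
 * of bufsize bytes (terminating NUL included) without using more than
 * maxcells terminal cells.  The cells actually used go to *cells.
 * Scanning stops at the first invalid or non-printable character.
 */
static size_t
fit_prefix(const char *s, size_t bufsize, int maxcells, int *cells)
{
	mbstate_t st;
	size_t left, used = 0;

	assert(s != NULL && cells != NULL);
	*cells = 0;
	if (bufsize == 0)
		return 0;	/* no room even for the NUL */
	left = strlen(s);
	memset(&st, 0, sizeof(st));

	while (left > 0) {
		wchar_t wc;
		size_t n = mbrtowc(&wc, s + used, left, &st);
		int w;

		if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
			break;	/* invalid, truncated, or NUL */
		if ((w = wcwidth(wc)) < 0)
			break;	/* non-printable in this locale */
		if (w > maxcells - *cells)
			break;	/* would overflow the terminal line */
		if (n >= bufsize - used)
			break;	/* would overflow the buffer (keep 1 for NUL) */
		used += n;
		left -= n;
		*cells += w;
	}
	return used;
}
```

Because the byte check is done per character, the result stays within
the buffer even if a character needs more bytes than expected or if
zero-width characters add bytes without adding cells. For example,
fit_prefix("hello", 64, 3, &c) stops after three cells, and
fit_prefix("hello", 3, 80, &c) stops after two bytes to leave room for
the NUL.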

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)


More information about the openssh-unix-dev mailing list