[patch] scp + UTF-8

Ingo Schwarze schwarze at usta.de
Thu Jan 21 01:53:13 AEDT 2016


Hi Michael,

Michael Stone wrote on Wed, Jan 20, 2016 at 07:50:03AM -0500:
> On Wed, Jan 20, 2016 at 01:13:10AM +0100, Roland Mainz wrote:

>> 3. I am not sure whether there is a specific byte limit for UTF-8 in
>> any of the standards, e.g. "- To support terminals larger than
>> MAX_WINSIZE and still be properly indented I increased the buf size to
>> 4x the size of MAX_WINSIZE, since the maximum size of a UTF-8 char
>> <should> be 4 bytes." might not be a portable assumption and I would
>> at least safeguard it.

> Isn't that assumption completely broken in the presence of combining
> characters?

It is.  As far as I understand, given any natural number N > 0,
you can construct a valid Unicode string of display width 1
whose UTF-8 encoding consists of exactly N bytes.
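To make that concrete, here is a minimal C sketch (the name
width1_string is hypothetical, not from any patch).  It assumes, as is
usual for wcwidth(3) implementations such as glibc's, that U+0301
COMBINING ACUTE ACCENT has width 0, so the whole string occupies a
single column:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build a NUL-terminated UTF-8 string of exactly
 * n bytes (n > 0) that renders in one column: a single base letter
 * followed by copies of U+0301 COMBINING ACUTE ACCENT (0xCC 0x81).
 *   odd n:  "e" (1 byte)           + (n - 1) / 2 accents
 *   even n: U+00E9 (0xC3 0xA9)     + (n - 2) / 2 accents
 * Returns NULL for n == 0 or on allocation failure; caller frees.
 */
static char *
width1_string(size_t n)
{
	char	*s, *p;
	size_t	 marks;

	if (n == 0 || (s = malloc(n + 1)) == NULL)
		return NULL;
	p = s;
	if (n % 2 == 1) {
		*p++ = 'e';
		marks = (n - 1) / 2;
	} else {
		*p++ = (char)0xC3;	/* U+00E9, precomposed e-acute */
		*p++ = (char)0xA9;
		marks = (n - 2) / 2;
	}
	while (marks-- > 0) {		/* each accent adds 2 bytes, 0 columns */
		*p++ = (char)0xCC;
		*p++ = (char)0x81;
	}
	*p = '\0';
	return s;
}
```

So a buffer sized at 4 bytes per column can be overrun by text that
fits in a single column.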

Not that such strings are terribly useful for large N, but we do
indeed have to keep in mind that the code must not break when it
encounters them.  Martijn's code already seems safe in *that*
respect: it just cuts the string early and pads with blanks if
the buffer is too short.  But I'll re-check before deciding on a
final version.
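The cut-early-and-pad approach can be sketched roughly as follows.
This is not Martijn's patch: fit_and_pad, u8_seqlen and toy_width are
hypothetical names, the input is assumed to be valid UTF-8, and
toy_width is a deliberately crude stand-in for wcwidth(3) that treats
only U+0300..U+036F as zero-width.  Copying stops at a character
boundary as soon as the next character would exceed the column budget
or leave too little room for the blanks still needed, so the result
always fills exactly cols columns, however many bytes one column takes.

```c
#include <stddef.h>
#include <string.h>

/* Length of the UTF-8 sequence introduced by lead byte b (valid input assumed). */
static size_t
u8_seqlen(unsigned char b)
{
	if (b < 0x80)
		return 1;
	if ((b & 0xE0) == 0xC0)
		return 2;
	if ((b & 0xF0) == 0xE0)
		return 3;
	return 4;
}

/*
 * ASSUMPTION: toy replacement for wcwidth(3).  Only the Combining
 * Diacritical Marks block U+0300..U+036F (all 2-byte sequences) is
 * width 0; everything else, including wide CJK, is counted as width 1.
 */
static int
toy_width(const unsigned char *p, size_t len)
{
	if (len == 2) {
		unsigned int cp = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
		if (cp >= 0x300 && cp <= 0x36F)
			return 0;
	}
	return 1;
}

/*
 * Write exactly cols display columns into dst (dstsz bytes including
 * the NUL): as much of src as fits, never splitting a multibyte
 * sequence, followed by blank padding.  Requires dstsz > cols so the
 * padding can always fit.  Returns cols, or -1 on bad arguments.
 */
static int
fit_and_pad(char *dst, size_t dstsz, const char *src, int cols)
{
	const unsigned char	*p = (const unsigned char *)src;
	size_t			 used = 0, len;
	int			 width = 0, w;

	if (cols < 0 || dstsz == 0 || (size_t)cols >= dstsz)
		return -1;
	while (*p != '\0') {
		len = u8_seqlen(*p);
		w = toy_width(p, len);
		if (width + w > cols)
			break;		/* column budget exhausted */
		/* keep room for the blanks still needed after this character */
		if (used + len + (size_t)(cols - width - w) > dstsz - 1)
			break;		/* buffer too short: cut early */
		memcpy(dst + used, p, len);
		used += len;
		width += w;
		p += len;
	}
	while (width < cols) {		/* fits by the invariant above */
		dst[used++] = ' ';
		width++;
	}
	dst[used] = '\0';
	return width;
}
```

For example, with an 8-byte buffer and cols = 3, an "e" carrying five
U+0301 accents is cut after the second accent and padded with two
blanks, so the following output stays aligned.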

Thanks for the reminder!

Yours,
  Ingo


More information about the openssh-unix-dev mailing list