[Bug 1632] [PATCH] UTF-8 hint sftp-server extension

bugzilla-daemon at bugzilla.mindrot.org
Thu Jan 7 22:39:30 EST 2010


https://bugzilla.mindrot.org/show_bug.cgi?id=1632

Salvador Fandiño <sfandino at yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sfandino at yahoo.com

--- Comment #6 from Salvador Fandiño <sfandino at yahoo.com> 2010-01-07 22:39:29 EST ---
Hi,

From my point of view, the proposed patch is useless because file
system encoding is not an on/off thing.

Most servers nowadays have their file systems encoded as utf8 (even if
the OS knows nothing about it) and any modern SFTP client should
already default to utf8.

Using one bit for this task, as in the patch, would just say: "if I am
set, use utf8; if not, use utf8 anyway because it is the most likely
encoding".

> I'm not sure what problem this patch solves - I suppose it is
> technically possible for platforms that OpenSSH runs on to use a
> non-UTF8 encoding, but does anyone really do it in practice? (I
> don't know)

Well, a possible scenario is a server running an old application that
does not support utf8 or is hard-coded to use some specific encoding,
or a server configured a long time ago when utf8 was not the default.

> From a client perspective UTF-8 should be quite easily distinguished
> from other non-ASCII encodings by looking at the first character
> sequence with the high bit set.

AFAIK this is not true: utf8-encoded strings do not necessarily have
the high bit of the first byte set.
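
To illustrate, here is a minimal Python sketch (the helper names and
sample file names are made up for this example). A pure-ASCII name is
valid utf8 with no high bit set anywhere, and some byte sequences
decode cleanly as both utf8 and latin1, so inspecting bytes is only a
heuristic:

```python
def first_high_bit_index(data: bytes) -> int:
    """Return the index of the first byte with the high bit set, or -1 if none."""
    for i, byte in enumerate(data):
        if byte & 0x80:
            return i
    return -1


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed utf8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ascii_name = "report.txt".encode("utf-8")      # no byte has the high bit set
utf8_name = "Fandiño.txt".encode("utf-8")      # ñ becomes the two bytes C3 B1
latin1_name = "Fandiño.txt".encode("latin-1")  # ñ becomes the single byte F1
ambiguous = b"\xc3\xb1"                        # "ñ" in utf8, "Ã±" in latin1
```

The latin1 name fails utf8 validation here, but the last sample shows
that a successful validation does not prove the bytes were written as
utf8.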

> Some other questions:
> 
> Is it really the filesystem that encodes filenames as UTF-8? or is it a convention used by application developers using the filesystem?

On Unix file systems, the OS just sees null-terminated strings; it does
not perform any conversion itself, and it is up to the application to
decide how to render those strings (usually taking the locale
configuration into consideration).
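
As a rough Python sketch of what that means in practice (the `render`
helper and the sample name are hypothetical), the same stored bytes
come out differently depending on which encoding the application
assumes:

```python
def render(raw_name: bytes, assumed_encoding: str) -> str:
    """Decode raw file name bytes the way an application might for display.

    Undecodable bytes are replaced with U+FFFD instead of raising.
    """
    return raw_name.decode(assumed_encoding, errors="replace")


# Bytes as stored on disk, assumed to have been written by a latin1 application.
raw_name = b"Fandi\xf1o.txt"

as_latin1 = render(raw_name, "latin-1")  # the reading the writer intended
as_utf8 = render(raw_name, "utf-8")      # a utf8 locale hits an invalid byte
```

Nothing in the bytes themselves records which of the two readings is
the right one.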

> Perhaps it would be better to just ensure that we always render
> filenames in UTF-8.

That would require linking sftp-server against one of the libraries
supporting conversion of strings between different encodings.

As the client should also perform the inverse operation in order to
save the file using the local encoding, the full conversion process can
be pushed there.
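
A minimal sketch of doing both directions on the client, in Python.
The function names are hypothetical, and it assumes the client has
somehow learned the remote encoding, which is exactly the open
question in this bug:

```python
def to_local(remote_name: bytes, remote_encoding: str, local_encoding: str) -> bytes:
    """Convert a file name received from the server into the local encoding."""
    return remote_name.decode(remote_encoding).encode(local_encoding)


def to_remote(local_name: bytes, local_encoding: str, remote_encoding: str) -> bytes:
    """Inverse operation: convert a local file name before sending it."""
    return local_name.decode(local_encoding).encode(remote_encoding)


# Example: a latin1 server talking to a client whose local encoding is utf8.
remote_name = b"Fandi\xf1o.txt"
local_name = to_local(remote_name, "latin-1", "utf-8")
round_trip = to_remote(local_name, "utf-8", "latin-1")
```

Both calls raise an exception (UnicodeDecodeError or
UnicodeEncodeError) when a name cannot be represented, so a real
client would still need a fallback policy for such names.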

> but really sftp-server has no way of knowing what
> encoding has been used and since Unix filesystems have traditionally
> been pretty agnostic about the structure of filenames (other than to
> exclude '\0' and '/') they may be entirely unstructured or have
> multiple encodings active on the same filesystem. I'm not sure what the
> answer is, but I'm reluctant to add a protocol extension that we will
> have to honour perpetually without understanding it better.

My conclusion is that at least a string should be used to define the
encoding. Maybe you can abuse the extension mechanism of the
SSH_FXP_INIT packet, passing something like

  fs_encoding(latin1)@openssh.org = 1
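
For illustration only, a Python sketch of how such a pair could be
carried. It assumes the SFTP version 3 layout (uint32 packet length,
one type byte, uint32 version, then extension pairs as two
length-prefixed strings each); the extension name is the hypothetical
one above, not an existing OpenSSH extension, and the function names
are made up:

```python
import struct
from typing import Dict

SSH_FXP_INIT = 1   # packet type for the initial handshake packet
SFTP_VERSION = 3   # protocol version assumed for this sketch


def ssh_string(data: bytes) -> bytes:
    """Encode bytes as an SSH string: uint32 length followed by the data."""
    return struct.pack(">I", len(data)) + data


def build_init(extensions: Dict[str, str]) -> bytes:
    """Build an INIT packet carrying (name, data) extension pairs."""
    payload = struct.pack(">BI", SSH_FXP_INIT, SFTP_VERSION)
    for name, value in extensions.items():
        payload += ssh_string(name.encode("ascii"))
        payload += ssh_string(value.encode("ascii"))
    return struct.pack(">I", len(payload)) + payload


def parse_extensions(packet: bytes) -> Dict[str, str]:
    """Read the extension pairs back out of a packet built by build_init."""
    length, _ptype, _version = struct.unpack_from(">IBI", packet, 0)
    end = 4 + length   # the length field does not count itself
    offset = 9         # skip length (4), type (1) and version (4)
    fields = []
    while offset < end:
        (size,) = struct.unpack_from(">I", packet, offset)
        offset += 4
        fields.append(packet[offset:offset + size].decode("ascii"))
        offset += size
    # Fields alternate: name, data, name, data, ...
    return dict(zip(fields[0::2], fields[1::2]))


# Hypothetical extension pair taken from the suggestion above.
packet = build_init({"fs_encoding(latin1)@openssh.org": "1"})
parsed = parse_extensions(packet)
```

A peer that does not recognise the name can simply skip the pair,
which is what makes the extension mechanism attractive for this kind
of hint.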


