[Bug 1632] [PATCH] UTF-8 hint sftp-server extension

bugzilla-daemon at bugzilla.mindrot.org
Fri Jan 8 08:43:56 EST 2010


https://bugzilla.mindrot.org/show_bug.cgi?id=1632

--- Comment #9 from Salvador Fandiño <sfandino at yahoo.com> 2010-01-08 08:43:55 EST ---
> Do you think that this is a bug in WinSCP, and the default remote
> charset should be UTF-8? Well, maybe, but currently it isn't, and I
> don't think the developer will be eager to change it because of
> compatibility reasons. That's why I came up with this solution.

Well, I would not call it a bug. There is no way to know the encoding,
so WinSCP guesses latin1, which used to be the most common encoding.
That's fine. It is just that nowadays UTF-8 would probably be the right
guess more often.

> In the UNIX/BSD world the current logic is simple: don't convert the
> charset, it's not the filesystem's, or the file transfer program's job
> to decide the charset. I'm actually fine with that, it makes sense.

No, it doesn't make sense either. For instance, if you transfer files
from a server using latin1 to a client using UTF-8, the file names need
to be converted or they will arrive broken.
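A minimal Python sketch of the conversion problem (`convert_filename` is a hypothetical helper, not part of OpenSSH or WinSCP): the latin1 bytes for "café" are not valid UTF-8, so passing them through untouched breaks the name on a UTF-8 client. Decoding with the server's charset and re-encoding with the client's keeps it intact.

```python
def convert_filename(raw: bytes, server_charset: str, client_charset: str) -> bytes:
    """Re-encode a file name from the server's charset to the client's."""
    return raw.decode(server_charset).encode(client_charset)


name_on_server = "café".encode("latin-1")  # b'caf\xe9'

# Passing the bytes through unchanged: a UTF-8 client cannot decode them.
try:
    name_on_server.decode("utf-8")
except UnicodeDecodeError:
    print("raw latin1 bytes are not valid UTF-8")

# Converting explicitly preserves the name.
converted = convert_filename(name_on_server, "latin-1", "utf-8")
print(converted)  # b'caf\xc3\xa9'
```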

> However with a UNIX server and a Windows client, charset conversion is
> inevitable, and somehow we have to give a sign to the client which is
> the preferred remote charset, otherwise clients which historically
> defaulted to latin1 won't work with modern UTF-8 aware servers.

The world is not all UTF-8 or latin1; several other encodings are still
in use. If you want to solve that problem, do it right and in a general
way: instead of a bit, use a string to pass the encoding name from the
server to the client.
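To illustrate why a name is more general than a bit, here is a small Python sketch. It assumes that the client maps the received charset name to a local codec, using Python's `codecs.lookup` as a stand-in. `decoder_for` is a hypothetical helper. A yes/no flag can only choose between two encodings, while a name covers any encoding the client has a codec for.

```python
import codecs


def decoder_for(charset_name: str) -> codecs.CodecInfo:
    """Resolve a charset name received from the server to a Python codec.

    Raises LookupError if Python does not know the name, so a client
    could fall back to asking the user instead of silently guessing.
    """
    return codecs.lookup(charset_name)


# One name per encoding; a single "is UTF-8" bit could not express these.
samples = {
    "UTF-8": "файл".encode("utf-8"),
    "KOI8-R": "файл".encode("koi8-r"),
    "EUC-JP": "ファイル".encode("euc-jp"),
}
for name, raw in samples.items():
    text, _consumed = decoder_for(name).decode(raw)
    print(f"{name}: {text}")
```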

Actually, I have found that later versions of the SFTP draft already
define a similar extension:

  A server MAY include the following extension with it's version
  packet.

      string "filename-charset"
      string charset-name

  A server that can always provide a valid UTF-8 translation for
  filenames SHOULD NOT send this extension.  Otherwise, the server
  SHOULD send this extension and include the encoding most likely to be
  used for filenames.  This value will most likely be derived from the
  LC_CTYPE on most unix-like systems.

(extracted from
http://tools.ietf.org/wg/secsh/draft-ietf-secsh-filexfer/draft-ietf-secsh-filexfer-13.txt)
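The following Python sketch makes the quoted wire format concrete by building and parsing an SSH_FXP_VERSION packet that carries the `filename-charset` extension. It assumes the packet layout of the filexfer drafts: a uint32 length, a type byte (2 for SSH_FXP_VERSION), a uint32 version, and then name/data string pairs. An SSH string is a big-endian uint32 length followed by the raw bytes. All helper names are hypothetical. The UTF-8 fallback applies the quoted rule that a server able to provide UTF-8 names should not send the extension.

```python
import struct

SSH_FXP_VERSION = 2  # packet type number used by the filexfer drafts


def pack_string(data: bytes) -> bytes:
    # SSH "string": uint32 big-endian length, then the raw bytes.
    return struct.pack(">I", len(data)) + data


def unpack_string(buf: bytes, offset: int) -> tuple[bytes, int]:
    (n,) = struct.unpack_from(">I", buf, offset)
    start, end = offset + 4, offset + 4 + n
    if end > len(buf):
        raise ValueError("truncated SSH string")
    return buf[start:end], end


def build_version_packet(version: int, extensions: dict[str, str]) -> bytes:
    body = struct.pack(">BI", SSH_FXP_VERSION, version)
    for name, value in extensions.items():
        body += pack_string(name.encode("utf-8")) + pack_string(value.encode("utf-8"))
    return struct.pack(">I", len(body)) + body


def parse_version_packet(packet: bytes) -> tuple[int, dict[str, str]]:
    (length,) = struct.unpack_from(">I", packet, 0)
    body = packet[4:4 + length]
    if len(body) != length:
        raise ValueError("truncated packet")
    ptype, version = struct.unpack_from(">BI", body, 0)
    if ptype != SSH_FXP_VERSION:
        raise ValueError(f"expected SSH_FXP_VERSION, got type {ptype}")
    offset, extensions = 5, {}  # 1 type byte + 4 version bytes
    while offset < len(body):
        name, offset = unpack_string(body, offset)
        value, offset = unpack_string(body, offset)
        extensions[name.decode("utf-8")] = value.decode("utf-8")
    return version, extensions


def remote_filename_charset(extensions: dict[str, str]) -> str:
    # Per the quoted draft text, a missing extension means the server
    # can always provide UTF-8 file names.
    return extensions.get("filename-charset", "UTF-8")


packet = build_version_packet(6, {"filename-charset": "ISO-8859-1"})
version, ext = parse_version_packet(packet)
print(version, remote_filename_charset(ext))  # 6 ISO-8859-1
```

The version number 6 is only an example value. A client would decode every file name it receives with the returned charset, instead of hard-coding latin1.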



More information about the openssh-bugs mailing list