AcceptEnv LANG LC_* vs available locales

Thu Apr 28 19:05:31 AEST 2022

Hi Jochen,

Jochen Bern wrote on Thu, Apr 28, 2022 at 08:53:32AM +0200:
> On 27.04.22 21:48, Thorsten Glaser wrote:
>> On Wed, 27 Apr 2022, Harald Dunkel wrote:

>>> Maybe you should consider to add some guidelines about how to handle
>>> locales into a README within the openssh source package, or in
>>> ssh_config(5), sshd_config(5), etc.?

>> Pretty easy: handle them on the server, period. There’s no way to
>> do it otherwise.

> Sure, because "have server setting blindly trump clients'" is *so* much 
> better than "have client setting forwarded blindly to the server".
> 
> And here I thought that, after OpenVPN painstakingly retrofitting one 
> into their data channel setup, "when you need an agreement, you need an 
> explicit negotiation phase" would be a commonly accepted tenet now ...

This would be nice to have if it were possible.
But calling an impossible task "a commonly accepted tenet"
feels unwise to me.

> (As in, *let* the client send his settings, then have the shell's 
> startup phase run a helper application that edits the env vars as 
> necessary to achieve compatibility with the server's capabilities.)

And if the client says they want "zh_CN.UTF-16" but the server only
knows how to do "C" and "en_US.UTF-8", then what are you going to do,
as just one example among a host of tricky combinations?  In fact,
the only safe option in such a situation is to reject the connection.
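
To make the problem concrete, here is a minimal sketch (Python, not
anything OpenSSH does or ships) of about the only check a server-side
helper could reliably perform: asking the local C library whether it
accepts a given locale name verbatim.  The function name
locale_available is hypothetical.  Note that even a "yes" only means
the name exists locally, not that it means the same thing as on the
client, and a "no" leaves you without any safe fallback.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if setlocale() on this machine accepts `name` for LC_CTYPE.

    Hypothetical helper for illustration only. It tests the exact name;
    it does not attempt any matching against "similar" locales.
    """
    # Calling setlocale() without a locale argument only queries the
    # current setting, so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # Results depend entirely on which locales are installed locally.
    for candidate in ("C", "en_US.UTF-8", "zh_CN.UTF-16"):
        print(candidate, locale_available(candidate))
```

What the C library accepts is itself implementation-specific, so the
result for anything other than "C" and "POSIX" can differ from one
system to the next.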

And even if a safe setting existed in every case, which it does not,
it would be a complex and open-ended task to figure out what that setting
is on a given machine to be compatible with an arbitrary locale name
received over the wire, since POSIX explicitly says that the meaning
of locale names (apart from "C" and "POSIX") is implementation-defined.

So for every operating system and every possible subset of locales that
might be installed on a server, OpenSSH maintainers would have to maintain
a list of how to best match *any* valid locale from *any* other system to
one of the locales available on the server.  Even in ideal circumstances,
that's a nightmarish task that can never be completed and is highly
error-prone.  Even if somebody were willing to do all that work,
having complex scripts or helper applications called at this stage would
raise a security concern by adding attack surface.

Besides, in this world, circumstances are rarely ideal.  In theory,
the same locale name might even refer to different locales on different
systems.  Maybe even in practice: for example, "C" used to refer to an
effectively ISO-Latin-1 locale on old releases of OpenBSD (it no longer
does).  It's an ugly memory even though reasons existed to do that.
Users might also install their own, personal locale, so a locale string
might indicate a locale that is non-standard even for the operating
system the client happens to be running...

Sure, any magic scheme you might devise to achieve the above would
likely work in many cases, so users would get trained into an utterly
unsafe attitude of "locales just work with SSH" and stop thinking
about it.  If the beast then suddenly bit, it would bite all
the deeper.

In conclusion, I think it is better to take a firm stand and say:

  It is purely the responsibility of the user to make sure that
  the server locale and the client locale are compatible *before
  connecting*.  Otherwise, all bets are off.  Ideally, use UTF-8
  on both sides *and* make sure your xterm(1) runs in UTF-8 mode.
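
As a rough aid for that manual check, the following Python sketch
resolves the effective LC_CTYPE setting from the environment, using the
POSIX precedence LC_ALL, then LC_CTYPE, then LANG, and tests whether
the name looks like UTF-8.  The function names are hypothetical, and
the parsing assumes the common language[_territory][.codeset][@modifier]
naming convention, which is only a heuristic given that the meaning of
locale names is implementation-defined.

```python
def effective_ctype(env):
    """Return the locale name that governs LC_CTYPE for the given environment.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty or unset variables are skipped; the fallback is "C".
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


def codeset_of(name):
    """Extract the codeset part of a locale name, or "" if there is none.

    Assumes the conventional language[_territory][.codeset][@modifier]
    layout; names that do not follow it yield "".
    """
    base = name.split("@", 1)[0]  # drop an optional @modifier
    if "." not in base:
        return ""
    return base.split(".", 1)[1]


def looks_utf8(name):
    """Heuristic: does the codeset part read as UTF-8 (e.g. "UTF-8", "utf8")?"""
    return codeset_of(name).lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    import os

    name = effective_ctype(os.environ)
    print(name, "looks like UTF-8" if looks_utf8(name) else "does not look like UTF-8")
```

Running the same script on the client before connecting and on the
server after logging in shows whether both ends at least claim UTF-8.
It says nothing about whether the terminal emulator itself is in UTF-8
mode, which still has to be checked separately.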

Yours,
  Ingo