AcceptEnv LANG LC_* vs available locales

Fri Apr 29 21:29:16 AEST 2022

Hi,

Demi Marie Obenour wrote on Thu, Apr 28, 2022 at 08:29:24PM -0400:
> On 4/27/22 05:40, Ingo Schwarze wrote:
>> Demi Marie Obenour wrote on Tue, Apr 26, 2022 at 09:12:07PM -0400:
>>> On 4/25/22 08:23, Ingo Schwarze wrote:

>> In OpenBSD, we are used to the deliberate
>> decision that the C library ignores all aspects of the locale except
>> the character encoding, [...]

> Off-topic: Why did OpenBSD make this decision?  In particular,
> LC_MESSAGES seems to be essential to internationalization support,
> without being very problematic otherwise.

I think having libc and POSIX utility programs always reliably print
diagnostics in the same way, and always in US-ASCII rather than sometimes
in UTF-8, is more valuable than internationalization of operating
system diagnostics, both from the user perspective (predictability and
comprehensibility) and from the OS maintainer perspective (code simplicity
and hence a better chance of correctness and reliability).  Even as a
native German speaker, i regularly get confused when seeing German
error messages because they usually feel quite incomprehensible.

Besides, LC_CTYPE is essential for important functionality, but picking
individual features from all the rest of LC_* for implementation isn't
going to help.  It will increase code complexity without really
achieving internationalization (even full LC_* support is not really
sufficient for complete internationalization...).  So better ditch
it outright than attempt some piecemeal approach.

Moreover, even LC_MESSAGES has features that are prone to causing
trouble, for example changing the meaning of "yes" and "no".

> Also, is it safe if the server uses the C locale (LC_ALL=C) and the
> client uses UTF-8?

Yes, because US-ASCII is a subset of UTF-8, so what a well-behaved
server sends in the C locale is supposed to be a subset of what it
might send in a UTF-8 locale.
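
To illustrate the subset relation, here is a minimal Python sketch.
It is not tied to ssh(1) or to any particular server; the helper name
is_pure_ascii is made up for this example.  It only checks that every
7-bit byte means the same thing in both encodings, and that non-ASCII
characters always need bytes with the high bit set.

```python
# Every 7-bit byte value decodes to the same character in both encodings.
ascii_bytes = bytes(range(128))
same = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")


def is_pure_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if no byte has the high bit set."""
    return all(b < 0x80 for b in data)


print(same)                                            # True
print(is_pure_ascii(b"No such file or directory"))     # True
# The reverse does not hold: non-ASCII characters need bytes >= 0x80.
print(is_pure_ascii("Größe".encode("utf-8")))          # False
```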

Of course, whether it is safe when both the server and the client use
a UTF-8 locale depends on the terminal or terminal emulator,
but at least xterm(1) in UTF-8 mode [but not in the traditional 8-bit
mode that may still be the default on some operating systems] is safe
when the server runs either the C locale or a UTF-8 locale.
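
A minimal Python sketch of why the 8-bit mode matters, assuming a
terminal that honours single-byte C1 controls as defined by ECMA-48:
the UTF-8 encoding of a perfectly ordinary character can contain a
byte from the C1 range 0x80-0x9F, for example 0x9B (CSI).  The helper
name c1_bytes is made up for this example, and the sketch says nothing
about what any specific terminal actually does with such a byte.

```python
CSI_8BIT = 0x9B  # ECMA-48 Control Sequence Introducer as a single C1 byte


def c1_bytes(data: bytes) -> list:
    """Hypothetical helper: bytes of data in the C1 control range 0x80-0x9F."""
    return [b for b in data if 0x80 <= b <= 0x9F]


# U+00DB, LATIN CAPITAL LETTER U WITH CIRCUMFLEX
encoded = "\u00db".encode("utf-8")

print(encoded.hex())                   # c39b
print(CSI_8BIT in c1_bytes(encoded))   # True
print(c1_bytes(b"plain ASCII text"))   # []
```

A terminal that decodes UTF-8 first sees one printable character here;
a terminal reading the stream byte by byte sees 0xC3 followed by a
control byte.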

[...]
>> That said, on non-OpenBSD systems, if the locale used by a program does
>> not match what the user thinks, the *semantics* of the program may still
>> screw up horribly, even if the character encoding matches.  For example,
>> consider user input of floating point numbers with LC_NUMERIC set to a
>> cultural convention the user isn't aware of.  But such issues are
>> only loosely related to ssh(1) and to terminal security.
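
To make the LC_NUMERIC example concrete, here is a minimal Python
sketch.  It deliberately does not call setlocale, because the locales
needed may not be installed; instead, the hypothetical helper
parse_number takes the separators as explicit arguments and mimics how
a grouping-aware parser might read the same input under two sets of
conventions.

```python
def parse_number(text: str, decimal_point: str, thousands_sep: str) -> float:
    """Hypothetical helper: parse text using the given separators."""
    # Drop grouping characters first, then normalise the decimal point.
    plain = text.replace(thousands_sep, "") if thousands_sep else text
    return float(plain.replace(decimal_point, "."))


user_input = "1.234"

# C/POSIX conventions: "." is the decimal point, no grouping character.
as_c = parse_number(user_input, decimal_point=".", thousands_sep="")

# German conventions: "," is the decimal point, "." groups thousands.
as_german = parse_number(user_input, decimal_point=",", thousands_sep=".")

print(as_c, as_german)   # 1.234 1234.0
```

The character encoding is identical in both cases; only the meaning
of the input differs, by a factor of one thousand.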

> When it comes to terminal security, another approach is to use
> a transient tmux(1) pane or terminal window that is closed once
> the session is complete.

Frankly, i don't know anything about tmux(1) and simply don't know
whether it can or cannot help with the topic at hand.

> This assumes that the mismatch cannot be
> exploited for code execution, but I would be highly surprised if it
> could be, especially with the client in UTF-8 mode.

xterm(1) in UTF-8 mode is quite good because it never interprets
multibyte characters as in-band terminal control codes.  Your
mileage might vary with other terminals or emulators.

Yours,
  Ingo