Phasing out forwarding of locale settings

Sat Sep 11 23:44:51 AEST 2021

On Sat, 11 Sep 2021, Ingo Schwarze wrote:

> When you first log into a machine, for security reasons, you have
[…]
> connecting from.  From that point onward, whatever locale(1) defaults
> the sysop on the target machine may have chosen no longer apply to you.

Exactly!

> When connecting from a new machine, you need to check your terminal
[…]
> for this unusual connection before you start any real work on the
> remote machine.

Right.

You can make this a little safer by using a terminal in ASCII mode and…

$ ssh -t host env LC_ALL=C sh
(or mksh -l or bash --login)

… for that first connection. Attack vectors can remain even then,
but if you trust the remote side that little, you probably don’t
want to connect at all.

>  * Which terminals or terminal emulators and which modes you use

This is really tricky. xterm, for example, has control sequences to
trigger hardcopies and other fun, which aren’t disabled *that* easily.
Moreover, not just ESC but also CSI (\x9B) triggers them.

Running something that reduces the capabilities of your terminal,
like GNU screen, tmux, or perhaps even window(1) (I don’t think this
is even available on GNU/Linux, but OpenBSD at least had it at some
point), might make sense in a lack-of-trust scenario. Heh, isn’t
that a good project idea: a reduced-functionality terminal-in-terminal
emulator. Perhaps even one that can recode like luit(1).

Hmm, I guess if I ever have the time for that, I could add this
to screen, whose codebase, horrid though it is, I’m at least somewhat
familiar with…
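
To illustrate what the filtering core of such a reduced-functionality
layer might do, here is a minimal Python sketch. It is not how screen
or tmux work; the name neutralise() is made up for this example, and it
assumes the input has already been decoded to text (a real filter would
have to work on the byte stream in the terminal's encoding). It makes
ESC, the other C0 controls, DEL and the C1 range (including CSI, \x9B)
visible instead of passing them on:

```python
# Hypothetical helper, not taken from any existing tool.
KEEP = {"\t", "\n", "\r"}  # harmless layout controls that pass through


def neutralise(text):
    """Replace control characters with a visible \\xNN notation."""
    out = []
    for ch in text:
        cp = ord(ch)
        is_c0 = cp < 0x20 and ch not in KEEP   # includes ESC (0x1B)
        is_del_or_c1 = 0x7F <= cp <= 0x9F      # includes CSI (0x9B)
        out.append(f"\\x{cp:02x}" if is_c0 or is_del_or_c1 else ch)
    return "".join(out)
```

With this, a cursor position request arrives as the literal text
"\x1b[6n" and the terminal never gets to interpret it.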

>  * Which locales are available on the target machine (none of the
>    machines you are connecting from can know that).

The C locale should be ubiquitous, though, on the other hand, it
only defines the first 128 characters; what the upper half means is
up to the system. \x9B, for example, is ¢ in cp437, so it could even
show up in a welcome message listing the prices for using the machine.
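
A quick way to see how much the meaning of that one byte depends on the
assumed encoding, as a small Python sketch (decode_as() is just a helper
name for this example; the codec names are Python's):

```python
BYTE = b"\x9b"


def decode_as(raw, encoding):
    """Decode raw bytes, substituting U+FFFD for undecodable input."""
    return raw.decode(encoding, errors="replace")


as_cp437 = decode_as(BYTE, "cp437")     # the cent sign
as_latin1 = decode_as(BYTE, "latin-1")  # U+009B, the C1 control CSI
as_utf8 = decode_as(BYTE, "utf-8")      # a lone continuation byte: U+FFFD
```

The same byte is a printable character, a control character, or
invalid input, depending on what the receiving end believes.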

> Again, SendEnv / AcceptEnv cannot *make* any of this safe.
> Users need to use their brains to make their connections safe.

For some reason it’s hard to get that point across ☻

But to get back to Florian’s initial question… OpenSSH by default
neither accepts nor sends the locale-related environment variables,
though whether this is due to forethought or simply because
OpenBSD didn’t use them is up to interpretation. So accepting and
sending them is a deviation from the normal behaviour, albeit one
shared across distros, and “Phasing out forwarding of locale settings”
would just mean returning to the upstream default, so it’s probably
uncontroversial. Getting maintainers to actually do it, now…
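
To check whether a given ssh_config or sshd_config deviates from that
default, something like the following Python sketch could be used. The
function name and the list of variables are assumptions made for this
example, and the parsing is deliberately simplistic (full-line comments
only, no Include or Match handling):

```python
import fnmatch

# Illustrative selection of locale-related variables, not exhaustive.
LOCALE_VARS = ("LANG", "LANGUAGE", "LC_ALL", "LC_CTYPE", "LC_MESSAGES")


def locale_forwarding(config_text):
    """Return (keyword, pattern) pairs that would forward locale variables."""
    hits = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Keywords are case-insensitive; "Keyword=value" is also allowed.
        parts = line.replace("=", " ", 1).split()
        if not parts:
            continue
        keyword, patterns = parts[0].lower(), parts[1:]
        if keyword not in ("sendenv", "acceptenv"):
            continue
        for pattern in patterns:
            if any(fnmatch.fnmatchcase(var, pattern) for var in LOCALE_VARS):
                hits.append((keyword, pattern))
    return hits
```

Fed a line such as "SendEnv LANG LC_*", it reports both patterns; an
empty result means the file does not forward any of the listed
variables.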

> > *Especially* not $TERM with all its historic baggage, I guess.
> 
> At least $TERM is usually set by the terminal emulator, so it usually
> matches the terminal you are really using on the client side.

And it’s not $TERMCAP. That’s the funny one.

> Besides, the ssh_config(5) manual explains that passing it is
> required by the protocol, and it is indeed not clear to me how
> a pseudo terminal on the server should behave without it.

Right. I have used systems from a (real or emulated) serial console,
which (obviously) cannot set $TERM properly at first, and that is not fun.

> I don't really see the problem here.  In that company, you would
> obviously set all computers to a default of LC_ALL=en_US.UTF-8

I proposed a C.UTF-8 locale around 2013, which has made its way into
first eglibc then glibc in Debian, and musl, and AFAIHH FreeBSD(?), and
there’s talk of supporting this more broadly (glibc upstream I hope).

On the other hand, on systems where this doesn’t exist, there’s
usually en_US.UTF-8 unless the system doesn’t ship all locales by
default (Debian again) or is HP/UX (which needs en_US.utf8).

But setting one of these as sensible default in the global shell
initialisation file on all servers, allowing users to customise
it in their local ones if needed, and…

> tell all employees to make sure all their terminals run in UTF-8
> mode all the time, on all company and private computers they use

… that indeed solves this problem.
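
Picking that default can be sketched in Python as follows. The
preference order follows the text above; the function names are made up
for this example, and the comparison ignores case and the hyphen in the
codeset so that "en_US.UTF-8" also matches a listed "en_US.utf8":

```python
PREFERRED = ("C.UTF-8", "en_US.UTF-8")


def _key(name):
    """Normalise a locale name: codeset lowercased, hyphens removed."""
    lang, _, codeset = name.partition(".")
    return lang, codeset.replace("-", "").lower()


def pick_utf8_locale(available, preferred=PREFERRED):
    """Return the system's own spelling of the first preferred locale."""
    by_key = {}
    for name in available:
        by_key.setdefault(_key(name), name)
    for want in preferred:
        if _key(want) in by_key:
            return by_key[_key(want)]
    return "C"  # nothing suitable installed; fall back to plain C
```

The list of available locales would typically come from the output of
"locale -a", one name per line. Returning the spelling the system
itself lists avoids guessing which variant it accepts.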

Another thing you could do, server-side, is to guess the terminal
encoding. This is fragile as hell though. Years ago I came up
with:

• flush all I/O
• output "\030\032\r\xE2\x82\xAC.\033[6n"
• read back the terminal’s response; the reported cursor column
  hints at the encoding:
  ‣ column 1 is probably EUC-JP, EUC-KR
  ‣ column 3 is probably UTF-8
  ‣ column 4 is probably ISO-8859
  ‣ column 5 is probably Shift-JIS
• output "\r      \r" and flush
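
The steps above can be sketched in Python roughly as follows. This is
not the original code; all names are made up for this example, it
assumes a Unix system with /dev/tty, and it assumes the terminal
answers with the usual ESC [ row ; column R report:

```python
import os
import re
import select

PROBE = b"\x18\x1a\r\xe2\x82\xac.\x1b[6n"  # same bytes as the string above
ERASE = b"\r      \r"
GUESSES = {1: "EUC-JP or EUC-KR", 3: "UTF-8", 4: "ISO-8859", 5: "Shift-JIS"}
CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")   # cursor position report


def parse_column(reply):
    """Extract the column from a cursor position report, or None."""
    m = CPR.search(reply)
    return int(m.group(2)) if m else None


def guess_encoding(column):
    return GUESSES.get(column, "unknown")


def probe(timeout=1.0):
    """Send the probe to the controlling terminal and classify the reply."""
    import termios  # Unix-only, hence imported here
    import tty
    fd = os.open("/dev/tty", os.O_RDWR)
    saved = termios.tcgetattr(fd)
    reply = b""
    try:
        tty.setraw(fd)                         # no echo, no line buffering
        termios.tcflush(fd, termios.TCIFLUSH)  # discard pending input
        os.write(fd, PROBE)
        while parse_column(reply) is None:
            ready, _, _ = select.select([fd], [], [], timeout)
            if not ready:
                break                          # terminal did not answer
            chunk = os.read(fd, 32)
            if not chunk:
                break
            reply += chunk
        os.write(fd, ERASE)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
        os.close(fd)
    return guess_encoding(parse_column(reply))
```

Calling probe() from an interactive session returns one of the guesses
or "unknown"; parse_column() and guess_encoding() can be tried on their
own without a terminal.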

This is fragile for multiple reasons. It depends on the terminal
actually responding to the column enquiry, not exploding on the
characters sent, etc. and (because it needs to flush, send, then
wait for the response) takes a noticeable amount of time. It’ll
also return the wrong cursor position if the user begins typing
while this is running.

Standardising on UTF-8 terminals is the way to go, in 2021 even
more so than in 2006. Looking at the CVS log, I only wrote
this because Linux’s vt-is-UTF8 utility is Linux-specific.

> can still set LC_ALL=de_DE.UTF-8 or even LC_ALL=ja_JA.UTF-8 to their

(ja_JP, I think)

> > language will avoid any mismatch ... seriously?
> 
> No, it will not, and i didn't intend to claim that.
> 
> What matters is how people behave, not whether these variables are
> passed around or not.

Right. This suggestion has the greatest potential to avoid mismatches,
though, if users avoid doing some things (like running non-UTF-8
terminals).

> send to it installed.  Also remember that locale names are not
> standardized, so your preferred locale might be installed, but using

*cough* HP/UX *cough*

Incidentally “locale -a” on a GNU system also shows the “.utf8” variant
but there may be systems that don’t work with *that*, so…

> > least reflect the *current* mode into $TERM, which already *is* both

$TERM is an index into a list of terminals shipped with the server OS.
Adding anything to this is a multi-year process (consider how long it
took for screen to be added) and must be avoided at all cost. (There’s
this Debian package called ncurses-term that ships some extra entries,
so for example GNU screen in xterm has TERM=screen-xterm instead of
just screen, leading to failures with all servers that don’t have this
extra package installed… or simply run a different operating system. I
consider installing this package harmful.) Entries for st and tmux are
usually missing as well.

The termcap/terminfo databases are AFAICT also not concerned with the
encoding, so this isn’t the right place. It would work if, decades ago,
people had done something like “append +anything to a TERM and it’ll
look it up by basename”, but they didn’t and we can’t change this now.

bye,
//mirabilos
-- 
Infrastrukturexperte • tarent solutions GmbH
Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
Telephon +49 228 54881-393 • Fax: +49 228 54881-235
HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg
