ssh host keys on cloned virtual machines

Nico Kadel-Garcia nkadel at gmail.com
Tue Feb 28 11:58:57 AEDT 2023


On Sun, Feb 26, 2023 at 2:51 PM Thorsten Glaser <t.glaser at tarent.de> wrote:
>
> On Fri, 24 Feb 2023, Keine Eile wrote:
>
> > does any one of you have a best practice on renewing ssh host keys on cloned
> > machines?
>
> Yes: not cloning machines.

Good luck with *that*. Building VMs from installation media is a far,
far too lengthy process for production deployment, especially for
auto-scaling clusters.

> There’s too many things to take care of for these. The VM UUID in
> libvirt. The systemd machine ID. SSH hostkey and SSL private key.
> The RNG seed. The various places where the hostname is written to
> during software installation. The inode generation IDs, depending
> on the filesystem. Other things that are created depending on the
> machine and OS… such as the Debian popcon host ID, even.

That's what the "sysprep" procedure is for when generating reference
VM images, and "cloud-utils" for setting up new VMs from images, at
least for Linux, Windows, and macOS. I've no idea whether OpenBSD has
a similar utility; I've never tried to cloud- or enterprise-deploy it.
I've done such cloning very effectively at very large scales, up to
roughly 20,000 servers at a time, and the relevant procedures are
decades old. It's a solved problem, all the way back to cloning
hardware images from CDs and USB sticks and re-imaging machines
remotely. "virt-utils" has been handling host-specific boot-time
tuning for years; most of these are solved problems.
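
As a minimal sketch of that sysprep step, assuming a local qcow2
template, the libguestfs "virt-sysprep" tool, and an image that runs
cloud-init (the paths and key types here are illustrative):

    # Generalize the reference image before cloning: the default
    # operation set strips SSH host keys, the systemd machine-id,
    # log files, and other per-host state.
    virt-sysprep -a template.qcow2

    # On first boot of each clone, cloud-init can then regenerate
    # fresh host keys; a drop-in like this enables that:
    cat > /etc/cloud/cloud.cfg.d/99-hostkeys.cfg <<'EOF'
    ssh_deletekeys: true
    ssh_genkeytypes: [ed25519, rsa]
    EOF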

> The effort to clean/regenerate these and possibly more, which, in
> addition, often needs new per-machine random bytes introduced, is
> more than just installing fresh machines all the time, especially
> if you script that (in which case I personally even consider
> moving away from d-i with preseed and towards debootstrap with
> (scripted) manual pre‑steps (partitioning, mkfs, …) and post-steps).

If people really feel the need for robust random number services,
they've got other problems. I'd suggest they either apply an init
script to reset whatever they feel they need on every reboot, or find
a genuine physical RNG to tap.
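
As a sketch of that reset-on-boot approach, a first-boot hook along
these lines (the guard file and its placement are my assumption, not
any standard) wipes and regenerates the host keys before sshd starts:

    # Run once, early in boot, before sshd comes up.
    if [ ! -f /etc/ssh/.hostkeys-regenerated ]; then
        rm -f /etc/ssh/ssh_host_*_key /etc/ssh/ssh_host_*_key.pub
        ssh-keygen -A    # regenerate all missing host key types
        touch /etc/ssh/.hostkeys-regenerated
    fi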

The more host-by-host customization, admittedly, the more billable
hours and the more of yourself you put personally into each and every
step. But it doesn't scale, and you will eventually be told to stop
wasting your time if your manager is attentive to how much time you're
burning on each deployment.

> This is even more true as every new machine tends to get just the
> little bit of difference from the old ones that is easier to make
> when not cloning (such as different filesystem layout, software).

And *that* is one of the big reasons for virtualization-based
deployments: so people can stop caring about the physical subtleties.
I've been the poor beggar negotiating all the distinct physical
platforms for deployment; I used to build production operating
systems. Getting the SSH setups straight was something I had to pay
attention to, as were the destabilizing effects of relying on
.ssh/known_hosts in such broadly and erratically deployed
environments, where IP addresses in remote data centers could not
possibly be reliably controlled or predicted, nor was reverse DNS
likely to work at all, which was its own distinct burden for logging
*on* those remote servers.
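
For what it's worth, some of that client-side churn can be tamed in
ssh_config; a hedged sketch, with the host pattern as a placeholder
and the usual trust trade-offs applying ("accept-new" needs a
reasonably recent OpenSSH):

    # ~/.ssh/config
    Host *.ephemeral.example.com
        CheckHostIP no                    # don't pin keys to churning IPs
        StrictHostKeyChecking accept-new  # trust first contact, refuse changes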

> I know, this is not the answer you want to hear, but it’s the one
> that works reliably, without investing too much while still being
> not 100% sure you caught everything.

It Does Not Help when unpredictable assignment of IP addresses to
different classes of hosts inside the same VLAN leads to different
hostkeys migrating back and forth across the same IP address, which
I'm afraid occurs far too often with careless DHCP setups.
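
One way out of that particular trap, assuming you control an SSH
certificate authority and can push its public key to clients, is to
sign each host key so clients trust the CA rather than any single
key-to-address binding; a sketch with illustrative names:

    # On the CA machine: sign the host's public key.
    ssh-keygen -s host_ca -I web01 -h -n web01.example.com \
        /etc/ssh/ssh_host_ed25519_key.pub

    # In the host's sshd_config:
    #   HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

    # On each client, one known_hosts line replaces per-host pinning:
    #   @cert-authority *.example.com ssh-ed25519 <host_ca public key>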

> (Fun fact on the side, while doing admin stuff at $dayjob, I even
> didn’t automate VM creation, as clicking through d-i those times I
> was installing some took less time in summary than creating
> automation for it would’ve. I used xlax (like clusterssh, but for
> any X11 window) for starting installation, then d-i network-console
> + cssh for the remainder; a private APT repository with config
> packages, to install dependencies and configure some things,
> rounded it off.)

You probably don't work at the same scale I've worked at, or haven't
had to publish infrastructure-as-code so that the poor guy in India
getting the call at 2 AM to triple a cluster can push a few buttons
rather than go through manual partitioning, or lots and lots and lots
of AWS consoles. It can be a very different world when you have that
much time to invest in individual servers. It's also a great way to
stretch your billable hours with very familiar tasks which only you
know how to do. I've been targeted for layoffs before because I
successfully did so much automation, but I know a few of my trainees
who are doing *very well* with quite large environments, so I'm glad
to have taught them well.

Nico Kadel-Garcia

> bye,
> //mirabilos