GSSAPI vs load-balanced servers - anything we can do?

Sat Sep 15 02:53:38 EST 2007

Dear all,

(Apologies: this has nothing to do with 4.7 being out, but is rather a
long-standing issue that regularly bites us.)

Is there anything I could do to further the case of bug 1008?
https://bugzilla.mindrot.org/show_bug.cgi?id=1008

In summary, GSSAPI auth against a machine in a DNS load-balanced server
farm fails. SSH-1 Kerberos works.

DNS load-balanced farm:
Individual machines in the farm have separate IP addresses (ipA, ipB),
separate hostnames (nameA, nameB, ..) and separate Kerberos identities
(host/nameA.domain@REALM). A common DNS name (clustername) resolves to
one or several IPs. Reverse lookup on an IP gives the individual
machine name. (This seems to be a common and cheap way to spread load.)

The problem is that GSSAPI insists on doing its own DNS lookup
(forward+reverse) to determine the (Kerberos) identity of the server,
and has a fair chance of getting a different answer than the one the
SSH client connected to. So a typical session looks like:

client: gethostbyname(clustername) -> ipA
         connect(ipA)
         (KEX and other wonderful SSH stuff)
         do GSSAPI auth
             gethostbyname(clustername) -> ipB
             gethostbyaddr(ipB) -> nameB
             get service ticket for host/nameB.domain@REALM
             send ticket to connected machine (nameA)
server: huh? Enotmynameinticketgoaway.
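
The trace above can be reproduced with a toy model. This is only a
sketch: the resolver class, the simulate_session helper and all names
and addresses are made up for illustration, and the strict rotation is
a simplification (a real round-robin DNS gives a fair chance of a
mismatch, not a guaranteed one).

```python
from itertools import cycle

# Toy stand-ins for the farm in the post; every name and address is hypothetical.
FORWARD = {"clustername": ["ipA", "ipB"]}
REVERSE = {"ipA": "nameA.domain", "ipB": "nameB.domain"}


class RoundRobinDNS:
    """Fake resolver that hands out the farm's addresses in strict rotation."""

    def __init__(self, forward, reverse):
        self._rotation = {name: cycle(ips) for name, ips in forward.items()}
        self._reverse = reverse

    def gethostbyname(self, name):
        return next(self._rotation[name])

    def gethostbyaddr(self, ip):
        return self._reverse[ip]


def simulate_session(dns, user_hostname):
    """Return (principal the connected server holds, principal the client asks for)."""
    # The SSH client resolves the name once and connects to that address.
    connected_ip = dns.gethostbyname(user_hostname)
    server_principal = "host/%s@REALM" % dns.gethostbyaddr(connected_ip)

    # GSSAPI later canonicalizes the same user-supplied name on its own,
    # which triggers a second forward lookup and may land on another machine.
    gss_ip = dns.gethostbyname(user_hostname)
    ticket_principal = "host/%s@REALM" % dns.gethostbyaddr(gss_ip)
    return server_principal, ticket_principal


server, ticket = simulate_session(RoundRobinDNS(FORWARD, REVERSE), "clustername")
print(server, ticket, "accepted" if server == ticket else "rejected")
```

With two machines in rotation, the second lookup lands on the other
machine, so the ticket is issued for nameB while the connection is to
nameA.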

The GSSAPI behaviour is apparently mandated by RFC1964 (2.1.3):
>    When a reference to a name of this type is resolved, the "hostname"
>    is canonicalized by attempting a DNS lookup and using the fully-
>    qualified domain name which is returned, or by using the "hostname"
>    as provided if the DNS lookup fails.  The canonicalization operation
>    also maps the host's name into lower-case characters.

so is unlikely to change.

The only workaround seems to be feeding the canonical hostname (or IP)
of the currently connected server machine into GSSAPI, instead of the
hostname the user provided (this is what SSH-1 Kerberos did, by the way).
While in principle we could change the reverse DNS of the cluster
machines to point to the cluster name, this would confuse everything
that already knows which exact host it wants to connect to.
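
As a rough illustration of the idea (not the actual patch, which is C
code inside the OpenSSH client), the name handed to GSSAPI could be
derived from the socket that is already connected. The function name
gss_target_for_peer and the injectable reverse_lookup parameter are
assumptions made for this sketch; "service@hostname" is the usual
host-based service name syntax.

```python
import socket


def gss_target_for_peer(sock, service="host", reverse_lookup=socket.getnameinfo):
    """Build a host-based service name from the peer we are really connected to.

    Hypothetical helper: it only sketches the workaround described in the post.
    """
    # Address of the machine at the other end of the existing connection,
    # not a fresh forward lookup of the name the user typed.
    peer_addr = sock.getpeername()

    # Reverse-resolve that address; NI_NAMEREQD makes the lookup fail
    # instead of silently returning the numeric address.
    hostname, _port = reverse_lookup(peer_addr, socket.NI_NAMEREQD)

    # Lower-case the name, matching the canonicalization quoted from the RFC.
    return "%s@%s" % (service, hostname.lower())
```

For a socket connected to ipA this would yield host@nameA.domain (in
lower case), whatever address a second lookup of clustername might
return.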

This is a client-side issue, so no amount of patching on the server will 
make this issue go away. In addition, we need to convince vendors to 
provide patches to their deployed "legacy" versions, which is made 
difficult by the fact that this is not fixed "upstream". We seem to have 
convinced Red Hat that this is an issue.

A two-line patch is at https://bugzilla.mindrot.org/attachment.cgi?id=1202.
https://bugzilla.mindrot.org/show_bug.cgi?id=1008 also has a more 
elaborate version by Simon that introduces a new config option.

I'd be happy to forward-port either to 4.7, if there is a chance that 
this will get applied one day.

Sorry for the lengthy post, thanks for your time.
Jan