This document describes a «break-glass» mechanism based on SSH certificate-based authentication, with authorization implemented via NSS and PAM modules.
Problem Statement
Disaster recovery is a critical part of any infrastructure. On-call or support engineers must have secure access to critical systems in case of any disruption. The recovery mechanism must itself be secure and well protected, because it grants access to critical systems and data while bypassing the usual authentication and authorization processes. This mechanism is usually called «break-glass»: it relies on special credentials that are used only in an emergency, when the usual access methods do not work.
Break-glass access refers to a procedure used in critical emergencies or exceptional cases, when a user with insufficient access is granted elevated access rights to bypass normal access controls - SSH Academy 1
Companies commonly use the SSH protocol and dedicate a highest-privilege account to accessing their infrastructure in an emergency. This approach brings the following issues:
- all systems must have pre-created shared accounts. Those accounts create issues in case of a potential investigation
- after the «break-glass» process has been used, the password must be changed in order to prevent unauthorized access
- an on-call engineer must have access to the password manager where the credentials for emergency accounts are stored. In fact, without the password manager a company is cut off from its systems

A comprehensive «break-glass» solution is required to give engineers access back to their critical systems when the password manager fails.
The most common way of handling SSH authentication is public key authentication. It is much stronger than simply using a password, but it creates the problem of managing changes to SSH keys securely over time. For example, if ten new people join a company and five others leave, someone has to add the ten new keys to each server and remove the five old ones. Although public keys partly solve the authentication issue, they do not solve the limitations described above. Public keys also add new challenges, and some research shows that:
Monitoring of the usage of the keys has revealed that typically about 90% of the authorized keys are unused. That is, they are access credentials that were provisioned years ago, the need for which ceased to exist or the person having the private key left, and the authorized key was never deprovisioned. Thus, the access was not terminated when the need for it ceased to exist.
...
In many organizations – even very security-conscious organizations – there are many times more obsolete authorized keys than they have employees. Worse, authorized keys generally grant command-line shell access, which in itself is often considered privileged. We have found that in many organizations about 10% of the authorized keys grant root or administrator access. SSH keys never expire. 2
Historically, most organizations have not touched the location of the authorized keys files. This means they are in each user’s home directory, and each user can configure additional permanent credentials for themselves and their friends. They can also add additional permanent credentials for any service account or root account they are able to log into. This has lead to massive problems in large organizations around managing SSH keys.
AuthorizedKeysFile /etc/ssh/authorized-keys/%u
Enterprises should also pay attention to the AuthorizedKeysCommand and AuthorizedKeysCommandUser options. They are typically used when SELinux is enabled and to fetch SSH keys from LDAP directories or other data sources. Their use can make auditing SSH keys cumbersome and they can be used to hide backdoor keys from casual observation. 3
Although public keys have advantages over passwords, keys are not passwords. There are several significant differences between SSH keys and passwords: 4
- Passwords are related to user accounts. SSH user keys do not have to be
- Passwords usually have expiration times. SSH user keys do not
- Passwords cannot be generated without oversight. SSH user keys can
- Passwords are mostly used for interactive authentication. SSH keys can be used for machine-to-machine authentication
- Passwords grant access at the operating system level without additional restrictions. SSH user keys can control both access and privilege levels
 
That’s why an approach that combines the advantages of passwords and public keys is needed. SSH supports such an approach via Certificate Authorities (CAs). Certificates make it possible to associate credentials with a user, support auditing, create short-lived identities, and use metadata as an extension point for authentication/authorization, etc.
Traditional public keys also have metadata, but any user can change it.
Finally, implementations of ephemeral certificates provide the ability to utilize approaches, such as: Keyless, Zero Trust, Just-In-Time for access to remote systems using short-lived identity instead of static keys and passwords.
Specification
Obviously, certificates have more advantages, but certificates and the SSH protocol itself have some limitations. The SSH protocol and certificates do not solve, and are not meant to solve, user management and authorization issues (e.g. assigning sudo rules). That’s why accounts must be pre-created together with sudoers files.
In order to understand which solution can help with the limitations related to user management and assigning permissions, it’s necessary to consider the SSH protocol. It is designed as three protocols that typically run on top of TCP:
- SSH Transport Layer Protocol is responsible for server authentication, confidentiality, integrity and compression
- SSH User Authentication Protocol is responsible for client (user) authentication to the server
- SSH Connection Protocol is responsible for multiplexing the encrypted tunnel into several logical channels
block-beta
    columns 4
    block:SSH_p:5
        ssh_auth_p["SSH User Authentication Protocol"]
        ssh_conn_p["SSH Connection Protocol"]
    end
    block:SSH_t:5
        ssh_transport_p["SSH Transport Layer Protocol"]
    end
    block:TCP:5
        tcp["TCP"]
    end
    block:IP:5
        ip["IP"]
    end
    style ip fill:#d4efdf, stroke-width:0px
    style tcp fill:#fcf3cf, stroke-width:0px
    style ssh_transport_p fill:#d4e6f1, stroke-width:0px
    style ssh_conn_p fill:#f5b7b1, stroke-width:0px
    style ssh_auth_p fill: #d7bde2, stroke-width:0px
    style SSH_p fill:#ddd, stroke:#000,stroke-width:1px
    style SSH_t fill:#ddd, stroke:#000,stroke-width:1px
    style TCP fill:#ddd, stroke:#000,stroke-width:1px
    style IP fill:#ddd, stroke:#000,stroke-width:1px
The last step in the SSH Transport Layer Protocol is service request. A client sends an SSH_MSG_SERVICE_REQUEST to request the SSH User Authentication Protocol or SSH Connection Protocol. All the data will be sent protected by encryption and MAC.
According to the Authentication Requests section of RFC 4252: 5
If the requested ‘user name’ does not exist, the server MAY disconnect, or MAY send a bogus list of acceptable authentication ‘method name’ values, but never accept any. This makes it possible for the server to avoid disclosing information on which accounts exist. In any case, if the ‘user name’ does not exist, the authentication request MUST NOT be accepted.
%%{
  init: {
    "flowchart" : { 'curve' : 'stepBefore', 'defaultRenderer': 'elk' }
  }
}%%
flowchart LR
    subgraph sshd_p[sshd process]
        direction TB
        sshd("sshd") ==> |Look up user|libnss(NSS)
        subgraph libnss[NSS]
            direction RL
            nss{{"libs"}}
        end
    end
    %% subgraph nss_config[NSS config]
        %% direction TB
        cfg{{"/etc/nsswitch.conf"}} ==> libnss
    %% end
    subgraph sources[Data Sources]
        nss ==> passwd ==> pwd_src(files<br>systemd)
        nss ==> group ==> grp_src(files)
        nss ==> networks ==> net_src(files<br>dns)
        nss ==> etc ==> etc.
    end
    libnss ==> |Response|sshd
    classDef nss fill:#eeac4d;
    classDef sshd_p fill:#f6f7fb
    class libnss nss
    class sshd_p sshd_p
    class sources sshd_p
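The lookup in the diagram is what an ordinary getpwnam(3) call does. glibc reads /etc/nsswitch.conf and queries each configured passwd source in turn. The following is a minimal sketch of the existence check that sshd effectively depends on before it continues authentication. `account_exists` is a hypothetical helper name, not an OpenSSH function.

```c
#include <pwd.h>
#include <stdbool.h>
#include <stddef.h>

/* Resolve `name` through NSS: glibc reads /etc/nsswitch.conf and asks each
 * configured "passwd:" source (files, systemd, ...) until one answers. */
static bool account_exists(const char *name)
{
    struct passwd *pw = getpwnam(name);
    return pw != NULL;  /* NULL: not found in any source, or lookup error */
}
```

When this returns false, RFC 4252 requires the authentication request to be rejected. That is why break-glass usernames need a custom NSS source that answers for them.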
That’s why it’s necessary to consider the SSH User Authentication Protocol in more detail. It covers the following:
- Message Types and Formats
 - Message Exchange
 - Authentication Methods
 
SSH User Authentication Protocol phases:
- the client sends an SSH_MSG_USERAUTH_REQUEST message
- if the username is not valid, the server sends either SSH_MSG_USERAUTH_FAILURE or the authentication method list
- the client selects one of the methods from the list and sends the request to the server again
- if the server requires more than one authentication method, it sends a partial success
- when all required authentication methods succeed, the server sends an SSH_MSG_USERAUTH_SUCCESS message
sequenceDiagram
    participant c as SSH client
    participant s as SSH server
    Note over c, s: TCP connection has been established
    Note over c, s: SSH key exchange has been done
    Note over c, s: SSH_MSG_SERVICE_REQUEST has been sent
    c->>s: SSH_MSG_USERAUTH_REQUEST
    activate s
    alt is the user invalid
        s ->> c: SSH_MSG_USERAUTH_FAILURE
        opt
            s ->> c: Authentication method list
        end
    else the user is valid
        s ->> c: Authentication method list
    end
    deactivate s
    c->>s: SSH_MSG_USERAUTH_REQUEST <br> + <br>Authentication method has been selected
    activate s
    alt are additional authentication method(s) required
        s ->> c: Partial success
        Note over c, s: Step(s) related to additional authentication method(s)
    else
        s ->>c: SSH_MSG_USERAUTH_SUCCESS
    end
    deactivate s
The server may require one or more of the following authentication methods:
- Public key
 - Password
 - Host-based
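In RFC 4252 the method list is not a separate message. It travels inside SSH_MSG_USERAUTH_FAILURE as a name-list such as `publickey,password`, and a "partial success" is the same message with its trailing boolean set to TRUE (section 5.1). The sketch below encodes that payload using the RFC 4252 message numbers and the RFC 4251 string/boolean encodings. `build_userauth_failure` is a hypothetical helper. Packet framing, padding, encryption and MAC are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    SSH_MSG_USERAUTH_REQUEST = 50,
    SSH_MSG_USERAUTH_FAILURE = 51,
    SSH_MSG_USERAUTH_SUCCESS = 52,
};

/* Payload of SSH_MSG_USERAUTH_FAILURE (RFC 4252, section 5.1):
 *   byte      SSH_MSG_USERAUTH_FAILURE
 *   name-list authentications that can continue
 *   boolean   partial success
 * Returns the payload length, or 0 if `out` is too small. */
static size_t build_userauth_failure(const char *methods, bool partial,
                                     uint8_t *out, size_t outlen)
{
    size_t mlen = strlen(methods);
    if (mlen > UINT32_MAX)
        return 0;
    size_t total = 1 + 4 + mlen + 1;  /* byte + uint32 length + names + boolean */
    if (outlen < total)
        return 0;
    out[0] = SSH_MSG_USERAUTH_FAILURE;
    out[1] = (uint8_t)(mlen >> 24);   /* name-list length, big-endian */
    out[2] = (uint8_t)(mlen >> 16);
    out[3] = (uint8_t)(mlen >> 8);
    out[4] = (uint8_t)mlen;
    memcpy(out + 5, methods, mlen);   /* comma-separated method names */
    out[5 + mlen] = partial ? 1 : 0;  /* TRUE means partial success */
    return total;
}
```

For example, `build_userauth_failure("publickey,password", false, buf, sizeof buf)` yields a 24-byte payload: 1 type byte, 4 length bytes, 18 name bytes and 1 boolean byte.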
 
sequenceDiagram
    participant c as SSH client
    participant CA as Certificate Authority
    participant s as SSH server
    c ->> CA: send public key to be signed
    activate CA
        Note right of CA: Generate short-lived certificate
        CA ->> c: Certificate has been generated
    deactivate CA
    c ->> s: SSH authentication via certificate
    activate s
        Note right of s: Validate certificate by CA
        s ->> c: SSH authentication has been successful
    deactivate s
Certificate-based authentication is an extension of public key authentication that adds a CA role to enhance security. It uses three main components: a private key, a public key, and a certificate signed by the CA.
Certificate-based authentication phases are:
- the client sends an SSH_MSG_USERAUTH_REQUEST message
- if the username is not valid, the server sends either SSH_MSG_USERAUTH_FAILURE or the authentication method list
- the client sends an SSH certificate signed by a trusted CA to the server
- the server performs the following verifications:
  - the signature on the client certificate, using the CA public key
  - the validity period of the certificate
  - the requested user account (principals)
- if the certificate is valid, the server grants access to the client based on its identity
- when all required authentication methods succeed, the server sends an SSH_MSG_USERAUTH_SUCCESS message
sequenceDiagram
    participant c as SSH client
    participant s as SSH server
    Note over c, s: TCP connection has been established
    Note over c, s: SSH key exchange has been done
    Note over c, s: SSH_MSG_SERVICE_REQUEST has been sent
    Note over c, s: Authentication method has been selected
    c ->> s: send SSH certificate signed by CA
    activate s
    critical validate SSH certificate
        s-->s: Certificate Authority
        s-->s: Expiration date
        s-->s: Principals
        s-->s: etc.
            alt is not valid
                s->>c: SSH_MSG_USERAUTH_FAILURE
            else is valid
        s->>c: SSH_MSG_USERAUTH_SUCCESS
    end
    end
    deactivate s
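The validity-period and principal checks listed above can be sketched as follows. The sketch assumes the certificate has already been parsed and its CA signature verified. The hypothetical `struct cert_view` stands in for that parsed form. OpenSSH treats a certificate as valid when valid_after <= now < valid_before. With TrustedUserCAKeys, the requested user name must appear among the certificate's principals.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed view of an OpenSSH user certificate whose CA
 * signature has already been verified. */
struct cert_view {
    const char *const *principals;  /* NULL-terminated list */
    uint64_t valid_after;           /* seconds since the epoch */
    uint64_t valid_before;
};

static bool cert_permits(const struct cert_view *c, const char *user,
                         uint64_t now)
{
    if (now < c->valid_after || now >= c->valid_before)
        return false;               /* not yet valid, or expired */
    for (const char *const *p = c->principals; p != NULL && *p != NULL; p++)
        if (strcmp(*p, user) == 0)
            return true;            /* requested account is a principal */
    return false;                   /* no matching principal */
}
```

An empty principal list is rejected here. This matches sshd's behaviour when a TrustedUserCAKeys certificate is checked against the login name.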
According to the Problem Statement section, it’s necessary to pay attention to the second and last phases of certificate-based authentication. If the username does not exist, the SSH server will not continue the authentication process. That’s why the user, its home directory, etc. must be created during this phase. The SSH server calls the Name Service Switch (NSS), which looks up the user in different data sources (depending on the settings in /etc/nsswitch.conf). If NSS returns success, the user exists, and the SSH server continues the authentication process according to the configured authentication methods (password, pubkey, etc.). All authentication methods depend on the NSS answer: the SSH server then checks the settings related to each method (e.g. looks up the password in /etc/shadow or keys in AuthorizedKeysFile 6). In order to create users on demand, it’s necessary to implement a custom NSS module and configure it in /etc/nsswitch.conf.
6 man 5 sshd_config
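A custom NSS module is a shared library that exports functions named `_nss_<service>_<function>`. glibc loads `libnss_<service>.so.2` for every source listed in /etc/nsswitch.conf. The following is a minimal sketch of the passwd lookup entry point. It assumes a hypothetical service name `brkgl2s` and a hypothetical UID range of 60000 to 64999. It only fills the entry in memory: unlike the module described here, it does not write the user to /etc/passwd or check the calling process. The `.brkgl2s` postfix and the `x` password field come from this document. The home directory layout and the `/bin/bash` shell are assumptions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <nss.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BRKGL2S_POSTFIX ".brkgl2s"  /* username postfix required by the module */
#define HYP_UID_BASE 60000u         /* hypothetical UID/GID range */
#define HYP_UID_SPAN 5000u

static int has_postfix(const char *name)
{
    size_t n = strlen(name), p = strlen(BRKGL2S_POSTFIX);
    return n > p && strcmp(name + n - p, BRKGL2S_POSTFIX) == 0;
}

/* Copy `s` into the caller-supplied buffer; NULL if it does not fit. */
static char *push(char **cur, size_t *left, const char *s)
{
    size_t len = strlen(s) + 1;
    if (len > *left)
        return NULL;
    char *dst = memcpy(*cur, s, len);
    *cur += len;
    *left -= len;
    return dst;
}

/* glibc resolves this symbol in libnss_brkgl2s.so.2 when nsswitch.conf
 * lists "brkgl2s" as a passwd source. */
enum nss_status _nss_brkgl2s_getpwnam_r(const char *name, struct passwd *pw,
                                        char *buffer, size_t buflen,
                                        int *errnop)
{
    if (!has_postfix(name))
        return NSS_STATUS_NOTFOUND;      /* not a break-glass user */

    char home[256];
    int hl = snprintf(home, sizeof home, "/home/%s", name);
    if (hl < 0 || (size_t)hl >= sizeof home)
        return NSS_STATUS_NOTFOUND;

    char *cur = buffer;
    size_t left = buflen;
    pw->pw_name   = push(&cur, &left, name);
    pw->pw_passwd = push(&cur, &left, "x");
    pw->pw_gecos  = push(&cur, &left, "");
    pw->pw_dir    = push(&cur, &left, home);
    pw->pw_shell  = push(&cur, &left, "/bin/bash");
    if (!pw->pw_name || !pw->pw_passwd || !pw->pw_gecos ||
        !pw->pw_dir || !pw->pw_shell) {
        *errnop = ERANGE;                /* glibc retries with a bigger buffer */
        return NSS_STATUS_TRYAGAIN;
    }

    /* Random UID in a hypothetical range (not persisted in this sketch);
     * the GID reuses the UID. */
    pw->pw_uid = (uid_t)(HYP_UID_BASE + (unsigned long)random() % HYP_UID_SPAN);
    pw->pw_gid = pw->pw_uid;
    return NSS_STATUS_SUCCESS;
}
```

With the library installed, a line such as `passwd: files systemd brkgl2s` in /etc/nsswitch.conf would make glibc consult it after the other sources. Names without the postfix get NSS_STATUS_NOTFOUND, so the next source in the list is tried.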
After successful authentication (the last authentication phase), the next stage is session establishment. At that stage the client is allowed to access the server. The session is opened after all Linux Pluggable Authentication Modules (PAM) checks have passed. In order to configure the user’s session, it’s necessary to implement a custom PAM module and configure it in one of the files in /etc/pam.d. During the PAM stage some environment variables are defined. One of them is SSH_AUTH_INFO_0 7, which exposes authentication information (e.g. the pubkey, certificate, etc.) to PAM modules. This variable can be used as a source for making decisions during the authorization process (e.g. assigning the sudo group to a user).
UsePAM Enables the Pluggable Authentication Module interface. If set to yes this will enable PAM authentication using KbdInteractiveAuthentication and PasswordAuthentication in addition to PAM account and session module processing for all authentication types.
Because PAM keyboard-interactive authentication usually serves an equivalent role to password authentication, you should disable either PasswordAuthentication or KbdInteractiveAuthentication. 8
8 man 5 sshd_config
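Inside a PAM module the value would be obtained with `pam_getenv(pamh, "SSH_AUTH_INFO_0")`. The sketch below assumes the text holds one line per completed authentication method, with public key lines of the form `publickey <key-type> <base64-blob>`. It extracts the key type and blob and applies the ed25519-only restriction listed under Known limitations. `parse_auth_info` and `is_ed25519_cert` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Find the first "publickey <type> <blob>" line in SSH_AUTH_INFO_0-style
 * text and copy the key type and base64 blob out of it. */
static bool parse_auth_info(const char *info, char *type, size_t typelen,
                            char *blob, size_t bloblen)
{
    static const char prefix[] = "publickey ";
    const size_t plen = sizeof prefix - 1;
    const char *line = info;
    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        if (len > plen && strncmp(line, prefix, plen) == 0) {
            const char *t = line + plen;
            const char *sp = memchr(t, ' ', len - plen);
            if (sp == NULL)
                return false;             /* no blob after the key type */
            size_t tlen = (size_t)(sp - t);
            size_t blen = len - plen - tlen - 1;
            if (tlen == 0 || blen == 0 || tlen >= typelen || blen >= bloblen)
                return false;
            memcpy(type, t, tlen);
            type[tlen] = '\0';
            memcpy(blob, sp + 1, blen);
            blob[blen] = '\0';
            return true;
        }
        line = nl ? nl + 1 : NULL;        /* move to the next method line */
    }
    return false;
}

/* Only ed25519 user certificates are accepted. */
static bool is_ed25519_cert(const char *type)
{
    return strcmp(type, "ssh-ed25519-cert-v01@openssh.com") == 0;
}
```

The extracted blob is what would then be handed to the certificate-inspection step (get_pubkey_info) to read fields such as Key ID.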
%%{init: {
    "flowchart" : { 'curve' : 'stepBefore', 'defaultRenderer': 'elk' }
  }
}%%
flowchart LR
    subgraph sshd_p[sshd process]
        direction TB
        sshd("sshd") ======> |If user exists <br>and<br> UsePAM enabled|libpam(PAM)
        subgraph libnss[NSS]
            direction RL
            nss{{"libs"}}
        end
        subgraph libpam[PAM]
            direction RL
            pam{{"libs"}}
        end
    end
    %% subgraph pamcfg[PAM configs]
        %% direction TB
        cfg{{"/etc/pam.d/*"}} ==> libpam
    %% end
    subgraph modules[PAM modules]
        pam ==> account
        pam ==> authentication
        pam ==> password
        pam ==> session
    end
    libnss <==> |Request<br>Response|sshd
    libpam ==> |Response|sshd
    classDef pam fill:#0f9d58;
    classDef nss fill:#eeac4d;
    classDef sshd_p fill:#f6f7fb
    class libpam pam
    class libnss nss
    class sshd_p sshd_p
    class modules sshd_p
Another way to get authentication information during an SSH connection is the -A flag. This flag enables forwarding of connections from an authentication agent (ssh-agent) via a socket to a remote host. The path to the socket is stored in the SSH_AUTH_SOCK environment variable, which can be read on the remote host. However, this approach has security issues, because the socket is forwarded to every host the user connects to. This can be mitigated if the user explicitly enables forwarding only for specific hosts (e.g. ForwardAgent yes).
When the session is closed, the PAM module must perform the following actions:
- removing the record from /etc/passwd
- removing the home directory
- killing all processes related to the user
- etc.

Thus, all users are temporary.
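The first cleanup step, removing the user's record, can be sketched as a filter over /etc/passwd-formatted text. The sketch works on in-memory strings only. A real module would additionally lock the password database (e.g. with lckpwdf(3)) and replace the file atomically. `drop_passwd_entry` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy /etc/passwd-formatted text from `in` to `out`, leaving out the
 * line whose first field equals `user`. Returns false if `out` is too small. */
static bool drop_passwd_entry(const char *in, const char *user,
                              char *out, size_t outlen)
{
    if (outlen == 0)
        return false;
    size_t ulen = strlen(user), used = 0;
    const char *line = in;
    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) + 1 : strlen(line);
        /* Match the whole login name: "alice" must not match "alice.brkgl2s". */
        bool match = strncmp(line, user, ulen) == 0 && line[ulen] == ':';
        if (!match) {
            if (used + len >= outlen)
                return false;
            memcpy(out + used, line, len);
            used += len;
        }
        line += len;
    }
    out[used] = '\0';
    return true;
}
```

Matching on `name:` rather than on a plain prefix keeps regular accounts intact when a break-glass name shares their prefix.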
sequenceDiagram
    participant c as SSH client
    participant s as SSH server
    participant n as NSS
    participant p as PAM
    c->>s: SSH_MSG_USERAUTH_REQUEST
    activate s
    s ->> n: Request NSS
    activate n
    Note over c,n: According to the settings in /etc/nsswitch.conf, NSS looks up the user in each data source. On this step the custom NSS <br/>module must create a new user and return NSS_STATUS_SUCCESS if the username matches the requirements
    alt user does not exist
        n ->> s: NSS_STATUS_NOTFOUND
        s ->> c: SSH_MSG_USERAUTH_FAILURE
        opt
            s ->> c: Authentication method list
        end
    else user exists
        n ->> s: NSS_STATUS_SUCCESS
        c->>s: SSH_MSG_USERAUTH_REQUEST <br> + <br> Authentication method has been selected
        deactivate n
        s->>p: Request PAM
        activate p
        Note over s,p: According to the settings in the files in /etc/pam.d, PAM runs each module. On this step the custom PAM module must check SSH_AUTH_INFO_0, <br/>get the pubkey and additional info (e.g. Key ID field), and return a status depending on whether the username, pubkey type, etc. match the requirements.
            alt is not successful
                p ->> s: PAM_SESSION_ERR, PAM_AUTH_ERR, etc.
                s ->> c: SSH_MSG_USERAUTH_FAILURE
            else successful
                p ->> s: PAM_SUCCESS
            end
        alt is additional authentication method required
            s ->> c: Partial success
            s ->> p: .
            Note over c, p: Step(s) related to additional authentication method(s)
            p ->> s: .
            s ->>c: SSH_MSG_USERAUTH_SUCCESS
        else additional authentication method(s) is not required
            s ->>c: SSH_MSG_USERAUTH_SUCCESS
        end
        deactivate p
    end
    deactivate s
    activate c
     Note over c, p: Session has been opened
    c ->>s: Session terminate
    s ->>p: Session will be closed
    p ->>p: Some action(s)
    p ->>s: Successfully
    s->>c: Session has been closed
HLD 9
Naming convention
Key ID field
Key ID field usually contains policy name which describes access level on hosts. It makes audit logs more detailed.
Currently, the PAM module supports the following format of the field (resource version:environment:sudo group):
- resource version: reserved for future usage. Default: ssh_v1
- environment: reserved for future usage. If the field is not defined, it will be set to !. The ! means that the field has no value by default.
- sudo group: [admins|users]. Default: users

Not all of the fields have to be filled in, but the minimum Key ID format must be defined as ::. The :: expands to ssh_v1:!:users by default.
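The expansion rules above can be expressed as a small parser. The sketch assumes that exactly two `:` separators are required and that an empty field takes its documented default. `struct key_id`, `copy_field` and `parse_key_id` are hypothetical names, and the buffer sizes are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Parsed Key ID: "<resource version>:<environment>:<sudo group>". */
struct key_id {
    char resource[32];
    char environment[32];
    char sudo_group[16];
};

/* Copy a field of length `len`, substituting `def` when it is empty. */
static bool copy_field(char *dst, size_t dstlen, const char *src, size_t len,
                       const char *def)
{
    if (len == 0) {
        src = def;
        len = strlen(def);
    }
    if (len >= dstlen)
        return false;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return true;
}

static bool parse_key_id(const char *s, struct key_id *k)
{
    const char *c1 = strchr(s, ':');
    const char *c2 = c1 ? strchr(c1 + 1, ':') : NULL;
    if (c2 == NULL || strchr(c2 + 1, ':') != NULL)
        return false;                 /* assumes exactly two separators */
    if (!copy_field(k->resource, sizeof k->resource, s,
                    (size_t)(c1 - s), "ssh_v1") ||
        !copy_field(k->environment, sizeof k->environment, c1 + 1,
                    (size_t)(c2 - c1 - 1), "!") ||
        !copy_field(k->sudo_group, sizeof k->sudo_group, c2 + 1,
                    strlen(c2 + 1), "users"))
        return false;
    /* Only the two documented sudo groups are accepted. */
    return strcmp(k->sudo_group, "admins") == 0 ||
           strcmp(k->sudo_group, "users") == 0;
}
```

With this sketch, a Key ID of `::admins` expands to `ssh_v1:!:admins`, and `::root` is rejected because `root` is not one of the supported sudo groups.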
Minimum requirements
- OpenSSH >= 7.6p1 (tested on Fedora 41 with OpenSSH 9.8p1)
- Add the following to /etc/ssh/sshd_config.d/00-break-glass.conf:

Port 1110
UsePAM yes
Match LocalPort 1110
       TrustedUserCAKeys /path/to/ca
       AuthenticationMethods publickey
       PAMServiceName brkgl2s
Match All
Known limitations
Custom NSS module:
- generates a random UID/GID each time an account is created, so the UID/GID will differ between two hosts for the same username
- requires the username to contain the postfix (.brkgl2s) as an additional restriction on top of checking the name of the service that calls NSS
- supports only two sudo groups (for more details please check the Naming convention section)
- assigns each user a unique UID/GID, but the group corresponding to the GID is not created
- does not support changing the service name (option PAMServiceName 10)
10 man 5 sshd_config
Custom PAM module:
- removes the user record and home directory after the session is closed
- termination of all processes related to the user is not implemented
- only the ed25519 pubkey type is supported
- the user is created each time the username matches the requirements. If the SSH server sends SSH_MSG_USERAUTH_FAILURE for some reason (e.g. an invalid certificate), the user record is not deleted

Pitfalls
Checking PAM service name
NSS lookups performed by tools such as id, getent, etc. would create a record in the users data source each time the requested user does not exist. In order to avoid this problem, it’s necessary to limit which callers can trigger user creation in the custom NSS module: if the calling service is not ssh, the NSS module must return NSS_STATUS_TRYAGAIN. The nss-devel package does not provide any function for determining which PAM service calls NSS, but NSS modules can read environment variables just like PAM modules. The SYSTEMD_EXEC_PID 11 environment variable stores the PID of the unit’s main process (for the SSH service, the sshd daemon), and processes started by it inherit the variable. Once the PID is known, the process name can be read from /proc/PID/comm 12. Thus, checking the process name partly solves the problem and makes it possible to use these tools without adding users to a data source. Unlike nss-devel, pam-devel does provide a function for getting the PAM service name (pam_get_item() with PAM_SERVICE).
11 man 5 systemd.exec
12 man 5 proc_pid_comm
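The caller check described above can be sketched as two small functions. One reads /proc/<pid>/comm. The other compares the name of the process referenced by SYSTEMD_EXEC_PID against an expected name. The expected name `sshd` is an assumption (the text only says "ssh"). `read_comm` and `exec_pid_is` are hypothetical helpers. In the module, a false result would map to NSS_STATUS_TRYAGAIN.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read the command name of `pid` from /proc/<pid>/comm (newline stripped). */
static bool read_comm(long pid, char *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/comm", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;
    bool ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    if (ok)
        buf[strcspn(buf, "\n")] = '\0';
    return ok;
}

/* True if the process named by SYSTEMD_EXEC_PID has the command name
 * `expected` (e.g. "sshd"); false if the variable is missing or invalid. */
static bool exec_pid_is(const char *expected)
{
    const char *env = getenv("SYSTEMD_EXEC_PID");
    if (env == NULL || *env == '\0')
        return false;
    char *end;
    long pid = strtol(env, &end, 10);
    if (*end != '\0' || pid <= 0)
        return false;
    char comm[32];  /* the kernel limits comm to 15 characters */
    return read_comm(pid, comm, sizeof comm) && strcmp(comm, expected) == 0;
}
```

Outside a systemd unit the variable is unset, so the check fails closed. This is the intended behaviour for interactive tools such as id or getent.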
sequenceDiagram
    participant s as SSH server
    participant n as NSS
    participant p as PAM
    activate s
    activate n
    s ->> n: Request NSS
    alt is user not found
        critical
            Note over s, n: On this step NSS module gets PID <br/>from SYSTEMD_EXEC_PID and <br/>looks up process name in /proc/PID/comm
            option calling process is not ssh
                n ->> s: NSS_STATUS_TRYAGAIN
            option username does not contain postfix
                n ->> s: NSS_STATUS_TRYAGAIN
        end
        n ->> n: Create user
        n ->> s: NSS_STATUS_SUCCESS
    else user is found
        critical
            option calling process is ssh
            option username does contain postfix
                n ->> s: NSS_STATUS_SUCCESS
        end
    end
    deactivate n
    s->>p: Request PAM
    activate p
    alt successfully
        critical check
        option SSH_AUTH_INFO_0
            p ->> p: looks up pubkey
        option pubkey type
        option gets Key ID
            p->> p: Create sudoers file
        end
        p->>s:PAM_SUCCESS
    else not successfully
        p->>s: PAM_SESSION_ERR, PAM_AUTH_ERR, etc.
    end
    deactivate p
    deactivate s
Checking username
In fact, the postfix requirement for usernames is artificial and the postfix could be removed, but doing so raises the following problems:
There is a danger that an attacker could flood /etc/passwd with junk records while users are being created at runtime. The postfix does not fully solve this problem either if the attacker knows about it. Also, the regular authentication process should be separate from the emergency authentication process. If the two processes were merged, users who had connected to hosts before the emergency situation could pass authentication without a new user having to be created.
For now the processes are kept separate to make managing regular and emergency users easier, but this limitation may still be removed in the future.
My conclusion is that the SSH port should be opened on network equipment only during emergency situations. At regular times, ACLs on network equipment should restrict access to the SSH port on hosts. For now, in my opinion, the processes must stay separate so that they are easy to manage and develop.
Similar projects
Details of implementation
get_pubkey_info function
There is a reason why the get_pubkey_info function was implemented via execvp and pipe. Libraries such as libssh and libssh2 don’t have functions for looking up the fields inside a pubkey (certificate), and OpenSSH portable doesn’t provide a public API for this either. The function could be implemented with low-level primitives instead, and there are many reasons to refactor it in the future.
In fact, the function is implemented as a parent and two child processes whose stdin/stdout are redirected via pipes. The first child process writes the value of the variable that contains the pubkey to its stdout. The second child process reads it from its stdin via a pipe, runs ssh-keygen -L -f -, and sends the output via another pipe to the parent process. In shell terms this looks like cat pubkey | ssh-keygen -L -f -. Next, the parent process looks up some fields in the output and saves them into a structure.
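The parent-plus-two-children layout can be sketched generically with POSIX fork, pipe, dup2 and execvp. Child 1 writes the input into a pipe. Child 2 reads that pipe as stdin, execs the command and writes to a second pipe. The parent drains that second pipe and reaps both children. `run_filter` is a hypothetical name. Parsing the ssh-keygen output into a structure is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with `input` on its stdin and capture its stdout into `out`
 * (NUL-terminated, truncated to outlen - 1 bytes).
 * Returns 0 if the command exited with status 0, -1 otherwise. */
static int run_filter(const char *input, char *const argv[],
                      char *out, size_t outlen)
{
    int in_pipe[2], out_pipe[2];
    if (outlen == 0 || pipe(in_pipe) == -1)
        return -1;
    if (pipe(out_pipe) == -1) {
        close(in_pipe[0]);
        close(in_pipe[1]);
        return -1;
    }

    pid_t writer = fork();
    if (writer == 0) {                 /* child 1: like `cat pubkey` */
        close(in_pipe[0]);
        close(out_pipe[0]);
        close(out_pipe[1]);
        size_t len = strlen(input), off = 0;
        while (off < len) {
            ssize_t n = write(in_pipe[1], input + off, len - off);
            if (n <= 0)
                _exit(1);
            off += (size_t)n;
        }
        _exit(0);
    }

    pid_t cmd = writer > 0 ? fork() : -1;
    if (cmd == 0) {                    /* child 2: exec the command */
        dup2(in_pipe[0], STDIN_FILENO);
        dup2(out_pipe[1], STDOUT_FILENO);
        close(in_pipe[0]);
        close(in_pipe[1]);
        close(out_pipe[0]);
        close(out_pipe[1]);
        execvp(argv[0], argv);
        _exit(127);                    /* only reached if exec failed */
    }

    /* Parent: keep only the read end of the output pipe. */
    close(in_pipe[0]);
    close(in_pipe[1]);
    close(out_pipe[1]);

    size_t used = 0;
    char scratch[256];
    for (;;) {                         /* drain until EOF so the child never blocks */
        int room = used < outlen - 1;
        ssize_t n = read(out_pipe[0], room ? out + used : scratch,
                         room ? outlen - 1 - used : sizeof scratch);
        if (n <= 0)
            break;
        if (room)
            used += (size_t)n;
    }
    out[used] = '\0';
    close(out_pipe[0]);

    int status = 0;
    if (writer > 0)
        waitpid(writer, NULL, 0);
    if (cmd <= 0 || waitpid(cmd, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

For the certificate case, argv would presumably be `{"ssh-keygen", "-L", "-f", "-", NULL}` and `input` the key type followed by the base64 blob. Writing the input from a separate child means a large key cannot deadlock the parent while it is still reading the command's output.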
adduser function
During user account creation, the ! character is used instead of a real password. According to man 5 shadow: 13
13 man 5 shadow
If the password field contains some string that is not a valid result of crypt(3), for instance ! or *, the user will not be able to use a unix password to log in (but the user may log in the system by other means).
The ! (or *) character means the account has no password, so no password can be used to access it. The x character means the password is stored in /etc/shadow. That’s why the custom NSS module must never create entries in /etc/shadow and must use the x character instead of a password in /etc/passwd.
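Writing the resulting /etc/passwd line can be sketched with glibc's putpwent(3), which formats a `struct passwd` as `name:passwd:uid:gid:gecos:dir:shell`. Several details are assumptions: the helper name `append_breakglass_user`, the shared UID/GID value and the `/bin/bash` shell. A real module would append to /etc/passwd while holding lckpwdf(3) rather than write to an arbitrary stream.

```c
#define _GNU_SOURCE
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Append a break-glass account to `stream` in /etc/passwd format.
 * The password field is "x"; nothing is written to /etc/shadow. */
static int append_breakglass_user(FILE *stream, const char *name, uid_t id,
                                  const char *home)
{
    if (strchr(name, ':') || strchr(home, ':'))
        return -1;                        /* ':' is the field separator */
    struct passwd pw = {
        .pw_name   = (char *)name,
        .pw_passwd = (char *)"x",
        .pw_uid    = id,
        .pw_gid    = id,                  /* GID mirrors the UID */
        .pw_gecos  = (char *)"",
        .pw_dir    = (char *)home,
        .pw_shell  = (char *)"/bin/bash", /* hypothetical login shell */
    };
    return putpwent(&pw, stream);         /* 0 on success, -1 on error */
}
```

For `alice.brkgl2s` with ID 61234, this writes `alice.brkgl2s:x:61234:61234::/home/alice.brkgl2s:/bin/bash`.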
References
- wh0: The SSH Protocol
 - Teleport: SSH Certificates Security
 - Using certificates for SSH authentication
 - Netburner: Introduction to the SSH Protocol
 - SecureW2: How Does SSH Certificate Authentication Work?
 - NISTIR 7966: Security of Interactive and Automated Access Management Using SSH
 - Cloudflare: Fearless SSH: short-lived certificates bring Zero Trust to infrastructure