RFD: 0001 – Evgenii Frikin’s Blog


This document describes a «break-glass» mechanism based on SSH certificate-based authentication, with authorization implemented via NSS and PAM modules.

Problem Statement

Disaster recovery is a critical part of any infrastructure. On-call or support engineers must have secure access to critical systems in case of any disruption. The recovery mechanism itself must be secure and protected, because it grants access to critical systems and data while bypassing the traditional authentication and authorization process. This mechanism is usually called «break-glass». It relies on special credentials that are used in an emergency, when the traditional access methods do not work.

Break-glass access refers to a procedure used in critical emergencies or exceptional cases, when a user with insufficient access is granted elevated access rights to bypass normal access controls - SSH Academy ¹

¹ What is Break-Glass Access?

Commonly, companies use the SSH protocol and a dedicated highest-privilege account to access infrastructure in an emergency. This approach brings the following issues:

all systems must have pre-created shared accounts. Such accounts complicate any potential investigation
after the «break-glass» process has been used, the password should be changed in order to prevent unauthorized access
An on-call engineer must have access to the password manager where the credentials for emergency accounts are stored. In effect, without the password manager the company is cut off from its systems. A comprehensive «break-glass» solution is required to give engineers access back to their critical systems when the password manager fails.

The most common way of handling SSH authentication is public key authentication. This is much stronger than simply using a password, but it creates the problem of how to securely manage changes to SSH keys over time. For example, if ten new people join a company and five others leave, someone has to add the ten new keys to each server and remove the previous five. Although pubkeys partly solve the authentication issue, they do not remove the limitations described above. Pubkeys also add new challenges, as some research shows:

Monitoring of the usage of the keys has revealed that typically about 90% of the authorized keys are unused. That is, they are access credentials that were provisioned years ago, the need for which ceased to exist or the person having the private key left, and the authorized key was never deprovisioned. Thus, the access was not terminated when the need for it ceased to exist.
...
In many organizations – even very security-conscious organizations – there are many times more obsolete authorized keys than they have employees. Worse, authorized keys generally grant command-line shell access, which in itself is often considered privileged. We have found that in many organizations about 10% of the authorized keys grant root or administrator access. SSH keys never expire. ²

² Challenges in Managing SSH Keys – and a Call for Solutions

Historically, most organizations have not touched the location of the authorized keys files. This means they are in each user’s home directory, and each user can configure additional permanent credentials for themselves and their friends. They can also add additional permanent credentials for any service account or root account they are able to log into. This has lead to massive problems in large organizations around managing SSH keys.

AuthorizedKeysFile /etc/ssh/authorized-keys/%u

Enterprises should also pay attention to the AuthorizedKeysCommand and AuthorizedKeysCommandUser options. They are typically used when SELinux is enabled and to fetch SSH keys from LDAP directories or other data sources. Their use can make auditing SSH keys cumbersome and they can be used to hide backdoor keys from casual observation. ³

³ AuthorizedKeysFile location

Although pubkeys have advantages over passwords, keys are not passwords. There are several significant differences between SSH keys and passwords: ⁴

⁴ SSH Key Management Compass - 9 Ways To Manage Your Authentication Keys

Passwords are tied to user accounts. SSH user keys do not have to be.
Passwords usually have expiration times. SSH user keys do not.
Passwords cannot be generated without oversight. SSH user keys can.
Passwords are mostly used for interactive authentication. SSH keys can also be used for machine-to-machine authentication.
Passwords grant access to the operating system level without additional restrictions. SSH user keys can control both access and privilege levels.

That is why an approach that combines the advantages of passwords and pubkeys is needed. SSH supports such an approach to authentication via Certificate Authorities (CAs). Certificates make it possible to associate credentials with a user, audit their use, create short-lived identities, and use metadata as an extension point for authentication and authorization.

Note

Traditional pubkeys also carry metadata, but it can be changed by any user.

Finally, ephemeral certificates enable approaches such as Keyless, Zero Trust, and Just-In-Time access to remote systems, using short-lived identities instead of static keys and passwords.

Specification

Certificates clearly have more advantages, but certificates and the SSH protocol itself have some limitations. The SSH protocol and certificates do not solve, and are not meant to solve, user management and authorization issues (e.g. assigning sudo rules). That is why an account must be pre-created together with its sudoers files.

To understand which solution can help with the limitations related to user management and assigning permissions, it is necessary to look at the SSH protocol. It is designed as three protocols that typically run on top of TCP:

SSH Transport Layer Protocol is responsible for server authentication, confidentiality, integrity and compression
SSH User Authentication Protocol is responsible for client (user) authentication to the server
SSH Connection Protocol is responsible for multiplexing the encrypted tunnel into several logical channels

block-beta
    columns 4

    block:SSH_p:5
        ssh_auth_p["SSH User Authentication Protocol"]
        ssh_conn_p["SSH Connection Protocol"]
    end

    block:SSH_t:5
        ssh_transport_p["SSH Transport Layer Protocol"]
    end

    block:TCP:5
        tcp["TCP"]
    end

    block:IP:5
        ip["IP"]
    end
    style ip fill:#d4efdf, stroke-width:0px
    style tcp fill:#fcf3cf, stroke-width:0px
    style ssh_transport_p fill:#d4e6f1, stroke-width:0px
    style ssh_conn_p fill:#f5b7b1, stroke-width:0px
    style ssh_auth_p fill: #d7bde2, stroke-width:0px
    style SSH_p fill:#ddd, stroke:#000,stroke-width:1px
    style SSH_t fill:#ddd, stroke:#000,stroke-width:1px
    style TCP fill:#ddd, stroke:#000,stroke-width:1px
    style IP fill:#ddd, stroke:#000,stroke-width:1px

Figure 1: SSH protocol stack

The last step in the SSH Transport Layer Protocol is the service request. The client sends an SSH_MSG_SERVICE_REQUEST to request the SSH User Authentication Protocol or the SSH Connection Protocol. All subsequent data is sent protected by encryption and MAC.

According to the Authentication Requests section of RFC 4252: ⁵

⁵ RFC4252: Authentication Requests

If the requested ‘user name’ does not exist, the server MAY disconnect, or MAY send a bogus list of acceptable authentication ‘method name’ values, but never accept any. This makes it possible for the server to avoid disclosing information on which accounts exist. In any case, if the ‘user name’ does not exist, the authentication request MUST NOT be accepted.

%%{
  init: {
    "flowchart" : { 'curve' : 'stepBefore', 'defaultRenderer': 'elk' }
  }
}%%
flowchart LR
    subgraph sshd_p[sshd process]
        direction TB
        sshd("sshd") ==> |Look up user|libnss(NSS)
        subgraph libnss[NSS]
            direction RL
            nss{{"libs"}}
        end
    end
    %% subgraph nss_config[NSS config]
        %% direction TB
        cfg{{"/etc/nsswitch.conf"}} ==> libnss
    %% end
    subgraph sources[Data Sources]
        nss ==> passwd ==> pwd_src(files<br>systemd)
        nss ==> group ==> grp_src(files)
        nss ==> networks ==> net_src(files<br>dns)
        nss ==> etc ==> etc.
    end
    libnss ==> |Response|sshd

    classDef nss fill:#eeac4d;
    classDef sshd_p fill:#f6f7fb
    class libnss nss
    class sshd_p sshd_p
    class sources sshd_p

Figure 2: Conversation between sshd and NSS

That is why the SSH User Authentication Protocol needs to be considered in more detail. It defines the following:

Message Types and Formats
Message Exchange
Authentication Methods

SSH User Authentication Protocol phases:

the client sends an SSH_MSG_USERAUTH_REQUEST message
if the username is not valid, the server either disconnects or replies with SSH_MSG_USERAUTH_FAILURE, which carries a list of authentication methods that can continue
the client selects one of the methods from the list and sends the request to the server again
if the server requires more than one authentication method, it signals partial success after each method that succeeds
when all required authentication methods have succeeded, the server sends an SSH_MSG_USERAUTH_SUCCESS message
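The phases above can be modelled as a small decision function. This is only a sketch of the message flow, not sshd's code: `Reply` and `server_reply` are hypothetical names, and the sketch relies on the RFC 4252 detail that partial success is reported inside SSH_MSG_USERAUTH_FAILURE via a boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    message: str                  # SSH message name as used in RFC 4252
    can_continue: tuple = ()      # methods the client may still try
    partial_success: bool = False


def server_reply(user_exists, method_ok, required, completed, method):
    """Model the server's answer to one SSH_MSG_USERAUTH_REQUEST."""
    if not user_exists or not method_ok:
        # An unknown user is never accepted; a real server may even send a bogus list.
        remaining = tuple(m for m in required if m not in completed)
        return Reply("SSH_MSG_USERAUTH_FAILURE", remaining)
    done = set(completed) | {method}
    remaining = tuple(m for m in required if m not in done)
    if remaining:
        # The method succeeded, but more methods are required.
        return Reply("SSH_MSG_USERAUTH_FAILURE", remaining, partial_success=True)
    return Reply("SSH_MSG_USERAUTH_SUCCESS")
```

With `required=("publickey", "password")`, a successful `publickey` attempt yields a failure message with the partial success flag set and `password` as the remaining method.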

sequenceDiagram
    participant c as SSH client
    participant s as SSH server
    Note over c, s: TCP connection has been established
    Note over c, s: SSH key exchange has been done
    Note over c, s: SSH_MSG_SERVICE_REQUEST has been sent
    c->>s: SSH_MSG_USERAUTH_REQUEST
    activate s
    alt is the user invalid
        s ->> c: SSH_MSG_USERAUTH_FAILURE
        opt
            s ->> c: Authentication method list
        end
    else the user is valid
        s ->> c: Authentication method list
    end
    deactivate s
    c->>s: SSH_MSG_USERAUTH_REQUEST <br> + <br>Authentication method has been selected
    activate s
    alt additional authentication method(s) required
        s ->> c: Partial success
        Note over c, s: Step(s) related to additional authentication method(s)
    else
        s ->>c: SSH_MSG_USERAUTH_SUCCESS
    end
    deactivate s

Figure 3: SSH User Authentication protocol

The server may require one or more of the following authentication methods:

Public key
Password
Host-based

sequenceDiagram
    participant c as SSH client
    participant CA as Certificate Authority
    participant s as SSH server
    c ->> CA: request SSH certificate
    activate CA
        Note right of CA: Generate short-lived certificate
        CA ->> c: Certificate has been generated
    deactivate CA
    c ->> s: SSH authentication via certificate
    activate s
        Note right of s: Validate certificate by CA
        s ->> c: SSH authentication has been successful
    deactivate s

Figure 4: SSH Authentication by Certificate

Certificate-based authentication is an extension of public key authentication that adds a CA role to enhance security. It uses three main components: a private key, a public key, and a certificate signed by the CA.

Certificate-based authentication phases are:

the client sends an SSH_MSG_USERAUTH_REQUEST message
if the username is not valid, the server either disconnects or replies with SSH_MSG_USERAUTH_FAILURE, which carries a list of authentication methods that can continue
the client sends an SSH certificate signed by a trusted CA to the server
the server performs the following verifications:
- the signature on the client certificate, using the CA public key
- the validity period of the certificate
- the requested user account (principals)
if the certificate is valid, the server grants access to the client based on the identity
when all required authentication methods have succeeded, the server sends an SSH_MSG_USERAUTH_SUCCESS message
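The three server-side verifications listed above can be sketched as follows. This is a simplified model under stated assumptions: `Certificate` and `check_certificate` are hypothetical names, and the cryptographic signature check is reduced to a lookup of the signing CA's fingerprint in a trusted set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    signing_ca: str      # fingerprint of the CA key that signed the certificate
    valid_after: int     # start of the validity period, Unix time
    valid_before: int    # end of the validity period, Unix time
    principals: tuple    # user names the certificate is valid for


def check_certificate(cert, trusted_cas, username, now):
    """Return "ok" or the reason the certificate is rejected."""
    if cert.signing_ca not in trusted_cas:
        return "untrusted CA"
    if not (cert.valid_after <= now < cert.valid_before):
        return "outside validity period"
    if username not in cert.principals:
        return "principal mismatch"
    return "ok"
```

A short validity window is what makes the identity short-lived: once `now` passes `valid_before`, the same certificate is rejected without any key having to be removed from the host.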

sequenceDiagram
    participant c as SSH client
    participant s as SSH server
    Note over c, s: TCP connection has been established
    Note over c, s: SSH key exchange has been done
    Note over c, s: SSH_MSG_SERVICE_REQUEST has been sent
    Note over c, s: Authentication method has been selected
    c ->> s: send SSH certificate signed by CA
    activate s
    critical validate SSH certificate
        s-->s: Certificate Authority
        s-->s: Expiration date
        s-->s: Principals
        s-->s: etc.
            alt is not valid
                s->>c: SSH_MSG_USERAUTH_FAILURE
            else is valid
        s->>c: SSH_MSG_USERAUTH_SUCCESS
    end
    end
    deactivate s

Figure 5: SSH Authentication by Certificate

Given the issues described in the Problem Statement section, the second and last phases of certificate-based authentication deserve attention. If the username does not exist, the SSH server will not continue the authentication process. That is why the user, home directory, etc. have to be created in this phase. The SSH server calls the Name Service Switch (NSS), which looks up the user in different data sources (depending on the settings in /etc/nsswitch.conf). If NSS returns success, the user exists, and the SSH server continues the authentication process according to the authentication methods (password, pubkey, etc.). All authentication methods depend on the NSS answer. The SSH server then checks the settings related to the authentication method (e.g. looks up the password in /etc/shadow or the keys in AuthorizedKeysFile ⁶). To create a user on demand, it is necessary to implement a custom NSS module and configure it in /etc/nsswitch.conf.

⁶ man 5 sshd_config
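To illustrate how the order of sources in /etc/nsswitch.conf drives the user lookup, here is a minimal simulation. It does not call the real NSS; `passwd_sources` and `lookup_user` are hypothetical helpers, and the source name `brkgl2s` for the custom module is an assumption made for the example.

```python
import re


def passwd_sources(nsswitch_text):
    """Return the ordered source names from the 'passwd:' line."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()          # drop comments
        if line.startswith("passwd:"):
            # Remove action items such as [NOTFOUND=return]
            rest = re.sub(r"\[[^\]]*\]", " ", line[len("passwd:"):])
            return rest.split()
    return []


def lookup_user(username, sources, backends):
    """Ask each configured source in order; the first one that answers wins."""
    for name in sources:
        backend = backends.get(name)
        if backend is not None and backend(username) is not None:
            return "NSS_STATUS_SUCCESS", name
    return "NSS_STATUS_NOTFOUND", None
```

Placing the custom source last means that regular accounts keep resolving through `files` or `systemd`, and only names unknown to them reach the module that creates users on demand.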

⁷ openssh-portable/auth-pam.c

After successful authentication (the last authentication phase), the next stage is Session Establishment. At that stage the client is allowed to access the server. The session is opened after all Pluggable Authentication Modules (PAM) checks have passed. To configure the user’s session, it is necessary to implement a custom PAM module and configure it in one of the files in /etc/pam.d. During the PAM stage some environment variables are defined. One of them is SSH_AUTH_INFO_0.⁷ It exposes authentication information (e.g. pubkey, certificate, etc.) to the PAM module. This variable can be used as a source for making decisions during the authorization process (e.g. assigning a sudo group to the user).
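A sketch of how a module could read SSH_AUTH_INFO_0 is shown below. It assumes that each line of the value has the form `<method> <key type> <base64 blob>` for public key authentication; that layout and the helper names `parse_auth_info` and `is_ed25519_cert` are assumptions of this example and should be checked against the OpenSSH version in use.

```python
ED25519_CERT = "ssh-ed25519-cert-v01@openssh.com"


def parse_auth_info(value):
    """Split the SSH_AUTH_INFO_0 value into one record per completed method."""
    entries = []
    for line in value.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "publickey":
            entries.append({"method": parts[0], "key_type": parts[1], "blob": parts[2]})
        elif parts:
            # Other methods (e.g. password) expose no public credential.
            entries.append({"method": parts[0], "key_type": None, "blob": None})
    return entries


def is_ed25519_cert(entry):
    """True if the record describes an ed25519 certificate."""
    return entry["key_type"] == ED25519_CERT
```

The `blob` field is the part that would be decoded further to obtain certificate fields such as the Key ID.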

Important

UsePAM Enables the Pluggable Authentication Module interface. If set to yes this will enable PAM authentication using KbdInteractiveAuthentication and PasswordAuthentication in addition to PAM account and session module processing for all authentication types.

Because PAM keyboard-interactive authentication usually serves an equivalent role to password authentication, you should disable either PasswordAuthentication or KbdInteractiveAuthentication. ⁸

⁸ man 5 sshd_config

%%{init: {
    "flowchart" : { 'curve' : 'stepBefore', 'defaultRenderer': 'elk' }
  }
}%%
flowchart LR
    subgraph sshd_p[sshd process]
        direction TB
        sshd("sshd") ======> |If user exists <br>and<br> UsePAM enabled|libpam(PAM)
        subgraph libnss[NSS]
            direction RL
            nss{{"libs"}}
        end
        subgraph libpam[PAM]
            direction RL
            pam{{"libs"}}
        end
    end
    %% subgraph pamcfg[PAM configs]
        %% direction TB
        cfg{{"/etc/pam.d/*"}} ==> libpam
    %% end

    subgraph modules[PAM modules]
        pam ==> account
        pam ==> authentication
        pam ==> password
        pam ==> session
    end
    libnss <==> |Request<br>Response|sshd
    libpam ==> |Response|sshd

    classDef pam fill:#0f9d58;
    classDef nss fill:#eeac4d;
    classDef sshd_p fill:#f6f7fb
    class libpam pam
    class libnss nss
    class sshd_p sshd_p
    class modules sshd_p

Figure 6: Conversation between sshd, NSS and PAM

Alternative way

Another way to obtain authentication information during an SSH connection is the -A flag. This flag enables forwarding of connections from an authentication agent (ssh-agent) via a socket to a remote host. The path to the socket is stored in the SSH_AUTH_SOCK environment variable. It is possible to access the variable on the remote host, but this approach has security issues related to forwarding the socket to all hosts. These can be mitigated if the user explicitly enables agent forwarding only for specific hosts (e.g. ForwardAgent yes).

When the session is closed, the PAM module must perform the following actions:

removing the record from /etc/passwd
removing the home directory
killing all processes related to the user
etc.

Thus, all users are temporary.
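The first two cleanup actions can be sketched like this. The helpers `remove_passwd_record` and `remove_home` are hypothetical, and the sketch deliberately works on the text of the passwd file rather than on /etc/passwd itself; file locking and process termination are left out.

```python
import shutil


def remove_passwd_record(passwd_text, username):
    """Return the passwd contents without the record of the given user."""
    kept = [line for line in passwd_text.splitlines(keepends=True)
            if line.split(":", 1)[0] != username]
    return "".join(kept)


def remove_home(home_dir):
    """Delete the temporary user's home directory if it still exists."""
    shutil.rmtree(home_dir, ignore_errors=True)
```

Matching on the whole first field, rather than on a prefix, keeps a user named `alice` intact when `alice.brkgl2s` is removed.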

sequenceDiagram
    participant c as SSH client
    participant s as SSH server
    participant n as NSS
    participant p as PAM
    c->>s: SSH_MSG_USERAUTH_REQUEST
    activate s
    s ->> n: Request NSS
    activate n
    Note over c,n: According to the settings in /etc/nsswitch.conf, NSS looks up the user in each data source. At this step the custom NSS <br/>module must create a new user and return NSS_STATUS_SUCCESS if the username matches the requirements
    alt user does not exist
        n ->> s: NSS_STATUS_NOTFOUND
        s ->> c: SSH_MSG_USERAUTH_FAILURE
        opt
            s ->> c: Authentication method list
        end
    else user exists
        n ->> s: NSS_STATUS_SUCCESS
        c->>s: SSH_MSG_USERAUTH_REQUEST <br> + <br> Authentication method has been selected
        deactivate n
        s->>p: Request PAM
        activate p
        Note over s,p: According to the settings in the files in /etc/pam.d, PAM runs each module. At this step the custom PAM module must check SSH_AUTH_INFO_0, <br/>get the pubkey and additional info (e.g. Key ID field), and return a status indicating whether the username, pubkey type, etc. match the requirements.
            alt is not successful
                p ->> s: PAM_SESSION_ERR, PAM_AUTH_ERR, etc.
                s ->> c: SSH_MSG_USERAUTH_FAILURE
            else successful
                p ->> s: PAM_SUCCESS
            end
        alt is additional authentication method required
            s ->> c: Partial success
            s ->> p: .
            Note over c, p: Step(s) related to additional authentication method(s)
            p ->> s: .
            s ->>c: SSH_MSG_USERAUTH_SUCCESS
        else additional authentication method(s) is not required
            s ->>c: SSH_MSG_USERAUTH_SUCCESS
        end
        deactivate p
    end
    deactivate s
    activate c
     Note over c, p: Session has been opened
    c ->>s: Session terminate
    s ->>p: Session will be closed
    p ->>p: Some action(s)
    p ->>s: Successfully
    s->>c: Session has been closed

Figure 7: Using NSS and PAM by SSH server

HLD ⁹

High Level Design

⁹ Fortanix Data Security Manager

Naming convention

`Key ID` field

The Key ID field usually contains a policy name which describes the access level on hosts. It makes audit logs more detailed.

Currently, the PAM module supports the following format of the field, with three parts separated by colons:

resource version: reserved for future use. Default: ssh_v1

environment: reserved for future use. If the field is not defined, it is set to !, which means that the field has no value by default.

sudo group: [admins|users]. Default: users

Note

Not all of the fields have to be filled in, but the minimum Key ID format is ::. By default, :: expands to ssh_v1:!:users.
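The expansion rule can be expressed as a short parser. This is a sketch of the naming convention only, not the module's code; `KeyId` and `parse_key_id` are hypothetical names, and rejecting unknown sudo groups is an assumption about how strictly the field is validated.

```python
from typing import NamedTuple

DEFAULTS = ("ssh_v1", "!", "users")
SUDO_GROUPS = {"admins", "users"}


class KeyId(NamedTuple):
    resource_version: str
    environment: str
    sudo_group: str


def parse_key_id(raw):
    """Expand a Key ID such as '::' or '::admins' into its three fields."""
    parts = raw.split(":")
    if len(parts) != 3:
        raise ValueError("Key ID must have the form <version>:<environment>:<sudo group>")
    # Empty parts fall back to their defaults.
    key_id = KeyId(*(part or default for part, default in zip(parts, DEFAULTS)))
    if key_id.sudo_group not in SUDO_GROUPS:
        raise ValueError(f"unsupported sudo group: {key_id.sudo_group}")
    return key_id
```

For example, `parse_key_id("::admins")` keeps the default version and environment and only overrides the sudo group.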

Minimum requirements

OpenSSH >= 7.6p1 (has been tested on Fedora 41 and OpenSSH 9.8p1)

Add the following to /etc/ssh/sshd_config.d/00-break-glass.conf:

Port 1110
UsePAM yes
Match LocalPort 1110
       TrustedUserCAKeys /path/to/ca
       AuthenticationMethods publickey
       PAMServiceName brkgl2s
Match All

Known limitations

Custom NSS module:

generates a random UID/GID each time an account is created, so the UID/GID for the same username will differ between two hosts
requires the username to contain a postfix (.brkgl2s) as an additional restriction, alongside checking the name of the service that calls NSS
supports only two sudo groups (for more details see the Naming convention section)
assigns each user a unique UID/GID, but does not create the group that corresponds to the GID
does not support changing the service name (option PAMServiceName¹⁰).

¹⁰ man 5 sshd_config

Custom PAM module:

removes the user record and home directory after the session is closed
does not terminate the processes related to the user (not implemented)
supports only the ed25519 pubkey type
the user is created every time the username meets the requirements. If the SSH server sends SSH_MSG_USERAUTH_FAILURE for some reason (e.g. an invalid certificate), the user record is not deleted

Pitfalls

Checking PAM service name

NSS-related system calls used by tools such as id, getent, etc. would create a record in the users data source every time the user does not exist. To avoid this, it is necessary to limit the services that can use the custom NSS module: if the calling service is not ssh, the NSS module must return NSS_STATUS_TRYAGAIN. nss-devel does not provide any function for determining which PAM service calls NSS, but NSS modules can read some environment variables, by analogy with PAM modules. In particular, the SYSTEMD_EXEC_PID¹¹ environment variable stores the PID of the process that calls NSS. Once the PID is known, the process name can be read from /proc/PID/comm¹². Checking the process name thus partly solves the problem and makes it possible to use these tools without adding users to a data source. Unlike nss-devel, the pam-devel library does provide a function for getting the PAM service name.

¹¹ man 5 systemd.exec

¹² man 5 proc_pid_comm
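The check described above can be sketched as two small functions. The names `caller_name` and `nss_decision` are hypothetical, and `sshd` as the expected content of /proc/PID/comm is an assumption; the actual process name may differ between OpenSSH versions.

```python
import os

SUFFIX = ".brkgl2s"          # username postfix named in this document
ALLOWED_CALLERS = {"sshd"}   # assumption: name as it appears in /proc/PID/comm


def caller_name(environ=os.environ, proc_root="/proc"):
    """Resolve the calling process name via SYSTEMD_EXEC_PID, or None."""
    pid = environ.get("SYSTEMD_EXEC_PID")
    if not pid or not pid.isdigit():
        return None
    try:
        with open(os.path.join(proc_root, pid, "comm")) as f:
            return f.read().strip()
    except OSError:
        return None


def nss_decision(username, caller, user_exists):
    """Return (NSS status, whether a new user should be created)."""
    if user_exists:
        return "NSS_STATUS_SUCCESS", False
    if caller not in ALLOWED_CALLERS or not username.endswith(SUFFIX):
        return "NSS_STATUS_TRYAGAIN", False
    return "NSS_STATUS_SUCCESS", True
```

With this split, a call such as `id alice.brkgl2s` ends in NSS_STATUS_TRYAGAIN and leaves the users data source untouched, while the same lookup coming from the SSH server creates the account.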

sequenceDiagram
    participant s as SSH server
    participant n as NSS
    participant p as PAM
    activate s
    activate n
    s ->> n: Request NSS
    alt is user not found
        critical
            Note over s, n: On this step NSS module gets PID <br/>from SYSTEMD_EXEC_PID and <br/>looks up process name in /proc/PID/comm
            option calling process is not ssh
                n ->> s: NSS_STATUS_TRYAGAIN
            option username does not contain postfix
                n ->> s: NSS_STATUS_TRYAGAIN
        end
        n ->> n: Create user
        n ->> s: NSS_STATUS_SUCCESS
    else user is found
        critical
            option calling process is ssh
            option username does contain postfix
                n ->> s: NSS_STATUS_SUCCESS
        end
    end
    deactivate n
    s->>p: Request PAM
    activate p
    alt successfully
        critical check
        option SSH_AUTH_INFO_0
            p ->> p: looks up pubkey
        option pubkey type
        option gets Key ID
            p->> p: Create sudoers file
        end
        p->>s:PAM_SUCCESS
    else not successfully
        p->>s: PAM_SESSION_ERR, PAM_AUTH_ERR, etc.
    end
    deactivate p
    deactivate s

Figure 8: Using custom NSS and PAM by SSH server

Checking username

In fact, the limitation related to the postfix in a username is artificial and the postfix can be removed, but doing so leads to the following problem:

there is a danger that, while users are created at runtime, an attacker can attempt to flood /etc/passwd with junk records. Admittedly, the postfix in the username does not solve this problem if the attacker knows about it. In addition, the regular authentication process should be different from the emergency authentication process. If the processes are merged, users who already connected to hosts before the emergency will be able to pass authentication without a new user having to be created.

For now the processes have been split in order to simplify the management of regular and emergency users, but this does not rule out removing the limitation in the future.

@efrikin

My thoughts led me to the idea that the ssh port should be opened on network equipment only during emergencies. In regular operation, ACLs on network equipment should restrict access to the ssh port on hosts. Currently, in my opinion, the processes must be split so that they are easier to manage and develop.

Similar projects

Details of implementation

`get_pubkey_info` function

There is a reason why the get_pubkey_info function was implemented via execvp and pipe. Libraries such as libssh and libssh2 do not provide functions for reading the fields inside a pubkey (certificate), and portable OpenSSH does not have a public API for implementing this either. The function could be implemented using low-level primitives, and there are plenty of reasons to refactor it in the future.

The function is implemented as a parent and two child processes with stdin/stdout redirected via pipes. The first child process writes the value of the variable that contains the pubkey to stdout. The second child process reads it from stdin via a pipe and sends it on to stdout, via a pipe, to the parent process. The parent process reads from stdin and passes the data to the ssh-keygen -L -f - command. In a shell this is equivalent to cat pubkey | ssh-keygen -L -f -. Next, the parent process looks up some fields and saves them into a structure.
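The same idea can be sketched in a few lines, assuming the command in question is `ssh-keygen -L -f -`, which prints the contents of a certificate read from stdin. The helper names are hypothetical, and the parser relies on the usual indented "Field: value" layout of that output, which is not a stable API.

```python
import re
import subprocess

FIELD = re.compile(r"^\s*([A-Z][A-Za-z ]*):\s*(.*)$")


def run_ssh_keygen(cert_line):
    """Pipe a certificate line into ssh-keygen and return the listing."""
    result = subprocess.run(["ssh-keygen", "-L", "-f", "-"], input=cert_line,
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_cert_listing(listing):
    """Extract the certificate type, Key ID and principals from the listing."""
    info = {"type": None, "key_id": None, "principals": []}
    current = None
    for line in listing.splitlines():
        match = FIELD.match(line)
        if match:
            current, value = match.group(1), match.group(2).strip()
            if current == "Type" and value:
                info["type"] = value.split()[0]
            elif current == "Key ID":
                info["key_id"] = value.strip('"')
        elif current == "Principals" and line.strip() not in ("", "(none)"):
            # Principals are listed one per line under their field name.
            info["principals"].append(line.strip())
    return info
```

Keeping the parsing separate from the process call makes the fragile part, the text format, easy to test without a certificate at hand.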

`adduser` function

During user account creation the ! char is used instead of a real password. According to man 5 shadow:¹³

¹³ man 5 shadow

If the password field contains some string that is not a valid result of crypt(3), for instance ! or *, the user will not be able to use a unix password to log in (but the user may log in the system by other means).

The ! (or *) char means that the account does not have a password and that no password will grant access to the account. The x char means that the password is located in the /etc/shadow file. That is why the custom NSS module must never create entries in /etc/shadow, and must not use the x char in place of the password in the /etc/passwd file.
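As an illustration, a record for a temporary account could be composed like this. The function names, the UID/GID range and the default shell are assumptions made for the sketch; only the ! in the password field and the randomly chosen UID/GID come from the text.

```python
import random


def random_id(rng, low=60000, high=64999):
    """Pick a random UID/GID; the range here is an arbitrary example."""
    return rng.randint(low, high)


def passwd_entry(username, uid, gid, home, shell="/bin/bash"):
    """Build a passwd(5) line: name:password:UID:GID:GECOS:directory:shell."""
    # "!" is not a valid crypt(3) result, so no password can match it,
    # and unlike "x" it does not point to an entry in /etc/shadow.
    return f"{username}:!:{uid}:{gid}::{home}:{shell}"
```

A call such as `passwd_entry("alice.brkgl2s", uid, uid, "/home/alice.brkgl2s")` with `uid = random_id(random.Random())` yields a record that allows certificate login but can never be unlocked with a password.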
