This document describes a «break-glass» mechanism based on SSH certificate-based authentication, with authorization implemented via NSS and PAM modules.
Problem Statement
Disaster recovery is a critical part of any infrastructure. On-call or support engineers must have secure access to critical systems in case of any disruption. The recovery mechanism must be secure and protected because it implies access to critical systems and data that bypasses the traditional authentication and authorization process. This mechanism is usually called «break-glass». It involves using special credentials in an emergency when traditional access methods do not work.
Break-glass access refers to a procedure used in critical emergencies or exceptional cases, when a user with insufficient access is granted elevated access rights to bypass normal access controls - SSH Academy 1
Companies commonly use the SSH protocol and dedicate a highest-privilege account for emergency access to infrastructure. This approach raises the following issues:
- All systems must have pre-created shared accounts. Such accounts complicate any later investigation.
- After the «break-glass» process is used, the password should be changed to prevent unauthorized access.
- An on-call engineer must have access to the password manager where the credentials for emergency accounts are stored. Without the password manager, the company is effectively cut off from its systems.
A comprehensive «break-glass» solution is required to give engineers access back to their critical systems when the password manager fails.
The most common way of handling SSH authentication is public key authentication. This is much stronger than simply using a password, but it creates the problem of how to securely manage changes to SSH keys over time. If ten new people join a company and five others leave, someone has to add the ten new keys to each server and remove the previous five. Although public keys partly solve the authentication problem, they do not remove the limitations described above. Public keys also add new challenges, as some research shows:
Monitoring of the usage of the keys has revealed that typically about 90% of the authorized keys are unused. That is, they are access credentials that were provisioned years ago, the need for which ceased to exist or the person having the private key left, and the authorized key was never deprovisioned. Thus, the access was not terminated when the need for it ceased to exist.
...
In many organizations – even very security-conscious organizations – there are many times more obsolete authorized keys than they have employees. Worse, authorized keys generally grant command-line shell access, which in itself is often considered privileged. We have found that in many organizations about 10% of the authorized keys grant root or administrator access. SSH keys never expire. 2
Historically, most organizations have not touched the location of the authorized keys files. This means they are in each user’s home directory, and each user can configure additional permanent credentials for themselves and their friends. They can also add additional permanent credentials for any service account or root account they are able to log into. This has lead to massive problems in large organizations around managing SSH keys.
AuthorizedKeysFile /etc/ssh/authorized-keys/%u
Enterprises should also pay attention to the AuthorizedKeysCommand and AuthorizedKeysCommandUser options. They are typically used when SELinux is enabled and to fetch SSH keys from LDAP directories or other data sources. Their use can make auditing SSH keys cumbersome and they can be used to hide backdoor keys from casual observation. 3
Although public keys have advantages over passwords, keys are not passwords. There are several significant differences between SSH keys and passwords: 4
- Passwords are tied to user accounts. SSH user keys do not have to be.
- Passwords usually have expiration times. SSH user keys do not.
- Passwords cannot be generated without oversight. SSH user keys can.
- Passwords are mostly used for interactive authentication. SSH keys can also be used for machine-to-machine authentication.
- Passwords grant access to the operating system level without additional restrictions. SSH user keys can control both access and privilege levels.
That is why an approach that combines the advantages of passwords and public keys is needed. SSH supports such an approach to authentication via Certificate Authorities (CAs). Certificates make it possible to associate credentials with a user, support auditing, create short-lived identities and use metadata as an extension point for authentication and authorization.
Traditional public keys also carry metadata, but any user can change it.
Finally, ephemeral certificates make it possible to apply approaches such as Keyless, Zero Trust and Just-In-Time for access to remote systems, using short-lived identities instead of static keys and passwords.
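As a rough illustration of a short-lived identity, the sketch below only assembles the `ssh-keygen` command line a CA could use to sign a user key for five minutes. The file names, the principal `alice.brkgl2s` and the Key ID value are hypothetical; the flags `-s`, `-I`, `-n` and `-V` are the standard OpenSSH signing options.

```python
from typing import List


def build_sign_command(ca_key: str, pubkey: str, key_id: str,
                       principal: str, validity: str = "+5m") -> List[str]:
    """Return the ssh-keygen argv that signs `pubkey` with the CA key."""
    return [
        "ssh-keygen",
        "-s", ca_key,      # CA private key used for signing
        "-I", key_id,      # certificate identity (the Key ID field)
        "-n", principal,   # principal (user name) the certificate is valid for
        "-V", validity,    # validity interval, here five minutes from now
        pubkey,            # user public key to sign
    ]


# Hypothetical file names and user; nothing is executed here.
cmd = build_sign_command("ca_key", "id_ed25519.pub",
                         "ssh_v1:!:users", "alice.brkgl2s")
print(" ".join(cmd))
```

The command is only built, not run, so the sketch stays independent of any CA setup.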
Specification
Certificates clearly have more advantages, but certificates and the SSH protocol itself have some limitations. The SSH protocol and certificates do not solve, and are not meant to solve, user management and authorization issues (e.g. assigning sudo rules). That is why an account normally has to be pre-created together with its sudoers files.
To understand which solution can help with the limitations related to user management and permission assignment, it is necessary to look at the SSH protocol. It is designed as three protocols that typically run on top of TCP:
- SSH Transport Layer Protocol is responsible for server authentication, confidentiality, integrity and compression
- SSH User Authentication Protocol is responsible for client (user) authentication to the server
- SSH Connection Protocol is responsible for multiplexing the encrypted tunnel into several logical channels
block-beta columns 4 block:SSH_p:5 ssh_auth_p["SSH User Authentication Protocol"] ssh_conn_p["SSH Connection Protocol"] end block:SSH_t:5 ssh_transport_p["SSH Transport Layer Protocol"] end block:TCP:5 tcp["TCP"] end block:IP:5 ip["IP"] end style ip fill:#d4efdf, stroke-width:0px style tcp fill:#fcf3cf, stroke-width:0px style ssh_transport_p fill:#d4e6f1, stroke-width:0px style ssh_conn_p fill:#f5b7b1, stroke-width:0px style ssh_auth_p fill: #d7bde2, stroke-width:0px style SSH_p fill:#ddd, stroke:#000,stroke-width:1px style SSH_t fill:#ddd, stroke:#000,stroke-width:1px style TCP fill:#ddd, stroke:#000,stroke-width:1px style IP fill:#ddd, stroke:#000,stroke-width:1px
The last step in the SSH Transport Layer Protocol is the service request. A client sends an SSH_MSG_SERVICE_REQUEST to request the SSH User Authentication Protocol or the SSH Connection Protocol. All data is then sent protected by encryption and a MAC.
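To make the service request concrete, here is a minimal sketch of the unencrypted payload of an SSH_MSG_SERVICE_REQUEST for the `ssh-userauth` service. It follows the message number and string encoding from RFC 4253 and leaves out packet framing, padding, encryption and the MAC. The helper names are illustrative.

```python
import struct

SSH_MSG_SERVICE_REQUEST = 5  # message number defined in RFC 4253


def ssh_string(data: bytes) -> bytes:
    """Encode an SSH 'string': uint32 big-endian length followed by the bytes."""
    return struct.pack(">I", len(data)) + data


def service_request(service: str = "ssh-userauth") -> bytes:
    """Build the payload: message number followed by the service name."""
    return bytes([SSH_MSG_SERVICE_REQUEST]) + ssh_string(service.encode("ascii"))


payload = service_request()
print(payload.hex())
```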
According to the Authentication Requests section of RFC 4252: 5
If the requested ‘user name’ does not exist, the server MAY disconnect, or MAY send a bogus list of acceptable authentication ‘method name’ values, but never accept any. This makes it possible for the server to avoid disclosing information on which accounts exist. In any case, if the ‘user name’ does not exist, the authentication request MUST NOT be accepted.
%%{ init: { "flowchart" : { 'curve' : 'stepBefore', 'defaultRenderer': 'elk' } } }%% flowchart LR subgraph sshd_p[sshd process] direction TB sshd("sshd") ==> |Look up user|libnss(NSS) subgraph libnss[NSS] direction RL nss{{"libs"}} end end %% subgraph nss_config[NSS config] %% direction TB cfg{{"/etc/nsswitch.conf"}} ==> libnss %% end subgraph sources[Data Sources] nss ==> passwd ==> pwd_src(files<br>systemd) nss ==> group ==> grp_src(files) nss ==> networks ==> net_src(files<br>dns) nss ==> etc ==> etc. end libnss ==> |Response|sshd classDef nss fill:#eeac4d; classDef sshd_p fill:#f6f7fb class libnss nss class sshd_p sshd_p class sources sshd_p
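The lookup that sshd performs can be approximated from user space. The sketch below assumes a glibc-based system, where Python's `pwd.getpwnam` calls `getpwnam(3)` and therefore goes through the sources configured in /etc/nsswitch.conf. The function name is illustrative.

```python
import pwd


def user_exists(username: str) -> bool:
    """Return True if any configured NSS source knows `username`."""
    try:
        pwd.getpwnam(username)  # resolved through NSS on glibc systems
    except KeyError:
        return False
    return True


# A name that is very unlikely to exist on any host.
print(user_exists("no-such-user-0000.example"))
```

With a custom NSS module that creates users on demand, the same call would trigger account creation, which is exactly the side effect discussed later for tools such as id and getent.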
That is why the SSH User Authentication Protocol needs to be considered in more detail. It defines the following:
- Message Types and Formats
- Message Exchange
- Authentication Methods
SSH User Authentication Protocol phases:
- the client sends an SSH_MSG_USERAUTH_REQUEST message
- if the username is not valid, the server sends either SSH_MSG_USERAUTH_FAILURE or an authentication method list
- the client selects one of the methods from the list and sends the request to the server again
- if the server requires more than one authentication method, the server sends a partial success
- when all required authentication methods succeed, the server sends an SSH_MSG_USERAUTH_SUCCESS message
sequenceDiagram participant c as SSH client participant s as SSH server Note over c, s: TCP connection has been established Note over c, s: SSH key exchange has been done Note over c, s: SSH_MSG_SERVICE_REQUEST has been sent c->>s: SSH_MSG_USERAUTH_REQUEST activate s alt is the user invalid s ->> c: SSH_MSG_USERAUTH_FAILURE opt s ->> c: Authentication method list end else the user is valid s ->> c: Authentication method list end deactivate s c->>s: SSH_MSG_USERAUTH_REQUEST <br> + <br>Authentication method has been selected activate s alt is additional an authentication method(s) required s ->> c: Partial success Note over c, s: Step(s) related to additional authentication method(s) else s ->>c: SSH_MSG_USERAUTH_SUCCESS end deactivate s
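The method list and the partial success flag travel in the same message. As a sketch based on the RFC 4252 layout (message number 51, a comma-separated name-list, then a boolean), the payload of SSH_MSG_USERAUTH_FAILURE can be decoded as follows. The sample bytes are made up for illustration.

```python
import struct
from typing import List, Tuple

SSH_MSG_USERAUTH_FAILURE = 51  # message number defined in RFC 4252


def parse_userauth_failure(payload: bytes) -> Tuple[List[str], bool]:
    """Return (methods that can continue, partial success flag)."""
    if payload[0] != SSH_MSG_USERAUTH_FAILURE:
        raise ValueError("not an SSH_MSG_USERAUTH_FAILURE message")
    (length,) = struct.unpack(">I", payload[1:5])
    names = payload[5:5 + length].decode("ascii")
    partial_success = payload[5 + length] != 0
    methods = names.split(",") if names else []
    return methods, partial_success


# Made-up sample: the server offers two methods, no partial success.
names = b"publickey,password"
sample = (bytes([SSH_MSG_USERAUTH_FAILURE])
          + struct.pack(">I", len(names)) + names + b"\x00")
print(parse_userauth_failure(sample))
```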
The server may require one or more of the following authentication methods:
- Public key
- Password
- Host-based
sequenceDiagram participant c as SSH client participant CA as Certificate Authority participant s as SSH server c ->> CA: send SSH certificate activate CA Note right of CA: Generate short-lived certificate CA ->> c: Certificate has been generated deactivate CA c ->> s: SSH authentication via certificate activate s Note right of s: Validate certificate by CA s ->> c: SSH authentication has been successful deactivate s
Certificate-based authentication is an extension of public key authentication in which a CA role is added to enhance security. It uses three main components: a private key, a public key, and a certificate signed by the CA.
Certificate-based authentication phases are:
- the client sends an SSH_MSG_USERAUTH_REQUEST message
- if the username is not valid, the server sends either SSH_MSG_USERAUTH_FAILURE or an authentication method list
- the client sends an SSH certificate signed by a trusted CA to the server
- the server verifies:
  - the signature on the client certificate, using the CA public key
  - the certificate validity period
  - the requested user account (principals)
- if the certificate is valid, the server grants access to the client based on the identity
- when all required authentication methods succeed, the server sends an SSH_MSG_USERAUTH_SUCCESS message
sequenceDiagram participant c as SSH client participant s as SSH server Note over c, s: TCP connection has been established Note over c, s: SSH key exchange has been done Note over c, s: SSH_MSG_SERVICE_REQUEST has been sent Note over c, s: Authentication method has been selected c ->> s: send SSH certificate signed by CA activate s critical validate SSH certificate s-->s: Certificate Authority s-->s: Expiration date s-->s: Principals s-->s: etc. alt is not valid s->>c: SSH_MSG_USERAUTH_FAILURE else is valid s->>c: SSH_MSG_USERAUTH_SUCCESS end end deactivate s
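The server-side checks can be summarised in a few lines. The sketch below is a simplified model, not the OpenSSH implementation: the `CertInfo` type is hypothetical, the signature check is assumed to have been done elsewhere and is represented by a boolean, and critical options and revocation are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CertInfo:
    key_id: str
    principals: List[str]
    valid_after: int            # Unix time, inclusive
    valid_before: int           # Unix time, exclusive
    signed_by_trusted_ca: bool  # result of a signature check done elsewhere


def certificate_acceptable(cert: CertInfo, username: str, now: int) -> bool:
    """Apply the three checks from the list above, in order."""
    if not cert.signed_by_trusted_ca:
        return False
    if not (cert.valid_after <= now < cert.valid_before):
        return False
    return username in cert.principals


cert = CertInfo("ssh_v1:!:users", ["alice.brkgl2s"], 100, 200, True)
print(certificate_acceptable(cert, "alice.brkgl2s", now=150))
```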
According to the Problem Statement section, the second and last phases of certificate-based authentication deserve attention. If the username does not exist, the SSH server will not continue the authentication process. That is why the user, home directory, etc. must be created at this phase. The SSH server calls the Name Service Switch (NSS), which looks up the user in different data sources (depending on the settings in /etc/nsswitch.conf). If NSS returns success, the user exists, and the SSH server continues the authentication process according to the authentication methods (password, pubkey, etc.). All authentication methods depend on the NSS answer. The SSH server then checks the settings related to the authentication method (e.g. looks up the password in /etc/shadow or the keys in AuthorizedKeysFile 6). To create a user on demand, it is necessary to implement a custom NSS module and configure it in /etc/nsswitch.conf.
6 man 5 sshd_config
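The order of sources in /etc/nsswitch.conf decides when a custom module is consulted. The sketch below parses the `passwd` line of such a file. The module name `brkgl2s` in the sample is an assumption borrowed from the PAM service name used in this document, not a confirmed name.

```python
from typing import List


def passwd_sources(nsswitch_text: str) -> List[str]:
    """Return the NSS sources for the passwd database, in lookup order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("passwd:"):
            tokens = line.split(":", 1)[1].split()
            # Skip action items such as [NOTFOUND=return]
            return [tok for tok in tokens if not tok.startswith("[")]
    return []


# Hypothetical configuration: the custom module is consulted last.
sample = "passwd: files systemd brkgl2s  # custom module last\ngroup: files\n"
print(passwd_sources(sample))
```

Placing the custom module after `files` means existing local accounts are resolved before any on-demand creation is attempted.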
After successful authentication (the last authentication phase), the next stage is Session Establishment. At that stage the client is allowed to access the server. The session is opened after all Pluggable Authentication Modules (PAM) checks pass. To configure the user's session, it is necessary to implement a custom PAM module and configure it in one of the files in /etc/pam.d. During the PAM stage some environment variables are defined. One of them is SSH_AUTH_INFO_0. 7 It exposes authentication information to the PAM module (e.g. pubkey, certificate, etc.). This variable can be used as a source for decisions during the authorization process (e.g. assigning a sudo group to the user).
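A PAM module would first split SSH_AUTH_INFO_0 into its parts. The sketch below assumes the value holds one line per successful method in the form `publickey <key type> <base64 blob>`. That layout and the helper names are assumptions to verify against the sshd version in use.

```python
from typing import NamedTuple, Optional


class AuthInfo(NamedTuple):
    method: str
    key_type: str
    key_blob: str


def parse_auth_info(value: str) -> Optional[AuthInfo]:
    """Return the first publickey entry found in SSH_AUTH_INFO_0, if any."""
    for line in value.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "publickey":
            return AuthInfo(parts[0], parts[1], parts[2])
    return None


def is_ed25519_certificate(info: AuthInfo) -> bool:
    """True if the key type is an OpenSSH ed25519 certificate."""
    return info.key_type == "ssh-ed25519-cert-v01@openssh.com"


# Placeholder blob; a real value is a long base64 string.
info = parse_auth_info("publickey ssh-ed25519-cert-v01@openssh.com AAAAexample\n")
print(info)
```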
UsePAM Enables the Pluggable Authentication Module interface. If set to yes this will enable PAM authentication using KbdInteractiveAuthentication and PasswordAuthentication in addition to PAM account and session module processing for all authentication types.
Because PAM keyboard-interactive authentication usually serves an equivalent role to password authentication, you should disable either PasswordAuthentication or KbdInteractiveAuthentication. 8
8 man 5 sshd_config
%%{init: { "flowchart" : { 'curve' : 'stepBefore', 'defaultRenderer': 'elk' } } }%% flowchart LR subgraph sshd_p[sshd process] direction TB sshd("sshd") ======> |If user exist <br>and<br> UsePAM enabled|libpam(PAM) subgraph libnss[NSS] direction RL nss{{"libs"}} end subgraph libpam[PAM] direction RL pam{{"libs"}} end end %% subgraph pamcfg[PAM configs] %% direction TB cfg{{"/etc/pam.d/*"}} ==> libpam %% end subgraph modules[PAM modules] pam ==> account pam ==> authentication pam ==> password pam ==> session end libnss <==> |Request<br>Response|sshd libpam ==> |Response|sshd classDef pam fill:#0f9d58; classDef nss fill:#eeac4d; classDef sshd_p fill:#f6f7fb class libpam pam class libnss nss class sshd_p sshd_p class modules sshd_p
One way to get authentication information during an SSH connection is the -A flag. This flag enables forwarding of connections from an authentication agent (ssh-agent) via a socket to a remote host. The path to the socket is stored in the SSH_AUTH_SOCK environment variable. The variable is accessible on the remote host, but this approach has security issues because the socket is forwarded to all hosts. This can be mitigated if the user explicitly enables agent forwarding only for specific hosts (e.g. ForwardAgent yes).
When the session is closed, the PAM module must perform the following actions:
- remove the record from /etc/passwd
- remove the home directory
- kill all processes related to the user
- etc.
Thus, all users are temporary.
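The first cleanup action can be sketched as a pure text transformation on passwd-formatted content. This is an illustration only: a real module would also need file locking and an atomic rewrite of /etc/passwd, which are left out here. The account names are hypothetical.

```python
def remove_passwd_entry(passwd_text: str, username: str) -> str:
    """Return passwd-formatted text without the entry for `username`."""
    kept = [
        line
        for line in passwd_text.splitlines(keepends=True)
        if line.split(":", 1)[0] != username
    ]
    return "".join(kept)


# Hypothetical file content with one temporary account.
before = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice.brkgl2s:!:60001:60001::/home/alice.brkgl2s:/bin/bash\n"
)
after = remove_passwd_entry(before, "alice.brkgl2s")
print(after, end="")
```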
sequenceDiagram participant c as SSH client participant s as SSH server participant n as NSS participant p as PAM c->>s: SSH_MSG_USERAUTH_REQUEST activate s s ->> n: Request NSS activate n Note over c,n: According to settings in the /etc/nsswitch.conf NSS looks up the user in each data source. On this step the custom NSS <br/>module must create a new user and return NSS_STATUS_SUCCESS if username matches the requirement alt user does not exist n ->> s: NSS_STATUS_NOTFOUND s ->> c: SSH_MSG_USERAUTH_FAILURE opt s ->> c: Authentication method list end else user exists n ->> s: NSS_STATUS_SUCCESS c->>s: SSH_MSG_USERAUTH_REQUEST <br> + <br> Authentication method has been selected deactivate n s->>p: Request PAM activate p Note over s,p: According to settings in the files in /etc/pam.d PAM runs each module. On this step the custom PAM module must check SSH_AUTH_INFO_0, <br/>get pubkey and additional info (e.g. Key ID field) as well as return status if username, pubkey type, etc. match the requirements. alt is not successful p ->> s: PAM_SESSION_ERR, PAM_AUTH_ERR, etc. s ->> c: SSH_MSG_USERAUTH_FAILURE else successful p ->> s: PAM_SUCCESS end alt is additional authentication method required s ->> c: Partial success s ->> p: . Note over c, p: Step(s) related to additional authentication method(s) p ->> s: . s ->>c: SSH_MSG_USERAUTH_SUCCESS else additional authentication method(s) is not required s ->>c: SSH_MSG_USERAUTH_SUCCESS end deactivate p end deactivate s activate c Note over c, p: Session has been opened c ->>s: Session terminate s ->>p: Session will be closed p ->>p: Some action(s) p ->>s: Successfully s->>c: Session has been closed
HLD 9
Naming convention
Key ID field
The Key ID field usually contains a policy name which describes the access level on hosts. It makes audit logs more detailed.
Currently, the PAM module supports the following colon-separated format of the field:
- resource version: reserved for future usage. Default: ssh_v1
- environment: reserved for future usage. If the field is not defined, it is set to !. The ! means that the field has no value by default.
- sudo group: [admins|users]. Default: users
Not all of the fields have to be filled in, but the minimum Key ID format is ::, which expands to ssh_v1:!:users by default.
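The defaulting rules above can be expressed as a small parser. This is a sketch of the described format, not the module's actual code. The type and function names are illustrative, and rejecting unknown sudo groups is an assumption based on the [admins|users] restriction.

```python
from typing import NamedTuple

DEFAULTS = ("ssh_v1", "!", "users")
SUDO_GROUPS = {"admins", "users"}


class KeyId(NamedTuple):
    resource_version: str
    environment: str
    sudo_group: str


def parse_key_id(raw: str) -> KeyId:
    """Split a Key ID into its three fields, filling empty ones with defaults."""
    parts = raw.split(":")
    if len(parts) != 3:
        raise ValueError("Key ID must have the form version:environment:group")
    key_id = KeyId(*[part or default for part, default in zip(parts, DEFAULTS)])
    if key_id.sudo_group not in SUDO_GROUPS:
        raise ValueError(f"unsupported sudo group: {key_id.sudo_group}")
    return key_id


print(parse_key_id("::"))
print(parse_key_id("::admins"))
```

A Key ID without both colons, such as `users`, is rejected rather than guessed at, which matches the stated minimum format.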
Minimum requirements
OpenSSH >= 7.6p1 (tested on Fedora 41 with OpenSSH 9.8p1)
# Add to /etc/ssh/sshd_config.d/00-break-glass.conf
Port 1110
UsePAM yes
Match LocalPort 1110
    TrustedUserCAKeys /path/to/ca
    AuthenticationMethods publickey
    PAMServiceName brkgl2s
Match All
Known limitations
Custom NSS module:
- generates a random UID/GID each time during the account creation process. The UID/GID will differ between two hosts for the same username.
- requires the username to contain a postfix (.brkgl2s) as an additional restriction for checking the service name which calls NSS
- supports only two sudo groups (for more details, see the Naming convention section)
- assigns each user a unique UID/GID, but the group itself related to the GID is not created
- does not support changing the service name (option PAMServiceName 10).
10 man 5 sshd_config
Custom PAM module:
- removes the user record and home directory after the session is closed
- termination of all processes related to the user is not implemented
- only the ed25519 pubkey type is supported
- the user is created each time the username matches the requirements. If the SSH server sends SSH_MSG_USERAUTH_FAILURE (e.g. invalid certificate) for some reason, the user record is not deleted
Pitfalls
Checking PAM service name
System calls related to NSS, which are used by tools such as id, getent, etc., will create a record in the users data source each time the user does not exist. To avoid the problem, it is necessary to limit the PAM services that can use the custom NSS module: if the calling PAM service is not ssh, the NSS module must return NSS_STATUS_TRYAGAIN. The nss-devel library does not have any function for checking which PAM service calls NSS, but NSS modules can read some environment variables, by analogy with PAM modules. The SYSTEMD_EXEC_PID 11 environment variable stores the PID of the process which calls the NSS service. When the PID is known, the process name can be read from /proc/PID/comm 12. Thus, checking the process name partly solves the problem and makes it possible to use these tools without adding users to a data source. Unlike nss-devel, the pam-devel library does provide a function for getting the PAM service name.
11 man 5 systemd.exec
12 man 5 proc_pid_comm
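The check described above can be modelled as follows. The sketch reads SYSTEMD_EXEC_PID from a supplied environment mapping and looks up the process name under a configurable proc root so that it can be exercised without a live sshd. The expected process name `sshd` and the function names are assumptions. The postfix `.brkgl2s` is the one named in the Known limitations section.

```python
import os
from typing import Mapping, Optional

POSTFIX = ".brkgl2s"


def calling_process_name(env: Mapping[str, str],
                         proc_root: str = "/proc") -> Optional[str]:
    """Resolve SYSTEMD_EXEC_PID to a process name via <proc_root>/<pid>/comm."""
    pid = env.get("SYSTEMD_EXEC_PID")
    if pid is None or not pid.isdigit():
        return None
    try:
        with open(os.path.join(proc_root, pid, "comm")) as fh:
            return fh.read().strip()
    except OSError:
        return None


def may_create_user(username: str, env: Mapping[str, str],
                    expected: str = "sshd", proc_root: str = "/proc") -> bool:
    """Allow creation only for the expected caller and a postfixed username."""
    return (calling_process_name(env, proc_root) == expected
            and username.endswith(POSTFIX))


# Without SYSTEMD_EXEC_PID the caller is unknown, so nothing is created.
print(may_create_user("alice.brkgl2s", {}))
```

When the function returns False, the real module would answer NSS_STATUS_TRYAGAIN instead of creating the account.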
sequenceDiagram participant s as SSH server participant n as NSS participant p as PAM activate s activate n s ->> n: Request NSS alt is user not found critical Note over s, n: On this step NSS module gets PID <br/>from SYSTEMD_EXEC_PID and <br/>looks up process name in /proc/PID/comm option calling process is not ssh n ->> s: NSS_STATUS_TRYAGAIN option username does not contain postfix n ->> s: NSS_STATUS_TRYAGAIN end n ->> n: Create user n ->> s: NSS_STATUS_SUCCESS else user is found critical option calling process is ssh option username does contain postfix n ->> s: NSS_STATUS_SUCCESS end end deactivate n s->>p: Request PAM activate p alt successfully critical check option SSH_AUTH_INFO_0 p ->> p: looks up pubkey option pubkey type option gets Key ID p->> p: Create sudoers file end p->>s:PAM_SUCCESS else not successfully p->>s: PAM_SESSION_ERR, PAM_AUTH_ERR, etc. end deactivate p deactivate s
Checking username
In fact, the limitation related to the postfix in a username is artificial and the postfix can be removed, but doing so leads to the following problem:
when users are created at runtime, there is a danger that an attacker can flood /etc/passwd with junk records. In fact, the postfix in the username does not solve the problem if the attacker knows about it. Also, the regular authentication process should be different from the emergency authentication process. If the processes are merged, users who already connected to hosts before the emergency will be able to pass authentication without a new user having to be created.
Currently the processes are split to simplify management of regular and emergency users, but this does not rule out removing the limitation in the future.
My thinking led me to the idea that the ssh port should be opened on network equipment only during emergencies. In regular time, ACLs on network equipment should restrict access to the ssh port on hosts. Currently, in my opinion, the processes must be split to keep them easy to manage and develop.
Similar projects
Details of implementation
get_pubkey_info function
There is a reason why the get_pubkey_info function was implemented via execvp and pipe. Libraries such as libssh and libssh2 do not have functions that look up the fields inside a pubkey (certificate), and OpenSSHp1 does not have a public API for implementing this function either. The function could be implemented using low-level primitives. There are many reasons to refactor the function in the future.
In fact, the function was implemented as a parent and two child processes with stdin/stdout redirected via pipes. The first child process writes the value of the variable which contains the pubkey to stdout. The second child process reads it from stdin via a pipe and sends it to stdout via a pipe to the parent process. The parent process reads from stdin and sends it to the ssh-keygen -L -f - command. In a shell interpreter this looks like cat pubkey | ssh-keygen -L -f -. Next, the parent process looks up some fields and saves them into a structure.
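The same pipeline can be sketched with a single subprocess call. The sketch assumes the command meant is `ssh-keygen -L -f -`, which prints certificate contents read from stdin. The sample listing is a hand-written approximation of that output, not captured from a real run, and the function names are illustrative.

```python
import re
import subprocess
from typing import Dict


def describe_certificate(cert_line: str) -> str:
    """Feed one certificate line to `ssh-keygen -L -f -` and return its output."""
    result = subprocess.run(
        ["ssh-keygen", "-L", "-f", "-"],
        input=cert_line, capture_output=True, text=True, check=True,
    )
    return result.stdout


def extract_fields(listing: str) -> Dict[str, str]:
    """Pick the Key ID and certificate type out of the textual listing."""
    fields: Dict[str, str] = {}
    match = re.search(r'^\s*Key ID:\s*"(.*)"\s*$', listing, re.MULTILINE)
    if match:
        fields["key_id"] = match.group(1)
    match = re.search(r"^\s*Type:\s*(\S+)", listing, re.MULTILINE)
    if match:
        fields["type"] = match.group(1)
    return fields


# Hand-written approximation of the listing format (assumption).
SAMPLE = '''(stdin):1:
        Type: ssh-ed25519-cert-v01@openssh.com user certificate
        Key ID: "ssh_v1:!:users"
        Principals:
                alice.brkgl2s
'''
print(extract_fields(SAMPLE))
```

Parsing text output keeps the sketch short. It shares the fragility of the pipe-based approach described above, which is one argument for a later refactoring.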
adduser function
During user account creation, the ! char is used instead of a real password. According to man 5 shadow: 13
13 man 5 shadow
If the password field contains some string that is not a valid result of crypt(3), for instance ! or *, the user will not be able to use a unix password to log in (but the user may log in the system by other means).
The ! (or *) char means the account does not have a password and no password will give access to the account. The x char means the password is located in the /etc/shadow file, which is why the custom NSS module must never create entries in /etc/shadow or use the x char instead of the password in the /etc/passwd file.
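A record of this kind could be assembled as in the sketch below. The UID/GID range, the default shell and the helper names are hypothetical. The only points taken from the text are the ! in the password field and the randomly chosen UID/GID.

```python
import random


def random_id(low: int = 60000, high: int = 64999) -> int:
    """Pick a random UID/GID; the range here is an arbitrary illustration."""
    return random.randint(low, high)


def format_passwd_entry(username: str, uid: int, gid: int,
                        home: str, shell: str = "/bin/bash") -> str:
    """Build a passwd(5) line with '!' in the password field."""
    # "!" is not a valid crypt(3) result, so no password can ever match
    return f"{username}:!:{uid}:{gid}::{home}:{shell}"


uid = random_id()
print(format_passwd_entry("alice.brkgl2s", uid, uid, "/home/alice.brkgl2s"))
```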
References
- wh0: The SSH Protocol
- Teleport: SSH Certificates Security
- Using certificates for SSH authentication
- Netburner: Introduction to the SSH Protocol
- SecureW2: How Does SSH Certificate Authentication Work?
- NISTIR 7966: Security of Interactive and Automated Access Management Using SSH
- Cloudflare: Fearless SSH: short-lived certificates bring Zero Trust to infrastructure