Recently I began working on a project to change how we log into our instances in AWS. Like most companies, we have used the default instance user (ec2-user, ubuntu, etc.) and a shared master key to log into a running instance. Unfortunately, this doesn’t tell us which person on the team logged in, only that the default user did. Sharing the same master key creates another issue as well, since somebody leaving the team requires a key rotation across all instances.

I’ve written before on Using Vault for SSH Connections and decided to implement that at my current job. This started out as pretty straightforward: I added the script I use to configure AMIs to my instance user_data instead. This part of my project worked just fine, but it left me with another problem. I want to be alerted if somebody logs into a production server. Since we use DataDog for all our monitoring and alerting, I decided I needed a default DataDog configuration that forwards the system logs, so that DataDog can send an alert anytime somebody logs in to a production instance.

Since we already have Ansible roles for configuring both our common instance settings (Vault SSH, motd, etc.) and DataDog, I decided to figure out how to run that Ansible via user_data. I quickly learned that to do this I needed API credentials for GitHub to pull the Ansible roles, and API credentials for DataDog to configure the agent. Since those credentials are already stored in our Vault instance, I decided to configure the AWS auth backend.

Configuring AWS

On the AWS side, the only thing that I needed to do was update my Vault IAM role and add an additional role in each of my accounts.

Updating the Vault IAM role

For the AWS auth backend to work, it needs to have a couple of additional privileges. I updated the IAM role that was attached to my Vault instance with the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "iam:GetInstanceProfile",
        "iam:GetUser",
        "iam:GetRole"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["sts:AssumeRole"],
      "Resource": ["arn:aws:iam::111111111111:role/vaultawsauth",
                   "arn:aws:iam::222222222222:role/vaultawsauth"]
    }
  ]
}

The first statement adds the permissions that Vault uses to validate the instance that is authenticating. The second statement allows Vault to use STS to do the same validation in other accounts.
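If you manage IAM with the AWS CLI rather than the console, attaching the updated policy to the Vault instance role might look like the following. This is a sketch: the role name, policy name, and file name here are assumptions for illustration, not something from my actual setup.

```shell
# Hypothetical names; substitute the role attached to your Vault instance.
aws iam put-role-policy \
  --role-name vault-server \
  --policy-name vault-aws-auth \
  --policy-document file://vault-aws-auth.json
```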

Add IAM role in each account

If you are implementing this in a multi-account strategy, you will also need to add a new role in each account that Vault can assume to validate instance resources. This role includes the first statement from above that allows Vault to validate the instance that is authenticating.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "iam:GetInstanceProfile",
        "iam:GetUser",
        "iam:GetRole"
      ],
      "Resource": "*"
    }
  ]
}

Configuring Vault

Once the IAM permissions in AWS are configured, the next step is to configure Vault. Enabling the backend is straightforward.

vault auth enable aws

Next, we need to create a policy to limit what the instance has access to. In addition to limiting the instances to only the secret paths they need, I also allow only read access. Read access requires that somebody know the path of the secret they want to pull.

vault policy write "default-instance-policy" -<<EOF
path "acg/globals/data/*" {
  capabilities = ["read"]
}
EOF

With the policy created, we can now create the role. I limit the default session time to 15 minutes (and that still may be longer than it needs to be) and use the bound_* parameters to limit authentication to specific regions in specific accounts. The disallow_reauthentication parameter is set to true so that the instance can only authenticate once. This allows the instance to authenticate to Vault to run its user_data script and then never be able to authenticate again.

vault write \
  auth/aws/role/default-instance-role \
  auth_type=ec2 \
  token_policies=default-instance-policy \
  token_max_ttl=900 \
  bound_account_ids="111111111111","222222222222" \
  bound_regions="us-east-2,us-west-2" \
  disallow_reauthentication=true
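To sanity-check the configuration, the role can be read back. This is a sketch using the role name that the user_data scripts below log in with; the exact fields in the output vary by Vault version.

```shell
# Display the stored role, including the bound_* restrictions and TTLs.
vault read auth/aws/role/default-instance-role
```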

The last step is to enable cross-account access by telling the Vault AWS auth backend which AWS role to assume in each account.

vault write auth/aws/config/sts/111111111111 \
  sts_role=arn:aws:iam::111111111111:role/vaultawsauth
vault write auth/aws/config/sts/222222222222 \
  sts_role=arn:aws:iam::222222222222:role/vaultawsauth

Configuring user_data

With Vault configured to allow EC2 instances to authenticate and pull secrets, we now need to write a user_data script to take advantage of the access. I don’t want to install Vault on every node, so I use curl to access the Vault server. To be able to parse the Vault responses I install and use jq, so every user_data script will start the same way.

#!/bin/bash

## Prepare the node
yum install -y jq

## Get Vault Token
VAULT_ADDR="https://vault.example.com"
VAULT_TOKEN=$(curl -X POST "$VAULT_ADDR/v1/auth/aws/login" -d '{"role":"default-instance-role","pkcs7":"'$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/pkcs7 | tr -d '\n')'"}' | jq -r .auth.client_token)
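The login response that jq pulls the token out of looks roughly like the following. This is an abbreviated sketch with an invented token value; the real response contains additional fields.

```shell
# Hypothetical, abbreviated /v1/auth/aws/login response.
cat <<'EOF' > /tmp/login.json
{
  "auth": {
    "client_token": "s.example123",
    "policies": ["default", "default-instance-policy"],
    "lease_duration": 900,
    "renewable": false
  }
}
EOF

# The same jq filter the user_data script uses.
jq -r .auth.client_token /tmp/login.json   # → s.example123
```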

Now that I can retrieve secrets from Vault, I can set my GIT_TOKEN and DATADOG_TOKEN. With those tokens set, I can set up my .netrc file so that I can check out my playbooks and roles from my private GitHub repositories and pass secrets to my Ansible.

#!/bin/bash

## Prepare the node
yum install -y python3 python3-pip python3-libselinux python3-devel git jq

## Get Vault Token
VAULT_ADDR="https://vault.example.com"
VAULT_TOKEN=$(curl -X POST "$VAULT_ADDR/v1/auth/aws/login" -d '{"role":"default-instance-role","pkcs7":"'$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/pkcs7 | tr -d '\n')'"}'|jq -r .auth.client_token)

## Setup Github Credentials
GIT_TOKEN=$(curl -s --header "X-Vault-Token: $VAULT_TOKEN" $VAULT_ADDR/v1/acg/globals/data/ghe|jq -r .data.data.api_key)
echo "machine github.com login vaultbot password $GIT_TOKEN" > ~/.netrc
chmod 600 ~/.netrc

## Get Datadog Credentials
DATADOG_TOKEN=$(curl -s --header "X-Vault-Token: $VAULT_TOKEN" $VAULT_ADDR/v1/acg/globals/data/datadog|jq -r .data.data.api_key)

## Run Ansible
/usr/bin/python3 -m venv ansible
. ./ansible/bin/activate
pip install ansible
git clone https://github.com/acg/ansible-playbook-template
cd ansible-playbook-template
ansible-galaxy install -r requirements.yml
ansible-playbook -i inventory -e datadog_api_key=$DATADOG_TOKEN playbook.yml

## Cleanup
rm -rf ~/.netrc
unset VAULT_ADDR
unset VAULT_TOKEN
unset GIT_TOKEN
unset DATADOG_TOKEN
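The double .data.data in the jq filters above is not a typo: with version 2 of Vault's KV secrets engine, the secret itself is wrapped in a metadata envelope. A hypothetical read response, with an invented key value:

```shell
# Hypothetical KV v2 response for a path like acg/globals/data/datadog.
cat <<'EOF' > /tmp/secret.json
{
  "data": {
    "data": { "api_key": "dd-example-key" },
    "metadata": { "created_time": "2019-01-01T00:00:00Z", "version": 1 }
  }
}
EOF

# .data alone returns the envelope; .data.data.api_key reaches the secret.
jq -r .data.data.api_key /tmp/secret.json   # → dd-example-key
```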

While this will help me solve some problems with the few appliances that I run, I can also see it being very helpful with autoscaling groups. It will reduce the need to have pre-built AMIs for every single application we run in an autoscaling group.

References

AWS Auth Method