🔐 Protecting Secrets
Secrets. You've got them, we've got them, everybody's got them. Keeping secrets secret isn't so hard until someone or something else needs access to them. Then things get interesting. Since you want to use secrets in your automation, things are definitely interesting.
In this post, we'll describe how we keep both your secrets and our secrets safe.


First, a brief introduction to our secrets-relevant infrastructure.
We run our infrastructure on AWS using a group of ECS clusters within the same VPC in us-east-2. The clusters are spread across 3 availability zones for redundancy. We provide public access to our service via an Application Load Balancer (ALB). The ALB only talks to our web applications. There is no public access to any of the Vaults in our system.
As you may have noticed in the diagram above, we base our secrets management solution on HashiCorp Vault. Vault is an excellent secret store with lots of additional functionality, such as certificate management, one-time passwords and a bunch of other secrets engines. However, Vault is not particularly easy to use or manage in a production environment at any meaningful scale. To help improve the situation, we put a lot of effort into enhancing the usability and manageability of Vault for our service.

Brief Introduction to Vault

HashiCorp Vault always writes its data to disk fully encrypted. Plaintext data is only ever present in memory. Furthermore, the memory used by Vault is mlocked (mlock is a Linux kernel feature that "locks" a region of memory, ensuring that it is never swapped out to disk).
For the purposes of this document, we can consider the process of encrypting and writing the data to disk as sealing and the process of reading and decrypting the data as unsealing. The actual process Vault uses is more complex and uses multiple levels of encryption keys, but we've simplified the description here so we can communicate the most important aspects of our Vault use. You can read about it in more detail here: https://www.vaultproject.io/docs/concepts/seal.
The key takeaways are:
  1. Even Vault cannot read the secrets on disk until the vault has been unsealed.
  2. Plaintext secrets are never written to disk.
Unsealing decrypts the data for Vault's use and only Vault's use. It does not make the data generally available. Three things need to occur to actually gain access to data in Vault:
  1. The vault data written to disk must be decrypted in memory (i.e., unsealed).
  2. The requesting user must successfully authenticate to Vault.
  3. The user must have permission to access the requested secret.
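The three conditions can be pictured as gates that are checked in order. The following is a minimal toy model, not Vault's real API: the `ToyVault` class, its fields, and the error messages are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVault:
    """Toy in-memory model of the three access conditions (not Vault's API)."""

    sealed: bool = True
    # Hypothetical: token -> set of secret paths that token may read.
    tokens: dict = field(default_factory=dict)
    # Hypothetical: path -> plaintext secret, held in memory only.
    secrets: dict = field(default_factory=dict)

    def read(self, token, path):
        # Gate 1: the data must have been unsealed (decrypted in memory).
        if self.sealed:
            raise PermissionError("vault is sealed")
        # Gate 2: the caller must be authenticated.
        if token not in self.tokens:
            raise PermissionError("authentication failed")
        # Gate 3: the caller must be permitted to read this secret.
        if path not in self.tokens[token]:
            raise PermissionError("permission denied")
        return self.secrets[path]
```

In this model, unsealing alone (setting `sealed = False`) gives nobody access; a caller still needs a known token and a permission entry for the specific path.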
Vault supports many authentication mechanisms, but we only use two:
  1. A JWT from our Auth0 tenant.
  2. A token generated by Vault.
Once a session has successfully authenticated, Vault will assign a group of policies to that session. The policies determine what secrets that session has access to.
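To illustrate how a group of policies can determine what a session may access, here is a deliberately simplified sketch. It assumes each policy is a mapping from a path pattern to a list of capabilities, and it only supports an exact path or a trailing `*`. Real Vault policies are richer (for example, rule priority and explicit `deny`), so the `is_allowed` function and the sample paths are hypothetical.

```python
def is_allowed(policies, path, capability="read"):
    """Return True if any policy attached to the session grants the capability.

    `policies` is a list of dicts mapping a path pattern to capabilities.
    Simplification: patterns are exact paths or prefixes ending in '*'.
    """
    for rules in policies:
        for pattern, capabilities in rules.items():
            if pattern.endswith("*"):
                matches = path.startswith(pattern[:-1])
            else:
                matches = path == pattern
            if matches and capability in capabilities:
                return True
    return False


# Hypothetical policies assigned to a session after authentication.
session_policies = [
    {"secret/app/*": ["read"]},
    {"secret/shared/config": ["read", "update"]},
]
```

With these sample policies, `is_allowed(session_policies, "secret/app/db")` is true, while a request for `secret/other/db` is refused because no attached policy covers that path.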

CloudTruth Secrets

We have a bunch of secrets that we need to manage in order to host and deploy our main application. Since we have invested so much effort into our Vault-based solution, we use it for our own secrets. This allows our team to test additional Vault use cases before exposing them to our customers. We will not roll out new secret management functionality until we are comfortable using the solution ourselves. Therefore, you will see a handful of use cases below that we have not yet opened up to customers.

What kind of parameters does CloudTruth need to keep secret?

Here are some examples:
  1. New customer vault root tokens (used during deploy of a new customer org).
    1. The freshly created root token is deleted once the deploy is complete.
    2. Root tokens are dangerous and are therefore disposed of ASAP.
  2. Encryption as a service for customer API keys.
  3. Transit encryption engine for customer vault unseal keys.
  4. PKI engine for management of internal certificates (used to secure all communication with TLS).

Internal Vault Availability

Our internal vault is considered critical infrastructure. Our service will not run without it. Therefore, we run our internal vault as a cluster of 3 nodes spread across 3 availability zones within a separate ECS cluster. The storage backing our internal vault is S3 with versioning enabled.

User Secrets (Org Vaults)

Each customer organization gets its own instance of Vault. This instance is never shared or visible to any other organization. We run a customized autounseal process which does not require user interaction in order to unseal org vaults. It is important to note that this autounseal process only unlocks the secrets to the Vault process itself. Autounseal does not grant CloudTruth access to your secrets.
We based our autounseal process on the transit engine autounseal design pattern from HashiCorp. Please see the following tutorial for additional details: https://learn.hashicorp.com/tutorials/vault/autounseal-transit

Org Vault Unsealing

The unseal keys are unique per customer Vault instance. Our internal vault instance provides the transit autounseal function for all org vaults. When we set up an org vault, we create a new transit unseal key specific to that vault. We also create a vault token that allows access to the unseal key, referred to as the autounseal token below. This autounseal token is stored in the AWS SSM parameter store and is restricted to that org vault instance using a customized IAM policy and workflow. This IAM policy allows a specific org vault, and only that org vault, to read the vault autounseal token in the SSM parameter store.
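As a rough illustration of what a per-vault IAM restriction could look like, the sketch below builds a policy document that allows reading exactly one SSM parameter. This is not our actual policy: the function name, the parameter naming scheme, and the account ID are hypothetical, and the sketch assumes the policy would be attached to a role used only by that one org vault.

```python
import json


def autounseal_token_policy(region, account_id, parameter_name):
    """Build an IAM policy document allowing reads of one SSM parameter only."""
    # SSM parameter ARNs have the form
    # arn:aws:ssm:<region>:<account>:parameter/<name-without-leading-slash>
    resource = (
        f"arn:aws:ssm:{region}:{account_id}:parameter/"
        f"{parameter_name.lstrip('/')}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:GetParameter",
                "Resource": resource,
            }
        ],
    }


# Hypothetical values for illustration only.
policy = autounseal_token_policy(
    "us-east-2", "123456789012", "/org-vaults/example-org/autounseal-token"
)
print(json.dumps(policy, indent=2))
```

Because the `Resource` names a single parameter rather than a wildcard, a role carrying only this policy cannot read any other org vault's autounseal token.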
When an org vault boots up, it uses the vault autounseal token to authenticate to our internal vault and retrieve its unseal key. It then uses the unseal key to decrypt its data and load it into mlocked memory.

Org Vault Secret Access

Access to secrets in your org vault requires a JWT from CloudTruth's Auth0 tenant. This JWT is signed by a private key unique to CloudTruth's Auth0 tenant. It contains CloudTruth-specific claims. Each org Vault is seeded with the corresponding public key at deploy time, which allows it to validate the signature on the JWT before checking the claims in the JWT. The organization claim in the JWT must match the organization that owns the vault or the connection will be rejected.
If everything is good at this point, the user is considered authenticated and the policy for that user is associated with the session. This allows the user access to organization secrets as specified in the policy.
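The claim check that follows signature validation can be sketched as below. The sketch assumes the JWT signature has already been verified against the seeded public key, so it receives the decoded claims as a plain dict. The claim names `organization` and `sub`, the function name, and the policy lookup table are hypothetical.

```python
def authenticate(verified_claims, vault_org, user_policies):
    """Return the policies for a session, or raise if the org does not match.

    `verified_claims` must come from a JWT whose signature was already
    validated; this function performs only the claim checks.
    """
    # Hypothetical claim name: the organization claim must match the
    # organization that owns this vault, otherwise the connection is rejected.
    if verified_claims.get("organization") != vault_org:
        raise PermissionError("organization claim does not match this vault")
    # Hypothetical lookup: map the authenticated user to their policies.
    user = verified_claims["sub"]
    return user_policies.get(user, [])


# Hypothetical data for illustration only.
policies = {"user-1": ["org-secrets-read"]}
claims = {"sub": "user-1", "organization": "example-org"}
session_policies = authenticate(claims, "example-org", policies)
```

A JWT that is validly signed but issued for a different organization fails the first check, so it never reaches the policy lookup.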

Org Vault Availability

We currently run org vaults as supervised single instances in a dedicated ECS cluster. Org vaults are run in this manner to help manage costs. The supervisor ensures each org vault has at least one running instance. If it detects a failure, or if the underlying instance a vault is running on is being reclaimed by AWS, it starts a replacement vault on a different instance, generally within 90 seconds. We plan to offer highly available vault instances to our customers in the future.
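The supervisor's job can be thought of as a reconciliation step: compare what should be running with what is running, and start replacements where needed. The following is a minimal sketch of that idea under our own simplifying assumptions, not the actual supervisor. The function name and the shape of its inputs are hypothetical.

```python
def vaults_to_restart(org_vaults, running, reclaimed_hosts):
    """Return the org vaults that need a replacement instance.

    org_vaults:      iterable of org identifiers that should each have a vault.
    running:         dict mapping org -> list of hosts running its vault.
    reclaimed_hosts: set of hosts that are being reclaimed.
    """
    needs_restart = []
    for org in org_vaults:
        hosts = running.get(org, [])
        # A vault on a host that is going away does not count as healthy.
        healthy = [h for h in hosts if h not in reclaimed_hosts]
        if not healthy:
            needs_restart.append(org)
    return needs_restart


# Hypothetical state for illustration only.
pending = vaults_to_restart(
    ["org-a", "org-b", "org-c"],
    {"org-a": ["host-1"], "org-b": ["host-2"]},
    {"host-2"},
)
```

Here `org-b` is flagged because its only host is being reclaimed, and `org-c` is flagged because it has no running instance at all, while `org-a` is left alone.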