
Hashicorp Nomad in Production


Installation

Download a pre-compiled binary for your platform (the latest version is listed at https://releases.hashicorp.com/nomad/) and verify it using the published SHA-256 sums:

wget https://releases.hashicorp.com/nomad/1.1.6/nomad_1.1.6_linux_amd64.zip
wget https://releases.hashicorp.com/nomad/1.1.6/nomad_1.1.6_SHA256SUMS

The SHA256SUMS file lists the corresponding checksum for this archive:

93f287758a464930e35cd1866167f05a3a6a48af2b0e010dfc0fbc914ae2f830  nomad_1.1.6_linux_amd64.zip

If you downloaded an intact copy of the file, the following command has to print the same sum:

sha256sum nomad_1.1.6_linux_amd64.zip
93f287758a464930e35cd1866167f05a3a6a48af2b0e010dfc0fbc914ae2f830 nomad_1.1.6_linux_amd64.zip
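Comparing 64 hex characters by eye is error-prone. A minimal sketch that lets sha256sum do the comparison instead, assuming GNU coreutils and that both downloaded files sit in the current directory (the function name verify_nomad_archive is hypothetical):

```shell
# Hypothetical helper: check a Nomad zip against the published SHA256SUMS file.
# Usage: verify_nomad_archive <version> [os_arch]   (os_arch defaults to linux_amd64)
# Assumes GNU coreutils sha256sum and both files in the current directory.
verify_nomad_archive() {
  version="$1"
  arch="${2:-linux_amd64}"
  zip="nomad_${version}_${arch}.zip"
  sums="nomad_${version}_SHA256SUMS"

  # Keep only the line whose file-name column is exactly our archive, then let
  # sha256sum recompute the hash and compare it. Exit status 0 means a match;
  # a missing or mismatching entry makes sha256sum exit non-zero.
  awk -v f="$zip" '$2 == f' "$sums" | sha256sum --check --status -
}
```

Usage: `verify_nomad_archive 1.1.6 && echo "checksum OK"`. Note that a checksum downloaded from the same server only proves the file arrived intact.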

Now that we know the archive matches the published checksum, we can unzip it to a directory that is in our system PATH:

echo $PATH

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
unzip ./nomad_1.1.6_linux_amd64.zip
rm ./nomad_1.1.6_linux_amd64.zip
mv nomad /usr/bin/nomad
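Instead of reading the `echo $PATH` output yourself, you can check the target directory programmatically before moving the binary. A small POSIX sketch; the helper name path_contains is hypothetical:

```shell
# Hypothetical helper: succeed if directory $1 is one of the entries in $PATH.
# Wrapping both sides in colons makes the match exact, so /usr/bin does not
# match /opt/usr/bin and /usr does not match /usr/bin.
path_contains() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `path_contains /usr/bin && mv nomad /usr/bin/nomad`.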

Verify that it is working:

nomad -v
Nomad v1.1.6 (b83d623fb5ff475d5e40df21e9e7a61834071078)

Configuration

Prepare configuration and data directories:

mkdir /etc/nomad.d
mkdir -p /opt/nomad/data

Add a default configuration file:

nano /etc/nomad.d/nomad.hcl

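I added an advertise block, which specifies the advertise address for the individual network services. Both the RPC and the Serf interface have to be reachable by the other Nomad agents, so they advertise the WAN IP of your server. The HTTP interface (web user interface), on the other hand, I only want to use through SSH on localhost. Keep in mind that advertise only sets the address announced to other agents; the address Nomad listens on comes from bind_addr, so with bind_addr = "0.0.0.0" the HTTP port still listens on all interfaces (see the Bind Addrs line in the startup output below) unless your firewall blocks port 4646: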

Note: Restricting HTTP to localhost makes the Nomad web UI unreachable from other machines (except through an SSH tunnel) and also closes the HTTP REST API to remote clients: you will no longer be able to send GET or POST requests to your server from outside.

  • http - The address to advertise for the HTTP interface. This should be reachable by all the nodes from which end users are going to use the Nomad CLI tools.

  • rpc - The address advertised to Nomad client nodes. This allows advertising a different RPC address than is used by Nomad Servers such that the clients can connect to the Nomad servers if they are behind a NAT.

  • serf - The address advertised for the gossip layer. This address must be reachable from all server nodes. It is not required that clients can reach this address. Nomad servers will communicate to each other over RPC using the advertised Serf IP and advertised RPC Port.
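Because advertise only changes the announced address, one way to make the HTTP listener itself local-only is Nomad's addresses block, which overrides bind_addr per service. A sketch that drops such a fragment into the configuration directory, relying on Nomad merging all config files when -config points at a directory; the file name addresses.hcl and the helper name are assumptions:

```shell
# Hypothetical helper: write an addresses block that makes the HTTP API (UI and
# REST) listen on the loopback interface only; RPC and Serf keep using bind_addr.
# Usage: write_http_localhost_fragment [config_dir]   (default: /etc/nomad.d)
write_http_localhost_fragment() {
  dir="${1:-/etc/nomad.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/addresses.hcl" <<'EOF'
# Listen for HTTP (UI and REST API) on localhost only.
addresses {
  http = "127.0.0.1"
}
EOF
}
```

After restarting the agent, the Bind Addrs line of the startup output should list HTTP on 127.0.0.1:4646 instead of 0.0.0.0:4646.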

## https://www.nomadproject.io/docs/agent/configuration
name = "my_server_name"
datacenter = "my_data_center"
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"

advertise {
# Defaults to the first private IP address.
http = "127.0.0.1"
rpc = "1.2.3.4"
serf = "1.2.3.4"
}

ports {
# Change the default ports below
http = 4646
rpc = 4647
serf = 4648
}

server {
enabled = true
bootstrap_expect = 1
}

client {
enabled = true
servers = ["127.0.0.1:4647"]
}

## Connect to Consul service if available
## consul {
## address = "1.2.3.4:8500"
## }

## https://www.nomadproject.io/docs/agent/configuration/index.html#log_level
## [WARN|INFO|DEBUG]
log_level = "INFO"
log_rotate_duration = "30d"
log_rotate_max_files = 12

We can test-run this Master/Minion configuration with:

nomad agent -config /etc/nomad.d -bind 1.2.3.4

==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from /etc/nomad.d/nomad.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

Advertise Addrs: HTTP: 127.0.0.1:4646; RPC: 1.2.3.4:9020; Serf: 1.2.3.4:9408
Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 0.0.0.0:9020; Serf: 0.0.0.0:9408
Client: true
Log Level: INFO
Region: global (DC: mydatacenter)
Server: true
Version: 1.1.6

==> Nomad agent started! Log data will stream in below:

nomad.raft: failed to make requestVote RPC: target="{Voter IP-ADDRESS:4647 IP-ADDRESS:4647}" error="dial tcp IP-ADDRESS:4647: connect: connection refused"

Note: I first started the server with the default ports, then changed the ports and restarted. After that I started seeing the error above: Raft still tries to reach the old peer address on the default port and gets "connection refused". Clearing the data directory with rm -rf /opt/nomad/data/* gets rid of it.
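A plain `rm -rf /opt/nomad/data/*` skips hidden files and becomes dangerous if the path is ever mistyped. A slightly more defensive sketch of the same cleanup; the helper name reset_nomad_data is hypothetical, and the agent should be stopped before running it:

```shell
# Hypothetical helper: remove everything inside the Nomad data directory
# (stale Raft peers included) while keeping the directory itself.
# Usage: reset_nomad_data [data_dir]   (default: /opt/nomad/data)
reset_nomad_data() {
  dir="${1:-/opt/nomad/data}"
  # Refuse to operate on the filesystem root.
  case "$dir" in
    /) echo "refusing to clear '$dir'" >&2; return 1 ;;
  esac
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # -mindepth 1 keeps $dir itself and, unlike the dir/* glob, also catches dotfiles.
  find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
}
```

This deletes all cluster state stored on that node (jobs, Raft log), so only use it on a fresh or throwaway server.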

To test the Nomad UI, tunnel the user interface through SSH to your local machine:

ssh myuser@my-server-ip -p ssh-port -L4646:localhost:4646

Check the UI on port 4646 - http://localhost:4646/ui/jobs.
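If you only need the port forward and not an interactive shell, the same tunnel can run in the background. A sketch using standard OpenSSH flags (-f backgrounds after authentication, -N runs no remote command); the helper name and its defaults are hypothetical:

```shell
# Hypothetical helper: forward local port 4646 to localhost:4646 on the server
# without opening a remote shell.
# Usage: open_nomad_tunnel <user@host> [ssh_port] [local_port]
open_nomad_tunnel() {
  target="$1"
  ssh_port="${2:-22}"
  local_port="${3:-4646}"
  # -f: go to the background once authenticated; -N: do not run a remote command.
  ssh -f -N -p "$ssh_port" -L "${local_port}:localhost:4646" "$target"
}
```

Usage: `open_nomad_tunnel myuser@my-server-ip 2222`, then browse to http://localhost:4646/ui/jobs.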

Run as a Service

Stop the manually started Nomad instance and add a systemd service for Nomad:

nano /lib/systemd/system/nomad.service
[Unit]
Description=Nomad
Documentation=https://nomadproject.io/docs/
Wants=network-online.target
After=network-online.target

# When using Nomad with Consul it is not necessary to start Consul first. These
# lines start Consul before Nomad as an optimization to avoid Nomad logging
# that Consul is unavailable at startup.
#Wants=consul.service
#After=consul.service

[Service]
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/bin/nomad agent -config /etc/nomad.d
KillMode=process
KillSignal=SIGINT
LimitNOFILE=65536
LimitNPROC=infinity
Restart=on-failure
RestartSec=2

## Configure unit start rate limiting. Units which are started more than
## *burst* times within an *interval* time span are not permitted to start any
## more. Use `StartLimitIntervalSec` or `StartLimitInterval` (depending on
## systemd version) to configure the checking interval and `StartLimitBurst`
## to configure how many starts per interval are allowed. The values in the
## commented lines are defaults.

# StartLimitBurst = 5

## StartLimitIntervalSec is used for systemd versions >= 230
# StartLimitIntervalSec = 10s

## StartLimitInterval is used for systemd versions < 230
# StartLimitInterval = 10s

TasksMax=infinity
OOMScoreAdjust=-1000

[Install]
WantedBy=multi-user.target

And enable the service:

systemctl enable --now nomad
systemctl status nomad

Access the UI again to verify that everything is working.
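Besides the UI, you can ask the HTTP API whether a leader has been elected. A polling sketch against Nomad's /v1/status/leader endpoint, run on the server itself or through the SSH tunnel; the helper name, the retry count and the 2-second interval are assumptions:

```shell
# Hypothetical helper: poll /v1/status/leader until it returns a leader address.
# Usage: wait_for_nomad_leader [base_url] [attempts]
wait_for_nomad_leader() {
  url="${1:-http://127.0.0.1:4646}/v1/status/leader"
  attempts="${2:-15}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Treat an HTTP/connection error or an empty value as "no leader yet".
    leader=$(curl -fsS "$url" 2>/dev/null) || leader=""
    if [ -n "$leader" ] && [ "$leader" != '""' ]; then
      echo "leader: $leader"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "no leader after $attempts attempts" >&2
  return 1
}
```

Usage: `systemctl enable --now nomad && wait_for_nomad_leader`.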

Clusterize

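To add clients to our server, we now need to open the RPC and Serf ports of our server (clients connect to the RPC port; Serf gossip runs between servers, so 4648 matters once you add more servers):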

ufw

ufw allow 4647/tcp
ufw allow 4648
ufw reload
ufw status verbose

FirewallD

sudo firewall-cmd --permanent --zone=public --add-port=4647/tcp  --add-port=4648/tcp  --add-port=4648/udp
sudo firewall-cmd --reload
sudo firewall-cmd --zone=public --list-ports
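To confirm the firewall change from the minion before configuring it, a sketch that probes the master's TCP ports, assuming a netcat that supports -z (probe only) and -w (timeout). The Serf UDP port cannot be verified this way, and the helper name is hypothetical:

```shell
# Hypothetical helper: check that the master's Nomad TCP ports accept connections.
# Usage: check_nomad_ports <server_ip> [port ...]   (default ports: 4647 4648)
check_nomad_ports() {
  [ "$#" -ge 1 ] || { echo "usage: check_nomad_ports host [port ...]" >&2; return 2; }
  host="$1"
  shift
  [ "$#" -gt 0 ] || set -- 4647 4648
  failed=0
  for port in "$@"; do
    # -z: connect without sending data; -w 3: give up after 3 seconds.
    if nc -z -w 3 "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port unreachable" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Usage on the minion: `check_nomad_ports 1.2.3.4`.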

Continue by installing Nomad in client configuration on your minion server:

nano /etc/nomad.d/nomad.hcl
## https://www.nomadproject.io/docs/agent/configuration
name = "my_client_name"
datacenter = "my_data_center"
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"

advertise {
# Defaults to the first private IP address.
http = "127.0.0.1"
rpc = "2.3.4.5"
serf = "2.3.4.5"
}

ports {
# Change the default ports below
http = 4646
rpc = 4647
serf = 4648
}

server {
enabled = false
}

client {
enabled = true
servers = ["1.2.3.4:4647"]
}

## Connect to Consul service if available
## consul {
## address = "2.3.4.5:8500"
## }

## https://www.nomadproject.io/docs/agent/configuration/index.html#log_level
## [WARN|INFO|DEBUG]
log_level = "INFO"
log_rotate_duration = "30d"
log_rotate_max_files = 12
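Every additional minion gets nearly the same file; only the name, datacenter and the two IPs change. A templating sketch that prints such a client-only configuration (the ports block is left out, so Nomad's defaults 4646-4648 apply); the helper name and argument order are hypothetical:

```shell
# Hypothetical helper: print a client-only nomad.hcl like the one above.
# Usage: render_client_config <name> <datacenter> <advertise_ip> <server_ip> > /etc/nomad.d/nomad.hcl
render_client_config() {
  [ "$#" -eq 4 ] || { echo "usage: render_client_config name dc ip server_ip" >&2; return 2; }
  # Unquoted EOF: $1..$4 are expanded into the HCL text.
  cat <<EOF
name = "$1"
datacenter = "$2"
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"

advertise {
  http = "127.0.0.1"
  rpc  = "$3"
  serf = "$3"
}

server {
  enabled = false
}

client {
  enabled = true
  servers = ["$4:4647"]
}

log_level = "INFO"
log_rotate_duration = "30d"
log_rotate_max_files = 12
EOF
}
```

Usage: `render_client_config my_client_name my_data_center 2.3.4.5 1.2.3.4 > /etc/nomad.d/nomad.hcl`.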

Prepare the same systemd service file on the client. But before starting the service, we need to open (or, if the client sits behind a NAT router, forward) the RPC and Serf ports on our client.

Now start the service on your client and reload the Nomad service on your master. When successful you should see that the new client is connected by using the following command on your master:

nomad node status

ID        DC              Name            Class   Drain  Eligibility  Status
ehddf7ec  my_data_center  my_client_name  <none>  false  eligible     ready
5a675yye  my_data_center  my_server_name  <none>  false  eligible     ready
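For scripts, the same check can be reduced to the names of nodes that are ready. A sketch that parses the human-readable table (Name is the third column, Status the last one in the output above); this is brittle if a future Nomad version changes the layout, and the helper name is hypothetical:

```shell
# Hypothetical helper: print the names of all nodes whose Status is "ready".
# Parses the table printed by `nomad node status`, skipping the header row.
ready_nodes() {
  nomad node status | awk 'NR > 1 && $NF == "ready" { print $3 }'
}
```

Usage: `ready_nodes | grep -qx my_client_name && echo "minion joined"`.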

Encryption

Gossip Encryption

To encrypt the Serf gossip communication between the Nomad servers, create an encryption key:

nomad operator keygen
4kRkFQfcc3LU0BazP1ca+z==

And add the key to the server block of your master server:

nano /etc/nomad.d/nomad.hcl
server {
enabled = true
encrypt = "4kRkFQfcc3LU0BazP1ca+z=="

...
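A malformed key only shows up when the agent refuses to start. A sanity-check sketch, assuming GNU coreutils base64; Serf (via memberlist) accepts base64-encoded keys that decode to 16, 24 or 32 bytes. The helper name check_gossip_key is hypothetical:

```shell
# Hypothetical helper: verify that a gossip key decodes to 16, 24 or 32 bytes.
# Usage: check_gossip_key <base64-key>
check_gossip_key() {
  # Count the decoded bytes; tr strips the padding some wc implementations add.
  len=$(printf '%s' "$1" | base64 -d 2>/dev/null | wc -c | tr -d ' ')
  case "$len" in
    16|24|32) return 0 ;;
    *) echo "gossip key decodes to ${len:-0} bytes, expected 16, 24 or 32" >&2; return 1 ;;
  esac
}
```

Usage: `check_gossip_key "$(nomad operator keygen)"`. Remember to use the same key in the server block of every server.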

Mutual TLS Encryption

Nomad can optionally use mutual TLS (mTLS) for all HTTP and RPC communication. You can generate a private CA certificate and key with Cloudflare's cfssl. Download a release (v1.6.0 at the time of writing) with wget:

wget https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssl_1.6.0_linux_amd64 -O cfssl
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssljson_1.6.0_linux_amd64 -O cfssljson

chmod +x cfssljson cfssl
mv cfssl* /usr/local/bin

cfssl version
cfssljson -version

Generate the CA's private key and certificate:

mkdir /opt/nomad/nomad_certs && cd /opt/nomad/nomad_certs
cfssl print-defaults csr | cfssl gencert -initca - | cfssljson -bare nomad-ca

To create certificates for the clients and servers in the cluster, use the following configuration file as cfssl.json; it increases the default certificate expiration time:

nano cfssl.json
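The file contents are not shown here. The following is an assumption modeled on the cfssl.json in HashiCorp's Nomad TLS guide, not necessarily the author's file: a single default signing profile with a 10-year (87600h) expiry whose certificates may be used for both server and client authentication, written with a heredoc instead of nano:

```shell
# Assumed contents: one default signing profile, 87600h (10 years) expiry,
# usable for server auth and client auth (mutual TLS needs both).
cat > cfssl.json <<'EOF'
{
  "signing": {
    "default": {
      "expiry": "87600h",
      "usages": ["signing", "key encipherment", "server auth", "client auth"]
    }
  }
}
EOF
```

Run it inside /opt/nomad/nomad_certs so the gencert commands below find the file.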

Generate a certificate for the Nomad server

echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
-hostname="server.global.nomad,localhost,127.0.0.1" - | cfssljson -bare server

Generate a certificate for the Nomad client

echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
-hostname="client.global.nomad,localhost,127.0.0.1" - | cfssljson -bare client

Generate a certificate for the CLI

echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -profile=client \
- | cfssljson -bare cli

You should now have the following files:

  • cfssl.json - cfssl configuration.
  • nomad-ca.csr - CA signing request.
  • nomad-ca-key.pem - CA private key. Keep safe.
  • nomad-ca.pem - CA public certificate.
  • cli.csr - Nomad CLI certificate signing request.
  • cli-key.pem - Nomad CLI private key.
  • cli.pem - Nomad CLI certificate.
  • client.csr - Nomad client node certificate signing request for the global region.
  • client-key.pem - Nomad client node private key for the global region.
  • client.pem - Nomad client node public certificate for the global region.
  • server.csr - Nomad server node certificate signing request for the global region.
  • server-key.pem - Nomad server node private key for the global region.
  • server.pem - Nomad server node public certificate for the global region.

Each Nomad node needs the CA certificate (nomad-ca.pem) plus its own certificate and private key:

Master Server Configuration

mkdir /etc/nomad.d/certs
cp server-key.pem /etc/nomad.d/certs
cp server.pem /etc/nomad.d/certs
cp nomad-ca.pem /etc/nomad.d/certs

Minion Server Configuration

The client needs client-key.pem, client.pem and nomad-ca.pem. Create the same directory and choose your favourite way to copy the client certs over from your master server:

mkdir /etc/nomad.d/certs
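One way is scp, run on the master from /opt/nomad/nomad_certs. A sketch; the helper name is hypothetical and the minion login and SSH port are placeholders for your own values:

```shell
# Hypothetical helper: copy the client certificate, its key and the CA
# certificate from the current directory to the minion's cert directory.
# Usage: copy_client_certs <user@minion> [ssh_port]
copy_client_certs() {
  minion="$1"
  ssh_port="${2:-22}"
  # scp uses a capital -P for the port (ssh uses lowercase -p).
  scp -P "$ssh_port" nomad-ca.pem client.pem client-key.pem "${minion}:/etc/nomad.d/certs/"
}
```

Afterwards, restrict the private key on the minion, for example with `chmod 600 /etc/nomad.d/certs/client-key.pem`.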

Now add them to your Nomad configuration (I deactivate HTTP encryption because the HTTP interface is only used on localhost through the SSH tunnel):

Master Server Configuration

tls {
http = false
rpc = true

ca_file = "/etc/nomad.d/certs/nomad-ca.pem"
cert_file = "/etc/nomad.d/certs/server.pem"
key_file = "/etc/nomad.d/certs/server-key.pem"

verify_server_hostname = true
verify_https_client = true
}

Minion Server Configuration

tls {
http = false
rpc = true

ca_file = "/etc/nomad.d/certs/nomad-ca.pem"
cert_file = "/etc/nomad.d/certs/client.pem"
key_file = "/etc/nomad.d/certs/client-key.pem"

verify_server_hostname = true
verify_https_client = true
}

Verify that your servers can still see each other:

nomad node status

ID        DC              Name            Class   Drain  Eligibility  Status
ehddf7ec  my_data_center  my_client_name  <none>  false  eligible     ready
5a675yye  my_data_center  my_server_name  <none>  false  eligible     ready

Note: If you activate TLS for HTTP, you will now need to provide the CLI certificates for the command above!
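Rather than passing the certificates as flags on every call, the Nomad CLI also reads them from the environment variables NOMAD_ADDR, NOMAD_CACERT, NOMAD_CLIENT_CERT and NOMAD_CLIENT_KEY. A sketch, assuming the cli*.pem files still live in /opt/nomad/nomad_certs (the helper name is hypothetical); localhost matches the hostnames baked into the server certificate:

```shell
# Hypothetical helper: point the Nomad CLI at an HTTPS-enabled local agent
# using the CLI certificate generated earlier.
# Usage: nomad_tls_env [cert_dir]   (default: /opt/nomad/nomad_certs)
nomad_tls_env() {
  dir="${1:-/opt/nomad/nomad_certs}"
  export NOMAD_ADDR="https://localhost:4646"
  export NOMAD_CACERT="$dir/nomad-ca.pem"
  export NOMAD_CLIENT_CERT="$dir/cli.pem"
  export NOMAD_CLIENT_KEY="$dir/cli-key.pem"
}
```

Usage: `nomad_tls_env && nomad node status`.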