App Deployment with Hashicorp Nomad from Gitlab

Shenzhen, China

Deploy Applications from the Gitlab Docker Registry

I want to pull a Docker image from a private Gitlab Docker Registry and run the container on a dynamically allocated host port that is forwarded to the static port inside the container where the web frontend is served:

job "wiki_de" {
datacenters = ["instaryun"]

group "wiki_de" {
count = 1

network {
mode = "host"
port "http" {
to = "1234"
}
}

task "container" {
driver = "docker"

config {
image = "mygitlab.mydomain.com:12345/wiki/wiki_de_mdx"
ports = ["http"]

auth {
username = "mynomaduserongitlab"
password = "acomplicatedpassword"
}
}
}
}
}

This time I want to use the Nomad web frontend to plan and execute the job:

Nomad & Gitlab

After clicking on execute I find the UI a bit lacking - you do get feedback when something goes wrong, but there is no progress or error log. So let's check the CLI:

nomad job status wiki_de

Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
wiki_de 1 1 0 0 2022-06-12T11:23:40+02:00

Allocations
ID Node ID Task Group Version Desired Status Created Modified
a98b7e7d 005f708b wiki_de 0 run running 1m5s ago 1s ago
nomad alloc-status a98b7e7d

Recent Events:
Time Type Description
2022-06-12T11:14:45+02:00 Started Task started by client
2022-06-12T11:13:41+02:00 Driver Downloading image
2022-06-12T11:13:41+02:00 Task Setup Building Task Directory
2022-06-12T11:13:40+02:00 Received Task received by client

Everything seems to have worked. Checking the UI confirms that the allocation was successful:

Nomad & Gitlab

Checking the docker process tells me that the port allocation is in place as well:

docker ps
CONTAINER ID PORTS
ca5e75497442 my.minion.ip:24372->1234/tcp, my.minion.ip:24372->1234/udp

I am able to access my web frontend by running curl http://my.minion.ip:24372.

Add a Healthcheck

Job Specification / Service

The health check initially failed: even though I am running the container on my host network, I am still assigning a random port to it and forwarding it to my HTTP port inside the container:

network {
  mode = "host"
  port "http" {
    to = "1234"
  }
}

This means that Consul tries to connect to the HTTP frontend on this random port instead - which, unfortunately, leads nowhere and makes the health check fail:

Gitlab CI with Nomad

I initially left this part in because I plan to use Consul service discovery to handle routing automatically. But it seems that for now I have to add a static port to continue. This is going to cause an issue later on when trying to update the application:

Scheduler dry-run:
- WARNING: Failed to place all allocations.
Task Group "docker" (failed to place 1 allocation):
* Resources exhausted on 1 nodes
* Dimension "network: reserved port collision http=1234" exhausted on 1 nodes

So I first have to manually stop the running allocation and then plan/run the job to update the application...
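A minimal sketch of that manual workflow, assuming the updated job file is saved as wiki_en.nomad on the master:

# stop the running job (and its allocation), then re-plan and re-run the updated spec
nomad job stop wiki_en
nomad job plan wiki_en.nomad
nomad job run wiki_en.nomad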

network {
  mode = "host"
  port "http" {
    static = "1234"
  }
}
job "wiki_en" {
datacenters = ["wiki_search"]
type = "service"

group "docker" {
count = 1

network {
mode = "host"
port "http" {
static = "1234"
}
}

service {
name = "wikiEN"
port = "http"
tags = [
"frontend",
"urlprefix-/en/"
]

check {
name = "HTTP Health"
path = "/"
type = "http"
protocol = "http"
interval = "10s"
timeout = "2s"
}
}

task "wiki_en_container" {
driver = "docker"

config {
image = "mygitlab.mydomain.com:12345/wiki/wiki_de_mdx"
ports = ["http"]
network_mode = "host"

auth {
username = "mynomaduserongitlab"
password = "acomplicatedpassword"
}
}
}
}
}

This application provides a web frontend on port 1234. The health check registers a service with Consul that checks every 10s whether the web frontend is available. Once the health check fails, Nomad is triggered to fulfill the requirement of having at least one healthy instance of this app running. We now have a self-healing app! Nice!
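Once the job is running, you can also check what Consul registered from the service side (a minimal sketch, assuming Consul's HTTP API is reachable on its default port 8500 on the same host):

# list all registered services, then inspect the health check for this one
consul catalog services
curl http://localhost:8500/v1/health/checks/wikiEN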

But before I run it - let's define some update parameters for the application.

Updating Applications

Job Specification / Update

The update block below makes sure that only 1 instance of the app is running at a time. It would be better to have a count higher than one and a load-balancing service in place - but resources are limited, so I accept some potential downtime.

The update block makes sure that the application passes the health check for at least 10s before being considered healthy, and rolls the deployment back to the old version if it does not become healthy within 2min. Once the new allocation is deemed healthy, the canary deployment is promoted to stable.

To make sure that Nomad always pulls the latest Docker image - this job is only going to be run after a new image was pushed - you can add the force_pull option:

job "wiki_en" {
datacenters = ["wiki_search"]
type = "service"

group "docker" {
count = 1

network {
mode = "host"
port "http" {
static = "1234"
}
}

update {
max_parallel = 1
min_healthy_time = "10s"
healthy_deadline = "2m"
progress_deadline = "5m"
auto_revert = true
auto_promote = true
canary = 1
}

service {
name = "wikiEN"
port = "http"
tags = [
"frontend",
"urlprefix-/en/"
]

check {
name = "HTTP Health"
path = "/"
type = "http"
protocol = "http"
interval = "10s"
timeout = "2s"
}
}

task "wiki_en_container" {
driver = "docker"

config {
image = "mygitlab.mydomain.com:12345/wiki/wiki_de_mdx"
ports = ["http"]
network_mode = "host"
force_pull = true

auth {
username = "mynomaduserongitlab"
password = "acomplicatedpassword"
}
}
}
}
}

Starting the job, I can now see the canary deployment and its promotion to stable once the Consul health check succeeds:

Gitlab CI with Nomad
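The same progression can also be followed from the CLI (a sketch - the deployment ID placeholder has to be replaced with the one listed for the job; with auto_promote enabled no manual promote step should be necessary):

# list the deployments for the job, then inspect the canary deployment
nomad job deployments wiki_en
nomad deployment status <deployment-id>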

Adding a Loadbalancer / App Ingress

see Part II

Use Git to Download Artifacts

Preparation

Create your SSH Key

First we need to create an SSH key pair on the Nomad minion node:

ssh-keygen -t rsa -b 4096 -f /etc/nomad.d/.ssh/id_rsa
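The key will be used non-interactively, so it should not be protected by a passphrase - if you want to be explicit about that (my addition, the original command will simply prompt for it), pass an empty passphrase:

ssh-keygen -t rsa -b 4096 -N "" -f /etc/nomad.d/.ssh/id_rsa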

I realized afterwards that the Nomad process is executed by the root user on each minion - only the master node uses the nomad user. This means the key could also be placed in root's home directory. But I am going to add a default SSH config parameter that makes sure this key is used no matter where it is placed - and having everything neatly collected inside the nomad.d directory is maybe not a bad idea. It could even be source-controlled and used when provisioning new Nomad minions.

This creates the RSA private and public key. Place them inside your Nomad user's home directory - if you created the user earlier with useradd --system --home /etc/nomad.d --shell /bin/false nomad, this directory will be /etc/nomad.d:

id_rsa

-----BEGIN OPENSSH PRIVATE KEY-----
bYWQ=fDSAFe4 ... 5sdgfdDFSfszgf
-----END OPENSSH PRIVATE KEY-----

id_rsa.pub

ssh-rsa AA ... 7+lU= myuser@Nomad

Make sure those files can be used by the nomad user (chown nomad:nomad /etc/nomad.d/.ssh/*) and restrict the permissions on the private key:

chmod 400 /etc/nomad.d/.ssh/id_rsa

ls -la /etc/nomad.d/.ssh

total 16
drwxr-xr-x 2 root root 4096 Jun 8 12:18 .
drwxr-xr-x 4 nomad nomad 4096 Jun 8 12:18 ..
-r-------- 7 nomad nomad 3.2K Jun 13 07:23 id_rsa
-rw-r--r-- 7 nomad nomad 744 Jun 13 07:23 id_rsa.pub

Make sure that the Nomad user's known hosts file is populated:

ssh-keyscan my.gitlab.address.com | sudo tee -a /etc/nomad.d/.ssh/known_hosts
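Note that this writes the host key to a non-default location. If SSH does not pick it up on its own, one option (an addition on my part, not in the original configuration) is to reference the file explicitly in the Host block created in the next step:

Host my.gitlab.address.com
  UserKnownHostsFile /etc/nomad.d/.ssh/known_hosts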

Make sure that SSH uses the correct key when connecting to your Gitlab server by adding the following configuration:

nano /etc/ssh/ssh_config

Host my.gitlab.address.com
  PreferredAuthentications publickey
  IdentityFile /etc/nomad.d/.ssh/id_rsa

Once the public key has been added to Gitlab (next step), the connection can be verified:

ssh -T git@my.gitlab.address.com
Welcome to GitLab, @nomaduser!

Configuring Gitlab

Create a Nomad user in Gitlab and add the public key:

Gitlab CI with Nomad

Test the Connection

runuser -u nomad -- mkdir /etc/nomad.d/test
cd /etc/nomad.d/test
runuser -u nomad -- git clone git@my.gitlab.address.com:group/repo.git

The repository should be downloaded into your test directory without having to type in a password - then you are good to go!
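The test checkout is not needed afterwards and can be removed again:

rm -rf /etc/nomad.d/test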

Create a Nomad Job

For testing I am going to download some HTML code from a private Gitlab repository and serve those files with httpster, a small command-line Node.js web server:

httpster -p 8080 -d /home/somedir/public_html
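httpster is distributed as an npm package, so it has to be available on the minion for the exec driver to find it - installing it globally is the simplest option (assuming Node.js and npm are already installed there):

npm install -g httpster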

So now we can plan and run our Nomad job from the Nomad UI:

job "web_front" {
datacenters = ["kundensysteme"]

group "web" {

task "httpster" {
driver = "exec"

config {
command = "httpster"
args = ["-p", "8080", "-d", "${NOMAD_TASK_DIR}/html"]
}

artifact {
source = "git::git@my.gitlab.address.com/group/repo.git"
destination = "${NOMAD_TASK_DIR}/html"
options {
sshkey = "${base64encode(file(pathexpand("~/.ssh/id_rsa")))}"
depth = 1
}
}

resources {
cpu = 128
memory = 128
}
}
}
}

But it seems that you cannot access the server's filesystem when using the Nomad web UI:

Parse Error:
input.hcl:18,41-52: Error in function call; Call to function "pathexpand" failed: filesystem function disabled.
input.hcl:18,20-72: Unsuitable value type; Unsuitable value: value must be known

Gitlab CI with Nomad
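If you do want to submit the job through the web UI, a possible workaround (untested here) is to base64-encode the key yourself and paste the resulting string as a literal value for the sshkey option instead of calling the filesystem functions:

base64 -w0 /etc/nomad.d/.ssh/id_rsa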

So let's create this job file on our Nomad master and execute it using the Nomad CLI:

nomad plan test_artifacts.nomad
+/- Job: "web_front"
+/- Stop: "true" => "false"
+/- Task Group: "web" (1 create)
  +/- Task: "httpster" (forces create/destroy update)
    +/- Artifact {
          GetterMode:            "any"
          GetterOptions[sshkey]: "ADgfdt...tf325sd"
          GetterSource:          "git::ssh://git@my.gitlab.address.com/group/repo.git"
          RelativeDest:          "local/html"
        }

Scheduler dry-run:
- All tasks successfully allocated.

To submit the job with version verification run:

nomad job run -check-index 8150 test_artifacts.nomad
nomad job run -check-index 8150 test_artifacts.nomad

==> 2022-06-13T08:24:57+02:00: Monitoring evaluation "19cbcc30"
2022-06-13T08:24:57+02:00: Evaluation triggered by job "web_front"
==> 2022-06-13T08:24:58+02:00: Monitoring evaluation "19cbcc30"
2022-06-13T08:24:58+02:00: Evaluation within deployment: "cb210d05"
2022-06-13T08:24:58+02:00: Allocation "ab8494c7" created: node "005f708b", group "web"
2022-06-13T08:24:58+02:00: Evaluation status changed: "pending" -> "complete"
==> 2022-06-13T08:24:58+02:00: Evaluation "19cbcc30" finished with status "complete"
==> 2022-06-13T08:24:58+02:00: Monitoring deployment "cb210d05"
✓ Deployment "cb210d05" successful

2022-06-13T08:25:14+02:00
ID = cb210d05
Job ID = web_front
Job Version = 9
Status = successful
Description = Deployment completed successfully

Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
web 1 1 1 0 2022-06-13T08:35:12+02:00
nomad job status web_front

Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
web 1 1 0 0 2022-06-13T07:15:19+02:00

Allocations
ID Node ID Task Group Version Desired Status Created Modified
11526379 005f708b web 1 run pending 7s ago 4s ago
nomad alloc-status a6323ccc

Task "httpster" is "running"
Task Resources
CPU Memory Disk Addresses
0/128 MHz 13 MiB/128 MiB 300 MiB

Recent Events:
Time Type Description
2022-06-13T08:25:02+02:00 Started Task started by client
2022-06-13T08:25:01+02:00 Downloading Artifacts Client is downloading artifacts
2022-06-13T08:24:58+02:00 Task Setup Building Task Directory
2022-06-13T08:24:58+02:00 Received Task received by client

I can verify that the web server is running:

netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp6 0 0 :::8080 :::* LISTEN 7002/node
curl localhost:8080
<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>

...

It works!