App Deployment with Hashicorp Nomad from Gitlab

Shen Zhen, China

Deploy Applications from the Gitlab Docker Registry

I want to pull a Docker image from a private Gitlab Docker Registry and run the container with a dynamic host port that is forwarded to a static port inside the container, where the application serves a web frontend:

job "wiki_de" {
  datacenters = ["instaryun"]

  group "wiki_de" {
    count = 1

    network {
      mode = "host"
      # Nomad picks a dynamic host port and forwards it to port 1234 inside the container
      port "http" {
        to = 1234
      }
    }

    task "container" {
      driver = "docker"

      config {
        image = "mygitlab.mydomain.com:12345/wiki/wiki_de_mdx"
        ports = ["http"]

        # Credentials for the private Gitlab Docker Registry
        auth {
          username = "mynomaduserongitlab"
          password = "acomplicatedpassword"
        }
      }
    }
  }
}

This time I want to use the Nomad web frontend to plan and execute the job:

Nomad & Gitlab

After clicking on execute I find the UI a bit lacking: you get feedback if something went wrong, but there is no progress or error log. So let's check the CLI:

nomad job status wiki_de

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
wiki_de     1        1       0        0          2022-06-12T11:23:40+02:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created   Modified
a98b7e7d  005f708b  wiki_de     0        run      running  1m5s ago  1s ago

nomad alloc-status a98b7e7d

Recent Events:
Time                       Type        Description
2022-06-12T11:14:45+02:00  Started     Task started by client
2022-06-12T11:13:41+02:00  Driver      Downloading image
2022-06-12T11:13:41+02:00  Task Setup  Building Task Directory
2022-06-12T11:13:40+02:00  Received    Task received by client

Everything seems to have worked. Checking the UI confirms that the allocation was successful:

Nomad & Gitlab

Checking the docker process tells me that the port allocation is in place as well:

docker ps
CONTAINER ID        PORTS
ca5e75497442        my.minion.ip:24372->1234/tcp, my.minion.ip:24372->1234/udp

I am able to access my web frontend by running curl http://my.minion.ip:24372.

Add a Healthcheck

Job Specification / Service

The health check initially failed: even though I am running the container on my host network, I am still assigning a random port to it and forwarding it to my HTTP port inside the container:

network {
  mode = "host"
  port "http" {
    to = 1234
  }
}

This means that Consul is trying to connect to the HTTP frontend on this random port instead. Unfortunately, nothing answers there, which makes the health check fail:

Gitlab CI with Nomad

I initially left this part in because I plan to use the Consul service discovery to handle routing automatically. But it seems that, for now, I have to add a static port to continue. This is going to cause an issue later on when trying to update the application, because the static port is still reserved by the running allocation:

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "docker" (failed to place 1 allocation):
    * Resources exhausted on 1 nodes
    * Dimension "network: reserved port collision http=1234" exhausted on 1 nodes

So I first have to manually stop the running allocation and then plan/run the job to update the application...

network {
    mode = "host"
    port "http" {
        static = "1234"
    }
}
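
An alternative to the static port, sketched here under the assumption that the image does not need Docker's host networking: with network_mode = "host" left out of the task config, Docker publishes the dynamic host port, and the service (and with it the Consul check) uses that same port. This is untested against the setup described here; the job name wiki_en_dynamic is hypothetical and the auth block is omitted for brevity.

```hcl
job "wiki_en_dynamic" { # hypothetical job name
  datacenters = ["wiki_search"]

  group "docker" {
    network {
      port "http" {
        to = 1234 # container port; Nomad assigns the host port dynamically
      }
    }

    service {
      name = "wikiEN"
      port = "http" # registered in Consul with the dynamic host port

      check {
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "wiki_en_container" {
      driver = "docker"

      config {
        image = "mygitlab.mydomain.com:12345/wiki/wiki_de_mdx"
        ports = ["http"]
        # Assumption: no network_mode = "host" here, so the port mapping is applied.
        # The auth block for the private registry is omitted for brevity.
      }
    }
  }
}
```

Since no host port is reserved in this variant, a new allocation would not collide with the running one during an update.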

job "wiki_en" {
  datacenters = ["wiki_search"]
  type        = "service"

  group "docker" {
    count = 1

    network {
      mode = "host"
      # Reserve host port 1234 instead of a dynamic port
      port "http" {
        static = 1234
      }
    }

    # Register the app in Consul together with an HTTP health check
    service {
      name = "wikiEN"
      port = "http"
      tags = [
        "frontend",
        "urlprefix-/en/"
      ]

      check {
        name     = "HTTP Health"
        path     = "/"
        type     = "http"
        protocol = "http"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "wiki_en_container" {
      driver = "docker"

      config {
        image        = "mygitlab.mydomain.com:12345/wiki/wiki_de_mdx"
        ports        = ["http"]
        network_mode = "host"

        auth {
          username = "mynomaduserongitlab"
          password = "acomplicatedpassword"
        }
      }
    }
  }
}

This application provides a web frontend on port 1234. The service block now registers the application with Consul, which checks every 10s whether the web frontend is available. A failing check marks the instance as unhealthy, so Nomad can act on the requirement of having at least one healthy instance of this app running. We now have a self-healing app! Nice!
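
To have failing checks actually restart the task outside of a deployment, Nomad offers the check_restart block inside a check. A minimal sketch, assuming the service block from the job above; the limit and grace values are illustrative assumptions and not taken from this setup:

```hcl
service {
  name = "wikiEN"
  port = "http"

  check {
    name     = "HTTP Health"
    path     = "/"
    type     = "http"
    protocol = "http"
    interval = "10s"
    timeout  = "2s"

    # Assumption: restart the task after 3 consecutive failed checks,
    # and ignore failing checks during the first 30s after a (re)start.
    check_restart {
      limit = 3
      grace = "30s"
    }
  }
}
```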

But before I run it, let's define some update parameters for the application.

Updating Applications

Job Specification / Update

The update block below makes sure that only one instance of the app is updated at a time. It would be better to have count higher than one and a load-balancing service in place, but there are space constraints. So I accept some potential downtime.

The update block requires the application to pass the health check for at least 10s (min_healthy_time) before it counts as healthy, and rolls the application back to the old version (auto_revert) if it does not become healthy within 2min (healthy_deadline). Once the application is deemed healthy, the canary deployment is automatically promoted to stable (auto_promote).

To make sure that Nomad always pulls the latest Docker image (this job is only going to be used after a new image was committed), you can add the force_pull option:

job "wiki_en" {
  datacenters = ["wiki_search"]
  type        = "service"

  group "docker" {
    count = 1

    network {
      mode = "host"
      port "http" {
        static = 1234
      }
    }

    update {
      max_parallel      = 1     # update one allocation at a time
      min_healthy_time  = "10s" # checks must pass this long before the allocation counts as healthy
      healthy_deadline  = "2m"  # the allocation must become healthy within this time
      progress_deadline = "5m"  # the deployment fails if no allocation becomes healthy in time
      auto_revert       = true  # roll back to the last stable version on failure
      auto_promote      = true  # promote the canary once it is healthy
      canary            = 1
    }

    service {
      name = "wikiEN"
      port = "http"
      tags = [
        "frontend",
        "urlprefix-/en/"
      ]

      check {
        name     = "HTTP Health"
        path     = "/"
        type     = "http"
        protocol = "http"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "wiki_en_container" {
      driver = "docker"

      config {
        image        = "mygitlab.mydomain.com:12345/wiki/wiki_de_mdx"
        ports        = ["http"]
        network_mode = "host"
        force_pull   = true # always pull the image, even if a local copy exists

        auth {
          username = "mynomaduserongitlab"
          password = "acomplicatedpassword"
        }
      }
    }
  }
}

Starting the job I can now see the canary deployment and job promotion once the Consul health check is successful:

Gitlab CI with Nomad

Adding a Loadbalancer / App Ingress

see Part II

Use Git to Download Artifacts

Preparation

Create your SSH Key

First we need to create the private and public SSH key on the Nomad Minion node:

ssh-keygen -t rsa -b 4096 -f /etc/nomad.d/.ssh/id_rsa

I realized afterwards that the Nomad process is executed by the root user on each minion. Only the master node uses the nomad user. This means this key could be placed in the root home dir. But I am going to add a default SSH config parameter that will make sure that this key is used - no matter where it is placed. And having everything neatly placed inside the nomad.d dir is maybe not a bad idea. This might even be source controlled and used in provisioning new instances of each Nomad Minion.

This will create the private and the public RSA key. Place them inside your Nomad user's home directory. If you created the user with useradd --system --home /etc/nomad.d --shell /bin/false nomad, this directory is /etc/nomad.d:

id_rsa

-----BEGIN OPENSSH PRIVATE KEY-----
bYWQ=fDSAFe4 ... 5sdgfdDFSfszgf
-----END OPENSSH PRIVATE KEY-----

id_rsa.pub

ssh-rsa AA ... 7+lU= myuser@Nomad

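Make sure those files can be used by the Nomad user (chown nomad:nomad /etc/nomad.d/.ssh/*; note that a plain /etc/nomad.d/* does not match the hidden .ssh directory) and that the private key is only readable by its owner: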

chmod 400 /etc/nomad.d/.ssh/id_rsa

ls -la /etc/nomad.d/.ssh

total 16
drwxr-xr-x 2 root  root  4096 Jun  8 12:18 .
drwxr-xr-x 4 nomad nomad 4096 Jun  8 12:18 ..
-r-------- 7 nomad nomad 3.2K Jun 13 07:23 id_rsa
-rw-r--r-- 7 nomad nomad  744 Jun 13 07:23 id_rsa.pub

Make sure that the Nomad user's known hosts file is populated:

ssh-keyscan my.gitlab.address.com | sudo tee -a /etc/nomad.d/.ssh/known_hosts

Make sure that SSH uses the correct private key when connecting to your Gitlab server by adding the following configuration:

nano /etc/ssh/ssh_config

Host my.gitlab.address.com
   PreferredAuthentications publickey
   IdentityFile /etc/nomad.d/.ssh/id_rsa

Then test the connection:

ssh -T git@my.gitlab.address.com
Welcome to GitLab, @nomaduser!

Configuring Gitlab

Create a Nomad User in Gitlab and add the Public key:

Gitlab CI with Nomad

Test the Connection

runuser -u nomad -- mkdir /etc/nomad.d/test
cd /etc/nomad.d/test
runuser -u nomad -- git clone git@my.gitlab.address.com:group/repo.git

If the repository is downloaded into your test directory without asking for a password, you are good to go!

Create a Nomad Job

For testing I am going to download some HTML code from a private Gitlab repository and run a small command-line Node.js web server called httpster to serve those files:

httpster -p 8080 -d /home/somedir/public_html

So now we can plan and run our Nomad job from the Nomad UI:

job "web_front" {
  datacenters = ["kundensysteme"]

  group "web" {

    task "httpster" {
      driver = "exec"

      config {
        command = "httpster"
        args = ["-p", "8080", "-d", "${NOMAD_TASK_DIR}/html"]
      }

      artifact {
        source      = "git::ssh://git@my.gitlab.address.com/group/repo.git"
        destination = "${NOMAD_TASK_DIR}/html"
        options {
          sshkey = "${base64encode(file(pathexpand("~/.ssh/id_rsa")))}"
          depth = 1
        }
      }

      resources {
        cpu    = 128
        memory = 128
      }
    }
  }
}

But it seems that you cannot access the server's filesystem when using the Nomad web UI:

Parse Error: input.hcl:18,41-52: Error in function call; Call to function "pathexpand" failed: filesystem function disabled. input.hcl:18,20-72: Unsuitable value type; Unsuitable value: value must be known

Gitlab CI with Nomad

So let's create this job file on our Nomad master and execute it using the Nomad CLI:

nomad plan test_artifacts.nomad
+/- Job: "web_front"
+/- Stop: "true" => "false"
+/- Task Group: "web" (1 create)
  +/- Task: "httpster" (forces create/destroy update)
    +/- Artifact {
          GetterMode:            "any"
          GetterOptions[sshkey]: "ADgfdt...tf325sd"
          GetterSource:          "git::ssh://git@my.gitlab.address.com/group/repo.git"
          RelativeDest:          "local/html"
        }

Scheduler dry-run:
- All tasks successfully allocated.

To submit the job with version verification run:

nomad job run -check-index 8150 test_artifacts.nomad

Running the suggested command:

nomad job run -check-index 8150 test_artifacts.nomad

==> 2022-06-13T08:24:57+02:00: Monitoring evaluation "19cbcc30"
    2022-06-13T08:24:57+02:00: Evaluation triggered by job "web_front"
==> 2022-06-13T08:24:58+02:00: Monitoring evaluation "19cbcc30"
    2022-06-13T08:24:58+02:00: Evaluation within deployment: "cb210d05"
    2022-06-13T08:24:58+02:00: Allocation "ab8494c7" created: node "005f708b", group "web"
    2022-06-13T08:24:58+02:00: Evaluation status changed: "pending" -> "complete"
==> 2022-06-13T08:24:58+02:00: Evaluation "19cbcc30" finished with status "complete"
==> 2022-06-13T08:24:58+02:00: Monitoring deployment "cb210d05"
  ✓ Deployment "cb210d05" successful
    
    2022-06-13T08:25:14+02:00
    ID          = cb210d05
    Job ID      = web_front
    Job Version = 9
    Status      = successful
    Description = Deployment completed successfully
    
    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    web         1        1       1        0          2022-06-13T08:35:12+02:00

nomad job status web_front

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
web         1        1       0        0          2022-06-13T07:15:19+02:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
11526379  005f708b  web         1        run      pending   7s ago      4s ago

nomad alloc-status a6323ccc

Task "httpster" is "running"
Task Resources
CPU        Memory          Disk     Addresses
0/128 MHz  13 MiB/128 MiB  300 MiB  

Recent Events:
Time                       Type                   Description
2022-06-13T08:25:02+02:00  Started                Task started by client
2022-06-13T08:25:01+02:00  Downloading Artifacts  Client is downloading artifacts
2022-06-13T08:24:58+02:00  Task Setup             Building Task Directory
2022-06-13T08:24:58+02:00  Received               Task received by client

I can verify that the web server is running:

netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp6       0      0 :::8080                 :::*                    LISTEN      7002/node

curl localhost:8080
<!DOCTYPE html>
<html>
  <head>
    <meta charset='utf-8'>

    ...

It works!