
Hashicorp Nomad with Consul II - The Reckoning

Shen Zhen, China

Continuation of Hashicorp Nomad Dojo

Consul Service Discovery

When two services need to communicate in a Nomad cluster, they first need to know where to find each other; this is called service discovery. Because Nomad is purely a cluster manager and scheduler, you need another piece of software to handle service discovery: Consul.
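Once a service is registered with Consul, other services can look it up through Consul's DNS interface or its HTTP API. A minimal sketch of the DNS path, assuming the Consul agent's default DNS port 8600 on the node:

# Resolve a registered service via Consul DNS - the SRV record carries the dynamic port
dig @127.0.0.1 -p 8600 frontend.service.consul SRV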

Register your Service with Consul

We can now add a service block to our frontend configuration file that tells Consul how to verify that the service is operational. This can be done by sending an HTTP request to the randomly assigned HTTP port on localhost:

job "frontend" {
datacenters = ["instaryun"]
type = "service"

group "frontend" {
count = 2

scaling {
enabled = true
min = 2
max = 3
}

network {
mode = "host"
port "http" {
to = "8080"
}
}

service {
name = "frontend"
tags = [
"frontend",
"urlprefix-/website"
]
port = "http"

check {
name = "Frontend HTTP Healthcheck"
path = "/"
type = "http"
protocol = "http"
interval = "10s"
timeout = "2s"
}
}

task "frontend" {
driver = "docker"

config {
image = "thedojoseries/frontend:latest"
ports = ["http"]
}
}
}
}

Note that I am also adding a urlprefix- tag that will be used by the load balancer later on for routing. In the example above the prefix is /website, so the web frontend will be available on the corresponding route.
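For reference, Fabio derives its routes from these tags and supports a few variations; a sketch (the host-specific form and the strip option come from Fabio's tag syntax and are not used in this setup):

# Hypothetical tag variants for the service block above
tags = [
  "urlprefix-/website",                # route the path /website on any host
  "urlprefix-mysite.com/",             # route everything for a specific host
  "urlprefix-/website strip=/website", # strip the prefix before forwarding
]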

nomad plan frontend.nomad
+ Job: "frontend"
+ Task Group: "frontend" (2 create)
+ Task: "app" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 frontend.nomad
nomad job run -check-index 0 frontend.nomad

Hashicorp Nomad Docker Deployment

After running the app we can use Consul's UI to see if the Health Check is working:

Hashicorp Nomad Docker Deployment

Or use the REST API to check if our frontend service has been registered by running the following queries on the Nomad Master server:

curl --insecure https://localhost:8501/v1/catalog/services

{
  "consul": [],
  "frontend": ["frontend", "urlprefix-/website"],
  "nomad-client": ["http"]
}
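The catalog can also be queried per service, which lists every registered instance together with its tags and address (a sketch using the same HTTPS endpoint as above):

curl --insecure https://localhost:8501/v1/catalog/service/frontend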

To get the two service ports we can check:

curl --insecure 'https://localhost:8501/v1/health/service/frontend?passing'

[
  {
    ...
    "Address": "my.minion.com",
    "TaggedAddresses": {
      "lan_ipv4": {
        "Address": "my.minion.com",
        "Port": 22568
      },
      "wan_ipv4": {
        "Address": "my.minion.com",
        "Port": 22568
      }
    },
    ...
  },
  {
    ...
    "Address": "my.minion.com",
    "TaggedAddresses": {
      "lan_ipv4": {
        "Address": "my.minion.com",
        "Port": 29222
      },
      "wan_ipv4": {
        "Address": "my.minion.com",
        "Port": 29222
      }
    },
    ...
  }
]

And here we see that the two service instances run on ports 22568 and 29222, which can be confirmed on our Minion:

docker ps
CONTAINER ID   IMAGE                           PORTS
eec03c397329   thedojoseries/frontend:latest   my.minion.com:29222->8080/tcp
bf06acaf078e   thedojoseries/frontend:latest   my.minion.com:27906->8080/tcp
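To double-check that the containers actually answer on those ports, they can be hit directly (ports taken from the docker ps output above):

curl -s -o /dev/null -w "%{http_code}\n" http://my.minion.com:29222/
curl -s -o /dev/null -w "%{http_code}\n" http://my.minion.com:27906/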

Adding the Fabio Load Balancer

Define a job called fabio in fabio.nomad:

  • Define a group called fabio.
    • Define a task called fabio.
      • Fabio should use the Docker driver.
      • The image for this container should be fabiolb/fabio.
      • Usually, Docker containers run in bridge network mode, in which the container uses a different network stack than the host. Because Fabio needs to communicate with Consul, which runs as a process on the host and not as a Docker container, you should configure Fabio to use the host network mode instead (which puts the container on the same network stack as the host).
      • Nomad should allocate 200 MHz of CPU and 128 MB of memory to this task.
      • You should allocate two static ports for Fabio: 9999 for the load balancer and 9998 for the UI.
job "fabio" {
datacenters = ["instaryun"]
type = "system"

group "fabio" {

network {
port "http" {
static = 9998
}
port "lb" {
static = 9999
}
}

task "fabio" {
driver = "docker"

config {
image = "fabiolb/fabio:latest"
network_mode = "host"
ports = ["http", "lb"]
}

resources {
cpu = 200
memory = 128
}
}
}
}
nomad plan fabio.nomad
+ Job: "fabio"
+ Task Group: "fabio" (1 create)
+ Task: "fabio" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 fabio.nomad
nomad job run -check-index 0 fabio.nomad

Now check the Fabio log to see if there are any error messages:

nomad status fabio

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
029b5c1c  005f708b  fabio       0        run      running  35s ago  31s ago
nomad alloc logs -stderr 029b5c1c
2022/06/25 09:57:44 [INFO] consul: Connecting to "localhost:8500" in datacenter "consul"
2022/06/25 09:57:44 [INFO] Admin server access mode "rw"
2022/06/25 09:57:44 [INFO] Admin server listening on ":9998"
2022/06/25 09:57:44 [INFO] Waiting for first routing table
2022/06/25 09:57:44 [INFO] consul: Using dynamic routes
2022/06/25 09:57:44 [INFO] consul: Using tag prefix "urlprefix-"
2022/06/25 09:57:44 [INFO] consul: Watching KV path "/fabio/config"
2022/06/25 09:57:44 [INFO] consul: Watching KV path "/fabio/noroute.html"
2022/06/25 09:57:44 [INFO] Config updates
+ route add frontend /login http://my.minion.com:29222/ tags "frontend"
+ route add frontend /login http://my.minion.com:22568/ tags "frontend"
2022/06/25 09:57:44 [INFO] consul: Registered fabio as "fabio"
2022/06/25 09:57:44 [INFO] consul: Registered fabio with id "fabio-mydatacenter-9998"
2022/06/25 09:57:44 [INFO] consul: Registered fabio with address "my.minion.com"

The service is running and it successfully picked up the routes from our frontend containers. We can also check the Fabio UI on Port 9998 on our Minion IP:
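Besides the UI, Fabio's admin port also exposes the routing table as JSON, which is handy for scripting (assuming the default /api/routes endpoint on the UI port 9998):

curl -s http://my.minion.com:9998/api/routes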

Fabio Load Balancer

Start Load Balancing

The Fabio UI (the port labelled http above) is available on port 9998, as defined inside the Nomad job file. The load balancing itself happens on the port we defined as lb, the default 9999. So I should now be round-robin'ed between my two frontend containers when opening my.minion.com:9999/website.
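A quick way to exercise the route from the command line is a small curl loop; with two healthy instances behind Fabio, the requests should alternate between the frontend containers:

for i in 1 2 3 4; do
  curl -s -o /dev/null -w "%{http_code}\n" http://my.minion.com:9999/website
done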

Problem 1: I got a 504 error in my browser when I tried to access the URL - the load balancer did not get a reply from the frontend container. Fabio's error log said No route for my.minion.com:9999. I had to restart Fabio for it to actually start catching the traffic.

Problem 2: After the restart the route actually led somewhere, but this particular container does not like the URL prefix /website I added. I started seeing requests against the root URL that failed: GET http://my.minion.com:9999/app.js [HTTP/1.1 404 Not Found 268ms]. So I needed to remove the prefix.

Changing the URL prefix inside frontend.nomad:

+/- Tags {
+ Tags: "urlprefix-/"
Tags: "frontend"
- Tags: "urlprefix-/website"
}

And redeploying the application:

nomad plan frontend.nomad
nomad job run -check-index 27785 frontend.nomad

Now I can cycle the Fabio deployment off/on:

Fabio Load Balancer

Checking the Fabio UI, I can see that the load balancer is now using the root / for both containers, and I am redirected to the frontend service on my.minion.com:9999/:

Fabio Load Balancer

Canary Deployment

The update block inside the Nomad job configuration allows us to run a new version of our app next to the older one, enabling us to slowly switch our cluster to the new version while keeping an eye out for potential issues in the process:

  • max_parallel: Number of allocations to update at the same time - here only one instance is updated at a time.
  • min_healthy_time: Minimum time an allocation must stay healthy before it counts as successful.
  • healthy_deadline: Maximum time to wait for an allocation to become healthy before it is marked as failed.
  • auto_revert: Whether to automatically revert to the last stable version if the deployment fails (true/false).
  • auto_promote: Whether to automatically promote the deployment once all canaries are healthy (true/false).
  • canary: Number of canary allocations to create alongside the old version when the job is updated.

I can now update the Nomad job file to use the update configuration to deploy version 2 of the frontend container thedojoseries/frontend:2.0:

job "frontend" {
datacenters = ["instaryun"]
type = "service"

group "frontend" {
count = 2

scaling {
enabled = true
min = 2
max = 3
}

update {
max_parallel = 1
min_healthy_time = "5s"
healthy_deadline = "30s"
auto_revert = false
auto_promote = false
canary = 1
}

network {
mode = "host"
port "http" {
to = "8080"
}
}

service {
name = "frontend"
tags = [
"frontend",
"urlprefix-/"
]
port = "http"

check {
name = "Frontend HTTP Healthcheck"
path = "/"
type = "http"
protocol = "http"
interval = "10s"
timeout = "2s"
}
}

task "frontend" {
driver = "docker"

config {
image = "thedojoseries/frontend:2.0"
ports = ["http"]
}
}
}
}
nomad plan frontend.nomad                                                                               
+/- Job: "frontend"
+/- Task Group: "frontend" (1 canary, 2 ignore)
+/- Update {
AutoPromote: "false"
AutoRevert: "false"
+/- Canary: "0" => "1"
HealthCheck: "checks"
+/- HealthyDeadline: "300000000000" => "30000000000"
MaxParallel: "1"
+/- MinHealthyTime: "10000000000" => "5000000000"
ProgressDeadline: "600000000000"
}
+/- Task: "app" (forces create/destroy update)
+/- Config {
+/- image: "thedojoseries/frontend:latest" => "thedojoseries/frontend:2.0"
ports[0]: "http"
}

Scheduler dry-run:
- All tasks successfully allocated.

Re-running the job now stops after deploying one canary allocation using the new frontend:

nomad job run -check-index 27823 frontend.nomad


Deployed
Task Group  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy  Progress Deadline
frontend    false     2        1         1       0        1          2022-06-25T14:40:54+02:00

I can see that the old version is still running in parallel with the canary:

fd1180eead0e   thedojoseries/frontend:2.0      "docker-entrypoint.s…"   3 minutes ago
d1446f86e3f8   thedojoseries/frontend:latest   "sh -c 'npm start'"      3 hours ago
c2eee382ffe5   thedojoseries/frontend:latest   "sh -c 'npm start'"      3 hours ago

And the Nomad UI tells me that it is waiting for me to confirm that everything worked - but the failed allocation in this list, shown under "Unhealthy", prevents me from doing so. I am not sure why this allocation failed; I assume that I waited too long and broke the 30s healthy_deadline that I set above.
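To dig into why an allocation was marked unhealthy, the job status and the allocation's events and logs can be inspected from the master (the allocation ID below is a placeholder):

nomad job status frontend
nomad alloc status 8f2c1d3e
nomad alloc logs -stderr 8f2c1d3e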

Hashicorp Nomad Canary Deployment

So I made a small change to the job file and re-ran it. Fabio picks up the third container:

Hashicorp Nomad Canary Deployment

To verify that the load balancing is working, refresh the frontend URL until you end up on the canary instance:

Hashicorp Nomad Canary Deployment

And it seems that this time I was quick enough and can promote the deployment:

Hashicorp Nomad Canary Deployment

This can also be done by running the following command on your Nomad master:

nomad job promote frontend
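The same can be done per deployment by listing the job's deployments and promoting the latest one by ID (the ID below is a placeholder):

nomad job deployments frontend
nomad deployment promote 4b8e6c2a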

Now all allocations will be updated:

0a36c884066d   thedojoseries/frontend:2.0   "docker-entrypoint.s…"
d655d75c75a8   thedojoseries/frontend:2.0   "docker-entrypoint.s…"
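To confirm that the rollout finished, the job status on the master should report the deployment as successful, and only 2.0 containers should be left on the Minion (the filter flag is a sketch):

nomad job status frontend
docker ps --filter ancestor=thedojoseries/frontend:2.0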