Zabbix v6 SMART HDD and CPU Temperature Check

Install Smartmontools and LM Sensors
apt install lm-sensors smartmontools
Harddrive Monitoring
S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard disk drives (HDDs), solid-state drives (SSDs), and eMMC drives.
The smartmontools package comes with two utilities: smartctl, which checks your hard drives from the command line, and smartd, a daemon that checks your hard disks at a specified interval, logs warnings and errors to the syslog, and can also send them to a specified email address (usually the admin of the system).
smartctl -V
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-11-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
Using Smartctl
Harddrives
Find partition:
df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              6.3G  2.3M  6.3G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv   98G   36G   58G  38% /
tmpfs                               32G   36M   32G   1% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
/dev/md126p2                       2.0G  428M  1.4G  24% /boot
/dev/md126p1                       1.1G  6.1M  1.1G   1% /boot/efi
smartctl  --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
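To run a check against every drive that smartctl finds, the device paths can be pulled out of the scan output. This is a minimal sketch that assumes the output format shown above; the function name list_smart_devices is made up for illustration:

```shell
#!/bin/sh
# Print only the device path (first column) of each `smartctl --scan` line.
list_smart_devices() {
    awk '$1 ~ /^\/dev\// { print $1 }'
}

# Demo with the scan output shown above. On a real host use:
#   smartctl --scan | list_smart_devices
printf '%s\n' \
    '/dev/sda -d scsi # /dev/sda, SCSI device' \
    '/dev/sdb -d scsi # /dev/sdb, SCSI device' | list_smart_devices
```

On a real host the result can feed a loop such as `for dev in $(smartctl --scan | list_smart_devices); do smartctl -H "$dev"; done` to print the overall health verdict of each drive.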
Note that a virtual machine usually has no access to the underlying HDD hardware, so smartctl has nothing useful to report there. On a physical host, query the whole drive (/dev/sda) rather than a partition such as /dev/sda1:
smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-97-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD10EFRX-68FYTN0
Serial Number:    WD-WCC4J4NHYJJ2
LU WWN Device Id: 5 0014ee 269c5648a
Firmware Version: 82.00A82
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Mar  9 09:21:06 2024 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (14100) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 160) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   169   133   021    Pre-fail  Always       -       2516
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       63
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       18640
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       62
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       59
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       5458
194 Temperature_Celsius     0x0022   121   094   000    Old_age   Always       -       22
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     18594         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
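The attribute table is the part of this output that is most useful to watch. smartctl treats an attribute as failed once its normalized VALUE drops to or below THRESH, so a quick filter over `smartctl -A` can list the offenders. This is a sketch only: the function name failed_attributes is hypothetical and the third sample row is invented to show a failing case.

```shell
#!/bin/sh
# Print the names of attributes whose normalized VALUE is at or below THRESH.
# Reads the attribute table of `smartctl -A /dev/sdX` on stdin.
# Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
failed_attributes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && ($6 + 0) > 0 && ($4 + 0) <= ($6 + 0) { print $2 }'
}

# Two rows from the table above plus one made-up failing row (VALUE 100 <= THRESH 140).
printf '%s\n' \
    '  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0' \
    '194 Temperature_Celsius     0x0022   121   094   000    Old_age   Always       -       22' \
    '  5 Reallocated_Sector_Ct   0x0033   100   100   140    Pre-fail  Always   FAILING_NOW 2000' |
    failed_attributes
```

Attributes with a threshold of 000 are skipped because they can never trip. On a real drive the call would be `smartctl -A /dev/sda | failed_attributes`, which prints nothing while the disk is healthy.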
NVMe Drives
Additional NVMe drive:
df -h
Filesystem                            Size  Used Avail Use% Mounted on
tmpfs                                  32G  2.9M   32G   1% /run
/dev/mapper/ubuntu--vg--1-ubuntu--lv  438G   81G  338G  20% /
tmpfs                                  63G  914M   62G   2% /dev/shm
tmpfs                                 5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p2                        2.0G  304M  1.5G  17% /boot
/dev/nvme0n1p1                        1.1G  6.1M  1.1G   1% /boot/efi
tmpfs                                  13G  4.0K   13G   1% /run/user/1000
smartctl  --scan
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
/dev/nvme1 -d nvme # /dev/nvme1, NVMe device
smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-91-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number:                       SanDisk Extreme Pro 500GB
Serial Number:                      212181449612
Firmware Version:                   111130WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 500,107,862,016 [500 GB]
Unallocated NVM Capacity:           0
Controller ID:                      8215
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          500,107,862,016 [500 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 4a4944db11
Local Time is:                      Sat Mar  9 09:45:02 2024 CET
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     88 Celsius
Namespace 1 Features (0x02):        NA_Fields
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.50W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0       0
 2 +     3.00W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3     4000   10000
 4 -   0.0035W       -        -    4  4  4  4     4000   40000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        51 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    3%
Data Units Read:                    7,580,500 [3.88 TB]
Data Units Written:                 37,759,770 [19.3 TB]
Host Read Commands:                 38,034,128
Host Write Commands:                1,870,013,477
Controller Busy Time:               279
Power Cycles:                       25
Power On Hours:                     13,542
Unsafe Shutdowns:                   17
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
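NVMe drives report their health as "Name: value" pairs instead of an attribute table, so one small helper can read any of the decimal fields. This is a sketch under the assumption that the field names match the output above; nvme_number is a made-up name.

```shell
#!/bin/sh
# Print the first number of the NVMe health field named "$1".
# Reads `smartctl -a /dev/nvmeX` output on stdin. Thousands separators and
# suffixes such as "%" are stripped, so this is meant for decimal fields only.
nvme_number() {
    awk -F':' -v key="$1" '
        $1 == key {
            split($2, parts, " ")          # "   13,542" -> parts[1] = "13,542"
            gsub(/[^0-9]/, "", parts[1])   # "13,542"    -> "13542"
            print parts[1]
            exit
        }'
}

# Demo with three lines copied from the output above.
sample='Temperature:                        51 Celsius
Percentage Used:                    3%
Power On Hours:                     13,542'

printf '%s\n' "$sample" | nvme_number 'Temperature'
printf '%s\n' "$sample" | nvme_number 'Percentage Used'
printf '%s\n' "$sample" | nvme_number 'Power On Hours'
```

Because the field name has to match exactly, `Temperature` does not pick up the "Warning  Comp. Temperature Time" line. On a real host the input would come from `smartctl -a /dev/nvme0`.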
If the output shows SMART support is: Disabled, run the following command to enable it:
smartctl -s on -a /dev/sda
CPU Temperature
sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +50.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +50.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +42.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +46.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +48.0°C  (high = +100.0°C, crit = +100.0°C)
Core 4:        +44.0°C  (high = +100.0°C, crit = +100.0°C)
Core 5:        +49.0°C  (high = +100.0°C, crit = +100.0°C)
Core 6:        +44.0°C  (high = +100.0°C, crit = +100.0°C)
Core 7:        +47.0°C  (high = +100.0°C, crit = +100.0°C)
nvme-pci-0900
Adapter: PCI adapter
Composite:    +48.9°C  (low  =  -5.2°C, high = +83.8°C)
                       (crit = +87.8°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1:        +16.8°C  (crit = +20.8°C)
temp2:        +27.8°C  (crit = +105.0°C)
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:            N/A  
enp7s0-pci-0700
Adapter: PCI adapter
PHY Temperature:  +47.9°C  
MAC Temperature:  +48.5°C  
nvme-pci-0300
Adapter: PCI adapter
Composite:    +50.9°C  (low  =  -5.2°C, high = +83.8°C)
                       (crit = +87.8°C)
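For a single "how hot is the CPU" value, the per-core lines can be reduced to their maximum. This is a rough sketch that assumes the "Core N:" line format of the coretemp driver shown above; max_core_temp is a hypothetical name.

```shell
#!/bin/sh
# Print the highest "Core N:" temperature found in `sensors` output on stdin.
max_core_temp() {
    awk '
        /^Core [0-9]+:/ {
            t = $3                       # e.g. "+50.0°C"
            gsub(/[^0-9.-]/, "", t)      # keep digits, dot and minus sign
            if (!seen || t + 0 > max) { max = t + 0; seen = 1 }
        }
        END { if (seen) printf "%.1f\n", max }'
}

# Demo with a few lines from the output above.
printf '%s\n' \
    'Package id 0:  +50.0°C  (high = +100.0°C, crit = +100.0°C)' \
    'Core 0:        +50.0°C  (high = +100.0°C, crit = +100.0°C)' \
    'Core 1:        +42.0°C  (high = +100.0°C, crit = +100.0°C)' \
    'Core 2:        +46.0°C  (high = +100.0°C, crit = +100.0°C)' | max_core_temp
```

On the host itself this becomes `sensors | max_core_temp`. Other chips (AMD k10temp, for example) label their lines differently, so the pattern would need adjusting there.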
Zabbix
Preparing Zabbix-Agent2
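Allow the zabbix user to run smartctl and sensors through sudo without a password by adding the lines below to the sudoers file (visudo is the safer editor here, since it checks the syntax before saving):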
nano /etc/sudoers
# Zabbix user SMART control
zabbix ALL=(ALL) NOPASSWD:/usr/sbin/smartctl
# Zabbix user LM Sensors
zabbix ALL=NOPASSWD:/bin/sensors
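An alternative to editing /etc/sudoers directly is a separate drop-in file that is syntax-checked before it is installed. The sketch below only prints the fragment; the file name /etc/sudoers.d/zabbix is an assumption, and it relies on the distribution including /etc/sudoers.d (Debian and Ubuntu do by default).

```shell
#!/bin/sh
# Print a sudoers fragment with the same two rules as above.
zabbix_sudoers() {
    printf '%s\n' \
        '# Zabbix user SMART control' \
        'zabbix ALL=(ALL) NOPASSWD:/usr/sbin/smartctl' \
        '# Zabbix user LM Sensors' \
        'zabbix ALL=NOPASSWD:/bin/sensors'
}

zabbix_sudoers

# To install it as root (assumed target file name):
#   zabbix_sudoers > /tmp/zabbix.sudoers
#   visudo -cf /tmp/zabbix.sudoers && install -m 0440 /tmp/zabbix.sudoers /etc/sudoers.d/zabbix
```

The `visudo -cf` step refuses a file with syntax errors, which avoids locking yourself out of sudo with a typo.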
Let's start by allowing the Zabbix server to execute ANY script (parental supervision advised):
nano /etc/zabbix/zabbix_agent2.conf
### Option: AllowKey
#       Allow execution of item keys matching pattern.
#       Multiple keys matching rules may be defined in combination with DenyKey.
#       Key pattern is wildcard expression, which support "*" character to match any number of any characters in ce>
#       Parameters are processed one by one according their appearance order.
#       If no AllowKey or DenyKey rules defined, all keys are allowed.
#
# Mandatory: no
AllowKey=system.run[*]
Direct CLI Command Execution
Now prepare a few sensors/smartctl commands to extract single values of interest:
smartctl -a /dev/sda | awk '/Temperature_Celsius/ {print $10}'
23
smartctl -a /dev/nvme0 | awk '/^Temperature:/ {print $2}'
51
sudo sensors | awk -F'[+.]' '/Core 0/ {print $2}'
30
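The same pattern works for any row of the ATA attribute table, not just the temperature. Here is a small sketch with the attribute name as a parameter; smart_attr_raw is a made-up helper name, and it assumes the ten-column table layout shown earlier.

```shell
#!/bin/sh
# Print the RAW_VALUE (column 10) of the attribute named "$1".
# Reads `smartctl -A /dev/sdX` output on stdin.
smart_attr_raw() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Demo with two rows copied from the attribute table earlier in this post.
rows='  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       18640
194 Temperature_Celsius     0x0022   121   094   000    Old_age   Always       -       22'

printf '%s\n' "$rows" | smart_attr_raw Power_On_Hours
printf '%s\n' "$rows" | smart_attr_raw Temperature_Celsius
```

Matching the name exactly in column 2 avoids accidental hits on other lines that merely contain the word, which a plain grep would also return.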
We can add these scripts to our Zabbix Server Scripts config:


As a manual script item we can now execute those scripts directly from our global dashboard:

If you run into the error message Cannot execute script. Unknown metric system.run, the agent is rejecting the key: you either skipped the AllowKey step above or forgot to restart the Zabbix Agent service:

If everything is set up right your server should now be able to retrieve the Temperature value from your host system:

Working with Shell Scripts
To get rid of the nasty wildcard execution permission we can now swap the direct commands for shell scripts. Just put each CLI command you want to execute into its own shell file in a directory accessible to the Zabbix Agent:
/opt/zabbix/temp_sda.sh
#!/bin/bash
# Raw value (column 10) of the Temperature_Celsius attribute
sudo smartctl -a /dev/sda | awk '/Temperature_Celsius/ {print $10}'
/opt/zabbix/temp_nvme0.sh
#!/bin/bash
# smartctl needs root, so it goes through sudo like the SATA script
sudo smartctl -a /dev/nvme0 | awk '/^Temperature:/ {print $2}'
/opt/zabbix/temp_core0.sh
#!/bin/bash
# Integer part of the Core 0 temperature
sudo sensors | awk -F'[+.]' '/Core 0/ {print $2}'
and so on...
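Instead of one nearly identical file per core, a single script could take the core number as an argument. This is a sketch only: the file name /opt/zabbix/temp_core.sh and the function core_temp are assumptions, not part of the setup above.

```shell
#!/bin/sh
# Sketch of a single /opt/zabbix/temp_core.sh (hypothetical file name).
# core_temp N reads `sensors` output on stdin and prints the integer
# temperature of the first line labelled "Core N:".
core_temp() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: core_temp <core number>" >&2; return 2 ;;
    esac
    awk -v label="Core $1:" '
        index($0, label) == 1 {
            t = $3                       # e.g. "+50.0°C"
            gsub(/[^0-9.]/, "", t)       # -> "50.0"
            printf "%d\n", t             # -> 50
            exit
        }'
}

# Only query the hardware when called with an argument and sensors is installed.
if [ "$#" -ge 1 ] && command -v sensors >/dev/null 2>&1; then
    sensors | core_temp "$1"
fi
```

Matching the full label including the colon means core 1 is not confused with cores 10 to 19 on larger CPUs. If you go this route, it is still worth listing each call explicitly in AllowKey (for example `system.run[sh /opt/zabbix/temp_core.sh 0]`) rather than allowing a wildcard argument.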
Now replace the wildcard with the explicit script file calls to exclude any script not specifically defined by you:
nano /etc/zabbix/zabbix_agent2.conf
### Option: AllowKey
#       Allow execution of item keys matching pattern.
#       Multiple keys matching rules may be defined in combination with DenyKey.
#       Key pattern is wildcard expression, which support "*" character to match any number of any characters in ce>
#       Parameters are processed one by one according their appearance order.
#       If no AllowKey or DenyKey rules defined, all keys are allowed.
#
# Mandatory: no
AllowKey=system.run[sh /opt/zabbix/temp_nvme0.sh]
AllowKey=system.run[sh /opt/zabbix/temp_nvme1.sh]
AllowKey=system.run[sh /opt/zabbix/temp_sda.sh]
AllowKey=system.run[sh /opt/zabbix/temp_sdb.sh]
AllowKey=system.run[sh /opt/zabbix/temp_core0.sh]
AllowKey=system.run[sh /opt/zabbix/temp_core1.sh]
AllowKey=system.run[sh /opt/zabbix/temp_core2.sh]
AllowKey=system.run[sh /opt/zabbix/temp_core3.sh]
AllowKey=system.run[sh /opt/zabbix/temp_core4.sh]
AllowKey=system.run[sh /opt/zabbix/temp_core5.sh]
AllowKey=system.run[sh /opt/zabbix/temp_core6.sh]
AllowKey=system.run[sh /opt/zabbix/temp_core7.sh]
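Typing a dozen of these lines by hand invites typos, and a key that does not match exactly is rejected. A short loop can generate them from the script names; print_allowkeys is a hypothetical helper, and the output is meant to be pasted into the config file.

```shell
#!/bin/sh
# Print one AllowKey line per script name.
# Usage: print_allowkeys <directory> <script name without .sh>...
print_allowkeys() {
    dir=$1
    shift
    for name in "$@"; do
        printf 'AllowKey=system.run[sh %s/%s.sh]\n' "$dir" "$name"
    done
}

# Demo: the drive scripts plus the first two cores.
print_allowkeys /opt/zabbix temp_nvme0 temp_nvme1 temp_sda temp_sdb temp_core0 temp_core1
```

After changing the config the agent has to be restarted before the new keys take effect, typically with `systemctl restart zabbix-agent2` if the package's default service name is in use.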
Now change the scripts accordingly on the Zabbix server:



Verify that it is still working:
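Before checking from the Zabbix frontend, it can help to confirm on the host that every script prints a bare number, since an empty or non-numeric result is a common reason for an item to become unsupported. This is a sketch with a made-up function name, verify_scripts, demonstrated against throwaway files rather than the real scripts.

```shell
#!/bin/sh
# Run every *.sh file in directory "$1" and report whether it printed a
# plain non-negative integer. Returns non-zero if any script did not.
verify_scripts() {
    status=0
    for script in "$1"/*.sh; do
        [ -e "$script" ] || continue
        out=$(sh "$script" 2>/dev/null)
        case "$out" in
            ''|*[!0-9]*) echo "FAIL $script: '$out'"; status=1 ;;
            *)           echo "OK   $script: $out" ;;
        esac
    done
    return "$status"
}

# Demo with two throwaway scripts in a temporary directory.
demo=$(mktemp -d)
echo 'echo 42' > "$demo/good.sh"
echo 'echo "not a number"' > "$demo/bad.sh"
verify_scripts "$demo" || echo "at least one script failed"
rm -rf "$demo"
```

On the monitored host the call would be `verify_scripts /opt/zabbix`, ideally run as the zabbix user (for example via `sudo -u zabbix`) so that the sudoers rules are exercised as well.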
