In my office I had a problem with a newly installed Tectro AC unit that sometimes stops cooling. This is a problem for me because I host a bunch of servers in a rack that needs cooling: when the temperature rises, the UPSs go into protection mode and cut the power before the servers even reach thermal shutdown.
This led me to start monitoring the temperature in some way. A colleague of mine installed a Sonoff with a thermal sensor, while I chose to monitor the temperature through the InletAmbientSensor of an HP ILO4 device. So I prepared a template for Zabbix that discovers all the thermal sensors of an HP ILO4 device and then sets a trigger with a limit I choose for the ambient temperature. I set it to > 29°C. That may seem a very high temperature, but from my memories of the thermodynamics course at Politecnico di Milano, thermodynamic calculations are conventionally carried out at a temperature of 300 K, more or less 27°C. I didn't want to be bothered below that value, so the alarm has to trigger slightly above it.
We set up the Sonoff device to cut power to the AC unit at 27°C and to restore it after a few minutes. We realized that the compressor of the AC unit needs about three minutes of rest to be fully operational again. But… the response of the Sonoff thermal sensor was so slow that the alarm from Zabbix triggered before the power cycle was even attempted.
So we decided to re-flash the Sonoff with an independent firmware, so that it no longer depends on an internet connection and can power cycle the AC unit with a simple curl command.
So I installed the Tasmota firmware and now command the AC unit with a simple bash script.
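    #!/bin/bash
    # Switch the AC off, let the compressor rest, then switch it back on.
    # The URLs are quoted so the shell does not try to interpret the "?".
    curl "http://<ip address>/cm?cmnd=Power%20Off"
    sleep 240
    curl "http://<ip address>/cm?cmnd=Power%20On"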
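The same HTTP endpoint can also be used to check that the relay really switched. This is only a minimal sketch, assuming Tasmota answers a bare Power command with JSON such as {"POWER":"ON"}; the helper names parse_power and relay_state are hypothetical, and the 10-second limit is an arbitrary choice.

```bash
#!/bin/bash
# Extract ON or OFF from a reply such as {"POWER":"ON"} (assumed reply format).
parse_power() {
    printf '%s' "$1" | sed -n 's/.*"POWER":"\([A-Z]*\)".*/\1/p'
}

# Ask the relay for its current state; $1 is the address of the Sonoff.
# Prints nothing if the device does not answer within 10 seconds.
relay_state() {
    parse_power "$(curl --silent --max-time 10 "http://$1/cm?cmnd=Power")"
}
```

Calling relay_state with the address of the Sonoff after the script has finished should then print ON if the AC unit got its power back.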
I had to learn that Zabbix gives global scripts only 300 seconds to run, and this value has to be set in zabbix_server.conf as follows.
    # Option: TrapperTimeout
    #   Specifies how many seconds trapper may spend processing new data.
    #
    # Mandatory: no
    # Range: 1-300
    # Default:
    TrapperTimeout=300
So I had to limit the sleep command in the script to 240 seconds, which leaves the two curl commands up to 60 seconds to complete without hitting the timeout.
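The time budget can also be enforced in the script itself, so that a hung HTTP request cannot push it past the limit. What follows is a sketch under my own assumptions, not the script I actually run: CURL_MAX is an arbitrary value, and the function names are made up for the example. The only curl options used are --silent, --show-error, --fail and --max-time.

```bash
#!/bin/bash
# Sketch: power-cycle a Tasmota relay while staying inside a fixed time budget.

BUDGET=300     # seconds Zabbix allows the script to run
REST=240       # seconds the AC unit stays off
CURL_MAX=20    # upper bound for each curl call (assumed value)

# Worst-case runtime: two HTTP calls plus the rest period.
worst_case() {
    echo $(( 2 * CURL_MAX + REST ))
}

# Send one Tasmota command; --max-time stops a hung request
# from eating the whole budget.
tasmota() {
    local host=$1 cmnd=$2
    curl --silent --show-error --fail --max-time "$CURL_MAX" \
        "http://${host}/cm?cmnd=${cmnd}"
}

power_cycle() {
    local host=$1
    # Refuse to start if the worst case would not fit in the budget.
    if [ "$(worst_case)" -ge "$BUDGET" ]; then
        echo "worst case $(worst_case)s does not fit in ${BUDGET}s" >&2
        return 1
    fi
    # If the unit cannot be switched off, stop here and leave it running.
    tasmota "$host" 'Power%20Off' || return 1
    sleep "$REST"
    tasmota "$host" 'Power%20On'
}

# Run only when an address is given on the command line.
if [ "$#" -gt 0 ]; then
    power_cycle "$1"
fi
```

With these values the worst case is 2 × 20 + 240 = 280 seconds, which leaves a 20-second margin. Note that if the final Power On request fails, the unit stays off, so the script's exit status is worth reporting.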
Then I had to create a “global script” in “Administration -> Scripts”. It is called by an action in “Configuration -> Actions”, which in turn is fired by a new trigger set to t > 26°C, so that it triggers as soon as the Inlet Ambient sensor reads 27°C.
To avoid being bothered by messages from a flapping reading, and to keep the trigger from launching several instances of the script that power cycles the Sonoff, I configured the trigger with a 2°C hysteresis: it fires at 27°C and recovers only with the condition t < 26°C, which with whole-degree readings means 25°C.
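To see why the hysteresis stops the flapping, the trigger logic can be replayed by hand. This is only a sketch of the idea in bash, not how Zabbix evaluates triggers internally: the simulate function and the readings passed to it are made up for the example.

```bash
#!/bin/bash
# Sketch: replay whole-degree readings through a trigger with hysteresis.

FIRE_ABOVE=26      # go to PROBLEM when t > 26
RECOVER_BELOW=26   # go back to OK only when t < 26

# Print one line for every state change, in the form "<reading>:<new state>".
simulate() {
    state=OK
    for t in "$@"; do
        if [ "$state" = OK ] && [ "$t" -gt "$FIRE_ABOVE" ]; then
            state=PROBLEM      # the power-cycle script would start here
            echo "$t:PROBLEM"
        elif [ "$state" = PROBLEM ] && [ "$t" -lt "$RECOVER_BELOW" ]; then
            state=OK
            echo "$t:OK"
        fi
    done
}
```

Running simulate 25 26 27 26 27 26 25 prints just two lines, 27:PROBLEM and 25:OK. The readings that bounce between 26 and 27 in the middle change nothing, so the script is launched only once.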
The result is that, if the AC unit fails, I receive a message on Telegram telling me that the AC unit may be failing. After 4 minutes I receive another message (configured in the action launched by the trigger) telling me that the script has finished. Within 12 minutes of the first message, I receive another one telling me that the temperature has fallen below 26°C.
If the power cycle fails, I still have the old 29°C alert, which tells me that the temperature has risen again and that I have to go physically to my office to see what happened.
All this because the installer of the AC couldn’t solve the problem, then the covid-19 pandemic prevented me from calling official support, and I had to stay in control of the situation while living 60 km from the office, in another county, and possibly not being able to go to the office every day.
As soon as I am back in the office on a daily basis I’ll call official support to get this unit fixed, but in the meantime I can be quite relaxed about the situation.