Creating Supervisors in Python with Multi-Threading and Process Manager
Creating supervisors in Python is a necessary joy for a developer, and something that any organisation should consider as their systems expand and gain complexity. What is a supervisor exactly?
A supervisor is an automated way to ensure that a particular service or element of your system is up and running. It checks whether certain criteria are met in order to determine whether everything is operating as expected. From there it can report its results, for example as a database record to display on a Service Status page.
Python is a good fit for coding a supervisor, with its ability to make the seemingly complex rather simple to implement. Let’s explore how we would code a supervisor in Python, using scheduling and multi-threading to ensure a particular process is running.
The packages we will utilise are requests, schedule and threading. threading is included in Python’s standard library, but requests and schedule can be installed via pip with the following command:
pip install schedule requests
But why consider an army of supervisors? After all, they are an additional element of your ecosystem that needs to be managed and maintained.
It comes down to the complexity of your production environment. Gone are the days when your organisation ran all its processes on one server. With global audiences in mind, it is very likely you will want your ecosystem operating in multiple regions around the world; even for a startup it is very easy to implement a multi-server setup with a $5 VPS running in the US, one in Europe, and another in Asia, for example. On top of this, you may be running a multitude of services on each of those VPSs, to serve various APIs, static assets and so forth.
It very quickly becomes impossible to manually and proactively manage all these processes and respond to failures. In fact, you will most likely notice an error on your front end deliverables as the first sign of a failed process, and by this time it is too late to prevent end users from being affected. Not the most ideal scenario.
Supervisors can be expanded to provide notifications and even attempt to fix an issue if one is found. However, this article will not dive into these additional layers of a supervisor script.
How should supervisors be run?
As a multi-threaded background process that is managed by a process manager. In the event that the server hosting your supervisors crashes or needs to reboot, your process manager will ensure that all supervisors restart at boot. The last thing we want is our supervisors lazing around not doing anything.
Let’s break down how the supervisor in this article will be demonstrated:
1. Defining the supervisor tasks as functions:
def supervisorTask():
    #perform supervisor task
    ...

def supervisorCheckIn():
    #record to my database that the supervisor is running
    ...
2. Defining a function that can pass my supervisor tasks through a separate thread:
#this function will execute our tasks in a separate thread
def threadTask(job_function, *arguments):
    ...
3. Scheduling our threads to be executed every 10 minutes:
#schedule my supervisor to run and check itself in every 10 minutes
#pass the function itself to do(); calling threadTask(...) here would run it immediately
schedule.every(10).minutes.do(threadTask, supervisorTask)
4. Starting the supervisor via a process manager:
Using the supervisord CLI, I could start my supervisor like so:
supervisorctl start mySupervisor
Now, let’s get to an example of a Python supervisor.
The supervisor demonstrated here will have the following tasks:
- To check whether a website is up and running by visiting it every so often and checking the response’s status code, and to check whether any content is returned. Why both? Well, we might receive a 200 status code for a successful request, but there may exist a server side rendering error that fails to render HTML.
- To check itself in every 10 minutes to let my CMS know it is running. How is it checking itself in? By upserting a record in my database with a timestamp that represents the last time the supervisor task was executed.
This is a very basic test, but necessary nonetheless: we want our websites to be reachable, all the time. Supervisors can be as complex as you program them to be. Iterate on them as you see fit.
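The website check described above could be sketched along the following lines. This is only a minimal illustration, not the exact code used later in this article: the names classifyResponse and checkSite, the get hook and the 10 second timeout are my own assumptions, and the status labels simply mirror the values used in this article.

```python
# Status labels mirroring the values used in this article
STATUS_LIVE = 'Live'
STATUS_INACTIVE = 'Inactive'


def classifyResponse(status_code, body):
    """Return STATUS_LIVE only for a 200 response with a non-empty body."""
    if status_code == 200 and len(body) > 0:
        return STATUS_LIVE
    return STATUS_INACTIVE


def checkSite(url, get=None, timeout=10):
    """Fetch url and classify the result; network errors count as inactive.

    get is a hypothetical hook that lets a fake fetcher be swapped in.
    """
    if get is None:
        import requests  # imported lazily, only when a real request is made
        get = requests.get
    try:
        response = get(url, timeout=timeout)
    except OSError:
        # requests' exceptions derive from IOError, an alias of OSError
        return STATUS_INACTIVE
    return classifyResponse(response.status_code, response.text)
```

Treating a connection failure or timeout as inactive means an unreachable site is reported rather than crashing the supervisor thread.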
Don’t forget to exclude your supervisors’ IP address from your Google Analytics or other tracking service, otherwise you may be led to believe you have a super interested and very persistent visitor to your sites!
So, adhering to the structure outlined previously, my supervisor for this task may look like the following:
import requests, time, schedule, threading
from my_status import *

def supervisorTask():
    #perform supervisor task
    r = requests.get('https://medium.com')
    if r.status_code == 200 and len(r.text) > 0:
        status = STATUS_LIVE
    else:
        status = STATUS_INACTIVE

def supervisorCheckIn():
    #record to my database that the supervisor is running
    #pymongo, mysql, <your favourite database> query here.
    pass

#function for handling threads
def threadTask(job_function):
    job_thread = threading.Thread(target=job_function)
    job_thread.start()

schedule.every(10).minutes.do(threadTask, supervisorTask)
schedule.every(10).minutes.do(threadTask, supervisorCheckIn)

while 1:
    schedule.run_pending()
    time.sleep(1)
Let’s visit some aspects of this example below:
In the above example, I have used STATUS_LIVE and STATUS_INACTIVE to represent the status of my service, imported from my_status.py. This file can consist of all the status codes that I wish to use across all of my supervisors:
"""
My Status Codes
used throughout my supervisor reporting
"""
STATUS_LIVE = 'Live'
STATUS_INACTIVE = 'Inactive'
In my example I wish to call 2 tasks exactly every 10 minutes. In order to do this I have introduced a threadTask function. This allows me to execute my supervisor tasks on separate threads, passing each one as a job_function which is then handed to the new thread:
job_thread = threading.Thread(target=job_function)
What if I wanted to pass arguments with my job_function? We can use the args Thread argument, which takes a tuple:
job_thread = threading.Thread(target=job_function, args=(arg1, arg2, ))
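Putting the two ideas together, a variant of threadTask that forwards any extra arguments to the thread might look like the sketch below. The recordStatus task and the results list are hypothetical, there only to show the arguments arriving; returning the thread is also my own addition so that the demo can wait for it.

```python
import threading


def threadTask(job_function, *args):
    """Run job_function(*args) on a new thread and return that thread."""
    job_thread = threading.Thread(target=job_function, args=args)
    job_thread.start()
    return job_thread


# Hypothetical task used only to demonstrate argument passing
results = []


def recordStatus(service, status):
    results.append((service, status))


thread = threadTask(recordStatus, 'website', 'Live')
thread.join()  # wait here only so the demo can inspect the result
print(results)  # [('website', 'Live')]
```

In a real supervisor there is usually no need to join the thread, as the main loop simply carries on scheduling.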
The schedule module is a much more elegant solution than using time.sleep() or other ways to pause execution of my script. It is clear, concise and self-explanatory.
We are not limited to just minutes, but can also use hours, and even a random moment between 2 predefined times. The documentation for schedule can be found on PyPI at https://pypi.org/project/schedule/.
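In addition, passing arguments to your scheduled function is very straightforward: just supply them as extra arguments to do(). Note that here they are handed to threadTask, which must accept them and forward them to the thread via args:
schedule.every(10).minutes.do(threadTask, supervisorTask, arg1, arg2)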
As you can see, the only limit to how many processes you schedule is defined by your hardware; you are free to list as many schedules as you wish, which will be executed accordingly as schedule.run_pending() is called in your main process loop.
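One aspect the example leaves open is the database query inside supervisorCheckIn. As a rough sketch only, here is how the check-in could be upserted using SQLite from Python’s standard library. The table name supervisor_checkins, the helper names and the isStale check are all assumptions of mine for illustration, and this version of supervisorCheckIn takes arguments, unlike the one in the example; your own database and schema will differ.

```python
import sqlite3
import time


def openCheckInStore(path=':memory:'):
    """Open (and if needed create) a hypothetical check-in table."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS supervisor_checkins ('
        'name TEXT PRIMARY KEY, last_seen REAL NOT NULL)'
    )
    return conn


def supervisorCheckIn(conn, name, now=None):
    """Keep one row per supervisor holding its latest check-in time."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        # INSERT OR REPLACE gives an upsert-like effect on the primary key
        conn.execute(
            'INSERT OR REPLACE INTO supervisor_checkins (name, last_seen) '
            'VALUES (?, ?)',
            (name, now),
        )


def isStale(conn, name, max_age_seconds, now=None):
    """True if the supervisor has never checked in or is overdue."""
    now = time.time() if now is None else now
    row = conn.execute(
        'SELECT last_seen FROM supervisor_checkins WHERE name = ?', (name,)
    ).fetchone()
    return row is None or now - row[0] > max_age_seconds
```

A Service Status page could then call something like isStale with a window slightly longer than 10 minutes to flag a supervisor that has stopped checking in. Since sqlite3 connections are by default tied to the thread that created them, a task run through threadTask would open its own connection to a file-based database rather than share one.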
Where should supervisors be run?
You may be tempted to run supervisors on a production server that may also be hosting critical services that your supervisor is supervising. This is not wise for a number of reasons:
- Your production servers are subject to resource overloading (CPU, memory and bandwidth), which could prevent them from running your supervisor.
- In the event the server is experiencing downtime, your supervisor will also be experiencing downtime! It would be impossible for it to report that the server is not reachable.
- Supervisors eat up valuable system resources that your server could be using to deliver your front end services.
Instead, host your supervisors on an isolated server where only the supervisors are running. This minimises the risk of unexpected crashes and makes the resource usage of the server very predictable.
Lastly, check the logs of your supervisors when you initially start running them to ensure the process is running as expected. Do this even after your debugging or sandbox testing: production environments are different to sandbox environments even if you are running a
Supervisors are great at letting you know when things are not working. But you want to be notified immediately after your supervisor finds something wrong, so consider the following:
- Letting your supervisor email your entire team about mission-critical failures as they are found.
- Letting your supervisor deploy push notifications (e.g. Safari Push Notifications, which I explain how to set up in this article) directly to your team so they can take action immediately.
- Adopting automation services from more robust solutions like Ansible to log in via SSH and attempt to identify and fix errors.
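As an illustration of the first option, an email alert could be sketched with Python’s standard library as follows. The function names, the subject format and the SMTP host and port are placeholders I have assumed; a real setup would most likely need authentication and TLS as required by your mail provider.

```python
import smtplib
from email.message import EmailMessage


def buildAlert(service, status, sender, recipients):
    """Compose a plain-text alert about a failing service."""
    message = EmailMessage()
    message['Subject'] = f'[supervisor] {service} is {status}'
    message['From'] = sender
    message['To'] = ', '.join(recipients)
    message.set_content(
        f'The supervisor for {service} reported status: {status}.'
    )
    return message


def sendAlert(message, host='localhost', port=25):
    """Send the alert through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(message)
```

A supervisor task could call buildAlert followed by sendAlert whenever it records an inactive status, ideally only when the status changes so the team is not flooded every 10 minutes.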