As your data science tasks become more and more sophisticated, you may find yourself having to run scripts repeatedly at fixed intervals.
For example, some projects require webscraping, or, as I described in a previous blog post, checking GitHub repos for updates.
Cron is perfect for such purposes.
I’ll describe below how to use
crontab in UNIX systems (Linux or Mac) to schedule for a Python script to run periodically at fixed times.
Set up a Python script as described previously.
Make sure you include a
main() function and use the conditional statement
if __name__ == '__main__'
to call it.
Add shebang to the top of the Python script (the first line). This line tells the UNIX system which interpreter program should be used to run the script. In our case, it would be the system python or python in a virtual environment.
For example, I want to run my script with the python interpreter in a virtual environment called
py36, which was created using
So my shebang line is:
Note: Read more about managing virtual environments with
If you just want to use the system python, you can find out where it is by typing
which python in a terminal:
To run the script with
crontab, you have to first change the permission of the script.
In a terminal, enter
chmod +x <path to .py>
Be sure to replace
<path to .py> with the actual path to your Python script.
This command line makes the script executable. It doesn’t give any output, but if there is no error message, the permission should have been changed successfully.
Note: You can check the permission to a file by running this command line in a terminal:
ls -l <path to file>
Check it before and after you run
chmod, and you can see for yourself if the permission has been changed.
Finally, you’re ready to set up a cronjob! In a terminal, enter:
You’ll see something like below, which is the nano editor interface. Here is where you edit your cronjobs.
As you can see near the bottom of the screen, I already have two cronjobs set up in my system. I’ll explain what it is going on using
30 8 * * 2,4,6 cd /home/fay/code/data_science_general; ./auto_git_pull.py >> cron_git_log.txt 2>&1
as an example.
The first 5 values (separated by space) specify the time to run the commands, which is already described in the commented-out lines near the top of the screen. Note that I use
2,4,6 for day of week (dow), meaning that I want to run the commands every Tuesday, Thursday, and Saturday.
The 6th value is the command to run.
; as if you would press ‘enter’ in a terminal.
In my example command, I first change the directory to where my script is (
Then I call the script (
./auto_git_pull.py). Finally, with
>> cron_git_log.txt 2>&1, I redirect any output or error message to a log file named
cron_git_log.txt. It is generally a good practice to keep a log for debugging.
After you edit crontab, press
X to exit the nano editor. You will be asked if you want to save the changes.
Anytime you want to edit your existing cronjob or add another cronjob, simply run
crontab -e in a terminal.
You should now be able to have your computer automatically run repetitive tasks for you! I’d love to hear what you have automated.