Data Engineering

Debugging Airflow in a Container with VS Code

Use VS Code’s Remote - Containers Extension to Connect to Containers Running Airflow and Run Airflow Tests.

David Griffiths
5 min read · Jan 27, 2021
Photo by Guillaume Bolduc on Unsplash

Why Airflow in a Container?

When working with Airflow you may have Airflow installed locally with a scheduler running. However, your development and production environments will likely run in containers to support high availability and fault tolerance.

Debugging Airflow containers requires some additional setup, but it’s worth it! 👇

Benefits of Debugging Airflow in Containers

  • With containers you can run the exact same operating system and Python packages that your dev and prod servers are using, and that all your teammates are using! 😍
  • You can keep several versions of Airflow and its configuration in different containers without polluting your local environment.
  • Running Airflow test commands does not update any of the metadata in your Airflow database.
  • Within VS Code you can inspect variables and step through your code while your DAG or individual task is running, via VS Code’s breakpoints.
  • Running Airflow locally may not even be an option if you are on Windows. One way around this for Windows users is to set up Windows Subsystem for Linux (WSL) and run Airflow locally inside WSL.

Prerequisites

Using PyCharm instead of VS Code?

Check out this guide from Andrew Harmon.

Install the VS Code Remote - Containers extension

Open up VS Code and go to extensions.

You can install either the Remote Development extension pack, which includes Remote - Containers, or just the Remote - Containers extension from Microsoft. The Remote Development pack also includes some goodies for remote SSH sessions and for working inside Windows Subsystem for Linux.

Attach VS Code to a Local Running Container

After you have installed the Remote - Containers extension, open up VS Code’s command palette (Ctrl+Shift+P) and type:

Remote-Containers: Attach to a Running Container…

Another drop-down will appear showing all the containers you have running locally. In our case we want to run tests against Airflow’s scheduler service (your container name will be different):

VS Code will relaunch and attach to the container. To verify it all worked, you should see the environment indicator in the bottom-left corner change to the container you selected:

You can now browse the container’s file system in VS Code (if you don’t see it, click Explorer and navigate to where Airflow is installed in the container). For me that’s /usr/local/airflow/.

As we have opened Airflow’s scheduler container, we can see all the DAGs in the Explorer (/usr/local/airflow/dags/). Now let’s try to debug those DAGs!

Launching a terminal will also open it inside the container 😻.

Python: Select Interpreter

One additional step is to select the Python interpreter to be used inside the container.

From the command palette (Ctrl+Shift+P), run Python: Select Interpreter and choose where your Python binary lives; for me it was /usr/local/bin/python. Hint: run which python in the container’s terminal to find out where Python is installed if it doesn’t show up.

Launch the debugger

Install VS Code’s Python extension within the container

To use the debugger you need to install the Python VS Code extension inside the container.

Bootstrap a launch.json

Press F5 or click the Run and Debug icon in the Activity Bar. If this is your first time debugging in this working directory, click “create a launch.json file”.

If you already have a launch.json file skip to the next section.

A drop-down will appear asking how you would like the debugger to run. Select Python File.

This creates a launch.json file in the .vscode folder.

The launch.json file configures how your debugger should run. We selected Python File above as our configuration, which runs the debugger against whichever file is currently open in VS Code.
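For reference, the bootstrapped file should look roughly like this (the exact contents may vary slightly between versions of the Python extension):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal"
        }
    ]
}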

Airflow tests we want to run in the Debugger

When debugging Airflow we want to run the Airflow test commands (not a .py file) to test our DAGs and tasks. So we need to make a few alterations to launch.json.

Examples of Airflow test commands (note these are the commands for Airflow 2.0; for earlier versions see Airflow’s docs):

airflow dags test DAG_NAME YYYY-MM-DD

or

airflow tasks test DAG_NAME TASK_NAME YYYY-MM-DD

YYYY-MM-DD represents the execution date.

Alter launch.json

To get these commands to run, we need to change the debugger from running the currently open Python file to running Airflow’s test commands. In .vscode/launch.json we need to point program at the airflow executable instead of a .py file, and add args that will be passed to program to do the testing.

The example below points program to where my airflow is installed in the container.

Hint: type which airflow in the container’s terminal to get the location of the airflow executable and enter that as program.
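Here is a minimal sketch of such a configuration (assuming airflow lives at /usr/local/bin/airflow, alongside the interpreter from earlier; substitute whatever which airflow prints in your container):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Airflow Test",
            "type": "python",
            "request": "launch",
            // assumed path; replace with the output of `which airflow`
            "program": "/usr/local/bin/airflow",
            // tests the bashenv DAG with execution date 2020-01-15;
            // to test a single task instead, use:
            // ["tasks", "test", "bashenv", "TASK_NAME", "2020-01-15"]
            "args": ["dags", "test", "bashenv", "2020-01-15"],
            "console": "integratedTerminal"
        }
    ]
}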

The args are a list of arguments that will be passed to the airflow command. Above is an example of the arguments for testing a DAG called bashenv with an execution date of 2020-01-15. You can swap these arguments out to run a task instead of a DAG.

Test out the Debugging

Now that we have our launch.json configured, we can use all the features of VS Code’s debugger.

Set some breakpoints and alter launch.json to choose which DAGs or tasks you want to run. And hey presto:

Not running Airflow in a container (Windows)

Check out this Medium article by Philipp Schmalen on setting up Airflow locally in Windows Subsystem for Linux.
