Upstream or downstream: the battle of task dependencies

Task dependencies are a useful and popular feature in Airflow. Simply put, they define the order of task execution: which tasks to run and in what order. While they’re not required, task dependencies are almost always set.

What if a task dependency is not defined? Well, then Airflow takes matters into its own hands: the tasks are treated as independent, so there is no guarantee of the order they run in, and they may even run in parallel. Madness, I know!

Tasks linked by a dependency are referred to as either upstream or downstream tasks.
It’s easy to get confused about which task is upstream and which is downstream. The simplest analogy, and the way I remember it, is that upstream means before and downstream means after. This means that any upstream task needs to complete before the tasks downstream of it can run. Simple.
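To make “upstream means before, downstream means after” concrete, here is a minimal sketch in plain Python rather than Airflow. It maps each task to the set of tasks upstream of it and uses the standard library’s graphlib.TopologicalSorter to release tasks in waves, where a task only becomes ready once everything upstream of it is done. This is an analogy for how a scheduler respects dependencies, not Airflow’s scheduler; the task names, including the hypothetical cleanup task with no dependencies, are illustrative.

```python
from graphlib import TopologicalSorter

# Toy model (not Airflow's API): each task maps to the set of tasks upstream of it.
# "print_date" is upstream of "echo"; "cleanup" is a hypothetical task with no dependencies.
upstream = {
    "print_date": set(),
    "echo": {"print_date"},
    "cleanup": set(),
}


def run_in_waves(graph):
    """Return batches of tasks; every task in a batch has all its upstream tasks finished."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the batch finished, unlocking downstream tasks
    return waves


print(run_in_waves(upstream))  # [['cleanup', 'print_date'], ['echo']]
```

Because cleanup has no upstream tasks, it is ready in the first wave alongside print_date: nothing ties it to either of the other tasks, which is exactly the “no guarantee of order” situation described above.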

How do you tell Airflow the order in which you would like the tasks to be executed? By using the bitshift operators (<< and >>).

Let’s take a look at task dependencies in action. In the example below we have two BashOperator tasks: task1, which calls the date command, and task2, which echoes the word “hi”.

The order in which the tasks are executed is defined with the bitshift operators. In this example, task1 runs before task2, so we put the >> operator between task1 and task2, which sets task2 downstream of task1. You could also write this in reverse with the << operator to accomplish the same thing: task2 << task1, which sets task1 upstream of task2.

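import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal DAG to attach the tasks to; dag_id and start_date are illustrative
dag = DAG(
    dag_id="task_dependencies_example",
    start_date=datetime.datetime(2023, 1, 1),
    schedule=None,  # only run when triggered manually (Airflow 2.4+ keyword)
)

task1 = BashOperator(
    task_id="print_date",
    bash_command="date",
    dag=dag,
)

task2 = BashOperator(
    task_id="echo",
    bash_command="echo Hi",
    dag=dag,
)

# task1 to run before task2
task1 >> task2

# or, equivalently: task2 << task1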
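Why do >> and << work at all? They are ordinary Python operator overloads: in Airflow, task1 >> task2 is shorthand for task1.set_downstream(task2), and task2 << task1 for task2.set_upstream(task1). The Task class below is a hypothetical stand-in, not Airflow’s BaseOperator, that sketches the same idea with __rshift__ and __lshift__ so you can see both spellings record the same edge.

```python
class Task:
    """Hypothetical stand-in for an Airflow operator (not BaseOperator itself)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # task_ids that must finish before this task
        self.downstream = set()  # task_ids that run after this task

    def set_downstream(self, other):
        # Record the edge on both ends: self runs before other
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def set_upstream(self, other):
        # other runs before self, i.e. self is downstream of other
        other.set_downstream(self)

    def __rshift__(self, other):
        # self >> other is shorthand for self.set_downstream(other);
        # returning other lets chains like a >> b >> c keep going
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # self << other is shorthand for self.set_upstream(other)
        self.set_upstream(other)
        return other


task1, task2 = Task("print_date"), Task("echo")
task1 >> task2
print(task2.upstream)  # {'print_date'}

task3, task4 = Task("print_date"), Task("echo")
task4 << task3  # the reverse spelling records the same edge
print(task4.upstream)  # {'print_date'}
```

Returning the right-hand task from both operators is the design choice that makes longer expressions such as a >> b >> c possible, since each step hands the next task along.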

What would this look like in the Airflow UI?

Below is how the Airflow UI graphically represents the task dependencies of this DAG, as seen in the Graph view of the Airflow web interface.

Chained dependencies

Dependencies between tasks can be defined in many ways to suit the specific needs of your workflow. For example, multiple tasks can be chained together by making each task a dependency of the next one and, what do you know, you have chained dependencies.

For example, in a chain such as task_1 >> task_2 >> task_3, task_2 depends on task_1, task_3 depends on task_2, and so on. This can be visualised in the Airflow UI via the Graph view, which shows the order of dependencies.
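A chain is simply each task depending on the one before it. The sketch below uses a hypothetical helper, chain_upstream (not an Airflow function), to build the upstream map that task_1 >> task_2 >> task_3 >> task_4 would describe, then asks the standard library’s graphlib for an execution order. For a straight chain only one order is valid.

```python
from graphlib import TopologicalSorter


def chain_upstream(*task_ids):
    """Hypothetical helper: build the upstream map for t1 >> t2 >> ... >> tn."""
    upstream = {task_ids[0]: set()}  # the first task has nothing upstream
    for before, after in zip(task_ids, task_ids[1:]):
        upstream[after] = {before}  # each task depends only on the one before it
    return upstream


graph = chain_upstream("task_1", "task_2", "task_3", "task_4")
print(graph)
print(list(TopologicalSorter(graph).static_order()))
# ['task_1', 'task_2', 'task_3', 'task_4']
```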

Mixed dependencies

It is also possible to mix both bitshift operators within the same workflow. For example, if task_1 is set upstream of task_2 (task_1 >> task_2) and task_3 is also set upstream of task_2 (task_2 << task_3), the first and third tasks must both finish before the second task can begin.
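The mixed example can even be written on one line as task_1 >> task_2 << task_3. Python’s shift operators are left-associative, so this is evaluated as (task_1 >> task_2) << task_3, and because the operators return the right-hand task the second step becomes task_2 << task_3. The compact Task class below is a hypothetical stand-in, not Airflow’s API, used only to show where the edges end up.

```python
class Task:
    """Toy stand-in for an Airflow task (hypothetical, for illustration only)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()  # task_ids that must finish before this task

    def __rshift__(self, other):  # self >> other: self runs before other
        other.upstream.add(self.task_id)
        return other

    def __lshift__(self, other):  # self << other: other runs before self
        self.upstream.add(other.task_id)
        return other


task_1, task_2, task_3 = Task("task_1"), Task("task_2"), Task("task_3")

# Evaluated left to right: (task_1 >> task_2) << task_3, i.e. task_2 << task_3
task_1 >> task_2 << task_3

print(sorted(task_2.upstream))  # ['task_1', 'task_3']
```

In real Airflow code the same fan-in can also be written as [task_1, task_3] >> task_2, which some people find easier to read than mixing operators on one line.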

Mastering the intricacies of task definition and bitshift operators takes practice. As always, I recommend checking out the Airflow documentation, but the best way to learn is by doing, so go play with it.

Tim