How to run a pipeline with these 2 tasks?

I’m trying to create a DAG for an ETL pipeline that does 2 tasks:

  1. Execute a SQL query on a raw dataset and store the results in a CSV file
  2. Load the resulting data into Amazon Redshift

Can anyone please assist, and ideally point me to a source that could help me do this on my own? I’ve found plenty of useful DAG tutorials on YouTube, but they don’t seem to address my issue. I’m also the only data engineer/analyst in my company, so I don’t have anyone to ask for help in real life.

So far, I’ve done the basics of the DAG; what’s left is adding the 2 tasks I need:

import pandas as pd
from datetime import datetime

from airflow.models import DAG, Variable
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id='postgres_db_dag',
    schedule_interval='@daily',
    start_date=datetime(year=2022, month=2, day=1),
    catchup=False
) as dag:
    pass  # the two tasks need to go here

Thanks!

Hi @redrum

You could give the SqlToS3Operator and S3ToRedshiftOperator a try to accomplish this use case.

The docs for each of those operators include example DAGs where you can see how to set them up, or you can browse other example DAGs in the registry for inspiration.
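
To make that concrete, here’s a minimal sketch of how the two operators could slot into your existing DAG. The connection IDs (postgres_default, aws_default, redshift_default), the bucket, key, query, schema, and table names are all placeholders — swap in your own:

from datetime import datetime

from airflow.models import DAG
from airflow.providers.amazon.aws.transfers.sql_to_s3 import SqlToS3Operator
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

with DAG(
    dag_id='postgres_db_dag',
    schedule_interval='@daily',
    start_date=datetime(year=2022, month=2, day=1),
    catchup=False
) as dag:

    # Task 1: run the query against the raw dataset and stage the
    # result in S3 as a CSV file.
    extract_to_s3 = SqlToS3Operator(
        task_id='extract_to_s3',
        sql_conn_id='postgres_default',      # placeholder connection ID
        query='SELECT * FROM raw_dataset',   # placeholder query
        aws_conn_id='aws_default',           # placeholder connection ID
        s3_bucket='my-staging-bucket',       # placeholder bucket
        s3_key='exports/raw_dataset.csv',
        file_format='csv',
        replace=True,
    )

    # Task 2: COPY the staged CSV from S3 into Redshift.
    load_to_redshift = S3ToRedshiftOperator(
        task_id='load_to_redshift',
        redshift_conn_id='redshift_default',  # placeholder connection ID
        aws_conn_id='aws_default',
        s3_bucket='my-staging-bucket',
        s3_key='exports/raw_dataset.csv',
        schema='public',                      # placeholder schema
        table='raw_dataset',                  # placeholder table
        copy_options=['CSV', 'IGNOREHEADER 1'],  # skip the header row in the CSV
    )

    extract_to_s3 >> load_to_redshift

Both operators live in the apache-airflow-providers-amazon package, so you’ll need that installed alongside your Postgres provider.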