I’m trying to create a DAG for an ETL pipeline that does 2 tasks:
- Execute a SQL query on a raw dataset and store the results in a CSV file
- Load the resulting data into Amazon Redshift
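For the first task, one option is to keep the SQL-to-CSV step as a plain function that takes any DB-API connection, so it can be tested outside Airflow. This is just a sketch under that assumption; the function name, table, and file path below are placeholders (shown here against an in-memory SQLite database, but any DB-API connection works the same way):

```python
import csv
import sqlite3


def query_to_csv(conn, sql, csv_path):
    """Run a SQL query over a DB-API connection and write the rows to a CSV file."""
    cur = conn.cursor()
    cur.execute(sql)
    header = [col[0] for col in cur.description]  # column names from the cursor
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)       # header row first
        writer.writerows(cur.fetchall())  # then the query results


# Quick local check with SQLite standing in for the real raw dataset
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO raw VALUES (?, ?)", [(1, "a"), (2, "b")])
query_to_csv(conn, "SELECT * FROM raw", "/tmp/raw_extract.csv")
```

Keeping the extraction logic connection-agnostic like this means the Airflow task callable only has to open a connection (e.g. from a hook) and delegate to it.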
Can anyone please assist, and ideally point me to a source that would help me do this on my own? I’ve found plenty of useful DAG tutorials on YouTube, but they don’t seem to address my issue. I’m also the only data engineer/analyst at my company, so I don’t have anyone to ask for help in person.
So far I’ve set up the basics for a DAG; what’s left is adding the 2 tasks I need:
```python
import pandas as pd
from datetime import datetime

from airflow.models import DAG
from airflow.operators.python import PythonOperator
from airflow.hooks.postgres_hook import PostgresHook
from airflow.models import Variable
from airflow.operators.bash import BashOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id='postgres_db_dag',
    schedule_interval='@daily',
    start_date=datetime(year=2022, month=2, day=1),
    catchup=False
) as dag:
    pass
```
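Starting from that skeleton, the two tasks could be wired in roughly like this. This is an untested sketch, not a drop-in solution: the connection ids (`postgres_default`, `redshift_default`, `aws_default`), the bucket, the query, the IAM role ARN, and the table names are all placeholders you'd replace with your own. It also assumes the usual Redshift loading pattern of staging the CSV in S3 and issuing a `COPY`, since bulk loads via `COPY` are far faster than row-by-row inserts:

```python
from datetime import datetime

from airflow.models import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.postgres.hooks.postgres import PostgresHook

CSV_PATH = "/tmp/raw_extract.csv"  # placeholder local staging path


def extract_to_csv():
    # Task 1: run the query against the raw dataset and dump the result to CSV.
    hook = PostgresHook(postgres_conn_id="postgres_default")  # placeholder conn id
    df = hook.get_pandas_df("SELECT * FROM raw_table")        # placeholder query
    df.to_csv(CSV_PATH, index=False)


def load_to_redshift():
    # Task 2: stage the CSV in S3, then COPY it into Redshift as a bulk load.
    s3 = S3Hook(aws_conn_id="aws_default")  # placeholder conn id
    s3.load_file(
        filename=CSV_PATH,
        key="staging/raw_extract.csv",
        bucket_name="my-etl-bucket",  # placeholder bucket
        replace=True,
    )
    # A Redshift cluster accepts Postgres-protocol connections, so PostgresHook
    # works here with a connection id that points at the cluster.
    redshift = PostgresHook(postgres_conn_id="redshift_default")
    redshift.run("""
        COPY my_schema.my_table
        FROM 's3://my-etl-bucket/staging/raw_extract.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
        CSV IGNOREHEADER 1;
    """)


with DAG(
    dag_id='postgres_db_dag',
    schedule_interval='@daily',
    start_date=datetime(year=2022, month=2, day=1),
    catchup=False
) as dag:
    extract = PythonOperator(task_id="extract_to_csv", python_callable=extract_to_csv)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)
    extract >> load  # run the extract before the load
```

Passing the CSV through `/tmp` only works if both tasks run on the same worker; on a multi-worker setup you'd write straight to S3 from the first task instead.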