I’m trying to create a DAG for an ETL pipeline that does two tasks:
- Execute a SQL query on a raw dataset and store the results in a CSV file
- Load the resulting data into Amazon Redshift
Can anyone please assist, and ideally point me to a source that would help me do this on my own? I’ve found plenty of useful DAG tutorials on YouTube, but they don’t seem to address my issue. I’m also the only data engineer/analyst at my company, so I don’t have anyone to ask for help in person.
So far, I’ve done the basics for a DAG; what’s left is adding the two tasks I need:
import pandas as pd
from datetime import datetime
from airflow.models import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook
from airflow.models import Variable
from airflow.operators.bash import BashOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator
with DAG(
    dag_id='postgres_db_dag',
    schedule_interval='@daily',
    start_date=datetime(year=2022, month=2, day=1),
    catchup=False
) as dag:
    pass
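Based on the tutorials I’ve watched, here is the rough shape I think the two tasks should take, staging the CSV through S3 so Redshift can COPY it. Everything below would replace the `pass` inside the with DAG(...) block, and all the connection IDs, bucket, table names, query, and file paths are placeholders I made up — I’m not sure this is the right approach, especially the Redshift part:

# Extra imports these tasks would need:
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

def query_to_csv():
    # Task 1: run the query against the raw database and dump the results to a CSV.
    # 'raw_postgres' is a placeholder connection ID and the query is made up.
    hook = PostgresHook(postgres_conn_id='raw_postgres')
    df = hook.get_pandas_df(sql='SELECT * FROM raw_table;')
    df.to_csv('/tmp/raw_results.csv', index=False)

def csv_to_s3():
    # Stage the CSV in S3 so Redshift can COPY it ('my-etl-bucket' is a placeholder).
    s3 = S3Hook(aws_conn_id='aws_default')
    s3.load_file(
        filename='/tmp/raw_results.csv',
        key='staging/raw_results.csv',
        bucket_name='my-etl-bucket',
        replace=True,
    )

# These would go inside the with DAG(...) block in place of `pass`:
query_task = PythonOperator(task_id='query_to_csv', python_callable=query_to_csv)
stage_task = PythonOperator(task_id='csv_to_s3', python_callable=csv_to_s3)
load_task = S3ToRedshiftOperator(
    task_id='load_to_redshift',
    schema='public',                       # placeholder schema
    table='target_table',                  # placeholder table
    s3_bucket='my-etl-bucket',
    s3_key='staging/raw_results.csv',
    redshift_conn_id='redshift_default',
    copy_options=['CSV', 'IGNOREHEADER 1'],
)
query_task >> stage_task >> load_task

Is going through S3 like this the standard way to get a local CSV into Redshift, or is there a more direct operator I’m missing?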
Thanks!