Get Found File Name or Path from S3 Sensor

I have created the function below, which creates a task for each file in a list of files I am looking for.
The file names change daily, so there's a function that sets the bucket_key.
This part works great.
However, once the sensor sees the file, I can't find a way to get the name or path of the file it found.
Nothing from the sensor lands in XCom.

Is there a way to do this?
The only thing I can come up with is to customize the S3 sensor to do an XCom push once the poke succeeds.

from airflow.models import Variable
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

sensor_tasks = [
    S3KeySensor(
        task_id=f,
        bucket_key=myFiles[f],
        aws_conn_id='AWS S3',
        poke_interval=10,
        timeout=10800,
        soft_fail=True,
        wildcard_match=True,
        bucket_name=Variable.get('S3-MasterBucket'),
        verify=False,
        mode='reschedule',
    )
    for f in myFiles.keys()
]

Hi @gregJ, thanks for reaching out!

The job of a sensor is to wait for something to happen before moving on to the next task, not necessarily to return anything.

You could create a custom operator that does what you need, or use the get_file_list Astro SDK operator (docs), which retrieves a list of available files based on a storage path and an Airflow connection. Please see examples of both below.
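For the custom route, here is a minimal sketch: a subclass of S3KeySensor that pushes the key it found to XCom once the poke succeeds. Treat it as a starting point rather than the official API: the S3Hook.get_wildcard_key call and the sensor attributes it reads (bucket_key, wildcard_match, verify) match recent Amazon provider versions, but check them against the version you run, and the sketch assumes bucket_key is a single string, as in your snippet.

from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor


class S3KeySensorWithXCom(S3KeySensor):
    """Sketch of an S3KeySensor that pushes the matched key to XCom."""

    def poke(self, context):
        found = super().poke(context)
        if found:
            hook = S3Hook(aws_conn_id=self.aws_conn_id, verify=self.verify)
            if self.wildcard_match:
                # get_wildcard_key returns the first boto3 Object matching
                # the wildcard pattern, or None if nothing matches.
                obj = hook.get_wildcard_key(self.bucket_key, self.bucket_name)
                matched = obj.key if obj else None
            else:
                matched = self.bucket_key
            context["ti"].xcom_push(key="found_key", value=matched)
        return found

A downstream task could then read the value with ti.xcom_pull(task_ids="<sensor task_id>", key="found_key").

Alternatively, the get_file_list operator can be used as follows: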

import pendulum

from airflow import DAG

from astro.files import get_file_list

AWS_CONN_ID = "aws_default"
AWS_BUCKET_NAME = "astro-onboarding"
AWS_PREFIX = "astro-sdk-demo"

with DAG(
    dag_id="astro_sdk_example",
    start_date=pendulum.datetime(2022, 12, 5, tz="UTC"),
    schedule=None,
):
    get_file_list(
        conn_id=AWS_CONN_ID,
        path=f"s3://{AWS_BUCKET_NAME}/{AWS_PREFIX}",
    )

This created the following XCom (two files were found: empdatasample.csv and items.csv):

[
File(path='s3://astro-onboarding/astro-sdk-demo/empdatasample.csv', conn_id='aws_default', filetype=None, normalize_config=None, is_dataframe=False, is_bytes=False, uri='astro+s3://aws_default@astro-onboarding/astro-sdk-demo/empdatasample.csv', extra={}), 
File(path='s3://astro-onboarding/astro-sdk-demo/items.csv', conn_id='aws_default', filetype=None, normalize_config=None, is_dataframe=False, is_bytes=False, uri='astro+s3://aws_default@astro-onboarding/astro-sdk-demo/items.csv', extra={})
]

Also, I wanted to mention that you could use dynamic task mapping to generate tasks dynamically based on the output of a previous task; please see this tutorial.
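Combining the two ideas, here is a sketch of how the output of get_file_list could fan out to one mapped task instance per file. It assumes the same bucket and connection as the example above, and process_file is a hypothetical task name, not part of the SDK:

import pendulum

from airflow.decorators import dag, task

from astro.files import get_file_list


@dag(start_date=pendulum.datetime(2022, 12, 5, tz="UTC"), schedule=None)
def mapped_files_example():
    # get_file_list returns an XComArg, so it can feed dynamic task mapping.
    files = get_file_list(
        conn_id="aws_default",
        path="s3://astro-onboarding/astro-sdk-demo",
    )

    @task
    def process_file(file):
        # Each mapped task instance receives one File object.
        print(file.path)

    process_file.expand(file=files)


mapped_files_example()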