Can I export task logs to S3 on Astronomer?

Yes! On Astronomer, you can leverage Airflow’s native remote logging to Amazon S3, Azure Blob Storage, etc. For general Airflow users this typically involves a change to airflow.cfg, but environment variables set in the Astronomer UI will do the trick here (and they override airflow.cfg if you did configure it on Astro).

See the Airflow documentation on writing logs to Amazon S3 for reference, but you can follow the guidelines below.

In the Astronomer UI

Via your Astronomer Workspace, navigate to Deployments > Configure > Environment Vars and set the following (a quick way to verify the values follows these steps):

1. Set Remote Logging

  • Set AIRFLOW__CORE__REMOTE__LOGGING to TRUE

2. AIRFLOW__CORE_REMOTE_LOG_CONN_ID

  • Set to MyS3Conn (or whatever s3 connID you might already have it set to in Airflow)

3. AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER

  • Set to your S3 bucket path, e.g. s3://my-bucket/path/to/logs

4. AIRFLOW__CORE_ENCRYPT_S3_LOGS

  • Set to FALSE
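
Once those are set, one way to confirm the deployment actually picked them up is to read the values back from Airflow’s config. A minimal sketch, run from a Python shell inside the deployment (the section/key names are the Airflow 1.10 [core] ones; they moved to [logging] in 2.x):

```python
# Sanity check: read the remote logging settings back out of Airflow's config.
from airflow.configuration import conf

print(conf.getboolean("core", "remote_logging"))    # expect True
print(conf.get("core", "remote_log_conn_id"))       # expect MyS3Conn
print(conf.get("core", "remote_base_log_folder"))   # expect s3://my-bucket/path/to/logs
```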

Airflow Connection

Now, let’s create a connection in the Airflow UI for S3 with the following configs (a scripted alternative follows the list):

  • Conn Id: MyS3Conn
  • Conn Type: S3
  • Extra: {"aws_access_key_id":"your_aws_key_id", "aws_secret_access_key": "your_aws_secret_key"}
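
If you’d rather script this than click through the UI, here’s a minimal sketch that writes the same connection to Airflow’s metadata database (hard-coding secrets like this is for illustration only):

```python
# Create the S3 connection programmatically instead of via the Airflow UI.
import json

from airflow import settings
from airflow.models import Connection

conn = Connection(
    conn_id="MyS3Conn",
    conn_type="s3",
    extra=json.dumps({
        "aws_access_key_id": "your_aws_key_id",
        "aws_secret_access_key": "your_aws_secret_key",
    }),
)

session = settings.Session()
session.add(conn)
session.commit()
```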

Notes:

  • Based on the above example, Airflow will try to use S3Hook('MyS3Conn') (a quick check is sketched below)
  • Expect your logs to follow this convention: {dag_id}/{task_id}/{execution_date}/{try_number}.log
  • Astronomer by default has a 15-day log retention period, which is hard-coded as an env var
    (ASTRONOMER__AIRFLOW__WORKER_LOG_RETENTION) and unfortunately not configurable
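
To confirm the connection can actually reach your bucket, here’s a quick sketch using the same hook (the import path is the Airflow 1.10 one, and the bucket name is an example):

```python
# Verify the S3 connection by checking that the hook can see the log bucket.
from airflow.hooks.S3_hook import S3Hook  # Airflow 2.x: airflow.providers.amazon.aws.hooks.s3

hook = S3Hook("MyS3Conn")
print(hook.check_for_bucket("my-bucket"))  # True if the credentials can reach the bucket
```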

Thank you for this!
One thing to note: the underscores in the environment variable names seem to be slightly off. Based on my configuration, the four env var keys should be:
AIRFLOW__CORE__REMOTE_LOGGING
AIRFLOW__CORE__REMOTE_LOG_CONN_ID
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER
AIRFLOW__CORE__ENCRYPT_S3_LOGS

Additionally, the values should be “True” or “False” rather than “TRUE” and “FALSE”.
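
Putting the corrected keys together with the example values from the walkthrough above (the connection ID and bucket path are just examples):

```
AIRFLOW__CORE__REMOTE_LOGGING=True
AIRFLOW__CORE__REMOTE_LOG_CONN_ID=MyS3Conn
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://my-bucket/path/to/logs
AIRFLOW__CORE__ENCRYPT_S3_LOGS=False
```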

Thanks for the update, @bryanlintonguild!

Additionally, make sure that the credentials supplied in the Airflow connection have the aws_access_key_id in the Login field and the aws_secret_access_key in the Password field. That should do the trick!
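
Expressed in code, that placement looks like this (a sketch; the values are placeholders):

```python
from airflow.models import Connection

# Credentials go in the Login/Password fields rather than the Extra JSON blob.
conn = Connection(
    conn_id="MyS3Conn",
    conn_type="s3",
    login="your_aws_key_id",         # Login field
    password="your_aws_secret_key",  # Password field
)
```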

How can I disable Airflow writing logs to the local machine (EC2) on Airflow 2.0.1?

Airflow logs have to be written to the local machine before they are shipped to S3; the S3 task handler only uploads the local log file once the task finishes.

You will need to pursue other remote logging options, like Elasticsearch, to satisfy that requirement.
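
For example, the Elasticsearch handler can write task logs to stdout instead of local files. A sketch of the relevant env vars (the host is a placeholder; note that in Airflow 2.x remote_logging lives under [logging] rather than [core]):

```
AIRFLOW__LOGGING__REMOTE_LOGGING=True
AIRFLOW__ELASTICSEARCH__HOST=your-es-host:9200
AIRFLOW__ELASTICSEARCH__WRITE_STDOUT=True
AIRFLOW__ELASTICSEARCH__JSON_FORMAT=True
```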