Log won't show in Airflow UI

We recently upgraded our Astronomer Enterprise platform to v0.23.10.
Since the upgrade, task logs no longer load in the Airflow UI.

The symptom is the same whether the task was successful or not.
We can see the loading animation but it just hangs forever and the page stays blank.
It’s doing the same thing in both our Airflow 2.0 and Airflow 1.10.5 deployments.

We get this exception in our webserver’s log when we try to access the task log:

[2021-02-09 13:50:21,194] {base.py:150} WARNING - POST http://astronomer-elasticsearch-nginx.bi-astronomer:9200/_count [status:N/A request:30.003s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 242, in perform_request
    method, url, body, retries=Retry(False), headers=request_headers, **kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 386, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.7/http/client.py", line 1277, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1323, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1272, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1032, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.7/http/client.py", line 972, in send
    self.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 167, in _new_conn
    % (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7fa1b58b0dd0>, 'Connection to astronomer-elasticsearch-nginx.bi-astronomer timed out. (connect timeout=30)')

Do you have any idea what could be causing this?

The behaviour is still the same after upgrading to v0.23.11.

Hi @Ubald! Thanks for reaching out, and I’m sorry to hear you’re still having trouble seeing logs in the Airflow UI on Astronomer’s latest.

Are you seeing logs populate in Kibana at all? Do you still see the “Connection to astronomer-elasticsearch-nginx.bi-astronomer timed out” error?

Given the potential complexity of this issue, can you reach out to Astronomer Support so our team can dig into the behavior you’re seeing in detail? We’re more than happy to help you there, and I’ll keep an eye on that ticket and post back takeaways here for anyone publicly following.

Hi Paola,
Webserver and scheduler logs seem to be showing normally in Kibana. I can see the “GET” requests being made by the webserver.
I see “Connection to astronomer-elasticsearch-nginx.bi-astronomer timed out” in the Kubernetes Dashboard for the webserver pod.

I made a ticket in Astronomer Support and we have a call this afternoon so hopefully we can get this working soon.

Thank you.


Hi Ubald,

I seem to get the same issue. Was the problem solved for you?

Best,
Vasu

There’s a NetworkPolicy in the astronomer namespace that is meant to allow communication between your Airflow webserver pods and the elasticsearch-nginx pod. This is how the Airflow UI fetches logs. There are some incorrect selectors in that NetworkPolicy that are preventing the communication.

You can view this with: kubectl get networkpolicy astronomer-elasticsearch-nginx-policy -n astronomer -o yaml
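If you want to see just the selector in question rather than the whole policy, a jsonpath query narrows the output. This is a sketch that assumes the selector below is the policy’s top-level spec.podSelector:

```shell
# Print only the pod selector labels from the NetworkPolicy
# (assumes the policy lives in the "astronomer" namespace)
kubectl get networkpolicy astronomer-elasticsearch-nginx-policy \
  -n astronomer \
  -o jsonpath='{.spec.podSelector.matchLabels}'
```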

The part we’re interested in is this:

  podSelector:
    matchLabels:
      component: elasticsearch
      release: astronomer
      role: nginx
      tier: logging
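You can confirm the mismatch yourself by listing the actual labels on the elasticsearch-nginx pod and comparing them against the selector above. A sketch; the grep pattern is just one way to find the pod:

```shell
# Show the labels currently on the elasticsearch-nginx pod
kubectl get pods -n astronomer --show-labels | grep elasticsearch-nginx
```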

Those selectors should match the labels on the elasticsearch-nginx pod, but they don’t. To make things work as intended, that section needs to be changed to:

  podSelector:
    matchLabels:
      component: es-ingress-controller
      release: astronomer
      tier: elasticsearch

You can do this by running kubectl edit networkpolicy astronomer-elasticsearch-nginx-policy -n astronomer and changing that block as described. The change takes effect immediately, so once it’s made you can check your task logs and they should show up.
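If you’d rather not open an interactive editor, the same edit can be applied with a JSON patch. This is a sketch that assumes the block above is the policy’s top-level spec.podSelector; a JSON patch replaces the whole matchLabels map, which also drops the stale role: nginx key:

```shell
# Replace the NetworkPolicy's pod selector labels in one shot
kubectl patch networkpolicy astronomer-elasticsearch-nginx-policy \
  -n astronomer --type json \
  -p '[{"op": "replace",
        "path": "/spec/podSelector/matchLabels",
        "value": {"component": "es-ingress-controller",
                  "release": "astronomer",
                  "tier": "elasticsearch"}}]'
```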

This change can get reverted during “helm upgrade” actions. If so, just perform the workaround above again. We have a PR to fix this in a future release.


Hi Vasu,
The solution posted by Kris fixed the issue for us.

Thank you.

Hi Kris,

Thanks for replying. I figured out what was going on in my case. The logs were not showing up only when the DAG was failing, and the DAG was failing because the task wouldn’t start within 30 seconds, which produced a timeout error. As the image below shows, the task hasn’t even started.

So I changed the AIRFLOW__ELASTICSEARCH_CONFIGS__TIMEOUT setting to 300 seconds instead of 30, and the DAGs are no longer failing. I believe this 30-second timeout is new in Airflow 2.0, because it wasn’t the case before.
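For anyone else hitting the same timeout, the change Vasu describes boils down to setting one environment variable in the Airflow environment. A sketch only; the variable name is taken from Vasu’s post above, and on Astronomer you would typically set it as a deployment environment variable rather than in a shell:

```shell
# Raise the Elasticsearch timeout from 30 s to 300 s
export AIRFLOW__ELASTICSEARCH_CONFIGS__TIMEOUT=300
```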

Best,
Vasu