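Right. It's a batch scheduler, so of course time is what matters..
Let's think once more about what the tool is for and which settings matter most because of that.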
logical_date (execution_date)
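- The start of the data window that a batch run processes.
- For example, if the schedule interval is 1 hour and runs start on the hour, the run that kicks off at 2:00 PM has an execution_date (logical_date) of 1:00 PM, because it processes the 1:00 to 2:00 PM window.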
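To make the window arithmetic concrete, here is a minimal Python sketch. hourly_data_interval is a hypothetical helper, not an Airflow API. It assumes an hourly schedule that fires on the hour (like the cron expression 0 * * * *) and ignores catchup, scheduler delays, and other real scheduler details.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def hourly_data_interval(trigger_time: datetime) -> tuple[datetime, datetime]:
    """Hypothetical helper (not an Airflow API).

    Models an hourly schedule that fires on the hour: the run triggered at
    (or just after) HH:00 processes the hour that just ended, so its
    logical_date is the start of that window.
    """
    # Snap the trigger time down to the top of the hour: this is the window end.
    end = trigger_time.replace(minute=0, second=0, microsecond=0)
    # The window start, one interval earlier, is what Airflow calls logical_date.
    start = end - timedelta(hours=1)
    return start, end


# The scheduler kicks off a run a few seconds after 2:00 PM UTC.
start, end = hourly_data_interval(datetime(2022, 3, 1, 14, 0, 5, tzinfo=timezone.utc))
print(start)  # 2022-03-01 13:00:00+00:00  (logical_date)
print(end)    # 2022-03-01 14:00:00+00:00
```

A run only starts once its window has closed, so logical_date trails the actual run time by one interval.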
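Airflow keeps time in UTC.
The timestamps in the logs were in UTC too. For other time zones, you can plug in a time zone so that times are shown in that region's local time. Internally, Airflow records everything in UTC.
If you set Airflow's time zone to the system zone or an IANA time zone instead of UTC, every worker node has to be configured with the same setting. (Ugh... let's just use UTC lol)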
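Why every node needs the same setting: a naive datetime (one without tzinfo) only becomes a real instant once some default zone is attached to it. This sketch uses a hypothetical helper, interpret_naive. Two fixed offsets stand in for two nodes configured differently: UTC, and KST (UTC+9, which has no DST). It is a simplified model, not Airflow's own code.

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9), "KST")  # Korea Standard Time, no DST


def interpret_naive(naive: datetime, default_tz: timezone) -> datetime:
    """Hypothetical helper: attach a node's default zone to a naive datetime
    and return the resulting instant expressed in UTC."""
    if naive.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    return naive.replace(tzinfo=default_tz).astimezone(timezone.utc)


start_date = datetime(2022, 3, 1, 9, 0)  # naive, as it might appear in a DAG file

node_a = interpret_naive(start_date, timezone.utc)  # node configured for UTC
node_b = interpret_naive(start_date, KST)           # node configured for KST
print(node_a)           # 2022-03-01 09:00:00+00:00
print(node_b)           # 2022-03-01 00:00:00+00:00
print(node_a - node_b)  # 9:00:00
```

The same line of DAG code yields two instants nine hours apart. With UTC everywhere, the question never comes up.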
References
- https://airflow.apache.org/docs/apache-airflow/stable/timezone.html?highlight=time
- https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html?highlight=logical_date
Time Zones
Support for time zones is enabled by default. Airflow stores datetime information in UTC internally and in the database. It allows you to run your DAGs with time zone dependent schedules. At the moment, Airflow does not convert them to the end user’s time zone in the user interface. It will always be displayed in UTC there. Also, templates used in Operators are not converted. Time zone information is exposed and it is up to the writer of DAG to decide what to do with it.
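One consequence of "templates used in Operators are not converted": a date-only template value such as ds comes from the UTC logical_date. Near midnight it can therefore name a different day than your local calendar. The sketch below models this with plain datetime. ds_from_logical_date is a hypothetical stand-in for the template variable. It assumes ds is the logical_date formatted as YYYY-MM-DD in UTC.

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9), "KST")  # UTC+9, no DST


def ds_from_logical_date(logical_date: datetime) -> str:
    """Hypothetical stand-in for the ds template value: the UTC date of
    logical_date formatted as YYYY-MM-DD, with no local-time conversion."""
    return logical_date.astimezone(timezone.utc).strftime("%Y-%m-%d")


logical_date = datetime(2022, 3, 1, 16, 0, tzinfo=timezone.utc)
print(ds_from_logical_date(logical_date))                 # 2022-03-01
print(logical_date.astimezone(KST).strftime("%Y-%m-%d"))  # 2022-03-02 (local date in Seoul)
```

If a task partitions data by local date, the DAG author has to do that conversion explicitly.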
This is handy if your users live in more than one time zone and you want to display datetime information according to each user’s wall clock.
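Showing one stored UTC timestamp on each user's wall clock is just a conversion at display time. In this sketch the user names are made up for illustration. Both zones, Seoul (UTC+9) and India (UTC+5:30), have no DST, so fixed offsets are exact here. Zones that observe DST would need real IANA zone data, for example zoneinfo.ZoneInfo.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical users and their wall-clock offsets (both zones have no DST).
USER_ZONES = {
    "minji": timezone(timedelta(hours=9), "KST"),
    "arjun": timezone(timedelta(hours=5, minutes=30), "IST"),
}

# What the database would hold: an aware datetime in UTC.
stored = datetime(2022, 3, 1, 5, 0, tzinfo=timezone.utc)

for user, tz in USER_ZONES.items():
    local = stored.astimezone(tz)  # same instant, different wall clock
    print(f"{user}: {local:%Y-%m-%d %H:%M %Z}")
# minji: 2022-03-01 14:00 KST
# arjun: 2022-03-01 10:30 IST
```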
Even if you are running Airflow in only one time zone, it is still good practice to store data in UTC in your database (before Airflow became time zone aware, this was also the recommended or even required setup). The main reason is that many countries use Daylight Saving Time (DST), where clocks are moved forward in spring and backward in autumn. If you’re working in local time, you’re likely to encounter errors twice a year, when the transitions happen. (The pendulum and pytz documentation discuss these issues in greater detail.) This probably doesn’t matter for a simple DAG, but it’s a problem if you are in, for example, financial services where you have end of day deadlines to meet.
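Here is the DST problem in a few lines. This sketch uses the standard-library zoneinfo module rather than pendulum. zoneinfo needs Python 3.9+ and IANA tz data, either on the system or via the tzdata package. The example zone is Europe/Amsterdam, where clocks went back from 03:00 to 02:00 on 2022-10-30.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

AMS = ZoneInfo("Europe/Amsterdam")

# 02:30 local time happened twice on 2022-10-30; `fold` picks the occurrence.
first = datetime(2022, 10, 30, 2, 30, tzinfo=AMS)           # still summer time
second = datetime(2022, 10, 30, 2, 30, fold=1, tzinfo=AMS)  # after clocks went back
print(first.utcoffset(), second.utcoffset())  # 2:00:00 1:00:00

# A "day" in local time is not always 24 hours long.
day_start = datetime(2022, 10, 30, 0, 0, tzinfo=AMS)
day_end = datetime(2022, 10, 31, 0, 0, tzinfo=AMS)
# Same tzinfo on both sides: Python subtracts the wall-clock times.
print(day_end - day_start)  # 1 day, 0:00:00
# Converting to UTC first gives the time that actually elapsed.
print(day_end.astimezone(timezone.utc) - day_start.astimezone(timezone.utc))  # 1 day, 1:00:00
```

The naive "end of day minus start of day" says 24 hours while 25 actually elapsed. Doing the arithmetic in UTC, as Airflow does internally, avoids that trap.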
The time zone is set in airflow.cfg. By default it is set to utc, but you can change it to use the system’s settings or an arbitrary IANA time zone, e.g. Europe/Amsterdam. It is dependent on pendulum, which is more accurate than pytz. Pendulum is installed when you install Airflow.
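The setting accepts three kinds of values: utc, system, or an IANA name. Each can be mapped to a tzinfo with the standard library. resolve_default_timezone is a hypothetical helper for illustration. Airflow itself resolves the setting through pendulum, and this sketch does not claim to reproduce its exact rules (for example, how case is handled).

```python
from datetime import datetime, timezone, tzinfo
from zoneinfo import ZoneInfo


def resolve_default_timezone(value: str) -> tzinfo:
    """Hypothetical helper: map a default_timezone value to a tzinfo."""
    setting = value.strip()
    if setting.lower() == "utc":
        return timezone.utc
    if setting.lower() == "system":
        # The local zone of the machine running this code: differs per host.
        return datetime.now().astimezone().tzinfo
    # Anything else is treated as an IANA name, e.g. "Europe/Amsterdam".
    return ZoneInfo(setting)


print(resolve_default_timezone("utc"))               # UTC
print(resolve_default_timezone("Europe/Amsterdam"))  # Europe/Amsterdam
```

The system branch is the risky one, because it silently takes whatever zone each host happens to be set to.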
Default time zone
The default time zone is the time zone defined by the default_timezone setting under [core]. If you just installed Airflow, it will be set to utc, which is recommended. You can also set it to system or an IANA time zone (e.g. Europe/Amsterdam). DAGs are also evaluated on Airflow workers, so it is important to make sure this setting is equal on all Airflow nodes.
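Since the setting must be equal on every node, a small check can compare it across copies of airflow.cfg. This is a hypothetical script, and the node names and file contents are made up. It only reads the [core] default_timezone key with configparser, falling back to utc (the value on a fresh install). It does not consider other ways the value can be supplied, such as the AIRFLOW__CORE__DEFAULT_TIMEZONE environment variable.

```python
from __future__ import annotations

import configparser


def read_default_timezone(cfg_text: str) -> str:
    """Return [core] default_timezone from airflow.cfg text, defaulting to 'utc'."""
    # interpolation=None: read values verbatim, without treating '%' specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # fallback also covers a missing [core] section.
    return parser.get("core", "default_timezone", fallback="utc").strip()


def find_mismatches(configs: dict[str, str]) -> dict[str, str]:
    """Map node name -> default_timezone, but only if the nodes disagree."""
    values = {node: read_default_timezone(text) for node, text in configs.items()}
    return values if len(set(values.values())) > 1 else {}


# Hypothetical airflow.cfg contents on three nodes.
configs = {
    "scheduler": "[core]\ndefault_timezone = utc\n",
    "worker-1": "[core]\ndefault_timezone = utc\n",
    "worker-2": "[core]\ndefault_timezone = Asia/Seoul\n",
}
print(find_mismatches(configs))
# {'scheduler': 'utc', 'worker-1': 'utc', 'worker-2': 'Asia/Seoul'}
```

An empty result means all nodes agree. The comparison is deliberately strict, so utc on one node and UTC on another would also be flagged.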