
210204 Thu - Installing Airflow on GKE

After testing Airflow locally with Docker, I decided to move the Airflow work onto GKE, for the following reasons:

  • The prevailing trend is to run on cloud services.
  • These days Kubernetes is the go-to for efficient service and resource management, so I want to learn how to run services on top of it.

 

I did get the work done, but I felt it calls for quite a bit of prerequisite knowledge:

  • Kubernetes and its related tools (Helm)
    • the concepts and architecture of these tools
    • how to write yaml files for Kubernetes and Helm charts
    • the commands
  • Getting comfortable with gcloud commands
    • the options used when creating a cluster

 

Once you absorb what you pick up while working, using the tool in a project stops being a problem. The trap is that the knowledge stays fragmented, so you can end up believing you understand the tool while missing the big picture.

This bit me recently, so these days I write up what I learn in Notion, and I also try to look up and document each tool's concepts and overall usage.


I referenced the following for this work.

 

In this post I do the following:

  • Create a Kubernetes cluster on GKE
  • Configure and deploy an Airflow environment with Helm and values.yaml
  • Expose the Airflow webserver on GKE through a GCP load balancer

 

On top of that, I modified the git-sync settings to hook up the DAGs from my own Git repo.

At first I tried to do this on my-first-cluster, the one the GKE tutorial provides, but its nodes ran out of resources, so I created a fresh cluster as in the tutorial above and started over.

Along the way I realized there is a lot left to study, such as cluster regions and zones.


Next up:

  • I'm not yet fluent at reading Helm charts, so I'm still working on shipping worker logs to Cloud Storage. I added the REMOTE-related environment variables, but something is off and it doesn't work yet (rough sketch of the direction after this list).
  • Test the DAGs
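
For the record, the direction I was attempting looks roughly like this in values.yaml. This is an untested sketch (it wasn't working for me yet): the chart's config: block is rendered into airflow.cfg, the bucket name and connection id are placeholders, and it also presumes the pods can authenticate to GCP (e.g. via Workload Identity or a mounted service-account key), which may well be the missing piece.

# values.yaml (sketch) - write task logs to GCS instead of the local volume
config:
  logging:
    remote_logging: 'True'
    remote_base_log_folder: 'gs://<my-log-bucket>/airflow/logs'
    remote_log_conn_id: 'google_cloud_default'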

 

Add and verify the Helm repo


helm repo add apache-airflow https://airflow.apache.org
"apache-airflow" has been added to your repositories


helm repo list
NAME            URL
stable          https://charts.helm.sh/stable
local           http://127.0.0.1:8879/charts
apache-airflow  https://airflow.apache.org
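
Before installing, you can also check which chart versions the repo serves:

helm search repo apache-airflow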


Install Helm 3


curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
bash ./get_helm.sh
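
To confirm the install went through, helm version prints the client version:

helm version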


 

Now install the chart, following the Airflow docs (https://airflow.apache.org/docs):


helm upgrade --install airflow apache-airflow/airflow --namespace airflow --debug

 


kubectl port-forward svc/airflow-webserver 7070:8080 --namespace airflow
Forwarding from 127.0.0.1:7070 -> 8080
Forwarding from [::1]:7070 -> 8080
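
With the forward in place, the UI is reachable at http://localhost:7070. Unless webserver.defaultUser in values.yaml was changed, the chart's create-user job sets up a default admin / admin login.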


For this I was using the 3-node cluster the GKE tutorial provides, but the port-forward to the Airflow webserver kept failing. Looking at the Workloads page, there was a warning that the nodes were out of resources. The pod status looked like this...

 


 kubectl get pod --namespace airflow
NAME                                 READY   STATUS                   RESTARTS        AGE
airflow-flower-5d59bf75fc-kfjfk      0/1     CrashLoopBackOff         6 (53s ago)     8m54s
airflow-postgresql-0                 1/1     Running                  0               66s
airflow-redis-0                      1/1     Running                  0               65s
airflow-scheduler-c7647fff-trj2n     2/2     Running                  1 (26m ago)     64m
airflow-statsd-7586f9998-mpkcz       1/1     Running                  0               8m53s
airflow-triggerer-799fbf6779-6m9sn   0/1     Init:0/1                 0               8m54s
airflow-webserver-85fb5d6b76-4s8jf   0/1     Error                    1               76m
airflow-webserver-85fb5d6b76-54f97   0/1     Evicted                  0               63m
airflow-webserver-85fb5d6b76-6wns7   0/1     Evicted                  0               63m
airflow-webserver-85fb5d6b76-7mkkp   0/1     Evicted                  0               63m
airflow-webserver-85fb5d6b76-8wfdt   0/1     Evicted                  0               63m
airflow-webserver-85fb5d6b76-96454   0/1     Evicted                  0               63m
airflow-webserver-85fb5d6b76-f7k8k   0/1     ContainerStatusUnknown   1               63m
airflow-webserver-85fb5d6b76-gl869   0/1     Evicted                  0               63m
airflow-webserver-85fb5d6b76-hvmk5   0/1     Evicted                  0               63m
airflow-webserver-85fb5d6b76-kt858   0/1     Evicted                  0               63m
airflow-webserver-85fb5d6b76-tffrv   0/1     Evicted                  0               63m
airflow-webserver-85fb5d6b76-wmv8j   0/1     CrashLoopBackOff         12 (4m5s ago)   58m
airflow-worker-0                     0/2     Init:0/1                 0               67s
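
To see why a given pod was evicted, kubectl describe prints the reason in its Status/Events section (typically node memory pressure). The pod name below is one from the listing above:

kubectl describe pod airflow-webserver-85fb5d6b76-54f97 --namespace airflow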


Deployment and service status:


kubectl get deployment,svc --namespace airflow
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/airflow-flower      1/1     1            1           10h
deployment.apps/airflow-scheduler   1/1     1            1           10h
deployment.apps/airflow-statsd      1/1     1            1           10h
deployment.apps/airflow-triggerer   1/1     1            1           10h
deployment.apps/airflow-webserver   0/1     1            0           10h

NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/airflow-flower                ClusterIP   10.108.11.47    <none>        5555/TCP            10h
service/airflow-postgresql            ClusterIP   10.108.9.229    <none>        5432/TCP            10h
service/airflow-postgresql-headless   ClusterIP   None            <none>        5432/TCP            10h
service/airflow-redis                 ClusterIP   10.108.10.23    <none>        6379/TCP            10h
service/airflow-statsd                ClusterIP   10.108.1.50     <none>        9125/UDP,9102/TCP   10h
service/airflow-webserver             ClusterIP   10.108.13.166   <none>        8080/TCP            10h
service/airflow-worker                ClusterIP   None            <none>        8793/TCP            10h


So I created a new cluster. I asked for 1 node but still got 3: for a regional cluster, --num-nodes is the node count per zone, and us-central1 spans three zones by default, hence 1 x 3 = 3 nodes.

 


 gcloud container clusters create airflow-cluster \
> --machine-type n1-standard-4 \
> --num-nodes 1 \
> --region "us-central1"
Default change: VPC-native is the default mode during cluster creation for versions greater than 1.21.0-gke.1500. To create advanced routes based clusters, please pass the `--no-enable-ip-alias` flag
Note: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s).
Creating cluster airflow-cluster in us-central1...done.
Created [https://container.googleapis.com/v1/projects/elt-pipeline/zones/us-central1/clusters/airflow-cluster].
To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-central1/airflow-cluster?project=elt-pipeline
kubeconfig entry generated for airflow-cluster.
NAME             LOCATION     MASTER_VERSION   MASTER_IP     MACHINE_TYPE   NODE_VERSION     NUM_NODES  STATUS
airflow-cluster  us-central1  1.21.6-gke.1500  34.66.248.71  n1-standard-4  1.21.6-gke.1500  3          RUNNING
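
If a single node is really the goal, one alternative (a sketch with the same flags) is a zonal cluster: --zone pins the cluster to one zone, so --num-nodes is applied only once.

gcloud container clusters create airflow-cluster \
  --machine-type n1-standard-4 \
  --num-nodes 1 \
  --zone us-central1-a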

 

Connect to the Kubernetes cluster on GKE


gcloud container clusters get-credentials airflow-cluster --region "us-central1"
Fetching cluster endpoint and auth data.
kubeconfig entry generated for airflow-cluster.

 

Create the namespace


kubectl create namespace airflow
namespace/airflow created

 

Install Airflow and verify

The airflow-flower and airflow-redis services exist for the CeleryExecutor. Later I'll switch this to the LocalExecutor.

 



helm upgrade --install airflow apache-airflow/airflow -n airflow --debug

kubectl get pod --namespace airflow
NAME                                 READY   STATUS    RESTARTS   AGE
airflow-flower-5d59bf75fc-m5vdb      1/1     Running   0          2m50s
airflow-postgresql-0                 1/1     Running   0          2m50s
airflow-redis-0                      1/1     Running   0          2m50s
airflow-scheduler-c7647fff-jkcrz     2/2     Running   0          2m50s
airflow-statsd-7586f9998-stmpc       1/1     Running   0          2m51s
airflow-triggerer-799fbf6779-ps267   1/1     Running   0          2m51s
airflow-webserver-7b4477d47c-kzfkz   1/1     Running   0          74s
airflow-worker-0                     2/2     Running   0          65s


kubectl get deployment,svc --namespace airflow
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/airflow-flower      1/1     1            1           5m1s
deployment.apps/airflow-scheduler   1/1     1            1           5m1s
deployment.apps/airflow-statsd      1/1     1            1           5m1s
deployment.apps/airflow-triggerer   1/1     1            1           5m1s
deployment.apps/airflow-webserver   1/1     1            1           5m1s

NAME                                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
service/airflow-flower                ClusterIP   10.48.14.186   <none>        5555/TCP            5m1s
service/airflow-postgresql            ClusterIP   10.48.2.227    <none>        5432/TCP            5m1s
service/airflow-postgresql-headless   ClusterIP   None           <none>        5432/TCP            5m1s
service/airflow-redis                 ClusterIP   10.48.8.106    <none>        6379/TCP            5m1s
service/airflow-statsd                ClusterIP   10.48.3.22     <none>        9125/UDP,9102/TCP   5m1s
service/airflow-webserver             ClusterIP   10.48.6.184    <none>        8080/TCP            5m1s
service/airflow-worker                ClusterIP   None           <none>        8793/TCP            5m1s


The webserver came up successfully and I logged in.

 

The Airflow deployment follows the settings in the Helm chart's values.yaml, so to change the configuration I copied the values.yaml file out of the chart.

The settings to change (see the sketch after this list):

  • switch the CeleryExecutor to the LocalExecutor
  • change the ClusterIP service to a LoadBalancer
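
A minimal sketch of those two edits, assuming the key paths of the chart version above. First dump the chart's default values to a file to edit:

helm show values apache-airflow/airflow > values.yaml

Then, in values.yaml:

# top-level executor setting
executor: "LocalExecutor"

webserver:
  service:
    # was ClusterIP; LoadBalancer asks GCP for an external IP
    type: LoadBalancer

After redeploying, kubectl get svc airflow-webserver --namespace airflow should eventually show an assigned EXTERNAL-IP.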

 

While combing through values.yaml to figure out how to hook up DAGs, I found the git-sync settings.

 


# Git sync
dags:
  persistence:
    # Enable persistent volume for storing dags
    enabled: false
    # Volume size for dags
    size: 1Gi
    # If using a custom storageClass, pass name here
    storageClassName:
    # access mode of the persistent volume
    accessMode: ReadWriteOnce
    ## the name of an existing PVC to use
    existingClaim:
  gitSync:
    enabled: false

    # git repo clone url
    # ssh examples ssh://git@github.com/apache/airflow.git
    # git@github.com:apache/airflow.git
    # https example: https://github.com/apache/airflow.git
    repo: https://github.com/apache/airflow.git
    branch: v2-2-stable
    rev: HEAD
    depth: 1
    # the number of consecutive failures allowed before aborting
    maxFailures: 0
    # subpath within the repo where dags are located
    # should be "" if dags are at repo root
    subPath: "tests/dags"
    # if your repo needs a user name password
    # you can load them to a k8s secret like the one below
    #   ---
    #   apiVersion: v1
    #   kind: Secret
    #   metadata:
    #     name: git-credentials
    #   data:
    #     GIT_SYNC_USERNAME: <base64_encoded_git_username>
    #     GIT_SYNC_PASSWORD: <base64_encoded_git_password>
    # and specify the name of the secret below
    #
    # credentialsSecret: git-credentials
    #
    #
    # If you are using an ssh clone url, you can load
    # the ssh private key to a k8s secret like the one below
    #   ---
    #   apiVersion: v1
    #   kind: Secret
    #   metadata:
    #     name: airflow-ssh-secret
    #   data:
    #     # key needs to be gitSshKey
    #     gitSshKey: <base64_encoded_data>
    # and specify the name of the secret below
    # sshKeySecret: airflow-ssh-secret
    #
    # If you are using an ssh private key, you can additionally
    # specify the content of your known_hosts file, example:
    #
    # knownHosts: |
    #    <host1>,<ip1> <key1>
    #    <host2>,<ip2> <key2>
    # interval between git sync attempts in seconds
    wait: 60
    containerName: git-sync
    uid: 65533
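
For a private repo, the commented-out example above amounts to creating the Secret and then referencing it via credentialsSecret; kubectl does the base64 encoding for you. A sketch with placeholder credentials (my test repo is public, so I didn't need this):

kubectl create secret generic git-credentials --namespace airflow \
  --from-literal=GIT_SYNC_USERNAME=<github-username> \
  --from-literal=GIT_SYNC_PASSWORD=<personal-access-token>

# then in values.yaml, under dags.gitSync:
# credentialsSecret: git-credentials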


Running helm upgrade --install with the modified values.yaml, the debug log confirms that only the changed resources get redeployed.


helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f values.yaml --debug

history.go:56: [debug] getting history for release airflow
upgrade.go:142: [debug] preparing upgrade for airflow
upgrade.go:150: [debug] performing update for airflow
upgrade.go:322: [debug] creating upgraded release for airflow
client.go:218: [debug] checking 24 resources for changes
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-create-user-job"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-migrate-database-job"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-scheduler"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-statsd"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-triggerer"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-webserver"
client.go:501: [debug] Looks like there are no changes for Secret "airflow-postgresql"
client.go:501: [debug] Looks like there are no changes for Secret "airflow-airflow-metadata"
client.go:510: [debug] Patch Secret "airflow-webserver-secret-key" in namespace airflow
client.go:510: [debug] Patch ConfigMap "airflow-airflow-config" in namespace airflow
client.go:501: [debug] Looks like there are no changes for Role "airflow-pod-launcher-role"
client.go:501: [debug] Looks like there are no changes for Role "airflow-pod-log-reader-role"
client.go:510: [debug] Patch RoleBinding "airflow-pod-launcher-rolebinding" in namespace airflow
client.go:501: [debug] Looks like there are no changes for RoleBinding "airflow-pod-log-reader-rolebinding"
client.go:501: [debug] Looks like there are no changes for Service "airflow-postgresql-headless"
client.go:501: [debug] Looks like there are no changes for Service "airflow-postgresql"
client.go:239: [debug] Created a new Service called "airflow-scheduler" in airflow

client.go:501: [debug] Looks like there are no changes for Service "airflow-statsd"
client.go:510: [debug] Patch Service "airflow-webserver" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-statsd" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-triggerer" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-webserver" in namespace airflow
client.go:510: [debug] Patch StatefulSet "airflow-postgresql" in namespace airflow
client.go:239: [debug] Created a new StatefulSet called "airflow-scheduler" in airflow

client.go:267: [debug] Deleting ServiceAccount "airflow-flower" in namespace airflow...
client.go:267: [debug] Deleting ServiceAccount "airflow-redis" in namespace airflow...
client.go:267: [debug] Deleting ServiceAccount "airflow-worker" in namespace airflow...
client.go:267: [debug] Deleting Secret "airflow-airflow-result-backend" in namespace airflow...
client.go:267: [debug] Deleting Service "airflow-flower" in namespace airflow...
client.go:267: [debug] Deleting Service "airflow-redis" in namespace airflow...
client.go:267: [debug] Deleting Service "airflow-worker" in namespace airflow...
client.go:267: [debug] Deleting Deployment "airflow-flower" in namespace airflow...
client.go:267: [debug] Deleting Deployment "airflow-scheduler" in namespace airflow...
client.go:267: [debug] Deleting StatefulSet "airflow-redis" in namespace airflow...
client.go:267: [debug] Deleting StatefulSet "airflow-worker" in namespace airflow...
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:328: [debug] jobs.batch "airflow-run-airflow-migrations" not found
client.go:128: [debug] creating 1 resource(s)
client.go:529: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:596: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:299: [debug] Starting delete for "airflow-create-user" Job
client.go:328: [debug] jobs.batch "airflow-create-user" not found
client.go:128: [debug] creating 1 resource(s)
client.go:529: [debug] Watching for changes to Job airflow-create-user with timeout of 5m0s
client.go:557: [debug] Add/Modify event for airflow-create-user: ADDED
client.go:596: [debug] airflow-create-user: Jobs active: 1, jobs failed: 0, jobs succeeded: 0

# ... (output omitted)


You can get Fernet Key value by running the following:

    echo Fernet Key: $(kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode)

###########################################################
#  WARNING: You should set a static webserver secret key  #
###########################################################

You are using a dynamically generated webserver secret key, which can lead to
unnecessary restarts of your Airflow components.

Information on how to set a static webserver secret key can be found here:
https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key
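
The fix from the linked production guide is to pin a static key in a Secret and point the chart at it; the secret and key names below are my own choices:

kubectl create secret generic my-webserver-secret --namespace airflow \
  --from-literal="webserver-secret-key=$(python3 -c 'import secrets; print(secrets.token_hex(16))')"

# values.yaml
webserverSecretKeySecretName: my-webserver-secret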


Because the LocalExecutor needs neither the Redis broker nor the Flower UI, those two pods were deleted and an airflow-scheduler service was created.

 

I pointed the git-sync section at my own GitHub repo and updated the pods again.

 

The modified parts:


# Git sync
dags:
  persistence:
    # Enable persistent volume for storing dags
    enabled: true
    # Volume size for dags
    size: 1Gi
    # If using a custom storageClass, pass name here
    storageClassName:
    # access mode of the persistent volume
    accessMode: ReadWriteOnce
    ## the name of an existing PVC to use
    existingClaim:
  gitSync:
    enabled: true
    repo: https://github.com/ymmu/airflow_test.git
    # repo: https://github.com/apache/airflow.git
    branch: dag_test
    # branch: v2-2-stable
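
Not shown in the diff above: subPath still carries the upstream example's tests/dags, and it has to match where the DAGs actually live in the connected repo, e.g. for DAGs at the repo root:

    # subpath within the repo where dags are located
    subPath: ""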

 

Update


helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f values.yaml --debug

# 패치 확인
client.go:510: [debug] Patch Secret "airflow-webserver-secret-key" in namespace airflow
client.go:510: [debug] Patch ConfigMap "airflow-airflow-config" in namespace airflow
client.go:239: [debug] Created a new PersistentVolumeClaim called "airflow-dags" in airflow

client.go:510: [debug] Patch Deployment "airflow-statsd" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-triggerer" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-webserver" in namespace airflow
client.go:510: [debug] Patch StatefulSet "airflow-postgresql" in namespace airflow
client.go:510: [debug] Patch StatefulSet "airflow-scheduler" in namespace airflow


 

Now I can see my DAG files in the Airflow UI.

I had also set persistence to true, and I could confirm that a DAG storage volume was created in the cluster's storage.
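
The volume also shows up as a claim via kubectl; the airflow-dags PVC created in the debug log above should be listed as Bound:

kubectl get pvc --namespace airflow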

Logging has a similar setting too. Once the logs land in storage, I should look into ways to load them into something like a SQL database, or otherwise make them browsable.


To check that git-sync works, I added a file and looked at the UI again; it synced fine. The yaml is set to sync every 60 seconds (the wait: 60 field above).
