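TL;DR

Using a tool called korb, I was able to migrate the Storage Class of existing PVs.
I hit an error that seemed to stem from a too-short timeout, and resolved it by using an option available in the latest version.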

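Motivation

As I wrote in earlier posts, I recently added a Synology NAS to my home server environment. With that in place, the natural next step was to switch the Storage Class of my existing Persistent Volumes (PVs) to one backed by an NFS share on the NAS.

However, rewriting the Kubernetes resources of the applications that use a PV while keeping the data inside it intact is somewhat tricky. After some research, I found an OSS tool called korb that targets exactly this use case, so I used it to change the Storage Class.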

Changing the Storage Class with korb

As a prerequisite, the new Storage Class is assumed to exist already.

korb provides precompiled binaries, so download one from GitHub.

$ curl -LO https://github.com/BeryJu/korb/releases/download/v2.3.2/korb_2.3.2_linux_amd64.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 10.9M  100 10.9M    0     0  5764k      0  0:00:01  0:00:01 --:--:-- 13.5M

$ tar -xvzf korb_2.3.2_linux_amd64.tar.gz
LICENSE
README.md
korb

$ ls
LICENSE  README.md  korb  korb_2.3.2_linux_amd64.tar.gz

Now let's change the Storage Class of the PV used as the data store for OpenSearch. This PV currently uses the nfs-volume1 Storage Class, and we will change it to nfs-volume2.

$ kubectl get pvc -n monitoring
NAME                                                                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
opensearch-cluster-master-opensearch-cluster-master-0                                                    Bound    pvc-ddde3b36-70b2-416d-b394-a2148242fe29   8Gi        RWO            nfs-volume1    15d

$ kubectl get sc
NAME          PROVISIONER                                           RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-volume1   k8s-sigs.io/nfs-subdir-external-provisioner-volume1   Delete          Immediate           false                  22d
nfs-volume2   k8s-sigs.io/nfs-subdir-external-provisioner-volume2   Delete          Immediate           false                  22d

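Temporarily delete the Pods that use the PV, then run the command with copy-twice-name as the strategy (it creates a new PV as a copy of the existing one and then swaps the name back to the original).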

$ ./korb --new-pvc-storage-class nfs-volume2 --source-namespace monitoring --strategy=copy-twice-name  opensearch-cluster-master-opensearch-cluster-master-0
...
WARN[0018] failed to copy                                component=mover-job error=EOF
WARN[0018] failed to copy                                component=mover-job error=EOF
WARN[0018] failed to copy                                component=mover-job error=EOF
INFO[0018] And we're done                                component=strategy strategy=copy-twice-name
INFO[0018] Cleaning up...                                component=strategy strategy=copy-twice-name

The many WARN messages are unsettling, but the migration appears to have succeeded in the end. Indeed, checking the PVC shows that the Storage Class has changed.

$ kubectl get pvc -n monitoring
NAME                                                                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
opensearch-cluster-master-opensearch-cluster-master-0                                                    Bound    pvc-65bbc892-eaa7-4cd0-b478-d7aa4f30ddee   8Gi        RWO            nfs-volume2    102s
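
The step of temporarily removing the Pods before running korb, and bringing them back afterwards, could be sketched as follows. This assumes the PVC belongs to a StatefulSet named opensearch-cluster-master that normally runs with a single replica (inferred from the PVC name, not stated above), so adjust the name and replica count to your setup. The run helper is a hypothetical dry-run wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

NAMESPACE=monitoring
# Assumption: the PVC is owned by this StatefulSet, normally run with 1 replica.
STATEFULSET=opensearch-cluster-master
REPLICAS=1

stop_pods() {
  # Scale to zero so nothing writes to the volume during the copy.
  run kubectl scale statefulset "$STATEFULSET" -n "$NAMESPACE" --replicas=0
  # List the Pods to confirm the one using the PVC has terminated.
  run kubectl get pods -n "$NAMESPACE"
}

start_pods() {
  # Restore the original replica count once korb reports "And we're done".
  run kubectl scale statefulset "$STATEFULSET" -n "$NAMESPACE" --replicas="$REPLICAS"
}

stop_pods
# ... run ./korb here, as in the command shown earlier ...
start_pods
```

Scaling the owning workload to zero rather than deleting the Pod directly keeps the controller from recreating the Pod while the copy is in progress.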

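Caveats when migrating large PVs

I carried out the migrations following the procedure above, but hit errors several times on PVs that contained, for example, a large number of files, so here is how I dealt with them.

The migration appears to progress partway, but after a few minutes the following error appears and the command is aborted.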

$ ./korb --strategy copy-twice-name --new-pvc-storage-class nfs-client-hdd-ds1522 --source-namespace monitoring prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 --timeout 3600s
DEBU[0000] Created client from kubeconfig                component=migrator kubeconfig=/home/localadmin/.kube/config
DEBU[0000] Got current namespace                         component=migrator namespace=default
DEBU[0000] Got Source PVC                                component=migrator name=prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 uid=97923774-c989-4e14-b349-1310bd49b80f
DEBU[0000] No new Name given, using old name             component=migrator
DEBU[0000] Compatible Strategies:                        component=migrator
DEBU[0000] Copy the PVC to the new Storage class and with new size and a new name, delete the old PVC, and copy it back to the old name.  component=migrator identifier=copy-twice-name
DEBU[0000] Export PVC content into a tar archive.        component=migrator identifier=export
DEBU[0000] Import data into a PVC from a tar archive.    component=migrator identifier=import
DEBU[0000] User selected strategy                        component=migrator identifier=copy-twice-name
DEBU[0000] Set timeout from PVC size                     component=strategy strategy=copy-twice-name timeout=1m0s
WARN[0000] This strategy assumes you've stopped all pods accessing this data.  component=strategy strategy=copy-twice-name
DEBU[0000] creating temporary PVC                        component=strategy stage=1 strategy=copy-twice-name
DEBU[0000] skipping waiting for PVC to be bound          component=strategy stage=2 strategy=copy-twice-name
DEBU[0000] starting mover job                            component=strategy stage=2 strategy=copy-twice-name
DEBU[0002] Pod not in correct state yet                  component=mover-job phase=Pending
...
WARN[0064] Failed to move data                           component=strategy error="client rate limiter Wait returned an error: context deadline exceeded" strategy=copy-twice-name
INFO[0064] Cleaning up...                                component=strategy strategy=copy-twice-name

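Searching for the error message suggests that it is a timeout inside the Kubernetes API client library for Go.

After some trial and error, I resolved it by updating korb to the latest version (I had been using v2.3.2, so I cloned commit cd58f99e5029a770ef771704a6fe1f1fa9d57404 and built it myself) and specifying the copyTimeout option.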

$ ./hogehoge --strategy copy-twice-name --new-pvc-storage-class nfs-client-hdd-ds1522 --source-namespace monitoring prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 --timeout 3600s --copyTimeout 3600s
DEBU[0000] Created client from kubeconfig                component=migrator kubeconfig=/home/localadmin/.kube/config
DEBU[0000] Got current namespace                         component=migrator namespace=default
DEBU[0000] Got Source PVC                                component=migrator name=prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 uid=97923774-c989-4e14-b349-1310bd49b80f
DEBU[0000] No new Name given, using old name             component=migrator
DEBU[0000] Compatible Strategies:                        component=migrator
DEBU[0000] Copy the PVC to the new Storage class and with new size and a new name, delete the old PVC, and copy it back to the old name.  component=migrator identifier=copy-twice-name
DEBU[0000] Export PVC content into a tar archive.        component=migrator identifier=export
DEBU[0000] Import data into a PVC from a tar archive.    component=migrator identifier=import
DEBU[0000] User selected strategy                        component=migrator identifier=copy-twice-name
DEBU[0000] Set timeout from PVC size                     component=strategy strategy=copy-twice-name timeout=1h0m0s
WARN[0000] This strategy assumes you've stopped all pods accessing this data.  component=strategy strategy=copy-twice-name
DEBU[0000] creating temporary PVC                        component=strategy stage=1 strategy=copy-twice-name
DEBU[0000] skipping waiting for PVC to be bound          component=strategy stage=2 strategy=copy-twice-name
DEBU[0000] starting mover job                            component=strategy stage=2 strategy=copy-twice-name
DEBU[0002] Pod not in correct state yet                  component=mover-job phase=Pending
DEBU[0144] Cleaning up successful job                    component=mover-job
DEBU[0144] deleting original PVC                         component=strategy stage=3 strategy=copy-twice-name
DEBU[0144] Waiting for PVC Deletion, retrying            component=strategy pvc-name=prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 strategy=copy-twice-name
DEBU[0146] creating final destination PVC                component=strategy stage=4 strategy=copy-twice-name
DEBU[0146] starting mover job to final PVC               component=strategy stage=5 strategy=copy-twice-name
DEBU[0148] Pod not in correct state yet                  component=mover-job phase=Pending
DEBU[0330] Cleaning up successful job                    component=mover-job
DEBU[0330] deleting temporary PVC                        component=strategy stage=6 strategy=copy-twice-name
DEBU[0330] Waiting for PVC Deletion, retrying            component=strategy pvc-name=prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0-copy-1734180022 strategy=copy-twice-name
INFO[0332] And we're done                                component=strategy strategy=copy-twice-name
INFO[0332] Cleaning up...                                component=strategy strategy=copy-twice-name
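
For reference, the self-build of korb at that commit could look roughly like the sketch below. It assumes git and a Go toolchain are installed and that the main package sits at the repository root; the checkout directory korb-src is an arbitrary name, and the output name hogehoge simply matches the binary used in the command above. The run helper is a hypothetical dry-run wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

KORB_REPO=https://github.com/BeryJu/korb.git
KORB_COMMIT=cd58f99e5029a770ef771704a6fe1f1fa9d57404

build_korb() {
  # Clone into korb-src so the checkout does not collide with an existing
  # ./korb binary from the release tarball.
  run git clone "$KORB_REPO" korb-src
  run cd korb-src
  # Pin the build to the exact commit mentioned in the text.
  run git checkout "$KORB_COMMIT"
  # Assumption: the main package is at the repository root.
  run go build -o hogehoge .
}

build_korb
```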

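This is a guess, but korb appears to set the timeout automatically from the data size, and the copy was presumably cancelled because it did not finish within that time. Setting the timeout manually therefore lets the copy finish normally before it is cancelled.

Note that even when an error stops the command, the Kubernetes Job keeps running in the background. After an error, you must delete this Job before you can run the command again.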
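
Cleaning up the leftover Job before retrying could look like the sketch below. The Job name is deliberately left for you to supply, since it is not shown in the logs above: list the Jobs first, then pass the one korb left behind via JOB_NAME. It assumes the Job lives in the same namespace as the PVC (monitoring here). The run helper is a hypothetical dry-run wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumption: the mover Job runs in the namespace of the source PVC.
NAMESPACE=monitoring

cleanup_job() {
  # List Jobs to find the one left behind by the failed run.
  run kubectl get jobs -n "$NAMESPACE"
  if [ -z "${JOB_NAME:-}" ]; then
    echo "Set JOB_NAME to the leftover Job, then run this again." >&2
    return 0
  fi
  # Delete the leftover Job so korb can be started again.
  run kubectl delete job "$JOB_NAME" -n "$NAMESPACE"
}

cleanup_job
```

For example, `DRY_RUN=0 JOB_NAME=<name from the list> sh cleanup.sh` would perform the deletion, where cleanup.sh is whatever file you saved the sketch to.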