Note:
- Oracle provides this tutorial in a free lab environment.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Scale a Kubernetes Cluster on Oracle Cloud Native Environment
Introduction
This tutorial demonstrates how to scale an existing Kubernetes cluster in Oracle Cloud Native Environment.
To scale up a Kubernetes cluster means adding nodes; likewise, scaling down occurs by removing nodes. Nodes can be either control plane or worker nodes. Oracle recommends against scaling the cluster up and down at the same time. Instead, perform a scale up and then a scale down in two separate commands.
To avoid split-brain scenarios and maintain the quorum, it is recommended to scale the Kubernetes cluster's control plane or worker nodes in odd numbers. For example, 3, 5, or 7 control plane or worker nodes helps ensure the reliability of the cluster.
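As a quick illustration of why odd numbers are preferred, the quorum arithmetic can be sketched in a few lines of shell (a sketch, assuming a simple majority quorum as used by etcd on the control plane):
# Quorum for an n-member control plane is floor(n/2) + 1; fault tolerance is n - quorum.
# Note that growing from 3 to 4 members raises the quorum without raising the tolerance.
for n in 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
done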
This tutorial uses an existing Highly Available Kubernetes cluster running on Oracle Cloud Native Environment with three modules deployed:
- Kubernetes (kubernetes)
- Helm (helm)
- Oracle Cloud Infrastructure Cloud Controller Manager module (oci-ccm)
The starting deployment consists of the following:
- 1 operator node
- 3 control plane nodes
- 5 worker nodes
It builds upon the labs:
- Deploy Oracle Cloud Native Environment
- Deploy an External Load Balancer with Oracle Cloud Native Environment
- Use OCI Cloud Controller Manager on Oracle Cloud Native Environment
Objectives
This tutorial/lab walks through configuring and adding two new control plane nodes and two new worker nodes to the cluster. The tutorial/lab then shows how to scale the cluster back down by removing the same nodes from the cluster.
In this scenario, X.509 private CA certificates are used to secure communication between the nodes. There are other methods to manage and deploy the certificates, such as using the HashiCorp Vault secrets manager, or using your own certificates signed by a trusted Certificate Authority (CA). These other methods are not covered in this tutorial.
Prerequisites
Note: If using the free lab environment, these prerequisites are provided as the starting point.
In addition to the requirement of a Highly Available Kubernetes cluster running on Oracle Cloud Native Environment, the following are needed:
- 4 additional Oracle Linux systems to use as:
  - 2 Kubernetes control plane nodes
  - 2 Kubernetes worker nodes
- Access to a load balancer (the free lab environment uses an OCI load balancer)
- The systems should have:
  - A minimum of the latest Oracle Linux 8 (x86_64) installed and running the Unbreakable Enterprise Kernel Release 6 (UEK R6).
  - Completed the prerequisite steps for installing Oracle Cloud Native Environment
Set Up the Lab Environment
Note: When using the free lab environment, see Oracle Linux Lab Basics for connection and other usage instructions.
Information: The free lab environment deploys Oracle Cloud Native Environment on the provided nodes, ready for creating environments. This deployment takes approximately 20-25 minutes to finish after launch. Therefore, you might want to step away while this runs and then return to complete the lab.
Unless stated otherwise, all steps within the free lab environment can be performed from the ocne-operator node, and it is recommended to begin by opening a terminal window and connecting to that node. In a multi-node installation of Oracle Cloud Native Environment, the kubectl commands run on the operator node, a control plane node, or another system configured for kubectl.
- Open a terminal and connect via ssh to the ocne-operator system.
ssh oracle@<ip_address_of_ol_node>
Install the Kubernetes Module
Important: All operations are performed from the ocne-operator node unless otherwise stated.
The free lab environment creates a Highly Available Oracle Cloud Native Environment installation during deployment, including preparing the environment and the module configuration.
- View the myenvironment.yaml file.
cat ~/myenvironment.yaml
The free lab environment deployment uses three control plane nodes and five worker nodes when creating the cluster.
Example Output:
[oracle@ocne-operator ~]$ cat ~/myenvironment.yaml
environments:
  - environment-name: myenvironment
    globals:
      api-server: 127.0.0.1:8091
      secret-manager-type: file
      olcne-ca-path: /etc/olcne/configs/certificates/production/ca.cert
      olcne-node-cert-path: /etc/olcne/configs/certificates/production/node.cert
      olcne-node-key-path: /etc/olcne/configs/certificates/production/node.key
    modules:
      - module: kubernetes
        name: mycluster
        args:
          container-registry: container-registry.oracle.com/olcne
          load-balancer: 10.0.0.168:6443
          master-nodes:
            - ocne-control01.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-control02.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-control03.lv.vcnf998d566.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-worker02.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-worker03.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-worker04.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-worker05.lv.vcnf998d566.oraclevcn.com:8090
          selinux: enforcing
          restrict-service-externalip: true
          restrict-service-externalip-ca-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/ca.cert
          restrict-service-externalip-tls-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/node.cert
          restrict-service-externalip-tls-key: /etc/olcne/configs/certificates/restrict_external_ip/production/node.key
      - module: helm
        name: myhelm
        args:
          helm-kubernetes-module: mycluster
      - module: oci-ccm
        name: myoci
        oci-ccm-helm-module: myhelm
        oci-use-instance-principals: true
        oci-compartment: ocid1.compartment.oc1..aaaaaaaau2g2k23u6mp3t43ky3i4ky7jpyeiqcdcobpbcb7z6vjjlrdnuufq
        oci-vcn: ocid1.vcn.oc1.eu-frankfurt-1.amaaaaaaw6qx2pia2xkfmnnknpk3jll6emb76gtcza3ttbqqofxmwjb45rka
        oci-lb-subnet1: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaawfjs5zrb6wdmg43522a4l5aak5zr6vvkaaa6xogttha2ufsip7fq
The FQDN domain portion of the nodes is unique to each deployment of the free lab environment.
- Install the Kubernetes module.
olcnectl module install --config-file myenvironment.yaml
Note: It takes 20-25 minutes for the deployment of Kubernetes to the nodes to complete.
Example Output:
[oracle@ocne-operator ~]$ olcnectl module install --config-file myenvironment.yaml
Modules installed successfully.
Modules installed successfully.
Modules installed successfully.
Why are there three 'Modules installed successfully' responses? This is because the myenvironment.yaml file used in this example defines three separate modules:
- module: kubernetes
- module: helm
- module: oci-ccm
It is important to understand this because some of the later steps also return three responses, one for each module defined in the myenvironment.yaml file.
- Verify the deployment of the Kubernetes module.
olcnectl module instances --config-file myenvironment.yaml
Example Output:
[oracle@ocne-operator ~]$ olcnectl module instances --config-file myenvironment.yaml
INSTANCE                                           MODULE      STATE
mycluster                                          kubernetes  installed
myhelm                                             helm        installed
myoci                                              oci-ccm     installed
ocne-control01.lv.vcnf998d566.oraclevcn.com:8090   node        installed
ocne-control02.lv.vcnf998d566.oraclevcn.com:8090   node        installed
ocne-control03.lv.vcnf998d566.oraclevcn.com:8090   node        installed
ocne-worker01.lv.vcnf998d566.oraclevcn.com:8090    node        installed
ocne-worker02.lv.vcnf998d566.oraclevcn.com:8090    node        installed
ocne-worker03.lv.vcnf998d566.oraclevcn.com:8090    node        installed
ocne-worker04.lv.vcnf998d566.oraclevcn.com:8090    node        installed
ocne-worker05.lv.vcnf998d566.oraclevcn.com:8090    node        installed
Set Up kubectl
- Set up the kubectl command.
  - Copy the configuration file from one of the control plane nodes.
mkdir -p $HOME/.kube
ssh -o StrictHostKeyChecking=no 10.0.0.150 "sudo cat /etc/kubernetes/admin.conf" > $HOME/.kube/config
Example Output:
[oracle@ocne-operator ~]$ mkdir -p $HOME/.kube
[oracle@ocne-operator ~]$ ssh -o StrictHostKeyChecking=no 10.0.0.150 "sudo cat /etc/kubernetes/admin.conf" > $HOME/.kube/config
Warning: Permanently added '10.0.0.150' (ECDSA) to the list of known hosts.
  - Export the configuration for use by the kubectl command.
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> $HOME/.bashrc
- Verify that kubectl works.
kubectl get nodes
Example Output:
[oracle@ocne-operator ~]$ kubectl get nodes
NAME             STATUS   ROLES                  AGE   VERSION
ocne-control01   Ready    control-plane,master   17m   v1.23.7+1.el8
ocne-control02   Ready    control-plane,master   16m   v1.23.7+1.el8
ocne-control03   Ready    control-plane,master   15m   v1.23.7+1.el8
ocne-worker01    Ready    <none>                 16m   v1.23.7+1.el8
ocne-worker02    Ready    <none>                 15m   v1.23.7+1.el8
ocne-worker03    Ready    <none>                 14m   v1.23.7+1.el8
ocne-worker04    Ready    <none>                 15m   v1.23.7+1.el8
ocne-worker05    Ready    <none>                 15m   v1.23.7+1.el8
[oracle@ocne-operator ~]$
Confirm the Oracle Cloud Infrastructure Cloud Controller Manager Module is Ready
Before proceeding, it is essential to wait for the Oracle Cloud Infrastructure Cloud Controller Manager module to establish communication with the OCI API. The Oracle Cloud Infrastructure Cloud Controller Manager module runs a pod on each node that handles functionality such as attaching block storage. After installation, this controller blocks the scheduling of any pods until this dedicated pod confirms that it is initialized, running, and communicating with the OCI API. Until this communication is successfully established, any attempt to proceed is likely to prevent Kubernetes from successfully using cloud storage or load balancers.
- Retrieve the status of the pods with the component oci-cloud-controller-manager.
kubectl -n kube-system get pods -l component=oci-cloud-controller-manager
Example Output:
[oracle@ocne-operator ~]$ kubectl -n kube-system get pods -l component=oci-cloud-controller-manager
NAME                                 READY   STATUS    RESTARTS      AGE
oci-cloud-controller-manager-9d9gh   1/1     Running   1 (48m ago)   50m
oci-cloud-controller-manager-jqzs6   1/1     Running   0             50m
oci-cloud-controller-manager-xfm9w   1/1     Running   0             50m
- Retrieve the status of the pods with the role csi-oci.
kubectl -n kube-system get pods -l role=csi-oci
Example Output:
[oracle@ocne-operator ~]$ kubectl -n kube-system get pods -l role=csi-oci
NAME                                  READY   STATUS    RESTARTS      AGE
csi-oci-controller-7fcbddd746-2hb5c   4/4     Running   2 (50m ago)   51m
csi-oci-node-7jd6t                    3/3     Running   0             51m
csi-oci-node-fc5x5                    3/3     Running   0             51m
csi-oci-node-jq8sm                    3/3     Running   0             51m
csi-oci-node-jqkvl                    3/3     Running   0             51m
csi-oci-node-jwq8g                    3/3     Running   0             51m
csi-oci-node-jzxqt                    3/3     Running   0             51m
csi-oci-node-rmmmb                    3/3     Running   0             51m
csi-oci-node-zc287                    1/3     Running   0             51m
Note: Wait until both of these commands show a STATUS of Running before proceeding.
If the values under the READY column do not show all containers as started, and the values under the STATUS column do not show Running after 15 minutes, restart the lab.
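Rather than re-running the two commands above manually, the wait can also be scripted. A minimal sketch (assuming the same pod labels and the 15-minute guidance above); each command blocks until the matching pods report Ready or the timeout expires:
kubectl -n kube-system wait --for=condition=Ready pod -l component=oci-cloud-controller-manager --timeout=15m
kubectl -n kube-system wait --for=condition=Ready pod -l role=csi-oci --timeout=15m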
(Optional) Set Up the New Kubernetes Nodes
Note: The steps in this section are not required in the free lab environment because they were completed during the initial deployment of the lab. Please skip to the next section and continue.
When scaling up (adding nodes), any new nodes need to meet all of the prerequisites listed in the Prerequisites section of this tutorial.
In this tutorial/lab, the nodes ocne-control04 and ocne-control05 are the new control plane nodes, and the nodes ocne-worker06 and ocne-worker07 are the new worker nodes. In addition to the prerequisites, these new nodes require the Oracle Cloud Native Environment Platform Agent to be installed and enabled.
- Install and enable the Platform Agent.
sudo dnf install olcne-agent olcne-utils
sudo systemctl enable olcne-agent.service
- If you use a proxy server, configure it with CRI-O. On each Kubernetes node, create a CRI-O systemd configuration directory. Create a file named proxy.conf in the directory and add the proxy server information.
sudo mkdir /etc/systemd/system/crio.service.d
sudo vi /etc/systemd/system/crio.service.d/proxy.conf
- Use the example proxy.conf file and substitute the proxy values appropriate for your environment. If the crio service is already present and running on a node, also see the reload note after these steps.
[Service]
Environment="HTTP_PROXY=proxy.example.com:80"
Environment="HTTPS_PROXY=proxy.example.com:80"
Environment="NO_PROXY=.example.com,192.0.2.*"
- If either the docker or containerd services are running, stop and disable them.
sudo systemctl disable --now docker.service
sudo systemctl disable --now containerd.service
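Note: If the crio service already exists and is running on a node when the proxy.conf drop-in is created, systemd must re-read its unit files before the proxy settings take effect. A sketch (try-restart only restarts the service if it is already running; before the Kubernetes module is deployed, CRI-O is typically not yet installed and this is not needed):
sudo systemctl daemon-reload
sudo systemctl try-restart crio.service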
Set Up X.509 Private CA Certificates
Set up X.509 private CA certificates for the new control plane and worker nodes.
- Create a list of the new nodes.
VAR1=$(hostname -d)
for NODE in 'ocne-control04' 'ocne-control05' 'ocne-worker06' 'ocne-worker07'; do VAR2+="${NODE}.$VAR1,"; done
VAR2=${VAR2%,}
This bash snippet grabs the domain name from the operator node and creates a comma-separated list of the nodes to add to the cluster during the scale up.
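For example, with the free lab environment domain shown earlier (a hypothetical value; the domain portion differs in every deployment), the variables expand to something like:
[oracle@ocne-operator ~]$ echo $VAR1
lv.vcnf998d566.oraclevcn.com
[oracle@ocne-operator ~]$ echo $VAR2
ocne-control04.lv.vcnf998d566.oraclevcn.com,ocne-control05.lv.vcnf998d566.oraclevcn.com,ocne-worker06.lv.vcnf998d566.oraclevcn.com,ocne-worker07.lv.vcnf998d566.oraclevcn.com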
- Generate a private CA and certificate set for the new nodes.
Use the --byo-ca-cert option to specify the location of the existing CA certificate and the --byo-ca-key option to specify the location of the existing CA key. Use the --nodes option and provide the FQDN of the new control plane and worker nodes.
cd /etc/olcne
sudo ./gen-certs-helper.sh \
--cert-dir /etc/olcne/configs/certificates/ \
--byo-ca-cert /etc/olcne/configs/certificates/production/ca.cert \
--byo-ca-key /etc/olcne/configs/certificates/production/ca.key \
--nodes $VAR2
Example Output:
[oracle@ocne-operator ~]$ cd /etc/olcne
[oracle@ocne-operator olcne]$ sudo ./gen-certs-helper.sh \
> --cert-dir /etc/olcne/configs/certificates/ \
> --byo-ca-cert /etc/olcne/configs/certificates/production/ca.cert \
> --byo-ca-key /etc/olcne/configs/certificates/production/ca.key \
> --nodes $VAR2
[INFO] Generating certs for ocne-control04.lv.vcnf998d566.oraclevcn.com
Generating RSA private key, 2048 bit long modulus (2 primes)
.............................+++++
....................+++++
e is 65537 (0x010001)
Signature ok
subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
Getting CA Private Key
[INFO] Generating certs for ocne-control05.lv.vcnf998d566.oraclevcn.com
Generating RSA private key, 2048 bit long modulus (2 primes)
...+++++
...........................................................+++++
e is 65537 (0x010001)
Signature ok
subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
Getting CA Private Key
[INFO] Generating certs for ocne-worker06.lv.vcnf998d566.oraclevcn.com
Generating RSA private key, 2048 bit long modulus (2 primes)
......+++++
.......................+++++
e is 65537 (0x010001)
Signature ok
subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
Getting CA Private Key
[INFO] Generating certs for ocne-worker07.lv.vcnf998d566.oraclevcn.com
Generating RSA private key, 2048 bit long modulus (2 primes)
....................................................................................+++++
.................................+++++
e is 65537 (0x010001)
Signature ok
subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
Getting CA Private Key
-----------------------------------------------------------
Script To Transfer Certs: /etc/olcne/configs/certificates/olcne-tranfer-certs.sh
-----------------------------------------------------------
[SUCCESS] Generated certs and file transfer script!
[INFO] CA Cert: /etc/olcne/configs/certificates/production/ca.key
[INFO] CA Key: /etc/olcne/configs/certificates/production/ca.cert
[WARNING] The CA Key is the only way to generate more certificates, ensure it is stored in long term storage
[USER STEP #1] Please ensure you have ssh access from this machine to: ocne-control04.lv.vcnf998d566.oraclevcn.com,ocne-control05.lv.vcnf998d566.oraclevcn.com,ocne-worker06.lv.vcnf998d566.oraclevcn.com,ocne-worker07.lv.vcnf998d566.oraclevcn.com
Transfer Certificates
Transfer the newly created certificates from the operator node to all of the new nodes.
- Update the user details in the provided transfer script.
sudo sed -i 's/USER=opc/USER=oracle/g' configs/certificates/olcne-tranfer-certs.sh
Note: This step is required in the tutorial because the script's default user is opc. Since both this tutorial and the free lab environment install the product using the user oracle, update the USER variable in the script accordingly.
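Optionally, confirm the substitution took effect before running the transfer script (a quick check; the output should now contain USER=oracle):
grep -n 'USER=' configs/certificates/olcne-tranfer-certs.sh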
- Update the permissions of each node.key generated by the certificate creation script.
sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-control*/node.key
sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-operator*/node.key
sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-worker*/node.key
- Transfer the certificates to each new node.
Note: This step requires passwordless SSH configured between the nodes. Configuring this is outside the scope of this tutorial, but it is preconfigured in the free lab environment.
bash -ex /etc/olcne/configs/certificates/olcne-tranfer-certs.sh
Configure the Platform Agent to Use the Certificates
Configure the Platform Agent on each of the new nodes to use the certificates copied over in the previous step. We accomplish this task from the operator node by running the commands over ssh.
- Configure the ocne-control04 node.
ssh -o StrictHostKeyChecking=no ocne-control04 'sudo /etc/olcne/bootstrap-olcne.sh \
--secret-manager-type file \
--olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
--olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
--olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
--olcne-component agent'
Example Output:
[oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-control04 'sudo /etc/olcne/bootstrap-olcne.sh \
> --secret-manager-type file \
> --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
> --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
> --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
> --olcne-component agent'
Warning: Permanently added 'ocne-control04,10.0.0.153' (ECDSA) to the list of known hosts.
● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
   Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/olcne-agent.service.d
           └─10-auth.conf
   Active: active (running) since Tue 2022-08-30 15:29:37 GMT; 2s ago
 Main PID: 152809 (olcne-agent)
    Tasks: 8 (limit: 202294)
   Memory: 11.1M
   CGroup: /system.slice/olcne-agent.service
           └─152809 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
Aug 30 15:29:37 ocne-control04 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:29:37 ocne-control04 olcne-agent[152809]: time=30/08/22 15:29:37 level=info msg=Started server on[::]:8090
- Configure the ocne-control05 node.
ssh -o StrictHostKeyChecking=no ocne-control05 'sudo /etc/olcne/bootstrap-olcne.sh \
--secret-manager-type file \
--olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
--olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
--olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
--olcne-component agent'
Example Output:
[oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-control05 'sudo /etc/olcne/bootstrap-olcne.sh \
> --secret-manager-type file \
> --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
> --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
> --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
> --olcne-component agent'
Warning: Permanently added 'ocne-control05,10.0.0.154' (ECDSA) to the list of known hosts.
● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
   Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/olcne-agent.service.d
           └─10-auth.conf
   Active: active (running) since Tue 2022-08-30 15:34:13 GMT; 2s ago
 Main PID: 153413 (olcne-agent)
    Tasks: 7 (limit: 202294)
   Memory: 9.1M
   CGroup: /system.slice/olcne-agent.service
           └─153413 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
Aug 30 15:34:13 ocne-control05 systemd[1]: olcne-agent.service: Succeeded.
Aug 30 15:34:13 ocne-control05 systemd[1]: Stopped Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:34:13 ocne-control05 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:34:13 ocne-control05 olcne-agent[153413]: time=30/08/22 15:34:13 level=info msg=Started server on[::]:8090
- Configure the ocne-worker06 node.
ssh -o StrictHostKeyChecking=no ocne-worker06 'sudo /etc/olcne/bootstrap-olcne.sh \
--secret-manager-type file \
--olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
--olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
--olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
--olcne-component agent'
Example Output:
[oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-worker06 'sudo /etc/olcne/bootstrap-olcne.sh \
> --secret-manager-type file \
> --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
> --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
> --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
> --olcne-component agent'
Warning: Permanently added 'ocne-worker06,10.0.0.165' (ECDSA) to the list of known hosts.
● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
   Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/olcne-agent.service.d
           └─10-auth.conf
   Active: active (running) since Tue 2022-08-30 15:41:08 GMT; 2s ago
 Main PID: 153988 (olcne-agent)
    Tasks: 8 (limit: 202294)
   Memory: 5.2M
   CGroup: /system.slice/olcne-agent.service
           └─153988 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
Aug 30 15:41:08 ocne-worker06 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:41:08 ocne-worker06 olcne-agent[153988]: time=30/08/22 15:41:08 level=info msg=Started server on[::]:8090
- Configure the ocne-worker07 node.
ssh -o StrictHostKeyChecking=no ocne-worker07 'sudo /etc/olcne/bootstrap-olcne.sh \
--secret-manager-type file \
--olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
--olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
--olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
--olcne-component agent'
Example Output:
[oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-worker07 'sudo /etc/olcne/bootstrap-olcne.sh \
> --secret-manager-type file \
> --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
> --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
> --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
> --olcne-component agent'
Warning: Permanently added 'ocne-worker07,10.0.0.166' (ECDSA) to the list of known hosts.
● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
   Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/olcne-agent.service.d
           └─10-auth.conf
   Active: active (running) since Tue 2022-08-30 15:43:23 GMT; 2s ago
 Main PID: 154734 (olcne-agent)
    Tasks: 8 (limit: 202294)
   Memory: 9.1M
   CGroup: /system.slice/olcne-agent.service
           └─154734 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
Aug 30 15:43:23 ocne-worker07 systemd[1]: olcne-agent.service: Succeeded.
Aug 30 15:43:23 ocne-worker07 systemd[1]: Stopped Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:43:23 ocne-worker07 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:43:23 ocne-worker07 olcne-agent[154734]: time=30/08/22 15:43:23 level=info msg=Started server on[::]:8090
Access the OCI Load Balancer and View the Backends
Because a load balancer is required when multiple nodes are defined for the Kubernetes control plane, it is interesting to view the configuration that was set up automatically when the free lab environment was deployed. This shows the three nodes deployed and configured when the lab was created (with a status of Healthy), plus the two nodes that will be added in the upcoming steps (with a status of Critical).
- Switch from the terminal to the Luna Desktop.
- Open the Luna Lab details page using the Luna Lab icon.
- Click on the OCI Console link.
- The Oracle Cloud Console login page displays.
- Enter the User Name and Password (found on the Luna Lab tab in the Credentials section).
- Click the hamburger menu (top-left), then Networking and Load Balancers.
- The Load Balancers page displays.
- Locate the compartment to use from the drop-down list.
- Click on the load balancer listed in the table (ocne-load-balancer).
- Scroll down the page and click the link to Backend Sets (on the left-hand side under Resources).
- The Backend Sets table displays. Click the link named ocne-lb-backend-set in the Name column.
- Click the link to Backends (on the left-hand side under Resources).
- The Backends representing the control plane nodes display.
Notice that two of the backend nodes show a Critical - Connection failed status because these nodes have not yet joined the Kubernetes control plane cluster. Keep this browser tab open, because we re-check the status of the backend nodes after completing the scale up steps.
View the Kubernetes Nodes
Examine the Kubernetes nodes currently available in the cluster. Note that there are three control plane nodes and five worker nodes.
- Confirm that the nodes are all in the READY status.
kubectl get nodes
Example Output:
[oracle@ocne-operator olcne]$ kubectl get nodes
NAME             STATUS   ROLES                  AGE     VERSION
ocne-control01   Ready    control-plane,master   5h15m   v1.23.7+1.el8
ocne-control02   Ready    control-plane,master   5h14m   v1.23.7+1.el8
ocne-control03   Ready    control-plane,master   5h13m   v1.23.7+1.el8
ocne-worker01    Ready    <none>                 5h14m   v1.23.7+1.el8
ocne-worker02    Ready    <none>                 5h13m   v1.23.7+1.el8
ocne-worker03    Ready    <none>                 5h12m   v1.23.7+1.el8
ocne-worker04    Ready    <none>                 5h13m   v1.23.7+1.el8
ocne-worker05    Ready    <none>                 5h14m   v1.23.7+1.el8
Add the Control Plane and Worker Nodes to the Deployment Configuration File
Add the Fully Qualified Domain Name (FQDN) and Platform Agent access port (8090) for every control plane and worker node being added to the cluster.
Edit the YAML deployment configuration file to include the new cluster nodes. Add the control plane nodes under the master-nodes section and the worker nodes under the worker-nodes section.
The file name of the configuration file in this tutorial is myenvironment.yaml, and it currently contains three control plane and five worker nodes.
- Confirm the current environment uses three control plane nodes and five worker nodes.
cat ~/myenvironment.yaml
Example Output:
...
          master-nodes:
            - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
...
- Add the new control plane and worker nodes to the myenvironment.yaml file.
cd ~
sed -i '19 i \ - ocne-control04.'"$(hostname -d)"':8090' ~/myenvironment.yaml
sed -i '20 i \ - ocne-control05.'"$(hostname -d)"':8090' ~/myenvironment.yaml
sed -i '27 i \ - ocne-worker06.'"$(hostname -d)"':8090' ~/myenvironment.yaml
sed -i '28 i \ - ocne-worker07.'"$(hostname -d)"':8090' ~/myenvironment.yaml
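The sed commands above insert at fixed line numbers, which assumes the exact file layout shown earlier. A hedged alternative appends the same entries by matching the existing node names instead (a sketch; adjust the leading whitespace to match the file's indentation):
sed -i '/ocne-control03.*:8090/a \ - ocne-control04.'"$(hostname -d)"':8090' ~/myenvironment.yaml
sed -i '/ocne-control04.*:8090/a \ - ocne-control05.'"$(hostname -d)"':8090' ~/myenvironment.yaml
sed -i '/ocne-worker05.*:8090/a \ - ocne-worker06.'"$(hostname -d)"':8090' ~/myenvironment.yaml
sed -i '/ocne-worker06.*:8090/a \ - ocne-worker07.'"$(hostname -d)"':8090' ~/myenvironment.yaml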
- Confirm that the control plane and worker nodes have been added to the myenvironment.yaml file.
cat ~/myenvironment.yaml
Example Excerpt:
...
          master-nodes:
            - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control05.lv.vcneea798df.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
...
The configuration file now includes the new control plane nodes (ocne-control04 and ocne-control05) and the new worker nodes (ocne-worker06 and ocne-worker07). These represent all of the control plane and worker nodes that should be in the cluster once the scale up completes.
Scale Up the Control Plane and Worker Nodes
- Run the module update command.
Use the olcnectl module update command with the --config-file option to specify the location of the configuration file. The Platform API Server validates the configuration file against the current state of the cluster and recognizes that more nodes should be added to it. Answer y when prompted.
Note: There will be a delay between the prompts in the terminal window while each module updates. In the free lab environment this delay can be as long as 10-15 minutes; a way to monitor progress from a second terminal is shown after the example output below.
olcnectl module update --config-file myenvironment.yaml
Example Output:
[oracle@ocne-operator ~]$ olcnectl module update --config-file myenvironment.yaml
? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
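While the update runs, progress can be monitored from a second terminal window (a sketch, assuming kubectl was configured earlier); the new nodes appear and move to the Ready status as they join the cluster:
watch -n 10 kubectl get nodes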
- (In the Cloud Console) Confirm that the load balancer's Backend Set now shows five healthy backend nodes.
- Confirm that the new control plane and worker nodes have been added to the cluster.
kubectl get nodes
Example Output:
[oracle@ocne-operator ~]$ kubectl get nodes
NAME             STATUS   ROLES                  AGE   VERSION
ocne-control01   Ready    control-plane,master   99m   v1.23.7+1.el8
ocne-control02   Ready    control-plane,master   97m   v1.23.7+1.el8
ocne-control03   Ready    control-plane,master   96m   v1.23.7+1.el8
ocne-control04   Ready    control-plane,master   13m   v1.23.7+1.el8
ocne-control05   Ready    control-plane,master   12m   v1.23.7+1.el8
ocne-worker01    Ready    <none>                 99m   v1.23.7+1.el8
ocne-worker02    Ready    <none>                 98m   v1.23.7+1.el8
ocne-worker03    Ready    <none>                 98m   v1.23.7+1.el8
ocne-worker04    Ready    <none>                 98m   v1.23.7+1.el8
ocne-worker05    Ready    <none>                 98m   v1.23.7+1.el8
ocne-worker06    Ready    <none>                 13m   v1.23.7+1.el8
ocne-worker07    Ready    <none>                 13m   v1.23.7+1.el8
Notice that the new control plane nodes (ocne-control04 and ocne-control05) and the new worker nodes (ocne-worker06 and ocne-worker07) are now included in the cluster. This confirms that the scale up operation worked.
Scale Down the Control Plane Nodes
To demonstrate that the control plane and worker nodes can be scaled independently of each other, we only scale down (remove) the control plane nodes in this step.
- Confirm the current environment uses five control plane nodes and seven worker nodes.
cat ~/myenvironment.yaml
Example Output:
...
          master-nodes:
            - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control05.lv.vcneea798df.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
...
- To scale the cluster back down to the original three control plane nodes, remove the ocne-control04 and ocne-control05 control plane nodes from the configuration file.
sed -i '19d;20d' ~/myenvironment.yaml
- Confirm the configuration file now contains only three control plane nodes and seven worker nodes.
cat ~/myenvironment.yaml
Example Excerpt:
...
          master-nodes:
            - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
...
- Suppress the module update confirmation prompts.
Adding the force: true directive to the configuration file avoids and suppresses the confirmation prompt during a module update. This directive needs to be placed immediately below the name: <xxxx> directive for each module defined.
cd ~
sed -i '12 i \ force: true' ~/myenvironment.yaml
sed -i '35 i \ force: true' ~/myenvironment.yaml
sed -i '40 i \ force: true' ~/myenvironment.yaml
- Confirm the configuration file now contains the force: true directive.
cat ~/myenvironment.yaml
Example Excerpt:
[oracle@ocne-operator ~]$ cat ~/myenvironment.yaml
environments:
  - environment-name: myenvironment
    globals:
      api-server: 127.0.0.1:8091
      secret-manager-type: file
      olcne-ca-path: /etc/olcne/configs/certificates/production/ca.cert
      olcne-node-cert-path: /etc/olcne/configs/certificates/production/node.cert
      olcne-node-key-path: /etc/olcne/configs/certificates/production/node.key
    modules:
      - module: kubernetes
        name: mycluster
        force: true
        args:
          container-registry: container-registry.oracle.com/olcne
          load-balancer: 10.0.0.18:6443
          master-nodes:
            - ocne-control01.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-control02.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-control03.lv.vcn1174e41d.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker02.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker03.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker04.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker05.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
          selinux: enforcing
          restrict-service-externalip: true
          restrict-service-externalip-ca-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/ca.cert
          restrict-service-externalip-tls-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/node.cert
          restrict-service-externalip-tls-key: /etc/olcne/configs/certificates/restrict_external_ip/production/node.key
      - module: helm
        name: myhelm
        force: true
        args:
          helm-kubernetes-module: mycluster
      - module: oci-ccm
        name: myoci
        force: true
        oci-ccm-helm-module: myhelm
        oci-use-instance-principals: true
        oci-compartment: ocid1.compartment.oc1..aaaaaaaanr6cysadeswwxc7sczdsrlamzhfh6scdyvuh4s4fmvecob6e2cha
        oci-vcn: ocid1.vcn.oc1.eu-frankfurt-1.amaaaaaag7acy3iat3duvrym376oax7nxdyqd56mqxtjaws47t4g7vqthgja
        oci-lb-subnet1: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaa6rt6chugbkfhyjyl4exznpxrlvnus2bgkzcgm7fljfkqbxkva6ya
- Run the command to update the cluster and remove the nodes.
Note: This may take a few minutes to complete.
olcnectl module update --config-file myenvironment.yaml
Example Output:
[oracle@ocne-operator ~]$ olcnectl module update --config-file myenvironment.yaml
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
- (In the Cloud Console) Confirm that the load balancer's Backend Set shows three healthy (Health = 'OK') nodes and two unhealthy (Health = 'Critical - Connection failed') nodes. The two nodes show a Critical status because they have been removed from the Kubernetes cluster.
显示平台 API 服务器已从集群中删除控制层节点。确认已删除控制层(
ocne-control04
和ocne-control05
)节点。kubectl get nodes
输出示例:
[oracle@ocne-operator ~]$ kubectl get nodes
NAME             STATUS   ROLES                  AGE    VERSION
ocne-control01   Ready    control-plane,master   164m   v1.23.7+1.el8
ocne-control02   Ready    control-plane,master   163m   v1.23.7+1.el8
ocne-control03   Ready    control-plane,master   162m   v1.23.7+1.el8
ocne-worker01    Ready    <none>                 164m   v1.23.7+1.el8
ocne-worker02    Ready    <none>                 163m   v1.23.7+1.el8
ocne-worker03    Ready    <none>                 164m   v1.23.7+1.el8
ocne-worker04    Ready    <none>                 164m   v1.23.7+1.el8
ocne-worker05    Ready    <none>                 164m   v1.23.7+1.el8
ocne-worker06    Ready    <none>                 13m    v1.23.7+1.el8
ocne-worker07    Ready    <none>                 13m    v1.23.7+1.el8
Summary
This completes the demonstration detailing how to add and then remove Kubernetes nodes from a cluster. Although this exercise demonstrated updating the control plane and worker nodes at the same time, this is not the recommended approach to scaling an Oracle Cloud Native Environment Kubernetes cluster up or down, and in a production environment these operations would most likely be performed separately.
For More Information
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
Scale a Kubernetes Cluster on Oracle Cloud Native Environment
F30806-12
September 2022
Copyright © 2022, Oracle and/or its affiliates.