
Scale a Kubernetes Cluster on Oracle Cloud Native Environment

Introduction

This tutorial demonstrates how to scale an existing Kubernetes cluster in Oracle Cloud Native Environment.

To scale up a Kubernetes cluster means adding nodes; likewise, scaling down occurs by removing nodes. Nodes can be either control plane or worker nodes. Oracle recommends against scaling the cluster up and down at the same time. Instead, perform a scale up and then a scale down in two separate commands.

To avoid split-brain scenarios and maintain quorum, it is recommended to scale the Kubernetes cluster's control plane or worker nodes in odd numbers. For example, 3, 5, or 7 control plane or worker nodes ensures the reliability of the cluster.
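
As a rough illustration of the quorum arithmetic behind this guidance (a general property of consensus-based clusters rather than anything specific to this tutorial), a cluster of n voting members stays available only while floor(n/2) + 1 of them remain reachable:

    quorum(3) = 2  -> tolerates 1 node failure
    quorum(5) = 3  -> tolerates 2 node failures
    quorum(7) = 4  -> tolerates 3 node failures

An even node count adds no fault tolerance: quorum(4) = 3 still tolerates only a single failure, the same as with 3 nodes.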

This tutorial uses an existing Highly Available Kubernetes cluster running on Oracle Cloud Native Environment and has three modules deployed:

  • Kubernetes (kubernetes)
  • Helm (helm)
  • Oracle Cloud Infrastructure Cloud Controller Manager (oci-ccm)

The beginning deployment consists of the following:

  • 1 operator node (ocne-operator)
  • 3 control plane nodes (ocne-control01 to ocne-control03)
  • 5 worker nodes (ocne-worker01 to ocne-worker05)

It is based on the lab:

Objectives

This tutorial/lab walks through configuring and adding two new control plane nodes and two new worker nodes to the cluster. It then demonstrates how to scale the cluster back down by removing the same nodes from the cluster.

In this scenario, X.509 Private CA Certificates secure communication between the nodes. There are other methods to manage and deploy the certificates, such as using the HashiCorp Vault secrets manager, or using your own certificates signed by a trusted Certificate Authority (CA). These other methods are not covered in this tutorial.

Prerequisites

Note: If using the free lab environment, these prerequisites are provided as the starting point.

In addition to the requirement of a Highly Available Kubernetes cluster running on Oracle Cloud Native Environment, you also need the following:

  • 4 additional Oracle Linux systems to use as:
    • 2 Kubernetes control plane nodes
    • 2 Kubernetes worker nodes

Set Up the Lab Environment

Note: When using the free lab environment, see Oracle Linux Lab Basics for connection and other usage instructions.

Information: The free lab environment deploys Oracle Cloud Native Environment on the provided nodes, ready for creating environments. This deployment takes approximately 20-25 minutes to finish after launch. Therefore, you might want to step away while this runs and then return to complete the lab.

Unless otherwise stated, all steps in the free lab environment can be performed from the ocne-operator node, and it is recommended to begin by opening a terminal window and connecting to that node. In a multi-node installation of Oracle Cloud Native Environment, the kubectl commands run on the operator node, a control plane node, or another system configured for kubectl.

  1. Open a terminal and connect via ssh to the ocne-operator system.

    ssh oracle@<ip_address_of_ol_node>
    

Install the Kubernetes Module

Important: All operations are performed from the ocne-operator node unless otherwise stated.

The free lab environment creates a Highly Available Oracle Cloud Native Environment installation during deployment, including the environment preparation and module configuration.

  1. View the myenvironment.yaml file.

    cat ~/myenvironment.yaml
    

    The free lab environment deployment uses three control plane nodes and five worker nodes when creating the cluster.

    Example Output:

    [oracle@ocne-operator ~]$ cat ~/myenvironment.yaml
    environments:
      - environment-name: myenvironment
        globals:
          api-server: 127.0.0.1:8091
          secret-manager-type: file
          olcne-ca-path: /etc/olcne/configs/certificates/production/ca.cert
          olcne-node-cert-path: /etc/olcne/configs/certificates/production/node.cert
          olcne-node-key-path:  /etc/olcne/configs/certificates/production/node.key
        modules:
          - module: kubernetes
            name: mycluster
            args:
              container-registry: container-registry.oracle.com/olcne
              load-balancer: 10.0.0.168:6443
              master-nodes:
                - ocne-control01.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-control02.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-control03.lv.vcnf998d566.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-worker02.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-worker03.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-worker04.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-worker05.lv.vcnf998d566.oraclevcn.com:8090
              selinux: enforcing
              restrict-service-externalip: true
              restrict-service-externalip-ca-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/ca.cert
              restrict-service-externalip-tls-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/node.cert
              restrict-service-externalip-tls-key: /etc/olcne/configs/certificates/restrict_external_ip/production/node.key
          - module: helm
            name: myhelm
            args:
              helm-kubernetes-module: mycluster      
          - module: oci-ccm
            name: myoci
            oci-ccm-helm-module: myhelm
            oci-use-instance-principals: true
            oci-compartment: ocid1.compartment.oc1..aaaaaaaau2g2k23u6mp3t43ky3i4ky7jpyeiqcdcobpbcb7z6vjjlrdnuufq
            oci-vcn: ocid1.vcn.oc1.eu-frankfurt-1.amaaaaaaw6qx2pia2xkfmnnknpk3jll6emb76gtcza3ttbqqofxmwjb45rka
            oci-lb-subnet1: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaawfjs5zrb6wdmg43522a4l5aak5zr6vvkaaa6xogttha2ufsip7fq         
    

    The domain part of the nodes' FQDN values is unique to each deployment of the free lab environment.

  2. Install the Kubernetes module.

    olcnectl module install --config-file myenvironment.yaml
    

    Note: Deploying Kubernetes to the nodes takes 20-25 minutes to complete.

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module install --config-file myenvironment.yaml
    Modules installed successfully.
    Modules installed successfully.
    Modules installed successfully.
    

    Why are there three 'Modules installed successfully' responses? This is because the myenvironment.yaml file used in this example defines three separate modules:

    • - module: kubernetes
    • - module: helm
    • - module: oci-ccm

    It is important to know this because some of the later steps also return three responses, one for each module defined in the myenvironment.yaml file.

  3. Verify the deployment of the Kubernetes module.

    olcnectl module instances --config-file myenvironment.yaml
    

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module instances --config-file myenvironment.yaml
    INSTANCE                                        	MODULE    	STATE    
    mycluster                                       	kubernetes	installed
    myhelm                                          	helm      	installed
    myoci                                           	oci-ccm   	installed
    ocne-control01.lv.vcnf998d566.oraclevcn.com:8090	node      	installed
    ocne-control02.lv.vcnf998d566.oraclevcn.com:8090	node      	installed
    ocne-control03.lv.vcnf998d566.oraclevcn.com:8090	node      	installed
    ocne-worker01.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
    ocne-worker02.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
    ocne-worker03.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
    ocne-worker04.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
    ocne-worker05.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
    
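
    To drill into the state of a single module, olcnectl also provides a module report subcommand. A minimal sketch (the exact flags can vary between Oracle Cloud Native Environment releases):

    olcnectl module report --config-file myenvironment.yaml --name mycluster --children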

Set up kubectl

  1. Set up the kubectl command.

    1. Copy the configuration file from one of the control plane nodes.

      mkdir -p $HOME/.kube
      ssh -o StrictHostKeyChecking=no 10.0.0.150 "sudo cat /etc/kubernetes/admin.conf" > $HOME/.kube/config
      

      Example Output:

      [oracle@ocne-operator ~]$ mkdir -p $HOME/.kube
      [oracle@ocne-operator ~]$ ssh -o StrictHostKeyChecking=no 10.0.0.150 "sudo cat /etc/kubernetes/admin.conf" > $HOME/.kube/config
      Warning: Permanently added '10.0.0.150' (ECDSA) to the list of known hosts.
      
    2. Export the configuration for use by the kubectl command.

      sudo chown $(id -u):$(id -g) $HOME/.kube/config
      export KUBECONFIG=$HOME/.kube/config
      echo 'export KUBECONFIG=$HOME/.kube/config' >> $HOME/.bashrc
      
  2. Verify that kubectl works.

    kubectl get nodes
    

    Example Output:

    [oracle@ocne-operator ~]$ kubectl get nodes
    NAME             STATUS   ROLES                  AGE   VERSION
    ocne-control01   Ready    control-plane,master   17m   v1.23.7+1.el8
    ocne-control02   Ready    control-plane,master   16m   v1.23.7+1.el8
    ocne-control03   Ready    control-plane,master   15m   v1.23.7+1.el8
    ocne-worker01    Ready    <none>                 16m   v1.23.7+1.el8
    ocne-worker02    Ready    <none>                 15m   v1.23.7+1.el8
    ocne-worker03    Ready    <none>                 14m   v1.23.7+1.el8
    ocne-worker04    Ready    <none>                 15m   v1.23.7+1.el8
    ocne-worker05    Ready    <none>                 15m   v1.23.7+1.el8
    [oracle@ocne-operator ~]$
    

Confirm the Oracle Cloud Infrastructure Cloud Controller Manager Module is Ready

Before proceeding, it is essential to wait for the Oracle Cloud Infrastructure Cloud Controller Manager module to establish communication with the OCI API. The Oracle Cloud Infrastructure Cloud Controller Manager module runs a pod on each node that handles functionality such as attaching block storage. After installation, this controller blocks any pods from being scheduled until this dedicated pod confirms it is initialized, running, and communicating with the OCI API. Until this communication is successfully established, any attempt to proceed is likely to prevent Kubernetes from successfully using cloud storage or load balancers.

  1. Retrieve the status of the pods with the component oci-cloud-controller-manager.

    kubectl -n kube-system get pods -l component=oci-cloud-controller-manager
    

    Example Output:

    [oracle@ocne-operator ~]$ kubectl -n kube-system get pods -l component=oci-cloud-controller-manager
    NAME                                 READY   STATUS    RESTARTS      AGE
    oci-cloud-controller-manager-9d9gh   1/1     Running   1 (48m ago)   50m
    oci-cloud-controller-manager-jqzs6   1/1     Running   0             50m
    oci-cloud-controller-manager-xfm9w   1/1     Running   0             50m
    
  2. Retrieve the status of the pods with the role csi-oci.

    kubectl -n kube-system get pods -l role=csi-oci
    

    Example Output:

    [oracle@ocne-operator ~]$ kubectl -n kube-system get pods -l role=csi-oci
    NAME                                  READY   STATUS             RESTARTS      AGE
    csi-oci-controller-7fcbddd746-2hb5c   4/4     Running            2 (50m ago)   51m
    csi-oci-node-7jd6t                    3/3     Running            0             51m
    csi-oci-node-fc5x5                    3/3     Running            0             51m
    csi-oci-node-jq8sm                    3/3     Running            0             51m
    csi-oci-node-jqkvl                    3/3     Running            0             51m
    csi-oci-node-jwq8g                    3/3     Running            0             51m
    csi-oci-node-jzxqt                    3/3     Running            0             51m
    csi-oci-node-rmmmb                    3/3     Running            0             51m
    csi-oci-node-zc287                    1/3     Running            0             51m
    

Note: Wait for both of these commands to show a STATUS of Running before proceeding.

If the values under the READY column do not show all containers as started, and the values under the STATUS column do not show Running after 15 minutes, please restart the lab.
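
Rather than polling by hand, you can optionally block until the pods report Ready by using kubectl wait; a minimal sketch (the 15-minute timeout mirrors the guidance above):

    kubectl -n kube-system wait pod -l component=oci-cloud-controller-manager --for=condition=Ready --timeout=900s
    kubectl -n kube-system wait pod -l role=csi-oci --for=condition=Ready --timeout=900s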

(Optional) Set Up the New Kubernetes Nodes

Note: The steps in this section are not required in the free lab environment because they were already completed during the lab's initial deployment. Please skip ahead to the next section and continue.

When scaling up (adding nodes), any new nodes require all of the prerequisites listed in the Prerequisites section of this tutorial.

In this tutorial/lab, the nodes ocne-control04 and ocne-control05 are the new control plane nodes, while the nodes ocne-worker06 and ocne-worker07 are the new worker nodes. In addition to the prerequisites, these new nodes require installing and enabling the Oracle Cloud Native Environment Platform Agent.

  1. Install and enable the Platform Agent.

    sudo dnf install olcne-agent olcne-utils
    sudo systemctl enable olcne-agent.service
    
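
    Optionally, confirm the service is enabled before proceeding (the bootstrap script run later is what starts it):

    systemctl is-enabled olcne-agent.service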
  2. If using a proxy server, configure it with CRI-O. On each Kubernetes node, create a systemd drop-in configuration directory for CRI-O. In that directory, create a file named proxy.conf and add the proxy server information.

    sudo mkdir /etc/systemd/system/crio.service.d
    sudo vi /etc/systemd/system/crio.service.d/proxy.conf
    
  3. Use the following example proxy.conf file, replacing the proxy values with those for your environment:

    [Service]
    Environment="HTTP_PROXY=proxy.example.com:80"
    Environment="HTTPS_PROXY=proxy.example.com:80"
    Environment="NO_PROXY=.example.com,192.0.2.*"
    
  4. If the docker or containerd services are running, stop and disable them.

    sudo systemctl disable --now docker.service
    sudo systemctl disable --now containerd.service
    

Set Up X.509 Private CA Certificates

Set up X.509 Private CA Certificates for the new control plane and worker nodes.

  1. Create a list of the new nodes.

    VAR1=$(hostname -d)
    for NODE in 'ocne-control04' 'ocne-control05' 'ocne-worker06' 'ocne-worker07'; do VAR2+="${NODE}.$VAR1,"; done
    VAR2=${VAR2%,}
    

    This bash snippet grabs the domain name of the operator node and creates a comma-separated list of the nodes to add to the cluster during the scale up.
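
    You can echo the resulting list to check it before moving on; the value shown below is illustrative, as the domain differs in every deployment:

    echo $VAR2
    ocne-control04.lv.vcnf998d566.oraclevcn.com,ocne-control05.lv.vcnf998d566.oraclevcn.com,ocne-worker06.lv.vcnf998d566.oraclevcn.com,ocne-worker07.lv.vcnf998d566.oraclevcn.com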

  2. Generate a private CA and set of certificates for the new nodes.

    Use the --byo-ca-cert option to specify the location of the existing CA Certificate, and the --byo-ca-key option to specify the location of the existing CA Key. Use the --nodes option and provide the FQDN of the new control plane and worker nodes.

    cd /etc/olcne
    sudo ./gen-certs-helper.sh \
    --cert-dir /etc/olcne/configs/certificates/ \
    --byo-ca-cert /etc/olcne/configs/certificates/production/ca.cert \
    --byo-ca-key /etc/olcne/configs/certificates/production/ca.key \
    --nodes $VAR2
    

    Example Output:

    [oracle@ocne-operator ~]$ cd /etc/olcne
    [oracle@ocne-operator olcne]$ sudo ./gen-certs-helper.sh \
    > --cert-dir /etc/olcne/configs/certificates/ \
    > --byo-ca-cert /etc/olcne/configs/certificates/production/ca.cert \
    > --byo-ca-key /etc/olcne/configs/certificates/production/ca.key \
    > --nodes $VAR2
    [INFO] Generating certs for ocne-control04.lv.vcnf998d566.oraclevcn.com
    Generating RSA private key, 2048 bit long modulus (2 primes)
    .............................+++++
    ....................+++++
    e is 65537 (0x010001)
    Signature ok
    subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
    Getting CA Private Key
    [INFO] Generating certs for ocne-control05.lv.vcnf998d566.oraclevcn.com
    Generating RSA private key, 2048 bit long modulus (2 primes)
    ...+++++
    ...........................................................+++++
    e is 65537 (0x010001)
    Signature ok
    subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
    Getting CA Private Key
    [INFO] Generating certs for ocne-worker06.lv.vcnf998d566.oraclevcn.com
    Generating RSA private key, 2048 bit long modulus (2 primes)
    ......+++++
    .......................+++++
    e is 65537 (0x010001)
    Signature ok
    subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
    Getting CA Private Key
    [INFO] Generating certs for ocne-worker07.lv.vcnf998d566.oraclevcn.com
    Generating RSA private key, 2048 bit long modulus (2 primes)
    ....................................................................................+++++
    .................................+++++
    e is 65537 (0x010001)
    Signature ok
    subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
    Getting CA Private Key
    -----------------------------------------------------------
    Script To Transfer Certs: /etc/olcne/configs/certificates/olcne-tranfer-certs.sh
    -----------------------------------------------------------
    [SUCCESS] Generated certs and file transfer script!
    [INFO]    CA Cert: /etc/olcne/configs/certificates/production/ca.key
    [INFO]    CA Key:  /etc/olcne/configs/certificates/production/ca.cert
    [WARNING] The CA Key is the only way to generate more certificates, ensure it is stored in long term storage
    [USER STEP #1]    Please ensure you have ssh access from this machine to: ocne-control04.lv.vcnf998d566.oraclevcn.com,ocne-control05.lv.vcnf998d566.oraclevcn.com,ocne-worker06.lv.vcnf998d566.oraclevcn.com,ocne-worker07.lv.vcnf998d566.oraclevcn.com
    

Transfer Certificates

Transfer the newly created certificates from the operator node to each of the new nodes.

  1. Update the user details in the provided transfer script.

    sudo sed -i 's/USER=opc/USER=oracle/g' configs/certificates/olcne-tranfer-certs.sh
    

    Note: This step is required in the tutorial because the default user in the script is opc. Since this tutorial and the free lab environment both install the product with the user oracle, update the USER variable in the script accordingly.

  2. Update the permissions for each node.key generated by the certificate creation script.

    sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-control*/node.key
    sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-operator*/node.key
    sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-worker*/node.key
    
  3. Transfer the certificates to each new node.

    This step requires passwordless SSH to be configured between the nodes. Configuring this is outside the scope of this tutorial, but it is pre-configured in the free lab environment.

    bash -ex /etc/olcne/configs/certificates/olcne-tranfer-certs.sh
    
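
    Conceptually, the generated script copies each node's certificate set from the staging area on the operator node into the production path referenced later by bootstrap-olcne.sh. A minimal sketch of the same idea, for illustration only (it assumes the staging directories under tmp-olcne are named by node FQDN, as the chmod step above suggests; the generated olcne-tranfer-certs.sh remains the authoritative version):

    for NODE in ocne-control04 ocne-control05 ocne-worker06 ocne-worker07; do
      FQDN=${NODE}.$(hostname -d)
      # stage the node's cert, key, and CA cert on the target node
      scp /etc/olcne/configs/certificates/tmp-olcne/${FQDN}/* oracle@${FQDN}:/tmp/
      # move them into the path the Platform Agent bootstrap expects
      ssh oracle@${FQDN} 'sudo mkdir -p /etc/olcne/configs/certificates/production && \
        sudo mv /tmp/node.cert /tmp/node.key /tmp/ca.cert /etc/olcne/configs/certificates/production/'
    done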

Configure the Platform Agent to Use the Certificates

Configure the Platform Agent on each new node to use the certificates copied over in the previous step. We accomplish this task from the operator node by running the command on each node over ssh. (The four per-node steps below can also be collapsed into a single loop, sketched after the last step.)

  1. Configure the ocne-control04 node.

    ssh -o StrictHostKeyChecking=no ocne-control04 'sudo /etc/olcne/bootstrap-olcne.sh \
    --secret-manager-type file \
    --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    --olcne-component agent'
    

    Example Output:

    [oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-control04 'sudo /etc/olcne/bootstrap-olcne.sh \
    > --secret-manager-type file \
    > --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    > --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    > --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    > --olcne-component agent'
    Warning: Permanently added 'ocne-control04,10.0.0.153' (ECDSA) to the list of known hosts.
    ● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
       Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/olcne-agent.service.d
               └─10-auth.conf
       Active: active (running) since Tue 2022-08-30 15:29:37 GMT; 2s ago
     Main PID: 152809 (olcne-agent)
        Tasks: 8 (limit: 202294)
       Memory: 11.1M
       CGroup: /system.slice/olcne-agent.service
               └─152809 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
    
    Aug 30 15:29:37 ocne-control04 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:29:37 ocne-control04 olcne-agent[152809]: time=30/08/22 15:29:37 level=info msg=Started server on[::]:8090
    
  2. Configure the ocne-control05 node.

    ssh -o StrictHostKeyChecking=no ocne-control05 'sudo /etc/olcne/bootstrap-olcne.sh \
    --secret-manager-type file \
    --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    --olcne-component agent'
    

    Example Output:

    [oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-control05 'sudo /etc/olcne/bootstrap-olcne.sh \
    > --secret-manager-type file \
    > --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    > --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    > --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    > --olcne-component agent'
    Warning: Permanently added 'ocne-control05,10.0.0.154' (ECDSA) to the list of known hosts.
    ● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
      Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/olcne-agent.service.d
               └─10-auth.conf
       Active: active (running) since Tue 2022-08-30 15:34:13 GMT; 2s ago
     Main PID: 153413 (olcne-agent)
        Tasks: 7 (limit: 202294)
       Memory: 9.1M
       CGroup: /system.slice/olcne-agent.service
               └─153413 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
    
    Aug 30 15:34:13 ocne-control05 systemd[1]: olcne-agent.service: Succeeded.
    Aug 30 15:34:13 ocne-control05 systemd[1]: Stopped Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:34:13 ocne-control05 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:34:13 ocne-control05 olcne-agent[153413]: time=30/08/22 15:34:13 level=info msg=Started server on[::]:8090
    
  3. Configure the ocne-worker06 node.

    ssh -o StrictHostKeyChecking=no ocne-worker06 'sudo /etc/olcne/bootstrap-olcne.sh \
    --secret-manager-type file \
    --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    --olcne-component agent'
    

    Example Output:

    [oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-worker06 'sudo /etc/olcne/bootstrap-olcne.sh \
    > --secret-manager-type file \
    > --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    > --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    > --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    > --olcne-component agent'
    Warning: Permanently added 'ocne-worker06,10.0.0.165' (ECDSA) to the list of known hosts.
    ● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
       Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/olcne-agent.service.d
               └─10-auth.conf
       Active: active (running) since Tue 2022-08-30 15:41:08 GMT; 2s ago
     Main PID: 153988 (olcne-agent)
        Tasks: 8 (limit: 202294)
       Memory: 5.2M
       CGroup: /system.slice/olcne-agent.service
               └─153988 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
    
    Aug 30 15:41:08 ocne-worker06 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:41:08 ocne-worker06 olcne-agent[153988]: time=30/08/22 15:41:08 level=info msg=Started server on[::]:8090
    
  4. Configure the ocne-worker07 node.

    ssh -o StrictHostKeyChecking=no ocne-worker07 'sudo /etc/olcne/bootstrap-olcne.sh \
    --secret-manager-type file \
    --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    --olcne-component agent'
    

    Example Output:

    [oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-worker07 'sudo /etc/olcne/bootstrap-olcne.sh \
    > --secret-manager-type file \
    > --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    > --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    > --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    > --olcne-component agent'
    Warning: Permanently added 'ocne-worker07,10.0.0.166' (ECDSA) to the list of known hosts.
    ● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
       Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/olcne-agent.service.d
               └─10-auth.conf
       Active: active (running) since Tue 2022-08-30 15:43:23 GMT; 2s ago
     Main PID: 154734 (olcne-agent)
        Tasks: 8 (limit: 202294)
       Memory: 9.1M
       CGroup: /system.slice/olcne-agent.service
               └─154734 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
    
    Aug 30 15:43:23 ocne-worker07 systemd[1]: olcne-agent.service: Succeeded.
    Aug 30 15:43:23 ocne-worker07 systemd[1]: Stopped Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:43:23 ocne-worker07 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:43:23 ocne-worker07 olcne-agent[154734]: time=30/08/22 15:43:23 level=info msg=Started server on[::]:8090
    
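
Since the bootstrap invocation is identical on all four nodes, the four steps above can also be collapsed into a single loop, as mentioned earlier; a minimal sketch, assuming the same passwordless ssh access used throughout:

    for NODE in ocne-control04 ocne-control05 ocne-worker06 ocne-worker07; do
      ssh -o StrictHostKeyChecking=no $NODE 'sudo /etc/olcne/bootstrap-olcne.sh \
      --secret-manager-type file \
      --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
      --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
      --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
      --olcne-component agent'
    done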

Access the OCI Load Balancer and View the Backends

Because defining multiple nodes for the Kubernetes control plane requires a load balancer, it is interesting to look at the configuration that was automatically set up when the free lab environment was deployed. This shows the three nodes deployed and configured when the lab was created (with a status of Healthy) and the two nodes that will be added in the upcoming steps (with a status of Critical).

  1. Switch from the terminal to the Luna Desktop.

  2. Open the Luna Lab details page using the Luna Lab icon.

  3. Click the OCI Console link.

  4. The Oracle Cloud Console login page displays.

  5. Enter the User Name and Password (found on the Luna Lab tab in the Credentials section).

  6. Click the hamburger menu (top-left), then Networking and Load Balancers.

  7. The Load Balancers page displays.

  8. Select the Compartment to use from the drop-down list.

  9. Click the load balancer listed in the table (ocne-load-balancer).

  10. Scroll down the page and click the link to the Backend Sets (on the left under the Resources section).

  11. The Backend Sets table displays. Click the link in the Name column called ocne-lb-backend-set.

  12. Click the link to the Backends (on the left under the Resources section).

  13. The Backends representing the control plane nodes display.

    Notice that two of the backend nodes report a Critical - Connection failed status, because those nodes are not yet part of the Kubernetes control plane cluster. Keep this browser tab open, as we will recheck the status of the backend nodes after completing the scale-up steps.

View the Kubernetes Nodes

Check the Kubernetes nodes currently available in the cluster. Note that there are three control plane and five worker nodes.

  1. Confirm the nodes all show a READY status.

    kubectl get nodes
    

    Example Output:

    [oracle@ocne-operator olcne]$ kubectl get nodes
    NAME             STATUS   ROLES                  AGE     VERSION
    ocne-control01   Ready    control-plane,master   5h15m   v1.23.7+1.el8
    ocne-control02   Ready    control-plane,master   5h14m   v1.23.7+1.el8
    ocne-control03   Ready    control-plane,master   5h13m   v1.23.7+1.el8
    ocne-worker01    Ready    <none>                 5h14m   v1.23.7+1.el8
    ocne-worker02    Ready    <none>                 5h13m   v1.23.7+1.el8
    ocne-worker03    Ready    <none>                 5h12m   v1.23.7+1.el8
    ocne-worker04    Ready    <none>                 5h13m   v1.23.7+1.el8
    ocne-worker05    Ready    <none>                 5h14m   v1.23.7+1.el8
    

Add the Control Plane and Worker Nodes to the Deployment Configuration File

Add the fully qualified domain name (FQDN) and Platform Agent access port (8090) of each control plane and worker node being added to the cluster.

Edit the YAML deployment configuration file to include the new cluster nodes. Add the control plane nodes under the master-nodes section and the worker nodes under the worker-nodes section.

The file name for the configuration file in this tutorial is myenvironment.yaml, and it currently contains three control plane and five worker nodes.

  1. Confirm the current environment uses three control plane nodes and five worker nodes.

    cat ~/myenvironment.yaml
    

    Example Output:

    ...
              master-nodes:
                - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
    ...
    
  2. Add the new control plane and worker nodes to the myenvironment.yaml file.

    cd ~
    sed -i '19 i \            - ocne-control04.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '20 i \            - ocne-control05.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '27 i \            - ocne-worker06.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '28 i \            - ocne-worker07.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    
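
    The line numbers (19, 20, 27, 28) assume the exact file layout shown earlier. If your file differs, a pattern-anchored variant of the same edit avoids hard-coded line numbers; a minimal sketch using GNU sed (run one variant or the other, not both):

    sed -i '/ocne-control03.*:8090/ a \            - ocne-control04.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '/ocne-control04.*:8090/ a \            - ocne-control05.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '/ocne-worker05.*:8090/ a \            - ocne-worker06.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '/ocne-worker06.*:8090/ a \            - ocne-worker07.'"$(hostname -d)"':8090' ~/myenvironment.yaml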
  3. Confirm the control plane and worker nodes have been added to the myenvironment.yaml file.

    cat ~/myenvironment.yaml
    

    Example Excerpt:

    ...
              master-nodes:
                - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control05.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090   
    ...
    

The configuration file now includes the new control plane nodes (ocne-control04 and ocne-control05) and the new worker nodes (ocne-worker06 and ocne-worker07). These represent all of the control plane and worker nodes that should be in the cluster after the scale up completes.

Scale Up the Control Plane and Worker Nodes

  1. Run the module update command.

    Use the olcnectl module update command with the --config-file option to specify the location of the configuration file. The Platform API Server validates the configuration file against the current state of the cluster and recognizes the additional nodes that should be added. Answer y when prompted.

    Note: There will be a delay between the prompts in the terminal window while each module updates. In the free lab environment, this delay can be as long as 10-15 minutes.

    olcnectl module update --config-file myenvironment.yaml
    

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module update --config-file myenvironment.yaml
    ? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    ? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    ? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    
  2. (In the Cloud Console) Confirm that the load balancer's backend set now shows five healthy backend nodes.


  3. Confirm the new control plane and worker nodes were added to the cluster.

    kubectl get nodes
    

    Example Output:

    [oracle@ocne-operator ~]$ kubectl get nodes
    NAME             STATUS   ROLES                  AGE   VERSION
    ocne-control01   Ready    control-plane,master   99m   v1.23.7+1.el8
    ocne-control02   Ready    control-plane,master   97m   v1.23.7+1.el8
    ocne-control03   Ready    control-plane,master   96m   v1.23.7+1.el8
    ocne-control04   Ready    control-plane,master   13m   v1.23.7+1.el8
    ocne-control05   Ready    control-plane,master   12m   v1.23.7+1.el8
    ocne-worker01    Ready    <none>                 99m   v1.23.7+1.el8
    ocne-worker02    Ready    <none>                 98m   v1.23.7+1.el8
    ocne-worker03    Ready    <none>                 98m   v1.23.7+1.el8
    ocne-worker04    Ready    <none>                 98m   v1.23.7+1.el8
    ocne-worker05    Ready    <none>                 98m   v1.23.7+1.el8
    ocne-worker06    Ready    <none>                 13m   v1.23.7+1.el8
    ocne-worker07    Ready    <none>                 13m   v1.23.7+1.el8
    

    Notice that the new control plane nodes (ocne-control04 and ocne-control05) and the new worker nodes (ocne-worker06 and ocne-worker07) are now included in the cluster. This confirms that the scale-up operation worked.

Scale Down the Control Plane Nodes

To demonstrate that the control plane and worker nodes can scale independently of one another, we only scale down (remove) the control plane nodes in this step.

  1. Confirm the current environment uses five control plane nodes and seven worker nodes.

    cat ~/myenvironment.yaml
    

    Example Output:

    ...
              master-nodes:
                - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control05.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
    ...
    
  2. To scale the cluster back down to the original three control plane nodes, remove the ocne-control04 and ocne-control05 control plane nodes from the configuration file.

    sed -i '19d;20d' ~/myenvironment.yaml
    
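
    A pattern-based variant avoids relying on line numbers here as well; a minimal sketch that deletes the two entries by name instead (run one variant or the other, not both):

    sed -i '/ocne-control04/d;/ocne-control05/d' ~/myenvironment.yaml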
  3. Confirm the configuration file now contains only three control plane nodes and seven worker nodes.

    cat ~/myenvironment.yaml
    

    Example Excerpt:

    ...
              master-nodes:
                - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
    ...
    
  4. Suppress the module update warning messages.

    Adding the force: true directive to the configuration file avoids and suppresses the confirmation prompts during a module update. This directive needs to be placed immediately below the name: <xxxx> directive for each of the defined modules.

    cd ~
    sed -i '12 i \        force: true' ~/myenvironment.yaml
    sed -i '35 i \        force: true' ~/myenvironment.yaml
    sed -i '40 i \        force: true' ~/myenvironment.yaml
    
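
    As with the node entries, a pattern-anchored variant avoids the hard-coded line numbers (12, 35, 40); a minimal sketch that appends the directive after each module's name line (run one variant or the other, not both):

    for NAME in mycluster myhelm myoci; do
      sed -i '/name: '"$NAME"'/ a \        force: true' ~/myenvironment.yaml
    done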
  5. Confirm the configuration file now contains the force: true directives.

    cat ~/myenvironment.yaml
    

    Example Excerpt:

    [oracle@ocne-operator ~]$ cat ~/myenvironment.yaml
    environments:
      - environment-name: myenvironment
        globals:
          api-server: 127.0.0.1:8091
          secret-manager-type: file
          olcne-ca-path: /etc/olcne/configs/certificates/production/ca.cert
          olcne-node-cert-path: /etc/olcne/configs/certificates/production/node.cert
          olcne-node-key-path:  /etc/olcne/configs/certificates/production/node.key
        modules:
          - module: kubernetes
            name: mycluster
            force: true
            args:
              container-registry: container-registry.oracle.com/olcne
              load-balancer: 10.0.0.18:6443
              master-nodes:
                - ocne-control01.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-control02.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-control03.lv.vcn1174e41d.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker02.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker03.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker04.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker05.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
              selinux: enforcing
              restrict-service-externalip: true
              restrict-service-externalip-ca-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/ca.cert
              restrict-service-externalip-tls-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/node.cert
              restrict-service-externalip-tls-key: /etc/olcne/configs/certificates/restrict_external_ip/production/node.key
          - module: helm
            name: myhelm
            force: true
            args:
              helm-kubernetes-module: mycluster      
          - module: oci-ccm
            name: myoci
            force: true
            oci-ccm-helm-module: myhelm
            oci-use-instance-principals: true
            oci-compartment: ocid1.compartment.oc1..aaaaaaaanr6cysadeswwxc7sczdsrlamzhfh6scdyvuh4s4fmvecob6e2cha
            oci-vcn: ocid1.vcn.oc1.eu-frankfurt-1.amaaaaaag7acy3iat3duvrym376oax7nxdyqd56mqxtjaws47t4g7vqthgja
            oci-lb-subnet1: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaa6rt6chugbkfhyjyl4exznpxrlvnus2bgkzcgm7fljfkqbxkva6ya         
    
  6. Run the command to update the cluster and remove the nodes.

    Note: It may take several minutes for this operation to complete.

    olcnectl module update --config-file myenvironment.yaml
    

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module update --config-file myenvironment.yaml
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    
  7. (In the Cloud Console) Confirm that the load balancer's backend set shows three healthy (Health = 'OK') nodes and two unhealthy (Health = 'Critical - Connection failed') nodes. The two nodes show a Critical status because they have been removed from the Kubernetes cluster.


  8. Show that the Platform API Server removed the control plane nodes from the cluster. Confirm the control plane nodes (ocne-control04 and ocne-control05) have been removed.

    kubectl get nodes
    

    Example Output:

    [oracle@ocne-operator ~]$ kubectl get nodes
    NAME             STATUS   ROLES                  AGE    VERSION
    ocne-control01   Ready    control-plane,master   164m   v1.23.7+1.el8
    ocne-control02   Ready    control-plane,master   163m   v1.23.7+1.el8
    ocne-control03   Ready    control-plane,master   162m   v1.23.7+1.el8
    ocne-worker01    Ready    <none>                 164m   v1.23.7+1.el8
    ocne-worker02    Ready    <none>                 163m   v1.23.7+1.el8
    ocne-worker03    Ready    <none>                 164m   v1.23.7+1.el8
    ocne-worker04    Ready    <none>                 164m   v1.23.7+1.el8
    ocne-worker05    Ready    <none>                 164m   v1.23.7+1.el8
    ocne-worker06    Ready    <none>                 13m   v1.23.7+1.el8
    ocne-worker07    Ready    <none>                 13m   v1.23.7+1.el8
    

Summary

That completes the walkthrough detailing how to add, and then remove, Kubernetes nodes from a cluster. While this exercise demonstrated updating the control plane and worker nodes at the same time, the recommended approach when scaling an Oracle Cloud Native Environment Kubernetes cluster up or down is to do each separately, and this is most likely how it would occur in a production environment.

For More Information

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit the Oracle Help Center.