
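Set up Linux Virtual IP Failover on Oracle Cloud Infrastructure Managed by Pacemaker

Introduction

Many environments still depend on active/passive Linux clusters, which require a floating IP. In a cloud infrastructure, the secondary IP address must be managed not only by the operating system but also by the cloud infrastructure itself.

In this tutorial, we will see how Pacemaker can manage the floating IP of a Linux cluster as an integrated resource, simply and without custom code. For more information, see Task 3: Set up Samba Cluster in Automatic Virtual IP Failover on Oracle Cloud Infrastructure.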

Architecture

(Architecture diagram)

Objectives

Prerequisites

Task 1: Set up the Environment

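  1. Launch two compute instances, selecting Ubuntu 22 as the operating system for each instance.

  2. Assign a secondary private IP address to the virtual network interface card (VNIC) of node1. For more information, see Assigning a New Secondary Private IP to a VNIC. This will be the floating IP, for example 10.10.1.115.

  3. Create a dynamic group.

    1. Log in to the OCI Console, navigate to Identity & Security, Dynamic Groups, and click Create Dynamic Group.

    2. Enter the following information.

      • Name: Enter OCIVIP.
      • Add the following rule to include the instances in the specified compartment.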

        All {instance.compartment.id = 'Your compartment OCI ID'}
        
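  4. Add a policy for the dynamic group.

    1. Navigate to Identity & Security, Policies, and click Create Policy.

    2. Enter the following information.

      • Name: Enter OCIVIP_policy.

      • Add the following statement to allow the dynamic group to use the virtual network family: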

        allow dynamic-group OracleIdentityCloudService/OCIVIP to use virtual-network-family in compartment id 'Your compartment OCI ID'
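
The matching rule and the policy statement above differ only in the compartment OCID and the dynamic group name, so they can be generated rather than typed by hand. The following is a minimal sketch; the function names and the example OCID are hypothetical placeholders, and the statement follows the unquoted form shown in the resource agent's own prerequisites comment.

```shell
#!/bin/sh
# Hypothetical helpers: these names are illustrative and not part of any OCI tooling.

# Print the dynamic-group matching rule for a compartment.
# $1 = compartment OCID
ocivip_matching_rule() {
    printf "All {instance.compartment.id = '%s'}\n" "$1"
}

# Print the policy statement that lets the dynamic group manage VNIC addresses.
# $1 = dynamic group name (with its identity domain prefix, if any)
# $2 = compartment OCID
ocivip_policy_statement() {
    printf "allow dynamic-group %s to use virtual-network-family in compartment id %s\n" "$1" "$2"
}

# Example usage (the OCID is a placeholder, replace it with your own):
#   ocivip_matching_rule ocid1.compartment.oc1..example
#   ocivip_policy_statement OracleIdentityCloudService/OCIVIP ocid1.compartment.oc1..example
```

The printed strings can then be pasted into the rule and statement fields of the OCI Console.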
        

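Task 2: Set up the Cluster and the Floating IP

With the environment in place, we can configure Pacemaker and integrate the OCIVIP resource agent. Connect to the instances over SSH and perform the cluster installation on both nodes, up to and including step 10.

  1. Update the operating system.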
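
Because the first ten steps are repeated on both nodes, a small wrapper can help keep them in sync. The following is a rough sketch, not part of the original procedure: the run_on_nodes function, the NODES and SSH_CMD variables, and the ubuntu login user are all assumptions, and the node names must already be resolvable from the machine you run it on.

```shell
#!/bin/sh
# Hypothetical helper: run the same command on every cluster node over SSH.
# Assumptions: nodes are reachable as node1 and node2, and the login user is "ubuntu".
NODES="node1 node2"

run_on_nodes() {
    for node in $NODES; do
        # SSH_CMD can be overridden (for example with "echo") for a dry run.
        ${SSH_CMD:-ssh} "ubuntu@$node" "$@" || return 1
    done
}

# Example usage:
#   run_on_nodes sudo apt update
```

Interactive steps, such as editing files with nano or running oci setup config, still need to be done in a session on each node.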

    sudo apt update
    sudo apt upgrade
    
  2. Install the OCI CLI and verify that it works.

    bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"
    

    Set up the OCI CLI.

    oci setup config
    

    Verify the OCI CLI installation.

    oci os ns get
    
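  3. For a test environment, you can delete the reject rule at position 6 of the iptables INPUT chain and then persist the change so that the instances can communicate. Remember to configure iptables securely in a production environment.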

    sudo iptables -D INPUT 6
    sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
    sudo sh -c 'ip6tables-save > /etc/iptables/rules.v6'
    
  4. Update the /etc/hosts file with the private IP addresses assigned to the two instances, node1 and node2.

    Run the following command to edit the file.

    sudo nano /etc/hosts
    

    Add your node names and IP addresses.

    10.10.1.111 node1
    10.10.1.118 node2
    
  5. Install the cluster-related packages (including jq).

    sudo apt install -y pacemaker corosync pcs jq
    
  6. Back up the corosync.conf file.

    sudo cp /etc/corosync/corosync.conf /etc/corosync/corosync.conf.bk
    

    Edit the corosync.conf file.

    sudo nano /etc/corosync/corosync.conf
    

    Copy the following content into the corosync.conf file.

     # Please read the corosync.conf.5 manual page
     system {
             # This is required to use transport=knet in an unprivileged
             # environment, such as a container. See man page for details.
             allow_knet_handle_fallback: yes
     }
    
     totem {
             version: 2
    
             # Corosync itself works without a cluster name, but DLM needs one.
             # The cluster name is also written into the VG metadata of newly
             # created shared LVM volume groups, if lvmlockd uses DLM locking.
             cluster_name: ha_cluster
             transport: udpu
             secauth: off
             # crypto_cipher and crypto_hash: Used for mutual node authentication.
             # If you choose to enable this, then do remember to create a shared
             # secret with "corosync-keygen".
             # enabling crypto_cipher, requires also enabling of crypto_hash.
             # crypto works only with knet transport
             crypto_cipher: none
             crypto_hash: none
     }
    
     logging {
             # Log the source file and line where messages are being
             # generated. When in doubt, leave off. Potentially useful for
             # debugging.
             fileline: off
             # Log to standard error. When in doubt, set to yes. Useful when
             # running in the foreground (when invoking "corosync -f")
             to_stderr: yes
             # Log to a log file. When set to "no", the "logfile" option
             # must not be set.
             to_logfile: yes
             logfile: /var/log/corosync/corosync.log
             # Log to the system log daemon. When in doubt, set to yes.
             to_syslog: yes
             # Log debug messages (very verbose). When in doubt, leave off.
             debug: off
             # Log messages with time stamps. When in doubt, set to hires (or on)
             #timestamp: hires
             logger_subsys {
                     subsys: QUORUM
                     debug: off
             }
     }
    
     quorum {
             # Enable and configure quorum subsystem (default: off)
             # see also corosync.conf.5 and votequorum.5
             provider: corosync_votequorum
             two_node: 1
             wait_for_all: 1
             last_man_standing: 1
             auto_tie_breaker: 0
     }
    
     nodelist {
             # Change/uncomment/add node sections to match cluster configuration
    
             node {
                     # Hostname of the node.
                     # name: node1
                     # Cluster membership node identifier
                     nodeid: 101
                     # Address of first link
                     ring0_addr: node1
                     # When knet transport is used it's possible to define up to 8 links
                     #ring1_addr: 192.168.1.1
             }
             # ...
             node {
                     ring0_addr: node2
                     nodeid: 102
             }
     }
    
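  7. Add the resource agent that Pacemaker will use to manage the native OCI floating IP to the /usr/lib/ocf/resource.d/heartbeat/ directory. Download the file content from here: ocivip.txt

    Note: This resource agent was developed by a third-party developer, not by Oracle.

    This is the content of the ocivip file.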

    #!/bin/sh
    #
    #
    # Manage Secondary Private IP in Oracle Cloud Infrastructure with Pacemaker
    #
    #
    # Copyright 2016-2018 Lorenzo Garuti <garuti.lorenzo@gmail.com>
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    #
    
    #
    #  Prerequisites:
    #
    #  - OCI CLI installed (https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/climanualinst.htm)
    #  - jq installed
    #  - dynamic group with a policy attached
    #  - the policy must have this statement:
    #    allow dynamic-group <GROUP_NAME> to use virtual-network-family in compartment id <COMPARTMENT_ID>
    #  - a reserved secondary private IP address for Compute Instances high availability
    #
    
    #######################################################################
    # Initialization:
    
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
    
    #######################################################################
    
    #
    # Defaults
    #
    OCF_RESKEY_ocicli_default="/usr/local/bin/oci"
    OCF_RESKEY_api_delay_default="3"
    OCF_RESKEY_cidr_netmask_default="24"
    OCF_RESKEY_interface_alias_default="0"
    export OCI_CLI_AUTH=instance_principal
    
    : ${OCF_RESKEY_ocicli=${OCF_RESKEY_ocicli_default}}
    : ${OCF_RESKEY_api_delay=${OCF_RESKEY_api_delay_default}}
    : ${OCF_RESKEY_cidr_netmask=${OCF_RESKEY_cidr_netmask_default}}
    : ${OCF_RESKEY_interface_alias=${OCF_RESKEY_interface_alias_default}}
    
    meta_data() {
        cat <<END
    <?xml version="1.0"?>
    <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
    <resource-agent name="ocivip">
    <version>1.0</version>
    
    <longdesc lang="en">
    Resource Agent for OCI Compute instance Secondary Private IP Addresses.
    
    It manages OCI Secondary Private IP Addresses for Compute instances with oci cli.
    
    See https://docs.oracle.com/en-us/iaas/Content/API/Concepts/cliconcepts.htm for more information about oci cli.
    
    Prerequisites:
    
    - OCI CLI installed (https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/climanualinst.htm)
    - jq installed
    - dynamic group with a policy attached
    - the policy must have this statement: allow dynamic-group GROUP_NAME to use virtual-network-family in compartment id COMPARTMENT_ID
    - a reserved secondary private IP address for Compute Instances high availability
    
    </longdesc>
    <shortdesc lang="en">OCI Secondary Private IP Address for Compute instances Resource Agent</shortdesc>
    
    <parameters>
    
    <parameter name="ocicli" unique="0">
    <longdesc lang="en">
    OCI Command line interface (CLI) tools
    </longdesc>
    <shortdesc lang="en">OCI cli tools</shortdesc>
    <content type="string" default="${OCF_RESKEY_ocicli_default}" />
    </parameter>
    
    <parameter name="secondary_private_ip" unique="1" required="1">
    <longdesc lang="en">
    reserved secondary private ip for compute instance
    </longdesc>
    <shortdesc lang="en">reserved secondary private ip for compute instance</shortdesc>
    <content type="string" default="" />
    </parameter>
    
    <parameter name="cidr_netmask" unique="0">
    <longdesc lang="en">
    netmask for the secondary_private_ip
    </longdesc>
    <shortdesc lang="en">netmask for the secondary_private_ip</shortdesc>
    <content type="integer" default="${OCF_RESKEY_cidr_netmask_default}" />
    </parameter>
    
    <parameter name="interface_alias" unique="0">
    <longdesc lang="en">
    numeric alias for the interface
    </longdesc>
    <shortdesc lang="en">numeric alias for the interface</shortdesc>
    <content type="integer" default="${OCF_RESKEY_interface_alias_default}" />
    </parameter>
    
    <parameter name="api_delay" unique="0">
    <longdesc lang="en">
    a short delay between API calls, to avoid sending API too quick
    </longdesc>
    <shortdesc lang="en">a short delay between API calls</shortdesc>
    <content type="integer" default="${OCF_RESKEY_api_delay_default}" />
    </parameter>
    
    </parameters>
    
    <actions>
    <action name="start"        timeout="30s" />
    <action name="stop"         timeout="30s" />
    <action name="monitor"      timeout="30s" interval="20s" depth="0" />
    <action name="migrate_to"   timeout="30s" />
    <action name="migrate_from" timeout="30s" />
    <action name="meta-data"    timeout="5s" />
    <action name="validate"     timeout="10s" />
    <action name="validate-all" timeout="10s" />
    </actions>
    </resource-agent>
    END
    }
    
    #######################################################################
    
    ocivip_usage() {
        cat <<END
    usage: $0 {start|stop|monitor|migrate_to|migrate_from|validate|validate-all|meta-data}
    
    Expects to have a fully populated OCF RA-compliant environment set.
    END
    }
    
    ocivip_start() {
        ocivip_monitor && return $OCF_SUCCESS
    
        $OCICLI network vnic assign-private-ip --vnic-id $VNIC_ID \
            --unassign-if-already-assigned \
            --ip-address ${SECONDARY_PRIVATE_IP}
        RETOCI=$?
        ip addr add ${SECONDARY_PRIVATE_IP}/${CIDR_NETMASK} dev ${PRIMARY_IFACE} label ${PRIMARY_IFACE}:${INTERFACE_ALIAS}
        RETIP=$?
    
        # delay to avoid sending request too fast
        sleep ${OCF_RESKEY_api_delay}
    
        if [ $RETOCI -ne 0 ] || [ $RETIP -ne 0 ]; then
            return $OCF_NOT_RUNNING
        fi
    
        ocf_log info "secondary_private_ip has been successfully brought up (${SECONDARY_PRIVATE_IP})"
        return $OCF_SUCCESS
    }
    
    ocivip_stop() {
        ocivip_monitor || return $OCF_SUCCESS
    
        $OCICLI network vnic unassign-private-ip --vnic-id $VNIC_ID \
            --ip-address ${SECONDARY_PRIVATE_IP}
        RETOCI=$?
        ip addr del ${SECONDARY_PRIVATE_IP}/${CIDR_NETMASK} dev ${PRIMARY_IFACE}:${INTERFACE_ALIAS}
        RETIP=$?
    
        # delay to avoid sending request too fast
        sleep ${OCF_RESKEY_api_delay}
    
        if [ $RETOCI -ne 0 ] || [ $RETIP -ne 0 ]; then
            return $OCF_NOT_RUNNING
        fi
    
        ocf_log info "secondary_private_ip has been successfully brought down (${SECONDARY_PRIVATE_IP})"
        return $OCF_SUCCESS
    }
    
    ocivip_monitor() {
        $OCICLI network private-ip list --vnic-id $VNIC_ID | grep -q "${SECONDARY_PRIVATE_IP}"
        RETOCI=$?
    
        if [ $RETOCI -ne 0 ]; then
            return $OCF_NOT_RUNNING
        fi
        return $OCF_SUCCESS
    }
    
    ocivip_validate() {
        check_binary ${OCICLI}
        check_binary jq
    
        if [ -z "${VNIC_ID}" ]; then
            ocf_exit_reason "vnic_id not found. Is this a Compute instance?"
            return $OCF_ERR_GENERIC
        fi
    
        return $OCF_SUCCESS
    }
    
    case $__OCF_ACTION in
        meta-data)
            meta_data
            exit $OCF_SUCCESS
            ;;
    esac
    
    OCICLI="${OCF_RESKEY_ocicli}"
    SECONDARY_PRIVATE_IP="${OCF_RESKEY_secondary_private_ip}"
    CIDR_NETMASK="${OCF_RESKEY_cidr_netmask}"
    INTERFACE_ALIAS="${OCF_RESKEY_interface_alias}"
    VNIC_ID="$(curl -s -H "Authorization: Bearer Oracle" -L http://169.254.169.254/opc/v2/vnics/ | jq -r '.[0].vnicId')"
    PRIMARY_IFACE=$(ip -4 route ls | grep default | grep -Po '(?<=dev )(\S+)' | head -n1)
    
    case $__OCF_ACTION in
        start)
            ocivip_validate || exit $?
            ocivip_start
            ;;
        stop)
            ocivip_stop
            ;;
        monitor)
            ocivip_monitor
            ;;
        migrate_to)
            ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to ${OCF_RESKEY_CRM_meta_migrate_target}."
            ocivip_stop
            ;;
        migrate_from)
            ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} from ${OCF_RESKEY_CRM_meta_migrate_source}."
            ocivip_start
            ;;
        reload)
            ocf_log info "Reloading ${OCF_RESOURCE_INSTANCE} ..."
            ;;
        validate|validate-all)
            ocivip_validate
            ;;
        usage|help)
            ocivip_usage
            exit $OCF_SUCCESS
            ;;
        *)
            ocivip_usage
            exit $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
    
    rc=$?
    ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION : $rc"
    exit $rc
    
  8. Edit the ocivip file and set the OCF_RESKEY_ocicli_default variable to the path of your OCI CLI executable.

    If you kept the default path when installing the OCI CLI on Ubuntu, the value will be /home/ubuntu/bin/oci.

    OCF_RESKEY_ocicli_default="/home/ubuntu/bin/oci"
    

    Create the file and paste in the code downloaded in step 7, with the updated variable.

    sudo nano /usr/lib/ocf/resource.d/heartbeat/ocivip
    

    Change the owner and permissions of the file.

    sudo chown root /usr/lib/ocf/resource.d/heartbeat/ocivip
    sudo chmod 755 /usr/lib/ocf/resource.d/heartbeat/ocivip
    
  9. Enable the services at boot, restart them, and check that they are running properly.

    sudo systemctl enable corosync
    sudo systemctl enable pacemaker
    sudo systemctl enable pcsd
    sudo systemctl restart pcsd
    sudo systemctl restart corosync
    sudo systemctl restart pacemaker
    sudo systemctl status pcsd
    sudo systemctl status corosync
    sudo systemctl status pacemaker
    
  10. Set a password for the user ocicluster.

    sudo passwd ocicluster
    
  11. Run the following command to authenticate the nodes.

    sudo pcs cluster auth node1 node2 -u ocicluster -p YOUR_PASSWORD
    
  12. Create the cluster.

    sudo pcs cluster setup ha_cluster node1 node2
    
  13. Start the cluster on all nodes and enable it at boot.

    sudo pcs cluster start --all
    sudo pcs cluster enable --all
    
  14. Check that the cluster is active and working properly.

    sudo pcs status
    
  15. Add the OCIVIP resource that manages the floating IP.

    Note: Change the virtual IP address to the one you assigned as the secondary address of the VNIC in step 2 of Task 1.

    sudo pcs resource create OCIVIP ocf:heartbeat:ocivip secondary_private_ip="10.10.1.115" cidr_netmask="24" op monitor timeout="30s" interval="20s" OCF_CHECK_LEVEL="0"
    
  16. Make sure the resource was added correctly and is running.

    sudo pcs status
    
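  17. Verify that the secondary IP address can move between the instances. For example, restart node1 and check in the OCI Console that the address has been assigned to the other instance, and vice versa.

    Before restarting node1, you can also ping the floating address from a third virtual machine and check that it keeps responding after node1 goes down. A brief interruption of a few lost pings is normal.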
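
The ping check from a third virtual machine can be sketched as a small loop that counts how many probes go unanswered while node1 restarts. This is an illustrative sketch only: the probe and watch_vip functions and the PROBE_INTERVAL variable are hypothetical names, and the ping options assume the Linux iputils version of ping.

```shell
#!/bin/sh
# Hypothetical helper for the failover test, intended to run on a third VM.

# probe ADDRESS: succeed if a single ICMP echo is answered within 1 second.
probe() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# watch_vip ADDRESS COUNT: probe ADDRESS COUNT times and print the number
# of probes that received no reply. Each loss is also reported on stderr.
watch_vip() {
    addr=$1
    count=$2
    lost=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! probe "$addr"; then
            lost=$((lost + 1))
            echo "probe $i to $addr lost" >&2
        fi
        i=$((i + 1))
        # Pause between probes; PROBE_INTERVAL defaults to 1 second.
        sleep "${PROBE_INTERVAL:-1}"
    done
    echo "$lost"
}

# Example usage: start this, then restart node1 while it runs.
#   watch_vip 10.10.1.115 120
```

A small nonzero count is consistent with the brief interruption described above; a count close to the total number of probes suggests the address did not move to the other node.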

Your active/passive cluster is now up and running. You can add the services that require business continuity.

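Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.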