注意:

在由 Pacemaker 管理的 Oracle Cloud Infrastructure 上设置 Linux 虚拟 IP 故障转移

简介

在许多环境中,将基础设施与主动或被动 Linux 集群一起使用仍然至关重要,后者需要使用浮动 IP。在云基础设施中,辅助 IP 地址不仅要由操作系统管理,还要由云基础设施管理。

在本教程中,我们将了解 Linux 集群的浮动 IP 如何由 Pacemaker 以简单的方式和无自定义代码作为集成资源进行管理。有关详细信息,请参阅任务 3:设置 Samba 集群在 Oracle Cloud Infrastructure 上自动执行虚拟 IP 故障转移

架构设计

图像

目标

先决条件

任务 1:设置环境

  1. 启动两个计算实例,选择 Ubuntu 22 作为每个实例的操作系统。

  2. node1 的虚拟网络接口卡 (Virtual Network Interface Card,VNIC) 分配辅助专用 IP 地址。有关更多信息,请参见 Assigning a New Secondary Private IP to a VNIC 。这将是浮动 IP。例如,10.10.1.115

  3. 创建动态组。

    1. 登录到 OCI 控制台,导航到身份与安全动态组,然后单击创建动态组

    2. 输入以下信息。

      • 名称:输入 OCIVIP
      • 添加以下规则以将实例包括在指定的区间中。

        All {instance.compartment.id = 'Your compartment OCI ID'}
        
  4. 将策略添加到动态组。

    1. 导航到身份与安全策略,然后单击创建策略

    2. 输入以下信息。

      • 名称:输入 OCIVIP_policy

      • 添加以下语句以允许动态组使用虚拟网络系列:

        allow dynamic-group OracleIdentityCloudService/OCIVIP to use virtual-network-family in compartment id 'Your compartment OCI ID'
        

任务 2:配置群集和浮动 IP

设置环境后,我们可以继续配置 Pacemaker 并集成 OCIVIP 资源代理。使用 SSH 连接到实例,并在两个节点上执行集群安装操作,直至和包括步骤 10。

  1. 更新操作系统。

    sudo apt update
    sudo apt upgrade
    
  2. 安装 OCI CLI 并验证其功能。

    bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"
    

    设置 OCI CLI。

    oci setup config
    

    验证 OCI CLI 安装。

    oci os ns get
    
  3. 对于测试环境,您可以删除 iptables 的 INPUT 部分中第 6 行处的 reject 规则,然后使其保持持久状态以允许实例通信。请记住,要在生产环境中安全、适当地配置 iptables。

    sudo iptables -D INPUT 6
    sudo su
    sudo iptables-save > /etc/iptables/rules.v4
    sudo ip6tables-save > /etc/iptables/rules.v6
    
  4. 使用分配给您的两个实例的专用 IP 地址更新 /etc/hosts 文件:node1node2

    请执行以下命令编辑该文件。

    sudo nano /etc/hosts
    

    添加节点名称和 IP 地址。

    10.10.1.111 node1
    10.10.1.118 node2
    
  5. 安装与群集相关的软件包,包括 jq。

    sudo apt install -y pacemaker corosync pcs jq
    
  6. 备份 corosync.conf 文件。

    sudo cp /etc/corosync/corosync.conf /etc/corosync/corosync.conf.bk
    

    编辑 corosync.conf 文件。

    sudo nano /etc/corosync/corosync.conf
    

    将以下内容复制到 corosync.conf 文件中。

     # Please read the corosync.conf.5 manual page
     system {
             # This is required to use transport=knet in an unprivileged
             # environment, such as a container. See man page for details.
             allow_knet_handle_fallback: yes
     }
    
     totem {
             version: 2
    
             # Corosync itself works without a cluster name, but DLM needs one.
             # The cluster name is also written into the VG metadata of newly
             # created shared LVM volume groups, if lvmlockd uses DLM locking.
            cluster_name: ha_cluster
             transport: udpu
             secauth: off
             # crypto_cipher and crypto_hash: Used for mutual node authentication.
             # If you choose to enable this, then do remember to create a shared
             # secret with "corosync-keygen".
             # enabling crypto_cipher, requires also enabling of crypto_hash.
             # crypto works only with knet transport
             crypto_cipher: none
             crypto_hash: none
     }
    
     logging {
             # Log the source file and line where messages are being
             # generated. When in doubt, leave off. Potentially useful for
             # debugging.
             fileline: off
             # Log to standard error. When in doubt, set to yes. Useful when
             # running in the foreground (when invoking "corosync -f")
             to_stderr: yes
             # Log to a log file. When set to "no", the "logfile" option
             # must not be set.
             to_logfile: yes
             logfile: /var/log/corosync/corosync.log
             # Log to the system log daemon. When in doubt, set to yes.
             to_syslog: yes
             # Log debug messages (very verbose). When in doubt, leave off.
             debug: off
             # Log messages with time stamps. When in doubt, set to hires (or on)
             #timestamp: hires
             logger_subsys {
                     subsys: QUORUM
                     debug: off
             }
     }
    
     quorum {
             # Enable and configure quorum subsystem (default: off)
             # see also corosync.conf.5 and votequorum.5
             provider: corosync_votequorum
             two_node: 1
             wait_for_all: 1
             last_man_standing: 1
             auto_tie_breaker: 0
     }
    
     nodelist {
             # Change/uncomment/add node sections to match cluster configuration
    
             node {
                     # Hostname of the node.
                     # name: node1
                     # Cluster membership node identifier
                     nodeid: 101
                     # Address of first link
                     ring0_addr: node1
                     # When knet transport is used it's possible to define up to 8 links
                     #ring1_addr: 192.168.1.1
             }
             # ...
             node {
                     ring0_addr: node2
                     nodeid: 102
                 }
    }
    
  7. 将 Pacemaker 用于本地管理 OCI 浮动 IP 的资源添加到 /usr/lib/ocf/resource.d/heartbeat/ 目录中。从以下位置下载文件的内容:ocivip.txt

    注意:此资源不是由 Oracle 开发的,而是由第三方开发人员开发。

    这是 ocivip 文件的内容。

    #!/bin/sh
    #
    #
    # Manage Secondary Private IP in Oracle Cloud Infrastructure with Pacemaker
    #
    #
    # Copyright 2016-2018 Lorenzo Garuti <garuti.lorenzo@gmail.com>
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    #
    
    #
    #  Prerequisites:
    #
    #  - OCI CLI installed (https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/climanualinst.htm)
    #  - jq installed
    #  - dynamic group with a policy attached
    #  - the policy must have this statement:
    #    allow dynamic-group <GROUP_NAME> to use virtual-network-family in compartment id <COMPARTMENT_ID>
    #  - a reserved secondary private IP address for Compute Instances high availability
    #
    
    #######################################################################
    # Initialization:
    
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
    
    #######################################################################
    
    #
    # Defaults
    #
    OCF_RESKEY_ocicli_default="/usr/local/bin/oci"
    OCF_RESKEY_api_delay_default="3"
    OCF_RESKEY_cidr_netmask_default="24"
    OCF_RESKEY_interface_alias_default="0"
    export OCI_CLI_AUTH=instance_principal
    
    : ${OCF_RESKEY_ocicli=${OCF_RESKEY_ocicli_default}}
    : ${OCF_RESKEY_api_delay=${OCF_RESKEY_api_delay_default}}
    : ${OCF_RESKEY_cidr_netmask=${OCF_RESKEY_cidr_netmask_default}}
    : ${OCF_RESKEY_interface_alias=${OCF_RESKEY_interface_alias_default}}
    
    meta_data() {
        cat <<END
    <?xml version="1.0"?>
    <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
    <resource-agent name="ocivip">
    <version>1.0</version>
    
    <longdesc lang="en">
    Resource Agent for OCI Compute instance Secondary Private IP Addresses.
    
    It manages OCI Secondary Private IP Addresses for Compute instances with oci cli.
    
    See https://docs.oracle.com/en-us/iaas/Content/API/Concepts/cliconcepts.htm for more information about oci cli.
    
    Prerequisites:
    
    - OCI CLI installed (https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/climanualinst.htm)
    - jq installed
    - dynamic group with a policy attached
    - the policy must have this statement: allow dynamic-group GROUP_NAME to use virtual-network-family in compartment id COMPARTMENT_ID
    - a reserved secondary private IP address for Compute Instances high availability
    
    </longdesc>
    <shortdesc lang="en">OCI Secondary Private IP Address for Compute instances Resource Agent</shortdesc>
    
    <parameters>
    
    <parameter name="ocicli" unique="0">
    <longdesc lang="en">
    OCI Command line interface (CLI) tools
    </longdesc>
    <shortdesc lang="en">OCI cli tools</shortdesc>
    <content type="string" default="${OCF_RESKEY_ocicli_default}" />
    </parameter>
    
    <parameter name="secondary_private_ip" unique="1" required="1">
    <longdesc lang="en">
    reserved secondary private ip for compute instance
    </longdesc>
    <shortdesc lang="en">reserved secondary private ip for compute instance</shortdesc>
    <content type="string" default="" />
    </parameter>
    
    <parameter name="cidr_netmask" unique="0">
    <longdesc lang="en">
    netmask for the secondary_private_ip
    </longdesc>
    <shortdesc lang="en">netmask for the secondary_private_ip</shortdesc>
    <content type="integer" default="${OCF_RESKEY_cidr_netmask_default}" />
    </parameter>
    
    <parameter name="interface_alias" unique="0">
    <longdesc lang="en">
    numeric alias for the interface
    </longdesc>
    <shortdesc lang="en">numeric alias for the interface</shortdesc>
    <content type="integer" default="${OCF_RESKEY_interface_alias_default}" />
    </parameter>
    
    <parameter name="api_delay" unique="0">
    <longdesc lang="en">
    a short delay between API calls, to avoid sending API too quick
    </longdesc>
    <shortdesc lang="en">a short delay between API calls</shortdesc>
    <content type="integer" default="${OCF_RESKEY_api_delay_default}" />
    </parameter>
    
    </parameters>
    
    <actions>
    <action name="start"        timeout="30s" />
    <action name="stop"         timeout="30s" />
    <action name="monitor"      timeout="30s" interval="20s" depth="0" />
    <action name="migrate_to"   timeout="30s" />
    <action name="migrate_from" timeout="30s" />
    <action name="meta-data"    timeout="5s" />
    <action name="validate"     timeout="10s" />
    <action name="validate-all" timeout="10s" />
    </actions>
    </resource-agent>
    END
    }
    
    #######################################################################
    
    ocivip_usage() {
        cat <<END
    usage: $0 {start|stop|monitor|migrate_to|migrate_from|validate|validate-all|meta-data}
    
    Expects to have a fully populated OCF RA-compliant environment set.
    END
    }
    
    ocivip_start() {
        ocivip_monitor && return $OCF_SUCCESS
    
        $OCICLI network vnic assign-private-ip --vnic-id $VNIC_ID \
            --unassign-if-already-assigned \
            --ip-address ${SECONDARY_PRIVATE_IP}
        RETOCI=$?
        ip addr add ${SECONDARY_PRIVATE_IP}/${CIDR_NETMASK} dev ${PRIMARY_IFACE} label ${PRIMARY_IFACE}:${INTERFACE_ALIAS}
        RETIP=$?
    
        # delay to avoid sending request too fast
        sleep ${OCF_RESKEY_api_delay}
    
        if [ $RETOCI -ne 0 ] || [ $RETIP -ne 0 ]; then
            return $OCF_NOT_RUNNING
        fi
    
        ocf_log info "secondary_private_ip has been successfully brought up (${SECONDARY_PRIVATE_IP})"
        return $OCF_SUCCESS
    }
    
    ocivip_stop() {
        ocivip_monitor || return $OCF_SUCCESS
    
        $OCICLI network vnic unassign-private-ip --vnic-id $VNIC_ID \
            --ip-address ${SECONDARY_PRIVATE_IP}
        RETOCI=$?
        ip addr del ${SECONDARY_PRIVATE_IP}/${CIDR_NETMASK} dev ${PRIMARY_IFACE}:${INTERFACE_ALIAS}
        RETIP=$?
    
        # delay to avoid sending request too fast
        sleep ${OCF_RESKEY_api_delay}
    
        if [ $RETOCI -ne 0 ] || [ $RETIP -ne 0 ]; then
            return $OCF_NOT_RUNNING
        fi
    
        ocf_log info "secondary_private_ip has been successfully brought down (${SECONDARY_PRIVATE_IP})"
        return $OCF_SUCCESS
    }
    
    ocivip_monitor() {
        $OCICLI network private-ip list --vnic-id $VNIC_ID | grep -q "${SECONDARY_PRIVATE_IP}"
        RETOCI=$?
    
        if [ $RETOCI -ne 0 ]; then
            return $OCF_NOT_RUNNING
        fi
        return $OCF_SUCCESS
    }
    
    ocivip_validate() {
        check_binary ${OCICLI}
        check_binary jq
    
        if [ -z "${VNIC_ID}" ]; then
            ocf_exit_reason "vnic_id not found. Is this a Compute instance?"
            return $OCF_ERR_GENERIC
        fi
    
        return $OCF_SUCCESS
    }
    
    case $__OCF_ACTION in
        meta-data)
            meta_data
            exit $OCF_SUCCESS
            ;;
    esac
    
    OCICLI="${OCF_RESKEY_ocicli}"
    SECONDARY_PRIVATE_IP="${OCF_RESKEY_secondary_private_ip}"
    CIDR_NETMASK="${OCF_RESKEY_cidr_netmask}"
    INTERFACE_ALIAS="${OCF_RESKEY_interface_alias}"
    VNIC_ID="$(curl -s -H "Authorization: Bearer Oracle" -L http://169.254.169.254/opc/v2/vnics/ | jq -r '.[0].vnicId')"
    PRIMARY_IFACE=$(ip -4 route ls | grep default | grep -Po '(?<=dev )(\S+)' | head -n1)
    
    case $__OCF_ACTION in
        start)
            ocivip_validate || exit $?
            ocivip_start
            ;;
        stop)
            ocivip_stop
            ;;
        monitor)
            ocivip_monitor
            ;;
        migrate_to)
            ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to ${OCF_RESKEY_CRM_meta_migrate_target}."
            ocivip_stop
            ;;
        migrate_from)
            ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} from ${OCF_RESKEY_CRM_meta_migrate_source}."
            ocivip_start
            ;;
        reload)
            ocf_log info "Reloading ${OCF_RESOURCE_INSTANCE} ..."
            ;;
        validate|validate-all)
            ocivip_validate
            ;;
        usage|help)
            ocivip_usage
            exit $OCF_SUCCESS
            ;;
        *)
            ocivip_usage
            exit $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
    
    rc=$?
    ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION : $rc"
    exit $rc
    
  8. 编辑 ocivip 文件并使用 OCI CLI 路径更改 OCF_RESKEY_ocicli_default 变量中 OCI CLI 可执行文件的路径。

    如果在 Ubuntu 上的 OCI CLI 安装期间保留了默认路径,则变量将为 /home/ubuntu/bin/oci

    OCF_RESKEY_ocicli_default="/home/ubuntu/bin/oci"
    

    创建文件并使用更新的变量复制步骤 7 中下载的代码。

    sudo nano /usr/lib/ocf/resource.d/heartbeat/ocivip
    

    更改文件的权限和所有者。

    sudo chown root /usr/lib/ocf/resource.d/heartbeat/ocivip
    sudo chmod 755 /usr/lib/ocf/resource.d/heartbeat/ocivip
    
  9. 在引导时启用并重新启动服务,并检查它们是否正常运行。

    sudo systemctl enable corosync
    sudo systemctl enable pacemaker
    sudo systemctl enable pcsd
    sudo systemctl restart pcsd
    sudo systemctl restart corosync
    sudo systemctl restart pacemaker
    sudo systemctl status pcsd
    sudo systemctl status corosync
    sudo systemctl status pacemaker
    
  10. 设置用户 ocicluster 的密码。

    sudo passwd ocicluster
    
  11. 运行以下命令对节点进行验证。

    sudo pcs cluster auth node1 node2 -u ocicluster -p YOUR_PASSWORD
    
  12. 创建群集。

    sudo pcs cluster setup ha_cluster node1 node2
    
  13. 在所有节点上引导时启动并启用群集。

    sudo pcs cluster start --all
    sudo pcs cluster enable --all
    
  14. 检查群集是否处于活动状态并正常运行。

    sudo pcs status
    
  15. 添加用于管理浮动 IP 的 OCIVIP 资源。

    注:在本教程的第 2 步中,将虚拟 IP 地址更改为分配给 VNIC 辅助的虚拟 IP 地址。

    sudo pcs resource create OCIVIP ocf:heartbeat:ocivip secondary_private_ip="10.10.1.115" cidr_netmask="24" op monitor timeout="30s" interval="20s" OCF_CHECK_LEVEL="0"
    
  16. 验证资源是否已正确添加且运行正常。

    sudo pcs status
    
  17. 验证辅助 IP 地址是否可以在实例之间迁移,例如重新启动 node1,然后检查 OCI 控制台是否已将其分配给其他实例,反之亦然。

    在重新启动 node1 之前,您还可以从第三个虚拟机 ping 浮动地址,并检查它在 node1 关闭后是否继续响应。短暂中断几个跃点是正常的。

主动和被动集群已启动并正在运行。现在,您可以添加需要业务连续性的服务。

确认

更多学习资源

浏览 docs.oracle.com/learn 上的其他实验室,或者访问 Oracle Learning YouTube 渠道上的更多免费学习内容。此外,请访问 education.oracle.com/learning-explorer 成为 Oracle Learning Explorer。

有关产品文档,请访问 Oracle 帮助中心