SCSI RDMA Protocol (SRP) is a protocol supported by the appliance for sharing SCSI-based storage over a network that provides RDMA services (for example, InfiniBand).
SRP ports are shared with other IB port services such as IPoIB and RDMA. The SRP service can operate only in target mode. SRP targets have the following configurable properties.
In addition to those properties, the BUI indicates whether a target is online or offline.
On clustered platforms, peer targets should be configured into the same target group for highly available (multi-pathed) configurations. SRP multipathed I/O is an initiator-side configuration option.
SRP initiators have the following configurable properties.
SRP performance can be observed via Analytics, where you can break down operations or throughput by initiator or target.
The following sections provide a guide for setting up host clients.
The following procedure describes how to set up OFED.
1. Download the Linux OFED packages from: http://www.openfabrics.org/download_linux.htm
2. Run the install.pl script with the --all option. Note that the --all option installs SRP and its associated tools. If the command fails because of package dependencies, review the results and install all the packages it asks for, including gcc, if listed.
3. After the install completes, read all of the SRP release notes.
4. The Release Notes recommend using the -n flag for all srp_daemon invocations.
To run the SRP daemon as a daemon, execute run_srp_daemon, providing it the same options used for running srp_daemon. This script is found under /usr/local/ofed/sbin/ or <prefix>/sbin/. Be sure only one instance of run_srp_daemon runs per port.
To run the SRP daemon as a daemon on all ports, execute srp_daemon.sh (found under /usr/local/ofed/sbin/ or <prefix>/sbin/). Note that srp_daemon.sh sends its log to /var/log/srp_daemon.log.
5. To configure this script to execute automatically when the InfiniBand driver starts, alter one of the values as follows:
Change the value of SRP_DAEMON_ENABLE in /etc/infiniband/openib.conf to "yes".
OR
Change the value of SRPHA_ENABLE in /etc/infiniband/openib.conf to "yes". Note that the latter option also enables SRP High Availability.
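For non-interactive setups, the edit in step 5 can be scripted. This is a minimal sketch assuming the stock key=value layout of openib.conf; it operates on a scratch copy (/tmp/openib.conf) rather than the real /etc/infiniband/openib.conf, which you would point it at on an actual initiator:

```shell
# Sketch: flip SRP_DAEMON_ENABLE to "yes" with sed.
# A scratch copy stands in for /etc/infiniband/openib.conf here.
CONF=/tmp/openib.conf
printf 'SRP_DAEMON_ENABLE=no\nSRPHA_ENABLE=no\n' > "$CONF"   # stand-in contents

# Rewrite the SRP_DAEMON_ENABLE line in place, leaving other keys untouched.
sed -i 's/^SRP_DAEMON_ENABLE=.*/SRP_DAEMON_ENABLE=yes/' "$CONF"

grep '^SRP_DAEMON_ENABLE' "$CONF"
```

Set SRPHA_ENABLE instead (or as well) if you also want SRP High Availability, as described above.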
6. To use High-Availability - Automatic mode, perform the following:
Edit /etc/infiniband/openib.conf and set SRPHA_ENABLE to "yes".
Restart multipathd and OpenIB, or reboot the initiator system, as follows:
service multipathd restart
/etc/init.d/openibd stop
/etc/init.d/openibd start
7. To display general characteristics of each initiator-side IB HCA and Port:
ibstat
8. To display all the available SRP target IO Controllers on the network, use one of the following:
srp_daemon -a -o -v
ibsrpdm
9. To add SCSI devices corresponding to a particular target, change to the appropriate device directory:
cd /sys/class/infiniband_srp/srp-mthca0-1
10. Enumerate the remote IO Controllers in the format that add_target expects, using one of the following (add the -n flag if you want the initiator extension set automatically):
srp_daemon -o -c
OR
ibsrpdm -c
11. Echo the appropriate line of output onto the system file add_target:
echo id_ext=0003ba0001002eac,ioc_guid=0003ba0001002eac,\
dgid=fe800000000000000003ba0001002ead,\
pkey=ffff,service_id=0003ba0001002eac > add_target
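Steps 10 and 11 can be combined into a small loop that feeds every enumerated controller into add_target. The sketch below is runnable anywhere because stand-ins are used: SRP_DIR stands in for the real sysfs directory from step 9, and the enumerate function stands in for srp_daemon -o -c -n output (one add_target-formatted line per controller):

```shell
# Sketch: write each enumerated controller line into the add_target file.
# On a real initiator, SRP_DIR is /sys/class/infiniband_srp/srp-<hca>-<port>
# and the enumerate stand-in is replaced by: srp_daemon -o -c -n
SRP_DIR=/tmp/srp-demo
mkdir -p "$SRP_DIR"

enumerate() {   # stand-in for srp_daemon -o -c -n; emits a canned example line
    printf 'id_ext=0003ba0001002eac,ioc_guid=0003ba0001002eac,pkey=ffff,service_id=0003ba0001002eac\n'
}

enumerate | while read -r target; do
    echo "$target" > "$SRP_DIR/add_target"
done
```

Against the real sysfs file, each echo triggers creation of the corresponding SCSI device rather than overwriting a regular file.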
12. Use the contents of /var/log/messages to determine which SCSI device corresponds to each added target.
On some Linux distributions, lsscsi shows the newly created device; the -H and -v options give more device information. Alternatively, run:
cat /proc/scsi/scsi
13. Now you can do something with the device, such as:
mkfs /dev/sdd
For more information, see Configuring Linux Initiators.
Performance degradation may be noticed when the standby path is activated.
This can happen when the I/O path to the active target is interrupted via a link failure or cluster takeover.
Workaround: To resume performance, stop I/O on the initiator and establish a new session to the target. From that point, I/O can be started and will continue at the original level of throughput.
Linux SRP initiator may keep session open after a cluster takeover and multi-path failover.
I/O will continue, but srp_daemon will report that it has not connected to the takeover target. This problem can be seen by running:
srp_daemon -o -c -n -i <ib-device> -p <port-num>
The srp_daemon will report those targets to which it has not already connected. The expected behavior is for the srp_daemon to show the failed controller.
Workaround: Restart the srp_daemon.
Session may hang during cluster takeover.
The client message log will report SRP failures and I/O errors in /var/log/messages:
Jan 27 11:57:03 ib-client-2 kernel: host11: SRP abort called
Jan 27 11:57:37 ib-client-2 kernel: host11: ib_srp: failed send status 12
Jan 27 11:57:37 ib-client-2 kernel: ib_srp: host11: add qp_in_err timer
Jan 27 11:57:37 ib-client-2 kernel: host11: ib_srp: failed send status 5
Jan 27 11:57:38 ib-client-2 kernel: host11: SRP abort called
Jan 27 11:57:38 ib-client-2 kernel: host11: SRP reset_device called
Jan 27 11:57:38 ib-client-2 kernel: host11: ib_srp: SRP reset_host called state 0 qp_err 1
Jan 27 11:57:58 ib-client-2 kernel: host11: SRP abort called
Jan 27 11:57:58 ib-client-2 kernel: host11: SRP reset_device called
Jan 27 11:57:58 ib-client-2 kernel: host11: ib_srp: SRP reset_host called state 0 qp_err 1
Jan 27 11:58:02 ib-client-2 kernel: host11: ib_srp: srp_qp_in_err_timer called
Jan 27 11:58:02 ib-client-2 kernel: host11: ib_srp: srp_qp_in_err_timer flushed reset - done
Jan 27 11:58:02 ib-client-2 kernel: host11: ib_srp: Got failed path rec status -22
Jan 27 11:58:02 ib-client-2 kernel: host11: ib_srp: Path record query failed
Jan 27 11:58:02 ib-client-2 kernel: host11: ib_srp: reconnect failed (-22), removing target port.
Jan 27 11:58:08 ib-client-2 kernel: scsi 11:0:0:0: scsi: Device offlined - not ready after error recovery
Jan 27 11:58:08 ib-client-2 multipathd: sdc: tur checker reports path is down
Jan 27 11:58:08 ib-client-2 multipathd: checker failed path 8:32 in map mpath148
Jan 27 11:58:08 ib-client-2 multipathd: mpath148: Entering recovery mode: max_retries=200
Jan 27 11:58:08 ib-client-2 multipathd: mpath148: remaining active paths: 0
Jan 27 11:58:08 ib-client-2 multipathd: sdc: remove path (uevent)
Jan 27 11:58:08 ib-client-2 multipathd: mpath148: map in use
Jan 27 11:58:08 ib-client-2 multipathd: mpath148: can't flush
Jan 27 11:58:08 ib-client-2 multipathd: mpath148: Entering recovery mode: max_retries=200
Jan 27 11:58:08 ib-client-2 multipathd: dm-2: add map (uevent)
Jan 27 11:58:08 ib-client-2 multipathd: dm-2: devmap already registered
Jan 27 11:58:08 ib-client-2 kernel: scsi 11:0:0:0: scsi: Device offlined - not ready after error recovery
Jan 27 11:58:08 ib-client-2 last message repeated 49 times
Jan 27 11:58:08 ib-client-2 kernel: scsi 11:0:0:0: rejecting I/O to dead device
Jan 27 11:58:08 ib-client-2 kernel: device-mapper: multipath: Failing path 8:32.
Jan 27 11:58:08 ib-client-2 kernel: scsi 11:0:0:0: rejecting I/O to dead device
Workaround The client must be rebooted to recover the SRP service.
Device mapper state may become out of date, preventing devices from being added to the map table.
When this problem occurs, the /var/log/messages log will show:
device-mapper: table: 253:2: multipath: error getting device
device-mapper: ioctl: error adding target to table
The multipath command queries will report the correct state:
ib-client-1:~ # multipath -d
reload: maguro2LUN (3600144f08068363800004b6075db0001) n/a SUN,Sun Storage 7310
[size=40G][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=50][undef]
 \_ 18:0:0:1 sde 8:64 [undef][ready]
\_ round-robin 0 [prio=1][undef]
 \_ 17:0:0:1 sdc 8:32 [failed][ghost]
Workaround: The client must be rebooted to clear the stale device mapper state.
SRP initiator may go into an infinite retry target loop.
Upon cluster takeover, a multipath device may not switch to the standby path as expected. The problem is with the SRP initiator, which enters an infinite loop of write, fail, abort, reset-device, and reset-target operations. When this problem happens, the following messages are logged in /var/log/messages:
Jan 26 17:42:12 mysystem kernel: sd 13:0:0:0: [sdd] Device not ready: Sense Key : Not Ready [current]
Jan 26 17:42:12 mysystem kernel: sd 13:0:0:0: [sdd] Device not ready: Add. Sense: Logical unit not accessible, target port in standby state
Jan 26 17:42:12 mysystem kernel: end_request: I/O error, dev sdd, sector 512248
Jan 26 17:42:12 mysystem kernel: scsi host13: SRP abort called
Jan 26 17:42:12 mysystem kernel: scsi host13: SRP reset_device called
Jan 26 17:42:12 mysystem kernel: scsi host13: ib_srp: SRP reset_host called state 0 qp_err 0
Jan 26 17:42:21 mysystem multipathd: 8:48: mark as failed
Workaround: Remove the device and re-scan.
The VMware Native Multipath Plugin (NMP) has two components that can be changed on a device-by-device, path-by-path, or per-array basis.
Path Selection Plugin (PSP) - controls which physical path is used for I/O:
# esxcli nmp psp list
Name           Description
VMW_PSP_MRU    Most Recently Used Path Selection
VMW_PSP_RR     Round Robin Path Selection
VMW_PSP_FIXED  Fixed Path Selection
Storage Array Type Plugin (SATP) - controls how failover works.
The SATP has to be configured to recognize the array vendor or model string in order to change the basic failover mode from the default Active/Active array type to ALUA.
By default, the Sun Storage 7000 cluster comes up as an Active/Active array only.
To change this manually, use the ESX CLI to add a rule that has the ALUA plugin claim the 7000 LUNs.
# esxcli nmp satp addrule -s VMW_SATP_ALUA \
  -e "Sun Storage 7000" -V "SUN" -M "Sun Storage 7410" -c "tpgs_on"
The options are:
-s VMW_SATP_ALUA - use the ALUA SATP
-e - description of the rule
-V - vendor
-M - model
-c - claim option for the Target Portal Group (the 7000 appears to support implicit ALUA)
If no LUNs have been scanned/discovered yet, simply rescan the adapter to find new LUNs; they should be claimed by the ALUA plugin. If LUNs are already present, reboot the ESX host.
After the reboot, the LUNs should be listed under the VMW_SATP_ALUA array type.
# esxcli nmp device list
naa.600144f096bb823800004b707f2d0001
    Device Display Name: Local SUN Disk (naa.600144f096bb823800004b707f2d0001)
    Storage Array Type: VMW_SATP_ALUA
    Storage Array Type Device Config: {implicit_support=on;explicit_support=off;explicit_allow=on; alua_followover=on; {TPG_id=0,TPG_state=AO}{TPG_id=1,TPG_state=STBY}}
    Path Selection Policy: VMW_PSP_MRU
    Path Selection Policy Device Config: Current Path=vmhba_mlx4_1.1.1:C0:T1:L0
    Working Paths: vmhba_mlx4_1.1.1:C0:T1:L0
The relevant LUN path lists should show an active and a standby path:
# esxcli nmp path list
gsan.80fe53553e0100282100-gsan.80fe8f583e0100282100-naa.600144f096bb823800004b707f2d0001
    Runtime Name: vmhba_mlx4_1.1.1:C0:T2:L0
    Device: naa.600144f096bb823800004b707f2d0001
    Device Display Name: Local SUN Disk (naa.600144f096bb823800004b707f2d0001)
    Group State: standby
    Storage Array Type Path Config: {TPG_id=1,TPG_state=STBY,RTP_id=256,RTP_health=UP}
    Path Selection Policy Path Config: {non-current path}

gsan.80fe53553e0100282100-gsan.80fe73583e0100282100-naa.600144f096bb823800004b707f2d0001
    Runtime Name: vmhba_mlx4_1.1.1:C0:T1:L0
    Device: naa.600144f096bb823800004b707f2d0001
    Device Display Name: Local SUN Disk (naa.600144f096bb823800004b707f2d0001)
    Group State: active
    Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=2,RTP_health=UP}
    Path Selection Policy Path Config: {current path}
Standby and active paths may not be found
The esxcli nmp path list command should report an active path and a standby path, one of each for the SRP targets in a cluster configuration.
[root@ib-client-5 vmware]# esxcli nmp path list
gsan.80fe53553e0100282100-gsan.80fe8f583e0100282100-naa.600144f096bb823800004b707f2d0001
    Runtime Name: vmhba_mlx4_1.1.1:C0:T2:L0
    Device: naa.600144f096bb823800004b707f2d0001
    Device Display Name: Local SUN Disk (naa.600144f096bb823800004b707f2d0001)
    Group State: standby
    Storage Array Type Path Config: {TPG_id=1,TPG_state=STBY,RTP_id=256,RTP_health=UP}
    Path Selection Policy Path Config: {non-current path}

gsan.80fe53553e0100282100-gsan.80fe73583e0100282100-naa.600144f096bb823800004b707f2d0001
    Runtime Name: vmhba_mlx4_1.1.1:C0:T1:L0
    Device: naa.600144f096bb823800004b707f2d0001
    Device Display Name: Local SUN Disk (naa.600144f096bb823800004b707f2d0001)
    Group State: active
    Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=2,RTP_health=UP}
    Path Selection Policy Path Config: {current path}
When this problem occurs, the active or standby path may not be shown in the output of esxcli nmp path list.
Workaround: None
A VMware VM Linux guest may hang during cluster takeover.
When this problem happens, the Linux guest will report the following in its /var/log/messages log:
Feb 10 16:10:00 ib-client-5 vmkernel: 1:21:41:36.385 cpu3:4421)<3>ib_srp: Send tsk_mgmt target[vmhba_mlx4_1.1.1:2] out of TX_IU head 769313 tail 769313 lim 0
Workaround: Reboot the guest VM.
This section provides instructions for using the BUI to configure iSER and SRP targets and initiators.
In the BUI, iSER targets are managed as iSCSI targets on the Configuration > SAN screen.
1. To configure ibd interfaces, select the ibd interface (or ipmp), and drag it to the Datalinks list to create the datalink on the Configuration > Network screen. Then drag the Datalink to the Interfaces list to create a new interface.
2. To create an iSER target, go to the Configuration > SAN screen. Click the iSCSI Targets link and then click the add icon to add a new iSER target with an alias.
3. To create a target group, drag the target you just created to the iSCSI Target Group list.
4. To create an initiator, click the Initiator link and then click the iSCSI initiators link. Click the add icon to add a new initiator. Enter the Initiator IQN and an alias and click OK.
While creating an initiator group is optional, if you don't create a group, the LUN associated with the target will be available to all initiators. To create a group, drag the initiator to the iSCSI Initiator Groups list.
5. To create a LUN, go to the Shares screen and click the LUN link. Then click the add icon and associate the new LUN with the target or initiator groups you already created, using the Target Group and Initiator Groups menus.
6. Two client initiators are supported: RedHat 4 and SUSE 11. Use any method to discover iSER LUNs on the client.
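One possible discovery method on the Linux clients is the open-iscsi iscsiadm tool; the following is a hedged sketch, in which 192.0.2.10 is a placeholder for the appliance's iSER-enabled interface and <target-iqn> is the IQN reported by discovery (left as a placeholder):

```shell
# Sketch: discover targets, switch the node to the iSER transport, and log in.
# 192.0.2.10 and <target-iqn> are placeholders; substitute your own values.
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node -T <target-iqn> -p 192.0.2.10 --op update \
    -n iface.transport_name -v iser
iscsiadm -m node -T <target-iqn> -p 192.0.2.10 --login
```

After login, the new LUNs should appear as SCSI devices on the client (for example, via lsscsi).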
This procedure describes the steps for configuring SRP targets.
1. Connect HCA ports to IB interfaces.
The targets are automatically discovered by the appliance.
2. To create the target group, go to the Configuration > SAN screen.
3. Click the Target link and then click SRP targets.
The SRP targets page appears.
4. To create the target group, use the move icon to drag a target to the Target Groups list.
5. (Optional) To create an initiator and initiator group on the Initiator screen, click the add icon, collect the GUID from the initiator, assign it a name, and drag it to an initiator group.
6. To create a LUN and associate it with the SRP target and initiators you created in the previous steps, go to the Shares screen.
7. Click the LUN link and then click the LUN icon. Use the Target Group and Initiator Group menus on the Create LUN dialog to select the SRP groups to associate with the LUN.
The following SRP initiators have been tested and are known to work:
VMWare ESX
RedHat 5.4
SUSE 11
The following example demonstrates how to create an SRP target group named targetSRPgroup using the CLI configuration san targets srp groups context:
swallower:configuration san targets srp groups> create
swallower:configuration san targets srp group (uncommitted)> set name=targetSRPgroup
                          name = targetSRPgroup (uncommitted)
swallower:configuration san targets srp group (uncommitted)> set targets=eui.0002C903000489A4
                       targets = eui.0002C903000489A4 (uncommitted)
swallower:configuration san targets srp group (uncommitted)> commit
swallower:configuration san targets srp groups> list
GROUP NAME
group-000 targetSRPgroup
          |
          +-> TARGETS
              eui.0002C903000489A4
The following example demonstrates how to create a LUN and associate it with the targetSRPgroup using the CLI shares context:
swallower:shares default> lun mylun
swallower:shares default/mylun (uncommitted)> set targetgroup=targetSRPgroup
                   targetgroup = targetSRPgroup (uncommitted)
swallower:shares default/mylun (uncommitted)> set volsize=10
                       volsize = 10 (uncommitted)
swallower:shares default/mylun (uncommitted)> commit
swallower:shares default> list
Filesystems:
NAME   SIZE   MOUNTPOINT
test   38K    /export/test
LUNs:
NAME   SIZE   GUID
mylun  10G    600144F0E9D19FFB00004B82DF490001