VMware - VCSA 6.5u0 or PSC appliance - change SCSI block timeout

Overview

In a previous post, I already talked about an issue occurring in a lab environment with VCSA 6.5u0 and the PSC appliance: VCSA or PSC appliance won't boot after hard shutdown.

As the issue became more frequent over time, I tried to figure out the root cause of those events.

Because the system logs report SCSI timeouts on write operations, I remembered that the default 30-second timeout can be insufficient in some virtualized environments. The proposed fix is therefore to raise the timeout to a higher value.

Default appliance settings

We can display the current (default at this time) SCSI timeout value for each block device of the system with the following command (based on sysfs, a pseudo file system provided by the Linux kernel since version 2.6):

# Print "<disk>: <timeout>" for every whole SCSI disk (sda, sdb, ...)
for d in /sys/block/sd*; do
    printf '%s: ' "${d##*/}"
    cat "$d/device/timeout"
done

sda: 30
sdb: 30
sdc: 30
sdd: 30
sde: 30
sdf: 30
sdg: 30
sdh: 30
sdi: 30
sdj: 30
sdk: 30
sdl: 30

The following command shows the same information:

find /sys/class/scsi_generic/*/device/timeout -exec grep -H . '{}' \;

/sys/class/scsi_generic/sg0/device/timeout:30
/sys/class/scsi_generic/sg10/device/timeout:30
/sys/class/scsi_generic/sg11/device/timeout:30
/sys/class/scsi_generic/sg12/device/timeout:30
/sys/class/scsi_generic/sg1/device/timeout:30
/sys/class/scsi_generic/sg2/device/timeout:30
/sys/class/scsi_generic/sg3/device/timeout:30
/sys/class/scsi_generic/sg4/device/timeout:30
/sys/class/scsi_generic/sg5/device/timeout:30
/sys/class/scsi_generic/sg6/device/timeout:30
/sys/class/scsi_generic/sg7/device/timeout:30
/sys/class/scsi_generic/sg8/device/timeout:30
/sys/class/scsi_generic/sg9/device/timeout:30

As mentioned in KB #1009465 (Increasing the disk timeout values for a Linux 2.6 virtual machine), VMware Tools creates a udev rule at /etc/udev/rules.d/99-vmware-scsi-udev.rules that sets the timeout to 180 seconds for each VMware virtual disk device, and reloads the udev rules so that it takes effect immediately. On the Photon-based appliance, however, this udev rule no longer exists:

ls -l /etc/udev/rules.d/*vmware*

-rw-r--r-- 1 root root 268 Sep 30  2016 /etc/udev/rules.d/99-vmware-hotplug.rules
-rw-r--r-- 1 root root 104 Oct 22  2016 /etc/udev/rules.d/99-vmware-udev.rules
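
Listing file names only shows that no file is called 99-vmware-scsi-udev.rules; a rule that sets the timeout could in principle live in another file or in another rules directory. As a minimal sketch, the following searches rule files for any mention of "timeout". The function name find_timeout_rules and its optional directory arguments are my own, hypothetical additions, and the default directory list assumes the appliance follows the standard udev layout.

```shell
#!/bin/sh
# find_timeout_rules [DIR...]
# Prints every *.rules file that mentions "timeout".
# Hypothetical helper: with no argument it searches the usual udev rules
# directories (assumption: the appliance uses the standard udev layout).
find_timeout_rules() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/udev/rules.d /run/udev/rules.d /usr/lib/udev/rules.d /lib/udev/rules.d
    fi
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # grep -l prints only the names of matching files; a directory
        # without matching (or any) rule files is not an error here.
        grep -l 'timeout' "$dir"/*.rules 2>/dev/null || true
    done
}
```

Calling find_timeout_rules without arguments searches the live system. Where /lib is a symlink to /usr/lib, the same file may be reported twice under both paths.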

For comparison: on a non-Photon-based Linux VM, the file /etc/udev/rules.d/99-vmware-scsi-udev.rules exists (created by the VMware Tools installer) and contains:

#
# VMware SCSI devices Timeout adjustment
#
# Modify the timeout value for VMware SCSI devices so that
# in the event of a failover, we don't time out.
# See Bug 271286 for more information.

ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware  ", ATTRS{model}=="Virtual disk    ", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/timeout'"
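
The rule above only fires for devices whose vendor and model attributes match, padding spaces included. To see what the devices of an appliance actually report, a sketch along these lines prints those attributes next to the current timeout. The function name list_scsi_attrs and its optional sysfs-root argument are hypothetical (the argument exists only so the function can be pointed at a test tree); vendor, model and timeout are the attribute files sysfs exposes for a SCSI device.

```shell
#!/bin/sh
# list_scsi_attrs [SYSFS_ROOT]
# Prints "name|vendor|model|timeout" for every SCSI generic device.
# The "|" separators make the space padding of vendor/model visible.
# SYSFS_ROOT defaults to /sys; it is an assumption added for testing.
list_scsi_attrs() {
    root="${1:-/sys}"
    for dev in "$root"/class/scsi_generic/sg*; do
        # Skip entries without a timeout attribute (and the unexpanded
        # glob when no sg* device exists at all).
        [ -r "$dev/device/timeout" ] || continue
        printf '%s|%s|%s|%s\n' \
            "${dev##*/}" \
            "$(cat "$dev/device/vendor" 2>/dev/null)" \
            "$(cat "$dev/device/model" 2>/dev/null)" \
            "$(cat "$dev/device/timeout")"
    done
}
```

Run without arguments, list_scsi_attrs reads the live /sys tree, which makes it easy to spot devices that do not look like a VMware virtual disk and are therefore left at their default timeout by the rule.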

So we probably need to increase the value ourselves at each system startup, for example with an rc.local file.

@tsugliani reminded me of the NetApp recommendations about disk timeouts on virtualized guest OSes: What are the guest OS tunings needed for a VMware vSphere deployment?

Guest OS Type  Historical Guest OS Tuning for SAN  Updated Guest OS Tuning for SAN
Windows        disk timeout = 190                  disk timeout = 60
Linux          disk timeout = 190                  disk timeout = 60
Solaris        disk timeout = 190                  disk timeout = 60
               busy retry = 300                    busy retry = 300
               not ready retry = 300               not ready retry = 300
               reset retry = 30                    reset retry = 30
               max.throttle = 32                   max.throttle = 32
               min.throttle = 8                    min.throttle = 8
                                                   corrected VID/PID specification

The VMware Support team confirmed that the expected value is 180 seconds, as configured in VCSA 6.0 build 3339084:

find /sys/class/scsi_generic/*/device/timeout -exec grep -H . '{}' \;

/sys/class/scsi_generic/sg0/device/timeout:180
/sys/class/scsi_generic/sg1/device/timeout:180
/sys/class/scsi_generic/sg10/device/timeout:180
/sys/class/scsi_generic/sg11/device/timeout:30
/sys/class/scsi_generic/sg2/device/timeout:180
/sys/class/scsi_generic/sg3/device/timeout:180
/sys/class/scsi_generic/sg4/device/timeout:180
/sys/class/scsi_generic/sg5/device/timeout:180
/sys/class/scsi_generic/sg6/device/timeout:180
/sys/class/scsi_generic/sg7/device/timeout:180
/sys/class/scsi_generic/sg8/device/timeout:180
/sys/class/scsi_generic/sg9/device/timeout:180

Fix SCSI timeout

There are several ways to fix the SCSI timeout value:

  1. Upgrade to VCSA 6.5u1 (aka build 5973321)
  2. Manually add the udev rule
  3. Add an rc.local file (not recommended)

VCSA upgrade way

It's not mentioned in the Release Notes, but VCSA 6.5 build 5973321 includes a fix for the missing udev rule, delivered with open-vm-tools:

find /sys/class/scsi_generic/*/device/timeout -exec grep -H . '{}' \;

/sys/class/scsi_generic/sg0/device/timeout:30
/sys/class/scsi_generic/sg10/device/timeout:180
/sys/class/scsi_generic/sg11/device/timeout:180
/sys/class/scsi_generic/sg12/device/timeout:180
/sys/class/scsi_generic/sg1/device/timeout:180
/sys/class/scsi_generic/sg2/device/timeout:180
/sys/class/scsi_generic/sg3/device/timeout:180
/sys/class/scsi_generic/sg4/device/timeout:180
/sys/class/scsi_generic/sg5/device/timeout:180
/sys/class/scsi_generic/sg6/device/timeout:180
/sys/class/scsi_generic/sg7/device/timeout:180
/sys/class/scsi_generic/sg8/device/timeout:180
/sys/class/scsi_generic/sg9/device/timeout:180

An upgrade is the best way to avoid this issue.

Manually add the udev rule

It's also possible to add the missing udev rule manually and then apply it:

# The quoted 'EOF' delimiter keeps the interactive shell from expanding
# $DEVPATH: it must reach the rule file literally and is resolved only
# when udev runs the command for a device event.
cat > /etc/udev/rules.d/99-vmware-scsi-udev.rules <<'EOF'
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware*", ATTRS{model}=="Virtual disk*", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware*", ATTRS{model}=="VMware Virtual S", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"
EOF

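A reboot is necessary to apply the new rule: reloading it live with udevadm control --reload-rules && udevadm trigger didn't work for me. A likely reason is that udevadm trigger emits "change" events by default, while the rule only matches ACTION=="add".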
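
One thing worth checking before the reboot: the rule only works if the text $DEVPATH reaches the rule file literally, because it is meant to be resolved when udev runs the command, not when the file is written. If the rule is written from a shell with the variable inside double quotes, the shell expands it first (normally to an empty string). A minimal sketch of such a check, where the function name rule_keeps_devpath is my own invention:

```shell
#!/bin/sh
# rule_keeps_devpath FILE
# Succeeds when FILE still contains the literal text "/sys$DEVPATH".
# Hypothetical helper, not part of udev or open-vm-tools.
rule_keeps_devpath() {
    # -F: fixed string, so "$" is not treated as a regex anchor.
    # -q: no output, only the exit status matters.
    grep -qF '/sys$DEVPATH' "$1"
}
```

It could be used as: rule_keeps_devpath /etc/udev/rules.d/99-vmware-scsi-udev.rules || echo "DEVPATH was expanded by the shell". If the check fails, the file most likely contains /sys/device/timeout instead of the intended path.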

The rc.local way

By default, no rc.local file exists on the Photon-based appliance to run simple commands at every system startup. But it's easy to find out where to create this file by displaying the systemd rc-local service configuration:

systemctl cat rc-local

# /etc/systemd/system/rc-local.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

# This unit gets pulled automatically into multi-user.target by
# systemd-rc-local-generator if /etc/rc.d/rc.local is executable.
[Unit]
Description=/etc/rc.d/rc.local Compatibility
ConditionFileIsExecutable=/etc/rc.d/rc.local
After=network.target

[Service]
Type=forking
ExecStart=/etc/rc.d/rc.local start
TimeoutSec=0
RemainAfterExit=yes

As mentioned in the unit file, /etc/rc.d/rc.local must exist and be executable. Let's do it!

vi /etc/rc.d/rc.local

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.


# Change default SCSI timeout on all disks
for d in /sys/block/sd*
do
  # Skip anything without a writable timeout attribute so that "-e"
  # does not abort the script on a failed write.
  if [ -w "$d/device/timeout" ]; then
    echo 180 > "$d/device/timeout"
  fi
done

exit 0

Once the file is saved, we make it executable:

chmod +x /etc/rc.d/rc.local

Then we enable the rc-local service at system startup:

systemctl enable rc-local

And we test it:

systemctl start rc-local

No restart is needed to apply the new timeout settings. At every system startup, the rc.local file will be executed and the timeout value raised from 30 to 180 seconds.
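
The loop in rc.local raises the timeout on every sd* device, whereas the udev rule only targets VMware virtual disks. If the same restriction is wanted in a script, a sketch could filter on the vendor attribute before writing. The function name set_vmware_disk_timeout and its optional sysfs-root argument are hypothetical; the root argument is there only so the function can be tried against a scratch directory instead of the live /sys.

```shell
#!/bin/sh
# set_vmware_disk_timeout SECONDS [SYSFS_ROOT]
# Writes SECONDS into the timeout attribute of every sd* disk whose
# vendor string starts with "VMware"; other disks are left untouched.
# SYSFS_ROOT defaults to /sys (assumption added for testing).
set_vmware_disk_timeout() {
    seconds="$1"
    root="${2:-/sys}"
    for dev in "$root"/block/sd*; do
        [ -w "$dev/device/timeout" ] || continue
        # Vendor strings are space-padded in sysfs ("VMware  "),
        # hence the trailing wildcard in the pattern.
        case "$(cat "$dev/device/vendor" 2>/dev/null)" in
            VMware*) echo "$seconds" > "$dev/device/timeout" ;;
        esac
    done
}
```

In rc.local, the loop could then be replaced by a single call such as set_vmware_disk_timeout 180, after defining the function earlier in the same file.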

Check new timeout value

# Print "<disk>: <timeout>" for every whole SCSI disk (sda, sdb, ...)
for d in /sys/block/sd*; do
    printf '%s: ' "${d##*/}"
    cat "$d/device/timeout"
done

sda: 180
sdb: 180
sdc: 180
sdd: 180
sde: 180
sdf: 180
sdg: 180
sdh: 180
sdi: 180
sdj: 180
sdk: 180
sdl: 180

Each block device should now use a 180 second timeout for SCSI commands.
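
To turn this visual check into something a script or a monitoring job can act on, the same sysfs files can be compared against the expected value. This is only a sketch: the function name check_scsi_timeout and its optional sysfs-root argument are hypothetical, the latter added so the function can run against a test directory.

```shell
#!/bin/sh
# check_scsi_timeout EXPECTED [SYSFS_ROOT]
# Prints one line per sd* disk whose timeout differs from EXPECTED and
# returns 1 if any such disk was found, 0 otherwise.
# SYSFS_ROOT defaults to /sys (assumption added for testing).
check_scsi_timeout() {
    expected="$1"
    root="${2:-/sys}"
    status=0
    for dev in "$root"/block/sd*; do
        [ -r "$dev/device/timeout" ] || continue
        actual="$(cat "$dev/device/timeout")"
        if [ "$actual" != "$expected" ]; then
            echo "${dev##*/}: $actual (expected $expected)"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_scsi_timeout 180 || echo "some disks still use another timeout" reports any disk that was missed.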