Linux SCSI scanning troubleshooting

by Kurt Garloff <garloff@suse.de>, 8/2003, 7/2006

Executive summary

If you fail to detect all LUNs on a storage device, pass the option scsi_reportlun2=1 on the kernel command line.

Most common problems

Most failures to detect all LUNs of a storage device correctly are caused by devices that report support for the SCSI-2 command set only. This has three consequences:
  1. As SCSI-2 only supports LUNs 0 -- 7, the kernel stops the sequential scan after LUN 7.
  2. As SCSI-2 does not support the REPORT_LUNS command, the kernel does not try the advanced scanning method based on this command.
  3. In sequential scanning, the kernel by default stops scanning after a non-existing LUN (which is an optimization that can create problems).
The kernel has a blacklist that lets it behave differently for the devices listed there. The corresponding flags are:
#define BLIST_LARGELUN          0x200   /* LUNs past 7 on a SCSI-2 device */
#define BLIST_REPORTLUN2        0x20000 /* try REPORT_LUNS even for SCSI-2 devs (if HBA supports more than 8 LUNs) */
#define BLIST_SPARSELUN         0x040   /* Non consecutive LUN numbering */
For devices that are not yet listed, these flags can be passed as options to the scsi_mod kernel module, either by putting them into the module options or by passing them on the kernel command line.
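The effect of the three points above on a sequential scan can be illustrated with a small simulation. This is only a sketch of the behaviour described in this document, not the kernel's actual scanning code; the function name sequential_scan, the scsi2 parameter and the max_luns default of 256 are assumptions made for the example.

```python
# Flag values as listed in the blacklist definitions above.
BLIST_SPARSELUN = 0x040
BLIST_LARGELUN = 0x200


def sequential_scan(present_luns, flags=0, max_luns=256, scsi2=True):
    """Return the LUNs a simplified sequential scan would find.

    present_luns: set of LUN numbers the (hypothetical) device exposes.
    flags:        bitwise OR of BLIST_* values.
    """
    limit = max_luns
    # A SCSI-2 device is only scanned up to LUN 7 unless LARGELUN is set.
    if scsi2 and not flags & BLIST_LARGELUN:
        limit = min(limit, 8)

    found = []
    for lun in range(limit):
        if lun in present_luns:
            found.append(lun)
        elif not flags & BLIST_SPARSELUN:
            # Default optimization: stop at the first non-existing LUN.
            break
    return found
```

With a device exposing LUNs 0, 1, 3 and 9, the default scan in this model finds only 0 and 1; adding BLIST_SPARSELUN finds 3 as well, and LUN 9 only shows up once BLIST_LARGELUN is also set.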
Options

Option (SLES9 + SLE10)     Option (SLES8)              Description
                           (works on SLES9/10 kernel
                           command line as well)

default_dev_flags=0x20000  scsi_reportlun2=1           Instruct the kernel to try the REPORT_LUNS
                                                       command even for SCSI-2 devices if the SCSI
                                                       host controller (HBA) reports support for
                                                       more than 8 LUNs. This option is safe, as
                                                       the broken USB devices that lock up on
                                                       REPORT_LUNS are on the USB controller,
                                                       which does not support more than 8 LUNs.

default_dev_flags=0x200    scsi_largelun=1             Try to scan beyond LUN 7 even for SCSI-2
                                                       devices.

default_dev_flags=0x040    scsi_sparselun=1            Continue the sequential scan for LUNs until
                                                       max_luns even if a LUN has been reported
                                                       not to exist.

You can combine more than one option. So to pass all three options, use default_dev_flags=0x20240 or scsi_largelun=1 scsi_reportlun2=1 scsi_sparselun=1 respectively.
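Combining options means OR-ing the flag values together. A minimal sketch of that arithmetic follows; the helper name default_dev_flags_option is made up for this example, while the flag values are the ones given above.

```python
# Flag values as listed in the blacklist definitions above.
BLIST_SPARSELUN = 0x040
BLIST_LARGELUN = 0x200
BLIST_REPORTLUN2 = 0x20000


def default_dev_flags_option(*flags):
    """Bitwise-OR the given flags and format them as a module option."""
    value = 0
    for flag in flags:
        value |= flag
    return "default_dev_flags=" + hex(value)


# All three options together give the value quoted in the text.
print(default_dev_flags_option(BLIST_REPORTLUN2, BLIST_LARGELUN, BLIST_SPARSELUN))
```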

To pass this on the kernel command line, put scsi_mod.default_dev_flags=... there (SLES9+10, i.e. 2.6 kernels) or simply scsi_...=1 (all SLES kernels).
To build it into your initrd, put

options scsi_mod default_dev_flags=...
into /etc/modprobe.conf.local (SLES9+10) or
options scsi_mod scsi_...=1
into /etc/modules.conf (SLES8 only).
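The two 2.6-style spellings differ only in their prefix. The following sketch renders both from one flag value; the function names are hypothetical and the snippet only formats strings, it does not touch any configuration file.

```python
def cmdline_option(flags):
    """Form used on the kernel command line (2.6 kernels)."""
    return "scsi_mod.default_dev_flags=" + hex(flags)


def modprobe_line(flags):
    """Line for /etc/modprobe.conf.local (SLES9+10)."""
    return "options scsi_mod default_dev_flags=" + hex(flags)


flags = 0x20000 | 0x200 | 0x040  # REPORTLUN2 | LARGELUN | SPARSELUN
print(cmdline_option(flags))
print(modprobe_line(flags))
```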

Note that the SLES8 syntax is only supported on SUSE kernels, whereas the newer syntax is supported in all 2.6 kernels; even our reportlun2 feature made it into the 2.6.7 kernel.

More specific targeting of devices

The above options set the scanning options for all devices. In case this causes bad side-effects on other devices, you can apply a variant that only applies to a specific device.

The syntax for it on 2.6 kernels is

dev_flags=VENDOR:MODEL:FLAGS[,VENDOR2:MODEL2:FLAGS2[,..]]
The flags are the same as described above. You need to prefix with scsi_mod. if passing on the kernel command line.
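As an illustration of the VENDOR:MODEL:FLAGS syntax, the sketch below assembles a dev_flags string from a list of entries. The helper name and the vendor/model strings ("ACME", "Array1", "Foo", "Bar") are invented for the example; real values must match what the device reports in its inquiry data.

```python
def dev_flags_option(entries, cmdline=False):
    """Build a dev_flags option from (vendor, model, flags) tuples.

    cmdline=True adds the scsi_mod. prefix needed on the kernel
    command line.
    """
    parts = [f"{vendor}:{model}:{flags:#x}" for vendor, model, flags in entries]
    prefix = "scsi_mod." if cmdline else ""
    return prefix + "dev_flags=" + ",".join(parts)


# Hypothetical devices: LARGELUN | SPARSELUN for one, REPORTLUN2 for another.
print(dev_flags_option([("ACME", "Array1", 0x240), ("Foo", "Bar", 0x20000)],
                       cmdline=True))
```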

For the SLES8 kernel, the syntax is

llun_blklst=C,B,T[,C,B,T[,..]]
It sets BLIST_LARGELUN | BLIST_SPARSELUN for the devices on controller number C, bus number B (normally 0) and for target ID T. Up to 8 devices can be modified this way.
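The SLES8 option is a flat list of C,B,T triples. A small sketch that formats such a list and enforces the limit of 8 devices mentioned above; the function name is an assumption for this example.

```python
def llun_blklst_option(devices):
    """Format (controller, bus, target) triples as an llun_blklst option."""
    devices = list(devices)
    if len(devices) > 8:
        # The text above states that at most 8 devices can be listed.
        raise ValueError("llun_blklst accepts at most 8 devices")
    triples = [f"{c},{b},{t}" for c, b, t in devices]
    return "llun_blklst=" + ",".join(triples)


# Controller 0, bus 0, target 3 and controller 1, bus 0, target 5.
print(llun_blklst_option([(0, 0, 3), (1, 0, 5)]))
```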

Duplicate LUNs or strange numbers

The serial scan done on SCSI-2 devices (with LARGELUN) may result in duplicate LUNs. The reason is that LUNs are actually not just a flat number space, but may be given in various formats that reflect the hierarchy of the LUNs. Bits 14 and 15 indicate the format used. (Full LUNs consist of 8 bytes, which can be grouped into four 2-byte groups.)
A serial scan beyond LUN 16383 can result in LUNs being seen multiple times; scanning with the REPORT_LUNS method avoids this.
Limiting max_luns does not seem to help, as the adapter drivers can override this setting.
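Why 16383 is the boundary becomes visible when a 2-byte LUN group is split into its format indicator (bits 14 and 15) and the remaining 14 bits. The sketch below does only this bit arithmetic; how a particular device interprets each format is not modelled, and the function name is made up for the example.

```python
def split_lun_word(value):
    """Split a 16-bit LUN group into (format bits 15-14, low 14 bits)."""
    fmt = (value >> 14) & 0x3
    low = value & 0x3FFF
    return fmt, low


# 16383 is the largest number with format bits 00. Counting on to 16384
# changes the format bits and resets the low 14 bits to 0, so a device
# may answer for a LUN that the scan has already seen.
for lun in (0, 16383, 16384, 16385):
    print(lun, split_lun_word(lun))
```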

The REPORT_LUNS scanning method can also result in strange LUN numbers being reported to the Linux stack; this happens if the device uses the more esoteric formats to report LUNs. You can avoid this by setting the BLIST_NOREPORTLUN flag.
This translates to the flag value 0x40000 for the 2.6 kernel, or scsi_noreportlun=1 on the kernel command line.

Peripheral device qualifiers

TBW

More options

TBW

Diagnosis

To get verbose output on SCSI scanning, pass scsi_mod.scsi_logging=448 on the kernel command line. Reduce the number to 128 in case this is too verbose for your syslog.
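The two numbers are easier to read as a level stored in a bit field. The sketch below assumes that the scan logging level occupies 3 bits starting at bit 6 of the logging value, as in the mainline kernel's scsi_logging.h; this layout is not stated in this document, so treat the constants as assumptions.

```python
# Assumed layout: 3-bit scan logging level starting at bit 6.
SCAN_SHIFT = 6
SCAN_BITS = 3


def scan_log_level(logging_value):
    """Extract the scan logging level from a logging value."""
    return (logging_value >> SCAN_SHIFT) & ((1 << SCAN_BITS) - 1)


def logging_value_for_scan(level):
    """Build a logging value that sets only the scan level."""
    if not 0 <= level < (1 << SCAN_BITS):
        raise ValueError("scan level must fit in 3 bits")
    return level << SCAN_SHIFT


print(scan_log_level(448))  # most verbose scan level under this layout
print(scan_log_level(128))  # a lower scan level
```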

Links