linux/drivers
Corinna Vinschen bc6ed2fa24 igb: clean up in all error paths when enabling SR-IOV
After commit 50f303496d ("igb: Enable SR-IOV after reinit"), removing
the igb module could hang or crash (depending on the machine) when the
module has been loaded with the max_vfs parameter set to some value != 0.

On one test machine with a dual-port 82580, this hang occurred:

[  232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
[  233.093257] igb 0000:41:00.1: IOV Disabled
[  233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
[  233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[  233.352248] igb 0000:41:00.0:   device [8086:1516] error status/mask=00100000
[  233.361088] igb 0000:41:00.0:    [20] UnsupReq               (First)
[  233.368183] igb 0000:41:00.0: AER:   TLP Header: 40000001 0000040f cdbfc00c c
[  233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[  233.388779] igb 0000:41:00.1:   device [8086:1516] error status/mask=00100000
[  233.397629] igb 0000:41:00.1:    [20] UnsupReq               (First)
[  233.404736] igb 0000:41:00.1: AER:   TLP Header: 40000001 0000040f cdbfc00c c
[  233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
[  233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
[  233.546197] pcieport 0000:40:01.0: AER: device recovery failed
[  234.157244] igb 0000:41:00.0: IOV Disabled
[  371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
[  371.627489]       Not tainted 6.4.0-dirty #2
[  371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
[  371.641000] task:irq/35-aerdrv   state:D stack:0     pid:257   ppid:2      f0
[  371.650330] Call Trace:
[  371.653061]  <TASK>
[  371.655407]  __schedule+0x20e/0x660
[  371.659313]  schedule+0x5a/0xd0
[  371.662824]  schedule_preempt_disabled+0x11/0x20
[  371.667983]  __mutex_lock.constprop.0+0x372/0x6c0
[  371.673237]  ? __pfx_aer_root_reset+0x10/0x10
[  371.678105]  report_error_detected+0x25/0x1c0
[  371.682974]  ? __pfx_report_normal_detected+0x10/0x10
[  371.688618]  pci_walk_bus+0x72/0x90
[  371.692519]  pcie_do_recovery+0xb2/0x330
[  371.696899]  aer_process_err_devices+0x117/0x170
[  371.702055]  aer_isr+0x1c0/0x1e0
[  371.705661]  ? __set_cpus_allowed_ptr+0x54/0xa0
[  371.710723]  ? __pfx_irq_thread_fn+0x10/0x10
[  371.715496]  irq_thread_fn+0x20/0x60
[  371.719491]  irq_thread+0xe6/0x1b0
[  371.723291]  ? __pfx_irq_thread_dtor+0x10/0x10
[  371.728255]  ? __pfx_irq_thread+0x10/0x10
[  371.732731]  kthread+0xe2/0x110
[  371.736243]  ? __pfx_kthread+0x10/0x10
[  371.740430]  ret_from_fork+0x2c/0x50
[  371.744428]  </TASK>

The reproducer was a simple script:

  #!/bin/sh
  for i in `seq 1 5`; do
    modprobe -rv igb
    modprobe -v igb max_vfs=1
    sleep 1
    modprobe -rv igb
  done

It turned out that this could only be reproduced on 82580 (quad and
dual-port), but not on 82576, i350 and i210.  Further debugging showed
that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because
dev->is_physfn is 0 on 82580.

Prior to commit 50f303496d ("igb: Enable SR-IOV after reinit"),
igb_enable_sriov() jumped into the "err_out" cleanup branch.  After this
commit it only returned the error code.

So the cleanup didn't take place, and the incorrect VF setup in the
igb_adapter structure fooled the igb driver into assuming that VFs have
been set up where no VF actually existed.

Fix this problem by cleaning up again if pci_enable_sriov() fails.
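
The cleanup pattern described above can be illustrated with a small
user-space sketch. This is not the driver code: struct demo_adapter,
demo_pci_enable_sriov() and demo_enable_sriov() are hypothetical names
made up for illustration, and the error code is an assumption. The
sketch only shows why returning the error directly leaves stale VF
state behind, while jumping to a cleanup label resets it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-adapter VF bookkeeping. */
struct demo_adapter {
	unsigned int vfs_allocated_count; /* VFs the driver believes exist */
	int *vf_data;                     /* per-VF state, NULL when no VFs */
};

/* Hypothetical stand-in for pci_enable_sriov(); fails on request. */
static int demo_pci_enable_sriov(int should_fail)
{
	return should_fail ? -ENODEV : 0;
}

/*
 * Set up VF bookkeeping, then try to enable SR-IOV. On failure, undo
 * the bookkeeping so that later teardown code does not assume VFs
 * exist when none were actually created.
 */
static int demo_enable_sriov(struct demo_adapter *a, unsigned int num_vfs,
			     int should_fail)
{
	int err;

	a->vf_data = calloc(num_vfs, sizeof(*a->vf_data));
	if (!a->vf_data)
		return -ENOMEM; /* nothing recorded yet, nothing to undo */
	a->vfs_allocated_count = num_vfs;

	err = demo_pci_enable_sriov(should_fail);
	if (err)
		goto err_out; /* a plain "return err" here would leak state */

	return 0;

err_out:
	free(a->vf_data);
	a->vf_data = NULL;
	a->vfs_allocated_count = 0;
	return err;
}
```

With a zero-initialized struct demo_adapter, calling
demo_enable_sriov(&a, 1, 1) returns the error and leaves
vfs_allocated_count at 0 and vf_data at NULL, so a later removal path
sees a consistent "no VFs" state.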

Fixes: 50f303496d ("igb: Enable SR-IOV after reinit")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:24:29 +01:00
accel
accessibility
acpi More thermal control updates for 6.6-rc1 2023-09-04 15:17:28 -07:00
amba
android Char/Misc driver changes for 6.6-rc1 2023-09-01 09:53:54 -07:00
ata ata changes for 6.6 2023-09-05 12:37:28 -07:00
atm
auxdisplay drm for 6.6-rc1 2023-08-30 13:34:34 -07:00
base Driver core changes for 6.6-rc1 2023-09-01 09:43:18 -07:00
bcma
block Mixed with some fixes and cleanups, this brings in reasonably complete 2023-09-06 12:10:15 -07:00
bluetooth TTY/Serial driver changes for 6.6-rc1 2023-09-01 09:38:00 -07:00
bus Char/Misc driver changes for 6.6-rc1 2023-09-01 09:53:54 -07:00
cdrom
cdx
char tpm: Enable hwrng only for Pluton on AMD CPUs 2023-09-04 21:57:59 +03:00
clk This pull request is full of clk driver changes. In fact, there aren't any 2023-08-30 19:53:39 -07:00
clocksource Updates for clocksource/clockevent drivers: 2023-09-04 13:15:57 -07:00
comedi
connector
counter - New Drivers 2023-09-04 13:47:59 -07:00
cpufreq
cpuidle powerpc updates for 6.6 2023-08-31 12:43:10 -07:00
crypto
cxl
dax
dca
devfreq
dio
dma dmaengine updates for v6.6 2023-09-03 10:49:42 -07:00
dma-buf drm for 6.6-rc1 2023-08-30 13:34:34 -07:00
edac Intel EDAC fixes: 2023-08-30 19:23:00 -07:00
eisa
extcon
firewire
firmware Char/Misc driver changes for 6.6-rc1 2023-09-01 09:53:54 -07:00
fpga
fsi
genpd ARM: SoC drivers for 6.6 2023-08-30 16:42:21 -07:00
gnss
gpio
gpu ARM: 2023-09-07 13:52:20 -07:00
greybus
hid for-linus-2023083101 2023-09-01 12:31:44 -07:00
hsi
hte
hv hyperv-next for v6.6 2023-09-04 11:26:29 -07:00
hwmon Char/Misc driver changes for 6.6-rc1 2023-09-01 09:53:54 -07:00
hwspinlock
hwtracing
i2c I2C has mainly cleanups this time and a few driver improvements. Because 2023-09-04 13:44:11 -07:00
i3c i3c: master: svc: fix probe failure when no i3c device exist 2023-09-06 01:21:47 +02:00
idle
iio Char/Misc driver changes for 6.6-rc1 2023-09-01 09:53:54 -07:00
infiniband SCSI misc on 20230902 2023-09-02 12:02:41 -07:00
input Input updates for 6.6 merge window: 2023-09-06 09:24:25 -07:00
interconnect This pull request is full of clk driver changes. In fact, there aren't any 2023-08-30 19:53:39 -07:00
iommu IOMMU Updates for Linux v6.6 2023-09-01 16:54:25 -07:00
ipack
irqchip Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
isdn
leds - Core Frameworks 2023-09-04 13:52:58 -07:00
macintosh powerpc updates for 6.6 2023-08-31 12:43:10 -07:00
mailbox mailbox: qcom-ipcc: fix incorrect num_chans counting 2023-09-05 10:11:01 -05:00
mcb
md for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
media media updates for v6.6-rc1 2023-09-01 12:21:32 -07:00
memory
memstick
message
mfd
misc Char/Misc driver changes for 6.6-rc1 2023-09-01 09:53:54 -07:00
mmc TTY/Serial driver changes for 6.6-rc1 2023-09-01 09:38:00 -07:00
most
mtd - New Drivers 2023-09-04 13:47:59 -07:00
mux
net igb: clean up in all error paths when enabling SR-IOV 2023-09-13 12:24:29 +01:00
nfc NFC: nxp: add NXP1002 2023-08-30 18:32:24 -07:00
ntb
nubus
nvdimm nvdimm changes for v6.6 merge window 2023-08-30 20:52:08 -07:00
nvme for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
nvmem
of Devicetree updates for v6.6: 2023-08-30 16:59:03 -07:00
opp
parisc
parport TTY/Serial driver changes for 6.6-rc1 2023-09-01 09:38:00 -07:00
pci powerpc updates for 6.6 2023-08-31 12:43:10 -07:00
pcmcia
peci
perf ARM: 2023-09-07 13:52:20 -07:00
phy phy-for-6.6 2023-09-03 10:38:02 -07:00
pinctrl Pin control bulk changes for the v6.6 kernel cycle: 2023-08-30 19:36:19 -07:00
platform USB / Thunderbolt / PHY driver update for 6.6-rc1 2023-09-01 09:23:34 -07:00
pnp
power
powercap
pps
ps3
ptp
pwm pwm: Changes for v6.6-rc1 2023-09-07 18:05:58 -07:00
rapidio
ras
regulator regulator: Fixes for v6.6 2023-09-07 15:51:07 -07:00
remoteproc remoteproc updates for v6.6 2023-09-04 15:12:26 -07:00
reset This pull request is full of clk driver changes. In fact, there aren't any 2023-08-30 19:53:39 -07:00
rpmsg rpmsg updates for v6.6 2023-09-04 15:08:52 -07:00
rtc RTC for 6.6 2023-09-07 16:07:35 -07:00
s390 more s390 updates for 6.6 merge window 2023-09-07 10:52:13 -07:00
sbus
scsi ata changes for 6.6 2023-09-05 12:37:28 -07:00
sh
siox
slimbus
soc This pull request is full of clk driver changes. In fact, there aren't any 2023-08-30 19:53:39 -07:00
soundwire soundwire updates for 6.6 2023-09-03 10:20:57 -07:00
spi spi: Fixes for v6.6 2023-09-07 15:49:20 -07:00
spmi
ssb
staging pwm: Changes for v6.6-rc1 2023-09-07 18:05:58 -07:00
target SCSI misc on 20230902 2023-09-02 12:02:41 -07:00
tc
tee
thermal
thunderbolt
tty TTY/Serial driver changes for 6.6-rc1 2023-09-01 09:38:00 -07:00
ufs Merge branch 'fixes' into misc 2023-09-02 08:25:19 +01:00
uio
usb just cleanups and fixes 2023-09-07 10:35:14 -07:00
vdpa virtio: features 2023-09-04 10:43:44 -07:00
vfio iommufd for 6.6 2023-08-30 20:41:37 -07:00
vhost vdpa: add get_backend_features vdpa operation 2023-09-03 18:10:22 -04:00
video - New Functionality 2023-09-06 09:00:37 -07:00
virt
virtio virtio_ring: fix avail_wrap_counter in virtqueue_add_packed 2023-09-03 18:10:24 -04:00
vlynq
w1
watchdog linux-watchdog 6.6-rc1 tag 2023-09-06 09:19:12 -07:00
xen dma-maping updates for Linux 6.6 2023-08-29 20:32:10 -07:00
zorro
Kconfig
Makefile This pull-request adds a new subsystem for genpd providers in drivers/genpd 2023-08-30 16:37:00 -07:00