In the discovery thread, ibmvfc does a vhost->task_set++ without any lock
held. This could result in two targets getting the same cancel key, which
could have strange effects in error recovery. The actual probability of
this occurring should be extremely small, since this should all be done in
a single-threaded loop from the discovery thread, but let's fix it up
anyway to be safe.
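A minimal user-space sketch of the idea, assuming a pthread mutex stands in
for the host lock. The names struct host_sketch and next_cancel_key() are
hypothetical and are not taken from the driver:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the adapter state; not the real ibmvfc_host. */
struct host_sketch {
	pthread_mutex_t lock;
	uint64_t task_set;
};

/* Hand out a cancel key under the lock so no two callers share one. */
static uint64_t next_cancel_key(struct host_sketch *h)
{
	uint64_t key;

	assert(h);
	pthread_mutex_lock(&h->lock);
	key = h->task_set++;
	pthread_mutex_unlock(&h->lock);
	return key;
}
```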
Link: https://lore.kernel.org/r/1600286999-22059-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scatter_data_area() has two purposes:
1) Create the iovs for the data area buffer of a SCSI cmd.
2) If there is data in DMA_TO_DEVICE direction, copy
the data from sg_list to data area buffer.
Both are done in a common loop.
In case of DMA_FROM_DEVICE data transfer, scatter_data_area() is called
with parameter copy_data = false. But this flag is just used to skip
memcpy() for data, while radix_tree_lookup still is called for every dbi of
the data area buffer, and kmap and kunmap are called for every page from
sg_list and data_area as well as flush_dcache_page() for the data area
pages. Since the only thing to do with copy_data = false would be to set
up the iovs, this is a noticeable overhead. Rework the iov creation in the
main loop of scatter_data_area() providing the new function
new_block_to_iov(). Based on this, create the short new function
tcmu_setup_iovs() that only writes the iovs with no overhead. This new
function is now called instead of scatter_data_area() for bidi buffers and
for data buffers in those cases where memcpy() would have been skipped.
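The iov-only path can be sketched roughly as follows. This is a simplified
user-space illustration, not the driver code: struct iov_sketch,
setup_iovs() and the 4096-byte BLOCK_SIZE are assumptions made for the
example. It describes the data as iovs and merges adjacent blocks, without
touching any page contents:

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 4096u	/* assumed data block size for this sketch */

/* Simplified iov: an offset into the data area plus a length. */
struct iov_sketch {
	size_t off;
	size_t len;
};

/*
 * Describe `len` bytes stored in the blocks listed in dbi[] as iovs,
 * merging blocks that are adjacent in the data area. No data is copied.
 * Returns the number of iovs written to iov[].
 */
static size_t setup_iovs(const int *dbi, size_t nblocks, size_t len,
			 struct iov_sketch *iov)
{
	size_t n = 0;
	int prev = -2;	/* prev + 1 must not be a valid block index */

	assert(dbi && iov);
	for (size_t i = 0; i < nblocks && len > 0; i++) {
		size_t chunk = len < BLOCK_SIZE ? len : BLOCK_SIZE;

		if (dbi[i] != prev + 1) {
			/* Not adjacent to the previous block: start a new iov. */
			iov[n].off = (size_t)dbi[i] * BLOCK_SIZE;
			iov[n].len = 0;
			n++;
		}
		iov[n - 1].len += chunk;
		len -= chunk;
		prev = dbi[i];
	}
	return n;
}
```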
Link: https://lore.kernel.org/r/20200910155041.17654-4-bstroesser@ts.fujitsu.com
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
queue_cmd_ring() needs to check whether there is enough space in cmd ring
and data area for the cmd to queue.
Currently the sequence is:
1) Calculate size the cmd will occupy on the ring based on estimation of
needed iovs.
2) Check whether there is enough space on the ring based on size from 1)
3) Allocate buffers in data area.
4) Calculate number of iovs the command really needs while copying
incoming data (if any) to data area.
5) Re-calculate real size of cmd on ring based on real number of iovs.
6) Set up possible padding and cmd on the ring.
Step 1) must not underestimate the cmd size, so it uses the max possible
number of iovs for the given I/O data size. The resulting overestimation
can be very high, so this sequence is not ideal. The real number of iovs
can be calculated at the earliest after data buffer allocation. Therefore
rework the code to implement the following sequence:
A) Allocate buffers on data area and calculate number of necessary iovs
during this.
B) Calculate real size of cmd on ring based on number of iovs.
C) Check whether there is enough space on the ring.
D) Set up possible padding and cmd on the ring.
The new sequence requires splitting the new function tcmu_alloc_data_space()
out of is_ring_space_avail(). Using this function, change queue_cmd_ring()
according to the new sequence.
Change routines called by tcmu_alloc_data_space() to allow calculating and
returning the iov count. Remove counting of iovs in scatter_data_area().
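Sequence A) to D) can be sketched as below. This is a user-space
illustration under stated assumptions: struct ring_sketch, the function
names, and the CMD_HDR_SIZE and IOV_SIZE values are invented for the example
and do not reflect the real ring layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes below are assumptions for this sketch, not the real ring layout. */
#define CMD_HDR_SIZE 64u
#define IOV_SIZE     16u

struct ring_sketch {
	size_t size;	/* total bytes in the cmd ring */
	size_t used;	/* bytes currently occupied */
};

/* A) While "allocating", count one iov per run of adjacent block indices. */
static size_t alloc_and_count_iovs(const int *dbi, size_t nblocks)
{
	size_t iov_cnt = 0;

	for (size_t i = 0; i < nblocks; i++)
		if (i == 0 || dbi[i] != dbi[i - 1] + 1)
			iov_cnt++;
	return iov_cnt;
}

/* B) Size of the cmd on the ring for a known number of iovs. */
static size_t cmd_size(size_t iov_cnt)
{
	return CMD_HDR_SIZE + iov_cnt * IOV_SIZE;
}

/* Steps A) to D); returns false if the ring cannot take the cmd. */
static bool queue_cmd(struct ring_sketch *r, const int *dbi, size_t nblocks)
{
	size_t need;

	assert(r->used <= r->size);
	need = cmd_size(alloc_and_count_iovs(dbi, nblocks));	/* A, B */
	if (r->size - r->used < need)				/* C */
		return false;
	r->used += need;					/* D */
	return true;
}
```

With four adjacent blocks, the sketch needs one iov (80 bytes with the
assumed sizes), whereas a worst-case estimate of four iovs would have
required 128 bytes.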
Link: https://lore.kernel.org/r/20200910155041.17654-3-bstroesser@ts.fujitsu.com
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Simplify code by joining tcmu_cmd_get_data_length() and
tcmu_cmd_get_block_cnt() into tcmu_cmd_set_block_cnts(). The new function
sets tcmu_cmd->dbi_cnt and also the new field tcmu_cmd->dbi_bidi_cnt which
is needed for further enhancements in following patches. Simplify some
code by using tcmu_cmd->dbi(_bidi)_cnt instead of calculation from length.
Please note: The calculation of the number of dbis needed for bidi was
wrong. It was based on the length of the first bidi sg only. I changed it
to correctly sum up the entire length of all bidi sgs.
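The corrected bidi calculation amounts to summing all sg lengths before
rounding up to whole blocks. A minimal sketch, assuming a 4096-byte block
size and a hypothetical helper bidi_block_cnt() that takes plain lengths
instead of a scatterlist:

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 4096u	/* assumed data block size for this sketch */

/* Blocks needed for a bidi buffer: sum every sg length, then round up. */
static size_t bidi_block_cnt(const size_t *sg_len, size_t nents)
{
	size_t total = 0;

	assert(sg_len || nents == 0);
	for (size_t i = 0; i < nents; i++)
		total += sg_len[i];
	return (total + BLOCK_SIZE - 1) / BLOCK_SIZE;
}
```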
Link: https://lore.kernel.org/r/20200910155041.17654-2-bstroesser@ts.fujitsu.com
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
VirtIO 1.0 spec says:
The removed and rescan events ... when sent for LUN 0, they MAY
apply to the entire target so the driver can ask the initiator
to rescan the target to detect this.
This change introduces the behaviour described above by scanning the entire
SCSI target when LUN is set to 0. This is both a functional and a
performance fix. It aligns the driver with the spec and allows control
planes to hotplug targets with large numbers of LUNs without having to
request a RESCAN for each one of them.
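The decision itself is small. The sketch below uses a hypothetical enum and
helper (rescan_scope_for()) to show it in isolation; in the driver, the two
outcomes would presumably map to a whole-target scan and a single-device
add, respectively:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
enum rescan_scope {
	RESCAN_SINGLE_LUN,
	RESCAN_WHOLE_TARGET
};

/* A rescan event for LUN 0 is treated as applying to the entire target. */
static enum rescan_scope rescan_scope_for(uint32_t lun)
{
	return lun == 0 ? RESCAN_WHOLE_TARGET : RESCAN_SINGLE_LUN;
}
```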
Link: https://lore.kernel.org/r/CY4PR02MB33354370E0A81E75DD9DFE74FB520@CY4PR02MB3335.namprd02.prod.outlook.com
Suggested-by: Felipe Franciosi <felipe@nutanix.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Matej Genci <matej.genci@nutanix.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This addresses the following sparse warnings:
drivers/scsi/myrb.c:2229:27: warning: symbol 'myrb_template' was not
declared. Should it be static?
drivers/scsi/myrb.c:2318:31: warning: symbol 'myrb_raid_functions' was
not declared. Should it be static?
drivers/scsi/myrb.c:2492:6: warning: symbol 'myrb_err_status' was not
declared. Should it be static?
Link: https://lore.kernel.org/r/20200915084018.2826922-1-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This addresses the following sparse warnings:
drivers/scsi/myrs.c:1532:5: warning: symbol 'myrs_host_reset' was not
declared. Should it be static?
drivers/scsi/myrs.c:1922:27: warning: symbol 'myrs_template' was not
declared. Should it be static?
drivers/scsi/myrs.c:2036:31: warning: symbol 'myrs_raid_functions' was
not declared. Should it be static?
drivers/scsi/myrs.c:2046:6: warning: symbol 'myrs_flush_cache' was not
declared. Should it be static?
Link: https://lore.kernel.org/r/20200915084008.2826835-1-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This eliminates the following sparse warnings:
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:53:1: warning: symbol
'bnx2fc_global_lock' was not declared. Should it be static?
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:111:6: warning: symbol
'bnx2fc_devloss_tmo' was not declared. Should it be static?
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:116:6: warning: symbol
'bnx2fc_max_luns' was not declared. Should it be static?
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:121:6: warning: symbol
'bnx2fc_queue_depth' was not declared. Should it be static?
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:126:6: warning: symbol
'bnx2fc_log_fka' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20200912033758.142601-1-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This eliminates the following sparse warnings:
drivers/scsi/aacraid/aachba.c:245:5: warning: symbol 'aac_convert_sgl'
was not declared. Should it be static?
drivers/scsi/aacraid/aachba.c:293:5: warning: symbol 'acbsize' was not
declared. Should it be static?
drivers/scsi/aacraid/aachba.c:324:5: warning: symbol 'aac_wwn' was not
declared. Should it be static?
Link: https://lore.kernel.org/r/20200912033749.142488-1-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a canister on a FS9100, or similar storage, running in NPIV mode, is
rebooted, its WWPNs will fail over to another canister. When this occurs,
we see a WWPN going away from the fabric at one N-Port ID, and, a short
time later, the same WWPN appears at a different N-Port ID. When the
canister is fully operational again, the WWPNs fail back to the original
canister. If there is any I/O outstanding to the target when this occurs,
the implicit logout that the ibmvfc driver issues before removing the rport
will fail. When the WWPN then shows up at a different
N-Port ID, and we issue a PLOGI to it, the VIOS will see that it still has
a login for this WWPN at the old N-Port ID, which results in the VIOS
simulating a link down / link up sequence to the client, in order to get
the VIOS and client LPAR in sync.
The patch below improves the way we handle this scenario so as to avoid the
link bounce, which affects all targets under the virtual host adapter. The
change is to utilize the Move Login MAD, which will work even when I/O is
outstanding to the target. The change only alters the target state machine
for the case where the implicit logout fails prior to deleting the rport.
If this implicit logout fails, we defer deleting the ibmvfc_target object
after calling fc_remote_port_delete. This enables us to later retry the
implicit logout after terminate_rport_io occurs, or to issue the Move Login
request if a WWPN shows up at a new N-Port ID prior to this occurring.
This has been tested by IBM's storage interoperability team on a FS9100,
forcing the failover to occur. With debug tracing enabled in the ibmvfc
driver, we confirmed the move login was sent in this scenario and confirmed
the link bounce no longer occurred.
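The altered part of the target state machine can be summarized as a small
decision function. This is a heavily simplified sketch: the enum, the
function next_action() and its boolean inputs are hypothetical and do not
correspond to identifiers in the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actions for this sketch. */
enum tgt_action {
	TGT_DELETE,		/* logout succeeded: target can be freed */
	TGT_WAIT,		/* keep the target object around for now */
	TGT_RETRY_LOGOUT,	/* I/O was terminated: try the logout again */
	TGT_MOVE_LOGIN		/* WWPN reappeared at a new N-Port ID */
};

/* Pick the next step for a target after its implicit logout attempt. */
static enum tgt_action next_action(bool logout_failed, bool io_terminated,
				   bool wwpn_at_new_id)
{
	if (!logout_failed)
		return TGT_DELETE;
	if (io_terminated)
		return TGT_RETRY_LOGOUT;
	if (wwpn_at_new_id)
		return TGT_MOVE_LOGIN;
	return TGT_WAIT;
}
```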
[mkp: fix checkpatch warnings]
Link: https://lore.kernel.org/r/1599859706-8505-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Boot occasionally fails with some Samsung low-power UFS devices. The reason
is that these devices have slightly higher latency for NOP OUT
responses. This causes boot to fail because the NOP OUT command is issued
during initialization to check whether the device transport protocol is
ready or not. Increase the NOP_OUT_TIMEOUT value from 30 to 50 ms.
Link: https://lore.kernel.org/r/231786897.01599016081767.JavaMail.epsvc@epcpadp2
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver will throw an error message when a tampered-type controller
is detected. The intent is to avoid interacting with any firmware
which is not secured/signed by Broadcom. Any tampering with a firmware
component will be detected by hardware and communicated to the driver
to avoid any further interaction with that component.
[mkp: switched back to dev_err]
Link: https://lore.kernel.org/r/20200814130426.2741171-1-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While reviewing commit 936e6b85da ("scsi: zfcp: Fix panic on ERP timeout
for previously dismissed ERP action"), I stumbled over
zfcp_fsf_req_complete() and wondered whether it has similar issues wrt
concurrent modification of req->erp_action by
zfcp_erp_strategy_check_fsfreq().
But a closer look shows that both of its callers [zfcp_fsf_reqid_check(),
zfcp_fsf_req_dismiss_all()] remove the request from the adapter's req_list
under the req_list's lock. Hence we can trust that if
zfcp_erp_strategy_check_fsfreq() concurrently looks up the corresponding
req_id, it won't find this request and is thus unable to modify it while
it's being processed by zfcp_fsf_req_complete().
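The pattern relied on here (unlink under the list lock, then process
without further locking) can be sketched in user space as follows. The
types and the helper find_req() are hypothetical, and a pthread mutex
stands in for the req_list lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 8

/* Hypothetical request and list types for this sketch. */
struct req {
	unsigned long id;
	int erp_action;
};

struct req_list {
	pthread_mutex_t lock;
	struct req *slot[MAX_REQS];
};

/*
 * Find a request by id under the list lock. With remove == true the
 * request is also unlinked, so later lookups can no longer reach it and
 * the caller may process it without racing against them.
 */
static struct req *find_req(struct req_list *l, unsigned long id, bool remove)
{
	struct req *found = NULL;

	assert(l);
	pthread_mutex_lock(&l->lock);
	for (size_t i = 0; i < MAX_REQS; i++) {
		if (l->slot[i] && l->slot[i]->id == id) {
			found = l->slot[i];
			if (remove)
				l->slot[i] = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&l->lock);
	return found;
}
```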
Add a code comment that hopefully makes this easier for future readers, and
condense the two accesses to ->erp_action that made me trip over this code
path in the first place.
Link: https://lore.kernel.org/r/c500eac301fcbba5af942bbd200f2d6b14e46994.1599765652.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This addresses the following gcc warnings with "make W=1":
drivers/scsi/qla1280.c: In function ‘qla1280_status_entry’:
drivers/scsi/qla1280.c:3607:28: warning: variable ‘lun’ set but not used
[-Wunused-but-set-variable]
3607 | unsigned int bus, target, lun;
| ^~~
drivers/scsi/qla1280.c:3607:20: warning: variable ‘target’ set but not
used [-Wunused-but-set-variable]
3607 | unsigned int bus, target, lun;
| ^~~~~~
drivers/scsi/qla1280.c:3607:15: warning: variable ‘bus’ set but not used
[-Wunused-but-set-variable]
3607 | unsigned int bus, target, lun;
| ^~~
Link: https://lore.kernel.org/r/20200907074518.2326360-5-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>