linux

Author	SHA1	Message	Date
Christoph Hellwig	68e81eba67	nvme-pci: split out a nvme_pci_ctrl_is_dead helper Clean up nvme_dev_disable by splitting the logic to detect if a controller is dead into a separate helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2022-12-06 14:38:19 +01:00
Christoph Hellwig	8cb9f10b71	nvme-pci: return early on ctrl state mismatch in nvme_reset_work The only way nvme_reset_work could be called when not in resetting state is if a reset and remove happen near the same time. This should not happen, but if it did we don't want the reset work to disable the controller because the remove is already doing that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2022-12-06 14:36:54 +01:00
Christoph Hellwig	7d879c90ae	nvme-pci: rename nvme_disable_io_queues This function really deletes the I/O queues, so rename it to match the functionality. Also move the main wrapper right next to the actual underlying implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2022-12-06 14:36:54 +01:00
Christoph Hellwig	10981f23a1	nvme-pci: cleanup nvme_suspend_queue Remove the unused returne value, pass a dev + qid instead of the queue as that is better for the callers as well as the function itself, and remove the entirely pointless kerneldoc comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2022-12-06 14:36:54 +01:00
Christoph Hellwig	c80767f770	nvme-pci: remove nvme_pci_disable nvme_pci_disable has a single caller, fold it into that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2022-12-06 14:36:54 +01:00
Christoph Hellwig	47d42d229a	nvme-pci: remove nvme_disable_admin_queue nvme_disable_admin_queue has only a single caller, and just calls two other funtions, so remove it to clean up the remove path a little more. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2022-12-06 14:36:54 +01:00
Christoph Hellwig	285b6e9b57	nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl Many of the callers decide which one to use based on a bool argument and there is at least some code to be shared, so merge these two. Also move a comment specific to a single callsite to that callsite. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hector Martin <marcan@marcan.st>	2022-12-06 14:36:54 +01:00
Christoph Hellwig	e6d275de2e	nvme: use nvme_wait_ready in nvme_shutdown_ctrl Refactor the code to wait for CSTS state changes so that it can be reused by nvme_shutdown_ctrl. This reduces the delay between each iteration that checks CSTS from 100ms in the shutdown code to the 1 to 2ms range done during enable, matching the changes from commit `3e98c2443f` that were only applied to the enable/disable path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>	2022-12-06 14:36:53 +01:00
Christoph Hellwig	c76b8308e4	nvme-apple: fix controller shutdown in apple_nvme_disable nvme_shutdown_ctrl already shuts the controller down, there is no need to also call nvme_disable_ctrl for the shutdown case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hector Martin <marcan@marcan.st>	2022-12-06 14:36:51 +01:00
Chaitanya Kulkarni	b296958557	nvme-fc: move common code into helper Add a helper to move the duplicate code for error message from nvme_fc_rcv_ls_req() to nvme_fc_rcv_ls_req_err_msg(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2022-12-06 09:27:24 +01:00
Chaitanya Kulkarni	6c90294d72	nvme-fc: avoid null pointer dereference Before using dynamically allcoated variable lsop in the nvme_fc_rcv_ls_req(), add a check for NULL and error out early. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2022-12-06 09:27:24 +01:00
Joel Granados	ea43fceea4	nvme: allow unprivileged passthrough of Identify Controller Add unprivileged passthrough of the I/O Command Set Independent and I/O Command Set Specific Identify Controller sub-command. This will allow access to attributes (e.g. MDTS and WZSL) that are needed to effectively form passthrough I/O to the /dev/ng* character devices. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2022-12-06 09:27:03 +01:00
Sagi Grimberg	d4d957b53d	nvme-multipath: support io stats on the mpath device Our mpath stack device is just a shim that selects a bottom namespace and submits the bio to it without any fancy splitting. This also means that we don't clone the bio or have any context to the bio beyond submission. However it really sucks that we don't see the mpath device io stats. Given that the mpath device can't do that without adding some context to it, we let the bottom device do it on its behalf (somewhat similar to the approach taken in nvme_trace_bio_complete). When the IO starts, we account the request for multipath IO stats using REQ_NVME_MPATH_IO_STATS nvme_request flag to avoid queue io stats disable in the middle of the request. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>	2022-12-06 09:17:01 +01:00
Sagi Grimberg	6887fc6495	nvme: introduce nvme_start_request In preparation for nvme-multipath IO stats accounting, we want the accounting to happen in a centralized place. The request completion is already centralized, but we need a common helper to request I/O start. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de>	2022-12-06 09:16:57 +01:00
Christophe JAILLET	99722c8aa8	nvme: use kstrtobool() instead of strtobool() strtobool() is the same as kstrtobool(). However, the latter is more used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Christoph Hellwig <hch@lst.de>	2022-12-06 09:16:56 +01:00
Christoph Hellwig	ba0718a6d6	nvme: don't call blk_mq_{,un}quiesce_tagset when ctrl->tagset is NULL The NVMe drivers support a mode where no tagset is allocated for the I/O queues and only the admin queue is usable. In that case ctrl->tagset is NULL and we must not call the block per-tagset quiesce helpers that dereference it. Fixes: `98d81f0df7` ("nvme: use blk_mq_[un]quiesce_tagset") Reported-by: Gerd Bayer <gbayer@linux.ibm.com> Reported-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Leng <lengchao@huawei.com>	2022-12-06 09:16:56 +01:00
Kemeng Shi	eea3e8b74a	blk-throttle: Use more suitable time_after check for update of slice_start There is no need to update tg->slice_start[rw] to start when they are equal already. So remove "eq" part of check before update slice_start. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-10-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-05 13:45:31 -07:00
Kemeng Shi	9c9f209d9d	blk-throttle: remove repeat check of elapsed time There is no need to check elapsed time from last upgrade for each node in hierarchy. Move this check before traversing as throtl_can_upgrade do to remove repeat check. Reported-by: kernel test robot <lkp@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-9-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-05 13:45:11 -07:00
Kemeng Shi	e3031d4c7d	blk-throttle: remove incorrect comment for tg_last_low_overflow_time Function tg_last_low_overflow_time is called with intermediate node as following: throtl_hierarchy_can_downgrade throtl_tg_can_downgrade tg_last_low_overflow_time throtl_hierarchy_can_upgrade throtl_tg_can_upgrade tg_last_low_overflow_time throtl_hierarchy_can_downgrade/throtl_hierarchy_can_upgrade will traverse from leaf node to sub-root node and pass traversed intermediate node to tg_last_low_overflow_time. No such limit could be found from context and implementation of tg_last_low_overflow_time, so remove this limit in comment. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-8-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-05 13:45:05 -07:00
Kemeng Shi	009df34171	blk-throttle: fix typo in comment of throtl_adjusted_limit lapsed time -> elapsed time Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-7-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-05 13:44:59 -07:00
Kemeng Shi	a4d508e333	blk-throttle: simpfy low limit reached check in throtl_tg_can_upgrade Commit `c79892c557` ("blk-throttle: add upgrade logic for LIMIT_LOW state") added upgrade logic for low limit and methioned that 1. "To determine if a cgroup exceeds its limitation, we check if the cgroup has pending request. Since cgroup is throttled according to the limit, pending request means the cgroup reaches the limit." 2. "If a cgroup has limit set for both read and write, we consider the combination of them for upgrade. The reason is read IO and write IO can interfere with each other. If we do the upgrade based in one direction IO, the other direction IO could be severly harmed." Besides, we also determine that cgroup reaches low limit if low limit is 0, see comment in throtl_tg_can_upgrade. Collect the information above, the desgin of upgrade check is as following: 1.The low limit is reached if limit is zero or io is already queued. 2.Cgroup will pass upgrade check if low limits of READ and WRITE are both reached. Simpfy the check code described above to removce repeat check and improve readability. There is no functional change. Detail equivalence proof is as following: All replaced conditions to return true are as following: condition 1 (!read_limit && !write_limit) condition 2 read_limit && sq->nr_queued[READ] && (!write_limit \|\| sq->nr_queued[WRITE]) condition 3 write_limit && sq->nr_queued[WRITE] && (!read_limit \|\| sq->nr_queued[READ]) Transferring condition 2 as following: (read_limit && sq->nr_queued[READ]) && (!write_limit \|\| sq->nr_queued[WRITE]) is equivalent to (read_limit && sq->nr_queued[READ]) && (!write_limit \|\| (write_limit && sq->nr_queued[WRITE])) is equivalent to condition 2.1 (read_limit && sq->nr_queued[READ] && !write_limit) \|\| condition 2.2 (read_limit && sq->nr_queued[READ] && (write_limit && sq->nr_queued[WRITE])) Transferring condition 3 as following: write_limit && sq->nr_queued[WRITE] && (!read_limit \|\| sq->nr_queued[READ]) is equivalent to (write_limit && sq->nr_queued[WRITE]) && (!read_limit \|\| (read_limit && sq->nr_queued[READ])) is equivalent to condition 3.1 ((write_limit && sq->nr_queued[WRITE]) && !read_limit) \|\| condition 3.2 ((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ])) Condition 3.2 is the same as condition 2.2, so all conditions we get to return are as following: (!read_limit && !write_limit) (1) (!read_limit && (write_limit && sq->nr_queued[WRITE])) (3.1) ((read_limit && sq->nr_queued[READ]) && !write_limit) (2.1) ((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ])) (2.2) As we can extract conditions "(a1 \|\| a2) && (b1 \|\| b2)" to: a1 && b1 a1 && b2 a2 && b1 ab && b2 Considering that: a1 = !read_limit a2 = read_limit && sq->nr_queued[READ] b1 = !write_limit b2 = write_limit && sq->nr_queued[WRITE] We can pack replaced conditions to (!read_limit \|\| (read_limit && sq->nr_queued[READ])) && (!write_limit \|\| (write_limit && sq->nr_queued[WRITE])) which is equivalent to (!read_limit \|\| sq->nr_queued[READ]) && (!write_limit \|\| sq->nr_queued[WRITE]) Reported-by: kernel test robot <lkp@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-6-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-05 13:44:48 -07:00
Kemeng Shi	183daeb11d	blk-throttle: correct calculation of wait time in tg_may_dispatch In C language, When executing "if (expression1 && expression2)" and expression1 return false, the expression2 may not be executed. For "tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) && tg_within_iops_limit(tg, bio, iops_limit, &iops_wait))", if bps is limited, tg_within_bps_limit will return false and tg_within_iops_limit will not be called. So even bps and iops are both limited, iops_wait will not be calculated and is always zero. So wait time of iops is always ignored. Fix this by always calling tg_within_bps_limit and tg_within_iops_limit to get wait time for both bps and iops. Observed that: 1. Wait time in tg_within_iops_limit/tg_within_bps_limit need always be stored as wait argument is always passed. 2. wait time is stored to zero if iops/bps is limited otherwise non-zero is stored. Simpfy tg_within_iops_limit/tg_within_bps_limit by removing wait argument and return wait time directly. Caller tg_may_dispatch checks if wait time is zero to find if iops/bps is limited. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-5-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-05 13:44:42 -07:00
Kemeng Shi	eb18479182	blk-throttle: ignore cgroup without io queued in blk_throtl_cancel_bios Ignore cgroup without io queued in blk_throtl_cancel_bios for two reasons: 1. Save cpu cycle for trying to dispatch cgroup which is no io queued. 2. Avoid non-consistent state that cgroup is inserted to service queue without THROTL_TG_PENDING set as tg_update_disptime will unconditional re-insert cgroup to service queue. If we are on the default hierarchy, IO dispatched from child in tg_dispatch_one_bio will trigger inserting cgroup to service queue without erase first and ruin the tree. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-4-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-05 13:44:34 -07:00
Kemeng Shi	84aca0a7e0	blk-throttle: Fix that bps of child could exceed bps limited in parent Consider situation as following (on the default hierarchy): HDD \| root (bps limit: 4k) \| child (bps limit :8k) \| fio bs=8k Rate of fio is supposed to be 4k, but result is 8k. Reason is as following: Size of single IO from fio is larger than bytes allowed in one throtl_slice in child, so IOs are always queued in child group first. When queued IOs in child are dispatched to parent group, BIO_BPS_THROTTLED is set and these IOs will not be limited by tg_within_bps_limit anymore. Fix this by only set BIO_BPS_THROTTLED when the bio traversed the entire tree. There patch has no influence on situation which is not on the default hierarchy as each group is a single root group without parent. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-3-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-05 13:44:27 -07:00
Kemeng Shi	f56019aef3	blk-throttle: correct stale comment in throtl_pd_init On the default hierarchy (cgroup2), the throttle interface files don't exist in the root cgroup, so the ablity to limit the whole system by configuring root group is not existing anymore. In general, cgroup doesn't wanna be in the business of restricting resources at the system level, so correct the stale comment that we can limit whole system to we can only limit subtree. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-2-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-05 13:44:19 -07:00
Jens Axboe	b147645148	Merge tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy into for-6.2/block Pull floppy fix from Denis: "Floppy patch for 6.2 The patch from Yuan Can fixes a memory leak in floppy init code. Signed-off-by: Denis Efremov <efremov@linux.com>" * tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy: floppy: Fix memory leak in do_floppy_init()	2022-12-04 08:54:19 -07:00
Yuan Can	f8ace2e304	floppy: Fix memory leak in do_floppy_init() A memory leak was reported when floppy_alloc_disk() failed in do_floppy_init(). unreferenced object 0xffff888115ed25a0 (size 8): comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s) hex dump (first 8 bytes): 00 ac 67 5b 81 88 ff ff ..g[.... backtrace: [<000000007f457abb>] __kmalloc_node+0x4c/0xc0 [<00000000a87bfa9e>] blk_mq_realloc_tag_set_tags.part.0+0x6f/0x180 [<000000006f02e8b1>] blk_mq_alloc_tag_set+0x573/0x1130 [<0000000066007fd7>] 0xffffffffc06b8b08 [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0 [<00000000e26d04ee>] do_init_module+0x1a4/0x680 [<000000001bb22407>] load_module+0x6249/0x7110 [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200 [<000000007bddca46>] do_syscall_64+0x35/0x80 [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 unreferenced object 0xffff88810fc30540 (size 32): comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000007f457abb>] __kmalloc_node+0x4c/0xc0 [<000000006b91eab4>] blk_mq_alloc_tag_set+0x393/0x1130 [<0000000066007fd7>] 0xffffffffc06b8b08 [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0 [<00000000e26d04ee>] do_init_module+0x1a4/0x680 [<000000001bb22407>] load_module+0x6249/0x7110 [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200 [<000000007bddca46>] do_syscall_64+0x35/0x80 [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 If the floppy_alloc_disk() failed, disks of current drive will not be set, thus the lastest allocated set->tag cannot be freed in the error handling path. A simple call graph shown as below: floppy_module_init() floppy_init() do_floppy_init() for (drive = 0; drive < N_DRIVE; drive++) blk_mq_alloc_tag_set() blk_mq_alloc_tag_set_tags() blk_mq_realloc_tag_set_tags() # set->tag allocated floppy_alloc_disk() blk_mq_alloc_disk() # error occurred, disks failed to allocated ->out_put_disk: for (drive = 0; drive < N_DRIVE; drive++) if (!disks[drive][0]) # the last disks is not set and loop break break; blk_mq_free_tag_set() # the latest allocated set->tag leaked Fix this problem by free the set->tag of current drive before jump to error handling path. Cc: stable@vger.kernel.org Fixes: `302cfee150` ("floppy: use a separate gendisk for each media format") Signed-off-by: Yuan Can <yuancan@huawei.com> [efremov: added stable list, changed title] Signed-off-by: Denis Efremov <efremov@linux.com>	2022-12-04 18:03:41 +04:00
Greg Kroah-Hartman	85d6ce58e4	block: remove devnode callback from struct block_device_operations With the removal of the pktcdvd driver, there are no in-kernel users of the devnode callback in struct block_device_operations, so it can be safely removed. If it is needed for new block drivers in the future, it can be brought back. Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20221203140747.1942969-1-gregkh@linuxfoundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-03 09:19:48 -07:00
Jens Axboe	368c7f1f8a	Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.2/block Pull MD fixes from Song: "This contains code cleanup by Christoph." * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fold unbind_rdev_from_array into md_kick_rdev_from_array md: mark md_kick_rdev_from_array static md: remove lock_bdev / unlock_bdev	2022-12-02 12:36:37 -07:00
Greg Kroah-Hartman	f40eb99897	pktcdvd: remove driver. Way back in 2016 in commit `5a8b187c61` ("pktcdvd: mark as unmaintained and deprecated") this driver was marked as "will be removed soon". 5 years seems long enough to have it stick around after that, so finally remove the thing now. Reported-by: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Thomas Maier <balagi@justmail.de> Cc: Peter Osterlund <petero2@telia.com> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20221202182758.1339039-1-gregkh@linuxfoundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-02 12:35:30 -07:00
Christoph Hellwig	b5c1acf012	md: fold unbind_rdev_from_array into md_kick_rdev_from_array unbind_rdev_from_array is only called from md_kick_rdev_from_array, so merge it into its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>	2022-12-02 11:21:01 -08:00
Christoph Hellwig	d57d9d6965	md: mark md_kick_rdev_from_array static md_kick_rdev_from_array is only used in md.c, so unexport it and mark the symbol static. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>	2022-12-02 11:21:01 -08:00
Christoph Hellwig	fb541ca4c3	md: remove lock_bdev / unlock_bdev These wrappers for blkdev_get / blkdev_put just horribly confuse the code with their odd naming. Remove them and improve the error unwinding in md_import_device with the now folded code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>	2022-12-02 11:21:01 -08:00
Yang Li	1d6df9d352	blk-cgroup: Fix some kernel-doc comments Make the description of @gendisk to @disk in blkcg_schedule_throttle() to clear the below warnings: block/blk-cgroup.c:1850: warning: Function parameter or member 'disk' not described in 'blkcg_schedule_throttle' block/blk-cgroup.c:1850: warning: Excess function parameter 'gendisk' description in 'blkcg_schedule_throttle' Fixes: `de185b56e8` ("blk-cgroup: pass a gendisk to blkcg_schedule_throttle") Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3338 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20221202011713.14834-1-yang.lee@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 18:22:12 -07:00
Shin'ichiro Kawasaki	d3a5738849	null_blk: support read-only and offline zone conditions In zoned mode, zones with write pointers can have conditions "read-only" or "offline". In read-only condition, zones can not be written. In offline condition, the zones can be neither written nor read. These conditions are intended for zones with media failures, then it is difficult to set those conditions to zones on real devices. To test handling of zones in the conditions, add a feature to null_blk to set up zones in read-only or offline condition. Add new configuration attributes "zone_readonly" and "zone_offline". Write a sector to the attribute files to specify the target zone to set the zone conditions. For example, following command lines do it: echo 0 > nullb1/zone_readonly echo 524288 > nullb1/zone_offline When the specified zones are already in read-only or offline condition, normal empty condition is restored to the zones. These condition changes can be done only after the null_blk device get powered, since status area of each zone is not yet allocated before power-on. Also improve zone condition checks to inhibit all commands for zones in offline conditions. In same manner, inhibit write and zone management commands for zones in read-only condition. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20221201061036.2342206-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 14:49:48 -07:00
Christoph Böhmwalder	677b367275	drbd: add context parameter to expect() macro Originally-from: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-6-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 11:33:49 -07:00
Christoph Böhmwalder	e3fa02d7d4	drbd: introduce drbd_ratelimit() Use call site specific ratelimit instead of one single static global. Also ratelimit ASSERTION messages generated by expect(). Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-5-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 11:33:49 -07:00
Christoph Böhmwalder	aa03469597	drbd: introduce dynamic debug Incorporate as many out-of-tree changes as possible without changing the genl API. Over the years, we restructured this several times, and also changed the log format. One breaking change is that DRBD 9 gained "implicit options", like a connection name. This cannot be replayed here without changing the API, so save it for later. Originally-from: Andreas Gruenbacher <agruen@linbit.com> Originally-from: Philipp Reisner <philipp.reisner@linbit.com> Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-4-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 11:33:49 -07:00
Christoph Böhmwalder	136160c173	drbd: split polymorph printk to its own file Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-3-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 11:33:49 -07:00
Christoph Böhmwalder	c3f8974198	drbd: unify how failed assertions are logged Unify how failed assertions from D_ASSERT() and expect() are logged. Originally-from: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 11:33:49 -07:00
Randy Dunlap	2e833c8c8c	block: bdev & blktrace: use consistent function doc. notation Use only one hyphen in kernel-doc notation between the function name and its short description. The is the documented kerenl-doc format. It also fixes the HTML presentation to be consistent with other functions. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20221201070331.25685-1-rdunlap@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 09:16:46 -07:00
Kemeng Shi	7a88b1a826	blk-iocost: Correct comment in blk_iocost_init There is no iocg_pd_init function. The pd_alloc_fn function pointer of iocost policy is set with ioc_pd_init. Just correct it. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-6-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 07:44:13 -07:00
Kemeng Shi	6c31be320c	blk-iocost: Remove vrate member in struct ioc_now If we trace vtime_base_rate instead of vtime_rate, there is nowhere which accesses now->vrate except function ioc_now using now->vrate locally. Just remove it. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-5-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 07:44:12 -07:00
Kemeng Shi	63c9eac4b6	blk-iocost: Trace vtime_base_rate instead of vtime_rate Since commit `ac33e91e2d` ("blk-iocost: implement vtime loss compensation") rename original vtime_rate to vtime_base_rate and current vtime_rate is original vtime_rate with compensation. The current rate showed in tracepoint is mixed with vtime_rate and vtime_base_rate: 1) In function ioc_adjust_base_vrate, the first trace_iocost_ioc_vrate_adj shows vtime_rate, the second trace_iocost_ioc_vrate_adj shows vtime_base_rate. 2) In function iocg_activate shows vtime_rate by calling TRACE_IOCG_PATH(iocg_activate... 3) In function ioc_check_iocgs shows vtime_rate by calling TRACE_IOCG_PATH(iocg_idle... Trace vtime_base_rate instead of vtime_rate as: 1) Before commit `ac33e91e2d` ("blk-iocost: implement vtime loss compensation"), the traced rate is without compensation, so still show rate without compensation. 2) The vtime_base_rate is more stable while vtime_rate heavily depends on excess budeget on current period which may change abruptly in next period. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-4-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 07:44:12 -07:00
Kemeng Shi	c6d2efdd38	blk-iocost: Reset vtime_base_rate in ioc_refresh_params Since commit ac33e91e2daca("blk-iocost: implement vtime loss compensation") split vtime_rate into vtime_rate and vtime_base_rate, we need reset both vtime_base_rate and vtime_rate when device parameters are refreshed. If vtime_base_rate is no reset here, vtime_rate will be overwritten with old vtime_base_rate soon in ioc_refresh_vrate. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-3-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 07:44:12 -07:00
Kemeng Shi	ecaaaabeea	blk-iocost: Fix typo in comment soley -> solely Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-2-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 07:44:12 -07:00
Jan Kara	36369f46e9	block: Do not reread partition table on exclusively open device Since commit `10c70d95c0` ("block: remove the bd_openers checks in blk_drop_partitions") we allow rereading of partition table although there are users of the block device. This has an undesirable consequence that e.g. if sda and sdb are assembled to a RAID1 device md0 with partitions, BLKRRPART ioctl on sda will rescan partition table and create sda1 device. This partition device under a raid device confuses some programs (such as libstorage-ng used for initial partitioning for distribution installation) leading to failures. Fix the problem refusing to rescan partitions if there is another user that has the block device exclusively open. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3 Fixes: `10c70d95c0` ("block: remove the bd_openers checks in blk_drop_partitions") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221130175653.24299-1-jack@suse.cz [axboe: fold in followup fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-01 07:44:03 -07:00
Pankaj Raghav	92a34c4617	virtio-blk: replace ida_simple[get\|remove] with ida_[alloc_range\|free] ida_simple[get\|remove] are deprecated, and are just wrappers to ida_[alloc_range\|free]. Replace ida_simple[get\|remove] with their corresponding counterparts. No functional changes. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20221130123001.25473-1-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-30 14:23:34 -07:00
Christoph Hellwig	63f93fd6fa	block: mark blk_put_queue as potentially blocking We can't just say that the last reference release may block, as any reference dropped could be the last one. So move the might_sleep() from blk_free_queue to blk_put_queue and update the documentation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-30 11:09:00 -07:00
Christoph Hellwig	2bd85221a6	block: untangle request_queue refcounting from sysfs The kobject embedded into the request_queue is used for the queue directory in sysfs, but that is a child of the gendisks directory and is intimately tied to it. Move this kobject to the gendisk and use a refcount_t in the request_queue for the actual request_queue refcounting that is completely unrelated to the device model. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-30 11:09:00 -07:00

1 2 3 4 5 ...

1136686 Commits