linux

Author	SHA1	Message	Date
Yu Kuai	8549674990	blk-throttle: remove THROTL_TG_HAS_IOPS_LIMIT Currently, "tg->has_rules" and "tg->flags & THROTL_TG_HAS_IOPS_LIMIT" both try to bypass bios that don't need to be throttled, however, they are a little redundant and both not perfect: 1) "tg->has_rules" only distinguish read and write, but not iops and bps limit. 2) "tg->flags & THROTL_TG_HAS_IOPS_LIMIT" only check if iops limit exist, read and write is not distinguished, and bps limit is not checked. tg->has_rules will extended to distinguish bps and iops in the following patch. There is no need to keep the flag. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220921095309.1481289-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-24 08:59:43 -06:00
Tejun Heo	026e14a276	Merge branch 'for-6.0-fixes' into for-6.1 for-6.0 has the following fix for cgroup_get_from_id(). 836ac87d ("cgroup: fix cgroup_get_from_id") which conflicts with the following two commits in for-6.1. `4534dee9` ("cgroup: cgroup: Honor caller's cgroup NS when resolving cgroup id") `fa7e439c` ("cgroup: Homogenize cgroup_get_from_id() return value") While the resolution is straightforward, the code ends up pretty ugly afterwards. Let's pull for-6.0-fixes into for-6.1 so that the code can be fixed up there. Signed-off-by: Tejun Heo <tj@kernel.org>	2022-09-23 07:19:38 -10:00
Li Jinlin	9713a67067	block/blk-rq-qos: delete useless enmu RQ_QOS_IOPRIO Since blk-ioprio handing was converted from a rqos policy to a direct call, RQ_QOS_IOPRIO is not used anymore, just delete it. Signed-off-by: Li Jinlin <lijinlin3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220916023241.32926-1-lijinlin3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 19:50:53 -06:00
Kanchan Joshi	c6e99ea482	block: export blk_rq_is_poll This is in preparation to support iopoll for nvme passthrough. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220823161443.49436-4-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Yu Kuai	8c5035dfbb	blk-wbt: call rq_qos_add() after wb_normal is initialized Our test found a problem that wbt inflight counter is negative, which will cause io hang(noted that this problem doesn't exist in mainline): t1: device create t2: issue io add_disk blk_register_queue wbt_enable_default wbt_init rq_qos_add // wb_normal is still 0 /* * in mainline, disk can't be opened before * bdev_add(), however, in old kernels, disk * can be opened before blk_register_queue(). */ blkdev_issue_flush // disk size is 0, however, it's not checked submit_bio_wait submit_bio blk_mq_submit_bio rq_qos_throttle wbt_wait bio_to_wbt_flags rwb_enabled // wb_normal is 0, inflight is not increased wbt_queue_depth_changed(&rwb->rqos); wbt_update_limits // wb_normal is initialized rq_qos_track wbt_track rq->wbt_flags \|= bio_to_wbt_flags(rwb, bio); // wb_normal is not 0，wbt_flags will be set t3: io completion blk_mq_free_request rq_qos_done wbt_done wbt_is_tracked // return true __wbt_done wbt_rqw_done atomic_dec_return(&rqw->inflight); // inflight is decreased commit `8235b5c1e8` ("block: call bdev_add later in device_add_disk") can avoid this problem, however it's better to fix this problem in wbt: 1) Lower kernel can't backport this patch due to lots of refactor. 2) Root cause is that wbt call rq_qos_add() before wb_normal is initialized. Fixes: `e34cbd3074` ("blk-wbt: add general throttling mechanism") Cc: <stable@vger.kernel.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 08:36:13 -06:00
Li zeming	a7609c68f7	blk-iocost: Remove unnecessary (void*) conversions The key pointer is void and hence does not need an explicit cast. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20220919012825.2936-1-zeming@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-20 08:28:30 -06:00
Christoph Hellwig	118f3663fb	block: remove PSI accounting from the bio layer PSI accounting is now done by the VM code, where it should have been since the beginning. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220915094200.139713-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-20 08:24:38 -06:00
Ping-Xiang Chen	e88480871b	block: fix comment typo in submit_bio of block-core.c. This patch fix a comment typo in block-core.c. Signed-off-by: Ping-Xiang Chen <p.x.chen@uci.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220914074237.31621-1-p.x.chen@uci.edu Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-20 08:22:47 -06:00
Christoph Hellwig	4c66a326b5	Revert "block: freeze the queue earlier in del_gendisk" This reverts commit `a09b314005`. Dusty Mabe reported consistent hang during CoreOS shutdown with a MD RAID1 setup. Although apparently similar hangs happened before, and this patch most likely is not the root cause it made it much more severe. Revert it until we can figure out what is going on with the md driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220919144049.978907-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-20 08:15:44 -06:00
Linus Torvalds	68e777e44c	Merge tag 'block-6.0-2022-09-16' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Two fixes for -rc6: - Fix a mixup of sectors and bytes in the secure erase ioctl (Mikulas) - Fix for a bad return value for a non-blocking bio/blk queue enter call (me)" * tag 'block-6.0-2022-09-16' of git://git.kernel.dk/linux-block: blk-lib: fix blkdev_issue_secure_erase block: blk_queue_enter() / __bio_queue_enter() must return -EAGAIN for nowait	2022-09-16 06:58:04 -07:00
Mikulas Patocka	c4fa368466	blk-lib: fix blkdev_issue_secure_erase There's a bug in blkdev_issue_secure_erase. The statement "unsigned int len = min_t(sector_t, nr_sects, max_sectors);" sets the variable "len" to the length in sectors, but the statement "bio->bi_iter.bi_size = len" treats it as if it were in bytes. The statements "sector += len << SECTOR_SHIFT" and "nr_sects -= len << SECTOR_SHIFT" are thinko. This patch fixes it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v5.19 Fixes: `44abff2c0b` ("block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD") Link: https://lore.kernel.org/r/alpine.LRH.2.02.2209141549480.28100@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-15 00:25:17 -06:00
Stefan Roesch	56f99b8d06	block: blk_queue_enter() / __bio_queue_enter() must return -EAGAIN for nowait Today blk_queue_enter() and __bio_queue_enter() return -EBUSY for the nowait code path. This is not correct: they should return -EAGAIN instead. This problem was detected by fio. The following command exposed the above problem: t/io_uring -p0 -d128 -b4096 -s32 -c32 -F1 -B0 -R0 -X1 -n24 -P1 -u1 -O0 /dev/ng0n1 By applying the patch, the retry case is handled correctly in the slow path. Signed-off-by: Stefan Roesch <shr@fb.com> Fixes: `bfd343aa17` ("blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-13 15:06:39 -06:00
Yu Kuai	c013710e1a	blk-throttle: cleanup tg_update_disptime() tg_update_disptime() only need to adjust postion for 'tg' in 'parent_sq', there is no need to call throtl_enqueue/dequeue_tg(), since they will set/clear flag THROTL_TG_PENDING and increase/decrease nr_pending, which is useless. By the way, clear the flag/decrease nr_pending while there are still throttled bios is not good for debugging. There are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220827101637.1775111-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-12 00:20:08 -06:00
Yu Kuai	8c25ed0cb9	blk-throttle: calling throtl_dequeue/enqueue_tg in pairs It's a little weird to call throtl_dequeue_tg() unconditionally in throtl_select_dispatch(), since it will be called in tg_update_disptime() again if some bio is still throttled. Thus call it later if there are no throttled bio. There are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220827101637.1775111-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-12 00:20:07 -06:00
Yu Kuai	7e9c5c54d4	blk-throttle: use 'READ/WRITE' instead of '0/1' Make the code easier to read, like everywhere else. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220827101637.1775111-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-12 00:20:07 -06:00
Yu Kuai	a880ae93e5	blk-throttle: fix io hung due to configuration updates If new configuration is submitted while a bio is throttled, then new waiting time is recalculated regardless that the bio might already wait for some time: tg_conf_updated throtl_start_new_slice tg_update_disptime throtl_schedule_next_dispatch Then io hung can be triggered by always submmiting new configuration before the throttled bio is dispatched. Fix the problem by respecting the time that throttled bio already waited. In order to do that, add new fields to record how many bytes/io are waited, and use it to calculate wait time for throttled bio under new configuration. Some simple test: 1) cd /sys/fs/cgroup/blkio/ echo $$ > cgroup.procs echo "8:0 2048" > blkio.throttle.write_bps_device { sleep 2 echo "8:0 1024" > blkio.throttle.write_bps_device } & dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct 2) cd /sys/fs/cgroup/blkio/ echo $$ > cgroup.procs echo "8:0 1024" > blkio.throttle.write_bps_device { sleep 4 echo "8:0 2048" > blkio.throttle.write_bps_device } & dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct test results: io finish time before this patch with this patch 1) 10s 6s 2) 8s 6s Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-5-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-12 00:19:48 -06:00
Yu Kuai	681cd46fff	blk-throttle: factor out code to calculate ios/bytes_allowed No functional changes, new apis will be used in later patches to calculate wait time for throttled bios when new configuration is submitted. Noted this patch also rename tg_with_in_iops/bps_limit() to tg_within_iops/bps_limit(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-12 00:19:48 -06:00
Yu Kuai	8d6bbaada2	blk-throttle: prevent overflow while calculating wait time There is a problem found by code review in tg_with_in_bps_limit() that 'bps_limit * jiffy_elapsed_rnd' might overflow. Fix the problem by calling mul_u64_u64_div_u64() instead. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-12 00:19:48 -06:00
Yu Kuai	320fb0f91e	blk-throttle: fix that io throttle can only work for single bio Test scripts: cd /sys/fs/cgroup/blkio/ echo "8:0 1024" > blkio.throttle.write_bps_device echo $$ > cgroup.procs dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & Test result: 10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s 10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s The problem is that the second bio is finished after 10s instead of 20s. Root cause: 1) second bio will be flagged: __blk_throtl_bio while (true) { ... if (sq->nr_queued[rw]) -> some bio is throttled already break }; bio_set_flag(bio, BIO_THROTTLED); -> flag the bio 2) flagged bio will be dispatched without waiting: throtl_dispatch_tg tg_may_dispatch tg_with_in_bps_limit if (bps_limit == U64_MAX \|\| bio_flagged(bio, BIO_THROTTLED)) *wait = 0; -> wait time is zero return true; commit `9f5ede3c01` ("block: throttle split bio in case of iops limit") support to count split bios for iops limit, thus it adds flagged bio checking in tg_with_in_bps_limit() so that split bios will only count once for bps limit, however, it introduce a new problem that io throttle won't work if multiple bios are throttled. In order to fix the problem, handle iops/bps limit in different ways: 1) for iops limit, there is no flag to record if the bio is throttled, and iops is always applied. 2) for bps limit, original bio will be flagged with BIO_BPS_THROTTLED, and io throttle will ignore bio with the flag. Noted this patch also remove the code to set flag in __bio_clone(), it's introduced in commit `111be88398` ("block-throttle: avoid double charge"), and author thinks split bio can be resubmited and throttled again, which is wrong because split bio will continue to dispatch from caller. Fixes: `9f5ede3c01` ("block: throttle split bio in case of iops limit") Cc: <stable@vger.kernel.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-12 00:19:48 -06:00
Keith Busch	4acb83417c	sbitmap: fix batched wait_cnt accounting Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-12 00:10:34 -06:00
Eric Biggers	2d985f8c6b	vfs: support STATX_DIOALIGN on block devices Add support for STATX_DIOALIGN to block devices, so that direct I/O alignment restrictions are exposed to userspace in a generic way. Note that this breaks the tradition of stat operating only on the block device node, not the block device itself. However, it was felt that doing this is preferable, in order to make the interface useful and avoid needing separate interfaces for regular files and block devices. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20220827065851.135710-3-ebiggers@kernel.org	2022-09-11 19:47:12 -05:00
Linus Torvalds	9ebc0ecb21	Merge tag 'block-6.0-2022-09-09' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - NVMe pull via Christoph: - fix a use after free in nvmet (Bart Van Assche) - fix a use after free when detecting digest errors (Sagi Grimberg) - fix regression that causes sporadic TCP requests to time out (Sagi Grimberg) - fix two off by ones errors in the nvmet ZNS support (Dennis Maisenbacher) - requeue aen after firmware activation (Keith Busch) - Fix missing request flags in debugfs code (me) - Partition scan fix (Ming) * tag 'block-6.0-2022-09-09' of git://git.kernel.dk/linux-block: block: add missing request flags to debugfs code nvme: requeue aen after firmware activation nvmet: fix mar and mor off-by-one errors nvme-tcp: fix regression that causes sporadic requests to time out nvme-tcp: fix UAF when detecting digest errors nvmet: fix a use-after-free block: don't add partitions if GD_SUPPRESS_PART_SCAN is set	2022-09-09 15:03:08 -04:00
Jens Axboe	745ed37277	block: add missing request flags to debugfs code We're missing TIMED_OUT and RESV. Particularly the former is handy for debugging, let's get them added. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-09 05:57:52 -06:00
Miaohe Lin	bdb7d420c6	block: remove unneeded return value of bio_check_ro() bio_check_ro() always return false now. Remove this unneeded return value and cleanup the sole caller. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20220905102754.1942-1-linmiaohe@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-05 11:45:35 -06:00
Miaohe Lin	6d5e8d21e8	blk-mq: remove unneeded needs_restart check If code reaches here, needs_restart must be true. Remove this unneeded needs_restart check. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20220905101950.4606-1-linmiaohe@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-05 11:45:24 -06:00
Jiapeng Chong	91e5adda5c	block/blk-map: Remove set but unused variable 'added' The variable added is not effectively used in the function, so delete it. block/blk-map.c:273:16: warning: variable 'added' set but not used. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2049 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20220905063253.120082-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-05 11:44:37 -06:00
Yu Kuai	2d8f7a3b9f	blk-throttle: clean up codes that can't be reached While doing code coverage testing while CONFIG_BLK_DEV_THROTTLING_LOW is disabled, we found that there are many codes can never be reached. This patch move such codes inside "#ifdef CONFIG_BLK_DEV_THROTTLING_LOW". Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220903062826.1099085-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-04 14:38:18 -06:00
Jens Axboe	bce1b56c73	Revert "sbitmap: fix batched wait_cnt accounting" This reverts commit `16ede66973`. This is causing issues with CPU stalls on my test box, revert it for now until we understand what is going on. It looks like infinite looping off sbitmap_queue_wake_up(), but hard to tell with a lot of CPUs hitting this issue and the console scrolling infinitely. Link: https://lore.kernel.org/linux-block/e742813b-ce5c-0d58-205b-1626f639b1bd@kernel.dk/ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-04 06:39:25 -06:00
Ming Lei	748008e1da	block: don't add partitions if GD_SUPPRESS_PART_SCAN is set Commit `b9684a71fc` ("block, loop: support partitions without scanning") adds GD_SUPPRESS_PART_SCAN for replacing part function of GENHD_FL_NO_PART. But looks blk_add_partitions() is missed, since loop doesn't want to add partitions if GENHD_FL_NO_PART was set. And it causes regression on libblockdev (as called from udisks) which operates with the LO_FLAGS_PARTSCAN. Fixes the issue by not adding partitions if GD_SUPPRESS_PART_SCAN is set. Fixes: `b9684a71fc` ("block, loop: support partitions without scanning") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220823103819.395776-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-03 11:29:03 -06:00
Jens Axboe	12c5b70c18	block: enable per-cpu bio caching for the fs bio set This is useful for polled IO on a file, or for polled IO with the io_uring passthrough mechanism. If bio allocations are done with REQ_POLLED for those cases, then initializing the bio set with BIOSET_PERCPU_CACHE enables the local per-cpu cache which eliminates allocations (and frees) of bio structs when possible. Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-02 13:03:33 -06:00
Keith Busch	16ede66973	sbitmap: fix batched wait_cnt accounting Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220825145312.1217900-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-01 10:42:41 -06:00
Michal Koutný	fa7e439cf9	cgroup: Homogenize cgroup_get_from_id() return value Cgroup id is user provided datum hence extend its return domain to include possible error reason (similar to cgroup_get_from_fd()). This change also fixes commit `d4ccaf58a8` ("bpf: Introduce cgroup iter") that would use NULL instead of proper error handling in `d4ccaf58a8` ("bpf: Introduce cgroup iter"). Additionally, neither of: fc_appid_store, bpf_iter_attach_cgroup, mem_cgroup_get_from_ino (callers of cgroup_get_from_fd) is built without CONFIG_CGROUPS (depends via CONFIG_BLK_CGROUP, direct, transitive CONFIG_MEMCG respectively) transitive, so drop the singular definition not needed with !CONFIG_CGROUPS. Fixes: `d4ccaf58a8` ("bpf: Introduce cgroup iter") Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-08-26 10:57:41 -10:00
Linus Torvalds	3e5c673f0d	Merge tag 'block-6.0-2022-08-26' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - MD pull request via Song: - Fix for clustered raid (Guoqing Jiang) - req_op fix (Bart Van Assche) - Fix race condition in raid recreate (David Sloan) - loop configuration overflow fix (Siddh) - Fix missing commit_rqs call for certain conditions (Yu) * tag 'block-6.0-2022-08-26' of git://git.kernel.dk/linux-block: md: call __md_stop_writes in md_stop Revert "md-raid: destroy the bitmap after destroying the thread" md: Flush workqueue md_rdev_misc_wq in md_alloc() md/raid10: Fix the data type of an r10_sync_page_io() argument loop: Check for overflow while configuring loop blk-mq: fix io hung due to missing commit_rqs	2022-08-26 11:05:54 -07:00
Jens Axboe	e88811bc43	block: use on-stack page vec for <= UIO_FASTIOV Avoid a kmalloc+kfree for each page array, if we only have a few pages that are mapped. An alloc+free for each IO is quite expensive, and it's pretty pointless if we're only dealing with 1 or a few vecs. Use UIO_FASTIOV like we do in other spots to set a sane limit for how big of an IO we want to avoid allocations for. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-22 10:07:56 -06:00
Jens Axboe	8af870aa5b	block: enable bio caching use for passthru IO bdev based polled O_DIRECT is currently quite a bit faster than passthru on the same device, and one of the reaons is that we're not able to use the bio caching for passthru IO. If REQ_POLLED is set on the request, use the fs bio set for grabbing a bio from the caches, if available. This saves 5-6% of CPU over head for polled passthru IO. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-22 10:07:56 -06:00
Jens Axboe	f5d632d15e	block: shrink rq_map_data a bit We don't need full ints for several of these members. Change the page_order and nr_entries to unsigned shorts, and the true/false from_user and null_mapped to booleans. This shrinks the struct from 32 to 24 bytes on 64-bit archs. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-22 10:07:56 -06:00
Yu Kuai	d322f355e9	block, bfq: remove useless parameter for bfq_add/del_bfqq_busy() 'bfqd' can be accessed through 'bfqq->bfqd', there is no need to pass it as a parameter separately. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220816015631.1323948-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-22 10:07:56 -06:00
Yu Kuai	1e3cc2125d	block, bfq: remove useless checking in bfq_put_queue() 'bfqq->bfqd' is ensured to set in bfq_init_queue(), and it will never change afterwards. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220816015631.1323948-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-22 10:07:56 -06:00
Yu Kuai	c2090eabac	block, bfq: remove unused functions While doing code coverage testing(CONFIG_BFQ_CGROUP_DEBUG is disabled), we found that some functions doesn't have caller, thus remove them. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220816015631.1323948-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-22 10:07:56 -06:00
Bart Van Assche	a4e1d0b76e	block: Change the return type of blk_mq_map_queues() into void Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Wang <jasowang@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-22 10:07:53 -06:00
dougmill@linux.vnet.ibm.com	c6ea706042	block: sed-opal: Add ioctl to return device status Provide a mechanism to retrieve basic status information about the device, including the "supported" flag indicating whether SED-OPAL is supported. The information returned is from the various feature descriptors received during the discovery0 step, and so this ioctl does nothing more than perform the discovery0 step and then save the information received. See "struct opal_status" and OPAL_FL_* bits for the status information currently returned. This is necessary to be able to check whether a device is OPAL enabled, set up, locked or unlocked from userspace programs like systemd-cryptsetup and libcryptsetup. Right now we just have to assume the user 'knows' or blindly attempt setup/lock/unlock operations. Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Tested-by: Luca Boccassi <bluca@debian.org> Reviewed-by: Scott Bauer <sbauer@plzdonthack.me> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20220816140713.84893-1-luca.boccassi@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-22 07:52:51 -06:00
Linus Torvalds	b9bce6e553	Merge tag 'block-6.0-2022-08-19' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few fixes that should go into this release: - Small series of patches for ublk (ZiyangZhang) - Remove dead function (Yu) - Fix for running a block queue in case of resource starvation (Yufen)" * tag 'block-6.0-2022-08-19' of git://git.kernel.dk/linux-block: blk-mq: run queue no matter whether the request is the last request blk-mq: remove unused function blk_mq_queue_stopped() ublk_drv: do not add a re-issued request aborted previously to ioucmd's task_work ublk_drv: update comment for __ublk_fail_req() ublk_drv: check ubq_daemon_is_dying() in __ublk_rq_task_work() ublk_drv: update iod->addr for UBLK_IO_NEED_GET_DATA	2022-08-20 10:17:05 -07:00
Yu Kuai	65fac0d54f	blk-mq: fix io hung due to missing commit_rqs Currently, in virtio_scsi, if 'bd->last' is not set to true while dispatching request, such io will stay in driver's queue, and driver will wait for block layer to dispatch more rqs. However, if block layer failed to dispatch more rq, it should trigger commit_rqs to inform driver. There is a problem in blk_mq_try_issue_list_directly() that commit_rqs won't be called: // assume that queue_depth is set to 1, list contains two rq blk_mq_try_issue_list_directly blk_mq_request_issue_directly // dispatch first rq // last is false __blk_mq_try_issue_directly blk_mq_get_dispatch_budget // succeed to get first budget __blk_mq_issue_directly scsi_queue_rq cmd->flags \|= SCMD_LAST virtscsi_queuecommand kick = (sc->flags & SCMD_LAST) != 0 // kick is false, first rq won't issue to disk queued++ blk_mq_request_issue_directly // dispatch second rq __blk_mq_try_issue_directly blk_mq_get_dispatch_budget // failed to get second budget ret == BLK_STS_RESOURCE blk_mq_request_bypass_insert // errors is still 0 if (!list_empty(list) \|\| errors && ...) // won't pass, commit_rqs won't be called In this situation, first rq relied on second rq to dispatch, while second rq relied on first rq to complete, thus they will both hung. Fix the problem by also treat 'BLK_STS_RESOURCE' as 'errors' since it means that request is not queued successfully. Same problem exists in blk_mq_dispatch_rq_list(), 'BLK_STS_RESOURCE' can't be treated as 'errors' here, fix the problem by calling commit_rqs if queue_rq return 'BLK_STS_*RESOURCE'. Fixes: `d666ba98f8` ("blk-mq: add mq_ops->commit_rqs()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220726122224.1790882-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-19 20:45:58 +00:00
Yufen Yu	d3b3859687	blk-mq: run queue no matter whether the request is the last request We do test on a virtio scsi device (/dev/sda) and the default mq scheduler is 'none'. We found a IO hung as following: blk_finish_plug blk_mq_plug_issue_direct scsi_mq_get_budget //get budget_token fail and sdev->restarts=1 scsi_end_request scsi_run_queue_async //sdev->restart=0 and run queue blk_mq_request_bypass_insert //add request to hctx->dispatch list //continue to dispath plug list blk_mq_dispatch_plug_list blk_mq_try_issue_list_directly //success issue all requests from plug list After .get_budget fail, scsi_mq_get_budget will increase 'restarts'. Normally, it will run hw queue when io complete and set 'restarts' as 0. But if we run queue before adding request to the dispatch list and blk_mq_dispatch_plug_list also success issue all requests, then on one will run queue, and the request will be stall in the dispatch list and cannot complete forever. It is wrong to use last request of plug list to decide if run queue is needed since all the remained requests in plug list may be from other hctxs. To fix the bug, pass run_queue as true always to blk_mq_request_bypass_insert(). Fix-suggested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Fixes: `dc5fc361d8` ("block: attempt direct issue of plug list") Link: https://lore.kernel.org/r/20220803023355.3687360-1-yuyufen@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-18 07:39:01 -06:00
Yu Kuai	a8239f0342	blk-mq: remove unused function blk_mq_queue_stopped() blk_mq_queue_stopped() doesn't have any caller, which was found by code coverage test, thus remove it. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220818063555.3741222-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-18 07:38:10 -06:00
Linus Torvalds	abe7a481aa	Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - NVMe pull request - print nvme connect Linux error codes properly (Amit Engel) - fix the fc_appid_store return value (Christoph Hellwig) - fix a typo in an error message (Christophe JAILLET) - add another non-unique identifier quirk (Dennis P. Kliem) - check if the queue is allocated before stopping it in nvme-tcp (Maurizio Lombardi) - restart admin queue if the caller needs to restart queue in nvme-fc (Ming Lei) - use kmemdup instead of kmalloc + memcpy in nvme-auth (Zhang Xiaoxu) - __alloc_disk_node() error handling fix (Rafael) * tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block: block: Do not call blk_put_queue() if gendisk allocation fails nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70 nvme-tcp: check if the queue is allocated before stopping it nvme-fabrics: Fix a typo in an error message nvme-fabrics: parse nvme connect Linux error codes nvmet-auth: use kmemdup instead of kmalloc + memcpy nvme-fc: fix the fc_appid_store return value nvme-fc: restart admin queue if the caller needs to restart queue	2022-08-13 13:37:36 -07:00
Rafael Mendonca	aa0c680c3a	block: Do not call blk_put_queue() if gendisk allocation fails Commit `6f8191fdf4` ("block: simplify disk shutdown") removed the call to blk_get_queue() during gendisk allocation but missed to remove the corresponding cleanup code blk_put_queue() for it. Thus, if the gendisk allocation fails, the request_queue refcount gets decremented and reaches 0, causing blk_mq_release() to be called with a hctx still alive. That triggers a WARNING report, as found by syzkaller: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 23016 at block/blk-mq.c:3881 blk_mq_release+0xf8/0x3e0 block/blk-mq.c:3881 [...] stripped RIP: 0010:blk_mq_release+0xf8/0x3e0 block/blk-mq.c:3881 [...] stripped Call Trace: <TASK> blk_release_queue+0x153/0x270 block/blk-sysfs.c:780 kobject_cleanup lib/kobject.c:673 [inline] kobject_release lib/kobject.c:704 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x1c8/0x540 lib/kobject.c:721 __alloc_disk_node+0x4f7/0x610 block/genhd.c:1388 __blk_mq_alloc_disk+0x13b/0x1f0 block/blk-mq.c:3961 loop_add+0x3e2/0xaf0 drivers/block/loop.c:1978 loop_control_ioctl+0x133/0x620 drivers/block/loop.c:2150 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] stripped Fixes: `6f8191fdf4` ("block: simplify disk shutdown") Reported-by: syzbot+31c9594f6e43b9289b25@syzkaller.appspotmail.com Suggested-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220811232338.254673-1-rafaelmendsr@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-12 06:42:06 -06:00
Al Viro	480cb846c2	block: convert to advancing variants of iov_iter_get_pages{,_alloc}() ... doing revert if we end up not using some pages Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:22 -04:00
Al Viro	fcb14cb1bd	new iov_iter flavour - ITER_UBUF Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:15 -04:00
Linus Torvalds	fa9db655d0	Merge tag 'for-5.20/block-2022-08-04' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: - NVMe pull requests via Christoph: - add support for In-Band authentication (Hannes Reinecke) - handle the persistent internal error AER (Michael Kelley) - use in-capsule data for TCP I/O queue connect (Caleb Sander) - remove timeout for getting RDMA-CM established event (Israel Rukshin) - misc cleanups (Joel Granados, Sagi Grimberg, Chaitanya Kulkarni, Guixin Liu, Xiang wangx) - use command_id instead of req->tag in trace_nvme_complete_rq() (Bean Huo) - various fixes for the new authentication code (Lukas Bulwahn, Dan Carpenter, Colin Ian King, Chaitanya Kulkarni, Hannes Reinecke) - small cleanups (Liu Song, Christoph Hellwig) - restore compat_ioctl support (Nick Bowler) - make a nvmet-tcp workqueue lockdep-safe (Sagi Grimberg) - enable generic interface (/dev/ngXnY) for unknown command sets (Joel Granados, Christoph Hellwig) - don't always build constants.o (Christoph Hellwig) - print the command name of aborted commands (Christoph Hellwig) - MD pull requests via Song: - Improve raid5 lock contention, by Logan Gunthorpe. - Misc fixes to raid5, by Logan Gunthorpe. - Fix race condition with md_reap_sync_thread(), by Guoqing Jiang. - Fix potential deadlock with raid5_quiesce and raid5_get_active_stripe, by Logan Gunthorpe. - Refactoring md_alloc(), by Christoph" - Fix md disk_name lifetime problems, by Christoph Hellwig - Convert prepare_to_wait() to wait_woken() api, by Logan Gunthorpe; - Fix sectors_to_do bitmap issue, by Logan Gunthorpe. - Work on unifying the null_blk module parameters and configfs API (Vincent) - drbd bitmap IO error fix (Lars) - Set of rnbd fixes (Guoqing, Md Haris) - Remove experimental marker on bcache async device registration (Coly) - Series from cleaning up the bio splitting (Christoph) - Removal of the sx8 block driver. This hardware never really widespread, and it didn't receive a lot of attention after the initial merge of it back in 2005 (Christoph) - A few fixes for s390 dasd (Eric, Jiang) - Followup set of fixes for ublk (Ming) - Support for UBLK_IO_NEED_GET_DATA for ublk (ZiyangZhang) - Fixes for the dio dma alignment (Keith) - Misc fixes and cleanups (Ming, Yu, Dan, Christophe * tag 'for-5.20/block-2022-08-04' of git://git.kernel.dk/linux-block: (136 commits) s390/dasd: Establish DMA alignment s390/dasd: drop unexpected word 'for' in comments ublk_drv: add support for UBLK_IO_NEED_GET_DATA ublk_cmd.h: add one new ublk command: UBLK_IO_NEED_GET_DATA ublk_drv: cleanup ublksrv_ctrl_dev_info ublk_drv: add SET_PARAMS/GET_PARAMS control command ublk_drv: fix ublk device leak in case that add_disk fails ublk_drv: cancel device even though disk isn't up block: fix leaking page ref on truncated direct io block: ensure bio_iov_add_page can't fail block: ensure iov_iter advances for added pages drivers:md:fix a potential use-after-free bug md/raid5: Ensure batch_last is released before sleeping for quiesce md/raid5: Move stripe_request_ctx up md/raid5: Drop unnecessary call to r5c_check_stripe_cache_usage() md/raid5: Make is_inactive_blocked() helper md/raid5: Refactor raid5_get_active_stripe() block: pass struct queue_limits to the bio splitting helpers block: move bio_allowed_max_sectors to blk-merge.c block: move the call to get_max_io_size out of blk_bio_segment_split ...	2022-08-04 20:00:14 -07:00

1 2 3 4 5 ...

6455 Commits