linux

Author	SHA1	Message	Date
Keith Busch	266b652c65	nvmet: implement endurance groups Most of the returned information is just stubbed data. The target must support these in order to report rotational media. Since this driver doesn't know any better, each namespace is its own endurance group with the engid value matching the nsid. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-11 09:49:49 -08:00
Keith Busch	81ee2f2811	nvmet: declare 2.1 version compliance The target driver implements all the mandatory logs, identifications, features, and properties up to nvme sepcification 2.1. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-11 09:49:49 -08:00
Keith Busch	1e058089d2	nvmet: implement crto property This property is required for nvme 2.1. The target only supports ready with media, so this is just the same value as CAP.TO. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-11 09:49:49 -08:00
Keith Busch	e973c91727	nvmet: implement supported features log This log is required for nvme 2.1. Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-11 09:49:49 -08:00
Keith Busch	83acb24e6d	nvmet: implement supported log pages This log is required for nvme 2.1. Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-11 09:49:48 -08:00
Keith Busch	61c9967cd6	nvmet: implement active command set ns list This is required for nvme 2.1 for targets that support multiple command sets. We support NVM and ZNS, so are required to support this identification. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-11 09:49:48 -08:00
Keith Busch	64a51080ea	nvmet: implement id ns for nvm command set We don't report anything here, but it's a mandatory identification for nvme 2.1. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-11 09:49:48 -08:00
Guixin Liu	5a47c2080a	nvmet: support reservation feature This patch implements the reservation feature, including: 1. reservation register(register, unregister and replace). 2. reservation acquire(acquire, preempt, preempt and abort). 3. reservation release(release and clear). 4. reservation report. 5. set feature and get feature of reservation notify mask. 6. get log page of reservation event. Not supported: 1. persistent reservation through power loss. Test cases: Use nvme-cli and fio to test all implemented sub features: 1. use nvme resv-register to register host a registrant or unregister or replace a new key. 2. use nvme resv-acquire to set host to the holder, and use fio to send read and write io in all reservation type. And also test preempt and "preempt and abort". 3. use nvme resv-report to show all registrants and reservation status. 4. use nvme resv-release to release all registrants. 5. use nvme get-log to get events generated by the preceding operations. In addition, make reservation configurable, one can set ns to support reservation before enable ns. The default of resv_enable is false. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-11 09:49:48 -08:00
Guixin Liu	1900e1a449	nvme: add reservation command's defines This is a preparation patch for NVMeOF target reservation commands implantation. Add the defines of reservation command, such as reservation log and sub operations. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Tested-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-07 11:48:46 -08:00
Chaitanya Kulkarni	43d5d3b417	nvme-core: remove repeated wq flags In nvme_core_init() nvme_wq, nvme_reset_wq, nvme_delete_wq share same flags :- WQ_UNBOUND \| WQ_MEM_RECLAIM \| WQ_SYSFS. Insated of repeating these flags in each call use the common variable. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-05 08:43:24 -08:00
Guixin Liu	c74649b6e4	nvmet: make nvmet_wq visible in sysfs In some complex scenarios, we deploy multiple tasks on a single machine (hybrid deployment), such as Docker containers for function computation (background processing), real-time tasks, monitoring, event handling, and management, along with an NVMe target server. Each of these components is restricted to its own CPU cores to prevent mutual interference and ensure strict isolation. To achieve this level of isolation for nvmet_wq we need to use sysfs tunables such as cpumask that are currently not accessible. Add WQ_SYSFS flag to alloc_workqueue() when creating nvmet_wq so workqueue tunables are exported in the userspace via sysfs. with this patch :- nvme (nvme-6.13) # ls /sys/devices/virtual/workqueue/nvmet-wq/ affinity_scope affinity_strict cpumask max_active nice per_cpu power subsystem uevent Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-05 08:36:18 -08:00
Christoph Hellwig	63a5c7a4b4	nvme-pci: use dma_alloc_noncontigous if possible Use dma_alloc_noncontigous to allocate a single IOVA-contigous segment when backed by an IOMMU. This allow to easily use bigger segments and avoids running into segment limits if we can avoid it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-05 07:54:34 -08:00
Christoph Hellwig	3c2fb1ca80	nvme-pci: fix freeing of the HMB descriptor table The HMB descriptor table is sized to the maximum number of descriptors that could be used for a given device, but __nvme_alloc_host_mem could break out of the loop earlier on memory allocation failure and end up using less descriptors than planned for, which leads to an incorrect size passed to dma_free_coherent. In practice this was not showing up because the number of descriptors tends to be low and the dma coherent allocator always allocates and frees at least a page. Fixes: `87ad72a59a` ("nvme-pci: implement host memory buffer support") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-11-05 07:54:34 -08:00
Pavel Begunkov	5e52f71f85	nvme: use helpers to access io_uring cmd space Command implementations shouldn't be directly looking into io_uring_cmd to carve free space. Use an io_uring helper, which will also do build time size sanitisation. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-10-22 07:28:36 -07:00
Jens Axboe	fdad1a20cd	Merge branch 'for-6.13/block-atomic' into for-6.13/block Merge in block/fs prep patches for the atomic write support. * for-6.13/block-atomic: block: Add bdev atomic write limits helpers fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() block/fs: Pass an iocb to generic_atomic_write_valid()	2024-10-22 08:21:51 -06:00
Li Lingfeng	919b5139bd	block: flush all throttled bios when deleting the cgroup When a process migrates to another cgroup and the original cgroup is deleted, the restrictions of throttled bios cannot be removed. If the restrictions are set too low, it will take a long time to complete these bios. Refer to the process of deleting a disk to remove the restrictions and issue bios when deleting the cgroup. This makes difference on the behavior of throttled bios: Before: the limit of the throttled bios can't be changed and the bios will complete under this limit; Now: the limit will be canceled and the throttled bios will be flushed immediately. References: [1] https://lore.kernel.org/r/20220318130144.1066064-4-ming.lei@redhat.com [2] https://lore.kernel.org/all/da861d63-58c6-3ca0-2535-9089993e9e28@huaweicloud.com/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240817071108.1919729-1-lilingfeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:43 -06:00
Muchun Song	96a9fe64bf	block: fix ordering between checking BLK_MQ_S_STOPPED request adding Supposing first scenario with a virtio_blk driver. CPU0 CPU1 blk_mq_try_issue_directly() __blk_mq_issue_directly() q->mq_ops->queue_rq() virtio_queue_rq() blk_mq_stop_hw_queue() virtblk_done() blk_mq_request_bypass_insert() 1) store blk_mq_start_stopped_hw_queue() clear_bit(BLK_MQ_S_STOPPED) 3) store blk_mq_run_hw_queue() if (!blk_mq_hctx_has_pending()) 4) load return blk_mq_sched_dispatch_requests() blk_mq_run_hw_queue() if (!blk_mq_hctx_has_pending()) return blk_mq_sched_dispatch_requests() if (blk_mq_hctx_stopped()) 2) load return __blk_mq_sched_dispatch_requests() Supposing another scenario. CPU0 CPU1 blk_mq_requeue_work() blk_mq_insert_request() 1) store virtblk_done() blk_mq_start_stopped_hw_queue() blk_mq_run_hw_queues() clear_bit(BLK_MQ_S_STOPPED) 3) store blk_mq_run_hw_queue() if (!blk_mq_hctx_has_pending()) 4) load return blk_mq_sched_dispatch_requests() if (blk_mq_hctx_stopped()) 2) load continue blk_mq_run_hw_queue() Both scenarios are similar, the full memory barrier should be inserted between 1) and 2), as well as between 3) and 4) to make sure that either CPU0 sees BLK_MQ_S_STOPPED is cleared or CPU1 sees dispatch list. Otherwise, either CPU will not rerun the hardware queue causing starvation of the request. The easy way to fix it is to add the essential full memory barrier into helper of blk_mq_hctx_stopped(). In order to not affect the fast path (hardware queue is not stopped most of the time), we only insert the barrier into the slow path. Actually, only slow path needs to care about missing of dispatching the request to the low-level device driver. Fixes: `320ae51fee` ("blk-mq: new multi-queue block IO queueing mechanism") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-4-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:40 -06:00
Muchun Song	6bda857bcb	block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding Supposing the following scenario. CPU0 CPU1 blk_mq_insert_request() 1) store blk_mq_unquiesce_queue() blk_queue_flag_clear() 3) store blk_mq_run_hw_queues() blk_mq_run_hw_queue() if (!blk_mq_hctx_has_pending()) 4) load return blk_mq_run_hw_queue() if (blk_queue_quiesced()) 2) load return blk_mq_sched_dispatch_requests() The full memory barrier should be inserted between 1) and 2), as well as between 3) and 4) to make sure that either CPU0 sees QUEUE_FLAG_QUIESCED is cleared or CPU1 sees dispatch list or setting of bitmap of software queue. Otherwise, either CPU will not rerun the hardware queue causing starvation. So the first solution is to 1) add a pair of memory barrier to fix the problem, another solution is to 2) use hctx->queue->queue_lock to synchronize QUEUE_FLAG_QUIESCED. Here, we chose 2) to fix it since memory barrier is not easy to be maintained. Fixes: `f4560ffe8c` ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-3-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:40 -06:00
Muchun Song	2003ee8a9a	block: fix missing dispatching request when queue is started or unquiesced Supposing the following scenario with a virtio_blk driver. CPU0 CPU1 CPU2 blk_mq_try_issue_directly() __blk_mq_issue_directly() q->mq_ops->queue_rq() virtio_queue_rq() blk_mq_stop_hw_queue() virtblk_done() blk_mq_try_issue_directly() if (blk_mq_hctx_stopped()) blk_mq_request_bypass_insert() blk_mq_run_hw_queue() blk_mq_run_hw_queue() blk_mq_run_hw_queue() blk_mq_insert_request() return After CPU0 has marked the queue as stopped, CPU1 will see the queue is stopped. But before CPU1 puts the request on the dispatch list, CPU2 receives the interrupt of completion of request, so it will run the hardware queue and marks the queue as non-stopped. Meanwhile, CPU1 also runs the same hardware queue. After both CPU1 and CPU2 complete blk_mq_run_hw_queue(), CPU1 just puts the request to the same hardware queue and returns. It misses dispatching a request. Fix it by running the hardware queue explicitly. And blk_mq_request_issue_directly() should handle a similar situation. Fix it as well. Fixes: `d964f04a8f` ("blk-mq: fix direct issue") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-2-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:40 -06:00
Xiuhong Wang	732312e183	Revert "blk-throttle: Fix IO hang for a corner case" This reverts commit `5b7048b897`. The main purpose of this patch is cleanup. The throtl_adjusted_limit function was removed after commit `bf20ab538c` ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW"), so the problem of not being able to scale after setting bps or iops to 1 will not occur. So revert this commit that bps/iops can be set to 1. Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20241016024508.3340330-1-xiuhong.wang@unisoc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:40 -06:00
Julia Lawall	28878733ca	block: replace call_rcu by kfree_rcu for simple kmem_cache_free callback Since SLOB was removed and since commit `6c6c47b063` ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), it is not necessary to use call_rcu when the callback only performs kmem_cache_free. Use kfree_rcu() directly. The changes were made using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20241013201704.49576-10-Julia.Lawall@inria.fr Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:40 -06:00
Greg Joyce	b21d948f4c	block: sed-opal: add ioctl IOC_OPAL_SET_SID_PW After a SED drive is provisioned, there is no way to change the SID password via the ioctl() interface. A new ioctl IOC_OPAL_SET_SID_PW will allow the password to be changed. The valid current password is required. Signed-off-by: Greg Joyce <gjoyce@linux.ibm.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/r/20240829175639.6478-2-gjoyce@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:40 -06:00
Uday Shankar	69f407ee8d	Documentation: ublk: document UBLK_F_USER_RECOVERY_FAIL_IO Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241007182419.3263186-6-ushankar@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:40 -06:00
Uday Shankar	59eaa01ce7	ublk: support device recovery without I/O queueing ublk currently supports the following behaviors on ublk server exit: A: outstanding I/Os get errors, subsequently issued I/Os get errors B: outstanding I/Os get errors, subsequently issued I/Os queue C: outstanding I/Os get reissued, subsequently issued I/Os queue and the following behaviors for recovery of preexisting block devices by a future incarnation of the ublk server: 1: ublk devices stopped on ublk server exit (no recovery possible) 2: ublk devices are recoverable using start/end_recovery commands The userspace interface allows selection of combinations of these behaviors using flags specified at device creation time, namely: default behavior: A + 1 UBLK_F_USER_RECOVERY: B + 2 UBLK_F_USER_RECOVERY\|UBLK_F_USER_RECOVERY_REISSUE: C + 2 The behavior A + 2 is currently unsupported. Add support for this behavior under the new flag combination UBLK_F_USER_RECOVERY\|UBLK_F_USER_RECOVERY_FAIL_IO. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241007182419.3263186-5-ushankar@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:37 -06:00
Uday Shankar	27b5d4170c	ublk: merge stop_work and quiesce_work Save some lines by merging stop_work and quiesce_work into nosrv_work, which looks at the recovery flags and does the right thing when the "no ublk server" condition is detected. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241007182419.3263186-4-ushankar@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:37 -06:00
Uday Shankar	3b939b8f71	ublk: refactor recovery configuration flag helpers ublk currently supports the following behaviors on ublk server exit: A: outstanding I/Os get errors, subsequently issued I/Os get errors B: outstanding I/Os get errors, subsequently issued I/Os queue C: outstanding I/Os get reissued, subsequently issued I/Os queue and the following behaviors for recovery of preexisting block devices by a future incarnation of the ublk server: 1: ublk devices stopped on ublk server exit (no recovery possible) 2: ublk devices are recoverable using start/end_recovery commands The userspace interface allows selection of combinations of these behaviors using flags specified at device creation time, namely: default behavior: A + 1 UBLK_F_USER_RECOVERY: B + 2 UBLK_F_USER_RECOVERY\|UBLK_F_USER_RECOVERY_REISSUE: C + 2 We can't easily change the userspace interface to allow independent selection of one of {A, B, C} and one of {1, 2}, but we can refactor the internal helpers which test for the flags. Replace the existing helpers with the following set: ublk_nosrv_should_reissue_outstanding: tests for behavior C ublk_nosrv_[dev_]should_queue_io: tests for behavior B ublk_nosrv_should_stop_dev: tests for behavior 1 Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241007182419.3263186-3-ushankar@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:37 -06:00
Uday Shankar	d00c0ea179	ublk: check recovery flags for validity Setting UBLK_F_USER_RECOVERY_REISSUE without also setting UBLK_F_USER_RECOVERY is currently silently equivalent to not setting any recovery flags at all, even though that's obviously not intended. Check for this case and fail add_dev (with a paranoid warning to aid debugging any program which might rely on the old behavior) with EINVAL if it is detected. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241007182419.3263186-2-ushankar@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:37 -06:00
Keith Busch	110234da18	block: enable passthrough command statistics Applications using the passthrough interfaces for IO want to continue seeing the disk stats. These requests had been fenced off from this block layer feature. While the block layer doesn't necessarily know what a passthrough command does, we do know the data size and direction, which is enough to account for the command's stats. Since tracking these has the potential to produce unexpected results, the passthrough stats are locked behind a new queue flag that needs to be enabled with the /sys/block/<dev>/queue/iostats_passthrough attribute. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241007153236.2818562-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:32 -06:00
Christoph Hellwig	d51c9cdfc2	block: return void from the queue_sysfs_entry load_module method Requesting a module either succeeds or does nothing, return an error from this method does not make sense. Also move the load_module after the store method in the struct declaration to keep the important show and store methods together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Link: https://lore.kernel.org/r/20241008050841.104602-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:16:22 -06:00
Konstantin Khlebnikov	758737d86f	block: add partition uuid into uevent as "PARTUUID" Both most common formats have uuid in addition to partition name: GPT: standard uuid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx DOS: 4 byte disk signature and 1 byte partition xxxxxxxx-xx Tools from util-linux use the same notation for them. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Kyle Fortin <kyle.fortin@oracle.com> [dianders: rebased to modern kernels] Signed-off-by: Douglas Anderson <dianders@google.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241004171340.v2.1.I938c91d10e454e841fdf5d64499a8ae8514dc004@changeid Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:15:17 -06:00
Jens Axboe	746fc7e9d4	block: move issue side time stamping to blk_account_io_start() It's known needed at that point, and it's cleaner to just assign it there rather than rely on it being reliably set before hitting the IO accounting. Hence, move it out of blk_mq_rq_time_init(), which is now only doing the allocation side timing. While at it, get rid of the '0' time passing to blk_mq_rq_time_init(), just pass in blk_time_get_ns() for the two cases where 0 is being explicitly passed in. The rest pass in the previously cached allocation time. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:15:17 -06:00
Jens Axboe	148e6968f6	block: set issue time stamp based on queue state A previous commit moved RQF_IO_STAT into blk_account_io_done(), where it's being set rather than at allocation time. Unfortunately we do check for that flag in blk_mq_rq_time_init(), and hence setting the start_time_ns wasn't being done. This lead to unwieldy inflight IO counts and times, as IO completion accounting would a 0 value rather than the issue time for it's subtraction math. Fix this by switching the blk_mq_rq_time_init() check to use the queue state rather than the request state. Fixes: b8f762400ae8 ("block: move iostat check into blk_acount_io_start()") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202410062110.512391df-oliver.sang@intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:15:17 -06:00
Christian Marangi	f7a4b3438c	dt-bindings: mmc: Document support for partition table in mmc-card Document support for defining a partition table in the mmc-card node. This is needed if the eMMC doesn't have a partition table written and the bootloader of the device load data by using absolute offset of the block device. This is common on embedded device that have eMMC installed to save space and have non removable block devices. If an OF partition table is detected, any partition table written in the eMMC will be ignored and won't be parsed. eMMC provide a generic disk for user data and if supported (JEDEC 4.4+) also provide two additional disk ("boot1" and "boot2") for special usage of boot operation where normally is stored the bootloader or boot info. New JEDEC version also supports up to 4 GP partition for other usage called "gp1", "gp2", "gp3", "gp4". Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20241002221306.4403-7-ansuelsmth@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:15:17 -06:00
Christian Marangi	2e3a191e89	block: add support for partition table defined in OF Add support for partition table defined in Device Tree. Similar to how it's done with MTD, add support for defining a fixed partition table in device tree. A common scenario for this is fixed block (eMMC) embedded devices that have no MBR or GPT partition table to save storage space. Bootloader access the block device with absolute address of data. This is to complete the functionality with an equivalent implementation with providing partition table with bootargs, for case where the booargs can't be modified and tweaking the Device Tree is the only solution to have an usabe partition table. The implementation follow the fixed-partitions parser used on MTD devices where a "partitions" node is expected to be declared with "fixed-partitions" compatible in the OF node of the disk device (mmc-card for eMMC for example) and each child node declare a label and a reg with offset and size. If label is not declared, the node name is used as fallback. Eventually is also possible to declare the read-only property to flag the partition as read-only. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241002221306.4403-6-ansuelsmth@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:14:56 -06:00
Christian Marangi	3ec7cb11bb	mmc: block: attach partitions fwnode if found in mmc-card Attach partitions fwnode if found in mmc-card and register disk with it. This permits block partition to reference the node and register a partition table defined in DT for the special case for embedded device that doesn't have a partition table flashed but have an hardcoded partition table passed from the system. JEDEC BOOT partition boot0/boot1 are supported but in DT we refer with the JEDEC name of boot1 and boot2 to better adhere to documentation. Also JEDEC GP partition gp0/1/2/3 are supported but in DT we refer with the JEDEC name of gp1/2/3/4 to better adhere to documentration. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20241002221306.4403-5-ansuelsmth@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:14:56 -06:00
Christian Marangi	9dfd9ea93a	block: introduce add_disk_fwnode() Introduce add_disk_fwnode() as a replacement of device_add_disk() that permits to pass and attach a fwnode to disk dev. This variant can be useful for eMMC that might have the partition table for the disk defined in DT. A parser can later make use of the attached fwnode to parse the related table and init the hardcoded partition for the disk. device_add_disk() is converted to a simple wrapper of add_disk_fwnode() with the fwnode entry set as NULL. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241002221306.4403-4-ansuelsmth@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:14:56 -06:00
Christian Marangi	592e4deeab	docs: block: Document support for read-only partition in cmdline part Document support for read-only partition in cmdline partition for block devices by appending "ro" after the (partition name). Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20241002221306.4403-3-ansuelsmth@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:14:56 -06:00
Christian Marangi	ba40f4c590	block: add support for defining read-only partitions Add support for defining read-only partitions and complete support for it in the cmdline partition parser as the additional "ro" after a partition is scanned but never actually applied. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241002221306.4403-2-ansuelsmth@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:14:56 -06:00
Jens Axboe	e3569ecae4	block: kill blk_do_io_stat() helper It's now just checking whether or not RQF_IO_STAT is set, so let's get rid of it and just open-code the specific flag that is being checked. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:14:56 -06:00
Jens Axboe	fd0a63bcda	block: remove 'req->part' check for stats accounting If RQF_IO_STAT is set, then accounting is enabled. There's no need to further gate this on req->part being set or not, RQF_IO_STAT should never be set if accounting is not being done for this request. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:14:56 -06:00
Jens Axboe	2c50ec98fc	block: remove redundant passthrough check in blk_mq_need_time_stamp() Simply checking the rq_flags is enough to determine if accounting is being done for this request. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:14:56 -06:00
Jens Axboe	8933805623	block: move iostat check into blk_acount_io_start() Rather than have blk_do_io_stat() check for both RQF_IO_STAT and whether the request is a passthrough requests every time, move both of those checks into blk_account_io_start(). Then blk_do_io_stat() can be reduced to just checking for RQF_IO_STAT. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-22 08:14:56 -06:00
Linus Torvalds	42f7652d3e	Linux 6.12-rc4 v6.12-rc4	2024-10-20 15:19:38 -07:00
Linus Torvalds	d7f513ae7b	Merge tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Pull bluetooth fixes from Luiz Augusto Von Dentz: - ISO: Fix multiple init when debugfs is disabled - Call iso_exit() on module unload - Remove debugfs directory on module init failure - btusb: Fix not being able to reconnect after suspend - btusb: Fix regression with fake CSR controllers 0a12:0001 - bnep: fix wild-memory-access in proto_unregister Note: normally the bluetooth fixes go through the networking tree, but this missed the weekly merge, and two of the commits fix regressions that have caused a fair amount of noise and have now hit stable too: https://lore.kernel.org/all/4e1977ca-6166-4891-965e-34a6f319035f@leemhuis.info/ So I'm pulling it directly just to expedite things and not miss yet another -rc release. This is not meant to become a new pattern. * tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001 Bluetooth: bnep: fix wild-memory-access in proto_unregister Bluetooth: btusb: Fix not being able to reconnect after suspend Bluetooth: Remove debugfs directory on module init failure Bluetooth: Call iso_exit() on module unload Bluetooth: ISO: Fix multiple init when debugfs is disabled	2024-10-20 14:08:17 -07:00
Linus Torvalds	dd4f50373e	Merge tag 'pinctrl-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Mostly error path fixes, but one pretty serious interrupt problem in the Ocelot driver as well: - Fix two error paths and a missing semicolon in the Intel driver - Add a missing ACPI ID for the Intel Panther Lake - Check return value of devm_kasprintf() in the Apple and STM32 drivers - Add a missing mutex_destroy() in the aw9523 driver - Fix a double free in cv1800_pctrl_dt_node_to_map() in the Sophgo driver - Fix a double free in ma35_pinctrl_dt_node_to_map_func() in the Nuvoton driver - Fix a bug in the Ocelot interrupt handler making the system hang" * tag 'pinctrl-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: ocelot: fix system hang on level based interrupts pinctrl: nuvoton: fix a double free in ma35_pinctrl_dt_node_to_map_func() pinctrl: sophgo: fix double free in cv1800_pctrl_dt_node_to_map() pinctrl: intel: platform: Add Panther Lake to the list of supported pinctrl: aw9523: add missing mutex_destroy pinctrl: stm32: check devm_kasprintf() returned value pinctrl: apple: check devm_kasprintf() returned value pinctrl: intel: platform: use semicolon instead of comma in ncommunities assignment pinctrl: intel: platform: fix error path in device_for_each_child_node()	2024-10-20 13:55:46 -07:00
Linus Torvalds	c55228220d	Merge tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull misc driver fixes from Greg KH: "Here are a number of small char/misc/iio driver fixes for 6.12-rc4: - loads of small iio driver fixes for reported problems - parport driver out-of-bounds fix - Kconfig description and MAINTAINERS file updates All of these, except for the Kconfig and MAINTAINERS file updates have been in linux-next all week. Those other two are just documentation changes and will have no runtime issues and were merged on Friday" * tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (39 commits) misc: rtsx: list supported models in Kconfig help MAINTAINERS: Remove some entries due to various compliance requirements. misc: microchip: pci1xxxx: add support for NVMEM_DEVID_AUTO for OTP device misc: microchip: pci1xxxx: add support for NVMEM_DEVID_AUTO for EEPROM device parport: Proper fix for array out-of-bounds access iio: frequency: admv4420: fix missing select REMAP_SPI in Kconfig iio: frequency: {admv4420,adrf6780}: format Kconfig entries iio: adc: ad4695: Add missing Kconfig select iio: adc: ti-ads8688: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig iio: hid-sensors: Fix an error handling path in _hid_sensor_set_report_latency() iioc: dac: ltc2664: Fix span variable usage in ltc2664_channel_config() iio: dac: stm32-dac-core: add missing select REGMAP_MMIO in Kconfig iio: dac: ltc1660: add missing select REGMAP_SPI in Kconfig iio: dac: ad5770r: add missing select REGMAP_SPI in Kconfig iio: amplifiers: ada4250: add missing select REGMAP_SPI in Kconfig iio: frequency: adf4377: add missing select REMAP_SPI in Kconfig iio: resolver: ad2s1210: add missing select (TRIGGERED_)BUFFER in Kconfig iio: resolver: ad2s1210 add missing select REGMAP in Kconfig iio: proximity: mb1232: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig iio: pressure: bm1390: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig ...	2024-10-20 13:10:44 -07:00
Linus Torvalds	c01ac4b944	Merge tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small tty and serial driver fixes for 6.12-rc4: - qcom-geni serial driver fixes, wow what a mess of a UART chip that thing is... - vt infoleak fix for odd font sizes - imx serial driver bugfix - yet-another n_gsm ldisc bugfix, slowly chipping down the issues in that piece of code All of these have been in linux-next for over a week with no reported issues" * tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: qcom-geni: rename suspend functions serial: qcom-geni: drop unused receive parameter serial: qcom-geni: drop flip buffer WARN() serial: qcom-geni: fix rx cancel dma status bit serial: qcom-geni: fix receiver enable serial: qcom-geni: fix dma rx cancellation serial: qcom-geni: fix shutdown race serial: qcom-geni: revert broken hibernation support serial: qcom-geni: fix polled console initialisation serial: imx: Update mctrl old_status on RTSD interrupt tty: n_gsm: Fix use-after-free in gsm_cleanup_mux vt: prevent kernel-infoleak in con_font_get()	2024-10-20 13:03:30 -07:00
Linus Torvalds	b68c189570	Merge tag 'usb-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some small USB driver fixes and new device ids for 6.12-rc4: - xhci driver fixes for a number of reported issues - new usb-serial driver ids - dwc3 driver fixes for reported problems. - usb gadget driver fixes for reported problems - typec driver fixes - MAINTAINER file updates All of these have been in linux-next this week with no reported issues" * tag 'usb-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: serial: option: add Telit FN920C04 MBIM compositions USB: serial: option: add support for Quectel EG916Q-GL xhci: dbc: honor usb transfer size boundaries. usb: xhci: Fix handling errors mid TD followed by other errors xhci: Mitigate failed set dequeue pointer commands xhci: Fix incorrect stream context type macro USB: gadget: dummy-hcd: Fix "task hung" problem usb: gadget: f_uac2: fix return value for UAC2_ATTRIBUTE_STRING store usb: dwc3: core: Fix system suspend on TI AM62 platforms xhci: tegra: fix checked USB2 port number usb: dwc3: Wait for EndXfer completion before restoring GUSB2PHYCFG usb: typec: qcom-pmic-typec: fix sink status being overwritten with RP_DEF usb: typec: altmode should keep reference to parent MAINTAINERS: usb: raw-gadget: add bug tracker link MAINTAINERS: Add an entry for the LJCA drivers	2024-10-20 12:57:53 -07:00
Linus Torvalds	db87114dcf	Merge tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Explicitly disable the TSC deadline timer when going idle to address some CPU errata in that area - Do not apply the Zenbleed fix on anything else except AMD Zen2 on the late microcode loading path - Clear CPU buffers later in the NMI exit path on 32-bit to avoid register clearing while they still contain sensitive data, for the RDFS mitigation - Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit path on 32-bit - Fix parsing issues of memory bandwidth specification in sysfs for resctrl's memory bandwidth allocation feature - Other small cleanups and improvements * tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Always explicitly disarm TSC-deadline timer x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load x86/bugs: Use code segment selector for VERW operand x86/entry_32: Clear CPU buffers after register restore in NMI return x86/entry_32: Do not clobber user EFLAGS.ZF x86/resctrl: Annotate get_mem_config() functions as __init x86/resctrl: Avoid overflow in MB settings in bw_validate() x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h	2024-10-20 12:04:32 -07:00
Linus Torvalds	949c9ef59b	Merge tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix a case for sifive-plic where an interrupt gets disabled and masked and remains masked when it gets reenabled later - Plug a small race in GIC-v4 where userspace can force an affinity change of a virtual CPU (vPE) in its unmapping path - Do not mix the two sets of ocelot irqchip's registers in the mask calculation of the main interrupt sticky register - Other smaller fixlets and cleanups * tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/renesas-rzg2l: Fix missing put_device irqchip/riscv-intc: Fix SMP=n boot with ACPI irqchip/sifive-plic: Unmask interrupt in plic_irq_enable() irqchip/gic-v4: Don't allow a VMOVP on a dying VPE irqchip/sifive-plic: Return error code on failure irqchip/riscv-imsic: Fix output text of base address irqchip/ocelot: Comment sticky register clearing code irqchip/ocelot: Fix trigger register address irqchip: Remove obsolete config ARM_GIC_V3_ITS_PCI	2024-10-20 11:44:07 -07:00

1 2 3 4 5 ...

1310060 Commits