linux

Author	SHA1	Message	Date
Kemeng Shi	12f7900c57	writeback: move wb_wakeup_delayed defination to fs-writeback.c The wb_wakeup_delayed is only used in fs-writeback.c. Move it to fs-writeback.c after the definition of wb_wakeup and make it static. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20240118203339.764093-1-shikemeng@huaweicloud.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:38 +01:00
Rich Felker	73fa7547c7	vfs: add RWF_NOAPPEND flag for pwritev2 The pwrite function, originally defined by POSIX (thus the "p"), is defined to ignore O_APPEND and write at the offset passed as its argument. However, historically Linux honored O_APPEND if set and ignored the offset. This cannot be changed due to stability policy, but is documented in the man page as a bug. Now that there's a pwritev2 syscall providing a superset of the pwrite functionality that has a flags argument, the conforming behavior can be offered to userspace via a new flag. Since pwritev2 checks flag validity (in kiocb_set_rw_flags) and reports unknown ones with EOPNOTSUPP, callers will not get wrong behavior on old kernels that don't support the new flag; the error is reported and the caller can decide how to handle it. Signed-off-by: Rich Felker <dalias@libc.org> Link: https://lore.kernel.org/r/20200831153207.GO3265@brightrain.aerifal.cx Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:37 +01:00
Hu.Yadi	9e3f1c5936	selftests/move_mount_set_group:Make tests build with old libc Replace SYS_<syscall> with __NR_<syscall>. Using the __NR_<syscall> notation, provided by UAPI, is useful to build tests on systems without the SYS_<syscall> definitions. Replace SYS_move_mount with __NR_move_mount Similar changes: commit `87129ef136` ("selftests/landlock: Make tests build with old libc") Acked-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Hu.Yadi <hu.yadi@h3c.com> Link: https://lore.kernel.org/r/20240111113229.10820-1-hu.yadi@h3c.com Reviewed-by: Berlin <berlin@h3c.com> Suggested-by: Jiao <jiaoxupo@h3c.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:37 +01:00
Hu Yadi	0f05ee4479	selftests/filesystems:fix build error in overlayfs One build issue comes up due to both mount.h headers being included in dev_in_maps.c: In file included from dev_in_maps.c:10: /usr/include/sys/mount.h:35:3: error: expected identifier before numeric constant 35 \| MS_RDONLY = 1, /* Mount read-only. */ \| ^~~~~~~~~ In file included from dev_in_maps.c:13: After removing one of them to resolve the conflict, another error comes up: dev_in_maps.c:170:6: error: implicit declaration of function ‘mount’ [-Werror=implicit-function-declaration] 170 \| if (mount(NULL, "/", NULL, MS_SLAVE \| MS_REC, NULL) == -1) { \| ^~~~~ cc1: all warnings being treated as errors Then add a sys_mount definition to solve it. After both changes, dev_in_maps.c builds correctly on my machine (gcc 10.2, glibc-2.32, kernel-5.10). Signed-off-by: Hu Yadi <hu.yadi@h3c.com> Link: https://lore.kernel.org/r/20240112074059.29673-1-hu.yadi@h3c.com Acked-by: Andrei Vagin <avagin@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:37 +01:00
Baolin Wang	8b3d838139	fs: improve dump_mapping() robustness We met a kernel crash issue when running stress-ng testing, and the system crashes when printing the dentry name in dump_mapping(). Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 pc : dentry_name+0xd8/0x224 lr : pointer+0x22c/0x370 sp : ffff800025f134c0 ...... Call trace: dentry_name+0xd8/0x224 pointer+0x22c/0x370 vsnprintf+0x1ec/0x730 vscnprintf+0x2c/0x60 vprintk_store+0x70/0x234 vprintk_emit+0xe0/0x24c vprintk_default+0x3c/0x44 vprintk_func+0x84/0x2d0 printk+0x64/0x88 __dump_page+0x52c/0x530 dump_page+0x14/0x20 set_migratetype_isolate+0x110/0x224 start_isolate_page_range+0xc4/0x20c offline_pages+0x124/0x474 memory_block_offline+0x44/0xf4 memory_subsys_offline+0x3c/0x70 device_offline+0xf0/0x120 ...... The root cause is that one thread is doing page migration, and we use the target page's ->mapping field to save the 'anon_vma' pointer between page unmap and page move; at this point the target page is locked and its refcount is 1. Meanwhile, another stress-ng thread is performing memory hotplug, attempting to offline the target page that is being migrated. It discovers that the refcount of this target page is 1, which prevents the offline operation, and so proceeds to dump the page. However, page_mapping() of the target page may return an incorrect file mapping and crash the system in dump_mapping(), since the target page->mapping only saves the 'anon_vma' pointer without setting the PAGE_MAPPING_ANON flag. The page migration issue has been fixed by commit `d1adb25df7` ("mm: migrate: fix getting incorrect page mapping during page migration"). In addition, Matthew suggested we should also improve dump_mapping()'s robustness to be resilient against the kernel crash [1]. After adding checks of the 'dentry.parent' and 'dentry.d_name.name' used by dentry_name(), I can see dump_mapping() will output the invalid dentry instead of crashing the system when this issue is reproduced again. 
[12211.189128] page:fffff7de047741c0 refcount:1 mapcount:0 mapping:ffff989117f55ea0 index:0x1 pfn:0x211dd07 [12211.189144] aops:0x0 ino:1 invalid dentry:74786574206e6870 [12211.189148] flags: 0x57ffffc0000001(locked\|node=1\|zone=2\|lastcpupid=0x1fffff) [12211.189150] page_type: 0xffffffff() [12211.189153] raw: 0057ffffc0000001 0000000000000000 dead000000000122 ffff989117f55ea0 [12211.189154] raw: 0000000000000001 0000000000000001 00000001ffffffff 0000000000000000 [12211.189155] page dumped because: unmovable page [1] https://lore.kernel.org/all/ZXxn%2F0oixJxxAnpF@casper.infradead.org/ Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/937ab1f87328516821d39be672b6bc18861d9d3e.1705391420.git.baolin.wang@linux.alibaba.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:37 +01:00
Kunwu Chan	de8a3207ae	buffer: Use KMEM_CACHE instead of kmem_cache_create() Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Link: https://lore.kernel.org/r/20240116091137.92375-1-chentao@kylinos.cn Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:36 +01:00
Wen Yang	6b6ec4ca4e	eventfd: add a BUILD_BUG_ON() to ensure consistency between EFD_SEMAPHORE and the uapi Introduce a BUILD_BUG_ON to check that EFD_SEMAPHORE is equal to its definition in the uapi file, just like EFD_CLOEXEC and EFD_NONBLOCK. Signed-off-by: Wen Yang <wenyang.linux@foxmail.com> Link: https://lore.kernel.org/r/tencent_0BAA2DEAF9208D49987457E6583F9BE79507@qq.com Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Cc: <linux-fsdevel@vger.kernel.org> Cc: <linux-kernel@vger.kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:36 +01:00
David Disseldorp	6c8ac6e24e	initramfs: remove duplicate built-in __initramfs_start unpacking If initrd_start cpio extraction fails, CONFIG_BLK_DEV_RAM triggers fallback to initrd.image handling via populate_initrd_image(). The populate_initrd_image() call follows successful extraction of any built-in cpio archive at __initramfs_start, but currently performs built-in archive extraction a second time. Prior to commit `b2a74d5f9d` ("initramfs: remove clean_rootfs"), the second built-in initramfs unpack call was used to repopulate entries removed by clean_rootfs(), but it's no longer necessary now the contents of the previous extraction are retained. Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20240111062240.9362-1-ddiss@suse.de Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:36 +01:00
Andreas Gruenbacher	73f65b8b03	fs: Wrong function name in comment This comment refers to function mark_buffer_inode_dirty(), but the function is actually called mark_buffer_dirty_inode(), so fix the comment. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Link: https://lore.kernel.org/r/20240108172040.178173-1-agruenba@redhat.com Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:36 +01:00
Jay	fe12cfc174	fs: fix a typo in attr.c The word "filesytem" should be "filesystem" Signed-off-by: Jay <merqqcury@gmail.com> Link: https://lore.kernel.org/r/20240109072927.29626-1-merqqcury@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-22 15:33:30 +01:00
Linus Torvalds	6613476e22	Linux 6.8-rc1 v6.8-rc1	2024-01-21 14:11:32 -08:00
Linus Torvalds	35a4474b5c	Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs Pull more bcachefs updates from Kent Overstreet: "Some fixes, Some refactoring, some minor features: - Assorted prep work for disk space accounting rewrite - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this makes our trigger context more explicit - A few fixes to avoid excessive transaction restarts on multithreaded workloads: fstests (in addition to ktest tests) are now checking slowpath counters, and that's shaking out a few bugs - Assorted tracepoint improvements - Starting to break up bcachefs_format.h and move on disk types so they're with the code they belong to; this will make room to start documenting the on disk format better. - A few minor fixes" * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits) bcachefs: Improve inode_to_text() bcachefs: logged_ops_format.h bcachefs: reflink_format.h bcachefs; extents_format.h bcachefs: ec_format.h bcachefs: subvolume_format.h bcachefs: snapshot_format.h bcachefs: alloc_background_format.h bcachefs: xattr_format.h bcachefs: dirent_format.h bcachefs: inode_format.h bcachefs; quota_format.h bcachefs: sb-counters_format.h bcachefs: counters.c -> sb-counters.c bcachefs: comment bch_subvolume bcachefs: bch_snapshot::btime bcachefs: add missing __GFP_NOWARN bcachefs: opts->compression can now also be applied in the background bcachefs: Prep work for variable size btree node buffers bcachefs: grab s_umount only if snapshotting ...	2024-01-21 14:01:12 -08:00
Linus Torvalds	4fbbed7872	Merge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for time and clocksources: - A fix for the idle and iowait time accounting vs CPU hotplug. The time is reset on CPU hotplug which makes the accumulated systemwide time jump backwards. - Assorted fixes and improvements for clocksource/event drivers" * tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug clocksource/drivers/ep93xx: Fix error handling during probe clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings clocksource/timer-riscv: Add riscv_clock_shutdown callback dt-bindings: timer: Add StarFive JH8100 clint dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs	2024-01-21 11:14:40 -08:00
Linus Torvalds	7b297a5cc9	Merge tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Aneesh Kumar: - Increase default stack size to 32KB for Book3S Thanks to Michael Ellerman. * tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Increase default stack size to 32KB	2024-01-21 11:04:29 -08:00
Kent Overstreet	249f441f83	bcachefs: Improve inode_to_text() Add line breaks - inode_to_text() is now much easier to read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:11 -05:00
Kent Overstreet	d826cc57c5	bcachefs: logged_ops_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:11 -05:00
Kent Overstreet	8d52ba60c4	bcachefs: reflink_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:11 -05:00
Kent Overstreet	b2fa1b633b	bcachefs; extents_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:11 -05:00
Kent Overstreet	0560eb9abf	bcachefs: ec_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:11 -05:00
Kent Overstreet	c6c4ff6507	bcachefs: subvolume_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:11 -05:00
Kent Overstreet	8fed323b14	bcachefs: snapshot_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	d455179fce	bcachefs: alloc_background_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	72e0801049	bcachefs: xattr_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	7ffc4daa5f	bcachefs: dirent_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	b36425da71	bcachefs: inode_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	82de6207fb	bcachefs; quota_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	43314801a4	bcachefs: sb-counters_format.h bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	3a58dfbc46	bcachefs: counters.c -> sb-counters.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	12207f49ef	bcachefs: comment bch_subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	d32088f2f2	bcachefs: bch_snapshot::btime Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	7be0208fc9	bcachefs: add missing __GFP_NOWARN Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	d7e77f53e9	bcachefs: opts->compression can now also be applied in the background The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	ec4edd7b9d	bcachefs: Prep work for variable size btree node buffers bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analogous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Su Yue	2acc59dd88	bcachefs: grab s_umount only if snapshotting When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. 
[ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] 
---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] * DEADLOCK * [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" 
about the lock coming from commit `42d237320e` ("bcachefs: Snapshot creation, deletion"). The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb(), which requires s_umount to be held, to write back all dirty inodes before doing the snapshot work. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Su Yue	369acf97d6	bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exit bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut. It should be freed by kvfree, not kfree. Otherwise umount will trigger: [ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008 [ 406.830676 ] #PF: supervisor read access in kernel mode [ 406.831643 ] #PF: error_code(0x0000) - not-present page [ 406.832487 ] PGD 0 P4D 0 [ 406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI [ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90 [ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 406.835796 ] RIP: 0010:kfree+0x62/0x140 [ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6 [ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286 [ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4 [ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000 [ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001 [ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80 [ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000 [ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000 [ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0 [ 406.841464 ] Call Trace: [ 406.841583 ] <TASK> [ 406.841682 ] ? __die+0x1f/0x70 [ 406.841828 ] ? page_fault_oops+0x159/0x470 [ 406.842014 ] ? fixup_exception+0x22/0x310 [ 406.842198 ] ? exc_page_fault+0x1ed/0x200 [ 406.842382 ] ? asm_exc_page_fault+0x22/0x30 [ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs] [ 406.842842 ] ? 
kfree+0x62/0x140 [ 406.842988 ] ? kfree+0x104/0x140 [ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs] [ 406.843390 ] kobject_put+0xb7/0x170 [ 406.843552 ] deactivate_locked_super+0x2f/0xa0 [ 406.843756 ] cleanup_mnt+0xba/0x150 [ 406.843917 ] task_work_run+0x59/0xa0 [ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0 [ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40 [ 406.844510 ] do_syscall_64+0x4e/0xf0 [ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 406.844907 ] RIP: 0033:0x7f0a2664e4fb Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	00fff4dd58	bcachefs: bios must be 512 byte algined Fixes: `023f9ac9f7` bcachefs: Delete dio read alignment check Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Colin Ian King	aead3428e8	bcachefs: remove redundant variable tmp The variable tmp is being assigned a value but it isn't being read afterwards. The assignment is redundant and so tmp can be removed. Cleans up clang scan build warning: warning: Although the value stored to 'ret' is used in the enclosing expression, the value is never actually read from 'ret' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	b97de45365	bcachefs: Improve trace_trans_restart_relock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	46bf2e9cc7	bcachefs: Fix excess transaction restarts in __bchfs_fallocate() drop_locks_do() should not be used in a fastpath without first trying the do in nonblocking mode - the unlock and relock will cause excessive transaction restarts and potentially livelocking with other threads that are contending for the same locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	1a5039041b	bcachefs: extents_to_bp_state Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	ba96d36ca5	bcachefs: bkey_and_val_eq() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	e6a2566f7a	bcachefs: Better journal tracepoints Factor out bch2_journal_bufs_to_text(), and use it in the journal_entry_full() tracepoint; when we can't get a journal reservation we need to know the outstanding journal entry sizes to know if the problem is due to excessive flushing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	4ae016607b	bcachefs: Print size of superblock with space allocated Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	a6548c8b5e	bcachefs: Avoid flushing the journal in the discard path When issuing discards, we may need to flush the journal if there are too many buckets that can't be discarded until a journal flush. But the heuristic was bad; we should be comparing the number of buckets that need flushes against the number of free buckets, not the number of buckets we saw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	189c176c5d	bcachefs: Improve move_extent tracepoint Also print out the data_opts, so that we can see what specifically is being done to an extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	ef740a1e29	bcachefs: Add missing bch2_moving_ctxt_flush_all() This fixes a bug with rebalance IOs getting stuck with reads completed, but writes never being issued. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	fa3185af43	bcachefs: Re-add move_extent_write tracepoint It appears this was accidentally deleted at some point - also, do a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	d92b83f592	bcachefs: bch2_kthread_io_clock_wait() no longer sleeps until full amount Drop the loop in bch2_kthread_io_clock_wait(): this allows the code that uses it to be woken up for other reasons, and fixes a bug where rebalance wouldn't wake up when a scan was requested. This raises the possibility of spurious wakeups, but callers should always be able to handle that reasonably well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	741c1d3ec1	bcachefs: Add .val_to_text() for KEY_TYPE_cookie Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	0124f42da7	bcachefs: Don't pass memcmp() as a pointer Some (buggy!) compilers have issues with this. Fixes: https://github.com/koverstreet/bcachefs/issues/625 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:04 -05:00

1248711 Commits