Commit Graph

3005 Commits

Author SHA1 Message Date
Linus Torvalds
1bc191051d Merge tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:

 - New user_events interface. User space can register an event with the
   kernel describing the format of the event. Then it will receive a
   byte in a page mapping that it can check against. A privileged task
   can then enable that event like any other event, which will change
   the mapped byte to true, telling the user space application to start
   writing the event to the tracing buffer.

 - Add new "ftrace_boot_snapshot" kernel command line parameter. When
   set, the tracing buffer will be saved in the snapshot buffer at boot
   up when the kernel hands things over to user space. This will keep
   the traces that happened at boot up available even if user space boot
   up has tracing as well.

 - Have TRACE_EVENT_ENUM() also update trace event field type
   descriptions. Thus if a static array defines its size with an enum,
   the user space trace event parsers can still know how to parse that
   array.

 - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
   TRACE_EVENT() macro, but will attach to an existing tracepoint. This
   will make one tracepoint be able to trace different content and not
   be stuck at only what the original TRACE_EVENT() macro exports.

 - Fixes to tracing error logging.

 - Better saving of cmdlines to PIDs when tracing (use the wakeup events
   for mapping).

* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
  tracing: Have type enum modifications copy the strings
  user_events: Add trace event call as root for low permission cases
  tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
  tracing: Add snapshot at end of kernel boot up
  tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
  tracing: Fix strncpy warning in trace_events_synth.c
  user_events: Prevent dyn_event delete racing with ioctl add/delete
  tracing: Add TRACE_CUSTOM_EVENT() macro
  tracing: Move the defines to create TRACE_EVENTS into their own files
  tracing: Add sample code for custom trace events
  tracing: Allow custom events to be added to the tracefs directory
  tracing: Fix last_cmd_set() string management in histogram code
  user_events: Fix potential uninitialized pointer while parsing field
  tracing: Fix allocation of last_cmd in last_cmd_set()
  user_events: Add documentation file
  user_events: Add sample code for typical usage
  user_events: Add self-test for validator boundaries
  user_events: Add self-test for perf_event integration
  user_events: Add self-test for dynamic_events integration
  user_events: Add self-test for ftrace integration
  ...
2022-03-23 11:40:25 -07:00
Linus Torvalds
6b1f86f8e9 Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache
Pull filesystem folio updates from Matthew Wilcox:
 "Primarily this series converts some of the address_space operations to
  take a folio instead of a page.

  Notably:

   - a_ops->is_partially_uptodate() takes a folio instead of a page and
     changes the type of the 'from' and 'count' arguments to make it
     obvious they're bytes.

   - a_ops->invalidatepage() becomes ->invalidate_folio() and has a
     similar type change.

   - a_ops->launder_page() becomes ->launder_folio()

   - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the
     address_space as an argument.

  There are a couple of other misc changes up front that weren't worth
  separating into their own pull request"

* tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits)
  fs: Remove aops ->set_page_dirty
  fb_defio: Use noop_dirty_folio()
  fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
  fs: Convert __set_page_dirty_buffers to block_dirty_folio
  nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio()
  mm: Convert swap_set_page_dirty() to swap_dirty_folio()
  ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio
  f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio
  f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio
  f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio
  afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio()
  btrfs: Convert extent_range_redirty_for_io() to use folios
  fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio
  btrfs: Convert from set_page_dirty to dirty_folio
  fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio()
  fs: Add aops->dirty_folio
  fs: Remove aops->launder_page
  orangefs: Convert launder_page to launder_folio
  nfs: Convert from launder_page to launder_folio
  fuse: Convert from launder_page to launder_folio
  ...
2022-03-22 18:26:56 -07:00
Linus Torvalds
9030fb0bb9 Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:

 - Rewrite how munlock works to massively reduce the contention on
   i_mmap_rwsem (Hugh Dickins):

     https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/

 - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
   Hellwig):

     https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/

 - Convert GUP to use folios and make pincount available for order-1
   pages. (Matthew Wilcox)

 - Convert a few more truncation functions to use folios (Matthew
   Wilcox)

 - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
   Wilcox)

 - Convert rmap_walk to use folios (Matthew Wilcox)

 - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)

 - Add support for creating large folios in readahead (Matthew Wilcox)

* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
  mm/damon: minor cleanup for damon_pa_young
  selftests/vm/transhuge-stress: Support file-backed PMD folios
  mm/filemap: Support VM_HUGEPAGE for file mappings
  mm/readahead: Switch to page_cache_ra_order
  mm/readahead: Align file mappings for non-DAX
  mm/readahead: Add large folio readahead
  mm: Support arbitrary THP sizes
  mm: Make large folios depend on THP
  mm: Fix READ_ONLY_THP warning
  mm/filemap: Allow large folios to be added to the page cache
  mm: Turn can_split_huge_page() into can_split_folio()
  mm/vmscan: Convert pageout() to take a folio
  mm/vmscan: Turn page_check_references() into folio_check_references()
  mm/vmscan: Account large folios correctly
  mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
  mm/vmscan: Free non-shmem folios without splitting them
  mm/rmap: Constify the rmap_walk_control argument
  mm/rmap: Convert rmap_walk() to take a folio
  mm: Turn page_anon_vma() into folio_anon_vma()
  mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
  ...
2022-03-22 17:03:12 -07:00
Linus Torvalds
3bf03b9a08 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs

 - Most the MM patches which precede the patches in Willy's tree: kasan,
   pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
   sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
   userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
   cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
   zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits)
  mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
  Docs/ABI/testing: add DAMON sysfs interface ABI document
  Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
  selftests/damon: add a test for DAMON sysfs interface
  mm/damon/sysfs: support DAMOS stats
  mm/damon/sysfs: support DAMOS watermarks
  mm/damon/sysfs: support schemes prioritization
  mm/damon/sysfs: support DAMOS quotas
  mm/damon/sysfs: support DAMON-based Operation Schemes
  mm/damon/sysfs: support the physical address space monitoring
  mm/damon/sysfs: link DAMON for virtual address spaces monitoring
  mm/damon: implement a minimal stub for sysfs-based DAMON interface
  mm/damon/core: add number of each enum type values
  mm/damon/core: allow non-exclusive DAMON start/stop
  Docs/damon: update outdated term 'regions update interval'
  Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
  Docs/vm/damon: call low level monitoring primitives the operations
  mm/damon: remove unnecessary CONFIG_DAMON option
  mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
  mm/damon/dbgfs-test: fix is_target_id() change
  ...
2022-03-22 16:11:53 -07:00
Baolin Wang
abd4349ff9 mm: compaction: cleanup the compaction trace events
As Steven suggested [1], we should access the pointers from the trace
event to avoid dereferencing them to the tracepoint function when the
tracepoint is disabled.

[1] https://lkml.org/lkml/2021/11/3/409

Link: https://lkml.kernel.org/r/4cd393b4d57f8f01ed72c001509b28e3a3b1a8c1.1646985115.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:09 -07:00
NeilBrown
a88f2096d5 remove congestion tracking framework
This framework is no longer used - so discard it.

Link: https://lkml.kernel.org/r/164549983747.9187.6171768583526866601.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:01 -07:00
Linus Torvalds
3fe2f7446f Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Cleanups for SCHED_DEADLINE

 - Tracing updates/fixes

 - CPU Accounting fixes

 - First wave of changes to optimize the overhead of the scheduler
   build, from the fast-headers tree - including placeholder *_api.h
   headers for later header split-ups.

 - Preempt-dynamic using static_branch() for ARM64

 - Isolation housekeeping mask rework; preperatory for further changes

 - NUMA-balancing: deal with CPU-less nodes

 - NUMA-balancing: tune systems that have multiple LLC cache domains per
   node (eg. AMD)

 - Updates to RSEQ UAPI in preparation for glibc usage

 - Lots of RSEQ/selftests, for same

 - Add Suren as PSI co-maintainer

* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
  sched/headers: ARM needs asm/paravirt_api_clock.h too
  sched/numa: Fix boot crash on arm64 systems
  headers/prep: Fix header to build standalone: <linux/psi.h>
  sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
  cgroup: Fix suspicious rcu_dereference_check() usage warning
  sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
  sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
  sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
  sched/deadline,rt: Remove unused functions for !CONFIG_SMP
  sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
  sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
  sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
  sched/deadline: Remove unused def_dl_bandwidth
  sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
  sched/tracing: Don't re-read p->state when emitting sched_switch event
  sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
  sched/cpuacct: Remove redundant RCU read lock
  sched/cpuacct: Optimize away RCU read lock
  sched/cpuacct: Fix charge percpu cpuusage
  sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
  ...
2022-03-22 14:39:12 -07:00
Linus Torvalds
5191290407 Merge tag 'for-5.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "This contains feature updates, performance improvements, preparatory
  and core work and some related VFS updates:

  Features:

   - encoded read/write ioctls, allows user space to read or write raw
     data directly to extents (now compressed, encrypted in the future),
     will be used by send/receive v2 where it saves processing time

   - zoned mode now works with metadata DUP (the mkfs.btrfs default)

   - error message header updates:
      - print error state: transaction abort, other error, log tree
        errors
      - print transient filesystem state: remount, device replace,
        ignored checksum verifications

   - tree-checker: verify the transaction id of the to-be-written dirty
     extent buffer

  Performance improvements for fsync:

   - directory logging speedups (up to -90% run time)

   - avoid logging all directory changes during renames (up to -60% run
     time)

   - avoid inode logging during rename and link when possible (up to
     -60% run time)

   - prepare extents to be logged before locking a log tree path
     (throughput +7%)

   - stop copying old file extents when doing a full fsync()

   - improved logging of old extents after truncate

  Core, fixes:

   - improved stale device identification by dev_t and not just path
     (for devices that are behind other layers like device mapper)

   - continued extent tree v2 preparatory work
      - disable features that won't work yet
      - add wrappers and abstractions for new tree roots

   - improved error handling

   - add super block write annotations around background block group
     reclaim

   - fix device scanning messages potentially accessing stale pointer

   - cleanups and refactoring

  VFS:

   - allow reflinks/deduplication from two different mounts of the same
     filesystem

   - export and add helpers for read/write range verification, for the
     encoded ioctls"

* tag 'for-5.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (98 commits)
  btrfs: zoned: put block group after final usage
  btrfs: don't access possibly stale fs_info data in device_list_add
  btrfs: add lockdep_assert_held to need_preemptive_reclaim
  btrfs: verify the tranisd of the to-be-written dirty extent buffer
  btrfs: unify the error handling of btrfs_read_buffer()
  btrfs: unify the error handling pattern for read_tree_block()
  btrfs: factor out do_free_extent_accounting helper
  btrfs: remove last_ref from the extent freeing code
  btrfs: add a alloc_reserved_extent helper
  btrfs: remove BUG_ON(ret) in alloc_reserved_tree_block
  btrfs: add and use helper for unlinking inode during log replay
  btrfs: extend locking to all space_info members accesses
  btrfs: zoned: mark relocation as writing
  fs: allow cross-vfsmount reflink/dedupe
  btrfs: remove the cross file system checks from remap
  btrfs: pass btrfs_fs_info to btrfs_recover_relocation
  btrfs: pass btrfs_fs_info for deleting snapshots and cleaner
  btrfs: add filesystems state details to error messages
  btrfs: deal with unexpected extent type during reflinking
  btrfs: fix unexpected error path when reflinking an inline extent
  ...
2022-03-22 10:51:40 -07:00
Linus Torvalds
9b03992f0c Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Fix some bugs in converting ext4 to use the new mount API, as well as
  more bug fixes and clean ups in the ext4 fast_commit feature (most
  notably, in the tracepoints).

  In the jbd2 layer, the t_handle_lock spinlock has been removed, with
  the last place where it was actually needed replaced with an atomic
  cmpxchg"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (35 commits)
  ext4: fix kernel doc warnings
  ext4: fix remaining two trace events to use same printk convention
  ext4: add commit tid info in ext4_fc_commit_start/stop trace events
  ext4: add commit_tid info in jbd debug log
  ext4: add transaction tid info in fc_track events
  ext4: add new trace event in ext4_fc_cleanup
  ext4: return early for non-eligible fast_commit track events
  ext4: do not call FC trace event in ext4_fc_commit() if FS does not support FC
  ext4: convert ext4_fc_track_dentry type events to use event class
  ext4: fix ext4_fc_stats trace point
  ext4: remove unused enum EXT4_FC_COMMIT_FAILED
  ext4: warn when dirtying page w/o buffers in data=journal mode
  doc: fixed a typo in ext4 documentation
  ext4: make mb_optimize_scan performance mount option work with extents
  ext4: make mb_optimize_scan option work with set/unset mount cmd
  ext4: don't BUG if someone dirty pages without asking ext4 first
  ext4: remove redundant assignment to variable split_flag1
  ext4: fix underflow in ext4_max_bitmap_size()
  ext4: fix ext4_mb_clear_bb() kernel-doc comment
  ext4: fix fs corruption when tring to remove a non-empty directory with IO error
  ...
2022-03-22 10:36:55 -07:00
Linus Torvalds
14705fda8f Merge tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
 "New features:

   - NFSv3 support in NFSD is now always built

   - Added NFSD support for the NFSv4 birth-time file attribute

   - Added support for storing and displaying sockaddrs in trace points

   - NFSD now recognizes RPC_AUTH_TLS probes

  Performance improvements:

   - Optimized the svc transport enqueuing mechanism

   - Added micro-optimizations for the duplicate reply cache

  Notable bug fixes:

   - Allocation of the NFSD file cache hash table is more reliable"

* tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (30 commits)
  nfsd: fix using the correct variable for sizeof()
  nfsd: use correct format characters
  NFSD: prevent integer overflow on 32 bit systems
  NFSD: prevent underflow in nfssvc_decode_writeargs()
  fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
  NFSD: Fix nfsd_breaker_owns_lease() return values
  NFSD: Clean up _lm_ operation names
  arch: Remove references to CONFIG_NFSD_V3 in the default configs
  NFSD: Remove CONFIG_NFSD_V3
  nfsd: more robust allocation failure handling in nfsd_file_cache_init
  SUNRPC: Teach server to recognize RPC_AUTH_TLS
  NFSD: Move svc_serv_ops::svo_function into struct svc_serv
  NFSD: Remove svc_serv_ops::svo_module
  SUNRPC: Remove svc_shutdown_net()
  SUNRPC: Rename svc_close_xprt()
  SUNRPC: Rename svc_create_xprt()
  SUNRPC: Remove svo_shutdown method
  SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
  SUNRPC: Remove the .svo_enqueue_xprt method
  SUNRPC: Record endpoint information in trace log
  ...
2022-03-22 10:29:51 -07:00
Linus Torvalds
616355cc81 Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:

 - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)

 - blk-rq-qos completion fix (Tejun)

 - blk-cgroup merge fix (Tejun)

 - Add offline error return value to distinguish it from an IO error on
   the device (Song)

 - IO stats fixes (Zhang, Christoph)

 - blkcg refcount fixes (Ming, Yu)

 - Fix for indefinite dispatch loop softlockup (Shin'ichiro)

 - blk-mq hardware queue management improvements (Ming)

 - sbitmap dead code removal (Ming, John)

 - Plugging merge improvements (me)

 - Show blk-crypto capabilities in sysfs (Eric)

 - Multiple delayed queue run improvement (David)

 - Block throttling fixes (Ming)

 - Start deprecating auto module loading based on dev_t (Christoph)

 - bio allocation improvements (Christoph, Chaitanya)

 - Get rid of bio_devname (Christoph)

 - bio clone improvements (Christoph)

 - Block plugging improvements (Christoph)

 - Get rid of genhd.h header (Christoph)

 - Ensure drivers use appropriate flush helpers (Christoph)

 - Refcounting improvements (Christoph)

 - Queue initialization and teardown improvements (Ming, Christoph)

 - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng,
   Lukas, Nian, Yang, Eric, Chengming)

* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
  block: cancel all throttled bios in del_gendisk()
  block: let blkcg_gq grab request queue's refcnt
  block: avoid use-after-free on throttle data
  block: limit request dispatch loop duration
  block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
  sr: simplify the local variable initialization in sr_block_open()
  block: don't merge across cgroup boundaries if blkcg is enabled
  block: fix rq-qos breakage from skipping rq_qos_done_bio()
  block: flush plug based on hardware and software queue order
  block: ensure plug merging checks the correct queue at least once
  block: move rq_qos_exit() into disk_release()
  block: do more work in elevator_exit
  block: move blk_exit_queue into disk_release
  block: move q_usage_counter release into blk_queue_release
  block: don't remove hctx debugfs dir from blk_mq_exit_queue
  block: move blkcg initialization/destroy into disk allocation/release handler
  sr: implement ->free_disk to simplify refcounting
  sd: implement ->free_disk to simplify refcounting
  sd: delay calling free_opal_dev
  sd: call sd_zbc_release_disk before releasing the scsi_device reference
  ...
2022-03-21 16:48:55 -07:00
Linus Torvalds
af472a9efd Merge tag 'for-5.18/io_uring-2022-03-18' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:

 - Fixes for current file position. Still doesn't have the f_pos_lock
   sorted, but it's a step in the right direction (Dylan)

 - Tracing updates (Dylan, Stefan)

 - Improvements to io-wq locking (Hao)

 - Improvements for provided buffers (me, Pavel)

 - Support for registered file descriptors (me, Xiaoguang)

 - Support for ring messages (me)

 - Poll improvements (me)

 - Fix for fixed buffers and non-iterator reads/writes (me)

 - Support for NAPI on sockets (Olivier)

 - Ring quiesce improvements (Usama)

 - Misc fixes (Olivier, Pavel)

* tag 'for-5.18/io_uring-2022-03-18' of git://git.kernel.dk/linux-block: (42 commits)
  io_uring: terminate manual loop iterator loop correctly for non-vecs
  io_uring: don't check unrelated req->open.how in accept request
  io_uring: manage provided buffers strictly ordered
  io_uring: fold evfd signalling under a slower path
  io_uring: thin down io_commit_cqring()
  io_uring: shuffle io_eventfd_signal() bits around
  io_uring: remove extra barrier for non-sqpoll iopoll
  io_uring: fix provided buffer return on failure for kiocb_done()
  io_uring: extend provided buf return to fails
  io_uring: refactor timeout cancellation cqe posting
  io_uring: normilise naming for fill_cqe*
  io_uring: cache poll/double-poll state with a request flag
  io_uring: cache req->apoll->events in req->cflags
  io_uring: move req->poll_refs into previous struct hole
  io_uring: make tracing format consistent
  io_uring: recycle apoll_poll entries
  io_uring: remove duplicated member check for io_msg_ring_prep()
  io_uring: allow submissions to continue on error
  io_uring: recycle provided buffers if request goes async
  io_uring: ensure reads re-import for selected buffers
  ...
2022-03-21 16:24:45 -07:00
Linus Torvalds
5628b8de12 Merge tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
 "There have been a few important changes to the RNG's crypto, but the
  intent for 5.18 has been to shore up the existing design as much as
  possible with modern cryptographic functions and proven constructions,
  rather than actually changing up anything fundamental to the RNG's
  design.

  So it's still the same old RNG at its core as before: it still counts
  entropy bits, and collects from the various sources with the same
  heuristics as before, and so forth. However, the cryptographic
  algorithms that transform that entropic data into safe random numbers
  have been modernized.

  Just as important, if not more, is that the code has been cleaned up
  and re-documented. As one of the first drivers in Linux, going back to
  1.3.30, its general style and organization was showing its age and
  becoming both a maintenance burden and an auditability impediment.

  Hopefully this provides a more solid foundation to build on for the
  future. I encourage you to open up the file in full, and maybe you'll
  remark, "oh, that's what it's doing," and enjoy reading it. That, at
  least, is the eventual goal, which this pull begins working toward.

  Here's a summary of the various patches in this pull:

   - /dev/urandom and /dev/random now do the same thing, per the patch
     we discussed on the list. I think this is worth trying out. If it
     does appear problematic, I've made sure to keep it standalone and
     revertible without any conflicts.

   - Fixes and cleanups for numerous integer type problems, locking
     issues, and general code quality concerns.

   - The input pool's LFSR has been replaced with a cryptographically
     secure hash function, which has security and performance benefits
     alike, and consequently allows us to count entropy bits linearly.

   - The pre-init injection now uses a real hash function too, instead
     of an LFSR or vanilla xor.

   - The interrupt handler's fast_mix() function now uses one round of
     SipHash, rather than the fake crypto that was there before.

   - All additions of RDRAND and RDSEED now go through the input pool's
     hash function, in part to mitigate ridiculous hypothetical CPU
     backdoors, but more so to have a consistent interface for ingesting
     entropy that's easy to analyze, making everything happen one way,
     instead of a potpourri of different ways.

   - The crng now works on per-cpu data, while also being in accordance
     with the actual "fast key erasure RNG" design. This allows us to
     fix several boot-time race complications associated with the prior
     dynamically allocated model, eliminates much locking, and makes our
     backtrack protection more robust.

   - Batched entropy now erases doled out values so that it's backtrack
     resistant.

   - Working closely with Sebastian, the interrupt handler no longer
     needs to take any locks at all, as we punt the
     synchronized/expensive operations to a workqueue. This is
     especially nice for PREEMPT_RT, where taking spinlocks in irq
     context is problematic. It also makes the handler faster for the
     rest of us.

   - Also working with Sebastian, we now do the right thing on CPU
     hotplug, so that we don't use stale entropy or fail to accumulate
     new entropy when CPUs come back online.

   - We handle virtual machines that fork / clone / snapshot, using the
     "vmgenid" ACPI specification for retrieving a unique new RNG seed,
     which we can use to also make WireGuard (and in the future, other
     things) safe across VM forks.

   - Around boot time, we now try to reseed more often if enough entropy
     is available, before settling on the usual 5 minute schedule.

   - Last, but certainly not least, the documentation in the file has
     been updated considerably"

* tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (60 commits)
  random: check for signal and try earlier when generating entropy
  random: reseed more often immediately after booting
  random: make consistent usage of crng_ready()
  random: use SipHash as interrupt entropy accumulator
  wireguard: device: clear keys on VM fork
  random: provide notifier for VM fork
  random: replace custom notifier chain with standard one
  random: do not export add_vmfork_randomness() unless needed
  virt: vmgenid: notify RNG of VM fork and supply generation ID
  ACPI: allow longer device IDs
  random: add mechanism for VM forks to reinitialize crng
  random: don't let 644 read-only sysctls be written to
  random: give sysctl_random_min_urandom_seed a more sensible value
  random: block in /dev/urandom
  random: do crng pre-init loading in worker rather than irq
  random: unify cycles_t and jiffies usage and types
  random: cleanup UUID handling
  random: only wake up writers after zap if threshold was passed
  random: round-robin registers as ulong, not u32
  random: clear fast pool, crng, and batches in cpuhp bring up
  ...
2022-03-21 14:55:32 -07:00
Linus Torvalds
35dc0352bb Merge tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:

 - Fix idle detection (Neeraj Upadhyay) and missing access marking
   detected by KCSAN.

 - Reduce coupling between rcu_barrier() and CPU-hotplug operations, so
   that rcu_barrier() no longer needs to do cpus_read_lock(). This may
   also someday allow system boot to bring CPUs online concurrently.

 - Enable more aggressive movement to per-CPU queueing when reacting to
   excessive lock contention due to workloads placing heavy update-side
   stress on RCU tasks.

 - Improvements to RCU priority boosting, including changes from Neeraj
   Upadhyay, Zqiang, and Alison Chaiken.

 - Various fixes improving test robustness and debug information.

 - Add tests for SRCU size transitions, further compress torture.sh
   build products, and improve debug output.

 - Miscellaneous fixes.

* tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
  rcu: Replace cpumask_weight with cpumask_empty where appropriate
  rcu: Remove __read_mostly annotations from rcu_scheduler_active externs
  rcu: Uninline multi-use function: finish_rcuwait()
  rcu: Mark writes to the rcu_segcblist structure's ->flags field
  kasan: Record work creation stack trace with interrupts enabled
  rcu: Inline __call_rcu() into call_rcu()
  rcu: Add mutex for rcu boost kthread spawning and affinity setting
  rcu: Fix description of kvfree_rcu()
  MAINTAINERS:  Add Frederic and Neeraj to their RCU files
  rcutorture: Provide non-power-of-two Tasks RCU scenarios
  rcutorture: Test SRCU size transitions
  torture: Make torture.sh help message match reality
  rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention
  rcu-tasks: Use order_base_2() instead of ilog2()
  rcu: Create and use an rcu_rdp_cpu_online()
  rcu: Make rcu_barrier() no longer block CPU-hotplug operations
  rcu: Rework rcu_barrier() and callback-migration logic
  rcu: Refactor rcu_barrier() empty-list handling
  rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
  torture: Change KVM environment variable to RCUTORTURE
  ...
2022-03-21 14:00:56 -07:00
Matthew Wilcox (Oracle)
e0cd5e7ffa mm/vmscan: Convert pageout() to take a folio
We always write out an entire folio at once.  This conversion removes
a few calls to compound_head() and gets the NR_VMSCAN_WRITE statistic
right when writing out a large folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-21 13:01:35 -04:00
Dylan Yudaken
052ebf1fbb io_uring: make tracing format consistent
Make the tracing formatting for user_data and flags consistent.

Having consistent formatting allows one for example to grep for a specific
user_data/flags and be able to trace a single sqe through easily.

Change user_data to 0x%llx and flags to 0x%x everywhere. The '0x' is
useful to disambiguate for example "user_data 100".

Additionally remove the '=' for flags in io_uring_req_failed, again for consistency.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220316095204.2191498-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-16 05:45:28 -06:00
Ritesh Harjani
163f11b8b3 ext4: fix remaining two trace events to use same printk convention
All ext4 & jbd2 trace events starts with "dev Major:Minor".
While we are still improving/adding the ftrace events for FC,
let's fix last two remaining trace events to follow the same
convention.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/8f33b163f0f29df2491c03b79f8ac96890ea5184.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15 17:45:36 -04:00
Ritesh Harjani
5641ace544 ext4: add commit tid info in ext4_fc_commit_start/stop trace events
This adds commit_tid info in ext4_fc_commit_start/stop which is helpful
in debugging fast_commit issues.

For e.g. issues where due to jbd2 journal full commit, FC miss to commit
updates to a file.

Also improves TP_prink format string i.e. all ext4 and jbd2 trace events
starts with "dev MAjOR,MINOR". Let's follow the same convention while we
are still at it.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/ebcd6b9ab5b718db30f90854497886801ce38c63.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15 17:45:36 -04:00
Ritesh Harjani
1d2e2440c5 ext4: add transaction tid info in fc_track events
This patch adds the transaction & inode tid info in trace events for
callers of ext4_fc_track_template(). This is helpful in debugging race
conditions where an inode could belong to two different transaction tids.
It also fixes the checkpatch warnings which says use tabs instead of
spaces.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/c203c09dc11bb372803c430f621f25a4b8c2c8b4.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15 17:45:36 -04:00
Ritesh Harjani
08f4c42aba ext4: add new trace event in ext4_fc_cleanup
This adds a new trace event in ext4_fc_cleanup() which is helpful in debugging
some fast_commit issues.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/794cdb1d5d3622f3f80d30c222ff6652ea68c375.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15 17:45:36 -04:00
Matthew Wilcox (Oracle)
ccd16945db ext4: Convert invalidatepage to invalidate_folio
Extensive changes, but fairly mechanical.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15 08:23:30 -04:00
Ingo Molnar
ccdbf33c23 Merge tag 'v5.17-rc8' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-03-15 10:28:12 +01:00
Josef Bacik
9c54e80ddc btrfs: add code to support the block group root
This code adds the on disk structures for the block group root, which
will hold the block group items for extent tree v2.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14 13:13:48 +01:00
Ritesh Harjani
8cb5a30372 ext4: convert ext4_fc_track_dentry type events to use event class
One should use DECLARE_EVENT_CLASS for similar event types instead of
defining TRACE_EVENT for each event type. This is helpful in reducing
the text section footprint for e.g. [1]

[1]: https://lwn.net/Articles/381064/

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/a019cb46219ef4b30e4d98d7ced7d8819a2fc61d.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-12 21:26:08 -05:00
Ritesh Harjani
7af1974af0 ext4: fix ext4_fc_stats trace point
ftrace's __print_symbolic() requires that any enum values used in the
symbol to string translation table be wrapped in a TRACE_DEFINE_ENUM
so that the enum value can be decoded from the ftrace ring buffer by
user space tooling.

This patch also fixes few other problems found in this trace point.
e.g. dereferencing structures in TP_printk which should not be done
at any cost.

Also to avoid checkpatch warnings, this patch removes those
whitespaces/tab stops issues.

Cc: stable@kernel.org
Fixes: aa75f4d3da ("ext4: main fast-commit commit path")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/b4b9691414c35c62e570b723e661c80674169f9a.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-12 21:26:08 -05:00
David Howells
413a4a6b0b cachefiles: Fix volume coherency attribute
A network filesystem may set coherency data on a volume cookie, and if
given, cachefiles will store this in an xattr on the directory in the
cache corresponding to the volume.

The function that sets the xattr just stores the contents of the volume
coherency buffer directly into the xattr, with nothing added; the
checking function, on the other hand, has a cut'n'paste error whereby it
tries to interpret the xattr contents as would be the xattr on an
ordinary file (using the cachefiles_xattr struct).  This results in a
failure to match the coherency data because the buffer ends up being
shifted by 18 bytes.

Fix this by defining a structure specifically for the volume xattr and
making both the setting and checking functions use it.

Since the volume coherency doesn't work if used, take the opportunity to
insert a reserved field for future use, set it to 0 and check that it is
0.  Log mismatch through the appropriate tracepoint.

Note that this only affects cifs; 9p, afs, ceph and nfs don't use the
volume coherency data at the moment.

Fixes: 32e150037d ("fscache, cachefiles: Store the volume coherency data")
Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Steve French <smfrench@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11 10:24:37 -08:00
Steven Rostedt (Google)
3a73333fb3 tracing: Add TRACE_CUSTOM_EVENT() macro
To make it really easy to add custom events from modules, add a
TRACE_CUSTOM_EVENT() macro that acts just like the TRACE_EVENT() macro,
but creates a custom event to an already existing tracepoint.

The trace_custom_sched.[ch] has been updated to use this new macro to show
how simple it is.

Link: https://lkml.kernel.org/r/20220303220625.738622494@goodmis.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-11 11:49:24 -05:00
Steven Rostedt (Google)
af6b9668e8 tracing: Move the defines to create TRACE_EVENTS into their own files
In an effort to add custom event macros that can be used to create your
own custom events based on existing tracepoints, move the defines of the
special macros used in TRACE_EVENT() into their own files such that they
can be reused for TRACE_CUSTOM_EVENT().

Link: https://lkml.kernel.org/r/20220303220625.553406495@goodmis.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-11 11:49:23 -05:00
Stefan Roesch
502c87d655 io-uring: Make tracepoints consistent.
This makes the io-uring tracepoints consistent. Where it makes sense
the tracepoints start with the following four fields:
- context (ring)
- request
- user_data
- opcode.

Signed-off-by: Stefan Roesch <shr@fb.com>
Link: https://lore.kernel.org/r/20220214180430.70572-3-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-10 06:32:49 -07:00
Usama Arif
2757be22c0 io_uring: remove trace for eventfd
The information on whether eventfd is registered is not very useful and
would result in the tracepoint being enclosed in an rcu_readlock in a
later patch that tries to avoid ring quiesce for registering eventfd.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Link: https://lore.kernel.org/r/20220204145117.1186568-2-usama.arif@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-10 06:32:49 -07:00
Valentin Schneider
fa2c3254d7 sched/tracing: Don't re-read p->state when emitting sched_switch event
As of commit

  c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")

the following sequence becomes possible:

		      p->__state = TASK_INTERRUPTIBLE;
		      __schedule()
			deactivate_task(p);
  ttwu()
    READ !p->on_rq
    p->__state=TASK_WAKING
			trace_sched_switch()
			  __trace_sched_switch_state()
			    task_state_index()
			      return 0;

TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in
the trace event.

Prevent this by pushing the value read from __schedule() down the trace
event.

Reported-by: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220120162520.570782-2-valentin.schneider@arm.com
2022-03-01 16:18:39 +01:00
Chuck Lever
70a60cbfb6 SUNRPC: Record endpoint information in trace log
To make server-side trace events more useful in container-ized
environments, capture not just the remote's IP address, but the
local IP address and network namespace as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Chuck Lever
aca3ed7915 SUNRPC: Same as SVC_RQST_ENDPOINT, but without the xid
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Chuck Lever
26ce14e77a SUNRPC: Improve sockaddr handling in the svc_xprt_create_error trace point
Clean up: Use the new __sockaddr field to record the socket address.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Chuck Lever
d07c9ad622 tracing: Introduce helpers to safely handle dynamic-sized sockaddrs
Enable a struct sockaddr to be stored in a trace record as a
dynamically-sized field. The common cases are AF_INET and AF_INET6
which are different sizes, and are vastly smaller than a struct
sockaddr_storage.

These are safer because, when used properly, the size of the
sockaddr destination field in each trace record is now guaranteed
to be the same as the source address that is being copied into it.

Link: https://lore.kernel.org/all/164182978641.8391.8277203495236105391.stgit@bazille.1015granger.net/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:38 -05:00
Jason A. Donenfeld
14c174633f random: remove unused tracepoints
These explicit tracepoints aren't really used and show sign of aging.
It's work to keep these up to date, and before I attempted to keep them
up to date, they weren't up to date, which indicates that they're not
really used. These days there are better ways of introspecting anyway.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21 21:14:00 +01:00
Jason A. Donenfeld
04ec96b768 random: make more consistent use of integer types
We've been using a flurry of int, unsigned int, size_t, and ssize_t.
Let's unify all of this into size_t where it makes sense, as it does in
most places, and leave ssize_t for return values with possible errors.

In addition, keeping with the convention of other functions in this
file, functions that are dealing with raw bytes now take void *
consistently instead of a mix of that and u8 *, because much of the time
we're actually passing some other structure that is then interpreted as
bytes by the function.

We also take the opportunity to fix the outdated and incorrect comment
in get_random_bytes_arch().

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21 21:13:54 +01:00
Jason A. Donenfeld
9c07f57869 random: simplify entropy debiting
Our pool is 256 bits, and we only ever use all of it or don't use it at
all, which is decided by whether or not it has at least 128 bits in it.
So we can drastically simplify the accounting and cmpxchg loop to do
exactly this.  While we're at it, we move the minimum bit size into a
constant so it can be shared between the two places where it matters.

The reason we want any of this is for the case in which an attacker has
compromised the current state, and then bruteforces small amounts of
entropy added to it. By demanding a particular minimum amount of entropy
be present before reseeding, we make that bruteforcing difficult.

Note that this rationale no longer includes anything about /dev/random
blocking at the right moment, since /dev/random no longer blocks (except
for at ~boot), but rather uses the crng. In a former life, /dev/random
was different and therefore required a more nuanced account(), but this
is no longer.

Behaviorally, nothing changes here. This is just a simplification of
the code.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21 16:48:06 +01:00
Yang Shi
d5869fdc18 block: introduce block_rq_error tracepoint
Currently, rasdaemon uses the existing tracepoint block_rq_complete
and filters out non-error cases in order to capture block disk errors.

But there are a few problems with this approach:

1. Even kernel trace filter could do the filtering work, there is
   still some overhead after we enable this tracepoint.

2. The filter is merely based on errno, which does not align with kernel
   logic to check the errors for print_req_error().

3. block_rq_complete only provides dev major and minor to identify
   the block device, it is not convenient to use in user-space.

So introduce a new tracepoint block_rq_error just for the error case.
With this patch, rasdaemon could switch to block_rq_error.

Since the new tracepoint has the similar implementation with
block_rq_complete, so move the existing code from TRACE_EVENT
block_rq_complete() into new event class block_rq_completion(). Then add
event for block_rq_complete and block_rq_err respectively from the newly
created event class per the suggestion from Chaitanya Kulkarni.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220210225222.260069-1-shy828301@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11 10:00:16 -07:00
Paul E. McKenney
0cabb47af3 rcu: Refactor rcu_barrier() empty-list handling
This commit saves a few lines by checking first for an empty callback
list.  If the callback list is empty, then that CPU is taken care of,
regardless of its online or nocb state.  Also simplify tracing accordingly
and fold a few lines together.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-08 10:12:28 -08:00
Linus Torvalds
df0001545b Merge tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pulltracing fixes from Steven Rostedt:

 - Limit mcount build time sorting to only those archs that we know it
   works for.

 - Fix memory leak in error path of histogram setup

 - Fix and clean up rel_loc array out of bounds issue

 - tools/rtla documentation fixes

 - Fix issues with histogram logic

* tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Don't inc err_log entry count if entry allocation fails
  tracing: Propagate is_signed to expression
  tracing: Fix smatch warning for do while check in event_hist_trigger_parse()
  tracing: Fix smatch warning for null glob in event_hist_trigger_parse()
  tools/tracing: Update Makefile to build rtla
  rtla: Make doc build optional
  tracing/perf: Avoid -Warray-bounds warning for __rel_loc macro
  tracing: Avoid -Warray-bounds warning for __rel_loc macro
  tracing/histogram: Fix a potential memory leak for kstrdup()
  ftrace: Have architectures opt-in for mcount build time sorting
2022-01-28 19:30:35 +02:00
Kees Cook
c6d777acdf tracing/perf: Avoid -Warray-bounds warning for __rel_loc macro
As done for trace_events.h, also fix the __rel_loc macro in perf.h,
which silences the -Warray-bounds warning:

In file included from ./include/linux/string.h:253,
                 from ./include/linux/bitmap.h:11,
                 from ./include/linux/cpumask.h:12,
                 from ./include/linux/mm_types_task.h:14,
                 from ./include/linux/mm_types.h:5,
                 from ./include/linux/buildid.h:5,
                 from ./include/linux/module.h:14,
                 from samples/trace_events/trace-events-sample.c:2:
In function '__fortify_strcpy',
    inlined from 'perf_trace_foo_rel_loc' at samples/trace_events/./trace-events-sample.h:519:1:
./include/linux/fortify-string.h:47:33: warning: '__builtin_strcpy' offset 12 is out of the bounds [
0, 4] [-Warray-bounds]
   47 | #define __underlying_strcpy     __builtin_strcpy
      |                                 ^
./include/linux/fortify-string.h:445:24: note: in expansion of macro '__underlying_strcpy'
  445 |                 return __underlying_strcpy(p, q);
      |                        ^~~~~~~~~~~~~~~~~~~

Also make __data struct member a proper flexible array to avoid future
problems.

Link: https://lkml.kernel.org/r/20220125220037.2738923-1-keescook@chromium.org

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 55de2c0b56 ("tracing: Add '__rel_loc' using trace event macros")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:45 -05:00
Masami Hiramatsu
58c5724ec2 tracing: Avoid -Warray-bounds warning for __rel_loc macro
Since -Warray-bounds checks the destination size from the type of given
pointer, __assign_rel_str() macro gets warned because it passes the
pointer to the 'u32' field instead of 'trace_event_raw_*' data structure.
Pass the data address calculated from the 'trace_event_raw_*' instead of
'u32' __rel_loc field.

Link: https://lkml.kernel.org/r/20220125233154.dac280ed36944c0c2fe6f3ac@kernel.org

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ This did not fix the warning, but is still a nice clean up ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:45 -05:00
Linus Torvalds
23a46422c5 Merge tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and can.

  Current release - new code bugs:

   - tcp: add a missing sk_defer_free_flush() in tcp_splice_read()

   - tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n

   - nf_tables: set last expression in register tracking area

   - nft_connlimit: fix memleak if nf_ct_netns_get() fails

   - mptcp: fix removing ids bitmap setting

   - bonding: use rcu_dereference_rtnl when getting active slave

   - fix three cases of sleep in atomic context in drivers: lan966x, gve

   - handful of build fixes for esoteric drivers after netdev->dev_addr
     was made const

  Previous releases - regressions:

   - revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
     Linux compatibility with USGv6 tests

   - procfs: show net device bound packet types

   - ipv4: fix ip option filtering for locally generated fragments

   - phy: broadcom: hook up soft_reset for BCM54616S

  Previous releases - always broken:

   - ipv4: raw: lock the socket in raw_bind()

   - ipv4: decrease the use of shared IPID generator to decrease the
     chance of attackers guessing the values

   - procfs: fix cross-netns information leakage in /proc/net/ptype

   - ethtool: fix link extended state for big endian

   - bridge: vlan: fix single net device option dumping

   - ping: fix the sk_bound_dev_if match in ping_lookup"

* tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
  net: bridge: vlan: fix memory leak in __allowed_ingress
  net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
  ipv4: remove sparse error in ip_neigh_gw4()
  ipv4: avoid using shared IP generator for connected sockets
  ipv4: tcp: send zero IPID in SYNACK messages
  ipv4: raw: lock the socket in raw_bind()
  MAINTAINERS: add missing IPv4/IPv6 header paths
  MAINTAINERS: add more files to eth PHY
  net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
  net: bridge: vlan: fix single net device option dumping
  net: stmmac: skip only stmmac_ptp_register when resume from suspend
  net: stmmac: configure PTP clock source prior to PTP initialization
  Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
  connector/cn_proc: Use task_is_in_init_pid_ns()
  pid: Introduce helper task_is_in_init_pid_ns()
  gve: Fix GFP flags when allocing pages
  net: lan966x: Fix sleep in atomic context when updating MAC table
  net: lan966x: Fix sleep in atomic context when injecting frames
  ethernet: seeq/ether3: don't write directly to netdev->dev_addr
  ethernet: 8390/etherh: don't write directly to netdev->dev_addr
  ...
2022-01-27 20:58:39 +02:00
Menglong Dong
364df53c08 net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
as the reason of skb drop out of socket filter before
it's part of a released kernel. It will be used for
more protocols than just TCP in future series.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/all/20220127091308.91401-2-imagedong@tencent.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-27 08:45:13 -08:00
Linus Torvalds
0280e3c58f Merge tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
 "New Features:

   - Basic handling for case insensitive filesystems

   - Initial support for fs_locations and server trunking

  Bugfixes and Cleanups:

   - Cleanups to how the "struct cred *" is handled for the
     nfs_access_entry

   - Ensure the server has an up to date ctimes before hardlinking or
     renaming

   - Update 'blocks used' after writeback, fallocate, and clone

   - nfs_atomic_open() fixes

   - Improvements to sunrpc tracing

   - Various null check & indenting related cleanups

   - Some improvements to the sunrpc sysfs code:
      - Use default_groups in kobj_type
      - Fix some potential races and reference leaks

   - A few tracepoint cleanups in xprtrdma"

[ This should have gone in during the merge window, but didn't. The
  original pull request - sent during the merge window - had gotten
  marked as spam and discarded due missing DKIM headers in the email
  from Anna.   - Linus ]

* tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (35 commits)
  SUNRPC: Don't dereference xprt->snd_task if it's a cookie
  xprtrdma: Remove definitions of RPCDBG_FACILITY
  xprtrdma: Remove final dprintk call sites from xprtrdma
  sunrpc: Fix potential race conditions in rpc_sysfs_xprt_state_change()
  net/sunrpc: fix reference count leaks in rpc_sysfs_xprt_state_change
  NFSv4.1 test and add 4.1 trunking transport
  SUNRPC allow for unspecified transport time in rpc_clnt_add_xprt
  NFSv4 handle port presence in fs_location server string
  NFSv4 expose nfs_parse_server_name function
  NFSv4.1 query for fs_location attr on a new file system
  NFSv4 store server support for fs_location attribute
  NFSv4 remove zero number of fs_locations entries error check
  NFSv4: nfs_atomic_open() can race when looking up a non-regular file
  NFSv4: Handle case where the lookup of a directory fails
  NFSv42: Fallocate and clone should also request 'blocks used'
  NFSv4: Allow writebacks to request 'blocks used'
  SUNRPC: use default_groups in kobj_type
  NFS: use default_groups in kobj_type
  NFS: Fix the verifier for case sensitive filesystem in nfs_atomic_open()
  NFS: Add a helper to remove case-insensitive aliases
  ...
2022-01-25 20:16:03 +02:00
Linus Torvalds
7fd350f6ff Merge tag 'fscache-fixes-20220121' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull more fscache updates from David Howells:
 "A set of fixes and minor updates for the fscache rewrite:

   - Fix mishandling of volume collisions (the wait condition is
     inverted and so it was only waiting if the volume collision was
     already resolved).

   - Fix miscalculation of whether there's space available in
     cachefiles.

   - Make sure a default cache name is set on a cache if the user hasn't
     set one by the time they bind the cache.

   - Adjust the way the backing inode is presented in tracepoints, add a
     tracepoint for mkdir and trace directory lookup.

   - Add a tracepoint for failure to set the active file mark.

   - Add an explanation of the checks made on the backing filesystem.

   - Check that the backing filesystem supports tmpfile.

   - Document how the page-release cancellation of the read-skip
     optimisation works.

  And I've included a change for netfslib:

   - Make ops->init_rreq() optional"

* tag 'fscache-fixes-20220121' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  netfs: Make ops->init_rreq() optional
  fscache: Add a comment explaining how page-release optimisation works
  cachefiles: Check that the backing filesystem supports tmpfiles
  cachefiles: Explain checks in a comment
  cachefiles: Trace active-mark failure
  cachefiles: Make some tracepoint adjustments
  cachefiles: set default tag name if it's unspecified
  cachefiles: Calculate the blockshift in terms of bytes, not pages
  fscache: Fix the volume collision wait condition
2022-01-22 10:59:32 +02:00
David Howells
b64a331498 cachefiles: Trace active-mark failure
Add a tracepoint to log failure to apply an active mark to a file in
addition to tracing successfully setting and unsetting the mark.

Also include the backing file inode number in the message logged to dmesg.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251404666.3435901.17331742792401482190.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
David Howells
8c39b8bc82 cachefiles: Make some tracepoint adjustments
Make some adjustments to tracepoints to make the tracing a bit more
followable:

 (1) Standardise on displaying the backing inode number as "B=<hex>" with
     no leading zeros.

 (2) Make the cachefiles_lookup tracepoint log the directory inode number
     as well as the looked-up inode number.

 (3) Add a cachefiles_lookup tracepoint into cachefiles_get_directory() to
     log directory lookup.

 (4) Add a new cachefiles_mkdir tracepoint and use that to log a successful
     mkdir from cachefiles_get_directory().

 (5) Make the cachefiles_unlink and cachefiles_rename tracepoints log the
     inode number of the affected file/dir rather than dentry struct
     pointers.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251403694.3435901.9797725381831316715.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
Linus Torvalds
f4484d138b Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...
2022-01-20 10:41:01 +02:00