Commit Graph

1106901 Commits

Author SHA1 Message Date
Masahiro Yamada d6b732666a modpost: fix undefined behavior of is_arm_mapping_symbol()
The return value of is_arm_mapping_symbol() is unpredictable when "$"
is passed in.

strchr(3) says:
  The strchr() and strrchr() functions return a pointer to the matched
  character or NULL if the character is not found. The terminating null
  byte is considered part of the string, so that if c is specified as
  '\0', these functions return a pointer to the terminator.

When str[1] is '\0', strchr("axtd", str[1]) is not NULL, and str[2] is
referenced (i.e. buffer overrun).

Test code
---------

  char str1[] = "abc";
  char str2[] = "ab";

  strcpy(str1, "$");
  strcpy(str2, "$");

  printf("test1: %d\n", is_arm_mapping_symbol(str1));
  printf("test2: %d\n", is_arm_mapping_symbol(str2));

Result
------

  test1: 0
  test2: 1

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2022-05-27 16:07:44 +09:00
Alexander Lobakin b5beffa20d modpost: fix removing numeric suffixes
With the `-z unique-symbol` linker flag or any similar mechanism,
it is possible to trigger the following:

ERROR: modpost: "param_set_uint.0" [vmlinux] is a static EXPORT_SYMBOL

The reason is that for now the condition from remove_dot():

if (m && (s[n + m] == '.' || s[n + m] == 0))

which was designed to test if it's a dot or a '\0' after the suffix
is never satisfied.
This is due to that `s[n + m]` always points to the last digit of a
numeric suffix, not on the symbol next to it (from a custom debug
print added to modpost):

param_set_uint.0, s[n + m] is '0', s[n + m + 1] is '\0'

So it's off-by-one and was like that since 2014.

Fix this for the sake of any potential upcoming features, but don't
bother stable-backporting, as it's well hidden -- apart from that
LD flag, it can be triggered only with GCC LTO which never landed
upstream.

Fixes: fcd38ed0ff ("scripts: modpost: fix compilation warning")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-05-27 16:05:01 +09:00
Vincent Whitchurch 2a4a62a14b um: Fix out-of-bounds read in LDT setup
syscall_stub_data() expects the data_count parameter to be the number of
longs, not bytes.

 ==================================================================
 BUG: KASAN: stack-out-of-bounds in syscall_stub_data+0x70/0xe0
 Read of size 128 at addr 000000006411f6f0 by task swapper/1

 CPU: 0 PID: 1 Comm: swapper Not tainted 5.18.0+ #18
 Call Trace:
  show_stack.cold+0x166/0x2a7
  __dump_stack+0x3a/0x43
  dump_stack_lvl+0x1f/0x27
  print_report.cold+0xdb/0xf81
  kasan_report+0x119/0x1f0
  kasan_check_range+0x3a3/0x440
  memcpy+0x52/0x140
  syscall_stub_data+0x70/0xe0
  write_ldt_entry+0xac/0x190
  init_new_ldt+0x515/0x960
  init_new_context+0x2c4/0x4d0
  mm_init.constprop.0+0x5ed/0x760
  mm_alloc+0x118/0x170
  0x60033f48
  do_one_initcall+0x1d7/0x860
  0x60003e7b
  kernel_init+0x6e/0x3d4
  new_thread_handler+0x1e7/0x2c0

 The buggy address belongs to stack of task swapper/1
  and is located at offset 64 in frame:
  init_new_ldt+0x0/0x960

 This frame has 2 objects:
  [32, 40) 'addr'
  [64, 80) 'desc'
 ==================================================================

Fixes: 858259cf7d ("uml: maintain own LDT entries")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 09:03:41 +02:00
Johannes Berg 57ae0b67b7 um: chan_user: Fix winch_tramp() return value
The previous fix here was only partially correct, it did
result in returning a proper error value in case of error,
but it also clobbered the pid that we need to return from
this function (not just zero for success).

As a result, it returned 0 here, but later this is treated
as a pid and used to kill the process, but since it's now
0 we kill(0, SIGKILL), which makes UML kill itself rather
than just the helper thread.

Fix that and make it more obvious by using a separate
variable for the pid.

Fixes: ccf1236eca ("um: fix error return code in winch_tramp()")
Reported-and-tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 09:03:41 +02:00
Johannes Berg af9fb41ed3 um: virtio_uml: Fix broken device handling in time-travel
If a device implementation crashes, virtio_uml will mark it
as dead by calling virtio_break_device() and scheduling the
work that will remove it.

This still seems like the right thing to do, but it's done
directly while reading the message, and if time-travel is
used, this is in the time-travel handler, outside of the
normal Linux machinery. Therefore, we cannot acquire locks
or do normal "linux-y" things because e.g. lockdep will be
confused about the context.

Move handling this situation out of the read function and
into the actual IRQ handler and response handling instead,
so that in the case of time-travel we don't call it in the
wrong context.

Chances are the system will still crash immediately, since
the device implementation crashing may also cause the time-
travel controller to go down, but at least all of that now
happens without strange warnings from lockdep.

Fixes: c8177aba37 ("um: time-travel: rework interrupt handling in ext mode")
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 09:03:41 +02:00
Johannes Berg d5a9597d69 um: line: Use separate IRQs per line
Today, all possible serial lines (ssl*=) as well as all
possible consoles (con*=) each share a single interrupt
(with a fixed number) with others of the same type.

Now, if you have two lines, say ssl0 and ssl1, and one
of them is connected to an fd you cannot read (e.g. a
file), but the other gets a read interrupt, then both
of them get the interrupt since it's shared. Then, the
read() call will return EOF, since it's a file being
written and there's nothing to read (at least not at
the current offset, at the end).

Unfortunately, this is treated as a read error, and we
close this line, losing all the possible output.

It might be possible to work around this and make the
IRQ sharing work, however, now that we have dynamically
allocated IRQs that are easy to use, simply use that to
achieve separating between the events; then there's no
interrupt for that line and we never attempt the read
in the first place, thus not closing the line.

This manifested itself in the wifi hostap/hwsim tests
where the parallel script communicates via one serial
console and the kernel messages go to another (a file)
and sending data on the communication console caused
the kernel messages to stop flowing into the file.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: anton ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 09:03:41 +02:00
Miaoqian Lin 02ded5a173 net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_register
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.

mv88e6xxx_mdio_register() pass the device node to of_mdiobus_register().
We don't need the device node after it.

Add missing of_node_put() to avoid refcount leak.

Fixes: a3c53be55c ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-27 08:02:33 +01:00
Vincent Whitchurch 2419ac3272 um: Enable ARCH_HAS_GCOV_PROFILE_ALL
Enable ARCH_HAS_GCOV_PROFILE_ALL so that CONFIG_GCOV_PROFILE_ALL can be
selected on UML.  I didn't need to explicitly disable GCOV on anything
to get this to work on the configs I tested.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 09:01:14 +02:00
Johannes Berg 3657190355 um: Use asm-generic/dma-mapping.h
If DMA (PCI over virtio) is enabled, then some drivers may
enable CONFIG_DMA_OPS as well, and then we pull in the x86
definition of get_arch_dma_ops(), which uses the dma_ops
symbol, which isn't defined.

Since we don't have real DMA ops nor any kind of IOMMU fix
this in the simplest possible way: pull in the asm-generic
file instead of inheriting the x86 one. It's not clear why
those drivers that do (e.g. VDPA) "select DMA_OPS", and if
they'd even work with this, but chances are nobody will be
wanting to do that anyway, so fixing the build failure is
good enough.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 68f5d3f3b6 ("um: add PCI over virtio emulation driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 09:00:45 +02:00
Miaoqian Lin 5dd89d2fc4 net: ethernet: ti: am65-cpsw-nuss: Fix some refcount leaks
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
am65_cpsw_init_cpts() and am65_cpsw_nuss_probe() don't release
the refcount in error case.
Add missing of_node_put() to avoid refcount leak.

Fixes: b1f66a5bee ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support")
Fixes: 93a7653031 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-27 08:00:40 +01:00
Johannes Berg b0cbccf448 um: daemon: Make default socket configurable
Even if daemon network is deprecated, some configurations may
still use it (e.g. Debian), and not want to default to the
/tmp/uml.ctl socket location. Allow configuring the default
socket location.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Tested-by: Ritesh Raj Sarraf <ritesh@collabora.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 09:00:40 +02:00
Dan Carpenter e7e7104e2d net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry()
The "fsp->location" variable comes from user via ethtool_get_rxnfc().
Check that it is valid to prevent an out of bounds read.

Fixes: 7aab747e55 ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-27 07:59:41 +01:00
Yuntao Wang 8d3a75078c scripts/kallsyms: update usage message of the kallsyms program
The kallsyms program supports --absolute-percpu option but does not display
it in the usage message, fix it.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-05-27 15:51:48 +09:00
Jing Leng 23a0cb8e32 kbuild: Fix include path in scripts/Makefile.modpost
When building an external module, if users don't need to separate the
compilation output and source code, they run the following command:
"make -C $(LINUX_SRC_DIR) M=$(PWD)". At this point, "$(KBUILD_EXTMOD)"
and "$(src)" are the same.

If they need to separate them, they run "make -C $(KERNEL_SRC_DIR)
O=$(KERNEL_OUT_DIR) M=$(OUT_DIR) src=$(PWD)". Before running the
command, they need to copy "Kbuild" or "Makefile" to "$(OUT_DIR)" to
prevent compilation failure.

So the kernel should change the included path to avoid the copy operation.

Signed-off-by: Jing Leng <jleng@ambarella.com>
[masahiro: I do not think "M=$(OUT_DIR) src=$(PWD)" is the official way,
but this patch is a nice clean up anyway.]
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-05-27 15:51:48 +09:00
Johannes Berg 16aa835a25 um: xterm: Make default terminal emulator configurable
Make the default terminal emulator configurable so e.g.
Debian can set it to x-terminal-emulator instead of the
current default of xterm.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Tested-by: Ritesh Raj Sarraf <ritesh@collabora.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 08:50:07 +02:00
Linus Torvalds 7e284070ab - Enable DM core bioset's per-cpu bio cache if QUEUE_FLAG_POLL
set. This change improves DM's hipri bio polling (REQ_POLLED)
   performance by 7 - 20% depending on the system.
 
 - Update DM core to use jump_labels to further reduce cost of unlikely
   branches for zoned block devices, dm-stats and swap_bios throttling.
 
 - Various DM core changes to reduce bio-based DM overhead and simplify
   IO accounting.
 
 - Fundamental DM core improvements to dm_io reference counting and the
   elimination of using bio_split()+bio_chain() -- instead DM's
   bio-based IO accounting is updated to account that a split occurred.
 
 - Improve DM core's abnormal bio processing to do less work.
 
 - Improve DM core's hipri polling support to use a single list rather
   than an hlist.
 
 - Update DM core to pass NULL bdev to bio_alloc_clone() so that
   initialization that isn't useful for DM can be elided.
 
 - Add cond_resched to DM stats' various loops that loop over all
   entries.
 
 - Fix incorrect error code return from DM integrity's constructor.
 
 - Make DM crypt's printing of the key constant-time.
 
 - Update bio-based DM multipath to provide high-resolution timer to
   the Historical Service Time (HST) path selector.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmKOiAwACgkQxSPxCi2d
 A1qCxgf/VLmiywUR7zCIDiPyJkc547z2MlPVaTCzpclklyLZ5wgfTtqsb8RnEqRs
 KIXrvKbakbfZtrG0k9ZYdbIIVO4jHQavCXtVZH0Mj4XqQArDhHxZzlBx8i6BirO5
 aya7HKWHcXj+s7GF146hRcPtD53LbSNDrB/bVaiZcHxOemthSgtht2xwmZfzoCcS
 JtorpWkA97lgDbdSAcR0kn1zkHDWFkQMJgnUL9FotnlhOVXZVUvAbgb4OjfxB3fG
 bCYlytDhs+6VwO+/CZXcOA7LNNtkLJPEkxUxjrChS5do64SgJUlDvY66iB6+7K0w
 0Jri5D/NZvjwYVeLgXn5UwyQJ75cjQ==
 =zlaE
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Enable DM core bioset's per-cpu bio cache if QUEUE_FLAG_POLL set.
   This change improves DM's hipri bio polling (REQ_POLLED) performance
   by 7 - 20% depending on the system.

 - Update DM core to use jump_labels to further reduce cost of unlikely
   branches for zoned block devices, dm-stats and swap_bios throttling.

 - Various DM core changes to reduce bio-based DM overhead and simplify
   IO accounting.

 - Fundamental DM core improvements to dm_io reference counting and the
   elimination of using bio_split()+bio_chain() -- instead DM's
   bio-based IO accounting is updated to account that a split occurred.

 - Improve DM core's abnormal bio processing to do less work.

 - Improve DM core's hipri polling support to use a single list rather
   than an hlist.

 - Update DM core to pass NULL bdev to bio_alloc_clone() so that
   initialization that isn't useful for DM can be elided.

 - Add cond_resched to DM stats' various loops that loop over all
   entries.

 - Fix incorrect error code return from DM integrity's constructor.

 - Make DM crypt's printing of the key constant-time.

 - Update bio-based DM multipath to provide high-resolution timer to the
   Historical Service Time (HST) path selector.

* tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
  dm: pass NULL bdev to bio_alloc_clone
  dm cache metadata: remove unnecessary variable in __dump_mapping
  dm mpath: provide high-resolution timer to HST for bio-based
  dm crypt: make printing of the key constant-time
  dm integrity: fix error code in dm_integrity_ctr()
  dm stats: add cond_resched when looping over entries
  dm: improve abnormal bio processing
  dm: simplify bio-based IO accounting further
  dm: put all polled dm_io instances into a single list
  dm: improve dm_io reference counting
  dm: don't grab target io reference in dm_zone_map_bio
  dm: improve bio splitting and associated IO accounting
  dm: switch to bdev based IO accounting interfaces
  dm: pass dm_io instance to dm_io_acct directly
  dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct
  dm: use bio_sectors in dm_aceept_partial_bio
  dm: simplify basic targets
  dm: conditionally enable branching for less used features
  dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio
  dm: move hot dm_io members to same cacheline as dm_target_io
  ...
2022-05-26 21:13:45 -07:00
Linus Torvalds 780d8ce716 v5.19 pull request
Small collection of incremental improvement patches:
 
 - Minor code cleanup patches, comment improvements, etc from static tools
 
 - Clean the some of the kernel caps, reducing the historical stealth uAPI
   leftovers
 
 - Bug fixes and minor changes for rdmavt, hns, rxe, irdma
 
 - Remove unimplemented cruft from rxe
 
 - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer
 
 - flush_workqueue(system_unbound_wq) removal
 
 - Ensure rxe waits for objects to be unused before allowing the core to
   free them
 
 - Several rc quality bug fixes for hfi1
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYo+NxgAKCRCFwuHvBreF
 YbqSAQDJ+QolaATUvOQUPLbuLopUCJLe95VS15Kl3SNXiVUUFAEA8DLL1s6+WShd
 AgypUxGHipx5BAytrn45/WiwuDeEbQ8=
 =jgTl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Small collection of incremental improvement patches:

   - Minor code cleanup patches, comment improvements, etc from static
     tools

   - Clean the some of the kernel caps, reducing the historical stealth
     uAPI leftovers

   - Bug fixes and minor changes for rdmavt, hns, rxe, irdma

   - Remove unimplemented cruft from rxe

   - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs
     layer

   - flush_workqueue(system_unbound_wq) removal

   - Ensure rxe waits for objects to be unused before allowing the core
     to free them

   - Several rc quality bug fixes for hfi1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits)
  RDMA/rtrs-clt: Fix one kernel-doc comment
  RDMA/hfi1: Remove all traces of diagpkt support
  RDMA/hfi1: Consolidate software versions
  RDMA/hfi1: Remove pointless driver version
  RDMA/hfi1: Fix potential integer multiplication overflow errors
  RDMA/hfi1: Prevent panic when SDMA is disabled
  RDMA/hfi1: Prevent use of lock before it is initialized
  RDMA/rxe: Fix an error handling path in rxe_get_mcg()
  IB/core: Fix typo in comment
  RDMA/core: Fix typo in comment
  IB/hf1: Fix typo in comment
  IB/qib: Fix typo in comment
  IB/iser: Fix typo in comment
  RDMA/mlx4: Avoid flush_scheduled_work() usage
  IB/isert: Avoid flush_scheduled_work() usage
  RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()
  RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()
  RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()
  RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
  RDMA/irdma: Add SW mechanism to generate completions on error
  ...
2022-05-26 21:08:40 -07:00
Linus Torvalds 090b39af59 kernel hardening fix for v5.19-rc1
- Avoid GCC plugins needing utsrelease.h build target (Masahiro Yamada)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmKPnCQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJkPYD/oD5OvCR37Mr5ZUwxZ28U9pz1YO
 PjAYqS5D6n+pQvdda4hQM6Ac6Ei0wfHvmZKIorp5oWlOZypMD8QmrkvQIjHKqvCO
 vtwaPpFyV69JDC6vx2IJks/Btfi6ZV0YDtGUb9HNA56qCjjtTsXf9SduH0VtGcwb
 tK4uG4BJLPkLq9f3lsYtL025GL+tp7po0OwgKkl/6xbnE+l7/V0ddMdOV961b1zQ
 hYqqCy7tsdThwkpeil1WWvAuo3w0zYa4+LF3WJvYQpua+txtcH8dvwkt/xCa15Uq
 y5u9fkd2p5NDpj+Y1nhcGiOFPBdYvoI7zNvXH1MFZcSakyPuS159tya1OMIdV2RB
 FS2oDxbltZA6cn6a22SjIchjHrgm911aNpPhx3OWuuzDZ9lLv+iwKBE+kIJPPCZx
 IfbatVColopOVXfQ/DpDlPO/7dZPZVJJQrZsnL/6PwLR1EtKMnN5cgXgdstdObnk
 FEEP3Kx9tPX0EOiK1rc3ebXvhlYZyuO47lwnxjRiDrHjh9RPrnqTX1LgRl5WHKzQ
 hZnNQ63N/Si+07Q7mBrRyYKXpzMvB1BCzgHYXpTetviVfaFW2C97nvjd5O27JrqP
 ot37OjW+uogv+1/0NaT76ZZby36Vmr2yGqvA3Zgvdrk1JMpPT8wMys+ER8tVnzzD
 CwByXrQkmrCmmFm9OQ==
 =gF9H
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v5.19-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kernel hardening fix from Kees Cook:
 "This fixes an unlucky build race condition when using the GCC plugins,
  noticed by a few folks.

   - Avoid GCC plugins needing utsrelease.h build target (Masahiro Yamada)"

* tag 'hardening-v5.19-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: use KERNELVERSION for plugin version
2022-05-26 21:03:56 -07:00
Linus Torvalds 6d29d7fe4f NFSD 5.19 Release Notes
We introduce "courteous server" in this release. Previously NFSD
 would purge open and lock state for an unresponsive client after
 one lease period (typically 90 seconds). Now, after one lease
 period, another client can open and lock those files and the
 unresponsive client's lease is purged; otherwise if the unrespon-
 sive client's open and lock state is uncontended, the server retains
 that open and lock state for up to 24 hours, allowing the client's
 workload to resume after a lengthy network partition.
 
 A longstanding issue with NFSv4 file creation is also addressed.
 Previously a file creation can fail internally, returning an error
 to the client, but leave the newly created file in place as an
 artifact. The file creation code path has been reorganized so that
 internal failures and race conditions are less likely to result in
 an unwanted file creation.
 
 A fault injector has been added to help exercise paths that are run
 during kernel metadata cache invalidation. These caches contain
 information maintained by user space about exported filesystems.
 Many of our test workloads do not trigger cache invalidation.
 
 There is one patch that is needed to support PREEMPT_RT and a fix
 for an ancient "sleep while spin-locked" splat that seems to have
 become easier to hit since v5.18-rc3.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmKPliAACgkQM2qzM29m
 f5dB3BAAorPa2L8xu5P1Ge1oTNogNSOVRkLPDzEkfEwK07ZM2qvz78eMZGkMziJ/
 strorvBWl3SWBlVtTePgNpJUjgYQ75MRRwaX7Qh2WuHeRKm1JlZm0/NId3+zKgbh
 N40QI20jdswWcNDuhidxVFFWurd09GlcM4z1cu8gZLbfthkiUOjZoPiLkXeNcvhk
 7wC9GiueWxHefYQQDAKh1nQS/L0GG1EkzJdJo7WUVAldZ9qVY9LpmJVMRqrBBbta
 XrFYfpeY1zFFDY4Qolyz5PUJSeQuDj9PctlhoZ6B1hp56PD/6yaqVhYXiPxtlALj
 tITtktfiekULZkgfvfvyzssCv+wkbYiaEBZcSSCauR7dkGOmBmajO+cf7vpsERgE
 fbCU8DWGk78SMeehdCrO+26cV37VP+8c2t2Txq/rG5Eq4ZoCi++Hj5poRboFLqb+
 oom+0Ee0LfcAKXkxH5gWTPTblHo49GzGitPZtRzTgZ9uFnVwvEaJ4+t0ij0J8JpL
 HuVtWrg5/REhqpEvOSwF0sRmkYWLTu7KdueGn/iZ8xUi7GHEue01NsVkClohKJcR
 WOjWrbNCNF/LJaG88MX0z5u7IO7s9bOHphd7PJ92vR+4YsehW3uRhk+rNi2ZBqQz
 hzULfu8BiaicV9fdB/hDcMmKQD6U6due2AVVPtxTf5XY+CHQNRY=
 =phE1
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "We introduce 'courteous server' in this release. Previously NFSD would
  purge open and lock state for an unresponsive client after one lease
  period (typically 90 seconds). Now, after one lease period, another
  client can open and lock those files and the unresponsive client's
  lease is purged; otherwise if the unresponsive client's open and lock
  state is uncontended, the server retains that open and lock state for
  up to 24 hours, allowing the client's workload to resume after a
  lengthy network partition.

  A longstanding issue with NFSv4 file creation is also addressed.
  Previously a file creation can fail internally, returning an error to
  the client, but leave the newly created file in place as an artifact.
  The file creation code path has been reorganized so that internal
  failures and race conditions are less likely to result in an unwanted
  file creation.

  A fault injector has been added to help exercise paths that are run
  during kernel metadata cache invalidation. These caches contain
  information maintained by user space about exported filesystems. Many
  of our test workloads do not trigger cache invalidation.

  There is one patch that is needed to support PREEMPT_RT and a fix for
  an ancient 'sleep while spin-locked' splat that seems to have become
  easier to hit since v5.18-rc3"

* tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits)
  NFSD: nfsd_file_put() can sleep
  NFSD: Add documenting comment for nfsd4_release_lockowner()
  NFSD: Modernize nfsd4_release_lockowner()
  NFSD: Fix possible sleep during nfsd4_release_lockowner()
  nfsd: destroy percpu stats counters after reply cache shutdown
  nfsd: Fix null-ptr-deref in nfsd_fill_super()
  nfsd: Unregister the cld notifier when laundry_wq create failed
  SUNRPC: Use RMW bitops in single-threaded hot paths
  NFSD: Clean up the show_nf_flags() macro
  NFSD: Trace filecache opens
  NFSD: Move documenting comment for nfsd4_process_open2()
  NFSD: Fix whitespace
  NFSD: Remove dprintk call sites from tail of nfsd4_open()
  NFSD: Instantiate a struct file when creating a regular NFSv4 file
  NFSD: Clean up nfsd_open_verified()
  NFSD: Remove do_nfsd_create()
  NFSD: Refactor NFSv4 OPEN(CREATE)
  NFSD: Refactor NFSv3 CREATE
  NFSD: Refactor nfsd_create_setattr()
  NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
  ...
2022-05-26 20:52:24 -07:00
Vincent Ray a54ce37036 net: sched: fixed barrier to prevent skbuff sticking in qdisc backlog
In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit()
does not provide any ordering guarantee as test_bit() is not an atomic
operation. This, added to the fact that the spin_trylock() call at
the beginning of qdisc_run_begin() does not guarantee acquire
semantics if it does not grab the lock, makes it possible for the
following statement :

if (test_bit(__QDISC_STATE_MISSED, &qdisc->state))

to be executed before an enqueue operation called before
qdisc_run_begin().

As a result the following race can happen :

           CPU 1                             CPU 2

      qdisc_run_begin()               qdisc_run_begin() /* true */
        set(MISSED)                            .
      /* returns false */                      .
          .                            /* sees MISSED = 1 */
          .                            /* so qdisc not empty */
          .                            __qdisc_run()
          .                                    .
          .                              pfifo_fast_dequeue()
 ----> /* may be done here */                  .
|         .                                clear(MISSED)
|         .                                    .
|         .                                smp_mb __after_atomic();
|         .                                    .
|         .                                /* recheck the queue */
|         .                                /* nothing => exit   */
|   enqueue(skb1)
|         .
|   qdisc_run_begin()
|         .
|     spin_trylock() /* fail */
|         .
|     smp_mb__before_atomic() /* not enough */
|         .
 ---- if (test_bit(MISSED))
        return false;   /* exit */

In the above scenario, CPU 1 and CPU 2 both try to grab the
qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the
bypass code path, where it emits its skb then calls __qdisc_run().

CPU1 fails, sets MISSED and goes down the traditionnal enqueue() +
dequeue() code path. But when executing qdisc_run_begin() for the
second time, after enqueuing its skbuff, it sees the MISSED bit still
set (by itself) and consequently chooses to exit early without setting
it again nor trying to grab the spinlock again.

Meanwhile CPU2 has seen MISSED = 1, cleared it, checked the queue
and found it empty, so it returned.

At the end of the sequence, we end up with skb1 enqueued in the
backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set,
and no __netif_schedule() called made. skb1 will now linger in the
qdisc until somebody later performs a full __qdisc_run(). Associated
to the bypass capacity of the qdisc, and the ability of the TCP layer
to avoid resending packets which it knows are still in the qdisc, this
can lead to serious traffic "holes" in a TCP connection.

We fix this by replacing the smp_mb__before_atomic() / test_bit() /
set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin()
by a single test_and_set_bit() call, which is more concise and
enforces the needed memory barriers.

Fixes: 89837eb4b2 ("net: sched: add barrier to ensure correct ordering for lockless qdisc")
Signed-off-by: Vincent Ray <vray@kalrayinc.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220526001746.2437669-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-26 20:45:46 -07:00
Michael Walle b58cdd4388 net: lan966x: check devm_of_phy_get() for -EDEFER_PROBE
At the moment, if devm_of_phy_get() returns an error the serdes
simply isn't set. While it is bad to ignore an error in general, there
is a particular bug that network isn't working if the serdes driver is
compiled as a module. In that case, devm_of_phy_get() returns
-EDEFER_PROBE and the error is silently ignored.

The serdes is optional, it is not there if the port is using RGMII, in
which case devm_of_phy_get() returns -ENODEV. Rearrange the error
handling so that -ENODEV will be handled but other error codes will
abort the probing.

Fixes: d28d6d2e37 ("net: lan966x: add port module support")
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20220525231239.1307298-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-26 20:45:29 -07:00
Jakub Kicinski 4548ad7287 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix UAF when creating non-stateful expression in set.

2) Set limit cost when cloning expression accordingly, from Phil Sutter.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_limit: Clone packet limits' cost value
  netfilter: nf_tables: disallow non-stateful expression in sets earlier
====================

Link: https://lore.kernel.org/r/20220526205411.315136-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-26 20:43:06 -07:00
sunliming 8d4a21b5ac tracing: Fix comments for event_trigger_separate_filter()
The parameter name in comments of event_trigger_separate_filter() is
inconsistent with actual parameter name, fix it.

Link: https://lkml.kernel.org/r/20220526072957.165655-1-sunliming@kylinos.cn

Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 22:03:52 -04:00
sunliming aef54851bf x86/traceponit: Fix comment about irq vector tracepoints
Commit:

  4b9a8dca0e ("x86/idt: Remove the tracing IDT completely")

removed the 'tracing IDT' from arch/x86/kernel/tracepoint.c,
but left related comment. So that the comment become anachronistic.
Just remove the comment.

Link: https://lkml.kernel.org/r/20220526110831.175743-1-sunliming@kylinos.cn

Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 22:03:52 -04:00
sunliming feccde2a49 x86,tracing: Remove unused headers
Commit 4b9a8dca0e ("x86/idt: Remove the tracing IDT completely")
removed the tracing IDT from the file arch/x86/kernel/tracepoint.c,
but left the related headers unused, remove it.

Link: https://lkml.kernel.org/r/20220525012827.93464-1-sunliming@kylinos.cn

Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:01 -04:00
Song Liu 7d54c15cb8 ftrace: Clean up hash direct_functions on register failures
We see the following GPF when register_ftrace_direct fails:

[ ] general protection fault, probably for non-canonical address \
  0x200000000000010: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[...]
[ ] RIP: 0010:ftrace_find_rec_direct+0x53/0x70
[ ] Code: 48 c1 e0 03 48 03 42 08 48 8b 10 31 c0 48 85 d2 74 [...]
[ ] RSP: 0018:ffffc9000138bc10 EFLAGS: 00010206
[ ] RAX: 0000000000000000 RBX: ffffffff813e0df0 RCX: 000000000000003b
[ ] RDX: 0200000000000000 RSI: 000000000000000c RDI: ffffffff813e0df0
[ ] RBP: ffffffffa00a3000 R08: ffffffff81180ce0 R09: 0000000000000001
[ ] R10: ffffc9000138bc18 R11: 0000000000000001 R12: ffffffff813e0df0
[ ] R13: ffffffff813e0df0 R14: ffff888171b56400 R15: 0000000000000000
[ ] FS:  00007fa9420c7780(0000) GS:ffff888ff6a00000(0000) knlGS:000000000
[ ] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ ] CR2: 000000000770d000 CR3: 0000000107d50003 CR4: 0000000000370ee0
[ ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ ] Call Trace:
[ ]  <TASK>
[ ]  register_ftrace_direct+0x54/0x290
[ ]  ? render_sigset_t+0xa0/0xa0
[ ]  bpf_trampoline_update+0x3f5/0x4a0
[ ]  ? 0xffffffffa00a3000
[ ]  bpf_trampoline_link_prog+0xa9/0x140
[ ]  bpf_tracing_prog_attach+0x1dc/0x450
[ ]  bpf_raw_tracepoint_open+0x9a/0x1e0
[ ]  ? find_held_lock+0x2d/0x90
[ ]  ? lock_release+0x150/0x430
[ ]  __sys_bpf+0xbd6/0x2700
[ ]  ? lock_is_held_type+0xd8/0x130
[ ]  __x64_sys_bpf+0x1c/0x20
[ ]  do_syscall_64+0x3a/0x80
[ ]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ ] RIP: 0033:0x7fa9421defa9
[ ] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 9 f8 [...]
[ ] RSP: 002b:00007ffed743bd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ ] RAX: ffffffffffffffda RBX: 00000000069d2480 RCX: 00007fa9421defa9
[ ] RDX: 0000000000000078 RSI: 00007ffed743bd80 RDI: 0000000000000011
[ ] RBP: 00007ffed743be00 R08: 0000000000bb7270 R09: 0000000000000000
[ ] R10: 00000000069da210 R11: 0000000000000246 R12: 0000000000000001
[ ] R13: 00007ffed743c4b0 R14: 00000000069d2480 R15: 0000000000000001
[ ]  </TASK>
[ ] Modules linked in: klp_vm(OK)
[ ] ---[ end trace 0000000000000000 ]---

One way to trigger this is:
  1. load a livepatch that patches kernel function xxx;
  2. run bpftrace -e 'kfunc:xxx {}', this will fail (expected for now);
  3. repeat #2 => gpf.

This is because the entry is added to direct_functions, but not removed.
Fix this by remove the entry from direct_functions when
register_ftrace_direct fails.

Also remove the last trailing space from ftrace.c, so we don't have to
worry about it anymore.

Link: https://lkml.kernel.org/r/20220524170839.900849-1-song@kernel.org

Cc: stable@vger.kernel.org
Fixes: 763e34e74b ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:01 -04:00
sunliming 0a54f556b0 tracing: Fix comments of create_filter()
The name in comments of parameter "filter_string" in function
create_filter is annotated as "filter_str", just fix it.

Link: https://lkml.kernel.org/r/20220524063937.52873-1-sunliming@kylinos.cn

Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:01 -04:00
Congyu Liu bb5eb8f3b3 tracing: Disable kcov on trace_preemptirq.c
Functions in trace_preemptirq.c could be invoked from early interrupt
code that bypasses kcov trace function's in_task() check. Disable kcov
on this file to reduce random code coverage.

Link: https://lkml.kernel.org/r/20220523063033.1778974-1-liu3101@purdue.edu

Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Congyu Liu <liu3101@purdue.edu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Gautam Menghani 154827f8e5 tracing: Initialize integer variable to prevent garbage return value
Initialize the integer variable to 0 to fix the clang scan warning:
Undefined or garbage value returned to caller
[core.uninitialized.UndefReturn]
        return ret;

Link: https://lkml.kernel.org/r/20220522061826.1751-1-gautammenghani201@gmail.com

Cc: stable@vger.kernel.org
Fixes: 8993665abc ("tracing/boot: Support multiple handlers for per-event histogram")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Julia Lawall 50c697819d ftrace: Fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Link: https://lkml.kernel.org/r/20220521111145.81697-81-Julia.Lawall@inria.fr

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Li kunyu 3a2bfec0b0 ftrace: Remove return value of ftrace_arch_modify_*()
All instances of the function ftrace_arch_modify_prepare() and
  ftrace_arch_modify_post_process() return zero. There's no point in
  checking their return value. Just have them be void functions.

Link: https://lkml.kernel.org/r/20220518023639.4065-1-kunyu@nfschina.com

Signed-off-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
liqiong 2decd16f47 tracing: Cleanup code by removing init "char *name"
The pointer is assigned to "type->name" anyway. no need to
initialize with "preemption".

Link: https://lkml.kernel.org/r/20220513075221.26275-1-liqiong@nfschina.com

Signed-off-by: liqiong <liqiong@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
liqiong 2d601b9864 tracing: Change "char *" string form to "char []"
The "char []" string form declares a single variable. It is better
than "char *" which creates two variables in the final assembly.

Link: https://lkml.kernel.org/r/20220512143230.28796-1-liqiong@nfschina.com

Signed-off-by: liqiong <liqiong@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Daniel Bristot de Oliveira 9c556e5a4d tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ
There is no need to wakeup the timerlat/ thread if stop tracing is hit
at the timerlat's IRQ handler.

Return before waking up timerlat's thread.

Link: https://lkml.kernel.org/r/b392356c91b56aedd2b289513cc56a84cf87e60d.1652175637.git.bristot@kernel.org

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Daniel Bristot de Oliveira 4dd2aea24e tracing/timerlat: Print stacktrace in the IRQ handler if needed
If print_stack and stop_tracing_us are set, and stop_tracing_us is hit
with latency higher than or equal to print_stack, print the
stack at the IRQ handler as it is useful to define the root cause for
the IRQ latency.

Link: https://lkml.kernel.org/r/fd04530ce98ae9270e41bb124ee5bf67b05ecfed.1652175637.git.bristot@kernel.org

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Daniel Bristot de Oliveira aa748949b4 tracing/timerlat: Notify IRQ new max latency only if stop tracing is set
Currently, the notification of a new max latency is sent from
timerlat's IRQ handler anytime a new max latency is found.

While this behavior is not wrong, the send IPI overhead itself
will increase the thread latency and that is not the desired
effect (tracing overhead).

Moreover, the thread will notify a new max latency again because
the thread latency as it is always higher than the IRQ latency
that woke it up.

The only case in which it is helpful to notify a new max latency
from IRQ is when stop tracing (for the IRQ) is set, as in this
case, the thread will not be dispatched.

Notify a new max latency from the IRQ handler only if stop tracing is
set for the IRQ handler.

Link: https://lkml.kernel.org/r/2c2d9a56c0886c8402ba320de32856cbbb10c2bb.1652175637.git.bristot@kernel.org

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Reported-by: Clark Williams <williams@redhat.com>
Fixes: a955d7eac1 ("trace: Add timerlat tracer")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Masami Hiramatsu 4399404918 kprobes: Fix build errors with CONFIG_KRETPROBES=n
Max Filippov reported:

When building kernel with CONFIG_KRETPROBES=n kernel/kprobes.c
compilation fails with the following messages:

  kernel/kprobes.c: In function ‘recycle_rp_inst’:
  kernel/kprobes.c:1273:32: error: implicit declaration of function
                                   ‘get_kretprobe’

  kernel/kprobes.c: In function ‘kprobe_flush_task’:
  kernel/kprobes.c:1299:35: error: ‘struct task_struct’ has no member
                                   named ‘kretprobe_instances’

This came from the commit d741bf41d7 ("kprobes: Remove
kretprobe hash") which introduced get_kretprobe() and
kretprobe_instances member in task_struct when CONFIG_KRETPROBES=y,
but did not make recycle_rp_inst() and kprobe_flush_task()
depending on CONFIG_KRETPORBES.

Since those functions are only used for kretprobe, move those
functions into #ifdef CONFIG_KRETPROBE area.

Link: https://lkml.kernel.org/r/165163539094.74407.3838114721073251225.stgit@devnote2

Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Fixes: d741bf41d7 ("kprobes: Remove kretprobe hash")
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:59 -04:00
Wonhyuk Yang b27f266f74 tracing: Fix return value of trace_pid_write()
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.

    $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
    execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
    ...
    write(1, "123 \n", 5)                   = 4
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    ....

This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.

The result of applied patch is below.

    $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
    execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
    ...
    write(1, "123 \n", 5)                   = 5
    close(1)                                = 0

Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Baik Song An <bsahn@etri.re.kr>
Cc: Hong Yeon Kim <kimhy@etri.re.kr>
Cc: Taeung Song <taeung@reallinux.co.kr>
Cc: linuxgeek@linuxgeek.io
Cc: stable@vger.kernel.org
Fixes: 4909010788 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:59 -04:00
Keita Suzuki 99696a2592 tracing: Fix potential double free in create_var_ref()
In create_var_ref(), init_var_ref() is called to initialize the fields
of variable ref_field, which is allocated in the previous function call
to create_hist_field(). Function init_var_ref() allocates the
corresponding fields such as ref_field->system, but frees these fields
when the function encounters an error. The caller later calls
destroy_hist_field() to conduct error handling, which frees the fields
and the variable itself. This results in double free of the fields which
are already freed in the previous function.

Fix this by storing NULL to the corresponding fields when they are freed
in init_var_ref().

Link: https://lkml.kernel.org/r/20220425063739.3859998-1-keitasuzuki.park@sslab.ics.keio.ac.jp

Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
CC: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:59 -04:00
Yuntao Wang cb24693d94 tracing: Use strim() to remove whitespace instead of doing it manually
The tracing_set_trace_write() function just removes the trailing whitespace
from the user supplied tracer name, but the leading whitespace should also
be removed.

In addition, if the user supplied tracer name contains only a few
whitespace characters, the first one will not be removed using the current
method, which results it a single whitespace character left in the buf.

To fix all of these issues, we use strim() to correctly remove both the
leading and trailing whitespace.

Link: https://lkml.kernel.org/r/20220121095623.1826679-1-ytcoode@gmail.com

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:59 -04:00
Yuntao Wang 2889c658b2 ftrace: Deal with error return code of the ftrace_process_locs() function
The ftrace_process_locs() function may return -ENOMEM error code, which
should be handled by the callers.

Link: https://lkml.kernel.org/r/20220120065949.1813231-1-ytcoode@gmail.com

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:59 -04:00
Yuntao Wang e4931b824a tracing: Use trace_create_file() to simplify creation of tracefs entries
Creating tracefs entries with tracefs_create_file() followed by pr_warn()
is tedious and repetitive, we can use trace_create_file() to simplify
this process and make the code more readable.

Link: https://lkml.kernel.org/r/20220114131052.534382-1-ytcoode@gmail.com

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:52 -04:00
Darrick J. Wong efc2efeba1 xfs: move xfs_attr_use_log_assist usage out of libxfs
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that sets up logged xattr updates out of libxfs and into
xfs_xattr.c so that libxfs no longer has to know about xlog_* functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:34:04 +10:00
Darrick J. Wong d9c61ccb3b xfs: move xfs_attr_use_log_assist out of xfs_log.c
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that enables logged xattr updates out of "lib"xlog and into
xfs_xattr.c so that it no longer has to know about xlog_* functions.

While we're at it, give xfs_xattr.c its own header file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:33:29 +10:00
Darrick J. Wong 202865cc21 xfs: warn about LARP once per mount
Since LARP is an experimental debug-only feature, we should try to warn
about it being in use once per mount, not once per reboot.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:32:07 +10:00
Darrick J. Wong df5660cf63 xfs: implement per-mount warnings for scrub and shrink usage
Currently, we don't have a consistent story around logging when an
EXPERIMENTAL feature gets turned on at runtime -- online fsck and shrink
log a message once per day across all mounts, and the recently merged
LARP mode only ever does it once per insmod cycle or reboot.

Because EXPERIMENTAL tags are supposed to go away eventually, convert
the existing daily warnings into state flags that travel with the mount,
and warn once per mount.  Making this an opstate flag means that we'll
be able to capture the experimental usage in the ftrace output too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:31:34 +10:00
Darrick J. Wong 374037966d xfs: don't log every time we clear the log incompat flags
There's no need to spam the logs every time we clear the log incompat
flags -- if someone is periodically using one of these features, they'll
be cleared every time the log tries to clean itself, which can get
pretty chatty:

$ dmesg | grep -i clear
[ 5363.894711] XFS (sdd): Clearing log incompat feature flags.
[ 5365.157516] XFS (sdd): Clearing log incompat feature flags.
[ 5369.388543] XFS (sdd): Clearing log incompat feature flags.
[ 5371.281246] XFS (sdd): Clearing log incompat feature flags.

These aren't high value messages either -- nothing's gone wrong, and
nobody's trying anything tricky.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:29:51 +10:00
Linus Torvalds 7f50d4dfe8 OpenRISC updates for 5.19
Fixups and enhancements for OpenRISC:
  - A few sparse warning fixups and other cleanups I noticed when working
    on a recent TLB bug found on a new OpenRISC core bring up.
  - A few fixup's from me and Jason A Donenfeld to help shutdown OpenRISC
    platforms when running CI tests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAmKP+24ACgkQw7McLV5m
 J+Q59w//eiFQv/L//IPZ//fDhDKYILs0JYnJGmAY2IBFpI/m/KkSWOeo32L5qyXe
 IlvbxyrM3RVzDQVS/q1DG7vz9HuEI52asuefPEUeCmDSEcZH6g8ZzRHOrYqqDcM6
 nKBoi2CIv9BpfkOdoewgUdaMoijhIWsT5QSZpLjkfPZvc2E78tlHvWiaZffZAJiR
 XliE16XeuuKN6/vu5MNK2qExwsO9ayAiylv1K6bfN/0KkJEysY2FscRifHCbwscs
 bGi367C7XfHE8Tks31SyUDqNlD760FjjL+ygAMG63v20mswhQggwEb/dApjzHj1l
 MwrRBOG6CGYHE+AATsQLrd4N+fM9NNMfIARhtnbQYQlicAQpwGWNIPbpe1WOnBcw
 KfXaiJwUcntf3bfBLyfZVvWs0ZYMgfUD8NGX4nDtTomywdxSUlTPefj2KP7UKiaV
 NRwCOgZaT5iRzuJT832wCIRU75CjbFLvZ3JgMVwZtAlmGKTUopAikxvHTKRDDybn
 O9ghq1b34hJf0BmnPTWF0pELTu2z3Vpt8qrUdNtfy3ICNiMDJV0e3aUH1gFJ1LtA
 ++bh98wErqaxVsNzQXS7PUwiB2xmUB8YGMfJoQIN/spysJ3mAlOjVPxFKLVXHopi
 aHatlCvIV/Lc4ygt7no8PHtjvt8JckwCRpG0BLnrw09TokAQkp0=
 =JxMd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of https://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:

 - A few sparse warning fixups and other cleanups I noticed when working
   on a recent TLB bug found on a new OpenRISC core bring up.

 - A few fixup's from me and Jason A Donenfeld to help shutdown OpenRISC
   platforms when running CI tests

* tag 'for-linus' of https://github.com/openrisc/linux:
  openrisc: Allow power off handler overriding
  openrisc: Remove unused IMMU tlb workardound
  openrisc/fault: Fix symbol scope warnings
  openrisc/delay: Add include to fix symbol not declared warning
  openrisc/time: Fix symbol scope warnings
  openrisc/traps: Declare unhandled_exception for asmlinkage
  openrisc/traps: Remove die_if_kernel function
  openrisc/traps: Declare file scope symbols as static
  openrisc: Update litex defconfig to support glibc userland
  openrisc: Pretty print show_registers memory dumps
  openrisc: Add syscall details to emergency syscall debugging
  openrisc: Add support for liteuart emergency printing
  openrisc: Cleanup emergency print handling
  openrisc: Add gcc machine instruction flag configuration
  openrisc: define nop command for simulator reboot
  openrisc: remove bogus nops and shutdowns
  openrisc: fix typos in comments
2022-05-26 17:27:49 -07:00
Darrick J. Wong 910bbdf2f4 xfs: convert buf_cancel_table allocation to kmalloc_array
While we're messing around with how recovery allocates and frees the
buffer cancellation table, convert the allocation to use kmalloc_array
instead of the old kmem_alloc APIs, and make it handle a null return,
even though that's not likely.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:27:19 +10:00
Darrick J. Wong 8db074bd84 xfs: don't leak xfs_buf_cancel structures when recovery fails
If log recovery fails, we free the memory used by the buffer
cancellation buckets, but we don't actually traverse each bucket list to
free the individual xfs_buf_cancel objects.  This leads to a memory
leak, as reported by kmemleak in xfs/051:

unreferenced object 0xffff888103629560 (size 32):
  comm "mount", pid 687045, jiffies 4296935916 (age 10.752s)
  hex dump (first 32 bytes):
    08 d3 0a 01 00 00 00 00 08 00 00 00 01 00 00 00  ................
    d0 f5 0b 92 81 88 ff ff 80 64 64 25 81 88 ff ff  .........dd%....
  backtrace:
    [<ffffffffa0317c83>] kmem_alloc+0x73/0x140 [xfs]
    [<ffffffffa03234a9>] xlog_recover_buf_commit_pass1+0x139/0x200 [xfs]
    [<ffffffffa032dc27>] xlog_recover_commit_trans+0x307/0x350 [xfs]
    [<ffffffffa032df15>] xlog_recovery_process_trans+0xa5/0xe0 [xfs]
    [<ffffffffa032e12d>] xlog_recover_process_data+0x8d/0x140 [xfs]
    [<ffffffffa032e49d>] xlog_do_recovery_pass+0x19d/0x740 [xfs]
    [<ffffffffa032f22d>] xlog_do_log_recovery+0x6d/0x150 [xfs]
    [<ffffffffa032f343>] xlog_do_recover+0x33/0x1d0 [xfs]
    [<ffffffffa032faba>] xlog_recover+0xda/0x190 [xfs]
    [<ffffffffa03194bc>] xfs_log_mount+0x14c/0x360 [xfs]
    [<ffffffffa030bfed>] xfs_mountfs+0x50d/0xa60 [xfs]
    [<ffffffffa03124b5>] xfs_fs_fill_super+0x6a5/0x950 [xfs]
    [<ffffffff812b92a5>] get_tree_bdev+0x175/0x280
    [<ffffffff812b7c3a>] vfs_get_tree+0x1a/0x80
    [<ffffffff812e366f>] path_mount+0x6ff/0xaa0
    [<ffffffff812e3b13>] __x64_sys_mount+0x103/0x140

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:26:38 +10:00