Commit Graph

1296460 Commits

Author SHA1 Message Date
Johannes Weiner
b82b530740 mm: vmscan: restore incremental cgroup iteration
Currently, reclaim always walks the entire cgroup tree in order to ensure
fairness between groups.  While overreclaim is limited in shrink_lruvec(),
many of our systems have a sizable number of active groups, and an even
bigger number of idle cgroups with cache left behind by previous jobs; the
mere act of walking all these cgroups can impose significant latency on
direct reclaimers.

In the past, we've used a save-and-restore iterator that enabled
incremental tree walks over multiple reclaim invocations.  This ensured
fairness, while keeping the work of individual reclaimers small.

However, in edge cases with a lot of reclaim concurrency, individual
reclaimers would sometimes not see enough of the cgroup tree to make
forward progress and (prematurely) declare OOM.  Consequently we switched
to comprehensive walks in 1ba6fc9af3 ("mm: vmscan: do not share cgroup
iteration between reclaimers").

To address the latency problem without bringing back the premature OOM
issue, reinstate the shared iteration, but with a restart condition to do
the full walk in the OOM case - similar to what we do for memory.low
enforcement and active page protection.

In the worst case, we do one more full tree walk before declaring
OOM. But the vast majority of direct reclaim scans can then finish
much quicker, while fairness across the tree is maintained:

- Before this patch, we observed that direct reclaim always takes more
  than 100us and most direct reclaim time is spent in reclaim cycles
  lasting between 1ms and 1 second. Almost 40% of direct reclaim time
  was spent on reclaim cycles exceeding 100ms.

- With this patch, almost all page reclaim cycles last less than 10ms,
  and a good amount of direct page reclaim finishes in under 100us. No
  page reclaim cycles lasting over 100ms were observed anymore.

The shared iterator state is maintained inside the target cgroup, so
fair and incremental walks are performed during both global reclaim
and cgroup limit reclaim of complex subtrees.
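
As an illustration only (this is not the kernel implementation), the idea of
a shared cursor with a restart for the OOM case can be sketched in userspace
C. The flat array of groups, struct shared_iter, take() and reclaim() are
hypothetical names made up for this sketch; the real code walks the cgroup
hierarchy rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: a flat array models the cgroup subtree. */
struct group {
	long reclaimable;	/* pages this group could give back */
};

/* Cursor shared by all reclaimers that target the same subtree. */
struct shared_iter {
	size_t pos;
};

/* Take up to `want` pages from one group. */
static long take(struct group *grp, long want)
{
	long got = grp->reclaimable < want ? grp->reclaimable : want;

	grp->reclaimable -= got;
	return got;
}

/*
 * Resume at the shared cursor and stop as soon as `target` pages were
 * reclaimed.  If that pass started mid-tree and came up short, do one
 * full walk from the start before reporting failure.
 * Returns the number of pages reclaimed; *visited counts groups touched.
 */
static long reclaim(struct group *groups, size_t n, struct shared_iter *it,
		    long target, size_t *visited)
{
	bool partial = it->pos != 0;	/* did we start mid-tree? */
	long got = 0;

	assert(target > 0);
	*visited = 0;

	/* Incremental pass: continue where the previous reclaimer stopped. */
	while (it->pos < n && got < target) {
		got += take(&groups[it->pos++], target - got);
		(*visited)++;
	}
	if (it->pos == n)
		it->pos = 0;	/* wrap around for the next reclaimer */

	if (got >= target || !partial)
		return got;	/* success, or the pass was already a full walk */

	/* Restart condition: walk every group once before giving up. */
	for (size_t i = 0; i < n && got < target; i++) {
		got += take(&groups[i], target - got);
		(*visited)++;
	}
	return got;
}
```

In this sketch a reclaimer that finds enough memory in the first group it
visits touches only that group, while a reclaimer that starts mid-tree and
finds nothing still gets to see every group before it reports failure.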

Link: https://lkml.kernel.org/r/20240514202641.2821494-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Facebook Kernel Team <kernel-team@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:53 -07:00
Ran Xiaokai
7f83bf1460 mm/huge_memory: mark racy access on huge_anon_orders_always
huge_anon_orders_always is accessed locklessly, so it is better to use the
READ_ONCE() wrapper.  This does not fix any visible bug, but it should
silence some KCSAN complaints in the future.  Do the same for
huge_anon_orders_madvise.
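
A minimal userspace sketch of what such a marked access looks like, assuming
a simplified stand-in for the kernel macro. READ_ONCE_SKETCH(),
orders_always and order_enabled() are hypothetical names for illustration;
the kernel's real READ_ONCE() is more elaborate, and __typeof__ is a
GCC/Clang extension.

```c
#include <assert.h>

/*
 * Simplified stand-in for READ_ONCE(): the volatile cast asks the compiler
 * to perform exactly one load of x instead of caching or repeating it.
 * Only meant for scalar types in this sketch.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical bitmask that other threads may update without a lock. */
static unsigned long orders_always;

/* Return 1 if bit `order` is set in a single snapshot of the mask. */
static int order_enabled(int order)
{
	assert(order >= 0 && order < (int)(8 * sizeof(unsigned long)));

	unsigned long snapshot = READ_ONCE_SKETCH(orders_always);

	return (int)((snapshot >> order) & 1UL);
}
```

Working on a local snapshot means every later test in the function sees the
same value, even if a writer changes the global in the meantime.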

Link: https://lkml.kernel.org/r/20240515104754889HqrahFPePOIE1UlANHVAh@zte.com.cn
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:53 -07:00
Kefeng Wang
6f775463d0 mm: shmem: use folio_alloc_mpol() in shmem_alloc_folio()
Let's change shmem_alloc_folio() to take an order and use the
folio_alloc_mpol() helper, then use it directly for both normal and large
folios to clean up the code.

Link: https://lkml.kernel.org/r/20240515070709.78529-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:53 -07:00
Kefeng Wang
1d9cb7852b mm: mempolicy: use folio_alloc_mpol() in alloc_migration_target_by_mpol()
Convert to folio_alloc_mpol() so that alloc_migration_target_by_mpol()
uses folios throughout.

Link: https://lkml.kernel.org/r/20240515070709.78529-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:53 -07:00
Kefeng Wang
3174d70cf6 mm: mempolicy: use folio_alloc_mpol_noprof() in vma_alloc_folio_noprof()
Convert to folio_alloc_mpol_noprof() so that vma_alloc_folio_noprof()
uses folios throughout.

Link: https://lkml.kernel.org/r/20240515070709.78529-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:52 -07:00
Kefeng Wang
a19621ed4e mm: add folio_alloc_mpol()
Patch series "mm: convert to folio_alloc_mpol()".

This patch (of 4):

This adds a new folio_alloc_mpol() helper, which is like folio_alloc() but
allocates the folio according to the NUMA mempolicy.

Link: https://lkml.kernel.org/r/20240515070709.78529-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20240515070709.78529-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:52 -07:00
Oscar Salvador
6584a14a37 mm/hugetlb: drop node_alloc_noretry from alloc_fresh_hugetlb_folio
Since commit d67e32f267 ("hugetlb: restructure pool allocations"), the
parameter node_alloc_noretry from alloc_fresh_hugetlb_folio() is not used,
so drop it.

Link: https://lkml.kernel.org/r/20240516081035.5651-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:52 -07:00
Illia Ostapyshyn
0ba5e806e1 mm/vmscan: update stale references to shrink_page_list
Commit 49fd9b6df5 ("mm/vmscan: fix a lot of comments") renamed
shrink_page_list() to shrink_folio_list().  Fix up the remaining
references to the old name in comments and documentation.

Link: https://lkml.kernel.org/r/20240517091348.1185566-1-illia@yshyn.com
Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:52 -07:00
Thomas Weißschuh
525c303049 mm/hugetlb: constify ctl_table arguments of utility functions
The sysctl core is preparing to only expose instances of struct ctl_table
as "const".  This will also affect the ctl_table argument of sysctl
handlers.

As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent, that change will be done in one commit.

To reduce the size of that final commit, switch utility functions which
are not bound by "typedef proc_handler" to "const struct ctl_table".

No functional change.

Link: https://lkml.kernel.org/r/20240518-sysctl-const-handler-hugetlb-v1-1-47e34e2871b2@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:52 -07:00
Jakub Kicinski
e19f67df9c Merge branch 'selftests-openvswitch-address-some-flakes-in-the-ci-environment'
Aaron Conole says:

====================
selftests: openvswitch: Address some flakes in the CI environment

These patches aim to make using the openvswitch testsuite more reliable.
These should address the major sources of flakiness in the openvswitch
test suite allowing the CI infrastructure to exercise the openvswitch
module for patch series.  There should be no change for users who simply
run the tests (except that patch 3/3 does make some of the debugging a bit
easier by making some output more verbose).
====================

Link: https://patch.msgid.link/20240702132830.213384-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:29:18 -07:00
Aaron Conole
7abfd8ecb7 selftests: openvswitch: Be more verbose with selftest debugging.
The openvswitch selftest is difficult to debug for anyone that isn't
directly familiar with the openvswitch module and the specifics of the
test cases.  Many times when something fails, the debug log will be
sparsely populated and it takes some time to understand where a failure
occurred.

Increase the amount of detail logged to the debug log by trapping all
'info' logs, and all 'ovs_sbx' commands.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-4-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:29:15 -07:00
Aaron Conole
818481db3d selftests: openvswitch: Attempt to autoload module.
Previously, the openvswitch.sh test suites would not attempt to autoload
the openvswitch module.  The idea was that a user who is manually running
tests might not even have the OVS module loaded or configured for their
own development.  However, if the kernel module is configured, and the
module can be autoloaded then we should just attempt to load it and run
the tests.  This is especially true in the CI environments, where the CI
tests should be able to rely on auto loading to get the test suite running.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-3-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:29:15 -07:00
Aaron Conole
ff015706fc selftests: openvswitch: Bump timeout to 15 minutes.
We found that since some tests rely on the TCP SYN timeouts to cause flow
misses, the default test suite timeout of 45 seconds is quickly
exceeded.  Bump the timeout to 15 minutes.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-2-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:29:15 -07:00
Jakub Kicinski
1a16cdf77e net: ethtool: fix compat with old RSS context API
The device driver gets access to rxfh_dev, while rxfh is just a local
copy of the user space params. We need to check what RSS context ID the
driver assigned in rxfh_dev, not rxfh.

Using rxfh leads to trying to store all contexts at index 0xffffffff.
From the user's perspective, this causes "driver chose duplicate ID"
warnings when a second context is added, and makes it impossible to access
any contexts even though they were successfully created: xa_load() for
the actual context ID will return NULL, and the syscall will return -ENOENT.

Looks like a rebasing mistake, since rxfh_dev was added relatively
recently by commit fb6e30a725 ("net: ethtool: pass a pointer to
parameters to get/set_rxfh ethtool ops").

Fixes: eac9122f0c ("net: ethtool: record custom RSS contexts in the XArray")
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20240702164157.4018425-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:22:17 -07:00
Radu Rendec
9a0c28efee net: rswitch: Avoid use-after-free in rswitch_poll()
The use-after-free is actually in rswitch_tx_free(), which is inlined in
rswitch_poll(). Since `skb` and `gq->skbs[gq->dirty]` are in fact the
same pointer, the skb is first freed using dev_kfree_skb_any(), then the
value in skb->len is used to update the interface statistics.

Let's move around the instructions to use skb->len before the skb is
freed.
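
The reordering can be illustrated with a small userspace sketch. struct pkt,
struct stats and tx_free_one() are hypothetical stand-ins invented here, and
free() takes the place of dev_kfree_skb_any(); this is not the driver code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the skb and the interface statistics. */
struct pkt {
	unsigned int len;
};

struct stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

/* Release one completed TX slot and account for it. */
static void tx_free_one(struct pkt **slot, struct stats *st)
{
	struct pkt *p = *slot;

	assert(p != NULL);

	/* Read p->len while p is still valid ... */
	st->tx_packets++;
	st->tx_bytes += p->len;

	/* ... and only then free the packet and clear the slot. */
	free(p);
	*slot = NULL;
}
```

Clearing the slot after the free also makes any later accidental use of the
stale pointer fail loudly instead of reading freed memory.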

This bug is trivial to reproduce using KFENCE. It will trigger a splat
every few packets. A simple ARP request or ICMP echo request is enough.

Fixes: 271e015b91 ("net: rswitch: Add unmap_addrs instead of dma address in each desc")
Signed-off-by: Radu Rendec <rrendec@redhat.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20240702210838.2703228-1-rrendec@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:15:22 -07:00
Jakub Kicinski
0b8774586b selftests: drv-net: rss_ctx: allow more noise on default context
As predicted by David, running the test on a machine with a single
interface is a bit unreliable. We try to send 20k packets with
iperf and expect fewer than 10k packets on the default context.
The test isn't very quick, so iperf will usually have sent 100k packets
by the time we stop it. So we're off by 5x on the number of iperf
packets but still expect the default context to only get the hardcoded
10k. The intent is to make sure we get noticeably less traffic
on the default context. Use half of the resulting iperf traffic
instead of the hardcoded 10k.
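
The change of threshold amounts to the following check, sketched here in C
although the selftest itself is not written in C. The function name and its
two parameters are hypothetical; which counters the test actually compares
is not spelled out in the message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Relative threshold: the default context passes if it carried fewer than
 * half of the packets that were actually sent, however many that was.
 */
static bool default_ctx_quiet_enough(unsigned long sent_pkts,
				     unsigned long default_ctx_pkts)
{
	assert(sent_pkts > 0);
	return default_ctx_pkts < sent_pkts / 2;
}
```

With 100k packets sent, 30k packets on the default context would exceed a
fixed 10k limit but stay below the relative limit of 50k.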

Link: https://patch.msgid.link/20240702233728.4183387-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:13:59 -07:00
Paolo Abeni
8c5a9f290e tools: ynl: use ident name for Family, too.
This allows a consistent naming convention between Family and the
names of other elements.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/9bbcab3094970b371bd47aa18481ae6ca5a93687.1719930479.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:13:20 -07:00
Boris Burkov
a56c85fa2d btrfs: fix folio refcount in __alloc_dummy_extent_buffer()
Another improper use of __folio_put() in an error path after freshly
allocating pages/folios which returns them with the refcount initialized
to 1. The refactor from __free_pages() -> __folio_put() (instead of
folio_put) removed a refcount decrement found in __free_pages() and
folio_put but absent from __folio_put().

Fixes: 13df3775ef ("btrfs: cleanup metadata page pointer usage")
CC: stable@vger.kernel.org # 6.8+
Tested-by: Ed Tomlinson <edtoml@gmail.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-04 02:19:10 +02:00
Boris Burkov
da0386c1c7 btrfs: fix folio refcount in btrfs_do_encoded_write()
The conversion to folios switched __free_page() to __folio_put() in the
error path in btrfs_do_encoded_write().

However, this gets the page refcounting wrong. If we do hit that error
path (I reproduced by modifying btrfs_do_encoded_write to pretend to
always fail in a way that jumps to out_folios and running the fstests
case btrfs/281), then we always hit the following BUG freeing the folio:

  BUG: Bad page state in process btrfs  pfn:40ab0b
  page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x61be5 pfn:0x40ab0b
   flags: 0x5ffff0000000000(node=0|zone=2|lastcpupid=0x1ffff)
  raw: 05ffff0000000000 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000061be5 0000000000000000 00000001ffffffff 0000000000000000
  page dumped because: nonzero _refcount
  Call Trace:
  <TASK>
  dump_stack_lvl+0x3d/0xe0
  bad_page+0xea/0xf0
  free_unref_page+0x8e1/0x900
  ? __mem_cgroup_uncharge+0x69/0x90
  __folio_put+0xe6/0x190
  btrfs_do_encoded_write+0x445/0x780
  ? current_time+0x25/0xd0
  btrfs_do_write_iter+0x2cc/0x4b0
  btrfs_ioctl_encoded_write+0x2b6/0x340

It turns out __free_page() decreases the page reference count while
__folio_put() does not. Switch __folio_put() to folio_put() which
decreases the folio reference count first.
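
The difference between the two calls, as described above, can be modelled in
a few lines of userspace C. struct obj, release_raw() and put() are
hypothetical names for this model only and do not mirror the kernel's
folio code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object. */
struct obj {
	int refcount;
	bool freed;
};

/*
 * Models the behaviour attributed to __folio_put(): it frees the object
 * but does not drop a reference itself, so the count must already be 0.
 * Returns false for the "nonzero _refcount" situation.
 */
static bool release_raw(struct obj *o)
{
	if (o->refcount != 0)
		return false;
	o->freed = true;
	return true;
}

/*
 * Models the behaviour attributed to folio_put(): drop one reference
 * first, and free only when that was the last one.
 */
static bool put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		return release_raw(o);
	return false;
}
```

A freshly allocated object starts with a count of 1, so calling
release_raw() on it directly fails, while put() drops the count to 0 and
then frees it.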

Fixes: 400b172b8c ("btrfs: compression: migrate compression/decompression paths to folios")
Tested-by: Ed Tomlinson <edtoml@gmail.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-04 02:18:45 +02:00
Nicolas Schier
608c3b1e61 perf install: Don't propagate subdir to Documentation submake
Explicitly reset 'subdir' variable when descending to
tools/perf/Documentation.  Similar to commit f89fb55714 ("perf build:
Don't propagate subdir to submakes for install_headers", 2023-01-02),
calling the 'tools/perf_install' target via the top-level Makefile results
in repeated subdir components when attempting to call the perf
documentation installation rules:

    $ make tools/perf_install NO_LIBTRACEEVENT=1 JOBS=1
    [...]
    /bin/sh: 1: cd: can't cd to /data/linux/kbuild/tools/perf/tools/perf/
    ../../scripts/Makefile.include:17: *** output directory "/data/linux/kbuild/tools/perf/tools/perf/" does not exist.  Stop.
    make[5]: *** [Makefile.perf:1096: try-install-man] Error 2
    make[4]: *** [Makefile.perf:264: sub-make] Error 2
    make[3]: *** [Makefile:113: install] Error 2
    make[2]: *** [Makefile:131: perf_install] Error 2

Resetting 'subdir' fixes the call from the top-level Makefile.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Nicolas Schier <n.schier@avm.de>
Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Tested-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/20240523-make-tools-perf-install-v1-1-3903499e637f@avm.de
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-03 17:11:38 -07:00
Takashi Sakamoto
526e21a2aa firewire: ohci: add tracepoints event for data of Self-ID DMA
In 1394 OHCI, the SelfIDComplete event occurs when the hardware has
finished transferring all of the Self-ID packets received during the bus
initialization process to the host memory by DMA.

This commit adds a tracepoint event for it, to trace the timing
and packet data of Self-ID DMA. It is one of the following tracepoint
events, which help debug events at bus reset, e.g. the issue addressed
by commit d0b06dc48f ("firewire: core: use long bus reset on gap count
error")[1]:

* firewire_ohci:irqs
* firewire_ohci:self_id_complete
* firewire:bus_reset_handle
* firewire:self_id_sequence

They would also be helpful for problems with the invocation timing of
hardIRQ and process (workqueue) contexts. This kind of problem is often
seen with the -rt kernel[2].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0b06dc48fb1
[2] https://lore.kernel.org/linux-rt-users/YAwPoaUZ1gTD5y+k@hmbx/

Link: https://lore.kernel.org/r/20240702222034.1378764-6-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-07-04 09:07:14 +09:00
Xu Yang
3710578d2d perf vendor events arm64: Add i.MX95 DDR Performance Monitor metrics
Add JSON metrics for i.MX95 DDR Performance Monitor.

Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Cc: festevam@gmail.com
Cc: conor+dt@kernel.org
Cc: robh+dt@kernel.org
Cc: shawnguo@kernel.org
Cc: will@kernel.org
Cc: krzysztof.kozlowski+dt@linaro.org
Cc: mike.leach@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: imx@lists.linux.dev
Cc: kernel@pengutronix.de
Cc: s.hauer@pengutronix.de
Cc: devicetree@vger.kernel.org
Link: https://lore.kernel.org/r/20240529080358.703784-8-xu.yang_2@nxp.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-03 16:46:05 -07:00
Xu Yang
2697b79a46 perf vendor events arm64: Add i.MX93 DDR Performance Monitor metrics
Add JSON metrics for i.MX93 DDR Performance Monitor.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Cc: festevam@gmail.com
Cc: conor+dt@kernel.org
Cc: robh+dt@kernel.org
Cc: shawnguo@kernel.org
Cc: will@kernel.org
Cc: krzysztof.kozlowski+dt@linaro.org
Cc: mike.leach@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: imx@lists.linux.dev
Cc: john.g.garry@oracle.com
Cc: kernel@pengutronix.de
Cc: s.hauer@pengutronix.de
Cc: devicetree@vger.kernel.org
Link: https://lore.kernel.org/r/20240529080358.703784-7-xu.yang_2@nxp.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-03 16:46:05 -07:00
Takashi Sakamoto
4a13617ef3 firewire: ohci: use inline functions to operate data of self-ID DMA
The code of the 1394 OHCI driver includes hard-coded magic numbers to
operate on the data of Self-ID DMA.

This commit replaces them with the inline functions added and tested in
the previous commit.

Link: https://lore.kernel.org/r/20240702222034.1378764-5-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-07-04 08:12:48 +09:00
Takashi Sakamoto
7a14f78d70 firewire: ohci: add static inline functions to deserialize for Self-ID DMA operation
The Self-ID is one type of DMA defined in the 1394 OHCI specification. It
is operated by two registers and one interrupt, and has one buffer format.

This commit adds some static inline functions to deserialize the data in
the buffer and registers. Some KUnit tests are also added to check their
reliability.

Link: https://lore.kernel.org/r/20240702222034.1378764-4-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-07-04 08:12:48 +09:00
Takashi Sakamoto
c538b06de6 firewire: ohci: use static function to handle endian issue on PowerPC platform
It is preferable to use a static function instead of a function-like macro
in some places. The argument types are checked, yet the compiler can still
inline the code instead of emitting function calls.

This commit replaces the function-like macro with a static function.
Additionally, it refactors quirk detection to ease later work.

Link: https://lore.kernel.org/r/20240702222034.1378764-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-07-04 08:12:48 +09:00
Takashi Sakamoto
f26a38e61c firewire: ohci: use common macro to interpret be32 data in le32 buffer
The 1394 OHCI driver configures the hardware to transfer the data quadlets
of a packet via DMA after converting them to little endian, therefore the
data is typed as __le32. Nevertheless, some actual hardware ignores the
configuration. In that case, the data in the DMA buffer is laid out as big
endian (__be32).

For that case on big-endian machines, the driver includes the following
conversion from __le32 to u32 (host-endian, i.e. __be32):

    * (__force __u32)(v)

In include/linux/byteorder/generic.h, be32_to_cpu() is available. It is
expanded to the following expression in
'include/uapi/linux/byteorder/big_endian.h':

    * (__force __u32)(__be32)(x)

This commit replaces the ad-hoc endian interpretation with the above.
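
For readers outside the kernel, the effect of a be32-to-host conversion can
be sketched portably in userspace C. be32_buf_to_host() is a hypothetical
helper written for this illustration; it is not the kernel's be32_to_cpu()
and it does not model the __force/sparse annotations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Interpret the four bytes at buf as a big-endian quadlet and return the
 * value in host byte order.  On a big-endian host this is equivalent to a
 * plain load, which matches the cast-only expansion quoted above.
 */
static uint32_t be32_buf_to_host(const void *buf)
{
	uint8_t b[4];

	assert(buf != NULL);
	memcpy(b, buf, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

Going through a named conversion instead of an ad-hoc cast documents which
byte order the buffer really holds, independent of how the field is typed.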

Link: https://lore.kernel.org/r/20240702222034.1378764-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-07-04 08:12:48 +09:00
Konstantin Ryabitsev
127734e23a Documentation: best practices for using Link trailers
Based on multiple conversations, most recently on the ksummit mailing
list [1], add some best practices for using the Link trailer, such as:

- how to use markdown-like bracketed numbers in the commit message to
  indicate the corresponding link
- when to use lore.kernel.org vs patch.msgid.link domains

Cc: ksummit@lists.linux.dev
Link: https://lore.kernel.org/20240617-arboreal-industrious-hedgehog-5b84ae@meerkat # [1]
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240619-docs-patch-msgid-link-v2-2-72dd272bfe37@linuxfoundation.org
2024-07-03 16:59:08 -06:00
Konstantin Ryabitsev
413e775efa Documentation: fix links to mailing list services
There have been some changes to the way mailing lists are hosted at
kernel.org. This patch does the following:

1. fixes links that are pointing at the outdated resources
2. removes an outdated patchbomb admonition

We still don't particularly want or welcome huge patchbombs, but they
are less likely to overload our systems.

Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Reviewed-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240619-docs-patch-msgid-link-v2-1-72dd272bfe37@linuxfoundation.org
2024-07-03 16:52:54 -06:00
Li Zhijian
b393590992 Documentation: exception-tables.rst: Fix the wrong steps referenced
When it was in text format, it correctly hardcoded steps 8a to 8c.
However, after it was converted to RST, the sequence numbers were
auto-generated during rendering and became incorrect after some
steps were inserted.

Change it to refer to steps a to c in a relative way.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
[jc: Indented the line to make the relative reference more clear]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614010028.48262-1-lizhijian@fujitsu.com
2024-07-03 16:50:47 -06:00
Marc Zyngier
3cfde36df7 KVM: arm64: nv: Truly enable nXS TLBI operations
Although we now have support for nXS-flavoured TLBI instructions,
we still don't expose the feature to the guest thanks to a mixture
of misleading comment and use of a bunch of magic values.

Fix the comment and correctly express the masking of LS64, which
is enough to expose nXS to the world. Not that anyone cares...

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240703154743.824824-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-03 22:46:14 +00:00
Dongliang Mu
0e5fbf627f docs/zh_CN: add process/researcher-guidelines Chinese translation
Finish the translation of researcher-guidelines and add it to the
index file.

Update to commit 27103dddc2 ("Documentation: update mailing list
addresses")

Reviewed-by: Alex Shi <alexs@kernel.org>
Signed-off-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614032211.241899-1-dzm91@hust.edu.cn
2024-07-03 16:41:26 -06:00
Jiri Kastner
b38fdfebba Documentation/tools/rv: fix document header
Align the document header with the filename and the rest of the content.

Signed-off-by: Jiri Kastner <cz172638@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240626203906.191841-1-cz172638@gmail.com
2024-07-03 16:36:21 -06:00
Carlos Bilbao
96408beeef docs/sp_SP: Add translation of process/maintainer-kvm-x86.rst
Translate Documentation/process/maintainer-kvm-x86.rst into Spanish.

Co-developed-by: Juan Embid <jembid@ucm.es>
Signed-off-by: Juan Embid <jembid@ucm.es>
Signed-off-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
[jc: fixed apply- and build-time warnings]
Link: https://lore.kernel.org/r/20240626221942.2780668-1-carlos.bilbao.osdev@gmail.com
2024-07-03 16:34:59 -06:00
Florian Westphal
9f6958ba2e netfilter: nf_tables: unconditionally flush pending work before notifier
syzbot reports:

KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
KASAN: slab-uaf in nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
[..]
Workqueue: events nf_tables_trans_destroy_work
Call Trace:
 nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
 nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
 nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597

The problem is that the notifier does a conditional flush, but it's
possible that the table-to-be-removed is still referenced by transactions
being processed by the worker, so we need to flush unconditionally.

We could make the flush_work depend on whether we found a table to delete
in nf-next to avoid the flush for most cases.

AFAICS this problem is only exposed in nf-next, by
commit e169285f8c ("netfilter: nf_tables: do not store nft_ctx in transaction objects"):
with that commit applied there is an unconditional fetch of
table->family, which is what triggers the above splat.

Fixes: 2c9f029328 ("netfilter: nf_tables: flush pending destroy work before netlink notifier")
Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-04 00:28:27 +02:00
Daniel Watson
6b2fa426df docs/admin-guide/mm: correct typo 'quired' to 'queried'
Convert the word "quired" to the word "queried" which makes more
sense in this context.

Signed-off-by: Daniel Watson <ozzloy@each.do>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/878qymrjrg.fsf@trent-reznor
2024-07-03 16:22:36 -06:00
Dmitry Torokhov
df472c2b69 Add libps2 to the input section of driver-api
libps2 has been using kerneldoc to document its methods, but was not
actually plugged into driver-api.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/ZoMQhkyUQYi1Bx4t@google.com
2024-07-03 16:20:49 -06:00
SeongJae Park
d436a97181 Docs/mm/index: move allocation profiling document to unsorted documents chapter
The memory allocation profiling document was added to the bottom of the
new outline.  Apparently this was not decided by well-defined guidelines
or a thorough discussion.  Rather, it was added there just because
there was no place for such unsorted documents.  Now there is such a
chapter.  Move the document to the new place.

Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240701190512.49379-5-sj@kernel.org
2024-07-03 16:19:15 -06:00
SeongJae Park
9472274c92 Docs/mm/index: rename 'Legacy Documentation' to 'Unsorted Documentation'
The intention of the 'Legacy Documentation' chapter is to keep the old
documents that are not yet sorted into the new outline, and to encourage
new documents to be integrated into the new outline from the beginning.

However, the new outline will take some more time to be completed.  It
was started about two years ago, and many parts are still not yet
written.  Also, there is no clear guideline for placing each document in
all cases, not only for the 'legacy' documents, but also for new
documents.  For example, the memory allocation profiling document has been
added to the bottom of the new outline.  Apparently it was not following
some well-defined guidelines or the result of a discussion.

Furthermore, the title ("legacy") makes people feel the documents in the
chapter might be outdated or not actively maintained.

Rename 'Legacy Documentation' to 'Unsorted Documentation' and remove the
description saying it is for 'older' documents.  After this change, new
documents whose place in the new outline is not clear enough can be
added to the chapter while well-defined guidelines or a discussion
for the new outline are worked out.

Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240701190512.49379-4-sj@kernel.org
2024-07-03 16:19:15 -06:00
SeongJae Park
8c678c9ca7 Docs/mm/index: Remove 'Memory Management Guide' chapter marker
The 'Memory Management Guide' chapter aims to be not an additional chapter
of the document, but the ultimate single outline of the document.  In
that sense, marking it as a chapter under the document makes no sense,
and the rendered document looks odd.  Remove the chapter marker.

Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240701190512.49379-3-sj@kernel.org
2024-07-03 16:19:15 -06:00
SeongJae Park
51c702b0ae Docs/mm/allocation-profiling: mark 'Theory of operation' as chapter
The 'Theory of operation' part of the allocation-profiling document is
apparently a chapter.  However, it is mistakenly marked as a document
title.  As a result, the rendered mm document index page shows two items
for the document.  Fix it to be marked as a chapter.

Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240701190512.49379-2-sj@kernel.org
2024-07-03 16:19:15 -06:00
Piotr Wojtaszczyk
f63b94be69 i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr
When del_timer_sync() is called in interrupt context it throws a warning
because of a potential deadlock. The timer is used only to exit from
wait_for_completion() after a timeout, so replacing the call with
wait_for_completion_timeout() allows removing the problematic timer and
its related functions altogether.

Fixes: 41561f28e7 ("i2c: New Philips PNX bus driver")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk@timesys.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-07-04 00:17:47 +02:00
Ian Rogers
1059fb5291 perf dsos: When adding a dso into sorted dsos maintain the sort order
dsos__add would add at the end of the dso array possibly requiring a
later find to re-sort the array. Patterns of find then add were
becoming O(n*log n) due to the sorts. Change the add routine to be
O(n) rather than O(1) but to maintain the sorted-ness of the dsos
array so that later finds don't need the O(n*log n) sort.
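
The trade-off described above can be illustrated with a minimal sketch of
an order-preserving insert. It is not the perf code: sorted_insert() and
the plain int array are hypothetical stand-ins for the dsos array and its
comparison function. The insert costs O(n) because of the shift, but the
array never needs a later O(n*log n) re-sort.

```c
#include <stddef.h>
#include <string.h>

/*
 * Insert `value` into the sorted array `arr` holding `*len` elements,
 * keeping it sorted. The caller guarantees room for one more element.
 * Hypothetical helper for illustration; not the perf implementation.
 */
void sorted_insert(int *arr, size_t *len, int value)
{
	size_t lo = 0, hi = *len;

	/* Binary search for the first element greater than value. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (arr[mid] <= value)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Shift the tail up by one slot: O(n) in the worst case. */
	memmove(&arr[lo + 1], &arr[lo], (*len - lo) * sizeof(arr[0]));
	arr[lo] = value;
	(*len)++;
}
```

Because the array stays sorted after every insert, a subsequent lookup can
binary-search it directly.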

Fixes: 3f4ac23a99 ("perf dsos: Switch backing storage to array from rbtree/list")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Link: https://lore.kernel.org/r/20240703172117.810918-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-03 15:02:53 -07:00
Ian Rogers
feaaa8be0b perf comm str: Avoid sort during insert
The array is sorted, so just move the elements and insert in order.

Fixes: 13ca628716 ("perf comm: Add reference count checking to 'struct comm_str'")
Reported-by: Matt Fleming <matt@readmodwrite.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Matt Fleming <matt@readmodwrite.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240703172117.810918-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-03 14:59:15 -07:00
Linus Torvalds
795c58e4c7 Merge tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
 "Fix ioctl conflict with memmapped ring buffer ioctl

  It was reported that the ioctl() number used to update the ring buffer
  memory mapping conflicted with the TCGETS ioctl causing strace to
  report:

    $ strace -e ioctl stty
    ioctl(0, TCGETS or TRACE_MMAP_IOCTL_GET_READER, {c_iflag=ICRNL|IXON, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0

  Since this ioctl hasn't been in a full release yet, change it from
  "T", 0x1 to "R" 0x20, and also reserve 0x20-0x2F for future ioctl
  commands, as some more are being worked on for the future"

* tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F
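
The conflict reported above can be reproduced with a short sketch,
assuming the asm-generic ioctl encoding, where a command without an
argument is (type << 8) | nr, and the asm-generic TCGETS value of 0x5401.
The SKETCH_* names are hypothetical and exist only for this illustration;
some architectures encode ioctl numbers differently.

```c
/*
 * Sketch of the asm-generic ioctl number encoding for commands that
 * carry no argument (direction and size fields are zero).
 * SKETCH_IO is a hypothetical stand-in for the kernel's _IO() macro.
 */
#define SKETCH_IO(type, nr) \
	(((unsigned int)(type) << 8) | (unsigned int)(nr))

/* TCGETS as defined in asm-generic/ioctls.h. */
#define SKETCH_TCGETS		0x5401u

/* Old number, "T" 0x1: 'T' is 0x54, so this equals TCGETS. */
#define SKETCH_OLD_GET_READER	SKETCH_IO('T', 0x1)

/* New number, "R" 0x20: 'R' is 0x52, outside the tty 'T' range. */
#define SKETCH_NEW_GET_READER	SKETCH_IO('R', 0x20)
```

Under these assumptions the old number is bit-for-bit identical to TCGETS,
which is why strace could not tell the two apart.
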
2024-07-03 14:54:35 -07:00
Wei Liu
fea93a3e5d PCI: hv: Return zero, not garbage, when reading PCI_INTERRUPT_PIN
The intent of the code snippet is to always return 0 for both
PCI_INTERRUPT_LINE and PCI_INTERRUPT_PIN.

The check misses PCI_INTERRUPT_PIN. This patch fixes that.

This was discovered through this call in VFIO:

    pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);

The old code does not set *val to 0 because it misses the check for
PCI_INTERRUPT_PIN. Garbage is returned in that case.
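
A range check that covers both registers could look like the sketch
below. This is not the pci-hyperv code: sketch_read_interrupt_regs() and
its return convention are hypothetical, and only the two register offsets
are the standard PCI configuration space values.

```c
#include <stdint.h>

/* Standard PCI configuration space offsets. */
#define PCI_INTERRUPT_LINE	0x3c
#define PCI_INTERRUPT_PIN	0x3d

/*
 * Hypothetical config-read helper. Returns 1 and stores 0 in *val when
 * the access lies entirely within the two interrupt registers; returns 0
 * and leaves *val untouched otherwise.
 */
int sketch_read_interrupt_regs(int where, int size, uint32_t *val)
{
	/* The access covers bytes [where, where + size). */
	if (where >= PCI_INTERRUPT_LINE &&
	    where + size <= PCI_INTERRUPT_PIN + 1) {
		*val = 0;
		return 1;
	}
	return 0;
}
```

With the upper bound set one past PCI_INTERRUPT_PIN, a one-byte read of
either register, or a two-byte read of both, yields zero.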

Fixes: 4daace0d8c ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Link: https://lore.kernel.org/linux-pci/20240701202606.129606-1-wei.liu@kernel.org
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Cc: stable@kernel.org
2024-07-03 21:11:14 +00:00
Andreas Kemnade
80e64b6d34 dt-bindings: mfd: twl: Fix example
Fix example to also conform to rules specified in the separate
not-included gpadc binding.

Fixes: 62e4f33961 ("dt-bindings: regulator: twl-regulator: convert to yaml")
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Reported-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Lee Jones <lee@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240612134039.1089839-1-andreas@kemnade.info
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-07-03 22:01:26 +01:00
Dmitry Torokhov
5e13bea78d Input: cypress_ps2 - use u8 when dealing with byte data
When dealing with byte data use u8 instead of unsigned char or int.
Stop layering error handling in cypress_ps2_sendbyte() and simply
pass on error code from ps2_sendbyte().

Additionally use u8 instead of unsigned char throughout the code.

Link: https://lore.kernel.org/r/20240628224728.2180126-5-dmitry.torokhov@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-07-03 13:48:53 -07:00
Dmitry Torokhov
93f25f92fc Input: cypress_ps2 - propagate errors from lower layers
Do not override errors reported by lower layers with generic "-1",
but propagate them to the callers. Change the checks for errors to be
in the form of "if (error)" to maintain consistency.
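
The pattern can be sketched in isolation as follows. Both functions are
hypothetical and stand in for a lower-layer call and its caller; they are
not taken from the cypress_ps2 driver.

```c
#include <errno.h>

/* Hypothetical lower-layer call; returns 0 or a negative errno value. */
int sketch_lower_send(int byte)
{
	return byte < 0 ? -EINVAL : 0;
}

/* Pass the lower layer's error code through instead of returning -1. */
int sketch_send_command(int byte)
{
	int error;

	error = sketch_lower_send(byte);
	if (error)
		return error;

	return 0;
}
```

The caller then sees the specific errno value rather than a generic
failure code.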

Link: https://lore.kernel.org/r/20240628224728.2180126-4-dmitry.torokhov@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-07-03 13:48:53 -07:00
Dmitry Torokhov
8bccf667f6 Input: cypress_ps2 - report timeouts when reading command status
Report -ETIMEDOUT error code from cypress_ps2_read_cmd_status() when
the device does not send enough data within the allotted time in
response to a command.

Link: https://lore.kernel.org/r/20240628224728.2180126-3-dmitry.torokhov@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-07-03 13:48:53 -07:00