Test 900c: Create pfifo_fast with default setting
Test 7470: Dump pfifo_fast stats
Test b974: Replace pfifo_fast with different handle
Test 3240: Delete pfifo_fast with valid handle
Test 4385: Delete pfifo_fast with invalid handle
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 4812: Create HHF with default setting
Test 8a92: Create HHF with limit setting
Test 3491: Create HHF with quantum setting
Test ba04: Create HHF with reset_timeout setting
Test 4238: Create HHF with admit_bytes setting
Test 839f: Create HHF with evict_timeout setting
Test a044: Create HHF with non_hh_weight setting
Test 32f9: Change HHF with limit setting
Test 385e: Show HHF class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 8942: Create GRED with default setting
Test 5783: Create GRED with grio setting
Test 8a09: Create GRED with limit setting
Test 48cb: Create GRED with ecn setting
Test 763a: Change GRED setting
Test 8309: Show GRED class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 983b: Create FQ with default setting
Test 38a1: Create FQ with limit packet setting
Test 0a18: Create FQ with flow_limit setting
Test 2390: Create FQ with quantum setting
Test 845b: Create FQ with initial_quantum setting
Test 9398: Create FQ with maxrate setting
Test 342c: Create FQ with nopacing setting
Test 6391: Create FQ with refill_delay setting
Test 238b: Create FQ with low_rate_threshold setting
Test 7582: Create FQ with orphan_mask setting
Test 4894: Create FQ with timer_slack setting
Test 324c: Create FQ with ce_threshold setting
Test 424a: Create FQ with horizon time setting
Test 89e1: Create FQ with horizon_cap setting
Test 32e1: Delete FQ with valid handle
Test 49b0: Replace FQ with limit setting
Test 9478: Change FQ with limit setting
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 34ba: Create ETF with default setting
Test 438f: Create ETF with delta nanos setting
Test 9041: Create ETF with deadline_mode setting
Test 9a0c: Create ETF with skip_sock_check setting
Test 2093: Delete ETF with valid handle
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 983a: Create CODEL with default setting
Test 38aa: Create CODEL with limit packet setting
Test 9178: Create CODEL with target setting
Test 78d1: Create CODEL with interval setting
Test 238a: Create CODEL with ecn setting
Test 939c: Create CODEL with ce_threshold setting
Test 8380: Delete CODEL with valid handle
Test 289c: Replace CODEL with limit setting
Test 0648: Change CODEL with limit setting
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 8937: Create CHOKE with default setting
Test 48c0: Create CHOKE with min packet setting
Test 38c1: Create CHOKE with max packet setting
Test 234a: Create CHOKE with ecn setting
Test 4380: Create CHOKE with burst setting
Test 48c7: Delete CHOKE with valid handle
Test 4398: Replace CHOKE with min setting
Test 0301: Change CHOKE with limit setting
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 7628: Create ATM with default setting
Test 390a: Delete ATM with valid handle
Test 32a0: Show ATM class
Test 6310: Dump ATM stats
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rust symbols can become quite long due to namespacing introduced
by modules, types, traits, generics, etc. For instance,
the following code:
pub mod my_module {
pub struct MyType;
pub struct MyGenericType<T>(T);
pub trait MyTrait {
fn my_method() -> u32;
}
impl MyTrait for MyGenericType<MyType> {
fn my_method() -> u32 {
42
}
}
}
generates a symbol of length 96 when using the upcoming v0 mangling scheme:
_RNvXNtCshGpAVYOtgW1_7example9my_moduleINtB2_13MyGenericTypeNtB2_6MyTypeENtB2_7MyTrait9my_method
At the moment, Rust symbols may reach up to 300 in length.
Setting 512 as the maximum seems like a reasonable choice to
keep some headroom.
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com>
Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>
Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Fix virtio test compilation failure caused by vq reset.
../../drivers/virtio/virtio_ring.c: In function ‘vring_create_virtqueue_packed’:
../../drivers/virtio/virtio_ring.c:1999:8: error: ‘struct virtqueue’ has no member named ‘reset’
1999 | vq->vq.reset = false;
| ^
../../drivers/virtio/virtio_ring.c: In function ‘__vring_new_virtqueue’:
../../drivers/virtio/virtio_ring.c:2493:8: error: ‘struct virtqueue’ has no member named ‘reset’
2493 | vq->vq.reset = false;
| ^
../../drivers/virtio/virtio_ring.c: In function ‘virtqueue_resize’:
../../drivers/virtio/virtio_ring.c:2587:18: error: ‘struct virtqueue’ has no member named ‘num_max’
2587 | if (num > vq->vq.num_max)
| ^
../../drivers/virtio/virtio_ring.c:2596:11: error: ‘struct virtio_device’ has no member named ‘config’
2596 | if (!vdev->config->disable_vq_and_reset)
| ^~
../../drivers/virtio/virtio_ring.c:2599:11: error: ‘struct virtio_device’ has no member named ‘config’
2599 | if (!vdev->config->enable_vq_after_reset)
| ^~
../../drivers/virtio/virtio_ring.c:2602:12: error: ‘struct virtio_device’ has no member named ‘config’
2602 | err = vdev->config->disable_vq_and_reset(_vq);
| ^~
../../drivers/virtio/virtio_ring.c:2614:10: error: ‘struct virtio_device’ has no member named ‘config’
2614 | if (vdev->config->enable_vq_after_reset(_vq))
| ^~
make: *** [<builtin>: virtio_ring.o] Error 1
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20220830110549.103168-1-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop the requirement for system-wide kernel UAPI headers to provide full
struct btf_enum64 definition. This is an unexpected requirement that
slipped in libbpf 1.0 and put unnecessary pressure ([0]) on users to have
a bleeding-edge kernel UAPI header from unreleased Linux 6.0.
To achieve this, we forward declare struct btf_enum64. But that's not
enough as there is btf_enum64_value() helper that expects to know the
layout of struct btf_enum64. So we get a bit creative with
reinterpreting memory layout as array of __u32 and accesing lo32/hi32
fields as array elements. Alternative way would be to have a local
pointer variable for anonymous struct with exactly the same layout as
struct btf_enum64, but that gets us into C++ compiler errors complaining
about invalid type casts. So play it safe, if ugly.
[0] Closes: https://github.com/libbpf/libbpf/issues/562
Fixes: d90ec262b3 ("libbpf: Add enum64 support for btf_dump")
Reported-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/bpf/20220927042940.147185-1-andrii@kernel.org
Remove some left-over from commit e2be04c7f9 ("License cleanup: add SPDX
license identifier to uapi header files with a license")
When the SPDX-License-Identifier tag has been added, the corresponding
license text has not been removed.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/88410cddd31197ea26840d7dd71612bece8c6acf.1663871981.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the tests are run in a function $@ there actually contains the
function arguments, not the script ones.
Pass "$@" to the function as well.
Fixes: 272d1f4cfa ("selftests: bpf: test_kmod.sh: Pass parameters to the module")
Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220926092320.564631-1-ykaliuta@redhat.com
Skip selftests that require EPT support in the VM when it is not
available. For example, if running on a machine where kvm_intel.ept=N
since KVM does not offer EPT support to guests if EPT is not supported
on the host.
This commit causes vmx_dirty_log_test to be skipped instead of failing
on hosts where kvm_intel.ept=N.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220926171457.532542-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The socket 2 bind the addr in use, bind should fail with EADDRINUSE. So
if bind success or errno != EADDRINUSE, testcase should be failed.
Fixes: 3ca8e40299 ("soreuseport: BPF selection functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Link: https://lore.kernel.org/r/1663916557-10730-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When running rootless with special capabilities like:
FOWNER / DAC_OVERRIDE / DAC_READ_SEARCH
The "access" API will not make the proper check if there is really
access to a file or not.
>From the access man page:
"
The check is done using the calling process's real UID and GID, rather
than the effective IDs as is done when actually attempting an operation
(e.g., open(2)) on the file. Similarly, for the root user, the check
uses the set of permitted capabilities rather than the set of effective
capabilities; ***and for non-root users, the check uses an empty set of
capabilities.***
"
What that means is that for non-root user the access API will not do the
proper validation if the process really has permission to a file or not.
To resolve this this patch replaces all the access API calls with
faccessat with AT_EACCESS flag.
Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220925070431.1313680-1-arilou@gmail.com
With CONFIG_X86_KERNEL_IBT enabled the test for kprobe with offset
won't work because of the extra endbr instruction.
As suggested by Andrii adding CONFIG_X86_KERNEL_IBT detection
and using appropriate offset value based on that.
Also removing test7 program, because it does the same as test6.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Changing return value of kprobe's version of bpf_get_func_ip
to return zero if the attach address is not on the function's
entry point.
For kprobes attached in the middle of the function we can't easily
get to the function address especially now with the CONFIG_X86_KERNEL_IBT
support.
If user cares about current IP for kprobes attached within the
function body, they can get it with PT_REGS_IP(ctx).
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martynas reported bpf_get_func_ip returning +4 address when
CONFIG_X86_KERNEL_IBT option is enabled.
When CONFIG_X86_KERNEL_IBT is enabled we'll have endbr instruction
at the function entry, which screws return value of bpf_get_func_ip()
helper that should return the function address.
There's short term workaround for kprobe_multi bpf program made by
Alexei [1], but we need this fixup also for bpf_get_attach_cookie,
that returns cookie based on the entry_ip value.
Moving the fixup in the fprobe handler, so both bpf_get_func_ip
and bpf_get_attach_cookie get expected function address when
CONFIG_X86_KERNEL_IBT option is enabled.
Also renaming kprobe_multi_link_handler entry_ip argument to fentry_ip
so it's clearer this is an ftrace __fentry__ ip.
[1] commit 7f0059b58f ("selftests/bpf: Fix kprobe_multi test.")
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When mremap call results in expansion, it might be possible to merge the
VMA with the next VMA which might become adjacent. This patch adds
vma_merge call after the expansion is done to try and merge.
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20220603145719.1012094-3-matenajakub@gmail.com
Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is a test suite that uses the radix test infrastructure. It has been
split into its own commit to allow for easier review of the maple tree
code.
The testing includes:
- Allocation of nodes
- gfp flag allocation checks
- Expansion & contraction of tree
- preallocation checks
- tree navigation by next/prev
- tree navigation by iterators (mas_for_each, etc)
- Number of nodes for a given number of entries
- Generic tree construction tests
- Addition and removal of entries in forward and reverse numerical indexes
- gap searching both forward and reverse
- Combining gaps by overwriting entries in different ways
- splitting right-most node
- splitting left-most node
- overwriting multiple slots
- overwriting across different levels of the tree
- overwriting the middle of a tree
- causing a 3-way split up to the root by overwriting the last slot and
first slot of different nodes and spanning different levels
- RCU stress testing of the tree with threads
- Duplication of the tree by entry count
- Tests which were generated by fuzzers have been added.
- A large number of tests which come from recording crashing in a VM and
reconstructing the tree (see check_erase2_set())
Link: https://lkml.kernel.org/r/20220906194824.2110408-8-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
maple tree uses lockdep_is_held, so define it as external in the header.
Link: https://lkml.kernel.org/r/20220906194824.2110408-7-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add support for kmem_cache_free_bulk() and kmem_cache_alloc_bulk() to the
radix tree test suite.
Link: https://lkml.kernel.org/r/20220906194824.2110408-6-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add functions to get the number of allocations, and total allocations from
a kmem_cache. Also add a function to get the allocated size and a way to
zero the total allocations.
Link: https://lkml.kernel.org/r/20220906194824.2110408-5-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kmem_cache_set_non_kernel() is a mechanism to allow a certain number of
kmem_cache_alloc requests to succeed even when GFP_KERNEL is not set in
the flags. This functionality allows for testing different paths though
the code.
Link: https://lkml.kernel.org/r/20220906194824.2110408-4-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Introducing the Maple Tree"
The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently. There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface. If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.
The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes. With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses. The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.
The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The
long term goal is to reduce or remove the mmap_lock contention.
The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers. A single write operation will be
allowed at a time. A reader re-walks if stale data is encountered. VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.
Davidlor said
: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage. Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complimentary data structures very
: much worth it considering performance does not take a hit. This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred in prohibitive overhead. Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.
A similar work has been discovered in the academic press
https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf
Sheer coincidence. We designed our tree with the intention of solving the
hardest problem first. Upon settling on a b-tree variant and a rough
outline, we researched ranged based b-trees and RCU b-trees and did find
that article. So it was nice to find reassurances that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.
This patch (of 70):
The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently. There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface. If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.
The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes. With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses. The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.
The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The
long term goal is to reduce or remove the mmap_lock contention.
The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers. A single write operation will be
allowed at a time. A reader re-walks if stale data is encountered. VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.
There is additional BUG_ON() calls added within the tree, most of which
are in debug code. These will be replaced with a WARN_ON() call in the
future. There is also additional BUG_ON() calls within the code which
will also be reduced in number at a later date. These exist to catch
things such as out-of-range accesses which would crash anyways.
Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
cycle, 18 are for earlier issues, and are cc:stable.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYzH+NgAKCRDdBJ7gKXxA
ju4AAQDrFWErVp+ra5P66SSbiFmm8NAW1awt4nHwAPcihNf3yQD/eQcB3w2q0Dm1
9HjsyEVkTYIeaJSAbCraDnMwUdWTIgY=
=p5+0
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull last (?) hotfixes from Andrew Morton:
"26 hotfixes.
8 are for issues which were introduced during this -rc cycle, 18 are
for earlier issues, and are cc:stable"
* tag 'mm-hotfixes-stable-2022-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits)
x86/uaccess: avoid check_object_size() in copy_from_user_nmi()
mm/page_isolation: fix isolate_single_pageblock() isolation behavior
mm,hwpoison: check mm when killing accessing process
mm/hugetlb: correct demote page offset logic
mm: prevent page_frag_alloc() from corrupting the memory
mm: bring back update_mmu_cache() to finish_fault()
frontswap: don't call ->init if no ops are registered
mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()
mm: fix madivse_pageout mishandling on non-LRU page
powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush
mm: gup: fix the fast GUP race against THP collapse
mm: fix dereferencing possible ERR_PTR
vmscan: check folio_test_private(), not folio_get_private()
mm: fix VM_BUG_ON in __delete_from_swap_cache()
tools: fix compilation after gfp_types.h split
mm/damon/dbgfs: fix memory leak when using debugfs_lookup()
mm/migrate_device.c: copy pte dirty bit to page
mm/migrate_device.c: add missing flush_cache_page()
mm/migrate_device.c: flush TLB while holding PTL
x86/mm: disable instrumentations of mm/pgprot.c
...
We can make the phc2sys helper not only synchronize a PHC to
CLOCK_REALTIME, which is what it currently does, but also CLOCK_REALTIME
to a PHC, which is going to be needed in distributed TSN tests.
Instead of making the complexity of the arguments passed to
phc2sys_start() explode, we can let it figure out the sync direction
automatically, based on ptp4l's port states.
Towards that goal, pass just the path to the desired ptp4l instance's
UNIX domain socket, and remove the $if_name argument (from which it
derives the PHC). Also adapt the one caller from the ocelot psfp.sh
test. In the case of psfp.sh, phc2sys_start is able to properly figure
out that CLOCK_REALTIME is the source clock and swp1's PHC is the
destination, because of the way in which ptp4l_start for the
UDS_ADDRESS_SWP1 was called: with slave_only=false, so it will always
win the BMCA and always become the sync master between itself and $h1.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the PID variable for the isochron receiver into a separate
namespace per stats port, to allow multiple receivers (and/or
orchestration daemons) to be instantiated by the same script.
Preserve the existing behavior by making isochron_do() use the default
stats TCP port of 5000.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Switch ports will want to act as Boundary Clocks, which are configured
using ptp4l by specifying the "-i" argument multiple times.
Since we track a log file and a pid file for each ptp4l instance, and we
want to be compatible with the existing single-port callers of
ptp4l_start and ptp4l_stop, pass the interface list as a single string
of space-separated values. Based on this, we create a label for each
ptp4l instance, where the spaces are replaced with underscores
(ptp4l_start "eth0 eth1" generates "ptp4l_pid_eth0_eth1").
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The extra_args argument ($3) of isochron_recv_start is overwritten with
uds ($2), if that argument exists.
This is currently not a problem, because the only TSN selftest
(ocelot/psfp.sh) omits remote sync so it does not specify to the
receiver a UNIX domain socket for ptp4l. So $uds is currently an empty
string.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The __cfi_ preambles contain a mov instruction that embeds the KCFI
type identifier in the following format:
; type preamble
__cfi_function:
mov <id>, %eax
function:
...
While the preamble symbols are STT_FUNC and contain valid
instructions, they are never executed and always fall through. Skip
the warning for them.
.kcfi_traps sections point to CFI traps in text sections. Also skip
the warning about them referencing !ENDBR instructions.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-18-samitolvanen@google.com
elf_update_symbol fails to preserve the special st_shndx values
between [SHN_LORESERVE, SHN_HIRESERVE], which results in it
converting SHN_ABS entries into SHN_UNDEF, for example. Explicitly
check for the special indexes and ensure these symbols are not
marked undefined.
Fixes: ead165fa10 ("objtool: Fix symbol creation")
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-17-samitolvanen@google.com
Following Daniel's suggestion, fix similar warning
in template files, which would prevent new monitors
from such warning.
Link: https://lkml.kernel.org/r/20220824034357.2014202-3-zengheng4@huawei.com
Cc: <mingo@redhat.com>
Fixes: 24bce201d7 ("tools/rv: Add dot2k")
Suggested-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add a syntax error test case for eprobe as same as kprobes.
Link: https://lkml.kernel.org/r/165932115471.2850673.8014722990775242727.stgit@devnote2
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add a test to verify that KVM_{G,S}ET_EVENTS play nice with pending vs.
injected exceptions when an exception is being queued for L2, and that
KVM correctly handles L1's exception intercept wants.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-27-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Include the vmx.h and svm.h uapi headers that KVM so kindly provides
instead of manually defining all the same exit reasons/code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-26-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The updated Enlightened VMCS definition has 'encls_exiting_bitmap'
field which needs mapping to VMCS, add the missing encoding.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-13-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Require KVM_CAP_VM_DISABLE_NX_HUGE_PAGES for the entire NX hugepage test
instead of skipping the "disable" subtest if the capability isn't
supported by the host kernel. While the "enable" subtest does provide
value when the capability isn't supported, silently providing only half
the promised coveraged is undesirable, i.e. it's better to skip the test
so that the user knows something.
Alternatively, the test could print something to alert the user instead
of silently skipping the subtest, but that would encourage other tests
to follow suit, and it's not clear that it's desirable to take selftests
in that direction. And if selftests do head down the path of skipping
subtests, such behavior needs first-class support in the framework.
Opportunistically convert other test preconditions to TEST_REQUIRE().
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20220812175301.3915004-1-oliver.upton@linux.dev
[sean: rewrote changelog to capture discussion about skipping the test]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add one test for wait redirect sock's send memory test for sockmap.
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220823133755.314697-3-liujian56@huawei.com
Commit b55878c90a ("perf test: Add test for branch stack
sampling") added test for branch stack sampling. There is a sanity check
in the beginning to skip the test if the hardware doesn't support branch
stack sampling.
Snippet
<<>>
skip the test if the hardware doesn't support branch stack sampling
perf record -b -o- -B true > /dev/null 2>&1 || exit 2
<<>>
But the testcase also uses branch sample types: save_type, any. if any
platform doesn't support the branch filters used in the test, the testcase
will fail. In powerpc, currently mutliple branch filters are not supported
and hence this test fails in powerpc. Fix the sanity check to look at
the support for branch filters used in this test before proceeding with
the test.
Fixes: b55878c90a ("perf test: Add test for branch stack sampling")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20220921145255.20972-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By default, we create two hybrid cache events, one is for cpu_core, and
another is for cpu_atom. But Some hybrid hardware cache events are only
available on one CPU PMU. For example, the 'L1-dcache-load-misses' is only
available on cpu_core, while the 'L1-icache-loads' is only available on
cpu_atom. We need to remove "not supported" hybrid cache events. By
extending is_event_supported() to global API and using it to check if the
hybrid cache events are supported before being created, we can remove the
"not supported" hybrid cache events.
Before:
# ./perf stat -e L1-dcache-load-misses,L1-icache-loads -a sleep 1
Performance counter stats for 'system wide':
52,570 cpu_core/L1-dcache-load-misses/
<not supported> cpu_atom/L1-dcache-load-misses/
<not supported> cpu_core/L1-icache-loads/
1,471,817 cpu_atom/L1-icache-loads/
1.004915229 seconds time elapsed
After:
# ./perf stat -e L1-dcache-load-misses,L1-icache-loads -a sleep 1
Performance counter stats for 'system wide':
54,510 cpu_core/L1-dcache-load-misses/
1,441,286 cpu_atom/L1-icache-loads/
1.005114281 seconds time elapsed
Fixes: 30def61f64 ("perf parse-events: Create two hybrid cache events")
Reported-by: Yi Ammy <ammy.yi@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220923030013.3726410-2-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some hybrid hardware cache events are only available on one CPU PMU. For
example, 'L1-dcache-load-misses' is only available on cpu_core.
We have supported in the perf list clearly reporting this info, the
function works fine before but recently the argument "config" in API
is_event_supported() is changed from "u64" to "unsigned int" which
caused a regression, the "perf list" then can not display the PMU prefix
for some hybrid cache events.
For the hybrid systems, the PMU type ID is stored at config[63:32],
define config to "unsigned int" will miss the PMU type ID information,
then the regression happened, the config should be defined as "u64".
Before:
# ./perf list |grep "Hardware cache event"
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
node-load-misses [Hardware cache event]
node-loads [Hardware cache event]
After:
# ./perf list |grep "Hardware cache event"
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
cpu_atom/L1-icache-loads/ [Hardware cache event]
cpu_core/L1-dcache-load-misses/ [Hardware cache event]
cpu_core/node-load-misses/ [Hardware cache event]
cpu_core/node-loads/ [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
Fixes: 9b7c7728f4 ("perf parse-events: Break out tracepoint and printing")
Reported-by: Yi Ammy <ammy.yi@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220923030013.3726410-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf_event_cgrp_id can be different on other configurations.
To be more portable as CO-RE, it needs to get the cgroup subsys id using
the bpf_core_enum_value() helper.
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220923063205.772936-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Syscall #82 has been implemented for 32-bit platforms in a unique way on
powerpc systems. This hack will in effect guess whether the caller is
expecting new select semantics or old select semantics. It does so via a
guess, based off the first parameter. In new select, this parameter
represents the length of a user-memory array of file descriptors, and in
old select this is a pointer to an arguments structure.
The heuristic simply interprets sufficiently large values of its first
parameter as being a call to old select. The following is a discussion
on how this syscall should be handled.
As discussed in this thread, the existence of such a hack suggests that for
whatever powerpc binaries may predate glibc, it is most likely that they
would have taken use of the old select semantics. x86 and arm64 both
implement this syscall with oldselect semantics.
Remove the powerpc implementation, and update syscall.tbl to refer to emit
a reference to sys_old_select and compat_sys_old_select
for 32-bit binaries, in keeping with how other architectures support
syscall #82.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/13737de5-0eb7-e881-9af0-163b0d29a1a0@csgroup.eu/
Link: https://lore.kernel.org/r/20220921065605.1051927-12-rmclure@linux.ibm.com
- Fix a infinite loop bug in fsdax
- Fix memory-type detection for devdax (EINJ regression)
- Small cleanups
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYy+skgAKCRDfioYZHlFs
Zxw3AQCVDbuMh2wBUSJP4G4XF+ZHM+FntpUmawcmUFsEjt0fGQEA9NCjOoj7jLlP
BrI1OtbskTLAMzJeRI3qY3irV4iHsAI=
=28T9
-----END PGP SIGNATURE-----
Merge tag 'dax-and-nvdimm-fixes-v6.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull NVDIMM and DAX fixes from Dan Williams:
"A recently discovered one-line fix for devdax that further addresses a
v5.5 regression, and (a bit embarrassing) a small batch of fixes that
have been sitting in my fixes tree for weeks.
The older fixes have soaked in linux-next during that time and address
an fsdax infinite loop and some other minor fixups.
- Fix a infinite loop bug in fsdax
- Fix memory-type detection for devdax (EINJ regression)
- Small cleanups"
* tag 'dax-and-nvdimm-fixes-v6.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
devdax: Fix soft-reservation memory description
fsdax: Fix infinite loop in dax_iomap_rw()
nvdimm/namespace: drop nested variable in create_namespace_pmem()
ndtest: Cleanup all of blk namespace specific code
pmem: fix a name collision
In order to comply with PEP 8, the first parameter of a class should be
__init__.
Signed-off-by: Elijah Conners <business@elijahpepe.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add -l (--log-level) flag to override default BPF verifier log lever.
This only matters in verbose mode, which is the mode in which veristat
emits verifier log for each processed BPF program.
This is important because for successfully verified BPF programs
log_level 1 is empty, as BPF verifier truncates all the successfully
verified paths. So -l2 is the only way to actually get BPF verifier log
in practice. It looks sometihng like this:
[vmuser@archvm bpf]$ sudo ./veristat xdp_tx.bpf.o -vl2
Processing 'xdp_tx.bpf.o'...
PROCESSING xdp_tx.bpf.o/xdp_tx, DURATION US: 19, VERDICT: success, VERIFIER LOG:
func#0 @0
0: R1=ctx(off=0,imm=0) R10=fp0
; return XDP_TX;
0: (b4) w0 = 3 ; R0_w=3
1: (95) exit
verification time 19 usec
stack depth 0
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
File Program Verdict Duration (us) Total insns Total states Peak states
------------ ------- ------- ------------- ----------- ------------ -----------
xdp_tx.bpf.o xdp_tx success 19 2 0 0
------------ ------- ------- ------------- ----------- ------------ -----------
Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220923175913.3272430-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Emit "Processing <filepath>..." for each BPF object file to be
processed, to show progress. But also add -q (--quiet) flag to silence
such messages. Doing something more clever (like overwriting same output
line) is to cumbersome and easily breakable if there is any other
console output (e.g., errors from libbpf).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220923175913.3272430-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make veristat ignore non-BPF object files. This allows simpler
mass-verification (e.g., `sudo ./veristat *.bpf.o` in selftests/bpf
directory). Note that `sudo ./veristat *.o` would also work, but with
selftests's multiple copies of BPF object files (.bpf.o and
.bpf.linked{1,2,3}.o) it's 4x slower.
Also, given some of BPF object files could be incomplete in the sense
that they are meant to be statically linked into final BPF object file
(like linked_maps, linked_funcs, linked_vars), note such instances in
stderr, but proceed anyways. This seems like a better trade off between
completely silently ignoring BPF object file and aborting
mass-verification altogether.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220923175913.3272430-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make sure veristat doesn't spend ridiculous amount of time parsing
verifier stats from verifier log, especially for very large logs or
truncated logs (e.g., when verifier returns -ENOSPC due to too small
buffer). For this, parse lines from the end of the log and make sure we
parse only up to 100 last lines, where stats should be, if at all.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220923175913.3272430-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When attach_prog_fd field was removed in libbpf 1.0 and replaced with
`long: 0` placeholder, it actually shifted all the subsequent fields by
8 byte. This is due to `long: 0` promising to adjust next field's offset
to long-aligned offset. But in this case we were already long-aligned
as pin_root_path is a pointer. So `long: 0` had no effect, and thus
didn't feel the gap created by removed attach_prog_fd.
Non-zero bitfield should have been used instead. I validated using
pahole. Originally kconfig field was at offset 40. With `long: 0` it's
at offset 32, which is wrong. With this change it's back at offset 40.
While technically libbpf 1.0 is allowed to break backwards
compatibility and applications should have been recompiled against
libbpf 1.0 headers, but given how trivial it is to preserve memory
layout, let's fix this.
Reported-by: Grant Seltzer Richman <grantseltzer@gmail.com>
Fixes: 146bf811f5 ("libbpf: remove most other deprecated high-level APIs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220923230559.666608-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The cgroup_hierarchical_stats selftest is complicated. It has to be,
because it tests an entire workflow of recording, aggregating, and
dumping cgroup stats. However, some of the complexity is unnecessary.
The test now enables the memory controller in a cgroup hierarchy, invokes
reclaim, measure reclaim time, THEN uses that reclaim time to test the
stats collection and aggregation. We don't need to use such a
complicated stat, as the context in which the stat is collected is
orthogonal.
Simplify the test by using a simple stat instead of reclaim time, the
total number of times a process has ever entered a cgroup. This makes
the test simpler and removes the dependency on the memory controller and
the memory reclaim interface.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20220919175330.890793-1-yosryahmed@google.com
It's not currently possible but in the future we may get
IORING_CQE_F_MORE and so a notification even for a failed request, i.e.
when cqe->res <= 0. That's precisely what the documentation says, so
adjust the test and do IORING_CQE_F_MORE checks regardless of the main
completion cqe->res.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/aac948ea753a8bfe1fa3b82fe45debcb54586369.1663953085.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
for-6.0 has the following fix for cgroup_get_from_id().
836ac87d ("cgroup: fix cgroup_get_from_id")
which conflicts with the following two commits in for-6.1.
4534dee9 ("cgroup: cgroup: Honor caller's cgroup NS when resolving cgroup id")
fa7e439c ("cgroup: Homogenize cgroup_get_from_id() return value")
While the resolution is straightforward, the code ends up pretty ugly
afterwards. Let's pull for-6.0-fixes into for-6.1 so that the code can be
fixed up there.
Signed-off-by: Tejun Heo <tj@kernel.org>
* Fix for kmemleak with pKVM
s390:
* Fixes for VFIO with zPCI
* smatch fix
x86:
* Ensure XSAVE-capable hosts always allow FP and SSE state to be saved
and restored via KVM_{GET,SET}_XSAVE
* Fix broken max_mmu_rmap_size stat
* Fix compile error with old glibc that doesn't have gettid()
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmMtvg4UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNy2Af/VybWK2uAbaMkV6irQ1YTJIrRPco1
C+JQdiQklYbjzThfPWNF/MiH+VTObloR1KqztOeQbfcrgwzygO68D3bs0wkAukLA
mtdcMjdsqNx8r9u533i6S8Dpo0RkHKl+I8+3mHdPHTzlrbCuYJFFFxFNLhE+xbrK
DP2Gl/xXIGYwOv2nfHA/xxI7TRICv4IxmzQazxlmC27n6BLNSr8qp6jI9lXJQfJ8
XJh3SbmRux3/cs2oEqONg8DySJh631kI1jGGOmL3qk07ZR7A5KZ+lju0xM7vQIEq
aR25YNYZux+BPIY/WxT1R0j6pwinBmFp8OoQYCs8DaQ65fdE5gegtJRpow==
=VADm
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"As everyone back came back from conferences, here are the pending
patches for Linux 6.0.
ARM:
- Fix for kmemleak with pKVM
s390:
- Fixes for VFIO with zPCI
- smatch fix
x86:
- Ensure XSAVE-capable hosts always allow FP and SSE state to be
saved and restored via KVM_{GET,SET}_XSAVE
- Fix broken max_mmu_rmap_size stat
- Fix compile error with old glibc that doesn't have gettid()"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Inject #UD on emulated XSETBV if XSAVES isn't enabled
KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES
KVM: x86: Reinstate kvm_vcpu_arch.guest_supported_xcr0
KVM: x86/mmu: add missing update to max_mmu_rmap_size
selftests: kvm: Fix a compile error in selftests/kvm/rseq_test.c
KVM: s390: pci: register pci hooks without interpretation
KVM: s390: pci: fix GAIT physical vs virtual pointers usage
KVM: s390: Pass initialized arg even if unused
KVM: s390: pci: fix plain integer as NULL pointer warnings
KVM: arm64: Use kmemleak_free_part_phys() to unregister hyp_mem_base
Fix for a code analyser warning
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmMtmHcACgkQ41TmuOI4
ufhapA/+OGjJtUKEa/LkCArUMUgJ+ZYy9cwx3Qrkvm7BtEguN35uxO5T7X8pdMDU
/dYSDwtPT1ob/QmjhwwE0dpgUF8yfkBNLKBK37f6jTznI4mS6UEq/IB1BcCk3ss5
45xG8rWiXS7oFm9bxtNsa5jkrSf7gIOuEa570tOVtOlBYvtFIH2SiHvcCUY1w48N
A6docb+vOJnPIBHPR/1q+bBoghO5rplYaX0EiAt3EYThrhAjsXjIFzaQPwXUxgFh
Nk/3ni1St72sbWDiC7YxjDghMVuW1GuQAlgkDWvy8bdA0IfsaqbyISuyIWZWkRfx
V8+7OfGVsCB7qhT4C+65fQPqwZwERNwwvAFsx175/Mm9/AXn217+sN52O54olFcM
dhKI7med1R/LqO2HH9kCGIUfJ3t+0JHpFDwcJyiAZsFbYffnwUQbIW4YcA3O+rje
cV8/kRhD6ebzrScTOMVH+Rb4Q0N0kL34iHXbI8wcTBKmSFrdRm9YeLtQM2P/HtPA
GVyg49bfXfA13goM9jaUW/IgT1aSHNRHfbfFKTUB7ph2Wi5ktMizpkn/CQ8L7bm1
I0JoZfA7ALDufg2Ajyqsqmv3RX2k6ydenmX/AxzQlo34kSXY6X+N+yiPlqhumBqq
Gs2pmvQoyvAmuShhj+eSZsQYBObgjzN7B/SAbbUBT71GerGyWeQ=
=Z7gl
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-6.0-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
More pci fixes
Fix for a code analyser warning
It looks like this test has been accidentally dropped when resolving
conflicts in this Makefile.
Most probably because there were 3 different patches modifying this file
in parallel:
commit 152e8ec776 ("selftests/bonding: add a test for bonding lladdr target")
commit bbb774d921 ("net: Add tests for bonding and team address list management")
commit 2ffd57327f ("selftests: bonding: cause oops in bond_rr_gen_slave_id")
The first one was applied in 'net-next' while the two other ones were
recently applied in the 'net' tree.
But that's alright, easy to fix by re-adding the missing one!
Fixes: 0140a7168f ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20220923082306.2468081-1-matthieu.baerts@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The livepatch kselftests rely on comparing expected and actual output
from such commands as sysctl. A recent commit in procps-ng v4.0.0 [1]
changed sysctl's output to emit key pathnames like:
sysctl: setting key "/proc/sys/kernel/ftrace_enabled": Device or resource busy
versus previous dotted output:
sysctl: setting key "kernel.ftrace_enabled": Device or resource busy
The modification in output was later reverted [2], but since the change
has been tagged in procps-ng v4.0.0, update the livepatch kselftest to
handle either case.
[1] 6389deca5b
[2] b159c198c9
Reported-by: Dennis(Zhuoheng) Li <denli@redhat.com>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220811212138.182575-1-joe.lawrence@redhat.com
Test 290a: Show RED class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 2410: Show prio class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 1023: Show mq class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 0521: Show ingress class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 0582: Create QFQ with default setting
Test c9a3: Create QFQ with class weight setting
Test 8452: Create QFQ with class maxpkt setting
Test d920: Create QFQ with multiple class setting
Test 0548: Delete QFQ with handle
Test 5901: Show QFQ class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test cb28: Create NETEM with default setting
Test a089: Create NETEM with limit flag
Test 3449: Create NETEM with delay time
Test 3782: Create NETEM with distribution and corrupt flag
Test 2b82: Create NETEM with distribution and duplicate flag
Test a932: Create NETEM with distribution and loss flag
Test e01a: Create NETEM with distribution and loss state flag
Test ba29: Create NETEM with loss gemodel flag
Test 0492: Create NETEM with reorder flag
Test 7862: Create NETEM with rate limit
Test 7235: Create NETEM with multiple slot rate
Test 5439: Create NETEM with multiple slot setting
Test 5029: Change NETEM with loss state
Test 3785: Replace NETEM with delay time
Test 4502: Delete NETEM with handle
Test 0785: Show NETEM class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 20ba: Add multiq Qdisc to multi-queue device (8 queues)
Test 4301: List multiq Class
Test 7832: Delete nonexistent multiq Qdisc
Test 2891: Delete multiq Qdisc twice
Test 1329: Add multiq Qdisc to single-queue device
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 9903: Add mqprio Qdisc to multi-queue device (8 queues)
Test 453a: Delete nonexistent mqprio Qdisc
Test 5292: Delete mqprio Qdisc twice
Test 45a9: Add mqprio Qdisc to single-queue device
Test 2ba9: Show mqprio class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 0904: Create HTB with default setting
Test 3906: Create HTB with default-N setting
Test 8492: Create HTB with r2q setting
Test 9502: Create HTB with direct_qlen setting
Test b924: Create HTB with class rate and burst setting
Test 4359: Create HTB with class mpu setting
Test 9048: Create HTB with class prio setting
Test 4994: Create HTB with class ceil setting
Test 9523: Create HTB with class cburst setting
Test 5353: Create HTB with class mtu setting
Test 346a: Create HTB with class quantum setting
Test 303a: Delete HTB with handle
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 3254: Create HFSC with default setting
Test 0289: Create HFSC with class sc and ul rate setting
Test 846a: Create HFSC with class sc umax and dmax setting
Test 5413: Create HFSC with class rt and ls rate setting
Test 9312: Create HFSC with class rt umax and dmax setting
Test 6931: Delete HFSC with handle
Test 8436: Show HFSC class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 4957: Create FQ_CODEL with default setting
Test 7621: Create FQ_CODEL with limit setting
Test 6871: Create FQ_CODEL with memory_limit setting
Test 5636: Create FQ_CODEL with target setting
Test 630a: Create FQ_CODEL with interval setting
Test 4324: Create FQ_CODEL with quantum setting
Test b190: Create FQ_CODEL with noecn flag
Test 5381: Create FQ_CODEL with ce_threshold setting
Test c9d2: Create FQ_CODEL with drop_batch setting
Test 523b: Create FQ_CODEL with multiple setting
Test 9283: Replace FQ_CODEL with noecn setting
Test 3459: Change FQ_CODEL with limit setting
Test 0128: Delete FQ_CODEL with handle
Test 0435: Show FQ_CODEL class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 6345: Create DSMARK with default setting
Test 3462: Create DSMARK with default_index setting
Test ca95: Create DSMARK with set_tc_index flag
Test a950: Create DSMARK with multiple setting
Test 4092: Delete DSMARK with handle
Test 5930: Show DSMARK class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 0385: Create DRR with default setting
Test 2375: Delete DRR with handle
Test 3092: Show DRR class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 1820: Create CBS with default setting
Test 1532: Create CBS with hicredit setting
Test 2078: Create CBS with locredit setting
Test 9271: Create CBS with sendslope setting
Test 0482: Create CBS with idleslope setting
Test e8f3: Create CBS with multiple setting
Test 23c9: Replace CBS with sendslope setting
Test a07a: Change CBS with idleslope setting
Test 43b3: Delete CBS with handle
Test 9472: Show CBS class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 3460: Create CBQ with default setting
Test 0592: Create CBQ with mpu
Test 4684: Create CBQ with valid cell num
Test 4345: Create CBQ with invalid cell num
Test 4525: Create CBQ with valid ewma
Test 6784: Create CBQ with invalid ewma
Test 5468: Delete CBQ with handle
Test 492a: Show CBQ class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 1212: Create CAKE with default setting
Test 3281: Create CAKE with bandwidth limit
Test c940: Create CAKE with autorate-ingress flag
Test 2310: Create CAKE with rtt time
Test 2385: Create CAKE with besteffort flag
Test a032: Create CAKE with diffserv8 flag
Test 2349: Create CAKE with diffserv4 flag
Test 8472: Create CAKE with flowblind flag
Test 2341: Create CAKE with dsthost and nat flag
Test 5134: Create CAKE with wash flag
Test 2302: Create CAKE with flowblind and no-split-gso flag
Test 0768: Create CAKE with dual-srchost and ack-filter flag
Test 0238: Create CAKE with dual-dsthost and ack-filter-aggressive flag
Test 6572: Create CAKE with memlimit and ptm flag
Test 2436: Create CAKE with fwmark and atm flag
Test 3984: Create CAKE with overhead and mpu
Test 5421: Create CAKE with conservative and ingress flag
Test 6854: Delete CAKE with conservative and ingress flag
Test 2342: Replace CAKE with mpu
Test 2313: Change CAKE with mpu
Test 4365: Show CAKE class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Free the created fd or allocated bpf_object after test case succeeds,
else there will be resource leaks.
Spotted by using address sanitizer and checking the content of
/proc/$pid/fd directory.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220921070035.2016413-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Destroy the created skeleton when CONFIG_PREEMPT is off, else will be
resource leak.
Fixes: 73b97bc78b ("selftests/bpf: Test concurrent updates on bpf_task_storage_busy")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220921070035.2016413-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The following warning appears when executing:
make -C tools/testing/selftests/kvm
rseq_test.c: In function ‘main’:
rseq_test.c:237:33: warning: implicit declaration of function ‘gettid’; did you mean ‘getgid’? [-Wimplicit-function-declaration]
(void *)(unsigned long)gettid());
^~~~~~
getgid
/usr/bin/ld: /tmp/ccr5mMko.o: in function `main':
../kvm/tools/testing/selftests/kvm/rseq_test.c:237: undefined reference to `gettid'
collect2: error: ld returned 1 exit status
make: *** [../lib.mk:173: ../kvm/tools/testing/selftests/kvm/rseq_test] Error 1
Use the more compatible syscall(SYS_gettid) instead of gettid() to fix it.
More subsequent reuse may cause it to be wrapped in a lib file.
Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220802071240.84626-1-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Added urandom_read shared lib is missing from the list of installed
files what makes urandom_read test after `make install` or `make
gen_tar` broken.
Add the library to TEST_GEN_FILES. The names in the list do not
contain $(OUTPUT) since it's added by lib.mk code.
Fixes: 00a0fa2d7d ("selftests/bpf: Add urandom_read shared lib and USDTs")
Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920161409.129953-1-ykaliuta@redhat.com
We use a local variable hwcap to refer to the element of the hwcaps array
which we are currently checking. When checking for the relevant hwcap bit
being set in testing we were dereferencing hwcaps rather than hwcap in
fetching the AT_HWCAP to use, which is perfectly valid C but means we were
always checking the bit was set in the hwcap for whichever feature is first
in the array. Remove the stray s.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220907113400.12982-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
A handful of awaited fixes here - revert of the FEC changes,
bluetooth fix, fixes for iwlwifi spew.
We added a warning in PHY/MDIO code which is triggering on
a couple of platforms in a false-positive-ish way. If we can't
iron that out over the week we'll drop it and re-add for 6.1.
I've added a new "follow up fixes" section for fixes to fixes
in 6.0-rcs but it may actually give the false impression that
those are problematic or that more testing time would have
caught them. So likely a one time thing.
Follow up fixes:
- nf_tables_addchain: fix nft_counters_enabled underflow
- ebtables: fix memory leak when blob is malformed
- nf_ct_ftp: fix deadlock when nat rewrite is needed
Current release - regressions:
- Revert "fec: Restart PPS after link state change"
- Revert "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"
- Bluetooth: fix HCIGETDEVINFO regression
- wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
- mptcp: fix fwd memory accounting on coalesce
- rwlock removal fall out:
- ipmr: always call ip{,6}_mr_forward() from RCU read-side
critical section
- ipv6: fix crash when IPv6 is administratively disabled
- tcp: read multiple skbs in tcp_read_skb()
- mdio_bus_phy_resume state warning fallout:
- eth: ravb: fix PHY state warning splat during system resume
- eth: sh_eth: fix PHY state warning splat during system resume
Current release - new code bugs:
- wifi: iwlwifi: don't spam logs with NSS>2 messages
- eth: mtk_eth_soc: enable XDP support just for MT7986 SoC
Previous releases - regressions:
- bonding: fix NULL deref in bond_rr_gen_slave_id
- wifi: iwlwifi: mark IWLMEI as broken
Previous releases - always broken:
- nf_conntrack helpers:
- irc: tighten matching on DCC message
- sip: fix ct_sip_walk_headers
- osf: fix possible bogus match in nf_osf_find()
- ipvlan: fix out-of-bound bugs caused by unset skb->mac_header
- core: fix flow symmetric hash
- bonding, team: unsync device addresses on ndo_stop
- phy: micrel: fix shared interrupt on LAN8814
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmMsj3EACgkQMUZtbf5S
IrsUgQ//eXxuUZeGTg7cgJKPFJelrZ3iL16B1+s2qX94GPIqXRAShgC78iM7IbSe
y3vR/7YVE7sKXm88wnLefMQVXPp0cE2p0+8++E/j4zcRZsM5sHb2+d3gW6nos2ed
U8Ldm7LzWUNt/o1ZHDqZWBSoreFkmbFyHO6FVPCuH11tFUJqxJ/SP860mwo6tbuT
HOoVphKis41IMEXCgybs2V0DAQewba0gejzAmySDy8epNhOj2F4Vo6aadnUCI68U
HrIFYe2wiEi6MZDsB9zpRXc9seb6ZBKbBjgQnTK7MwfBEQCzxtR2lkNobJM1WbdL
nYwHBOJ16yX0BnlSpUEepv6iJYY5Q7FS35Wk3Rq5Mik6DaEir6vVSBdRxHpYOkO2
KPIyyMMAA5E8mAtqH3PcpnwDK+9c3KlZYYKXxIp2IjQm87DpOZJFynwsC3Crmbzo
C7UTMav2nkHljoapMLUwzqyw2ip+Qo14XA043FDPUru1sXY9CY6q50XZa5GmrNKh
xyaBdp4Ckj1kOuXUR9jz3Rq8skOZ8lNGHtiCdgPZitWhNKW1YORJihC7/9zdieCR
1gOE7Dpz/MhVmFn2e8S5O3TkU5lXfALfPDJi4QiML5VLHXd/nCE5sHPiOBWcoo4w
2djKbIGpLRnO6qMs4NkWNmPbG+/ouvpM+lewqn+xU4TGyn/NTbI=
=wrep
-----END PGP SIGNATURE-----
Merge tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wifi, netfilter and can.
A handful of awaited fixes here - revert of the FEC changes, bluetooth
fix, fixes for iwlwifi spew.
We added a warning in PHY/MDIO code which is triggering on a couple of
platforms in a false-positive-ish way. If we can't iron that out over
the week we'll drop it and re-add for 6.1.
I've added a new "follow up fixes" section for fixes to fixes in
6.0-rcs but it may actually give the false impression that those are
problematic or that more testing time would have caught them. So
likely a one time thing.
Follow up fixes:
- nf_tables_addchain: fix nft_counters_enabled underflow
- ebtables: fix memory leak when blob is malformed
- nf_ct_ftp: fix deadlock when nat rewrite is needed
Current release - regressions:
- Revert "fec: Restart PPS after link state change" and the related
"net: fec: Use a spinlock to guard `fep->ptp_clk_on`"
- Bluetooth: fix HCIGETDEVINFO regression
- wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
- mptcp: fix fwd memory accounting on coalesce
- rwlock removal fall out:
- ipmr: always call ip{,6}_mr_forward() from RCU read-side
critical section
- ipv6: fix crash when IPv6 is administratively disabled
- tcp: read multiple skbs in tcp_read_skb()
- mdio_bus_phy_resume state warning fallout:
- eth: ravb: fix PHY state warning splat during system resume
- eth: sh_eth: fix PHY state warning splat during system resume
Current release - new code bugs:
- wifi: iwlwifi: don't spam logs with NSS>2 messages
- eth: mtk_eth_soc: enable XDP support just for MT7986 SoC
Previous releases - regressions:
- bonding: fix NULL deref in bond_rr_gen_slave_id
- wifi: iwlwifi: mark IWLMEI as broken
Previous releases - always broken:
- nf_conntrack helpers:
- irc: tighten matching on DCC message
- sip: fix ct_sip_walk_headers
- osf: fix possible bogus match in nf_osf_find()
- ipvlan: fix out-of-bound bugs caused by unset skb->mac_header
- core: fix flow symmetric hash
- bonding, team: unsync device addresses on ndo_stop
- phy: micrel: fix shared interrupt on LAN8814"
* tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
selftests: forwarding: add shebang for sch_red.sh
bnxt: prevent skb UAF after handing over to PTP worker
net: marvell: Fix refcounting bugs in prestera_port_sfp_bind()
net: sched: fix possible refcount leak in tc_new_tfilter()
net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD
udp: Use WARN_ON_ONCE() in udp_read_skb()
selftests: bonding: cause oops in bond_rr_gen_slave_id
bonding: fix NULL deref in bond_rr_gen_slave_id
net: phy: micrel: fix shared interrupt on LAN8814
net/smc: Stop the CLC flow if no link to map buffers on
ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
net: atlantic: fix potential memory leak in aq_ndev_close()
can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported
can: gs_usb: gs_can_open(): fix race dev->can.state condition
can: flexcan: flexcan_mailbox_read() fix return value for drop = true
net: sh_eth: Fix PHY state warning splat during system resume
net: ravb: Fix PHY state warning splat during system resume
netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
netfilter: ebtables: fix memory leak when blob is malformed
netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
...
RHEL/Fedora RPM build checks are stricter, and complain when executable
files don't have a shebang line, e.g.
*** WARNING: ./kselftests/net/forwarding/sch_red.sh is executable but has no shebang, removing executable bit
Fix it by adding shebang line.
Fixes: 6cf0291f95 ("selftests: forwarding: Add a RED test for SW datapath")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20220922024453.437757-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This bonding selftest used to cause a kernel oops on aarch64
and should be architectures agnostic.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add -f (--filter) argument which accepts glob-based filters for
narrowing down what BPF object files and programs within them should be
processed by veristat. This filtering applies both to comparison and
main (verification) mode.
Filter can be of two forms:
- file (object) filter: 'strobemeta*'; in this case all the programs
within matching files are implicitly allowed (or denied, depending
if it's positive or negative rule, see below);
- file and prog filter: 'strobemeta*/*unroll*' will further filter
programs within matching files to only allow those program names that
match '*unroll*' glob.
As mentioned, filters can be positive (allowlisting) and negative
(denylisting). Negative filters should start with '!': '!strobemeta*'
will deny any filename which basename starts with "strobemeta".
Further, one extra special syntax is supported to allow more convenient
use in practice. Instead of specifying rule on the command line,
veristat allows to specify file that contains rules, both positive and
negative, one line per one filter. This is achieved with -f @<filepath>
use, where <filepath> points to a text file containing rules (negative
and positive rules can be mixed). For convenience empty lines and lines
starting with '#' are ignored. This feature is useful to have some
pre-canned list of object files and program names that are tested
repeatedly, allowing to check in a list of rules and quickly specify
them on the command line.
As a demonstration (and a short cut for nearest future), create a small
list of "interesting" BPF object files from selftests/bpf and commit it
as veristat.cfg. It currently includes 73 programs, most of which are
the most complex and largest BPF programs in selftests, as judged by
total verified instruction count and verifier states total.
If there is overlap between positive or negative filters, negative
filter takes precedence (denylisting is stronger than allowlisting). If
no allow filter is specified, veristat implicitly assumes '*/*' rule. If
no deny rule is specified, veristat (logically) assumes no negative
filters.
Also note that -f (just like -e and -s) can be specified multiple times
and their effect is cumulative.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220921164254.3630690-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add ability to compare and contrast two veristat runs, previously
recorded with veristat using CSV output format.
When veristat is called with -C (--compare) flag, veristat expects
exactly two input files specified, both should be in CSV format.
Expectation is that it's output from previous veristat runs, but as long
as column names and formats match, it should just work. First CSV file
is designated as a "baseline" provided, and the second one is
comparison (experiment) data set. Establishing baseline matters later
when calculating difference percentages, see below.
Veristat parses these two CSV files and "reconstructs" verifier stats
(it could be just a subset of all possible stats). File and program
names are mandatory as they are used as joining key (these two "stats"
are designated as "key stats" in the code).
Veristat currently enforces that the set of stats recorded in both CSV
has to exactly match, down to exact order. This is just a simplifying
condition which can be lifted with a bit of additional pre-processing to
reorded stat specs internally, which I didn't bother doing, yet.
For all the non-key stats, veristat will output three columns: one for
baseline data, one for comparison data, and one with an absolute and
relative percentage difference. If either baseline or comparison values
are missing (that is, respective CSV file doesn't have a row with
*exactly* matching file and program name), those values are assumed to
be empty or zero. In such case relative percentages are forced to +100%
or -100% output, for consistency with a typical case.
Veristat's -e (--emit) and -s (--sort) specs still apply, so even if CSV
contains lots of stats, user can request to compare only a subset of
them (and specify desired column order as well). Similarly, both CSV and
human-readable table output is honored. Note that input is currently
always expected to be CSV.
Here's an example shell session, recording data for biosnoop tool on two
different kernels and comparing them afterwards, outputting data in table
format.
# on slightly older production kernel
$ sudo ./veristat biosnoop_bpf.o
File Program Verdict Duration (us) Total insns Total states Peak states
-------------- ------------------------ ------- ------------- ----------- ------------ -----------
biosnoop_bpf.o blk_account_io_merge_bio success 37 24 1 1
biosnoop_bpf.o blk_account_io_start failure 0 0 0 0
biosnoop_bpf.o block_rq_complete success 76 104 6 6
biosnoop_bpf.o block_rq_insert success 83 85 7 7
biosnoop_bpf.o block_rq_issue success 79 85 7 7
-------------- ------------------------ ------- ------------- ----------- ------------ -----------
Done. Processed 1 object files, 5 programs.
$ sudo ./veristat ~/local/tmp/fbcode-bpf-objs/biosnoop_bpf.o -o csv > baseline.csv
$ cat baseline.csv
file_name,prog_name,verdict,duration,total_insns,total_states,peak_states
biosnoop_bpf.o,blk_account_io_merge_bio,success,36,24,1,1
biosnoop_bpf.o,blk_account_io_start,failure,0,0,0,0
biosnoop_bpf.o,block_rq_complete,success,82,104,6,6
biosnoop_bpf.o,block_rq_insert,success,78,85,7,7
biosnoop_bpf.o,block_rq_issue,success,74,85,7,7
# on latest bpf-next kernel
$ sudo ./veristat biosnoop_bpf.o
File Program Verdict Duration (us) Total insns Total states Peak states
-------------- ------------------------ ------- ------------- ----------- ------------ -----------
biosnoop_bpf.o blk_account_io_merge_bio success 31 24 1 1
biosnoop_bpf.o blk_account_io_start failure 0 0 0 0
biosnoop_bpf.o block_rq_complete success 76 104 6 6
biosnoop_bpf.o block_rq_insert success 83 91 7 7
biosnoop_bpf.o block_rq_issue success 74 91 7 7
-------------- ------------------------ ------- ------------- ----------- ------------ -----------
Done. Processed 1 object files, 5 programs.
$ sudo ./veristat biosnoop_bpf.o -o csv > comparison.csv
$ cat comparison.csv
file_name,prog_name,verdict,duration,total_insns,total_states,peak_states
biosnoop_bpf.o,blk_account_io_merge_bio,success,71,24,1,1
biosnoop_bpf.o,blk_account_io_start,failure,0,0,0,0
biosnoop_bpf.o,block_rq_complete,success,82,104,6,6
biosnoop_bpf.o,block_rq_insert,success,83,91,7,7
biosnoop_bpf.o,block_rq_issue,success,87,91,7,7
# now let's compare with human-readable output (note that no sudo needed)
# we also ignore verification duration in this case to shortned output
$ ./veristat -C baseline.csv comparison.csv -e file,prog,verdict,insns
File Program Verdict (A) Verdict (B) Verdict (DIFF) Total insns (A) Total insns (B) Total insns (DIFF)
-------------- ------------------------ ----------- ----------- -------------- --------------- --------------- ------------------
biosnoop_bpf.o blk_account_io_merge_bio success success MATCH 24 24 +0 (+0.00%)
biosnoop_bpf.o blk_account_io_start failure failure MATCH 0 0 +0 (+100.00%)
biosnoop_bpf.o block_rq_complete success success MATCH 104 104 +0 (+0.00%)
biosnoop_bpf.o block_rq_insert success success MATCH 91 85 -6 (-6.59%)
biosnoop_bpf.o block_rq_issue success success MATCH 91 85 -6 (-6.59%)
-------------- ------------------------ ----------- ----------- -------------- --------------- --------------- ------------------
While not particularly exciting example (it turned out to be kind of hard to
quickly find a nice example with significant difference just because of kernel
version bump), it should demonstrate main features.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220921164254.3630690-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Teach veristat to output results as CSV table for easier programmatic
processing. Change what was --output/-o argument to now be --emit/-e.
And then use --output-format/-o <fmt> to specify output format.
Currently "table" and "csv" is supported, table being default.
For CSV output mode veristat is using spec identifiers as column names.
E.g., instead of "Total states" veristat uses "total_states" as a CSV
header name.
Internally veristat recognizes three formats, one of them
(RESFMT_TABLE_CALCLEN) is a special format instructing veristat to
calculate column widths for table output. This felt a bit cleaner and
more uniform than either creating separate functions just for this.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220921164254.3630690-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_object__close(obj) is called twice for BPF object files with single
BPF program in it. This causes crash. Fix this by not calling
bpf_object__close() unnecessarily.
Fixes: c8bc5e0509 ("selftests/bpf: Add veristat tool for mass-verifying BPF object files")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220921164254.3630690-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add tests to ensure that only supported dynamic pointer types are accepted,
that the passed argument is actually a dynamic pointer, that the passed
argument is a pointer to the stack, and that bpf_verify_pkcs7_signature()
correctly handles dynamic pointers with data set to NULL.
The tests are currently in the deny list for s390x (JIT does not support
calling kernel function).
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220920075951.929132-14-roberto.sassu@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Perform several tests to ensure the correct implementation of the
bpf_verify_pkcs7_signature() kfunc.
Do the tests with data signed with a generated testing key (by using
sign-file from scripts/) and with the tcp_bic.ko kernel module if it is
found in the system. The test does not fail if tcp_bic.ko is not found.
First, perform an unsuccessful signature verification without data.
Second, perform a successful signature verification with the session
keyring and a new one created for testing.
Then, ensure that permission and validation checks are done properly on the
keyring provided to bpf_verify_pkcs7_signature(), despite those checks were
deferred at the time the keyring was retrieved with bpf_lookup_user_key().
The tests expect to encounter an error if the Search permission is removed
from the keyring, or the keyring is expired.
Finally, perform a successful and unsuccessful signature verification with
the keyrings with pre-determined IDs (the last test fails because the key
is not in the platform keyring).
The test is currently in the deny list for s390x (JIT does not support
calling kernel function).
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/r/20220920075951.929132-13-roberto.sassu@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test to ensure that bpf_lookup_user_key() creates a referenced
special keyring when the KEY_LOOKUP_CREATE flag is passed to this function.
Ensure that the kfunc rejects invalid flags.
Ensure that a keyring can be obtained from bpf_lookup_system_key() when one
of the pre-determined keyring IDs is provided.
The test is currently blacklisted for s390x (JIT does not support calling
kernel function).
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/r/20220920075951.929132-12-roberto.sassu@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add verifier tests for bpf_lookup_*_key() and bpf_key_put(), to ensure that
acquired key references stored in the bpf_key structure are released, that
a non-NULL bpf_key pointer is passed to bpf_key_put(), and that key
references are not leaked.
Also, slightly modify test_verifier.c, to find the BTF ID of the attach
point for the LSM program type (currently, it is done only for TRACING).
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220920075951.929132-11-roberto.sassu@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since the eBPF CI does not support kernel modules, change the kernel config
to compile everything as built-in.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Daniel Müller <deso@posteo.net>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220920075951.929132-10-roberto.sassu@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Move dynptr type check to is_dynptr_type_expected() from
is_dynptr_reg_valid_init(), so that callers can better determine the cause
of a negative result (dynamic pointer not valid/initialized, dynamic
pointer of the wrong type). It will be useful for example for BTF, to
restrict which dynamic pointer types can be passed to kfuncs, as initially
only the local type will be supported.
Also, splitting makes the code more readable, since checking the dynamic
pointer type is not necessarily related to validity and initialization.
Split the validity/initialization and dynamic pointer type check also in
the verifier, and adjust the expected error message in the test (a test for
an unexpected dynptr type passed to a helper cannot be added due to missing
suitable helpers, but this case has been tested manually).
Cc: Joanne Koong <joannelkoong@gmail.com>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220920075951.929132-4-roberto.sassu@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, the default vmlinux files at '/boot/vmlinux-*',
'/lib/modules/*/vmlinux-*' etc. are parsed with 'btf__parse_elf()' to
extract BTF. It is possible that these files are actually raw BTF files
similar to /sys/kernel/btf/vmlinux. So parse these files with
'btf__parse' which tries both raw format and ELF format.
This might be useful in some scenarios where users put their custom BTF
into known locations and don't want to specify btf_custom_path option.
Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/3f59fb5a345d2e4f10e16fe9e35fbc4c03ecaa3e.1662999860.git.chentao.kernel@linux.alibaba.com
It's possible to specify particular tests for test_bpf.ko with
module parameters. Make it possible to pass the module parameters,
example:
test_kmod.sh test_range=1,3
Since magnitude tests take long time it can be reasonable to skip
them.
Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220908120146.381218-1-ykaliuta@redhat.com
Commit 34586d29f8 ("libbpf: Add new BPF_PROG2 macro") added BPF_PROG2
macro for trampoline based programs with struct arguments. Andrii
made a few suggestions to improve code quality and description.
This patch implemented these suggestions including better internal
macro name, consistent usage pattern for __builtin_choose_expr(),
simpler macro definition for always-inline func arguments and
better macro description.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220910025214.1536510-1-yhs@fb.com
This change includes selftests that validate the expected behavior and
APIs of the new BPF_MAP_TYPE_USER_RINGBUF map type.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-5-void@manifault.com
Now that all of the logic is in place in the kernel to support user-space
produced ring buffers, we can add the user-space logic to libbpf. This
patch therefore adds the following public symbols to libbpf:
struct user_ring_buffer *
user_ring_buffer__new(int map_fd,
const struct user_ring_buffer_opts *opts);
void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
__u32 size, int timeout_ms);
void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
void user_ring_buffer__discard(struct user_ring_buffer *rb,
void user_ring_buffer__free(struct user_ring_buffer *rb);
A user-space producer must first create a struct user_ring_buffer * object
with user_ring_buffer__new(), and can then reserve samples in the
ring buffer using one of the following two symbols:
void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
__u32 size, int timeout_ms);
With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring
buffer will be returned if sufficient space is available in the buffer.
user_ring_buffer__reserve_blocking() provides similar semantics, but will
block for up to 'timeout_ms' in epoll_wait if there is insufficient space
in the buffer. This function has the guarantee from the kernel that it will
receive at least one event-notification per invocation to
bpf_ringbuf_drain(), provided that at least one sample is drained, and the
BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain().
Once a sample is reserved, it must either be committed to the ring buffer
with user_ring_buffer__submit(), or discarded with
user_ring_buffer__discard().
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which
will allow user-space applications to publish messages to a ring buffer
that is consumed by a BPF program in kernel-space. In order for this
map-type to be useful, it will require a BPF helper function that BPF
programs can invoke to drain samples from the ring buffer, and invoke
callbacks on those samples. This change adds that capability via a new BPF
helper function:
bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx,
u64 flags)
BPF programs may invoke this function to run callback_fn() on a series of
samples in the ring buffer. callback_fn() has the following signature:
long callback_fn(struct bpf_dynptr *dynptr, void *context);
Samples are provided to the callback in the form of struct bpf_dynptr *'s,
which the program can read using BPF helper functions for querying
struct bpf_dynptr's.
In order to support bpf_ringbuf_drain(), a new PTR_TO_DYNPTR register
type is added to the verifier to reflect a dynptr that was allocated by
a helper function and passed to a BPF program. Unlike PTR_TO_STACK
dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR
dynptrs need not use reference tracking, as the BPF helper is trusted to
properly free the dynptr before returning. The verifier currently only
supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL.
Note that while the corresponding user-space libbpf logic will be added
in a subsequent patch, this patch does contain an implementation of the
.map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This
.map_poll() callback guarantees that an epoll-waiting user-space
producer will receive at least one event notification whenever at least
one sample is drained in an invocation of bpf_user_ringbuf_drain(),
provided that the function is not invoked with the BPF_RB_NO_WAKEUP
flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup
notification is sent even if no sample was drained.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com
We want to support a ringbuf map type where samples are published from
user-space, to be consumed by BPF programs. BPF currently supports a
kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF
map type. We'll need to define a new map type for user-space -> kernel,
as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply
to a user-space producer ring buffer, and we'll want to add one or
more helper functions that would not apply for a kernel-producer
ring buffer.
This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type
definition. The map type is useless in its current form, as there is no
way to access or use it for anything until we one or more BPF helpers. A
follow-on patch will therefore add a new helper function that allows BPF
programs to run callbacks on samples that are published to the ring
buffer.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com
Sync find_first_bit() and find_next_bit() implementation with the
mother kernel.
Also, drop unused find_last_bit() and find_next_clump8().
Signed-off-by: Yury Norov <yury.norov@gmail.com>
It needs to enter the namespace before reading a file.
Fixes: 4183a8d70a ("perf tools: Allow synthesizing the build id for kernel/modules/tasks in PERF_RECORD_MMAP2")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220920222822.2171056-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
7df548840c ("x86/bugs: Add "unknown" reporting for MMIO Stale Data")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/lkml/YysTRji90sNn2p5f@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
/proc/kallsyms and /proc/modules are compared before and after the copy
in order to ensure no changes during the copy.
However /proc/modules also might change due to reference counts changing
even though that does not make any difference.
Any modules loaded or unloaded should be visible in changes to kallsyms,
so it is not necessary to check /proc/modules also anyway.
Remove the comparison checking that /proc/modules is unchanged.
Fixes: fc1b691d76 ("perf buildid-cache: Add ability to add kcore to the cache")
Reported-by: Daniel Dao <dqminh@cloudflare.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Daniel Dao <dqminh@cloudflare.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220914122429.8770-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With mixed per-thread and (system-wide) per-cpu maps, the "any cpu" value
-1 must be skipped when setting CPU mask bits.
Prior to commit cbd7bfc7fd ("tools/perf: Fix out of bound access
to cpu mask array") the invalid setting went unnoticed, but since then
it causes perf record to fail with an error.
Example:
Before:
$ perf record -e intel_pt// --per-thread uname
Failed to initialize parallel data streaming masks
After:
$ perf record -e intel_pt// --per-thread uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.068 MB perf.data ]
Fixes: ae4f8ae16a ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220915122612.81738-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It uses PERF_EVENT_IOC_MODIFY_ATTRIBUTES ioctl. The kernel would return
ENOTTY if it's not supported. Update the skip reason in that case.
Committer notes:
On s/390 the args aren't used, so need to be marked __maybe_unused.
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20220914183338.546357-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The attach flags is meaningless for effective query and
its value will always be set as 0 during effective query.
Root cg's effective progs is always its attached progs,
so we use non-effective query to get its progs count and
attach flags. And we don't need the remain attach flags
check.
Fixes: b79c9fc955 ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20220921104604.2340580-4-pulehui@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
When root-cgroup attach multi progs and sub-cgroup attach a override prog,
bpftool will display incorrectly for the attach flags of the sub-cgroup’s
effective progs:
$ bpftool cgroup tree /sys/fs/cgroup effective
CgroupPath
ID AttachType AttachFlags Name
/sys/fs/cgroup
6 cgroup_sysctl multi sysctl_tcp_mem
13 cgroup_sysctl multi sysctl_tcp_mem
/sys/fs/cgroup/cg1
20 cgroup_sysctl override sysctl_tcp_mem
6 cgroup_sysctl override sysctl_tcp_mem <- wrong
13 cgroup_sysctl override sysctl_tcp_mem <- wrong
/sys/fs/cgroup/cg1/cg2
20 cgroup_sysctl sysctl_tcp_mem
6 cgroup_sysctl sysctl_tcp_mem
13 cgroup_sysctl sysctl_tcp_mem
Attach flags is only valid for attached progs of this layer cgroup,
but not for effective progs. For querying with EFFECTIVE flags,
exporting attach flags does not make sense. So let's remove the
AttachFlags field and the associated logic. After this patch, the
above effective cgroup tree will show as bellow:
$ bpftool cgroup tree /sys/fs/cgroup effective
CgroupPath
ID AttachType Name
/sys/fs/cgroup
6 cgroup_sysctl sysctl_tcp_mem
13 cgroup_sysctl sysctl_tcp_mem
/sys/fs/cgroup/cg1
20 cgroup_sysctl sysctl_tcp_mem
6 cgroup_sysctl sysctl_tcp_mem
13 cgroup_sysctl sysctl_tcp_mem
/sys/fs/cgroup/cg1/cg2
20 cgroup_sysctl sysctl_tcp_mem
6 cgroup_sysctl sysctl_tcp_mem
13 cgroup_sysctl sysctl_tcp_mem
Fixes: b79c9fc955 ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP")
Fixes: a98bf57391 ("tools: bpftool: add support for reporting the effective cgroup progs")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20220921104604.2340580-3-pulehui@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Attach flags is only valid for attached progs of this layer cgroup,
but not for effective progs. For querying with EFFECTIVE flags,
exporting attach flags does not make sense. So when effective query,
we reject prog_attach_flags array and don't need to populate it.
Also we limit attach_flags to output 0 during effective query.
Fixes: b79c9fc955 ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20220921104604.2340580-2-pulehui@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Following the introduction of pitch, yaw and roll IIO modifiers, update the
event_monitor tool accordingly.
Signed-off-by: Andrea Merello <andrea.merello@iit.it>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220907132205.28021-7-andrea.merello@iit.it
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Following the introduction of IIO linear acceleration modifiers, update the
event_monitor tool accordingly.
Signed-off-by: Andrea Merello <andrea.merello@iit.it>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220907132205.28021-4-andrea.merello@iit.it
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Extend the ptrace test support for NT_ARM_TLS to cover TPIDR2_EL0 - on
systems that support SME the NT_ARM_TLS regset can be up to 2 elements
long with the second element containing TPIDR2_EL0. On systems
supporting SME we verify that this value can be read and written while
on systems that do not support SME we verify correct truncation of reads
and writes.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154921.837871-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for extending support for NT_ARM_TLS to cover additional
TPIDRs add some tests for the existing interface. Do this in a generic
ptrace test program to provide a place to collect additional tests in
the future.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154921.837871-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This includes Nuno Sa's work to move the IIO core over to generic firmware
properties rather than having DT specific code paths. Combined with Andy
Shevchenko's long term work on drivers, this leaves IIO in a good state for
handling other firmware types.
New device support
- liteon,ltrf216a
* New driver and dt bindings to support this Light sensor.
- maxim,max11205
* New driver for this 16bit single channel ADC.
- memsensing,msa311
* New driver for this accelerometer. Includes a string helper for read/write.
- richtek,rtq6056
* New driver and dt binding to support this current monitor used to measure
power usage.
- yamaha,yas530
* Support the YAS537 variant (series includes several fixes for other parts
and new driver features).
Staging graduation
- adi,ad7746 CDC. Cleanup conducted against set of roadtest tests using
the posted RFC of that framework.
Features
- core
* Large rework to make all the core IIO code use generic firmware properties.
Includes switching some drivers over as well using newly provided
generic interfaces and allowing removal of DT specific ones.
* Support for gesture event types for single and double tap. Used in
bosch,bma400.
- atmel,at91-sama5d2
* Add support for temperature sensor which uses two muxed inputs to estimate
the temperature.
* Handle trackx bits of EMR register to improve temp sampling accuracy.
* Runtime PM support.
- liteon,ltrf216a
* Add a _raw channel output to allow working around an issue with
differing conversions equations that breaks some user space controls.
- mexelis,mlx90632
* Support regulator control.
- ti,tsc2046
* External reference voltage support.
Clean up and minor fixes
- Tree-wide
* devm_clk_get_enabled() replacements of opencoded equivalent.
* Remaining IIO_DMA_MINALIGN conversions (the staging/iio drivers).
* Various minor warning and similar cleanup such as missing static
markings.
* strlcpy() to strscpy() for cases where return value not checked.
* provide units.h entries for more HZ units and use them in drivers.
- dt-bindings cleanup
* Drop maintainers listss where the email address is bouncing.
* Switch spi devices over to using spi-peripheral.yaml
* Add some missing unevaluatedProperties / additionalProperties: false
entries.
- ABI docs
* Add some missing channel type specific sampling frequency entries.
* Add parameter names for callback parameters.
- MAINTAINERS
* Fix wrong ADI forum links.
- core
* lockdep class per device, to avoid an issue with nest when one IIO
device is the consumer of another.
* White space tweaks.
- asc,dlhl60d
* Use get_unaligned_be24 to avoid some unusual data manipulation and masking.
- atmel,at91-sama5d2
* Fix wrong max value.
* Improve error handling when measuring pressure and touch.
* Add locks to remove races on updating oversampling / sampling freq.
* Add missing calls in suspend and resume path to ensure state is correctly
brought up if buffered capture was in use when suspend happened.
* Error out of write_raw() callback if buffered capture enabled to avoid
unpredictable behavior.
* Handle different versions having different oversampling ratio support and
drop excess error checking.
* Cleanup magic value defines where the name is just the value and hence
hurts readability.
* Use read_avail() callback to provide info on possible oversampling ratios.
* Correctly handle variable bit depth when doing oversampling on different
supported parts. Also handle higher oversampling ratios.
- fsl,imx8qxp
* Don't ignore errors from regulator_get_voltage() so as to avoid some
very surprising scaling.
- invensense,icp10100
* Switch from UNIVERSAL to DEFINE_RUNTIME_DEV_PM_OPS. UNIVERSAL rarely made
sense and is now deprecated. In this driver we just avoid double disabling
in some paths.
- maxim,max1363
* Drop consumer channel map provision by platform data. There have been
better ways of doing this for years and there are no in tree users.
- microchip,mcp3911
* Update status to maintained.
- qcom,spmi-adc5
* Support measurement of LDO output voltage.
- qcom,spmi-adc
* Add missing channel available on SM6125 SoC.
- st,stmpe
* Drop requirement on node name in binding now that driver correctly
doesn't enforce it.
- stx104
* Move to more appropriate addac directory
- ti,am335x
* Document ti,am654-adc compatible already in use in tree.
- ti,hmc5843
* Move dev_pm_ops out of header and use new pm macros to handle export.
- yamaha,yas530
* Minor cleanups.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmMdsjERHGppYzIzQGtl
cm5lbC5vcmcACgkQVIU0mcT0FohwYA//cDu2/UcQgXRfIvWTmAAIXNk0qQ/3VQbf
h6hjy9wESmW8728sed77bkL49WriyyXkYe3qalLph4Y+bFRwLAss/2iHlyM396+7
3qFZz8ydkU9b2pJMt9clPr24RMoPo32s5gwRLs4LG/KJdzZNPYC+AI/ZaErtKSaD
58p7RD9If11/jPo1q1CDJVSF6rsHPfd3terhgP/8eMNyPSwggCohVf+g8CL62Suf
2F0m8RjNMCC+RvPIHQ4u0LtGOtRMvw5DMOcRe/xk2xVxdR2awqmgUGyTIUe61Wuz
C9kuJlxSMovyHRyKNAhHcTDNl8vFbV/4ZaqV4PJDPT5GBiMH6TulXFX2a0oowJHB
sjMYQP2mH7ZVssdm9P3Nv/cYs3IVHtTdQsDzMIwU9A0qiObG7tisZHm51RsgBBGa
Zy4JnphOEjia4pz2V3Tf/sFJxNz1OmKFsUD18LdXIEv2f93++7+K8FJL3TMuSsBj
PdWbLjhd3ZEp5re/60RIW5AaQjBxZf90GRFNIyOu4q1xGMFbHglbWjMEtiyBr+M5
SRPnatHBLm3LXF31/Km/+VcHhIWWh3InxeORhufNVQsx600J2gpX3nvLxj2D3bWX
QRjUj76zgH37KvKca8NTgPUrzWg10K330toLF34mIq5rFonp0dI41CzApVxQsmCH
NO4fYKP+c1k=
=fUm6
-----END PGP SIGNATURE-----
Merge tag 'iio-for-6.1a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
1st set of IIO new device support, features and cleanup for 6.1
This includes Nuno Sa's work to move the IIO core over to generic firmware
properties rather than having DT specific code paths. Combined with Andy
Shevchenko's long term work on drivers, this leaves IIO in a good state for
handling other firmware types.
New device support
- liteon,ltrf216a
* New driver and dt bindings to support this Light sensor.
- maxim,max11205
* New driver for this 16bit single channel ADC.
- memsensing,msa311
* New driver for this accelerometer. Includes a string helper for read/write.
- richtek,rtq6056
* New driver and dt binding to support this current monitor used to measure
power usage.
- yamaha,yas530
* Support the YAS537 variant (series includes several fixes for other parts
and new driver features).
Staging graduation
- adi,ad7746 CDC. Cleanup conducted against set of roadtest tests using
the posted RFC of that framework.
Features
- core
* Large rework to make all the core IIO code use generic firmware properties.
Includes switching some drivers over as well using newly provided
generic interfaces and allowing removal of DT specific ones.
* Support for gesture event types for single and double tap. Used in
bosch,bma400.
- atmel,at91-sama5d2
* Add support for temperature sensor which uses two muxed inputs to estimate
the temperature.
* Handle trackx bits of EMR register to improve temp sampling accuracy.
* Runtime PM support.
- liteon,ltrf216a
* Add a _raw channel output to allow working around an issue with
differing conversions equations that breaks some user space controls.
- mexelis,mlx90632
* Support regulator control.
- ti,tsc2046
* External reference voltage support.
Clean up and minor fixes
- Tree-wide
* devm_clk_get_enabled() replacements of opencoded equivalent.
* Remaining IIO_DMA_MINALIGN conversions (the staging/iio drivers).
* Various minor warning and similar cleanup such as missing static
markings.
* strlcpy() to strscpy() for cases where return value not checked.
* provide units.h entries for more HZ units and use them in drivers.
- dt-bindings cleanup
* Drop maintainers listss where the email address is bouncing.
* Switch spi devices over to using spi-peripheral.yaml
* Add some missing unevaluatedProperties / additionalProperties: false
entries.
- ABI docs
* Add some missing channel type specific sampling frequency entries.
* Add parameter names for callback parameters.
- MAINTAINERS
* Fix wrong ADI forum links.
- core
* lockdep class per device, to avoid an issue with nest when one IIO
device is the consumer of another.
* White space tweaks.
- asc,dlhl60d
* Use get_unaligned_be24 to avoid some unusual data manipulation and masking.
- atmel,at91-sama5d2
* Fix wrong max value.
* Improve error handling when measuring pressure and touch.
* Add locks to remove races on updating oversampling / sampling freq.
* Add missing calls in suspend and resume path to ensure state is correctly
brought up if buffered capture was in use when suspend happened.
* Error out of write_raw() callback if buffered capture enabled to avoid
unpredictable behavior.
* Handle different versions having different oversampling ratio support and
drop excess error checking.
* Cleanup magic value defines where the name is just the value and hence
hurts readability.
* Use read_avail() callback to provide info on possible oversampling ratios.
* Correctly handle variable bit depth when doing oversampling on different
supported parts. Also handle higher oversampling ratios.
- fsl,imx8qxp
* Don't ignore errors from regulator_get_voltage() so as to avoid some
very surprising scaling.
- invensense,icp10100
* Switch from UNIVERSAL to DEFINE_RUNTIME_DEV_PM_OPS. UNIVERSAL rarely made
sense and is now deprecated. In this driver we just avoid double disabling
in some paths.
- maxim,max1363
* Drop consumer channel map provision by platform data. There have been
better ways of doing this for years and there are no in tree users.
- microchip,mcp3911
* Update status to maintained.
- qcom,spmi-adc5
* Support measurement of LDO output voltage.
- qcom,spmi-adc
* Add missing channel available on SM6125 SoC.
- st,stmpe
* Drop requirement on node name in binding now that driver correctly
doesn't enforce it.
- stx104
* Move to more appropriate addac directory
- ti,am335x
* Document ti,am654-adc compatible already in use in tree.
- ti,hmc5843
* Move dev_pm_ops out of header and use new pm macros to handle export.
- yamaha,yas530
* Minor cleanups.
* tag 'iio-for-6.1a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (142 commits)
iio: pressure: icp10100: Switch from UNIVERSAL to DEFINE_RUNTIME_DEV_PM_OPS().
iio: adc: max1363: Drop provision to provide an IIO channel map via platform data
iio: accel: bma400: Add support for single and double tap events
iio: Add new event type gesture and use direction for single and double tap
iio: Use per-device lockdep class for mlock
iio: adc: add max11205 adc driver
dt-bindings: iio: adc: Add max11205 documentation file
iio: magnetometer: yamaha-yas530: Use dev_err_probe()
iio: magnetometer: yamaha-yas530: Make strings const in chip info
iio: magnetometer: yamaha-yas530: Use pointers as driver data
iio: adc: tsc2046: silent spi_device_id warning
iio: adc: tsc2046: add vref support
dt-bindings: iio: adc: ti,tsc2046: add vref-supply property
iio: light: ltrf216a: Add raw attribute
dt-bindings: iio: Add missing (unevaluated|additional)Properties on child nodes
MAINTAINERS: fix Analog Devices forum links
iio/accel: fix repeated words in comments
dt-bindings: iio: accel: add dt-binding schema for msa311 accel driver
iio: add MEMSensing MSA311 3-axis accelerometer driver
dt-bindings: vendor-prefixes: add MEMSensing Microsystems Co., Ltd.
...
The missing header makes it hard for programs like elfutils to open
these files.
Fixes: 2d86612aac ("perf symbol: Correct address for bss symbols")
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Lieven Hey <lieven.hey@kdab.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20220915092910.711036-1-lieven.hey@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
$ sudo ./perf test -v each-cgroup
96: perf stat --bpf-counters --for-each-cgroup test :
--- start ---
test child forked, pid 79600
test child finished with 0
---- end ----
perf stat --bpf-counters --for-each-cgroup test: Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220916184132.1161506-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If it mixes core and uncore events, each evsel would have different cpu map.
But it assumed they are same with evlist's all_cpus and accessed by the same
index. This resulted in a crash like below.
$ perf stat -a --bpf-counters --for-each_cgroup ^. -e cycles,imc/cas_count_read/ sleep 1
Segmentation fault
While it's not recommended to use uncore events for cgroup aggregation, it
should not crash.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220916184132.1161506-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The previous cpu map introduced a bug in the bperf cgroup counter. This
results in a failure when user gives a partial cpu map starting from
non-zero.
$ sudo ./perf stat -C 1-2 --bpf-counters --for-each-cgroup ^. sleep 1
libbpf: prog 'on_cgrp_switch': failed to create BPF link for perf_event FD 0:
-9 (Bad file descriptor)
Failed to attach cgroup program
To get the FD of an evsel, it should use a map index not the CPU number.
Fixes: 0255571a16 ("perf cpumap: Switch to using perf_cpu_map API")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: bpf@vger.kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220916184132.1161506-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It seems the recent libbpf got more strict about the section name.
I'm seeing a failure like this:
$ sudo ./perf stat -a --bpf-counters --for-each-cgroup ^. sleep 1
libbpf: prog 'on_cgrp_switch': missing BPF prog type, check ELF section name 'perf_events'
libbpf: prog 'on_cgrp_switch': failed to load: -22
libbpf: failed to load object 'bperf_cgroup_bpf'
libbpf: failed to load BPF skeleton 'bperf_cgroup_bpf': -22
Failed to load cgroup skeleton
The section name should be 'perf_event' (without the trailing 's').
Although it's related to the libbpf change, it'd be better fix the
section name in the first place.
Fixes: 944138f048 ("perf stat: Enable BPF counter with --for-each-cgroup")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: bpf@vger.kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220916184132.1161506-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We found that function btf_dump__dump_type_data can be called by the
user as an API, but in this function, the `opts` parameter may be used
as a null pointer.This causes `opts->indent_str` to trigger a NULL
pointer exception.
Fixes: 2ce8450ef5 ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts")
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Weibin Kong <kongweibin2@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220917084809.30770-1-liuxin350@huawei.com
Test 0811: Add multiple basic filter with cmp ematch u8/link layer and
default action and dump them
Test 5129: List basic filters
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 8293: Add tcindex filter with default action
Test 7281: Add tcindex filter with hash size and pass action
Test b294: Add tcindex filter with mask shift and reclassify action
Test 0532: Add tcindex filter with pass_on and continue actions
Test d473: Add tcindex filter with pipe action
Test 2940: Add tcindex filter with miltiple actions
Test 1893: List tcindex filters
Test 2041: Change tcindex filter with pass action
Test 9203: Replace tcindex filter with pass action
Test 7957: Delete tcindex filter with drop action
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 2141: Add rsvp filter with tcp proto and specific IP address
Test 5267: Add rsvp filter with udp proto and specific IP address
Test 2819: Add rsvp filter with src ip and src port
Test c967: Add rsvp filter with tunnelid and continue action
Test 5463: Add rsvp filter with tunnel and pipe action
Test 2332: Add rsvp filter with miltiple actions
Test 8879: Add rsvp filter with tunnel and skp flag
Test 8261: List rsvp filters
Test 8989: Delete rsvp filter
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test e122: Add route filter with from and to tag
Test 6573: Add route filter with fromif and to tag
Test 1362: Add route filter with to flag and reclassify action
Test 4720: Add route filter with from flag and continue actions
Test 2812: Add route filter with form tag and pipe action
Test 7994: Add route filter with miltiple actions
Test 4312: List route filters
Test 2634: Delete route filter with pipe action
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 5294: Add flow filter with map key and ops
Test 3514: Add flow filter with map key or ops
Test 7534: Add flow filter with map key xor ops
Test 4524: Add flow filter with map key rshift ops
Test 0230: Add flow filter with map key addend ops
Test 2344: Add flow filter with src map key
Test 9304: Add flow filter with proto map key
Test 9038: Add flow filter with proto-src map key
Test 2a03: Add flow filter with proto-dst map key
Test a073: Add flow filter with iif map key
Test 3b20: Add flow filter with priority map key
Test 8945: Add flow filter with mark map key
Test c034: Add flow filter with nfct map key
Test 0205: Add flow filter with nfct-src map key
Test 5315: Add flow filter with nfct-src map key
Test 7849: Add flow filter with nfct-proto-src map key
Test 9902: Add flow filter with nfct-proto-dst map key
Test 6742: Add flow filter with rt-classid map key
Test 5432: Add flow filter with sk-uid map key
Test 4134: Add flow filter with sk-gid map key
Test 4522: Add flow filter with vlan-tag map key
Test 4253: Add flow filter with rxhash map key
Test 4452: Add flow filter with hash key list
Test 4341: Add flow filter with muliple ops
Test 4392: List flow filters
Test 4322: Change flow filter with map key num
Test 2320: Replace flow filter with map key num
Test 3213: Delete flow filter with map key num
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 6273: Add cgroup filter with cmp ematch u8/link layer and drop action
Test 4721: Add cgroup filter with cmp ematch u8/link layer with trans
flag and pass action
Test d392: Add cgroup filter with cmp ematch u16/link layer and pipe action
Test 0234: Add cgroup filter with cmp ematch u32/link layer and miltiple
actions
Test 8499: Add cgroup filter with cmp ematch u8/network layer and pass
action
Test b273: Add cgroup filter with cmp ematch u8/network layer with trans
flag and drop action
Test 1934: Add cgroup filter with cmp ematch u16/network layer and pipe
action
Test 2733: Add cgroup filter with cmp ematch u32/network layer and
miltiple actions
Test 3271: Add cgroup filter with NOT cmp ematch rule and pass action
Test 2362: Add cgroup filter with two ANDed cmp ematch rules and single
action
Test 9993: Add cgroup filter with two ORed cmp ematch rules and single
action
Test 2331: Add cgroup filter with two ANDed cmp ematch rules and one ORed
ematch rule and single action
Test 3645: Add cgroup filter with two ANDed cmp ematch rules and one NOT
ORed ematch rule and single action
Test b124: Add cgroup filter with u32 ematch u8/zero offset and drop
action
Test 7381: Add cgroup filter with u32 ematch u8/zero offset and invalid
value >0xFF
Test 2231: Add cgroup filter with u32 ematch u8/positive offset and drop
action
Test 1882: Add cgroup filter with u32 ematch u8/invalid mask >0xFF
Test 1237: Add cgroup filter with u32 ematch u8/missing offset
Test 3812: Add cgroup filter with u32 ematch u8/missing AT keyword
Test 1112: Add cgroup filter with u32 ematch u8/missing value
Test 3241: Add cgroup filter with u32 ematch u8/non-numeric value
Test e231: Add cgroup filter with u32 ematch u8/non-numeric mask
Test 4652: Add cgroup filter with u32 ematch u8/negative offset and pass
Test 1331: Add cgroup filter with u32 ematch u16/zero offset and pipe
action
Test e354: Add cgroup filter with u32 ematch u16/zero offset and invalid
value >0xFFFF
Test 3538: Add cgroup filter with u32 ematch u16/positive offset and drop
action
Test 4576: Add cgroup filter with u32 ematch u16/invalid mask >0xFFFF
Test b842: Add cgroup filter with u32 ematch u16/missing offset
Test c924: Add cgroup filter with u32 ematch u16/missing AT keyword
Test cc93: Add cgroup filter with u32 ematch u16/missing value
Test 123c: Add cgroup filter with u32 ematch u16/non-numeric value
Test 3675: Add cgroup filter with u32 ematch u16/non-numeric mask
Test 1123: Add cgroup filter with u32 ematch u16/negative offset and drop
action
Test 4234: Add cgroup filter with u32 ematch u16/nexthdr+ offset and pass
action
Test e912: Add cgroup filter with u32 ematch u32/zero offset and pipe
action
Test 1435: Add cgroup filter with u32 ematch u32/positive offset and drop
action
Test 1282: Add cgroup filter with u32 ematch u32/missing offset
Test 6456: Add cgroup filter with u32 ematch u32/missing AT keyword
Test 4231: Add cgroup filter with u32 ematch u32/missing value
Test 2131: Add cgroup filter with u32 ematch u32/non-numeric value
Test f125: Add cgroup filter with u32 ematch u32/non-numeric mask
Test 4316: Add cgroup filter with u32 ematch u32/negative offset and drop
action
Test 23ae: Add cgroup filter with u32 ematch u32/nexthdr+ offset and pipe
action
Test 23a1: Add cgroup filter with canid ematch and single SFF
Test 324f: Add cgroup filter with canid ematch and single SFF with mask
Test 2576: Add cgroup filter with canid ematch and multiple SFF
Test 4839: Add cgroup filter with canid ematch and multiple SFF with masks
Test 6713: Add cgroup filter with canid ematch and single EFF
Test ab9d: Add cgroup filter with canid ematch and multiple EFF with masks
Test 5349: Add cgroup filter with canid ematch and a combination of
SFF/EFF
Test c934: Add cgroup filter with canid ematch and a combination of
SFF/EFF with masks
Test 4319: Replace cgroup filter with diffferent match
Test 4636: Delete cgroup filter
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test 23c3: Add cBPF filter with valid bytecode
Test 1563: Add cBPF filter with invalid bytecode
Test 2334: Add eBPF filter with valid object-file
Test 2373: Add eBPF filter with invalid object-file
Test 4423: Replace cBPF bytecode
Test 5122: Delete cBPF filter
Test e0a9: List cBPF filters
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since 1b620d539c ("kbuild: disable header exports for UML in a
straightforward way"), installing headers fails on UML, so just disable
installing them, since they're not needed anyway on the architecture.
Fixes: b438b3b8d6 ("wireguard: selftests: support UML")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a regression test for commit 592335a416 ("bonding: accept
unsolicited NA message") and commit b7f14132bf ("bonding: use unspecified
address if no available link local address"). When the bond interface
up and no available link local address, unspecified address(::) is used to
send the NS message. The unsolicited NA message should also be accepted
for validation.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20220920033047.173244-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Remove the recent "unshare time namespace on vfork+exec" feature (Andrei Vagin)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmMoxpIWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJpd/D/9V7iLUZoquMvXFonv//sRH21P+
u7vH03q0X4lSov73jdjizq8znZl9RVO14IYi+6lQE8VHyOjzjBoTALRPnirNCyGa
Ia8P+LPaOHDTDQmGqt+9xmPKp3z0qwrpWWyTrFHLo7GRzWtI0QjQsSlgUTIz7jCw
dSwLRWN6n7d3hzNzFWt9VUOOlzpip8NTcnAbC9YA5dPFLO85+wZ4ZpMYYfFJMcQj
N/Zm63lrqAU0wy7EhonkKJQDjgRP/zYUs6VJMejHqYl951SrZJ+DgXEGaAwR14Sz
IZAUhSM5Fl8alhkrcmlkiy9A5P014iVRR6AaSyeT2616fac97wY1EWHxvBMqzNsB
AJJqjPHoN+mc8cqt9lMyIhbmS8WkTuyTHziEcFyyTVsNYGYN6x9hVVZalqPrl8o3
Y3zC6MfRK33JNVB2GZVUzsf5EZC3mjz9VJKKmLwYmG4X7/JOvIVCiW123b060T7z
b49PzI+0rTG8SHTk1I/T8NpWuvLRTCglzZK06q971uyT80xPoGD/HmSpmm+86dHs
k3WV2qBoz31Eaoewa3NJqn6pBxQLy9WAZP6rJb3aQSFwDRCuvKO4CUpHAXILt5U+
SoarR5445zVzY3NYHaf/3BRsEnCQS06U67ma0lAmMWk4J3ehFOY0DrRqtLJ02iwd
sKJD/KnKC+IEcLjrAA==
=yFGx
-----END PGP SIGNATURE-----
Merge tag 'execve-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve reverts from Kees Cook:
"The recent work to support time namespace unsharing turns out to have
some undesirable corner cases, so rather than allowing the API to stay
exposed for another release, it'd be best to remove it ASAP, with the
replacement getting another cycle of testing. Nothing is known to use
this yet, so no userspace breakage is expected.
For more details, see:
https://lore.kernel.org/lkml/ed418e43ad28b8688cfea2b7c90fce1c@ispras.ru
Summary:
- Remove the recent 'unshare time namespace on vfork+exec' feature
(Andrei Vagin)"
* tag 'execve-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
Revert "fs/exec: allow to unshare a time namespace on vfork+exec"
Revert "selftests/timens: add a test for vfork+exit"
Add IPv4 and IPv6 test cases for unresolved multicast routes, testing
that queued packets are forwarded after installing a matching (S, G)
route.
The test cases can be used to reproduce the bugs fixed in "ipmr: Always
call ip{,6}_mr_forward() from RCU read-side critical section".
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This selftest is designed for testing the support of NEXT-C-SID flavor
for SRv6 End behavior. It instantiates a virtual network composed of
several nodes: hosts and SRv6 routers. Each node is realized using a
network namespace that is properly interconnected to others through veth
pairs.
The test considers SRv6 routers implementing IPv4/IPv6 L3 VPNs leveraged
by hosts for communicating with each other. Such routers i) apply
different SRv6 Policies to the traffic received from connected hosts,
considering the IPv4 or IPv6 protocols; ii) use the NEXT-C-SID
compression mechanism for encoding several SRv6 segments within a single
128-bit SID address, referred to as a Compressed SID (C-SID) container.
The NEXT-C-SID is provided as a "flavor" of the SRv6 End behavior,
enabling it to properly process the C-SID containers. The correct
execution of the enabled NEXT-C-SID SRv6 End behavior is verified
through reachability tests carried out between hosts belonging to the
same VPN.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The previous patch added a test which can be used instead of qos_burst.sh.
Remove this test.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add an equivalent test to qos_burst, the test's purpose is same, but the
new test uses simpler topology and does not require forcing low speed.
In addition, it can be run Spectrum-2 and not only Spectrum-3+. The idea
is to use a shaper in order to limit the traffic and create congestion.
qos_burst test uses small pool, sends many small packets, and verify that
packets are not dropped, which means that many descriptors can be handled.
This test should check the change that commit c864769add
("mlxsw: Configure descriptor buffers") pushed.
Instead, the new test tries to use more than 85% of maximum supported
descriptors. The idea is to use big pool (as much as the ASIC supports),
such that the pool size does not limit the traffic, then send many small
packets, which means that many descriptors are used, and check how many
packets the switch can handle.
The usage of shaper allows to run the test in all ASICs, regardless of
the CPU abilities, as it is able to create the congestion with low rate
of packets.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The maximum pool size is exposed via 'devlink sb' command. The next
patch will add a test which increases some pools to the maximum size.
Add a function to query the value.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
QOS tests create congestion and verify the switch behavior. To create
congestion, they need to have more traffic than the port can handle, so
some of them force 1Gbps speed.
The tests assume that 1Gbps speed is supported, otherwise, they will fail.
Spectrum-4 ASIC will not support this speed in all ports, so to be able
to run the tests there, some adjustments are required. Use shapers to limit
the traffic instead of forcing speed. Note that for several ports, the
speed configuration is just for autoneg issues, so shaper is not needed
instead.
The tests already use ETS qdisc as a root and RED qdiscs as children. Add
a new TBF shaper to limit the rate of traffic, and use it as a root qdisc,
then save the previous hierarchy of qdiscs under the new TBF root.
In some ASICs, the shapers do not limit the traffic as accurately as
forcing speed. To make the tests stable, allow the backlog size to be up to
+-10% of the threshold. The aim of the tests is to make sure that with
backlog << threshold, there are no drops, and that packets are dropped
somewhere in vicinity of the configured threshold.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
QOS tests create congestion and verify the switch behavior. To create
congestion, they need to have more traffic than the port can handle, so
some of them force 1Gbps speed.
The tests assume that 1Gbps speed is supported, otherwise, they will fail.
Spectrum-4 ASIC will not support this speed in all ports, so to be able
to run QOS tests there, some adjustments are required. Use shapers to
limit the traffic instead of forcing speed. Note that for several ports,
the speed configuration is just for autoneg issues, so shaper is not needed
instead.
In tests that already use shapers, set the existing shaper to be a child of
a new TBF shaper which is added as a root qdisc and acts as a port shaper.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add test result message when test_task_storage_map_stress_lookup()
succeeds or is skipped. The test case can be skipped due to the choose
of preemption model in kernel config, so export skips in test_maps.c and
increase it when needed.
The following is the output of test_maps when the test case succeeds or
is skipped:
test_task_storage_map_stress_lookup:PASS
test_maps: OK, 0 SKIPPED
test_task_storage_map_stress_lookup SKIP (no CONFIG_PREEMPT)
test_maps: OK, 1 SKIPPED
Fixes: 73b97bc78b ("selftests/bpf: Test concurrent updates on bpf_task_storage_busy")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220919035714.2195144-1-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
* kvm-arm64/single-step-async-exception:
: .
: Single-step fixes from Reiji Watanabe:
:
: "This series fixes two bugs of single-step execution enabled by
: userspace, and add a test case for KVM_GUESTDBG_SINGLESTEP to
: the debug-exception test to verify the single-step behavior."
: .
KVM: arm64: selftests: Add a test case for KVM_GUESTDBG_SINGLESTEP
KVM: arm64: selftests: Refactor debug-exceptions to make it amenable to new test cases
KVM: arm64: Clear PSTATE.SS when the Software Step state was Active-pending
KVM: arm64: Preserve PSTATE.SS for the guest while single-step is enabled
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add a test case for KVM_GUESTDBG_SINGLESTEP to the debug-exceptions test.
The test enables single-step execution from userspace, and check if the
exit to userspace occurs for each instruction that is stepped.
Set the default number of the test iterations to a number of iterations
sufficient to always reproduce the problem that the previous patch fixes
on an Ampere Altra machine.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220917010600.532642-5-reijiw@google.com
Split up the current test into a helper, but leave the debug version
checking in main(), to make it convenient to add a new debug exception
test case in a subsequent patch.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220917010600.532642-4-reijiw@google.com
* Allow to configure for 64-bit kernel with ARCH=parisc
* Fix asm/errno.h includes in tools directory for parisc and xtensa
* Clean up iosapic memory allocation
* Minor typo and spelling fixes
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYydbwQAKCRD3ErUQojoP
X0JNAQD050ybcW5iTIs1Hns/20BmpPyI+ph75iNE5jRX/85i/wD8DdfUkI06sfzq
vIshpSaXY5AuBNQsblXJpiFCjbU4/Q4=
=TCpO
-----END PGP SIGNATURE-----
Merge tag 'parisc-for-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
"Some small parisc architecture fixes for 6.0-rc6:
One patch lightens up a previous commit and thus unbreaks building the
debian kernel, which tries to configure a 64-bit kernel with the
ARCH=parisc environment variable set.
The other patches fixes asm/errno.h includes in the tools directory
and cleans up memory allocation in the iosapic driver.
Summary:
- Allow configuring 64-bit kernel with ARCH=parisc
- Fix asm/errno.h includes in tools directory for parisc and xtensa
- Clean up iosapic memory allocation
- Minor typo and spelling fixes"
* tag 'parisc-for-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Allow CONFIG_64BIT with ARCH=parisc
parisc: remove obsolete manual allocation aligning in iosapic
tools/include/uapi: Fix <asm/errno.h> for parisc and xtensa
Input: hp_sdc: fix spelling typo in comment
parisc: ccio-dma: Add missing iounmap in error path in ccio_probe()
Add tests for memblock_alloc_try_nid() and memblock_alloc_try_nid_raw()
where the simulated physical memory is set up with multiple NUMA nodes.
Additionally, two of these tests set nid != NUMA_NO_NODE. All tests are
run for both top-down and bottom-up allocation directions.
The tested scenarios are:
Range unrestricted:
- region cannot be allocated:
+ none of the nodes have enough memory to allocate the region
Range restricted:
- region can be allocated in the specific node requested without dropping
min_addr:
+ the range fully overlaps with the node, and there are adjacent
reserved regions
- region cannot be allocated:
+ nid is set to NUMA_NO_NODE and the total range can fit the region,
but the range is split between two nodes and everything else is
reserved
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/4b2c7e6e5f3a9837939e99293c77e0e6fc3ae4f9.1663046060.git.remckee0@gmail.com
Add tests for memblock_alloc_try_nid() and memblock_alloc_try_nid_raw()
where the simulated physical memory is set up with multiple NUMA nodes.
Additionally, all of these tests set nid != NUMA_NO_NODE. These tests are
run with a bottom-up allocation direction.
The tested scenarios are:
Range unrestricted:
- region can be allocated in the specific node requested:
+ there are no previously reserved regions
+ the requested node is partially reserved but has enough space
- the specific node requested cannot accommodate the request, but the
region can be allocated in a different node:
+ there are no previously reserved regions, but node is too small
+ the requested node is fully reserved
+ the requested node is partially reserved and does not have
enough space
Range restricted:
- region can be allocated in the specific node requested after dropping
min_addr:
+ range partially overlaps with two different nodes, where the first
node is the requested node
+ range partially overlaps with two different nodes, where the
requested node ends before min_addr
- region cannot be allocated in the specific node requested, but it can be
allocated in the requested range:
+ range overlaps with multiple nodes along node boundaries, and the
requested node ends before min_addr
+ range overlaps with multiple nodes along node boundaries, and the
requested node starts after max_addr
- region cannot be allocated in the specific node requested, but it can be
allocated after dropping min_addr:
+ range partially overlaps with two different nodes, where the
second node is the requested node
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/00c4810daaf5d050abc71915b24ed7419bb16b51.1663046060.git.remckee0@gmail.com
Add tests for memblock_alloc_try_nid() and memblock_alloc_try_nid_raw()
where the simulated physical memory is set up with multiple NUMA nodes.
Additionally, all of these tests set nid != NUMA_NO_NODE. These tests are
run with a top-down allocation direction.
The tested scenarios are:
Range unrestricted:
- region can be allocated in the specific node requested:
+ there are no previously reserved regions
+ the requested node is partially reserved but has enough space
- the specific node requested cannot accommodate the request, but the
region can be allocated in a different node:
+ there are no previously reserved regions, but node is too small
+ the requested node is fully reserved
+ the requested node is partially reserved and does not have
enough space
Range restricted:
- region can be allocated in the specific node requested after dropping
min_addr:
+ range partially overlaps with two different nodes, where the first
node is the requested node
+ range partially overlaps with two different nodes, where the
requested node ends before min_addr
- region cannot be allocated in the specific node requested, but it can be
allocated in the requested range:
+ range overlaps with multiple nodes along node boundaries, and the
requested node ends before min_addr
+ range overlaps with multiple nodes along node boundaries, and the
requested node starts after max_addr
- region cannot be allocated in the specific node requested, but it can be
allocated after dropping min_addr:
+ range partially overlaps with two different nodes, where the
second node is the requested node
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/84009c5b3969337ccf89df850db56d364f8c228b.1663046060.git.remckee0@gmail.com
Add function setup_numa_memblock() for setting up a memory layout with
multiple NUMA nodes in a previously allocated dummy physical memory.
This function can be used in place of setup_memblock() in tests that need
to simulate a NUMA system.
setup_numa_memblock():
- allows for setting up a memory layout by specifying the fraction of
MEM_SIZE in each node
Set CONFIG_NODES_SHIFT to 4 when building with NUMA=1 to allow for up to
16 NUMA nodes.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/4566d816a85f009268d4858d1ef06c7571a960f9.1663046060.git.remckee0@gmail.com
Fix SIGSEGV caused by libbpf trying to find attach type in vmlinux BTF
for freplace programs. It's wrong to search in vmlinux BTF and libbpf
doesn't even mark vmlinux BTF as required for freplace programs. So
trying to search anything in obj->vmlinux_btf might cause NULL
dereference if nothing else in BPF object requires vmlinux BTF.
Instead, error out if freplace (EXT) program doesn't specify
attach_prog_fd during at the load time.
Fixes: 91abb4a6d7 ("libbpf: Support attachment of BPF tracing programs to kernel modules")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220909193053.577111-3-andrii@kernel.org
Use proper SEC("tc") for test_verif_scale{1,3} programs. It's not
a problem for selftests right now because we manually set type
programmatically, but not having correct SEC() definitions makes it
harded to generically load BPF object files.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220909193053.577111-2-andrii@kernel.org
x86 will shortly start using -fpatchable-function-entry for purposes
other than ftrace, make sure the __patchable_function_entry section
isn't merged in the mcount_loc section.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org
Test that the bonding and team drivers clean up an underlying device's
address lists (dev->uc, dev->mc) when the aggregated device is deleted.
Test addition and removal of the LACPDU multicast address on underlying
devices by the bonding driver.
v2:
* add lag_lib.sh to TEST_FILES
v3:
* extend bond_listen_lacpdu_multicast test to init_state up and down cases
* remove some superfluous shell syntax and 'set dev ... up' commands
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 3671: Delete tunnel_key set action with valid index
Test 8597: Delete tunnel_key set action with invalid index
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 3872: Delete sample action with valid index
Test a394: Delete sample action with invalid index
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test b811: Delete nat action with valid index
Test a521: Delete nat action with invalid index
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test a972: Delete ife encode action with valid index
Test 1272: Delete ife encode action with invalid index
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 6571: Delete connmark action with valid index
Test 3426: Delete connmark action with invalid index
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 2029: Add xt action with log-prefix
Test 3562: Replace xt action log-prefix
Test 8291: Delete xt action with valid index
Test 5169: Delete xt action with invalid index
Test 7284: List xt actions
Test 5010: Flush xt actions
Test 8437: Add xt action with duplicate index
Test 2837: Add xt action with invalid index
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test 5153: Add gate action with priority and sched-entry
Test 7189: Add gate action with base-time
Test a721: Add gate action with cycle-time
Test c029: Add gate action with cycle-time-ext
Test 3719: Replace gate base-time action
Test d821: Delete gate action with valid index
Test 3128: Delete gate action with invalid index
Test 7837: List gate actions
Test 9273: Flush gate actions
Test c829: Add gate action with duplicate index
Test 3043: Add gate action with invalid index
Test 2930: Add gate action with cookie
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test c826: Add ctinfo action with default setting
Test 0286: Add ctinfo action with dscp
Test 4938: Add ctinfo action with valid cpmark and zone
Test 7593: Add ctinfo action with drop control
Test 2961: Replace ctinfo action zone and action control
Test e567: Delete ctinfo action with valid index
Test 6a91: Delete ctinfo action with invalid index
Test 5232: List ctinfo actions
Test 7702: Flush ctinfo actions
Test 3201: Add ctinfo action with duplicate index
Test 8295: Add ctinfo action with invalid index
Test 3964: Replace ctinfo action with invalid goto_chain control
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validate the RNG hwcap and make sure we don't generate a SIGILL reading
RNDR when it is reported.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220913141101.151400-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Clean up the output of the test by adding a missing newline, the fix had
been done locally but didn't make it into the applied version.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220913141101.151400-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Update version number.
This version includes fixes for:
- fix build failure when using gcc options -Wl,--as-needed
- Fix warning for perf_cap.cpu may be uninitialized
- Fix off by one check for MAX_DIE_PER_PACKAGE
- Fix issue with use of get_physical_die_id instead of
get_physical_die_id
Optimizations:
- Removed unused interfaces and functions
- Better handle package, die, cpu combination by
defining a struct and set at one place instead
at each user level.
New functional change:
- Warn if turbo is disabled and SST turbo-freq feature is requested
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Optimize CPU initialization.
Do cpu related initialization in one function, including setting the cpu
present_cpumask, target_cpumask, and cpu_map and core_count arrays.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
cpu_map already has the cpu package id, die id information.
Thus there is no need to re-evaluating sysfs attributes or stored data
file to get the package id and die id of a given CPU each time.
In order to unitlize this, cpu_map needs to be created unconditionally.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
pkg_id/die_id can be retrieved from struct isst_id, remove the redundant
clos_config->pkg_id/die_id fields.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Enforce the pkg/die value in struct isst_id are either -1 or a valid
value.
This helps avoid inconsistent or redundant checks.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Now, all the get_physical_pkg/die/core_id() users are inside
isst-config.c, so no need to export these APIs.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
struct isst_id contains cpu, package and die info, and it can represent
a specific SST power domain.
Introduce is_cpu_in_power_domain() helper to identify if a cpu is in a
specified power_domain.
And cleanup the code to use the new helper.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
struct isst_id already contains package and die id information, thus
there is no need to get the package and die id information, when struct
isst_id is already available.
Remove unneeded get_physical_package_id/get_physical_die_id usage.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
With pkg and die info added into struct isst_id, more functions can
be converted to use struct isst_id as parameter.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Code uses pkg_id and die_id to refer to a specific power domain.
The pkg/die information is already settled at start time. Adding package
id and die id information into struct isst_id so that code does not need
to retrieve them at runtime.
More code cleanups can be done with the package/die info available.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
SST control is power-domain based rather than cpu based, on all the
systems including Sapphire Rapids and ealier.
SST core APIs uses cpu id as parameter, and use the underlying pkg_id and
die_id information to find a power domain, this is not straight forward
and introduces obscure logics in the code.
Introduce struct isst_id to represent a SST Power Domain.
All core APIs are converted to use struct isst_id as parameter instead of
using cpu id.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Remove dead code.
Not functional change in this patch
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
In the function isst_ctdp_display_information(), call to the function
get_cpu_count() is using get_physical_die_id() instead of
get_physical_package_id(). This will result in wrong display of
CPU count in that level.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ Srinivas Pandruvada: fixed subject and change log ]
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
When 'discussing' control flow Masami mentioned the LOOP* instructions
and I realized objtool doesn't decode them properly.
As it turns out, these instructions are somewhat inefficient and as
such unlikely to be emitted by the compiler (a few vmlinux.o checks
can't find a single one) so this isn't critical, but still, best to
decode them properly.
Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Yxhd4EMKyoFoH9y4@hirez.programming.kicks-ass.net
Move the fullmesh prefix test of addr_nr_ns2 together with its other
prefix tests.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
These changes simplify the Makefile and handle these 5 ways to build
Landlock tests:
- make -C tools/testing/selftests/landlock
- make -C tools/testing/selftests TARGETS=landlock gen_tar
- make TARGETS=landlock kselftest-gen_tar
- make TARGETS=landlock O=build kselftest-gen_tar
- make -C /tmp/linux TARGETS=landlock O=/tmp/build kselftest-gen_tar
This also makes $(KHDR_INCLUDES) available to other test collections
when building in their directory.
Fixes: f1227dc7d0 ("selftests/landlock: fix broken include of linux/landlock.h")
Fixes: 3bb267a361 ("selftests: drop khdr make target")
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Guillaume Tucker <guillaume.tucker@collabora.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20220909103402.1501802-1-mic@digikod.net
Add a test to assert that KVM handles the AArch64 views of the AArch32
ID registers as RAZ/WI (writable only from userspace). For registers
that were already hidden or unallocated, expect RAZ + invariant
behavior.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220913094441.3957645-8-oliver.upton@linux.dev
tools/include/uapi/asm/errno.h currently attempts to include
non-existent arch-specific errno.h header for xtensa.
Remove this case so that <asm-generic/errno.h> is used instead,
and add the missing arch-specific header for parisc.
References: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=ia64&ver=5.8.3-1%7Eexp1&stamp=1598340829&raw=1
Signed-off-by: Ben Hutchings <benh@debian.org>
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Cc: <stable@vger.kernel.org> # 5.10+
Signed-off-by: Helge Deller <deller@gmx.de>
Do one fork in vsyscall detection code and let SIGSEGV handler exit and
carry information to the parent saving LOC.
[adobriyan@gmail.com: redo original patch, delete unnecessary variables, minimise code changes]
Link: https://lkml.kernel.org/r/YvoWzAn5dlhF75xa@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit d2d6cba5d6623 ("selftest: vm: remove orphaned references to
local_config.{h,mk}") took care of removing orphaned references. This
commit removes local_config from .gitignore.
Parent patch commit 69007f156ba ("Kselftests: remove support of
libhugetlbfs from kselftests")
Link: https://lkml.kernel.org/r/20220901092315.33619-1-tsahu@linux.ibm.com
Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
libhugetlbfs, the user side utitlity to work with hugepages, does not have
any active support. There are only 2 selftests which are part of in
vm/hmm_test.c that depends on libhugetlbfs.
This patch modifies the tests so that they will not require libhugetlb
library.
[axelrasmussen@google.com: : remove orphaned references to local_config.{h,mk}]
Link: https://lkml.kernel.org/r/20220831211526.2743216-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20220801070231.13831-1-tsahu@linux.ibm.com
Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Tested-by: Zach O'Keefe <zokeefe@google.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The -f option is to filter out the information of blocks whose memory has
not been released, I noticed some blocks should not be filtered out.
Commit 9cc7e96aa8 ("mm/page_owner: record timestamp and pid") records
the allocation timestamp (ts_nsec) of all pages.
Commit 866b485262 ("mm/page_owner: record the timestamp of all pages
during free") records the free timestamp (free_ts_nsec) of all pages.
When the page is allocated for the first time, the initial value of
free_ts_nsec is 0, and the corresponding time will be obtained when the
page is released. But during reallocation the free_ts_nsec will not reset
to 0 again. In particular, when page migration occurs, these two
timestamps will be the same.
Now page_owner_sort removes all text blocks whose free_ts_nsec is not 0
when using -f option. However, this way can only select pages allocated
for the first time. If a freed page is reallocated, free_ts_nsec will be
less than ts_nsec; if page migration occurs, the two timestamps will be
equal. These cases should be considered as pages are not released.
So I fix the function is_need() to keep text blocks that meet the above
two conditions when using -f option.
Link: https://lkml.kernel.org/r/20220812155515.30846-1-caoyixuan2019@email.szu.edu.cn
Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Georgi Djakov <georgi.djakov@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This new mode was recently added to the userfaultfd selftest. We want to
exercise both userfaultfd(2) as well as /dev/userfaultfd, so add both
test cases to the script.
Link: https://lkml.kernel.org/r/20220808175614.3885028-6-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We clearly want to ensure both userfaultfd(2) and /dev/userfaultfd keep
working into the future, so just run the test twice, using each interface.
Instead of always testing both userfaultfd(2) and /dev/userfaultfd, let
the user choose which to test.
As with other test features, change the behavior based on a new command
line flag. Introduce the idea of "test mods", which are generic (not
specific to a test type) modifications to the behavior of the test. This
is sort of borrowed from this RFC patch series [1], but simplified a bit.
The benefit is, in "typical" configurations this test is somewhat slow
(say, 30sec or something). Testing both clearly doubles it, so it may not
always be desirable, as users are likely to use one or the other, but
never both, in the "real world".
[1]: https://patchwork.kernel.org/project/linux-mm/patch/20201129004548.1619714-14-namit@vmware.com/
[axelrasmussen@google.com: modify selftest to exit with KSFT_SKIP *only* when features are unsupported, per Mike]
Link: https://lkml.kernel.org/r/20220819205201.658693-4-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20220808175614.3885028-4-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "userfaultfd: add /dev/userfaultfd for fine grained access
control", v7.
Why not ...?
============
- Why not /proc/[pid]/userfaultfd? Two main points (additional discussion [1]):
- /proc/[pid]/* files are all owned by the user/group of the process, and
they don't really support chmod/chown. So, without extending procfs it
doesn't solve the problem this series is trying to solve.
- The main argument *for* this was to support creating UFFDs for remote
processes. But, that use case clearly calls for CAP_SYS_PTRACE, so to
support this we could just use the UFFD syscall as-is.
- Why not use a syscall? Access to syscalls is generally controlled by
capabilities. We don't have a capability which is used for userfaultfd access
without also granting more / other permissions as well, and adding a new
capability was rejected [2].
- It's possible a LSM could be used to control access instead, but I have
some concerns. I don't think this approach would be as easy to use,
particularly if we were to try to solve this with something heavyweight
like SELinux. Maybe we could pursue adding a new LSM specifically for
this user case, but it may be too narrow of a case to justify that.
[1]: https://patchwork.kernel.org/project/linux-mm/cover/20220719195628.3415852-1-axelrasmussen@google.com/
[2]: https://lore.kernel.org/lkml/686276b9-4530-2045-6bd8-170e5943abe4@schaufler-ca.com/T/
This patch (of 5):
This not being included was just a simple oversight. There are certain
features (like minor fault support) which are only enabled on shared
mappings, so without including hugetlb_shared we actually lose a
significant amount of test coverage.
Link: https://lkml.kernel.org/r/20220808175614.3885028-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20220808175614.3885028-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add support to allocate and verify collapse of multiple hugepage-sized
regions into multiple THPs.
Add "nr" argument to check_huge() that instructs check_huge() to check for
exactly "nr_hpages" THPs. This has the added benefit of now being able to
check for exactly 0 THPs, and so callsites that previously checked the
negation of exactly 1 THP are now more correct.
->collapse struct collapse_context hook has been expanded with a
"nr_hpages" argument to collapse "nr_hpages" hugepages. The
collapse_full() test has been repurposed to collapse 4 THPs at once. It
is expected more tests will want to test multi THP collapse (e.g.
file/shmem).
This is of particular benefit to madvise collapse context given that it
may do many THP collapses during a single syscall.
Link: https://lkml.kernel.org/r/20220706235936.2197195-19-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add selftest specific to madvise collapse context that tests MADV_COLLAPSE
is "successful" if a hugepage-aligned/sized region is already pmd-mapped.
This test also verifies that MADV_COLLAPSE can collapse memory into THPs
even in "madvise" THP mode and the memory isn't marked VM_HUGEPAGE.
Link: https://lkml.kernel.org/r/20220706235936.2197195-18-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add madvise collapse context to hugepage collapse selftests. This context
is tested with /sys/kernel/mm/transparent_hugepage/enabled set to "never"
in order to avoid unwanted interaction with khugepaged during testing.
Also, refactor updates to sysfs THP settings using a stack so that the THP
settings from nested callers can be restored.
Link: https://lkml.kernel.org/r/20220706235936.2197195-17-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The code
p = alloc_mapping();
printf("Allocate huge page...");
madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
fill_memory(p, 0, hpage_pmd_size);
if (check_huge(p))
success("OK");
else
fail("Fail");
Is repeated many times in different tests. Add a helper, alloc_hpage()
to handle this.
Link: https://lkml.kernel.org/r/20220706235936.2197195-16-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Modularize the collapse action of khugepaged collapse selftests by
introducing a struct collapse_context which specifies how to collapse a
given memory range and the expected semantics of the collapse. This can
be reused later to test other collapse contexts.
Additionally, all tests have logic that checks if a collapse occurred via
reading /proc/self/smaps, and report if this is different than expected.
Move this logic into the per-context ->collapse() hook instead of
repeating it in every test.
Link: https://lkml.kernel.org/r/20220706235936.2197195-15-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This idea was introduced by David Rientjes[1].
Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request
a synchronous collapse of memory at their own expense.
The benefits of this approach are:
* CPU is charged to the process that wants to spend the cycles for the
THP
* Avoid unpredictable timing of khugepaged collapse
Semantics
This call is independent of the system-wide THP sysfs settings, but will
fail for memory marked VM_NOHUGEPAGE. If the ranges provided span
multiple VMAs, the semantics of the collapse over each VMA is independent
from the others. This implies a hugepage cannot cross a VMA boundary. If
collapse of a given hugepage-aligned/sized region fails, the operation may
continue to attempt collapsing the remainder of memory specified.
The memory ranges provided must be page-aligned, but are not required to
be hugepage-aligned. If the memory ranges are not hugepage-aligned, the
start/end of the range will be clamped to the first/last hugepage-aligned
address covered by said range. The memory ranges must span at least one
hugepage-sized region.
All non-resident pages covered by the range will first be
swapped/faulted-in, before being internally copied onto a freshly
allocated hugepage. Unmapped pages will have their data directly
initialized to 0 in the new hugepage. However, for every eligible
hugepage aligned/sized region to-be collapsed, at least one page must
currently be backed by memory (a PMD covering the address range must
already exist).
Allocation for the new hugepage may enter direct reclaim and/or
compaction, regardless of VMA flags. When the system has multiple NUMA
nodes, the hugepage will be allocated from the node providing the most
native pages. This operation operates on the current state of the
specified process and makes no persistent changes or guarantees on how
pages will be mapped, constructed, or faulted in the future
Return Value
If all hugepage-sized/aligned regions covered by the provided range were
either successfully collapsed, or were already PMD-mapped THPs, this
operation will be deemed successful. On success, process_madvise(2)
returns the number of bytes advised, and madvise(2) returns 0. Else, -1
is returned and errno is set to indicate the error for the most-recently
attempted hugepage collapse. Note that many failures might have occurred,
since the operation may continue to collapse in the event a single
hugepage-sized/aligned region fails.
ENOMEM Memory allocation failed or VMA not found
EBUSY Memcg charging failed
EAGAIN Required resource temporarily unavailable. Try again
might succeed.
EINVAL Other error: No PMD found, subpage doesn't have Present
bit set, "Special" page no backed by struct page, VMA
incorrectly sized, address not page-aligned, ...
Most notable here is ENOMEM and EBUSY (new to madvise) which are intended
to provide the caller with actionable feedback so they may take an
appropriate fallback measure.
Use Cases
An immediate user of this new functionality are malloc() implementations
that manage memory in hugepage-sized chunks, but sometimes subrelease
memory back to the system in native-sized chunks via MADV_DONTNEED;
zapping the pmd. Later, when the memory is hot, the implementation could
madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage
coverage and dTLB performance. TCMalloc is such an implementation that
could benefit from this[2].
Only privately-mapped anon memory is supported for now, but additional
support for file, shmem, and HugeTLB high-granularity mappings[2] is
expected. File and tmpfs/shmem support would permit:
* Backing executable text by THPs. Current support provided by
CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which
might impair services from serving at their full rated load after
(re)starting. Tricks like mremap(2)'ing text onto anonymous memory to
immediately realize iTLB performance prevents page sharing and demand
paging, both of which increase steady state memory footprint. With
MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance
and lower RAM footprints.
* Backing guest memory by hugapages after the memory contents have been
migrated in native-page-sized chunks to a new host, in a
userfaultfd-based live-migration stack.
[1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/
[2] https://github.com/google/tcmalloc/tree/master/tcmalloc
[jrdr.linux@gmail.com: avoid possible memory leak in failure path]
Link: https://lkml.kernel.org/r/20220713024109.62810-1-jrdr.linux@gmail.com
[zokeefe@google.com add missing kfree() to madvise_collapse()]
Link: https://lore.kernel.org/linux-mm/20220713024109.62810-1-jrdr.linux@gmail.com/
Link: https://lkml.kernel.org/r/20220713161851.1879439-1-zokeefe@google.com
[zokeefe@google.com: delay computation of hpage boundaries until use]]
Link: https://lkml.kernel.org/r/20220720140603.1958773-4-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220706235936.2197195-10-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When gfp_types.h was split from gfp.h, it broke the radix test suite. Fix
the test suite by using gfp_types.h in the tools gfp.h header.
Link: https://lkml.kernel.org/r/20220902191923.1735933-1-willy@infradead.org
Fixes: cb5a065b4e (headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Check properly the connection tracking entry status configured running
bpf_ct_change_status kfunc.
Remove unnecessary IPS_CONFIRMED status configuration since it is
already done during entry allocation.
Fixes: 6eb7fba007 ("selftests/bpf: Add tests for new nf_conntrack kfuncs")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/813a5161a71911378dfac8770ec890428e4998aa.1662623574.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This KUnit fixes update for Linux 6.0-rc5 consists of 2 fixes to test
build and a fix to incorrect taint reason reporting.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmMbcogACgkQCwJExA0N
QxwhOg/+I0mZJcPFxVAQZgm/7sMWnsiI76H+HEwwHyXpzuJaAAtn/K0jtokeHTMe
Leib27pPJSIdmTbop3fTg0etswfIlWTiQ+v7UUSunlLvID9JyeYtk3WVvuxSNHiw
XsufuhjsM0oGhB+9QtmkTJWxGbB5wKPC5NGICcjNbQsN6pl8Rtn8u4bRUJt+pCcX
qscuJuOtZdUYGVtBsawICWzCOmUtY4ezYxnQbDWUkQfDWopOjdDVNccmQZZVY8Lo
rkS6ahYGI2IP/nipPZwwepu5EafqIyAF3Uu4rdxnUI7S9qkSuC0OiS5vNAoJKdu0
H581tXzFgqlKAVpfkfJ9RUeFY+gTdMPlpotc3itYpYl3m8KMZnM5/TLCcHSrbVGN
f5CYEhdpd9DkPUHEgXa45wpW0/rw98xPNfZOixdDgjvMf+JslUO4lZD3mbJRl2Yw
AD4be8+AcLCLEUGuCEpOsUHdJPjbqe81z1aEPHkebdn9s+uhuzrDS+4KwmV/H7kb
v2L669Opbo8zqaQ+0oev+ePVCE9k+MPQSkiZ4KLMHjDsSVeQ0G7ucMm/94VD1RPZ
ZJePgormRbffKwfMLswpbuVQnzBMfN+XG8AFPqpW34lZNKEOl9X0BZk+WBj3pwZ1
J9SQvHXA5RXM9iEazhk1C+a+m89JN2/vpIqrkxrNyz5Bx6vlXGU=
=xA12
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-fixes-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit fixes from Shuah Khan:
"Two fixes to test build and a fix for incorrect taint reason reporting"
* tag 'linux-kselftest-kunit-fixes-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
tools: Add new "test" taint to kernel-chktaint
kunit: fix Kconfig for build-in tests USB4 and Nitro Enclaves
kunit: fix assert_type for comparison macros
This tests that when an unprivileged ICMP ping socket connects,
the hooks are actually invoked. We also ensure that if the hook does
not call bpf_bind(), the bound address is unmodified, and if the
hook calls bpf_bind(), the bound address is exactly what we provided
to the helper.
A new netns is used to enable ping_group_range in the test without
affecting ouside of the test, because by default, not even root is
permitted to use unprivileged ICMP ping...
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/086b227c1b97f4e94193e58aae7576d0261b68a4.1662682323.git.zhuyifei@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This helper is needed in multiple tests. Instead of copying it over
and over, better to deduplicate this helper to test_progs.c.
test_progs.c is chosen over testing_helpers.c because of this helper's
use of CHECK / ASSERT_*, and the CHECK was modified to use ASSERT_*
so it does not rely on a duration variable.
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/9b4fc9a27bd52f771b657b4c4090fc8d61f3a6b5.1662682323.git.zhuyifei@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This reverts commit 14e5ce7994 ("libbpf: Add GCC support for
bpf_tail_call_static"). Reason is that gcc invented their own BPF asm
which is not conform with LLVM one, and going forward this would be
more painful to maintain here and in other areas of the library. Thus
remove it; ask to gcc folks is to align with LLVM one to use exact
same syntax.
Fixes: 14e5ce7994 ("libbpf: Add GCC support for bpf_tail_call_static")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Hilliard <james.hilliard1@gmail.com>
Cc: Jose E. Marchesi <jose.marchesi@oracle.com>
- Fix per-thread mmaps for multi-threaded targets, noticed with 'perf top --pid' with
multithreaded targets.
- Fix synthesis failure warnings in 'perf record'.
- Fix L2 Topdown metrics disappearance for raw events in 'perf stat'.
- Fix out of bound access in some CPU masks.
- Fix segfault if there is no CPU PMU table and a metric is sought, noticed when
building with NO_JEVENTS=1.
- Skip dummy event attr check in 'perf script' fixing nonsensical warning about
UREGS attribute not set, as 'dummy' events have no samples.
- Fix 'iregs' field handling with dummy events on hybrid systems in 'perf script'.
- Prevent potential memory leak in c2c_he_zalloc() in 'perf c2c'.
- Don't install data files with x permissions.
- Fix types for print format in dlfilter-show-cycles.
- Switch deprecated openssl MD5_* functions to new EVP API in 'genelf'.
- Remove redundant word 'contention' in 'perf lock' help message.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYxqBaAAKCRCyPKLppCJ+
J6euAP0Rmztl5DUU7gCI939JHuVuC6ry68FgMPtpAKlD7QHgFQD7BwrhfVkvvLCH
wWNf69a47ZJwriZ2pLOhfYDQWk7SMQU=
=NiL3
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v6.0-2022-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix per-thread mmaps for multi-threaded targets, noticed with
'perf top --pid' with multithreaded targets
- Fix synthesis failure warnings in 'perf record'
- Fix L2 Topdown metrics disappearance for raw events in 'perf stat'
- Fix out of bound access in some CPU masks
- Fix segfault if there is no CPU PMU table and a metric is sought,
noticed when building with NO_JEVENTS=1
- Skip dummy event attr check in 'perf script' fixing nonsensical
warning about UREGS attribute not set, as 'dummy' events have no
samples
- Fix 'iregs' field handling with dummy events on hybrid systems in
'perf script'
- Prevent potential memory leak in c2c_he_zalloc() in 'perf c2c'
- Don't install data files with x permissions
- Fix types for print format in dlfilter-show-cycles
- Switch deprecated openssl MD5_* functions to new EVP API in 'genelf'
- Remove redundant word 'contention' in 'perf lock' help message
* tag 'perf-tools-fixes-for-v6.0-2022-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf record: Fix synthesis failure warnings
perf tools: Don't install data files with x permissions
perf script: Fix Cannot print 'iregs' field for hybrid systems
perf lock: Remove redundant word 'contention' in help message
perf dlfilter dlfilter-show-cycles: Fix types for print format
libperf evlist: Fix per-thread mmaps for multi-threaded targets
perf c2c: Prevent potential memory leak in c2c_he_zalloc()
perf genelf: Switch deprecated openssl MD5_* functions to new EVP API
tools/perf: Fix out of bound access to cpu mask array
perf affinity: Fix out of bound access to "sched_cpus" mask
perf stat: Fix L2 Topdown metrics disappear for raw events
perf script: Skip dummy event attr check
perf metric: Return early if no CPU PMU table exists
Florian Westhal says:
====================
netfilter: bugfixes for net
The following set contains four netfilter patches for your *net* tree.
When there are multiple Contact headers in a SIP message its possible
the next headers won't be found because the SIP helper confuses relative
and absolute offsets in the message. From Igor Ryzhov.
Make the nft_concat_range self-test support socat, this makes the
selftest pass on my test VM, from myself.
nf_conntrack_irc helper can be tricked into opening a local port forward
that the client never requested by embedding a DCC message in a PING
request sent to the client. Fix from David Leadbeater.
Both have been broken since the kernel 2.6.x days.
The 'osf' match might indicate success while it could not find
anything, broken since 5.2 . Fix from Pablo Neira.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some calls to synthesis functions set err < 0 but only warn about the
failure and continue. However they do not set err back to zero, relying
on subsequent code to do that.
That changed with the introduction of option --synth. When --synth=no
subsequent functions that set err back to zero are not called.
Fix by setting err = 0 in those cases.
Example:
Before:
$ perf record --no-bpf-event --synth=all -o /tmp/huh uname
Couldn't synthesize bpf events.
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ]
$ perf record --no-bpf-event --synth=no -o /tmp/huh uname
Couldn't synthesize bpf events.
After:
$ perf record --no-bpf-event --synth=no -o /tmp/huh uname
Couldn't synthesize bpf events.
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ]
Fixes: 41b740b6e8 ("perf record: Add --synth option")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220907162458.72817-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
install(1), by default, installs with rwxr-xr-x permissions. Modify
perf's Makefile to pass '-m 644' when installing:
* Documentation/tips.txt
* examples/bpf/*
* perf-completion.sh
* perf_dlfilter.h header
* scripts/perl/Perf-Trace-Util/lib/Perf/Trace/*
* scripts/perl/*.pl
* tests/attr/*
* tests/attr.py
* tests/shell/lib/*.sh
* trace/strace/groups/*
All those are supposed to be non-executable. Either they are not scripts
at all, or they don't have shebang.
Signed-off-by: <jslaby@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908060426.9619-1-jslaby@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit b91e5492f9 ("perf record: Add a dummy event on hybrid
systems to collect metadata records") adds a dummy event on hybrid
systems to fix the symbol "unknown" issue when the workload is created
in a P-core but runs on an E-core. The added dummy event will cause
"perf script -F iregs" to fail. Dummy events do not have "iregs"
attribute set, so when we do evsel__check_attr, the "iregs" attribute
check will fail, so the issue happened.
The following commit [1] has fixed a similar issue by skipping the attr
check for the dummy event because it does not have any samples anyway. It
works okay for the normal mode, but the issue still happened when running
the test in the pipe mode. In the pipe mode, it calls process_attr() which
still checks the attr for the dummy event. This commit fixed the issue by
skipping the attr check for the dummy event in the API evsel__check_attr,
Otherwise, we have to patch everywhere when evsel__check_attr() is called.
Before:
#./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
Samples for 'dummy:HG' event do not have IREGS attribute set. Cannot print 'iregs' field.
0x120 [0x90]: failed to process type: 64
#
After:
# ./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
ABI:2 CX:0x55b8efa87000 DX:0x55b8efa7e000 DI:0xffffba5e625efbb0 R8:0xffff90e51f8ae100
ABI:2 CX:0x7f1dae1e4000 DX:0xd0 DI:0xffff90e18c675ac0 R8:0x71
ABI:2 CX:0xcc0 DX:0x1 DI:0xffff90e199880240 R8:0x0
ABI:2 CX:0xffff90e180dd7500 DX:0xffff90e180dd7500 DI:0xffff90e180043500 R8:0x1
ABI:2 CX:0x50 DX:0xffff90e18c583bd0 DI:0xffff90e1998803c0 R8:0x58
#
[1]https://lore.kernel.org/lkml/20220831124041.219925-1-jolsa@kernel.org/
Fixes: b91e5492f9 ("perf record: Add a dummy event on hybrid systems to collect metadata records")
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908070030.3455164-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
drivers/net/ethernet/freescale/fec.h
7d650df99d ("net: fec: add pm_qos support on imx6q platform")
40c79ce13b ("net: fec: add stop mode support for imx8 platform")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Avoid compiler warning about format %llu that expects long long unsigned
int but argument has type __u64.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes: c3afd6e50f ("perf dlfilter: Add dlfilter-show-cycles")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220905074735.4513-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The offending commit removed mmap_per_thread(), which did not consider
the different set-output rules for per-thread mmaps i.e. in the per-thread
case set-output is used for file descriptors of the same thread not the
same cpu.
This was not immediately noticed because it only happens with
multi-threaded targets and we do not have a test for that yet.
Reinstate mmap_per_thread() expanding it to cover also system-wide per-cpu
events i.e. to continue to allow the mixing of per-thread and per-cpu
mmaps.
Debug messages (with -vv) show the file descriptors that are opened with
sys_perf_event_open. New debug messages are added (needs -vvv) that show
also which file descriptors are mmapped and which are redirected with
set-output.
In the per-cpu case (cpu != -1) file descriptors for the same CPU are
set-output to the first file descriptor for that CPU.
In the per-thread case (cpu == -1) file descriptors for the same thread are
set-output to the first file descriptor for that thread.
Example (process 17489 has 2 threads):
Before (but with new debug prints):
$ perf record --no-bpf-event -vvv --per-thread -p 17489
<SNIP>
sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
<SNIP>
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 6 -> 5
failed to mmap with 22 (Invalid argument)
After:
$ perf record --no-bpf-event -vvv --per-thread -p 17489
<SNIP>
sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
<SNIP>
libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 1: mmapping fd 6
<SNIP>
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]
Per-cpu example (process 20341 has 2 threads, same as above):
$ perf record --no-bpf-event -vvv -p 20341
<SNIP>
sys_perf_event_open: pid 20341 cpu 0 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 20342 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid 20341 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid 20342 cpu 1 group_fd -1 flags 0x8 = 8
sys_perf_event_open: pid 20341 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid 20342 cpu 2 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid 20341 cpu 3 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid 20342 cpu 3 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid 20341 cpu 4 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid 20342 cpu 4 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid 20341 cpu 5 group_fd -1 flags 0x8 = 15
sys_perf_event_open: pid 20342 cpu 5 group_fd -1 flags 0x8 = 16
sys_perf_event_open: pid 20341 cpu 6 group_fd -1 flags 0x8 = 17
sys_perf_event_open: pid 20342 cpu 6 group_fd -1 flags 0x8 = 18
sys_perf_event_open: pid 20341 cpu 7 group_fd -1 flags 0x8 = 19
sys_perf_event_open: pid 20342 cpu 7 group_fd -1 flags 0x8 = 20
<SNIP>
libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 6 -> 5
libperf: idx 1: mmapping fd 7
libperf: idx 1: set output fd 8 -> 7
libperf: idx 2: mmapping fd 9
libperf: idx 2: set output fd 10 -> 9
libperf: idx 3: mmapping fd 11
libperf: idx 3: set output fd 12 -> 11
libperf: idx 4: mmapping fd 13
libperf: idx 4: set output fd 14 -> 13
libperf: idx 5: mmapping fd 15
libperf: idx 5: set output fd 16 -> 15
libperf: idx 6: mmapping fd 17
libperf: idx 6: set output fd 18 -> 17
libperf: idx 7: mmapping fd 19
libperf: idx 7: set output fd 20 -> 19
<SNIP>
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]
Fixes: ae4f8ae16a ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Reported-by: Tomáš Trnka <trnka@scm.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216441
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220905114209.8389-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
wireless and bluetooth subtrees
Current release - regressions:
- skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
- bluetooth: fix regression preventing ACL packet transmission
Current release - new code bugs:
- dsa: microchip: fix kernel oops on ksz8 switches
- dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
Previous releases - regressions:
- netfilter: clean up hook list when offload flags check fails
- wifi: mt76: fix crash in chip reset fail
- rxrpc: fix ICMP/ICMP6 error handling
- ice: fix DMA mappings leak
- i40e: fix kernel crash during module removal
Previous releases - always broken:
- ipv6: sr: fix out-of-bounds read when setting HMAC data.
- tcp: TX zerocopy should not sense pfmemalloc status
- sch_sfb: don't assume the skb is still around after enqueueing to child
- netfilter: drop dst references before setting
- wifi: wilc1000: fix DMA on stack objects
- rxrpc: fix an insufficiently large sglist in rxkad_verify_packet_2()
- fec: use a spinlock to guard `fep->ptp_clk_on`
Misc:
- usb: qmi_wwan: add Quectel RM520N
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmMZwOMSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkCVMP/jUOqbRTzPOqV7+HhDM65Ww9Prb1WYKS
4o9Vi9DAL/DH8tXSDDtacE0GbtN6Mlr0kEYJE+lwn1hFsfGY1mzH2pcVjJqtZ7fh
6o2joaQMJ5lJe4dsr2k0TRtYhCzdeCrvzoTs2qLSeb5KYFOsxMtBikSF3kOJ1TZq
3bOK3OomiT/XO4FCnyPJvg8VBghkp69oNnefOavM8x7FzrMh6MY7VQem/IjnlIU2
9sHvMPLai81B1NXu2eybThEYSqutZHzM1PLuqIMhSMwdiSrVRlLFq7/BNP0FxCTR
/Mby+fnU4xIu+TK1bRoUM0fsipNceVEkXln4pQrRmxu1Yw62RFn5p+MGCM+Sc1UB
8HZzjtNRtlD+ZfRyQKs2m1+WJtbuESyzAeJaKoQ2eO2StmLqlnZLEYGl6oVQj6Uj
l3Bn1aBIOYrSM0T/qTNmAv7BTJYvpomlwqoQ5A1+5b0jKhn39Un3Yj3eU2aEv8eX
mJ80KAWk26Cc7ZA8WbL70Ac8wV76+TcpWCezVvrcZqrFoYAvicLrwEc92Fq0IiG5
bkA84zhd1TOFSuYVH7f79lSUPYHIs8FxFXIV8SZZGLbhj6nk607cwgbpA9IXizZT
CUajNoJclS1tELPG7VOv6DJS+VnpdsCm0+q3dnYTYiLlrEX71Bgc/ofgGOBKTeS3
iRyS2V0GLYoW
=DeAB
-----END PGP SIGNATURE-----
Merge tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from rxrpc, netfilter, wireless and bluetooth
subtrees.
Current release - regressions:
- skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
- bluetooth: fix regression preventing ACL packet transmission
Current release - new code bugs:
- dsa: microchip: fix kernel oops on ksz8 switches
- dsa: qca8k: fix NULL pointer dereference for
of_device_get_match_data
Previous releases - regressions:
- netfilter: clean up hook list when offload flags check fails
- wifi: mt76: fix crash in chip reset fail
- rxrpc: fix ICMP/ICMP6 error handling
- ice: fix DMA mappings leak
- i40e: fix kernel crash during module removal
Previous releases - always broken:
- ipv6: sr: fix out-of-bounds read when setting HMAC data.
- tcp: TX zerocopy should not sense pfmemalloc status
- sch_sfb: don't assume the skb is still around after
enqueueing to child
- netfilter: drop dst references before setting
- wifi: wilc1000: fix DMA on stack objects
- rxrpc: fix an insufficiently large sglist in
rxkad_verify_packet_2()
- fec: use a spinlock to guard `fep->ptp_clk_on`
Misc:
- usb: qmi_wwan: add Quectel RM520N"
* tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
sch_sfb: Also store skb len before calling child enqueue
net: phy: lan87xx: change interrupt src of link_up to comm_ready
net/smc: Fix possible access to freed memory in link clear
net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb
net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear
net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
net: usb: qmi_wwan: add Quectel RM520N
net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
tcp: fix early ETIMEDOUT after spurious non-SACK RTO
stmmac: intel: Simplify intel_eth_pci_remove()
net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()
ipv6: sr: fix out-of-bounds read when setting HMAC data.
bonding: accept unsolicited NA message
bonding: add all node mcast address when slave up
bonding: use unspecified address if no available link local address
wifi: use struct_group to copy addresses
wifi: mac80211_hwsim: check length for virtio packets
...
Clarify the LKDTM FORTIFY tests, and add tests for the mem*() family of
functions, now that run-time checking is distinct.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Commit c272612cb4 ("kunit: Taint the kernel when KUnit tests are run")
added a new taint flag for when in-kernel tests run. This commit adds
recognition of this new flag in kernel-chktaint.
With this change the correct reason will be reported if the kernel is
tainted because of a test run.
Amended Commit log: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Joe Fradley <joefradley@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We add 2 new kfuncs that are following the RET_PTR_TO_MEM
capability from the previous commit.
Then we test them in selftests:
the first tests are testing valid case, and are not failing,
and the later ones are actually preventing the program to be loaded
because they are wrong.
To work around that, we mark the failing ones as not autoloaded
(with SEC("?tc")), and we manually enable them one by one, ensuring
the verifier rejects them.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-8-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We need to also export the kfunc set to the syscall program type,
and then add a couple of eBPF programs that are testing those calls.
The first one checks for valid access, and the second one is OK
from a static analysis point of view but fails at run time because
we are trying to access outside of the allocated memory.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-5-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Similar to tools/testing/selftests/bpf/prog_tests/dynptr.c:
we declare an array of tests that we run one by one in a for loop.
Followup patches will add more similar-ish tests, so avoid a lot of copy
paste by grouping the declaration in an array.
For light skeletons, we have to rely on the offsetof() macro so we can
statically declare which program we are using.
In the libbpf case, we can rely on bpf_object__find_program_by_name().
So also change the Makefile to generate both light skeletons and normal
ones.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-2-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently syscall-abi permits the bits in Z registers not shared with the
V registers as well as all of the predicate registers to be preserved on
syscall but the actual implementation has always cleared them and our
documentation has now been updated to make that the documented ABI so
update the syscall-abi test to match.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220829162502.886816-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The buffer used for verifying SVE Z registers allocated enough space for
16 maximally sized registers rather than 32 due to using the macro for the
number of P registers. In practice this didn't matter since for historical
reasons the maximum VQ defined in the ABI is greater the architectural
maximum so we will always allocate more space than is needed even with
emulated platforms implementing the architectural maximum. Still, we should
use the right define.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220829162502.886816-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the core utilities for signal testing support handling data in
EXTRA_CONTEXT blocks we can test larger SVE and SME VLs which spill over
the limits in the base signal context. This is done by defining storage
for the context as a union with a ucontext_t and a buffer together with
some helpers for getting relevant sizes and offsets like we do for
fake_sigframe, this isn't the most lovely code ever but is fairly
straightforward to implement and much less invasive to the somewhat
unclear and indistinct layers of abstraction in the signal handling test
code.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-11-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to allow testing of signal contexts that overflow the base signal
frame allow callers to pass the buffer size for the user context into
get_signal_context(). No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-10-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When preserving the signal context for later verification by testcases
check for and include any EXTRA_CONTEXT block if enough space has been
provided.
Since the EXTRA_CONTEXT block includes a pointer to the start of the
additional data block we need to do at least some fixup on the copied
data. For simplicity in users we do this by extending the length of
the EXTRA_CONTEXT to include the following termination record, this
will cause users to see the extra data as part of the linked list of
contexts without needing any special handling. Care will be needed if
any specific tests for EXTRA_CONTEXT are added beyond the validation
done in ASSERT_GOOD_CONTEXT.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-9-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently in validate_reserved() we check the basic form and contents of
an EXTRA_CONTEXT block but do not actually validate anything inside the
data block it provides. Extend the validation to do so, when we get to the
terminator for the main data block reset and start walking the extra data
block instead.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-8-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently for the more complex signal context types we validate the context
specific details the end of the parsing loop validate_reserved() if we've
ever seen a context of that type. This is currently merely a bit inefficient
but will get a bit awkward when we start parsing extra_context, at which
point we will need to reset the head to advance into the extra space that
extra_context provides. Instead only do the more detailed checks on each
context type the first time we see that context type.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-7-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Nothing outside testcases.c should need to use validate_extra_context(),
remove the prototype to ensure nothing does.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-6-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently in validate_extra_context() we assert both that the extra data
pointed to by the EXTRA_CONTEXT is 16 byte aligned and that it immediately
follows the struct _aarch64_ctx providing the terminator for the linked
list of contexts in the signal frame. Since struct _aarch64_ctx is an 8
byte structure which must be 16 byte aligned these cannot both be true. As
documented in sigcontext.h and implemented by the kernel the extra data
should be at the next 16 byte aligned address after the terminator so fix
the validation to match.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When arm64 signal context data overflows the base struct sigcontext it gets
placed in an extra buffer pointed to by a record of type EXTRA_CONTEXT in
the base struct sigcontext which is required to be the last record in the
base struct sigframe. The current validation code attempts to check this
by using GET_RESV_NEXT_HEAD() to step forward from the current record to
the next but that is a macro which assumes it is being provided with a
struct _aarch64_ctx and uses the size there to skip forward to the next
record. Instead validate_extra_context() passes it a struct extra_context
which has a separate size field. This compiles but results in us trying
to validate a termination record in completely the wrong place, at best
failing validation and at worst just segfaulting. Fix this by passing
the struct _aarch64_ctx we meant to into the macro.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In handle_input_signal_copyctx() we use ASSERT_GOOD_CONTEXT() to validate
that the context we are saving meets expectations however we do this on
the saved copy rather than on the actual signal context passed in. This
breaks validation of EXTRA_CONTEXT since we attempt to validate the ABI
requirement that the additional space supplied is immediately after the
termination record in the standard context which will not be the case
after it has been copied to another location.
Fix this by doing the validation before we copy. Note that nothing actually
looks inside the EXTRA_CONTEXT at present.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The za_regs signal test was enumerating the SVE vector lengths rather than
the SME vector lengths through cut'n'paste error when determining what to
test. Enumerate the SME vector lengths instead.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829160703.874492-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When ZA is disabled there should be no register data in the ZA signal
frame, add a test case which confirms that this is the case.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829155728.854947-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently we accept any size for the ZA signal context that the shared
code will accept which means we don't verify that any data is present.
Since we have enabled ZA we know that there must be data so strengthen
the check to only accept a signal frame with data, and while we're at it
since we enabled ZA but did not set any data we know that ZA must contain
zeros, confirm that.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829155728.854947-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently the stress test programs for floating point context switching are
run by hand, there are extremely simplistic harnesses which run some copies
of each test individually but they are not integrated into kselftest and
with SVE and SME they only run with whatever vector length the process has
by default. This is hassle when running the tests and means that they're
not being run at all by CI systems picking up kselftest.
In order to improve our coverage and provide a more convenient interface
provide a harness program which starts enough stress test programs up to
cause context switching and runs them for a set period. If only FPSIMD is
available in the system we start two copies of the FPSIMD stress test per
CPU, otherwise we start one copy of the FPSIMD and then start the SVE,
streaming SVE and ZA tests once per CPU for each available VL they have
to run on. We then run for a set period monitoring for any errors
reported by the test programs before cleanly terminating them.
In order to provide additional coverage of signal handling and some extra
noise in the scheduling we send a SIGUSR2 to the stress tests once a
second, the tests will count the number of signals they get.
Since kselftest is generally expected to run quickly we by default only run
for ten seconds. This is enough to show if there is anything cripplingly
wrong but not exactly a thorough soak test, for interactive and more
focused use a command line option -t N is provided which overrides the
length of time to run for (specified in seconds) and if 0 is specified then
there is no timeout and the test must be manually terminated. The timeout
is counted in seconds with no output, this is done to account for the
potentially slow startup time for the test programs on virtual platforms
which tend to struggle during startup as they are both slow and tend to
support a wide range of vector lengths.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154452.824870-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To interface more robustly with other processes install the signal handers
in the floating point stress tests before we produce any output, this
means that a parent process can know that if it has seen any output from
the test then the test is ready to handle incoming signals.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220906220056.820295-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There are different flavors of 'nc' around, this script fails on
my test vm because 'nc' is 'nmap-ncat' which isn't 100% compatible.
Add socat support and use it if available.
Signed-off-by: Florian Westphal <fw@strlen.de>
Add tracing_struct test in DENYLIST.s390x since s390x does not
support trampoline now.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152723.2081551-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use BPF_PROG2 instead of BPF_PROG for programs in progs/timer.c
to test BPF_PROG2 for cases without struct arguments.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152718.2081091-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add various struct argument tests with fentry/fexit programs.
Also add one test with a kernel func which does not have any
argument to test BPF_PROG2 macro in such situation.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152713.2080039-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
To support struct arguments in trampoline based programs,
existing BPF_PROG doesn't work any more since
the type size is needed to find whether a parameter
takes one or two registers. So this patch added a new
BPF_PROG2 macro to support such trampoline programs.
The idea is suggested by Andrii. For example, if the
to-be-traced function has signature like
typedef struct {
void *x;
int t;
} sockptr;
int blah(sockptr x, char y);
In the new BPF_PROG2 macro, the argument can be
represented as
__bpf_prog_call(
({ union {
struct { __u64 x, y; } ___z;
sockptr x;
} ___tmp = { .___z = { ctx[0], ctx[1] }};
___tmp.x;
}),
({ union {
struct { __u8 x; } ___z;
char y;
} ___tmp = { .___z = { ctx[2] }};
___tmp.y;
}));
In the above, the values stored on the stack are properly
assigned to the actual argument type value by using 'union'
magic. Note that the macro also works even if no arguments
are with struct types.
Note that new BPF_PROG2 works for both llvm16 and pre-llvm16
compilers where llvm16 supports bpf target passing value
with struct up to 16 byte size and pre-llvm16 will pass
by reference by storing values on the stack. With static functions
with struct argument as always inline, the compiler is able
to optimize and remove additional stack saving of struct values.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152707.2079473-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now instead of the number of arguments, the number of registers
holding argument values are stored in trampoline. Update
the description of bpf_get_func_arg[_cnt]() helpers. Previous
programs without struct arguments should continue to work
as usual.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152657.2078805-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-09-05
The following pull-request contains BPF updates for your *net-next* tree.
We've added 106 non-merge commits during the last 18 day(s) which contain
a total of 159 files changed, 5225 insertions(+), 1358 deletions(-).
There are two small merge conflicts, resolve them as follows:
1) tools/testing/selftests/bpf/DENYLIST.s390x
Commit 27e23836ce ("selftests/bpf: Add lru_bug to s390x deny list") in
bpf tree was needed to get BPF CI green on s390x, but it conflicted with
newly added tests on bpf-next. Resolve by adding both hunks, result:
[...]
lru_bug # prog 'printk': failed to auto-attach: -524
setget_sockopt # attach unexpected error: -524 (trampoline)
cb_refs # expected error message unexpected error: -524 (trampoline)
cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc)
htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
[...]
2) net/core/filter.c
Commit 1227c1771d ("net: Fix data-races around sysctl_[rw]mem_(max|default).")
from net tree conflicts with commit 29003875bd ("bpf: Change bpf_setsockopt(SOL_SOCKET)
to reuse sk_setsockopt()") from bpf-next tree. Take the code as it is from
bpf-next tree, result:
[...]
if (getopt) {
if (optname == SO_BINDTODEVICE)
return -EINVAL;
return sk_getsockopt(sk, SOL_SOCKET, optname,
KERNEL_SOCKPTR(optval),
KERNEL_SOCKPTR(optlen));
}
return sk_setsockopt(sk, SOL_SOCKET, optname,
KERNEL_SOCKPTR(optval), *optlen);
[...]
The main changes are:
1) Add any-context BPF specific memory allocator which is useful in particular for BPF
tracing with bonus of performance equal to full prealloc, from Alexei Starovoitov.
2) Big batch to remove duplicated code from bpf_{get,set}sockopt() helpers as an effort
to reuse the existing core socket code as much as possible, from Martin KaFai Lau.
3) Extend BPF flow dissector for BPF programs to just augment the in-kernel dissector
with custom logic. In other words, allow for partial replacement, from Shmulik Ladkani.
4) Add a new cgroup iterator to BPF with different traversal options, from Hao Luo.
5) Support for BPF to collect hierarchical cgroup statistics efficiently through BPF
integration with the rstat framework, from Yosry Ahmed.
6) Support bpf_{g,s}et_retval() under more BPF cgroup hooks, from Stanislav Fomichev.
7) BPF hash table and local storages fixes under fully preemptible kernel, from Hou Tao.
8) Add various improvements to BPF selftests and libbpf for compilation with gcc BPF
backend, from James Hilliard.
9) Fix verifier helper permissions and reference state management for synchronous
callbacks, from Kumar Kartikeya Dwivedi.
10) Add support for BPF selftest's xskxceiver to also be used against real devices that
support MAC loopback, from Maciej Fijalkowski.
11) Various fixes to the bpf-helpers(7) man page generation script, from Quentin Monnet.
12) Document BPF verifier's tnum_in(tnum_range(), ...) gotchas, from Shung-Hsi Yu.
13) Various minor misc improvements all over the place.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (106 commits)
bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.
bpf: Remove usage of kmem_cache from bpf_mem_cache.
bpf: Remove prealloc-only restriction for sleepable bpf programs.
bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
bpf: Remove tracing program restriction on map types
bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
bpf: Add percpu allocation support to bpf_mem_alloc.
bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
bpf: Adjust low/high watermarks in bpf_mem_cache
bpf: Optimize call_rcu in non-preallocated hash map.
bpf: Optimize element count in non-preallocated hash map.
bpf: Relax the requirement to use preallocated hash maps in tracing progs.
samples/bpf: Reduce syscall overhead in map_perf_test.
selftests/bpf: Improve test coverage of test_maps
bpf: Convert hash map to bpf_mem_alloc.
bpf: Introduce any context BPF specific memory allocator.
selftest/bpf: Add test for bpf_getsockopt()
bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt()
bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt()
bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt()
...
====================
Link: https://lore.kernel.org/r/20220905161136.9150-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently the floating point stress tests mostly support testing that the
data they are checking can be disrupted from a signal handler triggered by
SIGUSR1. This is not properly implemented for all the tests and in testing
is frequently modified to just handle the signal without corrupting data in
order to ensure that signal handling does not corrupt data. Directly support
this usage by installing a SIGUSR2 handler which simply counts the signal
delivery.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154452.824870-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since we now have an explicit test for the syscall ABI there is no need for
za-test to cover getpid() so just unconditionally do sched_yield() like we
do in fpsimd-test.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154452.824870-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add some trivial hwcap validation which checks that /proc/cpuinfo and
AT_HWCAP agree with each other and can verify that for extensions that can
generate a SIGILL due to adding new instructions one appears or doesn't
appear as expected. I've added SVE and SME, other capabilities can be
added later if this gets merged.
This isn't super exciting but on the other hand took very little time to
write and should be handy when verifying that you wired up AT_HWCAP
properly.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154602.827275-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Free allocated resources when zalloc() fails for members in c2c_he, to
prevent potential memory leak in c2c_he_zalloc().
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220906032906.21395-4-shangxiaojing@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Switch to the flavored EVP API like in test-libcrypto.c, and remove the
bad gcc #pragma.
Inspired-by: 5b245985a6 ("tools build: Switch to new openssl API for test-libcrypto")
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/CABwm_eTnARC1GwMD-JF176k8WXU1Z0+H190mvXn61yr369qt6g@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The cpu mask init code in "record__mmap_cpu_mask_init" function access
"bits" array part of "struct mmap_cpu_mask". The size of this array is
the value from cpu__max_cpu().cpu. This array is used to contain the
cpumask value for each cpu. While setting bit for each cpu, it calls
"set_bit" function which access index in "bits" array.
If we provide a command line option to -C which is greater than the
number of CPU's present in the system, the set_bit could access an array
member which is out-of the array size. This is because currently, there
is no boundary check for the CPU. This will result in seg fault:
<<>>
./perf record -C 12341234 ls
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Segmentation fault (core dumped)
<<>>
Debugging with gdb, points to function flow as below:
<<>>
set_bit
record__mmap_cpu_mask_init
record__init_thread_default_masks
record__init_thread_masks
cmd_record
<<>>
Fix this by adding boundary check for the array.
After the patch:
<<>>
./perf record -C 12341234 ls
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Failed to initialize parallel data streaming masks
<<>>
With this fix, if -C is given a non-exsiting CPU, perf
record will fail with:
<<>>
./perf record -C 50 ls
Failed to initialize parallel data streaming masks
<<>>
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220905141929.7171-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The affinity code in "affinity_set" function access array named
"sched_cpus". The size for this array is allocated in affinity_setup
function which is nothing but value from get_cpu_set_size. This is used
to contain the cpumask value for each cpu.
While setting bit for each cpu, it calls "set_bit" function which access
index in sched_cpus array. If we provide a command-line option to -C
which is more than the number of CPU's present in the system, the
set_bit could access an array member which is out-of the array size.
This is because currently, there is no boundary check for the CPU. This
will result in seg fault:
<<>>
./perf stat -C 12323431 ls
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Segmentation fault (core dumped)
<<>>
Fix this by adding boundary check for the array.
After the fix from powerpc system:
<<>>
./perf stat -C 12323431 ls 1>out
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Performance counter stats for 'CPU(s) 12323431':
<not supported> msec cpu-clock
<not supported> context-switches
<not supported> cpu-migrations
<not supported> page-faults
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
0.001192373 seconds time elapsed
<<>>
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220905141929.7171-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add new event type for tap called gesture and the direction can be used
to differentiate single and double tap. This may be used by accelerometer
sensors to express single and double tap events. For directional tap,
modifiers like IIO_MOD_(X/Y/Z) can be used along with singletap and
doubletap direction.
Signed-off-by: Jagath Jog J <jagathjog1996@gmail.com>
Link: https://lore.kernel.org/r/20220831063117.4141-2-jagathjog1996@gmail.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Doing call_rcu() million times a second becomes a bottle neck.
Convert non-preallocated hash map from call_rcu to SLAB_TYPESAFE_BY_RCU.
The rcu critical section is no longer observed for one htab element
which makes non-preallocated hash map behave just like preallocated hash map.
The map elements are released back to kernel memory after observing
rcu critical section.
This improves 'map_perf_test 4' performance from 100k events per second
to 250k events per second.
bpf_mem_alloc + percpu_counter + typesafe_by_rcu provide 10x performance
boost to non-preallocated hash map and make it within few % of preallocated map
while consuming fraction of memory.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220902211058.60789-8-alexei.starovoitov@gmail.com
Make test_maps more stressful with more parallelism in
update/delete/lookup/walk including different value sizes.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220902211058.60789-4-alexei.starovoitov@gmail.com
Add a test script test_cpuset_prs.sh with a helper program wait_inotify
for exercising the cpuset v2 partition root state code.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Systems using the hash MMU with a 4K page size don't support 4PB address
space, so skip the test because the bug it tests for can't be triggered.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220901020215.254097-1-mpe@ellerman.id.au
The tests in alloc_nid_api can now run either memblock_alloc_try_nid()
or memblock_alloc_try_nid_raw(). The comment blocks for these tests
should not refer to a 'cleared' region since that only applies to
memblock_alloc_try_nid(). Remove 'cleared' from the comment blocks so
that the comments are accurate for either memblock function.
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/e8be24137e54e9f81a06af969ded82b319114d7a.1662264347.git.remckee0@gmail.com
The initialization is unnecessary, because ret is always assigned a new
value before reading it.
Signed-off-by: Shi junming <junming@nfschina.com>
[ rjw: Subject edits, new changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch removes the __bpf_getsockopt() which directly
reads the sk by using PTR_TO_BTF_ID. Instead, the test now directly
uses the kernel bpf helper bpf_getsockopt() which supports all
the required optname now.
TCP_SAVE[D]_SYN and TCP_MAXSEG are not tested in a loop for all
the hooks and sock_ops's cb. TCP_SAVE[D]_SYN only works
in passive connection. TCP_MAXSEG only works when
it is setsockopt before the connection is established and
the getsockopt return value can only be tested after
the connection is established.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220902002937.2896904-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Florian Westphal says:
====================
netfilter: bug fixes for net
1. Fix IP address check in irc DCC conntrack helper, this should check
the opposite direction rather than the destination address of the
packets' direction, from David Leadbeater.
2. bridge netfilter needs to drop dst references, from Harsh Modi.
This was fine back in the day the code was originally written,
but nowadays various tunnels can pre-set metadata dsts on packets.
3. Remove nf_conntrack_helper sysctl and the modparam toggle, users
need to explicitily assign the helpers to use via nftables or
iptables. Conntrack helpers, by design, may be used to add dynamic
port redirections to internal machines, so its necessary to restrict
which hosts/peers are allowed to use them.
It was discovered that improper checking in the irc DCC helper makes
it possible to trigger the 'please do dynamic port forward'
from outside by embedding a 'DCC' in a PING request; if the client
echos that back a expectation/port forward gets added.
The auto-assign-for-everything mechanism has been in "please don't do this"
territory since 2012. From Pablo.
4. Fix a memory leak in the netdev hook error unwind path, also from Pablo.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_conntrack_irc: Fix forged IP logic
netfilter: nf_tables: clean up hook list when offload flags check fails
netfilter: br_netfilter: Drop dst references before setting.
netfilter: remove nf_conntrack_helper sysctl and modparam toggles
====================
Link: https://lore.kernel.org/r/20220901071238.3044-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmMSIfUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpmWyD/9vhN1GTy41KS+GoU4ljBCNx3ubigp9FqXc
baCFaDEQlsUuYyuaG/DAWghs/dDJc/Qk51cpii+QOrU2yVw6vvOO32P2rbSJYW+l
qlqxNm3Mmpr+amMc94RcL26r69av6l+6UA4HTgziWia2LR4QklvthmC+G65HqQRt
dbkWRb7KPqlmtFyeTQSzy0mgwwra66zyUUEpcif4Fhu1puV64MD7IGvR9INsS50r
Mey2Uo+ANvEKtqhhAFBPL7/LH49IyYJtLoJiH5PdCJzS0UTIUkb5FQoiu2gQTeF2
/2ihT6wqe4T75Zo2Ofm8qK7OQF2+vM2KQWBc7ytiBNat32ypT/MGS3YtRd1p3zqc
XTtfrtkzu7eoYGKCZfgZrRwKzfJuV9SjOwZEtfLgV42gTLnNkPZQCGsSPfPnVNeU
Lc4GCxmf95blNvlkJK5SCtZ+SQQ+zOFm9sPuiDtmjJ5yMxUd28n52dzxTXqfG8su
75Zgh+tH5rFK2Vi0tn8mWsm8/ZLwEdMoyG1OCi9eXWyfxXfR2iYj7PK9a/T9aTXn
cr5JwuGbfW6pnnoTq9CSKFknJSar4FiJMCkQuhtzBMmhl23gpjJifqs4PXWQpX+s
e2rL5B0/9DOKXmagTqPYtfBLvvmnNRgZ7wE+N3gLbPbtRPjb6XupOv3nudLA+rgR
keJh/vVhqQ==
=yXSa
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- A single fix for over-eager retries for networking (Pavel)
- Revert the notification slot support for zerocopy sends.
It turns out that even after more than a year or development and
testing, there's not full agreement on whether just using plain
ordered notifications is Good Enough to avoid the complexity of using
the notifications slots. Because of that, we decided that it's best
left to a future final decision.
We can always bring back this feature, but we can't really change it
or remove it once we've released 6.0 with it enabled. The reverts
leave the usual CQE notifications as the primary interface for
knowing when data was sent, and when it was acked. (Pavel)
* tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block:
selftests/net: return back io_uring zc send tests
io_uring/net: simplify zerocopy send user API
io_uring/notif: remove notif registration
Revert "io_uring: rename IORING_OP_FILES_UPDATE"
Revert "io_uring: add zc notification flush requests"
selftests/net: temporarily disable io_uring zc test
io_uring/net: fix overexcessive retries
-----BEGIN PGP SIGNATURE-----
iIYEABYIAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCYxILwxAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSdu4BANqLdLqVhylwJRjZS91rpxrtwp6bOTxtR6+Q
aSVtD2ZlAQCEl4/twUSO0mARkkprXXqNsjGcQ8wxN9JrxtcXAlaBCQ==
=uVMR
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fix from Mickaël Salaün:
"This fixes a mis-handling of the LANDLOCK_ACCESS_FS_REFER right when
multiple rulesets/domains are stacked.
The expected behaviour was that an additional ruleset can only
restrict the set of permitted operations, but in this particular case,
it was potentially possible to re-gain the LANDLOCK_ACCESS_FS_REFER
right"
* tag 'landlock-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Fix file reparenting without explicit LANDLOCK_ACCESS_FS_REFER
The put lowers the reference count to 0 and frees ctx, reading it
afterwards is invalid. Move the put after the uses and determine the
last use by the reference count being 1.
Fixes: 39e940d4ab ("selftests/xsk: Destroy BPF resources only when ctx refcount drops to 0")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901202645.1463552-1-irogers@google.com
BPF object files are, in a way, the final artifact produced as part of
the ahead-of-time compilation process. That makes them somewhat special
compared to "regular" object files, which are a intermediate build
artifacts that can typically be removed safely. As such, it can make
sense to name them differently to make it easier to spot this difference
at a glance.
Among others, libbpf-bootstrap [0] has established the extension .bpf.o
for BPF object files. It seems reasonable to follow this example and
establish the same denomination for selftest build artifacts. To that
end, this change adjusts the corresponding part of the build system and
the test programs loading BPF object files to work with .bpf.o files.
[0] https://github.com/libbpf/libbpf-bootstrap
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220901222253.1199242-1-deso@posteo.net
Introduce new mode to xdpxceiver responsible for testing AF_XDP zero
copy support of driver that serves underlying physical device. When
setting up test suite, determine whether driver has ZC support or not by
trying to bind XSK ZC socket to the interface. If it succeeded,
interpret it as ZC support being in place and do softirq and busy poll
tests for zero copy mode.
Note that Rx dropped tests are skipped since ZC path is not touching
rx_dropped stat at all.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-7-maciej.fijalkowski@intel.com
For single threaded poll tests call pthread_kill() from main thread so
that we are sure worker thread has finished its job and it is possible
to proceed with next test types from test suite. It was observed that on
some platforms it takes a bit longer for worker thread to exit and next
test case sees device as busy in this case.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-6-maciej.fijalkowski@intel.com
Currently, architecture of xdpxceiver is designed strictly for
conducting veth based tests. Veth pair is created together with a
network namespace and one of the veth interfaces is moved to the
mentioned netns. Then, separate threads for Tx and Rx are spawned which
will utilize described setup.
Infrastructure described in the paragraph above can not be used for
testing AF_XDP support on physical devices. That testing will be
conducted on a single network interface and same queue. Xskxceiver
needs to be extended to distinguish between veth tests and physical
interface tests.
Since same iface/queue id pair will be used by both Tx/Rx threads for
physical device testing, Tx thread, which happen to run after the Rx
thread, is going to create XSK socket with shared umem flag. In order to
track this setting throughout the lifetime of spawned threads, introduce
'shared_umem' boolean variable to struct ifobject and set it to true
when xdpxceiver is run against physical device. In such case, UMEM size
needs to be doubled, so half of it will be used by Rx thread and other
half by Tx thread. For two step based test types, value of XSKMAP
element under key 0 has to be updated as there is now another socket for
the second step. Also, to avoid race conditions when destroying XSK
resources, move this activity to the main thread after spawned Rx and Tx
threads have finished its job. This way it is possible to gracefully
remove shared umem without introducing synchronization mechanisms.
To run xsk selftests suite on physical device, append "-i $IFACE" when
invoking test_xsk.sh. For veth based tests, simply skip it. When "-i
$IFACE" is in place, under the hood test_xsk.sh will use $IFACE for both
interfaces supplied to xdpxceiver, which in turn will interpret that
this execution of test suite is for a physical device.
Note that currently this makes it possible only to test SKB and DRV mode
(in case underlying device has native XDP support). ZC testing support
is added in a later patch.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-5-maciej.fijalkowski@intel.com
So that "enp240s0f0" or such name can be used against xskxceiver.
While at it, also extend character count for netns name.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-4-maciej.fijalkowski@intel.com
In order to prepare xdpxceiver for physical device testing, let us
introduce default Rx pkt stream. Reason for doing it is that physical
device testing will use a UMEM with a doubled size where half of it will
be used by Tx and other half by Rx. This means that pkt addresses will
differ for Tx and Rx streams. Rx thread will initialize the
xsk_umem_info::base_addr that is added here so that pkt_set(), when
working on Rx UMEM will add this offset and second half of UMEM space
will be used. Note that currently base_addr is 0 on both sides. Future
commit will do the mentioned initialization.
Previously, veth based testing worked on separate UMEMs, so single
default stream was fine.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-3-maciej.fijalkowski@intel.com
Currently, xdpxceiver assumes that underlying device supports XDP in
native mode - it is fine by now since tests can run only on a veth pair.
Future commit is going to allow running test suite against physical
devices, so let us query the device if it is capable of running XDP
programs in native mode. This way xdpxceiver will not try to run
TEST_MODE_DRV if device being tested is not supporting it.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-2-maciej.fijalkowski@intel.com
This change fixes a mis-handling of the LANDLOCK_ACCESS_FS_REFER right
when multiple rulesets/domains are stacked. The expected behaviour was
that an additional ruleset can only restrict the set of permitted
operations, but in this particular case, it was potentially possible to
re-gain the LANDLOCK_ACCESS_FS_REFER right.
With the introduction of LANDLOCK_ACCESS_FS_REFER, we added the first
globally denied-by-default access right. Indeed, this lifted an initial
Landlock limitation to rename and link files, which was initially always
denied when the source or the destination were different directories.
This led to an inconsistent backward compatibility behavior which was
only taken into account if no domain layer were using the new
LANDLOCK_ACCESS_FS_REFER right. However, when restricting a thread with
a new ruleset handling LANDLOCK_ACCESS_FS_REFER, all inherited parent
rulesets/layers not explicitly handling LANDLOCK_ACCESS_FS_REFER would
behave as if they were handling this access right and with all their
rules allowing it. This means that renaming and linking files could
became allowed by these parent layers, but all the other required
accesses must also be granted: all layers must allow file removal or
creation, and renaming and linking operations cannot lead to privilege
escalation according to the Landlock policy. See detailed explanation
in commit b91c3e4ea7 ("landlock: Add support for file reparenting with
LANDLOCK_ACCESS_FS_REFER").
To say it another way, this bug may lift the renaming and linking
limitations of the initial Landlock version, and a same ruleset can
enforce different restrictions depending on previous or next enforced
ruleset (i.e. inconsistent behavior). The LANDLOCK_ACCESS_FS_REFER right
cannot give access to data not already allowed, but this doesn't follow
the contract of the first Landlock ABI. This fix puts back the
limitation for sandboxes that didn't opt-in for this additional right.
For instance, if a first ruleset allows LANDLOCK_ACCESS_FS_MAKE_REG on
/dst and LANDLOCK_ACCESS_FS_REMOVE_FILE on /src, renaming /src/file to
/dst/file is denied. However, without this fix, stacking a new ruleset
which allows LANDLOCK_ACCESS_FS_REFER on / would now permit the
sandboxed thread to rename /src/file to /dst/file .
This change fixes the (absolute) rule access rights, which now always
forbid LANDLOCK_ACCESS_FS_REFER except when it is explicitly allowed
when creating a rule.
Making all domain handle LANDLOCK_ACCESS_FS_REFER was an initial
approach but there is two downsides:
* it makes the code more complex because we still want to check that a
rule allowing LANDLOCK_ACCESS_FS_REFER is legitimate according to the
ruleset's handled access rights (i.e. ABI v1 != ABI v2);
* it would not allow to identify if the user created a ruleset
explicitly handling LANDLOCK_ACCESS_FS_REFER or not, which will be an
issue to audit Landlock.
Instead, this change adds an ACCESS_INITIALLY_DENIED list of
denied-by-default rights, which (only) contains
LANDLOCK_ACCESS_FS_REFER. All domains are treated as if they are also
handling this list, but without modifying their fs_access_masks field.
A side effect is that the errno code returned by rename(2) or link(2)
*may* be changed from EXDEV to EACCES according to the enforced
restrictions. Indeed, we now have the mechanic to identify if an access
is denied because of a required right (e.g. LANDLOCK_ACCESS_FS_MAKE_REG,
LANDLOCK_ACCESS_FS_REMOVE_FILE) or if it is denied because of missing
LANDLOCK_ACCESS_FS_REFER rights. This may result in different errno
codes than for the initial Landlock version, but this approach is more
consistent and better for rename/link compatibility reasons, and it
wasn't possible before (hence no backport to ABI v1). The
layout1.rename_file test reflects this change.
Add 4 layout1.refer_denied_by_default* test suites to check that the
behavior of a ruleset not handling LANDLOCK_ACCESS_FS_REFER (ABI v1) is
unchanged even if another layer handles LANDLOCK_ACCESS_FS_REFER (i.e.
ABI v1 precedence). Make sure rule's absolute access rights are correct
by testing with and without a matching path. Add test_rename() and
test_exchange() helpers.
Extend layout1.inval tests to check that a denied-by-default access
right is not necessarily part of a domain's handled access rights.
Test coverage for security/landlock is 95.3% of 599 lines according to
gcc/gcov-11.
Fixes: b91c3e4ea7 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER")
Reviewed-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20220831203840.1370732-1-mic@digikod.net
Cc: stable@vger.kernel.org
[mic: Constify and slightly simplify test helpers]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Get the tunnel flags in {ipv6}vxlan_get_tunnel_src and ensure they are
aligned with tunnel params set at {ipv6}vxlan_set_tunnel_dst.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831144010.174110-2-shmulik.ladkani@gmail.com
Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters
(id, ttl, tos, local and remote) but does not expose ip_tunnel_info's
tun_flags to the BPF program.
It makes sense to expose tun_flags to the BPF program.
Assume for example multiple GRE tunnels maintained on a single GRE
interface in collect_md mode. The program expects origins to initiate
over GRE, however different origins use different GRE characteristics
(e.g. some prefer to use GRE checksum, some do not; some pass a GRE key,
some do not, etc..).
A BPF program getting tun_flags can therefore remember the relevant
flags (e.g. TUNNEL_CSUM, TUNNEL_SEQ...) for each initiating remote. In
the reply path, the program can use 'bpf_skb_set_tunnel_key' in order
to correctly reply to the remote, using similar characteristics, based
on the stored tunnel flags.
Introduce BPF_F_TUNINFO_FLAGS flag for bpf_skb_get_tunnel_key. If
specified, 'bpf_tunnel_key->tunnel_flags' is set with the tun_flags.
Decided to use the existing unused 'tunnel_ext' as the storage for the
'tunnel_flags' in order to avoid changing bpf_tunnel_key's layout.
Also, the following has been considered during the design:
1. Convert the "interesting" internal TUNNEL_xxx flags back to BPF_F_yyy
and place into the new 'tunnel_flags' field. This has 2 drawbacks:
- The BPF_F_yyy flags are from *set_tunnel_key* enumeration space,
e.g. BPF_F_ZERO_CSUM_TX. It is awkward that it is "returned" into
tunnel_flags from a *get_tunnel_key* call.
- Not all "interesting" TUNNEL_xxx flags can be mapped to existing
BPF_F_yyy flags, and it doesn't make sense to create new BPF_F_yyy
flags just for purposes of the returned tunnel_flags.
2. Place key.tun_flags into 'tunnel_flags' but mask them, keeping only
"interesting" flags. That's ok, but the drawback is that what's
"interesting" for my usecase might be limiting for other usecases.
Therefore I decided to expose what's in key.tun_flags *as is*, which seems
most flexible. The BPF user can just choose to ignore bits he's not
interested in. The TUNNEL_xxx are also UAPI, so no harm exposing them
back in the get_tunnel_key call.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831144010.174110-1-shmulik.ladkani@gmail.com
This has been validated on the Ocelot/Felix switch family (NXP LS1028A)
and should be relevant to any switch driver that offloads the tc-flower
and/or tc-matchall actions trap, drop, accept, mirred, for which DSA has
operations.
TEST: gact drop and ok (skip_hw) [ OK ]
TEST: mirred egress flower redirect (skip_hw) [ OK ]
TEST: mirred egress flower mirror (skip_hw) [ OK ]
TEST: mirred egress matchall mirror (skip_hw) [ OK ]
TEST: mirred_egress_to_ingress (skip_hw) [ OK ]
TEST: gact drop and ok (skip_sw) [ OK ]
TEST: mirred egress flower redirect (skip_sw) [ OK ]
TEST: mirred egress flower mirror (skip_sw) [ OK ]
TEST: mirred egress matchall mirror (skip_sw) [ OK ]
TEST: trap (skip_sw) [ OK ]
TEST: mirred_egress_to_ingress (skip_sw) [ OK ]
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220831170839.931184-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Under full preemptible kernel, task local storage lookup operations on
the same CPU may update per-cpu bpf_task_storage_busy concurrently. If
the update of bpf_task_storage_busy is not preemption safe, the final
value of bpf_task_storage_busy may become not-zero forever and
bpf_task_storage_trylock() will always fail. So add a test case to
ensure the update of bpf_task_storage_busy is preemption safe.
Will skip the test case when CONFIG_PREEMPT is disabled, and it can only
reproduce the problem probabilistically. By increasing
TASK_STORAGE_MAP_NR_LOOP and running it under ARM64 VM with 4-cpus, it
takes about four rounds to reproduce:
> test_maps is modified to only run test_task_storage_map_stress_lookup()
$ export TASK_STORAGE_MAP_NR_THREAD=256
$ export TASK_STORAGE_MAP_NR_LOOP=81920
$ export TASK_STORAGE_MAP_PIN_CPU=1
$ time ./test_maps
test_task_storage_map_stress_lookup(135):FAIL:bad bpf_task_storage_busy got -2
real 0m24.743s
user 0m6.772s
sys 0m17.966s
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20220901061938.3789460-5-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
sys_pidfd_open() is defined twice in both test_bprm_opts.c and
test_local_storage.c, so move it to a common header file. And it will be
used in map_tests as well.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20220901061938.3789460-4-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The IP_UNICAST_IF socket option is used to set the outgoing interface
for outbound packets.
The IP_UNICAST_IF socket option was added as it was needed by the
Wine project, since no other existing option (SO_BINDTODEVICE socket
option, IP_PKTINFO socket option or the bind function) provided the
needed characteristics needed by the IP_UNICAST_IF socket option. [1]
The IP_UNICAST_IF socket option works well for unconnected sockets,
that is, the interface specified by the IP_UNICAST_IF socket option
is taken into consideration in the route lookup process when a packet
is being sent. However, for connected sockets, the outbound interface
is chosen when connecting the socket, and in the route lookup process
which is done when a packet is being sent, the interface specified by
the IP_UNICAST_IF socket option is being ignored.
This inconsistent behavior was reported and discussed in an issue
opened on systemd's GitHub project [2]. Also, a bug report was
submitted in the kernel's bugzilla [3].
To understand the problem in more detail, we can look at what happens
for UDP packets over IPv4 (The same analysis was done separately in
the referenced systemd issue).
When a UDP packet is sent the udp_sendmsg function gets called and
the following happens:
1. The oif member of the struct ipcm_cookie ipc (which stores the
output interface of the packet) is initialized by the ipcm_init_sk
function to inet->sk.sk_bound_dev_if (the device set by the
SO_BINDTODEVICE socket option).
2. If the IP_PKTINFO socket option was set, the oif member gets
overridden by the call to the ip_cmsg_send function.
3. If no output interface was selected yet, the interface specified
by the IP_UNICAST_IF socket option is used.
4. If the socket is connected and no destination address is
specified in the send function, the struct ipcm_cookie ipc is not
taken into consideration and the cached route, that was calculated in
the connect function is being used.
Thus, for a connected socket, the IP_UNICAST_IF sockopt isn't taken
into consideration.
This patch corrects the behavior of the IP_UNICAST_IF socket option
for connect()ed sockets by taking into consideration the
IP_UNICAST_IF sockopt when connecting the socket.
In order to avoid reconnecting the socket, this option is still
ignored when applied on an already connected socket until connect()
is called again by the Richard Gobert.
Change the __ip4_datagram_connect function, which is called during
socket connection, to take into consideration the interface set by
the IP_UNICAST_IF socket option, in a similar way to what is done in
the udp_sendmsg function.
[1] https://lore.kernel.org/netdev/1328685717.4736.4.camel@edumazet-laptop/T/
[2] https://github.com/systemd/systemd/issues/11935#issuecomment-618691018
[3] https://bugzilla.kernel.org/show_bug.cgi?id=210255
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220829111554.GA1771@debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
One test demonstrates the reentrancy of hash map update on the same
bucket should fail, and another one shows concureently updates of
the same hash map bucket should succeed and not fail due to
the reentrancy checking for bucket lock.
There is no trampoline support on s390x, so move htab_update to
denylist.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220831042629.130006-4-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch adds a test to ensure bpf_setsockopt(TCP_CONGESTION, "not_exist")
will not trigger the kernel module autoload.
Before the fix:
[ 40.535829] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
[...]
[ 40.552134] tcp_ca_find_autoload.constprop.0+0xcb/0x200
[ 40.552689] tcp_set_congestion_control+0x99/0x7b0
[ 40.553203] do_tcp_setsockopt+0x3ed/0x2240
[...]
[ 40.556041] __bpf_setsockopt+0x124/0x640
Signed-off-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220830231953.792412-1-martin.lau@linux.dev
This is the result of `sort tools/testing/selftests/net/.gitignore`, but
preserving the comment at the top.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Link: https://lore.kernel.org/r/20220829184748.1535580-1-axelrasmussen@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bpf_tail_call_static function is currently not defined unless
using clang >= 8.
To support bpf_tail_call_static on GCC we can check if __clang__ is
not defined to enable bpf_tail_call_static.
We need to use GCC assembly syntax when the compiler does not define
__clang__ as LLVM inline assembly is not fully compatible with GCC.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220829210546.755377-1-james.hilliard1@gmail.com
Commit 1034b03e54 ("selftests: xsk: Simplify cleanup of ifobjects")
removed close on netns fd, which is not correct, so let us restore it.
Fixes: 1034b03e54 ("selftests: xsk: Simplify cleanup of ifobjects")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220830133905.9945-1-maciej.fijalkowski@intel.com
Hongtao Yu reported problem when displaying uregs in perf script
for system wide perf.data:
# perf script -F uregs | head -10
Samples for 'dummy:HG' event do not have UREGS attribute set. Cannot print 'uregs' field.
The problem is the extra dummy event added for system wide,
which does not have proper sample_type setup.
Skipping attr check completely for dummy event as suggested
by Namhyung, because it does not have any samples anyway.
Reported-by: Hongtao Yu <hoy@fb.com>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220831124041.219925-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previous behavior is to segfault if there is no CPU PMU table and a
metric is sought. To reproduce compile with NO_JEVENTS=1 then request a
metric, for example, "perf stat -M IPC true".
Committer testing:
Before:
$ make -k NO_JEVENTS=1 BUILD_BPF_SKEL=1 O=/tmp/build/perf-urgent -C tools/perf install-bin
$ perf stat -M IPC true
Segmentation fault (core dumped)
$
After:
$ perf stat -M IPC true
Usage: perf stat [<options>] [<command>]
-M, --metrics <metric/metric group list>
monitor specified metrics or metric groups (separated by ,)
$
Fixes: 00facc7609 ("perf jevents: Switch build to use jevents.py")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ian Rogers <rogers.email@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220830164846.401143-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After running the nolibc tests, the "git status" is not clean because
the generated files are not ignored. Create a `.gitignore` inside the
selftests/nolibc directory to ignore them.
Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Cc: Fernanda Ma'rouf <fernandafmr2@gmail.com>
Signed-off-by: Fernanda Ma'rouf <fernandafmr12@gnuweeb.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
It presents the supported targets, and becomes the default target to
save the user from having to read the makefile. The "all" target was
placed after it and now points to "run" to do everything since it's
no longer the default one.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
It's not convenient to rely on a sysroot built in another directory,
especially when running cross-compilation tests, where one has to
switch back and forth between directories.
Let's make it possible to install the sysroot directly in the test
directory. It's not big and even benefits from being copied by arch
so that it's easier to switch between archs if needed. The new
"sysroot" target does this, it just calls "headers_standalone" from
nolibc to install the sysroot right here.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The "run" target will build the kernel and start it in QEMU. The
"rerun" target will not have the kernel dependency and will just try
to start QEMU. The QEMU architecture used to start the kernel is
derived from the configured ARCH. This might need to be improved
for archs which include different variants under the same name
(mips vs mipsel, +/-64, riscv32 vs riscv64). This could be tested
for i386, x86, arm, arm64, mips and riscv (the later two reporting
issues on some tests).
It is possible to pass a test specification for nolibc-test in the TEST
variable, which will be passed as-is as NOLIBC_TEST.
On success, the number of successful tests is printed. On failure, failed
lines are individually printed.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
While most archs will work fine with "make defconfig", not all will
do, and it's not always easy to remember the most suitable choice to
use for a specific architecture.
This adds a "defconfig" target to the Makefile so that one may easily
run "make -C ... defconfig" and make sure to clean and rebuild a fresh
config. This is *not* used by default because we want to preserve the
user's config by default.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The "kernel" target rebuilds the kernel with the current config for the
selected arch, with an initramfs containing the nolibc-test utility.
Since image names depend on the architecture, the currently supported
ones are referenced and resolved based on the architecture.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Adding support for glibc can be useful to distinguish between bugs in
nolibc and bugs in the kernel when a syscall reports an unusual value.
It's not that much work and should not affect the long term
maintainability of the tests. The necessary changes can essentially be
summed up like this:
- set _GNU_SOURCE a the top to access some definitions
- many includes added when we know we don't come from nolibc (missing
the stdio include guard)
- disable gettid() which is not exposed by glibc
- disable gettimeofday's support of bad pointers since these crash
in glibc
- add a simple itoa() for errorname(); strerror() is too verbose (no
way to get short messages). strerrorname_np() was added in modern
glibc (2.32) to do exactly this but that 's too recent to be usable
as the default fallback.
- use the standard ioperm() definition. May be we need to implement
ioperm() in nolibc if that's useful.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
If /proc is not available (program run inside a chroot or without
sufficient permissions), it's better to disable the associated tests.
Some will be preserved like the ones which check for a failure to
create some entries there since they're still supposed to fail.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Most of the time the program will be run alone in an initramfs. There
is no value in requiring the user to populate /dev and /proc for such
tests, we can do it ourselves, and it participates to the tests at the
same time.
What's done here is that when called as init (getpid()==1) we check
if /dev exists or create it, if /dev/console and /dev/null exists,
otherwise we try to mount a devtmpfs there, and if it fails we fall
back to mknod. The console is reopened if stdout was closed. Finally
/proc is created and mounted if /proc/self cannot be found. This is
sufficient for most tests.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
QEMU, when started with "-device isa-debug-exit -no-reboot" will exit
with status code 2N+1 when N is written to 0x501. This is particularly
convenient for automated tests but this is not portable. As such we
only enable this on x86_64 when pid==1. In addition, this requires an
ioperm() call but in order not to have to define arch-specific syscalls
we just perform the syscall by hand there.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The idea is to ease automated testing under qemu. If the test succeeds
while running as PID 1, indicating the system was booted with init=/test,
let's just power off so that qemu can exit with a successful code. In
other situations it will exit and provoke a panic, which may be caught
for example with CONFIG_PVPANIC.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The test series called "stdlib" covers some libc functions (string,
stdlib etc). By default they are automatically run after "syscall"
but may be requested in argument or in variable NOLIBC_TEST.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This adds 63 tests covering about 34 syscalls. Both successes and
failures are tested. Two tests fail when run as unprivileged user
(link_dir which returns EACCESS instead of EPERM, and chroot which
returns EPERM). One test (execve("/")) expects to fail on EACCESS,
but needs to have valid arguments otherwise the kernel will log a
message. And a few tests require /proc to be mounted.
The code is not pretty since all tests are one-liners, sometimes
resulting in long lines, especially when using compount statements to
preset a line, but it's convenient and doesn't obfuscate the code,
which is important to understand what failed.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
It now becomes possible to pass a string either in argv[1] or in the
NOLIBC_TEST environment variable (the former having precedence), to
specify which tests to run. The format is:
testname[:range]*[,testname...]
Where a range is either a single value or the min and max numbers of the
test IDs in a sequence, delimited by a dash. Multiple ranges are possible.
This should provide enough flexibility to focus on certain failing parts
just by playing with the boot command line in a boot loader or in qemu
depending on what is accessible.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This creates a "nolibc" selftest that intends to test various parts of
the nolibc component, both in terms of build and execution for a given
architecture.
The aim is for it to be as simple to run as a kernel build, by just
passing the compiler (for the build) and the ARCH (for kernel and
execution).
It brings a basic squeleton made of a single C file that will ease testing
and error reporting. The code will be arranged so that it remains easy to
add basic tests for syscalls or library calls that may rely on a condition
to be executed, and whose result is compared to a value or to an error
with a specific errno value.
Tests will just use a relative line number in switch/case statements as
an index, saving the user from having to maintain arrays and complicated
functions which can often just be one-liners.
MAINTAINERS was updated.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
__NR_mmap2 was used for i386 but it's also needed for other archs such
as RISCV32 or ARM. Let's decide to use it based on the __NR_mmap2
definition as it's not defined on other archs.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
We return -ENOSYS when there's no syscall6() operation, but we must cast
it to void* to avoid a warning.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The "ld a0, 0(sp)" instruction doesn't build on RISCV32 because that
would load a 64-bit value into a 32-bit register. But argc 32-bit,
not 64, so we ought to use "lw" here. Tested on both RISCV32 and
RISCV64.
Cc: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
As discussed, clarify LKMM not recognizing certain kinds of orderings.
In particular, highlight the fact that LKMM might deliberately make
weaker guarantees than compilers and architectures.
[ paulmck: Fix whitespace issue noted by checkpatch.pl. ]
Link: https://lore.kernel.org/all/YpoW1deb%2FQeeszO1@ethstick13.dse.in.tum.de/T/#u
Co-developed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul Heidekrüger <paul.heidekrueger@in.tum.de>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Charalampos Mainas <charalampos.mainas@gmail.com>
Cc: Pramod Bhatotia <pramod.bhatotia@in.tum.de>
Cc: Soham Chakraborty <s.s.chakraborty@tudelft.nl>
Cc: Martin Fink <martin.fink@in.tum.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
__nf_ct_try_assign_helper() remains in place but it now requires a
template to configure the helper.
A toggle to disable automatic helper assignment was added by:
a900689264 ("netfilter: nf_ct_helper: allow to disable automatic helper assignment")
in 2012 to address the issues described in "Secure use of iptables and
connection tracking helpers". Automatic conntrack helper assignment was
disabled by:
3bb398d925 ("netfilter: nf_ct_helper: disable automatic helper assignment")
back in 2016.
This patch removes the sysctl and modparam toggles, users now have to
rely on explicit conntrack helper configuration via ruleset.
Update tools/testing/selftests/netfilter/nft_conntrack_helper.sh to
check that auto-assignment does not happen anymore.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Support dumping info of a cgroup_iter link. This includes
showing the cgroup's id and the order for walking the cgroup
hierarchy. Example output is as follows:
> bpftool link show
1: iter prog 2 target_name bpf_map
2: iter prog 3 target_name bpf_prog
3: iter prog 12 target_name cgroup cgroup_id 72 order self_only
> bpftool -p link show
[{
"id": 1,
"type": "iter",
"prog_id": 2,
"target_name": "bpf_map"
},{
"id": 2,
"type": "iter",
"prog_id": 3,
"target_name": "bpf_prog"
},{
"id": 3,
"type": "iter",
"prog_id": 12,
"target_name": "cgroup",
"cgroup_id": 72,
"order": "self_only"
}
]
Signed-off-by: Hao Luo <haoluo@google.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220829231828.1016835-1-haoluo@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@linux.dev>
Add tests for memblock_trim_memory() for the following scenarios:
- all regions aligned
- one unaligned region that is smaller than the alignment
- one unaligned region that is unaligned at the base
- one unaligned region that is unaligned at the end
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/0e5f55154a3b66581e04ba3717978795cbc08a5b.1661578349.git.remckee0@gmail.com
Update memblock_alloc_try_nid() tests so that they test either
memblock_alloc_try_nid() or memblock_alloc_try_nid_raw() depending on the
value of alloc_nid_test_flags. Run through all the existing tests in
alloc_nid_api twice: once for memblock_alloc_try_nid() and once for
memblock_alloc_try_nid_raw().
When the tests run memblock_alloc_try_nid(), they test that the entire
memory region is zero. When the tests run memblock_alloc_try_nid_raw(),
they test that the entire memory region is nonzero. The content of the
memory region is initialized to nonzero, and we expect it to remain
unchanged if running memblock_alloc_try_nid_raw().
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/6fa8938f67872841c10a00afb042947d1d280a04.1661578349.git.remckee0@gmail.com
Update memblock_alloc() tests so that they test either memblock_alloc()
or memblock_alloc_raw() depending on the value of alloc_test_flags. Run
through all the existing tests in memblock_alloc_api twice: once for
memblock_alloc() and once for memblock_alloc_raw().
When the tests run memblock_alloc(), they test that the entire memory
region is zero. When the tests run memblock_alloc_raw(), they test that
the entire memory region is nonzero. The content of the memory region is
initialized to nonzero, and we expect it to remain unchanged if running
memblock_alloc_raw().
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/5a7cfb2f807ee2cb53ee77f9f5c910107b253d6e.1661578349.git.remckee0@gmail.com
Add tests for memblock_add(), memblock_reserve(), memblock_remove(),
memblock_free(), and memblock_alloc() for the following test scenarios.
memblock_add() and memblock_reserve():
- add/reserve a memory block in the gap between two existing memory
blocks, and check that the blocks are merged into one region
- try to add/reserve memblock regions that extend past PHYS_ADDR_MAX
memblock_remove() and memblock_free():
- remove/free a region when it is the only available region
+ These tests ensure that the first region is overwritten with a
"dummy" region when the last remaining region of that type is
removed or freed.
- remove/free() a region that overlaps with two existing regions of the
relevant type
- try to remove/free memblock regions that extend past PHYS_ADDR_MAX
memblock_alloc():
- try to allocate a region that is larger than the total size of available
memory (memblock.memory)
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/c23c0393c5b9a53fe7f676996913c629495e9727.1661578349.git.remckee0@gmail.com
Generic tests for memblock_alloc*() functions do not use separate
functions for testing top-down and bottom-up allocation directions.
Therefore, the function name that is displayed in the verbose testing
output does not include the allocation direction.
Add an additional prefix when running generic tests for
memblock_alloc*() functions that indicates which allocation direction is
set. The prefix will be displayed when the tests are run in verbose mode.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/fb76a42253d2a196a7daea29dd8121a69904f58e.1661578349.git.remckee0@gmail.com
Update the assert in memblock_alloc_try_nid() and memblock_alloc_from()
tests that checks whether the memory is cleared so that it checks the
entire chunk of allocated memory instead of just the first byte.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/24b3271751756100142e65b75284d43b4d30c9b7.1661578349.git.remckee0@gmail.com
Add an assert in memblock_alloc() tests where allocation is expected to
occur. The assert checks whether the entire chunk of allocated memory is
cleared.
The current memblock_alloc() tests do not check whether the allocated
memory was zeroed. memblock_alloc() should zero the allocated memory since
it is a wrapper for memblock_alloc_try_nid().
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/83ffb941b65074f40eb14552f8bfe5b71fe50abd.1661578349.git.remckee0@gmail.com
The VERBOSE build option was replaced with the --verbose runtime option,
but the comments describing the ASSERT_*() macros still refer to the
VERBOSE build option. Update these comments so that they refer to the
--verbose runtime option.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/5f8a4c2bde34cc029282c68d47eda982d950f421.1660451025.git.remckee0@gmail.com
Add a help command line option to the help message. Add the help option
to the short and long options so it will be recognized as a valid
option.
Usage:
$ ./main -h
Or:
$ ./main --help
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/0f3b93a79de78c0da1ca90f74fe35e9a85c7cf93.1660451025.git.remckee0@gmail.com
There is a potential for us to hit a type conflict when including
netinet/tcp.h and sys/socket.h, we can replace both of these includes
with linux/tcp.h and bpf_tcp_helpers.h to avoid this conflict.
Fixes errors like the below when compiling with gcc BPF backend:
In file included from /usr/include/netinet/tcp.h:91,
from progs/connect4_prog.c:11:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char'
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/connect4_prog.c:10:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int'
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220829154710.3870139-1-james.hilliard1@gmail.com
There is a potential for us to hit a type conflict when including
netinet/tcp.h with sys/socket.h, we can remove these as they are not
actually needed.
Fixes errors like the below when compiling with gcc BPF backend:
In file included from /usr/include/netinet/tcp.h:91,
from progs/bind4_prog.c:10:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char'
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/bind4_prog.c:9:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int'
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/bind4_prog.o] Error 1
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220826052925.980431-1-james.hilliard1@gmail.com
- Fix PAT on Xen, which caused i915 driver failures
- Fix compat INT 80 entry crash on Xen PV guests
- Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs
- Fix RSB stuffing regressions
- Fix ORC unwinding on ftrace trampolines
- Add Intel Raptor Lake CPU model number
- Fix (work around) a SEV-SNP bootloader bug providing bogus values in
boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups.
- Fix SEV-SNP early boot failure
- Fix the objtool list of noreturn functions and annotate snp_abort(),
which bug confused objtool on gcc-12.
- Fix the documentation for retbleed
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMLgUERHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hU1hAAj9bKO+D07gROpkRhXpLvbqm+mUqk6It8
qEuyLkC/xvD9N//Pya0ZPPE+l23EHQxM/L0tXCYwsc8Zlx3fr678FQktHZuxULaP
qlGmwub8PvsyMesXBGtgHJ4RT8L07FelBR+E/TpqCSbv/geM7mcc0ojyOKo+OmbD
vbsUTDNqsMYndzePUBrv/vJRqVLGlbd1a22DKE/YiYBbZu5hughNUB0bHYhYdsml
mutcRsVTKRbZOi4wiuY/pnveMf/z9wAwKOWd9EBaDWILygVSHkBJJQyhHigBh3F8
H1hFn1q4ybULdrAUT4YND9Tjb6U8laU0fxg8Z43bay7bFXXElqoj3W2qka9NjAim
QvWSFvTYQRTIWt6sCshVsUBWj6fBZdHxcJ8Fh+ucJJp+JWl9/aD0A41vK2Jx0LIt
p8YzBRKEqd1Q/7QD855BArN7HNtQGgShNt4oPVZ7nPnjuQt0+lngu8NR+6X3RKpX
r8rordyPvzgPL6W+1uV+8hnz1w+YU2xplAbK+zTijwgJVgyf8khSlZQNpFKULrou
zjtzo/2nB+4C4bvfetNnaOGhi1/AdCHZHyZE35rotpd73SLHvdOrH0Ll9oCVfVrC
UWbC1E67cHQw97Ni/4CrCsJRBULK01uyszVCxlEkSYInf0UsKlnmy+TqxizwsCVy
reYQ0ePyWg0=
=ZpCJ
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix PAT on Xen, which caused i915 driver failures
- Fix compat INT 80 entry crash on Xen PV guests
- Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs
- Fix RSB stuffing regressions
- Fix ORC unwinding on ftrace trampolines
- Add Intel Raptor Lake CPU model number
- Fix (work around) a SEV-SNP bootloader bug providing bogus values in
boot_params->cc_blob_address, by ignoring the value on !SEV-SNP
bootups.
- Fix SEV-SNP early boot failure
- Fix the objtool list of noreturn functions and annotate snp_abort(),
which bug confused objtool on gcc-12.
- Fix the documentation for retbleed
* tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/ABI: Mention retbleed vulnerability info file for sysfs
x86/sev: Mark snp_abort() noreturn
x86/sev: Don't use cc_platform_has() for early SEV-SNP calls
x86/boot: Don't propagate uninitialized boot_params->cc_blob_address
x86/cpu: Add new Raptor Lake CPU model number
x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry
x86/nospec: Fix i386 RSB stuffing
x86/nospec: Unwreck the RSB stuffing
x86/bugs: Add "unknown" reporting for MMIO Stale Data
x86/entry: Fix entry_INT80_compat for Xen PV guests
x86/PAT: Have pat_enabled() properly reflect state when running on Xen
Update the documentation to reflect the kernel changes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220816125612.2042397-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An array of strings is passed to cmd_record but not freed. As
cmd_record modifies the array, add another array as a copy that can be
mutated allowing the original array contents to all be freed.
Detected with -fsanitize=address.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220824145733.409005-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Intel hybrid description is written in a different style than the
rest of the perf record man page. There were some new command line
options added after it which resulted in very strange section ordering.
Move the hybrid include last.
Also the sub sections in the hybrid document don't fit the record
manpage well (especially since it talks about all kinds of unrelated
commands). I left this for now, but would be better to separate this
properly in the different man pages.
It would be better to use sub sections for the other sections, but these
don't seem to be supported in AsciiDoc?
Some of the examples are still misrendered in the manpage with an
indented troff command, but I don't know how to fix that.
In any case it's now better than before.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: zhengjun.xing@intel.com
Link: https://lore.kernel.org/r/20220818100127.249401-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Breaking a weak group requires multiple passes of an evlist, with
multiple runs this can introduce bugs ultimately leading to
segfaults. Add a test to cover this.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220822213352.75721-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If a weak group is broken then the reset_group flag remains set for
the next run. Having reset_group set means the counter isn't created
and ultimately a segfault.
A simple reproduction of this is:
# perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W
which will be added as a test in the next patch.
Fixes: 4804e01116 ("perf stat: Use affinity for opening events")
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220822213352.75721-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
ae3b1da954 ("KVM: arm64: Fix compile error due to sign extension")
That doesn't result in any changes in tooling (when built on x86), only
addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/all/YwOMCCc4E79FuvDe@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The previous change to Python autodetection had a small mistake where
the auto value was used to determine the Python binary, rather than the
user supplied value. The Python binary is only used for one part of the
build process, rather than the final linking, so it was producing
correct builds in most scenarios, especially when the auto detected
value matched what the user wanted, or the system only had a valid set
of Pythons.
Change it so that the Python binary path is derived from either the
PYTHON_CONFIG value or PYTHON value, depending on what is specified by
the user. This was the original intention.
This error was spotted in a build failure an odd cross compilation
environment after commit 4c41cb46a7 ("perf python: Prefer
python3") was merged.
Fixes: 630af16eee ("perf tools: Use Python devtools for version autodetection rather than runtime")
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220728093946.1337642-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Due to bpf_map_lookup_elem being declared static we need to also
declare subprog_noise as static.
Fixes the following error:
progs/tailcall_bpf2bpf4.c:26:9: error: 'bpf_map_lookup_elem' is static but used in inline function 'subprog_noise' which is not static [-Werror]
26 | bpf_map_lookup_elem(&nop_table, &key);
| ^~~~~~~~~~~~~~~~~~~
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20220826035141.737919-1-james.hilliard1@gmail.com
The sys/socket.h header isn't required to build test_tc_dtime and may
cause a type conflict.
Fixes the following error:
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/test_tc_dtime.c:18:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: error: conflicting types for 'int8_t'; have '__int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
In file included from progs/test_tc_dtime.c:5:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'char'}
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: error: conflicting types for 'int64_t'; have '__int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long int'}
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/test_tc_dtime.o] Error 1
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Link: https://lore.kernel.org/r/20220826050703.869571-1-james.hilliard1@gmail.com
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Daniel borkmann says:
====================
The following pull-request contains BPF updates for your *net* tree.
We've added 11 non-merge commits during the last 14 day(s) which contain
a total of 13 files changed, 61 insertions(+), 24 deletions(-).
The main changes are:
1) Fix BPF verifier's precision tracking around BPF ring buffer, from Kumar Kartikeya Dwivedi.
2) Fix regression in tunnel key infra when passing FLOWI_FLAG_ANYSRC, from Eyal Birger.
3) Fix insufficient permissions for bpf_sys_bpf() helper, from YiFei Zhu.
4) Fix splat from hitting BUG when purging effective cgroup programs, from Pu Lehui.
5) Fix range tracking for array poke descriptors, from Daniel Borkmann.
6) Fix corrupted packets for XDP_SHARED_UMEM in aligned mode, from Magnus Karlsson.
7) Fix NULL pointer splat in BPF sockmap sk_msg_recvmsg(), from Liu Jian.
8) Add READ_ONCE() to bpf_jit_limit when reading from sysctl, from Kuniyuki Iwashima.
9) Add BPF selftest lru_bug check to s390x deny list, from Daniel Müller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows to have a better control over maps from the kernel when
preloading eBPF programs.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220824134055.1328882-8-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This selftest is designed to cover execute-only protections
on the Radix MMU but will also work with Hash.
The tests are based on those found in pkey_exec_test with modifications
to use the generic mprotect() instead of the pkey variants.
Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220817050640.406017-2-ruscur@russell.cc
bpf_cgroup_iter_order is globally visible but the entries do not have
CGROUP prefix. As requested by Andrii, put a CGROUP in the names
in bpf_cgroup_iter_order.
This patch fixes two previous commits: one introduced the API and
the other uses the API in bpf selftest (that is, the selftest
cgroup_hierarchical_stats).
I tested this patch via the following command:
test_progs -t cgroup,iter,btf_dump
Fixes: d4ccaf58a8 ("bpf: Introduce cgroup iter")
Fixes: 88886309d2 ("selftests/bpf: add a selftest for cgroup hierarchical stats collection")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220825223936.1865810-1-haoluo@google.com
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Current release - new code bugs:
- dsa: don't dereference NULL extack in dsa_slave_changeupper()
- dpaa: fix <1G ethernet on LS1046ARDB
- neigh: don't call kfree_skb() under spin_lock_irqsave()
Previous releases - regressions:
- r8152: fix the RX FIFO settings when suspending
- dsa: microchip: keep compatibility with device tree blobs with
no phy-mode
- Revert "net: macsec: update SCI upon MAC address change."
- Revert "xfrm: update SA curlft.use_time", comply with RFC 2367
Previous releases - always broken:
- netfilter: conntrack: work around exceeded TCP receive window
- ipsec: fix a null pointer dereference of dst->dev on a metadata
dst in xfrm_lookup_with_ifid
- moxa: get rid of asymmetry in DMA mapping/unmapping
- dsa: microchip: make learning configurable and keep it off
while standalone
- ice: xsk: prohibit usage of non-balanced queue id
- rxrpc: fix locking in rxrpc's sendmsg
Misc:
- another chunk of sysctl data race silencing
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmMH1scACgkQMUZtbf5S
IrtzTA//as5jbKepxBLqWjmDtTXTzkR9AZwD3pz/y2eRYYZz97N5R6TYLXh03zc0
OoB7yNIsjOtYu0aB0KosF+mqeGSzIG8MZ5W6eecQVRhUL270OD/kJ0G89CeHyuKP
BYUQE2S8z+55qM6IQ0DKbR4F038J2OeR6HdV7VUDFYRGfxDZsTZU4q3aY5bklAuz
TvpDAEsw0818a2lTdgqFUeRwbcU8ZIAJhiE/LQmqxhjsGyPkK02907Ccn06IrcAy
UHRBc6Cbjn8IcNNSL0hChjAkUdHtk7iHAqU8Nr2QnxKbE0FHGVOW8BsmY5GYvLAC
hH7t/dJAu3WUxubImZG6rnp3YD3YNZoaJrDgg6jSCJeUL6MKO2rJf8Q5HGiTJOWH
8vyPfCrB9IQVnef6Im0u9EFTyu9+W4MGVN4hyhttv2OykZwSQfdpjceGZgELiwSC
+od2p8TSXkZix//cTdWeO5THSnpHeMudh+0DEm10Uzf4+ybqIVuPn2ZCSy6piYJX
nsAIac1j7onWEyKQQ/nqy0o6rlZwLe+h0BraHHp3sApWVjyFwS4p6Z6VADed4kga
n/BsINdIW56pBT2nSrBTG5/RirlVfUTOaqiry0t6oak2qooEs0Gmm8DEbgTkncbs
BRLZTVzn6X3XWq52SXf7/v36xEJ/LRooY7MqUEMPg4emgGoNuC4=
=azH5
-----END PGP SIGNATURE-----
Merge tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from ipsec and netfilter (with one broken Fixes tag).
Current release - new code bugs:
- dsa: don't dereference NULL extack in dsa_slave_changeupper()
- dpaa: fix <1G ethernet on LS1046ARDB
- neigh: don't call kfree_skb() under spin_lock_irqsave()
Previous releases - regressions:
- r8152: fix the RX FIFO settings when suspending
- dsa: microchip: keep compatibility with device tree blobs with no
phy-mode
- Revert "net: macsec: update SCI upon MAC address change."
- Revert "xfrm: update SA curlft.use_time", comply with RFC 2367
Previous releases - always broken:
- netfilter: conntrack: work around exceeded TCP receive window
- ipsec: fix a null pointer dereference of dst->dev on a metadata dst
in xfrm_lookup_with_ifid
- moxa: get rid of asymmetry in DMA mapping/unmapping
- dsa: microchip: make learning configurable and keep it off while
standalone
- ice: xsk: prohibit usage of non-balanced queue id
- rxrpc: fix locking in rxrpc's sendmsg
Misc:
- another chunk of sysctl data race silencing"
* tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
net: lantiq_xrx200: restore buffer if memory allocation failed
net: lantiq_xrx200: fix lock under memory pressure
net: lantiq_xrx200: confirm skb is allocated before using
net: stmmac: work around sporadic tx issue on link-up
ionic: VF initial random MAC address if no assigned mac
ionic: fix up issues with handling EAGAIN on FW cmds
ionic: clear broken state on generation change
rxrpc: Fix locking in rxrpc's sendmsg
net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
MAINTAINERS: rectify file entry in BONDING DRIVER
i40e: Fix incorrect address type for IPv6 flow rules
ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
net: Fix a data-race around sysctl_somaxconn.
net: Fix a data-race around netdev_unregister_timeout_secs.
net: Fix a data-race around gro_normal_batch.
net: Fix data-races around sysctl_devconf_inherit_init_net.
net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.
net: Fix a data-race around netdev_budget_usecs.
net: Fix data-races around sysctl_max_skb_frags.
net: Fix a data-race around netdev_budget.
...
Add a test to ensure we do mark_chain_precision for the argument type
ARG_CONST_ALLOC_SIZE_OR_ZERO. For other argument types, this was already
done, but propagation for missing for this case. Without the fix, this
test case loads successfully.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220823185500.467-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When `data` points to a boolean value, casting it to `int *` is problematic
and could lead to a wrong value being passed to `jsonw_bool`. Change the
cast to `bool *` instead.
Fixes: b12d6ec097 ("bpf: btf: add btf print functionality")
Signed-off-by: Lam Thai <lamthai@arista.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220824225859.9038-1-lamthai@arista.com
Add a selftest that tests the whole workflow for collecting,
aggregating (flushing), and displaying cgroup hierarchical stats.
TL;DR:
- Userspace program creates a cgroup hierarchy and induces memcg reclaim
in parts of it.
- Whenever reclaim happens, vmscan_start and vmscan_end update
per-cgroup percpu readings, and tell rstat which (cgroup, cpu) pairs
have updates.
- When userspace tries to read the stats, vmscan_dump calls rstat to flush
the stats, and outputs the stats in text format to userspace (similar
to cgroupfs stats).
- rstat calls vmscan_flush once for every (cgroup, cpu) pair that has
updates, vmscan_flush aggregates cpu readings and propagates updates
to parents.
- Userspace program makes sure the stats are aggregated and read
correctly.
Detailed explanation:
- The test loads tracing bpf programs, vmscan_start and vmscan_end, to
measure the latency of cgroup reclaim. Per-cgroup readings are stored in
percpu maps for efficiency. When a cgroup reading is updated on a cpu,
cgroup_rstat_updated(cgroup, cpu) is called to add the cgroup to the
rstat updated tree on that cpu.
- A cgroup_iter program, vmscan_dump, is loaded and pinned to a file, for
each cgroup. Reading this file invokes the program, which calls
cgroup_rstat_flush(cgroup) to ask rstat to propagate the updates for all
cpus and cgroups that have updates in this cgroup's subtree. Afterwards,
the stats are exposed to the user. vmscan_dump returns 1 to terminate
iteration early, so that we only expose stats for one cgroup per read.
- An ftrace program, vmscan_flush, is also loaded and attached to
bpf_rstat_flush. When rstat flushing is ongoing, vmscan_flush is invoked
once for each (cgroup, cpu) pair that has updates. cgroups are popped
from the rstat tree in a bottom-up fashion, so calls will always be
made for cgroups that have updates before their parents. The program
aggregates percpu readings to a total per-cgroup reading, and also
propagates them to the parent cgroup. After rstat flushing is over, all
cgroups will have correct updated hierarchical readings (including all
cpus and all their descendants).
- Finally, the test creates a cgroup hierarchy and induces memcg reclaim
in parts of it, and makes sure that the stats collection, aggregation,
and reading workflow works as expected.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-6-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch extends bpf selft cgroup_helpers [ID] n various ways:
- Add enable_controllers() that allows tests to enable all or a
subset of controllers for a specific cgroup.
- Add join_cgroup_parent(). The cgroup workdir is based on the pid,
therefore a spawned child cannot join the same cgroup hierarchy of the
test through join_cgroup(). join_cgroup_parent() is used in child
processes to join a cgroup under the parent's workdir.
- Add write_cgroup_file() and write_cgroup_file_parent() (similar to
join_cgroup_parent() above).
- Add get_root_cgroup() for tests that need to do checks on root cgroup.
- Distinguish relative and absolute cgroup paths in function arguments.
Now relative paths are called relative_path, and absolute paths are
called cgroup_path.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-5-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a selftest for cgroup_iter. The selftest creates a mini cgroup tree
of the following structure:
ROOT (working cgroup)
|
PARENT
/ \
CHILD1 CHILD2
and tests the following scenarios:
- invalid cgroup fd.
- pre-order walk over descendants from PARENT.
- post-order walk over descendants from PARENT.
- walk of ancestors from PARENT.
- process only a single object (i.e. PARENT).
- early termination.
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-3-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
- walking a cgroup's descendants in pre-order.
- walking a cgroup's descendants in post-order.
- walking a cgroup's ancestors.
- process only the given cgroup.
When attaching cgroup_iter, one can set a cgroup to the iter_link
created from attaching. This cgroup is passed as a file descriptor
or cgroup id and serves as the starting point of the walk. If no
cgroup is specified, the starting point will be the root cgroup v2.
For walking descendants, one can specify the order: either pre-order or
post-order. For walking ancestors, the walk starts at the specified
cgroup and ends at the root.
One can also terminate the walk early by returning 1 from the iter
program.
Note that because walking cgroup hierarchy holds cgroup_mutex, the iter
program is called with cgroup_mutex held.
Currently only one session is supported, which means, depending on the
volume of data bpf program intends to send to user space, the number
of cgroups that can be walked is limited. For example, given the current
buffer size is 8 * PAGE_SIZE, if the program sends 64B data for each
cgroup, assuming PAGE_SIZE is 4kb, the total number of cgroups that can
be walked is 512. This is a limitation of cgroup_iter. If the output
data is larger than the kernel buffer size, after all data in the
kernel buffer is consumed by user space, the subsequent read() syscall
will signal EOPNOTSUPP. In order to work around, the user may have to
update their program to reduce the volume of data sent to output. For
example, skip some uninteresting cgroups. In future, we may extend
bpf_iter flags to allow customizing buffer size.
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Mark both the function prototype and definition as noreturn in order to
prevent the compiler from doing transformations which confuse objtool
like so:
vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction
This triggers with gcc-12.
Add it and sev_es_terminate() to the objtool noreturn tracking array
too. Sort it while at it.
Suggested-by: Michael Matz <matz@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220824152420.20547-1-bp@alien8.de
This patch adds 2 new tests: sk_bind_sendto_listen and
sk_connect_zero_addr.
The sk_bind_sendto_listen test exercises the path where a socket's
rcv saddr changes after it has been added to the binding tables,
and then a listen() on the socket is invoked. The listen() should
succeed.
The sk_bind_sendto_listen test is copied over from one of syzbot's
tests: https://syzkaller.appspot.com/x/repro.c?x=1673a38df00000
The sk_connect_zero_addr test exercises the path where the socket was
never previously added to the binding tables and it gets assigned a
saddr upon a connect() to address 0.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This test populates the bhash table for a given port with
MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long
a bind request on the port takes.
When populating the bhash table, we create the sockets and then bind
the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT
are set). When timing how long a bind on the port takes, we bind on a
different address without SO_REUSEPORT set. We do not set SO_REUSEPORT
because we are interested in the case where the bind request does not
go through the tb->fastreuseport path, which is fragile (eg
tb->fastreuseport path does not work if binding with a different uid).
To run the script:
Usage: ./bind_bhash.sh [-6 | -4] [-p port] [-a address]
6: use ipv6
4: use ipv4
port: Port number
address: ip address
Without any arguments, ./bind_bhash.sh defaults to ipv6 using ip address
"2001:0db8:0:f101::1" on port 443.
On my local machine, I see:
ipv4:
before - 0.002317 seconds
with bhash2 - 0.000020 seconds
ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sizeof(new_cc) is not real memory size that new_cc points to; introduce
a new_cc_len to store the size and then pass it to bpf_setsockopt().
Fixes: 31123c0360 ("selftests/bpf: bpf_setsockopt tests")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220824013907.380448-1-yangyingliang@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The cb_refs BPF selftest is failing execution on s390x machines. This is
a newly added test that requires a feature not presently supported on
this architecture.
Denylist the test for this architecture.
Fixes: 3cf7e7d8685c ("selftests/bpf: Add tests for reference state fixes for callbacks")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220824163906.1186832-1-deso@posteo.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
These are regression tests to ensure we don't end up in invalid runtime
state for helpers that execute callbacks multiple times. It exercises
the fixes to verifier callback handling for reference state in previous
patches.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220823013226.24988-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For each hook, have a simple bpf_set_retval(bpf_get_retval) program
and make sure it loads for the hooks we want. The exceptions are
the hooks which don't propagate the error to the callers:
- sockops
- recvmsg
- getpeername
- getsockname
- cg_skb ingress and egress
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220823222555.523590-6-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* replace 'syscall' with 'upper layers', still mention that it's being
exported via syscall errno
* describe what happens in set_retval(-EPERM) + return 1
* describe what happens with bind's 'return 3'
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220823222555.523590-5-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The dissector program returns BPF_FLOW_DISSECTOR_CONTINUE (and avoids
setting skb->flow_keys or last_dissection map) in case it encounters
IP packets whose (outer) source address is 127.0.0.127.
Additional test is added to prog_tests/flow_dissector.c which sets
this address as test's pkk.iph.saddr, with the expected retval of
BPF_FLOW_DISSECTOR_CONTINUE.
Also, legacy test_flow_dissector.sh was similarly augmented.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-5-shmulik.ladkani@gmail.com
Formerly, a boolean denoting whether bpf_flow_dissect returned BPF_OK
was set into 'bpf_attr.test.retval'.
Augment this, so users can check the actual return code of the dissector
program under test.
Existing prog_tests/flow_dissector*.c tests were correspondingly changed
to check against each test's expected retval.
Also, tests' resulting 'flow_keys' are verified only in case the expected
retval is BPF_OK. This allows adding new tests that expect non BPF_OK.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-4-shmulik.ladkani@gmail.com
Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely
replaces the flow-dissector logic with custom dissection logic. This
forces implementors to write programs that handle dissection for any
flows expected in the namespace.
It makes sense for flow-dissector BPF programs to just augment the
dissector with custom logic (e.g. dissecting certain flows or custom
protocols), while enjoying the broad capabilities of the standard
dissector for any other traffic.
Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF
programs may return this to indicate no dissection was made, and
fallback to the standard dissector is requested.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com
This Kselftest fixes update for Linux 6.0-rc3 consists of fixes
and warnings to vm and sgx test builds.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmMD9vkACgkQCwJExA0N
QxwZExAAnkMMex+CoNdwpc73LOGjx4839mB11rhR34sUo+OhpJohCB5tkoF87j+9
+N/jYot4jwl3WxBo9oniVsoALHALX4ygnphTiqf/EsczAeJgecf1s6mW6oAzv1HD
v0FKWHSQQiHhQsjo5lBL0RoJOHY8fAfxm38Aj6WdgWCQcv5i8cl32eJd+CCRJl35
WMKbZvH7M+Vum1w0Jy8wkNVjRMXUhNniWvjP/JXPyPR/nCCZBA1pFZQJu9EHd84H
lVxX0GUolplhzFJ7t+HfxVfkvCO8U6QAusKFW61qcUmlFQmuAUKvFj3pAaVe9jWv
1bFtTLXMI8SKk9ZI96/gLRrvAxI3z9qDx++2tFBRVCyLN+UOQrCI4u/csIMEX7Ci
S+LVgwfaoLEg7jqcU6mJiYisfqwXcBVFHUnm/n3eOXEsdGit7XaH9qHMGdwmWaYF
kGnv6le21lrNuNd/l299hFXI2+jCDTSPCMX6d+96gF9QdhVH/K3q27lN1wI/mRLu
uJ9BkZw8yNuaY+pkQEegW+UIm8zQ2D3RTdd00jtBBX2/EQzvyJBBD8r/9Z3Lfi8Q
kh8wc93YvQPa2zmzu9szry7iwMOE5b0y8UiPIs5fLpfbPTSw2MgcBDebcid3AVjs
ZvN273OZVs91Afdbr5yzqRDJ4w8mjhrUlZ71hJucwTq1ICIv8YU=
=Y9BY
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"Fixes to vm and sgx test builds"
* tag 'linux-kselftest-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/vm: fix inability to build any vm tests
selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning
This adds test to check, that when poll() returns POLLIN, POLLRDNORM bits,
next read call won't block.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This creates a test collection in drivers/net/bonding for bonding
specific kernel selftests.
The first test is a reproducer that provisions a bond and given the
specific order in how the ip-link(8) commands are issued the bond never
transmits an LACPDU frame on any of its slaves.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit a0a12c3ed0 ("asm goto: eradicate CC_HAS_ASM_GOTO") eradicates
CC_HAS_ASM_GOTO, and in the process also causes the perf tool on x86 to
use asm_volatile_goto when compiling __GEN_RMWcc.
However, asm_volatile_goto is not declared in the perf tool headers,
which causes a compilation error:
In file included from tools/arch/x86/include/asm/atomic.h:7,
from tools/include/asm/atomic.h:6,
from tools/include/linux/atomic.h:5,
from tools/include/linux/refcount.h:41,
from tools/lib/perf/include/internal/cpumap.h:5,
from tools/perf/util/cpumap.h:7,
from tools/perf/util/env.h:7,
from tools/perf/util/header.h:12,
from pmu-events/pmu-events.c:9:
tools/arch/x86/include/asm/atomic.h: In function ‘atomic_dec_and_test’:
tools/arch/x86/include/asm/rmwcc.h:7:2: error: implicit declaration of function ‘asm_volatile_goto’ [-Werror=implicit-function-declaration]
asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \
^~~~~~~~~~~~~~~~~
Define asm_volatile_goto in compiler_types.h if not declared, like the
main kernel header files do.
Fixes: a0a12c3ed0 ("asm goto: eradicate CC_HAS_ASM_GOTO")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0.
The minimum supported versions of these tools for the build according to
Documentation/process/changes.rst are 5.1 and 11.0.0 respectively.
Remove the feature detection script, Kconfig option, and clean up some
fallback code that is no longer supported.
The removed script was also testing for a GCC specific bug that was
fixed in the 4.7 release.
Also remove workarounds for bpftrace using clang older than 9.0.0, since
other BPF backend fixes are required at this point.
Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/
Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637
Acked-by: Borislav Petkov <bp@suse.de>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix alignment for cpu map masks in event encoding.
- Support reading PERF_FORMAT_LOST, perf tool counterpart for a feature
that was added in this merge window.
- Sync perf tools copies of kernel headers: socket, msr-index, fscrypt,
cpufeatures, i915_drm, kvm, vhost, perf_event.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYv/jZAAKCRCyPKLppCJ+
J5ZWAP9P7qMcKppgzYqnHU7TX2tJoT7WbYpaf2lp0TkIpynKdAD9E+9FjbAnQznN
90DIIH/BljChgiUTkMxHaAl789XL1gU=
=UXWB
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v6.0-2022-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix alignment for cpu map masks in event encoding.
- Support reading PERF_FORMAT_LOST, perf tool counterpart for a feature
that was added in this merge window.
- Sync perf tools copies of kernel headers: socket, msr-index, fscrypt,
cpufeatures, i915_drm, kvm, vhost, perf_event.
* tag 'perf-tools-fixes-for-v6.0-2022-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf tools: Support reading PERF_FORMAT_LOST
libperf: Add a test case for read formats
libperf: Handle read format in perf_evsel__read()
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
tools headers UAPI: Sync KVM's vmx.h header with the kernel sources
tools include UAPI: Sync linux/vhost.h with the kernel sources
tools headers kvm s390: Sync headers with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf beauty: Update copy of linux/socket.h with the kernel sources
perf cpumap: Fix alignment for masks in event encoding
perf cpumap: Compute mask size in constant time
perf cpumap: Synthetic events and const/static
perf cpumap: Const map for max()
- Fix atomic sleep warnings at boot due to get_phb_number() taking a mutex with a
spinlock held on some machines.
- Add missing PMU selftests to .gitignores.
Thanks to: Guenter Roeck, Russell Currey.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmMAG1oTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgOkfD/0caaEOHTJakpC1FLk1H7owpqJG/GSM
iHpYnMu672CiqmGEiDAa20or1DwRHSThTQGgLR+nJUyB1AtGhavE7qKJ2yAJDz7I
/vbEWtJLtLJg4SF6sIQT7AYk//GPcOCFLOKyxsDQd7sI+UWEEUdO2OpCcLkIMJw/
JKwz3S621UDf9XMObu1dU69DNwkg37SZ9g1o60C5HrDGoP89OmgYaEs/MyqInwMX
e4v5jvFnCYGLBsTPPx9IEHyxWMFPydPl673WuI0dDlpZ0+PaWf/4y9Yilidu14a0
XT+hiTDFKzjJ8cDCLCZy9Ckhz0onuOM2hzqLpnnjeHoo/8NZNXDnH/jpKSuqV/6n
OSM9tFjmQCh/3dBOU9VOPxvbz50XakWqRoY/UIhMtqVhLc9EmNX+l2Q621TBlMBK
sd269EMTHrLUT1qyEl6zofTje6DTAPZpc4lxjLlicMtleadJuaVacVJGsikbZ/qx
5dYFunmMGHgjrFnY3nA2hIQsFukOHLyz4KpUGIqMYMmILk6rzRRKomLAtsEoUvfD
Y7JcaBkxO7VrypJrld1+RDSp4snOfTpx7SyhHWcVPCKfuO8fa4yAxhi+IB0QTpvL
QiVZNMFK1U5c+lU5i9JxpX/HaMo9gQEIdgh87ws0Tg2A4scMHpT34adKWy25Gkml
qIq+z4Zn13/Zsg==
=Yvlp
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix atomic sleep warnings at boot due to get_phb_number() taking a
mutex with a spinlock held on some machines.
- Add missing PMU selftests to .gitignores.
Thanks to Guenter Roeck and Russell Currey.
* tag 'powerpc-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Add missing PMU selftests to .gitignores
powerpc/pci: Fix get_phb_number() locking
When we stopped using KSFT_KHDR_INSTALL, a side effect is we also
changed the value of `top_srcdir`. This can be seen by looking at the
code removed by commit 49de12ba06
("selftests: drop KSFT_KHDR_INSTALL make target").
(Note though that this commit didn't break this, technically the one
before it did since that's the one that stopped KSFT_KHDR_INSTALL from
being used, even though the code was still there.)
Previously lib.mk reconfigured `top_srcdir` when KSFT_KHDR_INSTALL was
being used. Now, that's no longer the case.
As a result, the path to gup_test.h in vm/Makefile was wrong, and
since it's a dependency of all of the vm binaries none of them could
be built. Instead, we'd get an "error" like:
make[1]: *** No rule to make target
'/[...]/tools/testing/selftests/vm/compaction_test', needed by
'all'. Stop.
So, modify lib.mk so it once again sets top_srcdir to the root of the
kernel tree.
Fixes: f2745dc0ba ("selftests: stop using KSFT_KHDR_INSTALL")
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
There are currently 3 ip tunnels that are capable of carrying
L2 traffic: gretap, vxlan and geneve.
They all are capable to inherit the TOS/TTL for the outer
IP-header from the inner frame.
Add a test that verifies that these fields are correctly inherited.
These tests failed before the following commits:
b09ab9c92e ("ip6_tunnel: allow to inherit from VLAN encapsulated IP")
3f8a8447fd ("ip6_gre: use actual protocol to select xmit")
41337f52b9 ("ip6_gre: set DSCP for non-IP")
7ae29fd1be ("ip_tunnel: allow to inherit from VLAN encapsulated IP")
7074732c8f ("ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN")
ca2bb69514 ("geneve: do not use RT_TOS for IPv6 flowlabel")
b4ab94d6ad ("geneve: fix TOS inheriting for ipv4")
Signed-off-by: Matthias May <matthias.may@westermo.com>
Link: https://lore.kernel.org/r/20220817073649.26117-1-matthias.may@westermo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK
* Tidy-up handling of AArch32 on asymmetric systems
x86:
* Fix "missing ENDBR" BUG for fastop functions
Generic:
* Some cleanup and static analyzer patches
* More fixes to KVM_CREATE_VM unwind paths
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmL/YoEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNK7wf/f/CxUT2NW8+klMBSUTL6YNMwPp5A
9xpfASi4pGiID27EEAOOLWcOr+A5bfa7fLS70Dyc+Wq9h0/tlnhFEF1X9RdLNHc+
I2HgNB64TZI7aLiZSm3cH3nfoazkAMPbGjxSlDmhH58cR9EPIlYeDeVMR/velbDZ
Z4kfwallR2Mizb7olvXy0lYfd6jZY+JkIQQtgml801aIpwkJggwqhnckbxCDEbSx
oB17T99Q2UQasDFusjvZefHjPhwZ7rxeXNTKXJLZNWecd7lAoPYJtTiYw+cxHmSY
JWsyvtcHons6uNoP1y60/OuVYcLFseeY3Yf9sqI8ivyF0HhS1MXQrcXX8g==
=V4Ib
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK
- Tidy-up handling of AArch32 on asymmetric systems
x86:
- Fix 'missing ENDBR' BUG for fastop functions
Generic:
- Some cleanup and static analyzer patches
- More fixes to KVM_CREATE_VM unwind paths"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device()
KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow()
x86/kvm: Fix "missing ENDBR" BUG for fastop functions
x86/kvm: Simplify FOP_SETCC()
x86/ibt, objtool: Add IBT_NOSEAL()
KVM: Rename mmu_notifier_* to mmu_invalidate_*
KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS
KVM: MIPS: remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS
KVM: Move coalesced MMIO initialization (back) into kvm_create_vm()
KVM: Unconditionally get a ref to /dev/kvm module when creating a VM
KVM: Properly unwind VM creation if creating debugfs fails
KVM: arm64: Reject 32bit user PSTATE on asymmetric systems
KVM: arm64: Treat PMCR_EL1.LC as RES1 on asymmetric systems
KVM: arm64: Fix compile error due to sign extension
There is a spelling mistake in an ASSERT_OK literal string. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Mykola Lysenko <mykolal@fb.com>
Link: https://lore.kernel.org/r/20220817213242.101277-1-colin.i.king@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The recent kernel added lost count can be read from either read(2) or
ring buffer data with PERF_SAMPLE_READ. As it's a variable length data
we need to access it according to the format info.
But for perf tools use cases, PERF_FORMAT_ID is always set. So we can
only check PERF_FORMAT_LOST bit to determine the data format.
Add sample_read_value_size() and next_sample_read_value() helpers to
make it a bit easier to access. Use them in all places where it reads
the struct sample_read_value.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220819003644.508916-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It checks a various combination of the read format settings and verify
it return the value in a proper position. The test uses task-clock
software events to guarantee it's always active and sets enabled/running
time.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220819003644.508916-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf_counts_values should be increased to read the new lost data.
Also adjust values after read according the read format.
This supports PERF_FORMAT_GROUP which has a different data format but
it's only available for leader events. Currently it doesn't have an API
to read sibling (member) events in the group. But users may read the
sibling event directly.
Also reading from mmap would be disabled when the read format has ID or
LOST bit as it's not exposed via mmap.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220819003644.508916-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the trivial change in:
119a784c81 ("perf/core: Add a new read format to get a number of lost samples")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220819003644.508916-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
43bb9e000e ("KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specific")
94dfc73e7c ("treewide: uapi: Replace zero-length arrays with flexible-array members")
bfbcc81bb8 ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior")
b172862241 ("KVM: x86: PIT: Preserve state of speaker port data bit")
ed2351174e ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")
That just rebuilds kvm-stat.c on x86, no change in functionality.
This silences these perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Cc: Chenyi Qiang <chenyi.qiang@intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Durrant <pdurrant@amazon.com>
Link: https://lore.kernel.org/lkml/Yv6OMPKYqYSbUxwZ@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
2f4073e08f ("KVM: VMX: Enable Notify VM exit")
That makes 'perf kvm-stat' aware of this new NOTIFY exit reason, thus
addressing the following perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tao Xu <tao3.xu@intel.com>
Link: http://lore.kernel.org/lkml/Yv6LavXMZ+njijpq@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
f5ecfee944 ("KVM: s390: resetting the Topology-Change-Report")
None of them trigger any changes in tooling, this time this is just to silence
these perf build warnings:
Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h'
diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Link: http://lore.kernel.org/lkml/YvzwMXzaIzOU4WAY@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
8a061562e2 ("RISC-V: KVM: Add extensible CSR emulation framework")
f5ecfee944 ("KVM: s390: resetting the Topology-Change-Report")
450a563924 ("KVM: stats: Fix value for KVM_STATS_UNIT_MAX for boolean stats")
1b870fa557 ("kvm: stats: tell userspace which values are boolean")
db1c875e05 ("KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices")
94dfc73e7c ("treewide: uapi: Replace zero-length arrays with flexible-array members")
084cc29f8b ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis")
2f4073e08f ("KVM: VMX: Enable Notify VM exit")
ed2351174e ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")
e9bf3acb23 ("KVM: s390: Add KVM_CAP_S390_PROTECTED_DUMP")
8aba09588d ("KVM: s390: Add CPU dump functionality")
0460eb35b4 ("KVM: s390: Add configuration dump functionality")
fe9a93e07b ("KVM: s390: pv: Add query dump information")
35d02493db ("KVM: s390: pv: Add query interface")
c24a950ec7 ("KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata for SEV-ES")
ffbb61d09f ("KVM: x86: Accept KVM_[GS]ET_TSC_KHZ as a VM ioctl.")
661a20fab7 ("KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SEND")
fde0451be8 ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC")
28d1629f75 ("KVM: x86/xen: Kernel acceleration for XENVER_version")
5363952605 ("KVM: x86/xen: handle PV timers oneshot mode")
942c2490c2 ("KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID")
2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
35025735a7 ("KVM: x86/xen: Support direct injection of event channel events")
That just rebuilds perf, as these patches add just an ioctl that is S390
specific and may clash with other arches, so are so far being excluded
in the harvester script:
$ tools/perf/trace/beauty/kvm_ioctl.sh > before
$ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
$ tools/perf/trace/beauty/kvm_ioctl.sh > after
$ diff -u before after
$ grep 390 tools/perf/trace/beauty/kvm_ioctl.sh
egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \
$
This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Anup Patel <anup@brainfault.org>
Cc: Ben Gardon <bgardon@google.com>
Cc: Chenyi Qiang <chenyi.qiang@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: João Martins <joao.m.martins@oracle.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Tao Xu <tao3.xu@intel.com>
Link: https://lore.kernel.org/lkml/YvzuryClcn%2FvA0Gn@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
a913bde810 ("drm/i915: Update i915 uapi documentation")
525e93f631 ("drm/i915/uapi: add NEEDS_CPU_ACCESS hint")
141f733bb3 ("drm/i915/uapi: expose the avail tracking")
3f4309cbdc ("drm/i915/uapi: add probed_cpu_visible_size")
a50794f26f ("uapi/drm/i915: Document memory residency and Flat-CCS capability of obj")
That don't add any new ioctl, so no changes in tooling.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Link: http://lore.kernel.org/lkml/Yvzrp9RFIeEkb5fI@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
6b2a51ff03 ("fscrypt: Add HCTR2 support for filename encryption")
That don't result in any changes in tooling, just causes this to be
rebuilt:
CC /tmp/build/perf-urgent/trace/beauty/sync_file_range.o
LD /tmp/build/perf-urgent/trace/beauty/perf-in.o
addressing this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h'
diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Huckleberry <nhuck@google.com>
Link: https://lore.kernel.org/lkml/Yvzl8C7O1b+hf9GS@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>