Extract the list of all tests into a variable, ALL_TESTS. Then assume the
environment variable TESTS holds the list of tests to actually run, falling
back to ALL_TESTS if TESTS is empty. This is the same interface that
forwarding selftests use to make the set of tests to run configurable.
In addition to this, allow setting the value explicitly through a command
line option "-t" along the lines of what fib_nexthops.sh does.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wifi and ipsec.
A little more changes than usual, but it's pretty normal for us that
the rc3/rc4 PRs are oversized as people start testing in earnest.
Possibly an extra boost from people deploying the 6.1 LTS but that's
more of an unscientific hunch.
Current release - regressions:
- phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()
- virtio: vsock: don't use skbuff state to account credit
- virtio: vsock: don't drop skbuff on copy failure
- virtio_net: fix page_to_skb() miscalculating the memory size
Current release - new code bugs:
- eth: correct xdp_features after device reconfig
- wifi: nl80211: fix the puncturing bitmap policy
- net/mlx5e: flower:
- fix raw counter initialization
- fix missing error code
- fix cloned flow attribute
- ipa:
- fix some register validity checks
- fix a surprising number of bad offsets
- kill FILT_ROUT_CACHE_CFG IPA register
Previous releases - regressions:
- tcp: fix bind() conflict check for dual-stack wildcard address
- veth: fix use after free in XDP_REDIRECT when skb headroom is small
- ipv4: fix incorrect table ID in IOCTL path
- ipvlan: make skb->skb_iif track skb->dev for l3s mode
- mptcp:
- fix possible deadlock in subflow_error_report
- fix UaFs when destroying unaccepted and listening sockets
- dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290
Previous releases - always broken:
- tcp: tcp_make_synack() can be called from process context, don't
assume preemption is disabled when updating stats
- netfilter: correct length for loading protocol registers
- virtio_net: add checking sq is full inside xdp xmit
- bonding: restore IFF_MASTER/SLAVE flags on bond enslave Ethertype
change
- phy: nxp-c45-tja11xx: fix MII_BASIC_CONFIG_REV bit number
- eth: i40e: fix crash during reboot when adapter is in recovery mode
- eth: ice: avoid deadlock on rtnl lock when auxiliary device
plug/unplug meets bonding
- dsa: mt7530:
- remove now incorrect comment regarding port 5
- set PLL frequency and trgmii only when trgmii is used
- eth: mtk_eth_soc: reset PCS state when changing interface types
Misc:
- ynl: another license adjustment
- move the TCA_EXT_WARN_MSG attribute for tc action"
* tag 'net-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
selftests: bonding: add tests for ether type changes
bonding: restore bond's IFF_SLAVE flag if a non-eth dev enslave fails
bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change
net: renesas: rswitch: Fix GWTSDIE register handling
net: renesas: rswitch: Fix the output value of quote from rswitch_rx()
ethernet: sun: add check for the mdesc_grab()
net: ipa: fix some register validity checks
net: ipa: kill FILT_ROUT_CACHE_CFG IPA register
net: ipa: add two missing declarations
net: ipa: reg: include <linux/bug.h>
net: xdp: don't call notifiers during driver init
net/sched: act_api: add specific EXT_WARN_MSG for tc action
Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"
net: dsa: microchip: fix RGMII delay configuration on KSZ8765/KSZ8794/KSZ8795
ynl: make the tooling check the license
ynl: broaden the license even more
tools: ynl: make definitions optional again
hsr: ratelimit only when errors are printed
qed/qed_mng_tlv: correctly zero out ->min instead of ->hour
selftests: net: devlink_port_split.py: skip test if no suitable device available
...
Add test cases for VXLAN MDB, testing the control and data paths. Two
different sets of namespaces (i.e., ns{1,2}_v4 and ns{1,2}_v6) are used
in order to test VXLAN MDB with both IPv4 and IPv6 underlays,
respectively.
Example truncated output:
# ./test_vxlan_mdb.sh
[...]
Tests passed: 620
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new network selftests for the bonding device which exercise the ether
type changing call paths. They also test for the recent syzbot bug[1] which
causes a warning and results in wrong device flags (IFF_SLAVE missing).
The test adds three bond devices and a nlmon device, enslaves one of the
bond devices to the other and then uses the nlmon device for successful
and unsuccesful enslaves both of which change the bond ether type. Thus
we can test for both MASTER and SLAVE flags at the same time.
If the flags are properly restored we get:
TEST: Change ether type of an enslaved bond device with unsuccessful enslave [ OK ]
TEST: Change ether type of an enslaved bond device with successful enslave [ OK ]
[1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The `devlink -j port show` command output may not contain the "flavour"
key, an example from Ubuntu 22.10 s390x LPAR(5.19.0-37-generic), with
mlx4 driver and iproute2-5.15.0:
{"port":{"pci/0001:00:00.0/1":{"type":"eth","netdev":"ens301"},
"pci/0001:00:00.0/2":{"type":"eth","netdev":"ens301d1"},
"pci/0002:00:00.0/1":{"type":"eth","netdev":"ens317"},
"pci/0002:00:00.0/2":{"type":"eth","netdev":"ens317d1"}}}
This will cause a KeyError exception.
Create a validate_devlink_output() to check for this "flavour" from
devlink command output to avoid this KeyError exception. Also let
it handle the check for `devlink -j dev show` output in main().
Apart from this, if the test was not started because the max lanes of
the designated device is 0. The script will still return 0 and thus
causing a false-negative test result.
Use a found_max_lanes flag to determine if these tests were skipped
due to this reason and return KSFT_SKIP to make it more clear.
Link: https://bugs.launchpad.net/bugs/1937133
Fixes: f3348a82e7 ("selftests: net: Add port split test")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Link: https://lore.kernel.org/r/20230315165353.229590-1-po-hsu.lin@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Address a rather annoying bug w.r.t. guest timer offsetting. The
synchronization of timer offsets between vCPUs was broken, leading
to inconsistent timer reads within the VM.
x86:
- New tests for the slow path of the EVTCHNOP_send Xen hypercall
- Add missing nVMX consistency checks for CR0 and CR4
- Fix bug that broke AMD GATag on 512 vCPU machines
Selftests:
- Skip hugetlb tests if huge pages are not available
- Sync KVM exit reasons"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Sync KVM exit reasons in selftests
KVM: selftests: Add macro to generate KVM exit reason strings
KVM: selftests: Print expected and actual exit reason in KVM exit reason assert
KVM: selftests: Make vCPU exit reason test assertion common
KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
KVM: selftests: Use enum for test numbers in xen_shinfo_test
KVM: selftests: Add helpers to make Xen-style VMCALL/VMMCALL hypercalls
KVM: selftests: Move the guts of kvm_hypercall() to a separate macro
KVM: SVM: WARN if GATag generation drops VM or vCPU ID information
KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUs
KVM: SVM: Fix a benign off-by-one bug in AVIC physical table mask
selftests: KVM: skip hugetlb tests if huge pages are not available
KVM: VMX: Use tabs instead of spaces for indentation
KVM: VMX: Fix indentation coding style issue
KVM: nVMX: remove unnecessary #ifdef
KVM: nVMX: add missing consistency checks for CR0 and CR4
KVM: arm64: timers: Convert per-vcpu virtual offset to a global value
This adds SOCK_STREAM and SOCK_SEQPACKET tests for invalid buffer case.
It tries to read data to NULL buffer (data already presents in socket's
queue), then uses valid buffer. For SOCK_STREAM second read must return
data, because skbuff is not dropped, but for SOCK_SEQPACKET skbuff will
be dropped by kernel, and 'recv()' will return EAGAIN.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull kselftest fixes from Shuah Khan:
"A fix to amd-pstate test Makefile and a fix to LLVM build for x86 in
kselftest common lib.mk"
* tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: fix LLVM build for i386 and x86_64
selftests: amd-pstate: fix TEST_FILES
The test checks if (IPv4, IPv6) address pair properly conflict or not.
* IPv4
* 0.0.0.0
* 127.0.0.1
* IPv6
* ::
* ::1
If the IPv6 address is [::], the second bind() always fails.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The xen_shinfo_test started off with very few iterations, and the numbers
we used in GUEST_SYNC() were precisely mapped to the RUNSTATE_xxx values
anyway to start with.
It has since grown quite a few more tests, and it's kind of awful to be
handling them all as bare numbers. Especially when I want to add a new
test in the middle. Define an enum for the test stages, and use it both
in the guest code and the host switch statement.
No functional change, if I can count to 24.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add wrappers to do hypercalls using VMCALL/VMMCALL and Xen's register ABI
(as opposed to full Xen-style hypercalls through a hypervisor provided
page). Using the common helpers dedups a pile of code, and uses the
native hypercall instruction when running on AMD.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Extract the guts of kvm_hypercall() to a macro so that Xen hypercalls,
which have a different register ABI, can reuse the VMCALL vs. VMMCALL
logic.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Right now, if KVM memory stress tests are run with hugetlb sources but hugetlb is
not available (either in the kernel or because /proc/sys/vm/nr_hugepages is 0)
the test will fail with a memory allocation error.
This makes it impossible to add tests that default to hugetlb-backed memory,
because on a machine with a default configuration they will fail. Therefore,
check HugePages_Total as well and, if zero, direct the user to enable hugepages
in procfs. Furthermore, return KSFT_SKIP whenever hugetlb is not available.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add tests that check if filters can bind actions, that is create an
action independently and then bind to a filter.
tdc-tests under category 'infra':
1..18
ok 1 abdc - Reference pedit action object in filter
ok 2 7a70 - Reference mpls action object in filter
ok 3 d241 - Reference bpf action object in filter
ok 4 383a - Reference connmark action object in filter
ok 5 c619 - Reference csum action object in filter
ok 6 a93d - Reference ct action object in filter
ok 7 8bb5 - Reference ctinfo action object in filter
ok 8 2241 - Reference gact action object in filter
ok 9 35e9 - Reference gate action object in filter
ok 10 b22e - Reference ife action object in filter
ok 11 ef74 - Reference mirred action object in filter
ok 12 2c81 - Reference nat action object in filter
ok 13 ac9d - Reference police action object in filter
ok 14 68be - Reference sample action object in filter
ok 15 cf01 - Reference skbedit action object in filter
ok 16 c109 - Reference skbmod action object in filter
ok 17 4abc - Reference tunnel_key action object in filter
ok 18 dadd - Reference vlan action object in filter
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230309175554.304824-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull clone3 fix from Christian Brauner:
"A simple fix for the clone3() system call.
The CLONE_NEWTIME allows the creation of time namespaces. The flag
reuses a bit from the CSIGNAL bits that are used in the legacy clone()
system call to set the signal that gets sent to the parent after the
child exits.
The clone3() system call doesn't rely on CSIGNAL anymore as it uses a
dedicated .exit_signal field in struct clone_args. So we blocked all
CSIGNAL bits in clone3_args_valid(). When CLONE_NEWTIME was introduced
and reused a CSIGNAL bit we forgot to adapt clone3_args_valid()
causing CLONE_NEWTIME with clone3() to be rejected. Fix this"
* tag 'kernel.fork.v6.3-rc2' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
selftests/clone3: test clone3 with CLONE_NEWTIME
fork: allow CLONE_NEWTIME in clone3 flags
In case of errors, the printed message had the expected and the seen
value inverted.
This patch simply correct the order: first the expected value, then the
one that has been seen.
Fixes: 10d4273411 ("selftests: mptcp: userspace: print error details if any")
Cc: stable@vger.kernel.org
Acked-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal says:
====================
Netfilter updates for net-next
1. nf_tables 'brouting' support, from Sriram Yagnaraman.
2. Update bridge netfilter and ovs conntrack helpers to handle
IPv6 Jumbo packets properly, i.e. fetch the packet length
from hop-by-hop extension header, from Xin Long.
This comes with a test BIG TCP test case, added to
tools/testing/selftests/net/.
3. Fix spelling and indentation in conntrack, from Jeremy Sowden.
* 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nat: fix indentation of function arguments
netfilter: conntrack: fix typo
selftests: add a selftest for big tcp
netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
netfilter: move br_nf_check_hbh_len to utils
netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
netfilter: bridge: check len before accessing more nh data
netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
netfilter: bridge: introduce broute meta statement
====================
Link: https://lore.kernel.org/r/20230308193033.13965-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and bpf.
Current release - regressions:
- core: avoid skb end_offset change in __skb_unclone_keeptruesize()
- sched:
- act_connmark: handle errno on tcf_idr_check_alloc
- flower: fix fl_change() error recovery path
- ieee802154: prevent user from crashing the host
Current release - new code bugs:
- eth: bnxt_en: fix the double free during device removal
- tools: ynl:
- fix enum-as-flags in the generic CLI
- fully inherit attrs in subsets
- re-license uniformly under GPL-2.0 or BSD-3-clause
Previous releases - regressions:
- core: use indirect calls helpers for sk_exit_memory_pressure()
- tls:
- fix return value for async crypto
- avoid hanging tasks on the tx_lock
- eth: ice: copy last block omitted in ice_get_module_eeprom()
Previous releases - always broken:
- core: avoid double iput when sock_alloc_file fails
- af_unix: fix struct pid leaks in OOB support
- tls:
- fix possible race condition
- fix device-offloaded sendpage straddling records
- bpf:
- sockmap: fix an infinite loop error
- test_run: fix &xdp_frame misplacement for LIVE_FRAMES
- fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
- netfilter: tproxy: fix deadlock due to missing BH disable
- phylib: get rid of unnecessary locking
- eth: bgmac: fix *initial* chip reset to support BCM5358
- eth: nfp: fix csum for ipsec offload
- eth: mtk_eth_soc: fix RX data corruption issue
Misc:
- usb: qmi_wwan: add telit 0x1080 composition"
* tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
tools: ynl: fix enum-as-flags in the generic CLI
tools: ynl: move the enum classes to shared code
net: avoid double iput when sock_alloc_file fails
af_unix: fix struct pid leaks in OOB support
eth: fealnx: bring back this old driver
net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
net: microchip: sparx5: fix deletion of existing DSCP mappings
octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
net/smc: fix fallback failed while sendmsg with fastopen
ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
mailmap: update entries for Stephen Hemminger
mailmap: add entry for Maxim Mikityanskiy
nfc: change order inside nfc_se_io error path
ethernet: ice: avoid gcc-9 integer overflow warning
ice: don't ignore return codes in VSI related code
ice: Fix DSCP PFC TLV creation
net: usb: qmi_wwan: add Telit 0x1080 composition
net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
netfilter: conntrack: adopt safer max chain length
net: tls: fix device-offloaded sendpage straddling records
...
Pull HID fixes from Benjamin Tissoires:
- fix potential out of bound write of zeroes in HID core with a
specially crafted uhid device (Lee Jones)
- fix potential use-after-free in work function in intel-ish-hid (Reka
Norman)
- selftests config fixes (Benjamin Tissoires)
- few device small fixes and support
* tag 'for-linus-2023030901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: intel-ish-hid: ipc: Fix potential use-after-free in work function
HID: logitech-hidpp: Add support for Logitech MX Master 3S mouse
HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded
selftest: hid: fix hid_bpf not set in config
HID: uhid: Over-ride the default maximum data buffer value with our own
HID: core: Provide new max_buffer_size attribute to over-ride the default
Commit 62622dab0a ("ima: return IMA digest value only when IMA_COLLECTED
flag is set") caused bpf_ima_inode_hash() to refuse to give non-fresh
digests. IMA test #3 assumed the old behavior, that bpf_ima_inode_hash()
still returned also non-fresh digests.
Correct the test by accepting both cases. If the samples returned are 1,
assume that the commit above is applied and that the returned digest is
fresh. If the samples returned are 2, assume that the commit above is not
applied, and check both the non-fresh and fresh digest.
Fixes: 62622dab0a ("ima: return IMA digest value only when IMA_COLLECTED flag is set")
Reported-by: David Vernet <void@manifault.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/bpf/20230308103713.1681200-1-roberto.sassu@huaweicloud.com
This test runs on the client-router-server topo, and monitors the traffic
on the RX devices of router and server while sending BIG TCP packets with
netperf from client to server. Meanwhile, it changes 'tso' on the TX devs
and 'gro' on the RX devs. Then it checks if any BIG TCP packets appears
on the RX devs with 'ip/ip6tables -m length ! --length 0:65535' for each
case.
Note that we also add tc action ct in link1 ingress to cover the ipv6
jumbo packets process in nf_ct_skb_network_trim() of nf_conntrack_ovs.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-03-06
We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).
The main changes are:
1) Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and variable-sized
accesses, from Joanne Koong.
2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities,
from Andrii Nakryiko.
3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
programs to NULL-check before passing such pointers into kfunc,
from Alexei Starovoitov.
4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps, from Kumar Kartikeya Dwivedi.
5) Add BPF verifier support for ST instructions in convert_ctx_access()
which will help new -mcpu=v4 clang flag to start emitting them,
from Eduard Zingerman.
6) Make uprobe attachment Android APK aware by supporting attachment
to functions inside ELF objects contained in APKs via function names,
from Daniel Müller.
7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper
to start the timer with absolute expiration value instead of relative
one, from Tero Kristo.
8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
from Tejun Heo.
9) Extend libbpf to support users manually attaching kprobes/uprobes
in the legacy/perf/link mode, from Menglong Dong.
10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
from Jiaxun Yang.
11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
from Hengqi Chen.
12) Extend BPF instruction set doc with describing the encoding of BPF
instructions in terms of how bytes are stored under big/little endian,
from Jose E. Marchesi.
13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.
14) Fix bpf_xdp_query() backwards compatibility on old kernels,
from Yonghong Song.
15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
from Florent Revest.
16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
from Hou Tao.
17) Fix BPF verifier's check_subprogs to not unnecessarily mark
a subprogram with has_tail_call, from Ilya Leoshkevich.
18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
selftests/bpf: Split test_attach_probe into multi subtests
libbpf: Add support to set kprobe/uprobe attach mode
tools/resolve_btfids: Add /libsubcmd to .gitignore
bpf: add support for fixed-size memory pointer returns for kfuncs
bpf: generalize dynptr_get_spi to be usable for iters
bpf: mark PTR_TO_MEM as non-null register type
bpf: move kfunc_call_arg_meta higher in the file
bpf: ensure that r0 is marked scratched after any function call
bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
bpf: clean up visit_insn()'s instruction processing
selftests/bpf: adjust log_fixup's buffer size for proper truncation
bpf: honor env->test_state_freq flag in is_state_visited()
selftests/bpf: enhance align selftest's expected log matching
bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
bpf: improve stack slot state printing
selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
bpf: allow ctx writes using BPF_ST_MEM instruction
bpf: Use separate RCU callbacks for freeing selem
...
====================
Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2023-03-06
We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 9 files changed, 64 insertions(+), 18 deletions(-).
The main changes are:
1) Fix BTF resolver for DATASEC sections when a VAR points at a modifier,
that is, keep resolving such instances instead of bailing out,
from Lorenz Bauer.
2) Fix BPF test framework with regards to xdp_frame info misplacement
in the "live packet" code, from Alexander Lobakin.
3) Fix an infinite loop in BPF sockmap code for TCP/UDP/AF_UNIX,
from Liu Jian.
4) Fix a build error for riscv BPF JIT under PERF_EVENTS=n,
from Randy Dunlap.
5) Several BPF doc fixes with either broken links or external instead
of internal doc links, from Bagas Sanjaya.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: check that modifier resolves after pointer
btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
bpf, doc: Link to submitting-patches.rst for general patch submission info
bpf, doc: Do not link to docs.kernel.org for kselftest link
bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser()
riscv, bpf: Fix patch_text implicit declaration
bpf, docs: Fix link to BTF doc
====================
Link: https://lore.kernel.org/r/20230306215944.11981-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bring back the Python scripts that were initially added with
TEST_GEN_FILES but now with TEST_FILES to avoid having them deleted
when doing a clean. Also fix the way the architecture is being
determined as they should also be installed when ARCH=x86_64 is
provided explicitly. Then also append extra files to TEST_FILES and
TEST_PROGS with += so they don't get discarded.
Fixes: ba2d788aa8 ("selftests: amd-pstate: Trigger tbench benchmark and test cpus")
Fixes: a49fb7218e ("selftests: amd-pstate: Don't delete source files via Makefile")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
&xdp_buff and &xdp_frame are bound in a way that
xdp_buff->data_hard_start == xdp_frame
It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:
for (u32 i = 0; i < 0xdead; i++) {
xdpf = xdp_convert_buff_to_frame(&xdp);
xdp_convert_frame_to_buff(xdpf, &xdp);
}
shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.
Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.
(was found while testing XDP traffic generator on ice, which calls
xdp_convert_frame_to_buff() for each XDP frame)
Fixes: b530e9e106 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
In order to adapt to the older kernel, now we split the "attach_probe"
testing into multi subtests:
manual // manual attach tests for kprobe/uprobe
auto // auto-attach tests for kprobe and uprobe
kprobe-sleepable // kprobe sleepable test
uprobe-lib // uprobe tests for library function by name
uprobe-sleepable // uprobe sleepable test
uprobe-ref_ctr // uprobe ref_ctr test
As sleepable kprobe needs to set BPF_F_SLEEPABLE flag before loading,
we need to move it to a stand alone skel file, in case of it is not
supported by kernel and make the whole loading fail.
Therefore, we can only enable part of the subtests for older kernel.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230306064833.7932-3-imagedong@tencent.com
Adjust log_fixup's expected buffer length to fix the test. It's pretty
finicky in its length expectation, but it doesn't break often. So just
adjust the length to work on current kernel and with follow up iterator
changes as well.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allow to search for expected register state in all the verifier log
output that's related to specified instruction number.
See added comment for an example of possible situation that is happening
due to a simple enhancement done in the next patch, which fixes handling
of env->test_state_freq flag in state checkpointing logic.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Function verifier.c:convert_ctx_access() applies some rewrites to BPF
instructions that read or write BPF program context. This commit adds
machinery to allow test cases that inspect BPF program after these
rewrites are applied.
An example of a test case:
{
// Shorthand for field offset and size specification
N(CGROUP_SOCKOPT, struct bpf_sockopt, retval),
// Pattern generated for field read
.read = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::current_task);"
"$dst = *(u64 *)($dst + task_struct::bpf_ctx);"
"$dst = *(u32 *)($dst + bpf_cg_run_ctx::retval);",
// Pattern generated for field write
.write = "*(u64 *)($ctx + bpf_sockopt_kern::tmp_reg) = r9;"
"r9 = *(u64 *)($ctx + bpf_sockopt_kern::current_task);"
"r9 = *(u64 *)(r9 + task_struct::bpf_ctx);"
"*(u32 *)(r9 + bpf_cg_run_ctx::retval) = $src;"
"r9 = *(u64 *)($ctx + bpf_sockopt_kern::tmp_reg);" ,
},
For each test case, up to three programs are created:
- One that uses BPF_LDX_MEM to read the context field.
- One that uses BPF_STX_MEM to write to the context field.
- One that uses BPF_ST_MEM to write to the context field.
The disassembly of each program is compared with the pattern specified
in the test case.
Kernel code for disassembly is reused (as is in the bpftool).
To keep Makefile changes to the minimum, symbolic links to
`kernel/bpf/disasm.c` and `kernel/bpf/disasm.h ` are added.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Lift verifier restriction to use BPF_ST_MEM instructions to write to
context data structures. This requires the following changes:
- verifier.c:do_check() for BPF_ST updated to:
- no longer forbid writes to registers of type PTR_TO_CTX;
- track dst_reg type in the env->insn_aux_data[...].ptr_type field
(same way it is done for BPF_STX and BPF_LDX instructions).
- verifier.c:convert_ctx_access() and various callbacks invoked by
it are updated to handled BPF_ST instruction alongside BPF_STX.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack
of such key mechanism makes it impossible for sleepable bpf programs to use RCU
pointers.
Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't
support btf_type_tag yet) and allowlist certain field dereferences in important
data structures like tast_struct, cgroup, socket that are used by sleepable
programs either as RCU pointer or full trusted pointer (which is valid outside
of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such
tagging. They will be removed once GCC supports btf_type_tag.
With that refactor check_ptr_to_btf_access(). Make it strict in enforcing
PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without
modifier flags. There is a chance that this strict enforcement might break
existing programs (especially on GCC compiled kernels), but this cleanup has to
start sooner than later. Note PTR_TO_CTX access still yields old deprecated
PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the
kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will
remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0.
Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
Derefrence of other kptr-s returns PTR_UNTRUSTED.
For example:
struct map_value {
struct cgroup __kptr *cgrp;
};
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
{
struct cgroup *cg, *cg2;
cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
bpf_kptr_xchg(&v->cgrp, cg);
cg2 = v->cgrp; // This is new feature introduced by this patch.
// cg2 is PTR_MAYBE_NULL | MEM_RCU.
// When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero
if (cg2)
bpf_cgroup_ancestor(cg2, level); // safe to do.
}
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
__kptr meant to store PTR_UNTRUSTED kernel pointers inside bpf maps.
The concept felt useful, but didn't get much traction,
since bpf_rdonly_cast() was added soon after and bpf programs received
a simpler way to access PTR_UNTRUSTED kernel pointers
without going through restrictive __kptr usage.
Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate
its intended usage.
The main goal of __kptr_untrusted was to read/write such pointers
directly while bpf_kptr_xchg was a mechanism to access refcnted
kernel pointers. The next patch will allow RCU protected __kptr access
with direct read. At that point __kptr_untrusted will be deprecated.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com