dac6c7b3d33756d6ce09f00a96ea2ecd79fae9fb
78472 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f61060fb29 |
Merge tag 'for-net-2024-10-04' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- RFCOMM: FIX possible deadlock in rfcomm_sk_state_change
- hci_conn: Fix UAF in hci_enhanced_setup_sync
- btusb: Don't fail external suspend requests
* tag 'for-net-2024-10-04' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btusb: Don't fail external suspend requests
Bluetooth: hci_conn: Fix UAF in hci_enhanced_setup_sync
Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change
====================
Link: https://patch.msgid.link/20241004210124.4010321-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
6310831433 |
net: explicitly clear the sk pointer, when pf->create fails
We have recently noticed the exact same KASAN splat as in commit |
||
|
|
1dae9f1187 |
net: Fix an unsafe loop on the list
The kernel may crash when deleting a genetlink family if there are still
listeners for that family:
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0
LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0
Call Trace:
__netlink_clear_multicast_users+0x74/0xc0
genl_unregister_family+0xd4/0x2d0
Change the unsafe loop on the list to a safe one, because inside the
loop there is an element removal from this list.
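The difference between an unsafe and a safe removal loop can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct node`, `push()`, `remove_matching()` and `list_length()` are hypothetical names invented for the example, and the singly linked layout stands in for the kernel's own list types and iteration macros.

```c
#include <stdlib.h>

/* Hypothetical node type for illustration; not a kernel structure. */
struct node {
	int value;
	struct node *next;
};

/* Prepend a node; returns the new head (or the old head if malloc fails). */
struct node *push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->value = value;
	n->next = head;
	return n;
}

/*
 * Remove every node whose value equals `value` and return the new head.
 * The successor is saved in `next` before the current node may be freed,
 * so the loop never reads freed memory. An unsafe loop would instead
 * advance with cur = cur->next after cur has already been released.
 */
struct node *remove_matching(struct node *head, int value, int *removed)
{
	struct node **link = &head;
	struct node *cur, *next;

	*removed = 0;
	for (cur = head; cur; cur = next) {
		next = cur->next;		/* fetch the successor first */
		if (cur->value == value) {
			*link = next;		/* unlink from the list */
			free(cur);
			(*removed)++;
		} else {
			link = &cur->next;
		}
	}
	return head;
}

/* Count the nodes that remain on the list. */
int list_length(const struct node *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Saving the successor up front is the whole point of the "safe" variant: removal inside the loop body no longer affects how the loop advances.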
Fixes:
|
||
|
|
18fd04ad85 |
Bluetooth: hci_conn: Fix UAF in hci_enhanced_setup_sync
This checks if the ACL connection remains valid as it could be destroyed
while hci_enhanced_setup_sync is pending on cmd_sync leading to the
following trace:
BUG: KASAN: slab-use-after-free in hci_enhanced_setup_sync+0x91b/0xa60
Read of size 1 at addr ffff888002328ffd by task kworker/u5:2/37
CPU: 0 UID: 0 PID: 37 Comm: kworker/u5:2 Not tainted 6.11.0-rc6-01300-g810be445d8d6 #7099
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
? hci_enhanced_setup_sync+0x91b/0xa60
print_report+0x152/0x4c0
? hci_enhanced_setup_sync+0x91b/0xa60
? __virt_addr_valid+0x1fa/0x420
? hci_enhanced_setup_sync+0x91b/0xa60
kasan_report+0xda/0x1b0
? hci_enhanced_setup_sync+0x91b/0xa60
hci_enhanced_setup_sync+0x91b/0xa60
? __pfx_hci_enhanced_setup_sync+0x10/0x10
? __pfx___mutex_lock+0x10/0x10
hci_cmd_sync_work+0x1c2/0x330
process_one_work+0x7d9/0x1360
? __pfx_lock_acquire+0x10/0x10
? __pfx_process_one_work+0x10/0x10
? assign_work+0x167/0x240
worker_thread+0x5b7/0xf60
? __kthread_parkme+0xac/0x1c0
? __pfx_worker_thread+0x10/0x10
? __pfx_worker_thread+0x10/0x10
kthread+0x293/0x360
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 34:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__hci_conn_add+0x187/0x17d0
hci_connect_sco+0x2e1/0xb90
sco_sock_connect+0x2a2/0xb80
__sys_connect+0x227/0x2a0
__x64_sys_connect+0x6d/0xb0
do_syscall_64+0x71/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 37:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x101/0x160
kfree+0xd0/0x250
device_release+0x9a/0x210
kobject_put+0x151/0x280
hci_conn_del+0x448/0xbf0
hci_abort_conn_sync+0x46f/0x980
hci_cmd_sync_work+0x1c2/0x330
process_one_work+0x7d9/0x1360
worker_thread+0x5b7/0xf60
kthread+0x293/0x360
ret_from_fork+0x2f/0x70
ret_from_fork_asm+0x1a/0x30
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
08d1914293 |
Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change
rfcomm_sk_state_change attempts to take sock_lock, so it must never be
called with that lock held, but rfcomm_sock_ioctl always attempts to
lock it, causing the following trace:
======================================================
WARNING: possible circular locking dependency detected
6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted
------------------------------------------------------
syz-executor386/5093 is trying to acquire lock:
ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1671 [inline]
ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: rfcomm_sk_state_change+0x5b/0x310 net/bluetooth/rfcomm/sock.c:73
but task is already holding lock:
ffff88807badfd28 (&d->lock){+.+.}-{3:3}, at: __rfcomm_dlc_close+0x226/0x6a0 net/bluetooth/rfcomm/core.c:491
Reported-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com
Tested-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218
Fixes:
|
||
|
|
f9ff7665cd |
netfilter: br_netfilter: fix panic with metadata_dst skb
Fix a kernel panic in the br_netfilter module when sending untagged
traffic via a VxLAN device.
This happens during the check for fragmentation in br_nf_dev_queue_xmit.
It is dependent on:
1) the br_netfilter module being loaded;
2) net.bridge.bridge-nf-call-iptables set to 1;
3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
When forwarding the untagged packet to the VxLAN bridge port, before
the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.
Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
for frames that need to be fragmented: frames larger than the MTU of the
VxLAN device end up calling br_nf_ip_fragment, which in turn calls
ip_skb_dst_mtu.
The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst
with valid dst->dev, thus the crash.
This case was never supported in the first place, so drop the packet
instead.
PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
[ 176.291791] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000110
[ 176.292101] Mem abort info:
[ 176.292184] ESR = 0x0000000096000004
[ 176.292322] EC = 0x25: DABT (current EL), IL = 32 bits
[ 176.292530] SET = 0, FnV = 0
[ 176.292709] EA = 0, S1PTW = 0
[ 176.292862] FSC = 0x04: level 0 translation fault
[ 176.293013] Data abort info:
[ 176.293104] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 176.293488] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 176.293787] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
[ 176.294166] [0000000000000110] pgd=0000000000000000,
p4d=0000000000000000
[ 176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
br_netfilter bridge stp llc ipv6 crct10dif_ce
[ 176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
6.8.0-rc3-g5b3fbd61b9d1 #2
[ 176.296314] Hardware name: linux,dummy-virt (DT)
[ 176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[ 176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[ 176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
[ 176.297636] sp : ffff800080003630
[ 176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
ffff6828c49ad9f8
[ 176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
00000000000003e8
[ 176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
ffff6828c3b16d28
[ 176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
0000000000000014
[ 176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
0000000095744632
[ 176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
ffffb7e137926a70
[ 176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
0000000000000000
[ 176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
f20e0100bebafeca
[ 176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
0000000000000000
[ 176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
ffff6828c7f918f0
[ 176.300889] Call trace:
[ 176.301123] br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[ 176.301411] br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
[ 176.301703] nf_hook_slow+0x48/0x124
[ 176.302060] br_forward_finish+0xc8/0xe8 [bridge]
[ 176.302371] br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[ 176.302605] br_nf_forward_finish+0x118/0x22c [br_netfilter]
[ 176.302824] br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
[ 176.303136] br_nf_forward+0x2b8/0x4e0 [br_netfilter]
[ 176.303359] nf_hook_slow+0x48/0x124
[ 176.303803] __br_forward+0xc4/0x194 [bridge]
[ 176.304013] br_flood+0xd4/0x168 [bridge]
[ 176.304300] br_handle_frame_finish+0x1d4/0x5c4 [bridge]
[ 176.304536] br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[ 176.304978] br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
[ 176.305188] br_nf_pre_routing+0x250/0x524 [br_netfilter]
[ 176.305428] br_handle_frame+0x244/0x3cc [bridge]
[ 176.305695] __netif_receive_skb_core.constprop.0+0x33c/0xecc
[ 176.306080] __netif_receive_skb_one_core+0x40/0x8c
[ 176.306197] __netif_receive_skb+0x18/0x64
[ 176.306369] process_backlog+0x80/0x124
[ 176.306540] __napi_poll+0x38/0x17c
[ 176.306636] net_rx_action+0x124/0x26c
[ 176.306758] __do_softirq+0x100/0x26c
[ 176.307051] ____do_softirq+0x10/0x1c
[ 176.307162] call_on_irq_stack+0x24/0x4c
[ 176.307289] do_softirq_own_stack+0x1c/0x2c
[ 176.307396] do_softirq+0x54/0x6c
[ 176.307485] __local_bh_enable_ip+0x8c/0x98
[ 176.307637] __dev_queue_xmit+0x22c/0xd28
[ 176.307775] neigh_resolve_output+0xf4/0x1a0
[ 176.308018] ip_finish_output2+0x1c8/0x628
[ 176.308137] ip_do_fragment+0x5b4/0x658
[ 176.308279] ip_fragment.constprop.0+0x48/0xec
[ 176.308420] __ip_finish_output+0xa4/0x254
[ 176.308593] ip_finish_output+0x34/0x130
[ 176.308814] ip_output+0x6c/0x108
[ 176.308929] ip_send_skb+0x50/0xf0
[ 176.309095] ip_push_pending_frames+0x30/0x54
[ 176.309254] raw_sendmsg+0x758/0xaec
[ 176.309568] inet_sendmsg+0x44/0x70
[ 176.309667] __sys_sendto+0x110/0x178
[ 176.309758] __arm64_sys_sendto+0x28/0x38
[ 176.309918] invoke_syscall+0x48/0x110
[ 176.310211] el0_svc_common.constprop.0+0x40/0xe0
[ 176.310353] do_el0_svc+0x1c/0x28
[ 176.310434] el0_svc+0x34/0xb4
[ 176.310551] el0t_64_sync_handler+0x120/0x12c
[ 176.310690] el0t_64_sync+0x190/0x194
[ 176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
[ 176.315743] ---[ end trace 0000000000000000 ]---
[ 176.316060] Kernel panic - not syncing: Oops: Fatal exception in
interrupt
[ 176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
[ 176.316564] PHYS_OFFSET: 0xffff97d780000000
[ 176.316782] CPU features: 0x0,88000203,3c020000,0100421b
[ 176.317210] Memory Limit: none
[ 176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
Exception in interrupt ]---
Fixes:
|
||
|
|
7a310f8d7d |
rxrpc: Fix uninitialised variable in rxrpc_send_data()
Fix the uninitialised txb variable in rxrpc_send_data() by moving the code
that loads it above all the jumps to maybe_error, txb being stored back
into call->tx_pending right before the normal return.
Fixes:
|
||
|
|
bc21246532 |
rxrpc: Fix a race between socket set up and I/O thread creation
In rxrpc_open_socket(), it sets up the socket and then sets up the I/O
thread that will handle it. This is a problem, however, as there's a gap
between the two phases in which a packet may come into rxrpc_encap_rcv()
from the UDP packet but we oops when trying to wake the not-yet created I/O
thread.
As a quick fix, just make rxrpc_encap_rcv() discard the packet if there's
no I/O thread yet.
A better, but more intrusive fix would perhaps be to rearrange things such
that the socket creation is done by the I/O thread.
Fixes:
|
||
|
|
27c80efcc2 |
tcp: fix TFO SYN_RECV to not zero retrans_stamp with retransmits out
Fix tcp_rcv_synrecv_state_fastopen() to not zero retrans_stamp if retransmits are outstanding. tcp_fastopen_synack_timer() sets retrans_stamp, so typically we'll need to zero retrans_stamp here to prevent spurious retransmits_timed_out(). The logic to zero retrans_stamp is from this 2019 commit: commit |
||
|
|
b41b4cbd96 |
tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe
Fix tcp_enter_recovery() so that if there are no retransmits out then
we zero retrans_stamp when entering fast recovery. This is necessary
to fix two buggy behaviors.
Currently a non-zero retrans_stamp value can persist across multiple
back-to-back loss recovery episodes. This is because we generally only
clear retrans_stamp if we are completely done with loss recoveries,
and get to tcp_try_to_open() and find !tcp_any_retrans_done(sk). This
behavior causes two bugs:
(1) When a loss recovery episode (CA_Loss or CA_Recovery) is followed
immediately by a new CA_Recovery, the retrans_stamp value can persist
and can be a time before this new CA_Recovery episode starts. That
means that timestamp-based undo will be using the wrong retrans_stamp
(a value that is too old) when comparing incoming TS ecr values to
retrans_stamp to see if the current fast recovery episode can be
undone.
(2) If there is a roughly minutes-long sequence of back-to-back fast
recovery episodes, one after another (e.g. in a shallow-buffered or
policed bottleneck), where each fast recovery successfully makes
forward progress and recovers one window of sequence space (but leaves
at least one retransmit in flight at the end of the recovery),
followed by several RTOs, then the ETIMEDOUT check may be using the
wrong retrans_stamp (a value set at the start of the first fast
recovery in the sequence). This can cause a very premature ETIMEDOUT,
killing the connection prematurely.
This commit changes the code to zero retrans_stamp when entering fast
recovery, when this is known to be safe (no retransmits are out in the
network). That ensures that when starting a fast recovery episode, and
it is safe to do so, retrans_stamp is set when we send the fast
retransmit packet. That addresses both bug (1) and bug (2) by ensuring
that (if no retransmits are out when we start a fast recovery) we use
the initial fast retransmit of this fast recovery as the time value
for undo and ETIMEDOUT calculations.
This makes intuitive sense, since the start of a new fast recovery
episode (in a scenario where no lost packets are out in the network)
means that the connection has made forward progress since the last RTO
or fast recovery, and we should thus "restart the clock" used for both
undo and ETIMEDOUT logic.
Note that if there *are* retransmits out in the network when we start
fast recovery, there can still be undesirable (1)/(2) issues. For
example, after this patch we can still have the (1) and (2) problems
in cases like this:
+ round 1: sender sends flight 1
+ round 2: sender receives SACKs and enters fast recovery 1,
retransmits some packets in flight 1 and then sends some new data as
flight 2
+ round 3: sender receives some SACKs for flight 2, notes losses, and
retransmits some packets to fill the holes in flight 2
+ fast recovery has some lost retransmits in flight 1 and continues
for one or more rounds sending retransmits for flight 1 and flight 2
+ fast recovery 1 completes when snd_una reaches high_seq at end of
flight 1
+ there are still holes in the SACK scoreboard in flight 2, so we
enter fast recovery 2, but some retransmits in the flight 2 sequence
range are still in flight (retrans_out > 0), so we can't execute the
new retrans_stamp=0 added here to clear retrans_stamp
It's not yet clear how to fix these remaining (1)/(2) issues in an
efficient way without breaking undo behavior, given that retrans_stamp
is currently used for undo and ETIMEDOUT. Perhaps the optimal (but
expensive) strategy would be to set retrans_stamp to the timestamp of
the earliest outstanding retransmit when entering fast recovery. But
at least this commit makes things better.
Note that this does not change the semantics of retrans_stamp; it
simply makes retrans_stamp accurate in some cases where it was not
before:
(1) Some loss recovery, followed by an immediate entry into a fast
recovery, where there are no retransmits out when entering the fast
recovery.
(2) When a TFO server has a SYNACK retransmit that sets retrans_stamp,
and then the ACK that completes the 3-way handshake has SACK blocks
that trigger a fast recovery. In this case when entering fast recovery
we want to zero out the retrans_stamp from the TFO SYNACK retransmit,
and set the retrans_stamp based on the timestamp of the fast recovery.
We introduce a tcp_retrans_stamp_cleanup() helper, because this
two-line sequence already appears in 3 places and is about to appear
in 2 more as a result of this bug fix patch series. Once this bug fix
patch series in the net branch makes it into the net-next branch
we'll update the 3 other call sites to use the new helper.
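The effect of the cleanup on entering fast recovery can be modelled in a few lines of userspace C. This is a sketch under loud assumptions, not the kernel implementation: `struct tcp_model` and all function names here are hypothetical, and the "any retransmits done" test is reduced to `retrans_out > 0`, which is a simplification of whatever the real check covers.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two fields discussed in the message. */
struct tcp_model {
	uint32_t retrans_stamp;	/* time of first retransmit in this episode, 0 = unset */
	uint32_t retrans_out;	/* retransmits currently in the network */
};

/* Simplified assumption: only retrans_out is consulted. */
static int any_retrans_outstanding(const struct tcp_model *tp)
{
	return tp->retrans_out > 0;
}

/* The two-line sequence: clear the stamp only when that is known to be safe. */
void retrans_stamp_cleanup(struct tcp_model *tp)
{
	if (!any_retrans_outstanding(tp))
		tp->retrans_stamp = 0;
}

/* Entering fast recovery "restarts the clock" when nothing is in flight. */
void enter_recovery(struct tcp_model *tp)
{
	retrans_stamp_cleanup(tp);
}

/* A retransmit sets the stamp only if it is currently unset. */
void send_retransmit(struct tcp_model *tp, uint32_t now)
{
	if (!tp->retrans_stamp)
		tp->retrans_stamp = now;
	tp->retrans_out++;
}
```

In this model, a stale stamp of 100 with no retransmits out is replaced by the time of the new fast retransmit, while a stamp of 100 with one retransmit still out is left alone, which corresponds to the remaining (1)/(2) cases the message describes.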
This is a long-standing issue. The Fixes tag below is chosen to be the
oldest commit at which the patch will apply cleanly, which is from
Linux v3.5 in 2012.
Fixes:
|
||
|
|
e37ab73736 |
tcp: fix to allow timestamp undo if no retransmits were sent
Fix the TCP loss recovery undo logic in tcp_packet_delayed() so that it can trigger undo even if TSQ prevents a fast recovery episode from reaching tcp_retransmit_skb(). Geumhwan Yu <geumhwan.yu@samsung.com> recently reported that after this commit from 2019: commit |
||
|
|
8c245fe7dd |
Merge tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from ieee802154, bluetooth and netfilter.
Current release - regressions:
- eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc
- eth: am65-cpsw: fix forever loop in cleanup code
Current release - new code bugs:
- eth: mlx5: HWS, fixed double-free in error flow of creating SQ
Previous releases - regressions:
- core: avoid potential underflow in qdisc_pkt_len_init() with UFO
- core: test for not too small csum_start in virtio_net_hdr_to_skb()
- vrf: revert "vrf: remove unnecessary RCU-bh critical section"
- bluetooth:
- fix uaf in l2cap_connect
- fix possible crash on mgmt_index_removed
- dsa: improve shutdown sequence
- eth: mlx5e: SHAMPO, fix overflow of hd_per_wq
- eth: ip_gre: fix drops of small packets in ipgre_xmit
Previous releases - always broken:
- core: fix gso_features_check to check for both
dev->gso_{ipv4_,}max_size
- core: fix tcp fraglist segmentation after pull from frag_list
- netfilter: nf_tables: prevent nf_skb_duplicated corruption
- sctp: set sk_state back to CLOSED if autobind fails in
sctp_listen_start
- mac802154: fix potential RCU dereference issue in
mac802154_scan_worker
- eth: fec: restart PPS after link state change"
* tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits)
sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start
dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems
doc: net: napi: Update documentation for napi_schedule_irqoff
net/ncsi: Disable the ncsi work before freeing the associated structure
net: phy: qt2025: Fix warning: unused import DeviceId
gso: fix udp gso fraglist segmentation after pull from frag_list
bridge: mcast: Fail MDB get request on empty entry
vrf: revert "vrf: Remove unnecessary RCU-bh critical section"
net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code
net: phy: realtek: Check the index value in led_hw_control_get
ppp: do not assume bh is held in ppp_channel_bridge_input()
selftests: rds: move include.sh to TEST_FILES
net: test for not too small csum_start in virtio_net_hdr_to_skb()
net: gso: fix tcp fraglist segmentation after pull from frag_list
ipv4: ip_gre: Fix drops of small packets in ipgre_xmit
net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check
net: add more sanity checks to qdisc_pkt_len_init()
net: avoid potential underflow in qdisc_pkt_len_init() with UFO
net: ethernet: ti: cpsw_ale: Fix warning on some platforms
net: microchip: Make FDMA config symbol invisible
...
|
||
|
|
8beee4d8de |
sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start
In sctp_listen_start() invoked by sctp_inet_listen(), it should set the
sk_state back to CLOSED if sctp_autobind() fails due to whatever reason.
Otherwise, next time when calling sctp_inet_listen(), if sctp_sk(sk)->reuse
is already set via setsockopt(SCTP_REUSE_PORT), sctp_sk(sk)->bind_hash will
be dereferenced as sk_state is LISTENING, which causes a crash as bind_hash
is NULL.
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:sctp_inet_listen+0x7f0/0xa20 net/sctp/socket.c:8617
Call Trace:
<TASK>
__sys_listen_socket net/socket.c:1883 [inline]
__sys_listen+0x1b7/0x230 net/socket.c:1894
__do_sys_listen net/socket.c:1902 [inline]
Fixes:
|
||
|
|
1127c73a8d |
Merge tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix incorrect documentation in uapi/linux/netfilter/nf_tables.h regarding flowtable hooks, from Phil Sutter.
2) Fix nft_audit.sh selftests with newer nft binaries, due to different (valid) audit output, also from Phil.
3) Disable BH when duplicating packets via nf_dup infrastructure, otherwise race on nf_skb_duplicated for locally generated traffic. From Eric.
4) Missing return in callback of selftest C program, from zhang jiao.
netfilter pull request 24-10-02
* tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: Add missing return value
netfilter: nf_tables: prevent nf_skb_duplicated corruption
selftests: netfilter: Fix nft_audit.sh for newer nft binaries
netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED
====================
Link: https://patch.msgid.link/20241002202421.1281311-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
a0ffa68c70 |
net/ncsi: Disable the ncsi work before freeing the associated structure
The work function can run after the ncsi device is freed, resulting
in use-after-free bugs or kernel panic.
Fixes:
|
||
|
|
a1e40ac5b5 |
gso: fix udp gso fraglist segmentation after pull from frag_list
Detect gso fraglist skbs with corrupted geometry (see below) and
pass these to skb_segment instead of skb_segment_list, as the first
can segment them correctly.
Valid SKB_GSO_FRAGLIST skbs
- consist of two or more segments
- the head_skb holds the protocol headers plus first gso_size
- one or more frag_list skbs hold exactly one segment
- all but the last must be gso_size
Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
modify these skbs, breaking these invariants.
In extreme cases they pull all data into skb linear. For UDP, this
causes a NULL ptr deref in __udpv4_gso_segment_list_csum at
udp_hdr(seg->next)->dest.
Detect invalid geometry due to pull, by checking head_skb size.
Don't just drop, as this may blackhole a destination. Convert to be
able to pass to regular skb_segment.
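The invariants listed above can be written down as a small checker. This is a userspace sketch only: `struct fraglist_pkt` and `fraglist_geometry_ok()` are hypothetical names, the packet is described by plain lengths rather than real skbs, and the sketch checks every listed invariant for illustration, whereas the message says the fix itself detects the problem from the head_skb size.

```c
#include <stddef.h>

/* Hypothetical flat description of a fraglist GSO packet. */
struct fraglist_pkt {
	size_t head_payload;	/* payload bytes in the head, after protocol headers */
	size_t gso_size;	/* nominal segment size */
	const size_t *frag_len;	/* payload length of each frag_list entry */
	size_t nr_frags;	/* number of frag_list entries */
};

/* Returns 1 when the geometry matches the invariants, 0 otherwise. */
int fraglist_geometry_ok(const struct fraglist_pkt *p)
{
	size_t i, last;

	/* Two or more segments: the head plus at least one frag_list entry. */
	if (p->gso_size == 0 || p->nr_frags == 0)
		return 0;

	/* The head holds exactly the first gso_size bytes of payload. */
	if (p->head_payload != p->gso_size)
		return 0;

	/* All entries but the last must be exactly gso_size. */
	for (i = 0; i + 1 < p->nr_frags; i++)
		if (p->frag_len[i] != p->gso_size)
			return 0;

	/* The last entry may be shorter, but not empty or oversized. */
	last = p->frag_len[p->nr_frags - 1];
	return last > 0 && last <= p->gso_size;
}
```

A packet whose data was pulled entirely into the linear area shows up here as a head larger than gso_size with no frag_list entries, so the checker rejects it and the caller would take the regular segmentation path instead of dropping the packet.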
Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/
Fixes:
|
||
|
|
555f45d24b |
bridge: mcast: Fail MDB get request on empty entry
When user space deletes a port from an MDB entry, the port is removed
synchronously. If this was the last port in the entry and the entry is
not joined by the host itself, then the entry is scheduled for deletion
via a timer.
The above means that it is possible for the MDB get netlink request to
retrieve an empty entry which is scheduled for deletion. This is
problematic as after deleting the last port in an entry, user space
cannot rely on a non-zero return code from the MDB get request as an
indication that the port was successfully removed.
Fix by returning an error when the entry's port list is empty and the
entry is not joined by the host.
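The rule described in the fix can be expressed as a tiny decision function. This is a hedged model, not the bridge code: `struct mdb_entry_model` and `mdb_get_model()` are hypothetical, and the choice of `-ENOENT` as the error is an assumption, since the message only says that an error is returned.

```c
#include <errno.h>

/* Hypothetical summary of an MDB entry's state. */
struct mdb_entry_model {
	int nr_ports;		/* ports currently in the entry's port list */
	int host_joined;	/* non-zero if the host itself joined the group */
};

/*
 * Returns 0 when the entry should be reported to user space, or a
 * negative errno when it is missing or is only an empty entry that is
 * waiting for its deletion timer.
 */
int mdb_get_model(const struct mdb_entry_model *e)
{
	if (!e)
		return -ENOENT;		/* no entry at all */
	if (e->nr_ports == 0 && !e->host_joined)
		return -ENOENT;		/* empty entry pending deletion (assumed errno) */
	return 0;
}
```

With this rule, a get request issued right after the last port is deleted fails in the same way as a request for an entry that never existed, so user space can use the return code as confirmation of the removal.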
Fixes:
|
||
|
|
17bd3bd82f |
net: gso: fix tcp fraglist segmentation after pull from frag_list
Detect tcp gso fraglist skbs with corrupted geometry (see below) and
pass these to skb_segment instead of skb_segment_list, as the first
can segment them correctly.
Valid SKB_GSO_FRAGLIST skbs
- consist of two or more segments
- the head_skb holds the protocol headers plus first gso_size
- one or more frag_list skbs hold exactly one segment
- all but the last must be gso_size
Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
modify these skbs, breaking these invariants.
In extreme cases they pull all data into skb linear. For TCP, this
causes a NULL ptr deref in __tcpv4_gso_segment_list_csum at
tcp_hdr(seg->next).
Detect invalid geometry due to pull, by checking head_skb size.
Don't just drop, as this may blackhole a destination. Convert to be
able to pass to regular skb_segment.
Approach and description based on a patch by Willem de Bruijn.
Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/
Link: https://lore.kernel.org/netdev/20240922150450.3873767-1-willemdebruijn.kernel@gmail.com/
Fixes:
|
||
|
|
e5e3f369b1 |
Merge tag 'for-net-2024-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btmrvl: Use IRQF_NO_AUTOEN flag in request_irq()
- MGMT: Fix possible crash on mgmt_index_removed
- L2CAP: Fix uaf in l2cap_connect
- Bluetooth: hci_event: Align BR/EDR JUST_WORKS paring with LE
* tag 'for-net-2024-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_event: Align BR/EDR JUST_WORKS paring with LE
Bluetooth: btmrvl: Use IRQF_NO_AUTOEN flag in request_irq()
Bluetooth: L2CAP: Fix uaf in l2cap_connect
Bluetooth: MGMT: Fix possible crash on mgmt_index_removed
====================
Link: https://patch.msgid.link/20240927145730.2452175-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
cb3ad11342 |
Merge tag 'ieee802154-for-net-2024-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2024-09-27
Jinjie Ruan added the use of IRQF_NO_AUTOEN in the mcr20a driver and fixed an additional build dependency problem while doing so.
Jiawei Ye ensured correct RCU handling in mac802154_scan_worker.
* tag 'ieee802154-for-net-2024-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
net: ieee802154: mcr20a: Use IRQF_NO_AUTOEN flag in request_irq()
mac802154: Fix potential RCU dereference issue in mac802154_scan_worker
ieee802154: Fix build error
====================
Link: https://patch.msgid.link/20240927094351.3865511-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
5f60d5f6bb |
move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
    sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
    sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h |
||
|
|
c4a14f6d9d |
ipv4: ip_gre: Fix drops of small packets in ipgre_xmit
Regression Description:
Depending on the options specified for the GRE tunnel device, small
packets may be dropped. This occurs because the pskb_network_may_pull
function fails due to the packet's insufficient length.
For example, if only the okey option is specified for the tunnel device,
original (before encapsulation) packets smaller than 28 bytes (including
the IPv4 header) will be dropped. This happens because the required
length is calculated relative to the network header, not the skb->head.
Here is how the required length is computed and checked:
* The pull_len variable is set to 28 bytes, consisting of:
* IPv4 header: 20 bytes
* GRE header with Key field: 8 bytes
* The pskb_network_may_pull function adds the network offset, shifting
the checkable space further to the beginning of the network header and
extending it to the beginning of the packet. As a result, the end of
the checkable space occurs beyond the actual end of the packet.
Instead of ensuring that 28 bytes are present in skb->head, the function
is requesting these 28 bytes starting from the network header. For small
packets, this requested length exceeds the actual packet size, causing
the check to fail and the packets to be dropped.
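The two checks can be compared with a short userspace model. Everything here is an assumption made for illustration: `struct pkt`, `may_pull()` and `network_may_pull()` are hypothetical stand-ins that only compare lengths, and the example assumes the 28-byte tunnel header has already been placed in front of the original packet, so that the network header sits 28 bytes into the buffer.

```c
#include <stddef.h>

/* Hypothetical packet summary: total length and network header offset. */
struct pkt {
	size_t len;		/* bytes available from the start of the data */
	size_t network_offset;	/* offset of the network header from that start */
};

/* Model of a check that `n` bytes exist from the start of the data. */
int may_pull(const struct pkt *p, size_t n)
{
	return n <= p->len;
}

/* Model of a check that `n` bytes exist from the network header onwards. */
int network_may_pull(const struct pkt *p, size_t n)
{
	return may_pull(p, p->network_offset + n);
}
```

Under these assumptions, a 24-byte original packet behind a 28-byte tunnel header gives `len = 52` and `network_offset = 28`. Asking for 28 bytes from the start succeeds, while asking for 28 bytes from the network header needs 56 bytes and fails, which matches the drop of original packets shorter than 28 bytes.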
This issue affects both locally originated and forwarded packets in
DMVPN-like setups.
How to reproduce (for locally originated packets):
ip link add dev gre1 type gre ikey 1.9.8.4 okey 1.9.8.4 \
local <your-ip> remote 0.0.0.0
ip link set mtu 1400 dev gre1
ip link set up dev gre1
ip address add 192.168.13.1/24 dev gre1
ip neighbor add 192.168.13.2 lladdr <remote-ip> dev gre1
ping -s 1374 -c 10 192.168.13.2
tcpdump -vni gre1
tcpdump -vni <your-ext-iface> 'ip proto 47'
ip -s -s -d link show dev gre1
Solution:
Use the pskb_may_pull function instead of pskb_network_may_pull.
Fixes:
|
||
|
|
ab9a9a9e96 |
net: add more sanity checks to qdisc_pkt_len_init()
One path takes care of SKB_GSO_DODGY, assuming
skb->len is bigger than hdr_len.
virtio_net_hdr_to_skb() does not fully dissect TCP headers,
it only makes sure it is at least 20 bytes.
It is possible for a user to provide a malicious 'GSO' packet,
total length of 80 bytes.
- 20 bytes of IPv4 header
- 60 bytes TCP header
- a small gso_size like 8
virtio_net_hdr_to_skb() would declare this packet as a normal
GSO packet, because it would see 40 bytes of payload,
bigger than gso_size.
We need to detect this case so as not to underflow
qdisc_skb_cb(skb)->pkt_len.
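The arithmetic of the malicious packet can be checked with a small userspace sketch. The names below are hypothetical and this is not the kernel's qdisc_pkt_len_init(); it assumes the shallow check counts a fixed 20-byte TCP header, while the careful path uses the real 60-byte header and refuses to subtract when the headers do not fit.

```c
#include <stdint.h>

#define IPV4_HDR_LEN	20u
#define TCP_MIN_HDR_LEN	20u	/* what a shallow check assumes */

/* Payload as seen by a check that only assumes a minimal TCP header. */
uint32_t apparent_payload(uint32_t total_len)
{
	return total_len - IPV4_HDR_LEN - TCP_MIN_HDR_LEN;
}

/*
 * Payload computed from the real TCP header length.
 * Sets *ok to 0 and returns 0 when the headers are longer than the
 * packet, instead of letting the unsigned subtraction wrap around.
 */
uint32_t real_payload(uint32_t total_len, uint32_t tcp_hdr_len, int *ok)
{
	uint32_t hdr_len = IPV4_HDR_LEN + tcp_hdr_len;

	if (hdr_len > total_len) {
		*ok = 0;
		return 0;
	}
	*ok = 1;
	return total_len - hdr_len;
}

/* Number of segments for a given payload, rounded up. */
uint32_t gso_segs(uint32_t payload, uint32_t gso_size)
{
	return (payload + gso_size - 1) / gso_size;
}
```

For the 80-byte packet, the shallow view sees 40 bytes of payload (five segments at gso_size 8), while the real payload after a 60-byte TCP header is 0 bytes and therefore zero segments. A length adjustment that multiplies by "segments minus one" would wrap on that zero, which is the kind of underflow the extra sanity checks are meant to rule out.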
Fixes:
|
||
|
|
c20029db28 |
net: avoid potential underflow in qdisc_pkt_len_init() with UFO
After commit |
||
|
|
e609c959a9 |
net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_size
Commit |
||
|
|
e8d4d34df7 |
net: Add netif_get_gro_max_size helper for GRO
Add a small netif_get_gro_max_size() helper which returns the maximum IPv4 or IPv6 GRO size of the netdevice.
We later add a netif_get_gso_max_size() equivalent as well for GSO, so that these helpers can be used consistently instead of open-coded checks.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240923212242.15669-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
6c24a03a61 |
net: dsa: improve shutdown sequence
Alexander Sverdlin presents 2 problems during shutdown with the
lan9303 driver. One is specific to lan9303 and the other just happens
to reproduce there.
The first problem is that lan9303 is unique among DSA drivers in that it
calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown,
not remove):
phy_state_machine()
-> ...
-> dsa_user_phy_read()
-> ds->ops->phy_read()
-> lan9303_phy_read()
-> chip->ops->phy_read()
-> lan9303_mdio_phy_read()
-> dev_get_drvdata()
But we never stop the phy_state_machine(), so it may continue to run
after dsa_switch_shutdown(). Our common pattern in all DSA drivers is
to set drvdata to NULL to suppress the remove() method that may come
afterwards. But in this case it will result in an NPD.
The second problem is that the way in which we set
dp->conduit->dsa_ptr = NULL; is concurrent with receive packet
processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL,
but afterwards, rather than continuing to use that non-NULL value,
dev->dsa_ptr is dereferenced again and again without NULL checks:
dsa_conduit_find_user() and many other places. In between dereferences,
there is no locking to ensure that what was valid once continues to be
valid.
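The hazard described in the second problem can be modelled with a small userspace sketch. The struct and function names below are hypothetical stand-ins, not the DSA code; they only contrast re-reading a field that another context may clear with reading it once into a local.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the conduit device and its DSA port. */
struct port { int index; };
struct conduit { struct port *dsa_ptr; };

/* Racy shape: the field is checked once, then read again.
 * If another context clears c->dsa_ptr between the two reads,
 * the second read yields NULL and the dereference faults. */
int port_index_racy(const struct conduit *c)
{
	assert(c != NULL);
	if (!c->dsa_ptr)
		return -1;
	return c->dsa_ptr->index;	/* second, unchecked read */
}

/* Safer shape: read the field once and keep using the local copy. */
int port_index_once(const struct conduit *c)
{
	const struct port *p;

	assert(c != NULL);
	p = c->dsa_ptr;			/* single read */
	if (!p)
		return -1;
	return p->index;
}
```

A single read only avoids the second NULL dereference; it does not keep the pointed-to object alive, which is consistent with the commit's choice to quiesce the conduit before the pointer is cleared.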
Both problems have the common aspect that closing the conduit interface
solves them.
In the first case, dev_close(conduit) triggers the NETDEV_GOING_DOWN
event in dsa_user_netdevice_event() which closes user ports as well.
dsa_port_disable_rt() calls phylink_stop(), which synchronously stops
the phylink state machine, and ds->ops->phy_read() will thus no longer
call into the driver after this point.
In the second case, dev_close(conduit) should do this, as per
Documentation/networking/driver.rst:
| Quiescence
| ----------
|
| After the ndo_stop routine has been called, the hardware must
| not receive or transmit any data. All in flight packets must
| be aborted. If necessary, poll or wait for completion of
| any reset commands.
So it should be sufficient to ensure that later, when we zeroize
conduit->dsa_ptr, there will be no concurrent dsa_switch_rcv() call
on this conduit.
The addition of the netif_device_detach() function is to ensure that
ioctls, rtnetlinks and ethtool requests on the user ports no longer
propagate down to the driver - we're no longer prepared to handle them.
The race condition actually did not exist when commit
|
||
|
|
894b3c35d1 |
Merge tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"Three CephFS fixes from Xiubo and Luis and a bunch of assorted
cleanups"
* tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client:
ceph: remove the incorrect Fw reference check when dirtying pages
ceph: Remove empty definition in header file
ceph: Fix typo in the comment
ceph: fix a memory leak on cap_auths in MDS client
ceph: flush all caps releases when syncing the whole filesystem
ceph: rename ceph_flush_cap_releases() to ceph_flush_session_cap_releases()
libceph: use min() to simplify code in ceph_dns_resolve_name()
ceph: Convert to use jiffies macro
ceph: Remove unused declarations
|
||
|
|
cb787f4ac0 |
[tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit
|
||
|
|
b25e11f978 |
Bluetooth: hci_event: Align BR/EDR JUST_WORKS paring with LE
This aligned BR/EDR JUST_WORKS method with LE which since |
||
|
|
333b4fd11e |
Bluetooth: L2CAP: Fix uaf in l2cap_connect
[Syzbot reported]
BUG: KASAN: slab-use-after-free in l2cap_connect.constprop.0+0x10d8/0x1270 net/bluetooth/l2cap_core.c:3949
Read of size 8 at addr ffff8880241e9800 by task kworker/u9:0/54
CPU: 0 UID: 0 PID: 54 Comm: kworker/u9:0 Not tainted 6.11.0-rc6-syzkaller-00268-g788220eee30d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Workqueue: hci2 hci_rx_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:488
kasan_report+0xd9/0x110 mm/kasan/report.c:601
l2cap_connect.constprop.0+0x10d8/0x1270 net/bluetooth/l2cap_core.c:3949
l2cap_connect_req net/bluetooth/l2cap_core.c:4080 [inline]
l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:4772 [inline]
l2cap_sig_channel net/bluetooth/l2cap_core.c:5543 [inline]
l2cap_recv_frame+0xf0b/0x8eb0 net/bluetooth/l2cap_core.c:6825
l2cap_recv_acldata+0x9b4/0xb70 net/bluetooth/l2cap_core.c:7514
hci_acldata_packet net/bluetooth/hci_core.c:3791 [inline]
hci_rx_work+0xaab/0x1610 net/bluetooth/hci_core.c:4028
process_one_work+0x9c5/0x1b40 kernel/workqueue.c:3231
process_scheduled_works kernel/workqueue.c:3312 [inline]
worker_thread+0x6c8/0xed0 kernel/workqueue.c:3389
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
...
Freed by task 5245:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
poison_slab_object+0xf7/0x160 mm/kasan/common.c:240
__kasan_slab_free+0x32/0x50 mm/kasan/common.c:256
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2256 [inline]
slab_free mm/slub.c:4477 [inline]
kfree+0x12a/0x3b0 mm/slub.c:4598
l2cap_conn_free net/bluetooth/l2cap_core.c:1810 [inline]
kref_put include/linux/kref.h:65 [inline]
l2cap_conn_put net/bluetooth/l2cap_core.c:1822 [inline]
l2cap_conn_del+0x59d/0x730 net/bluetooth/l2cap_core.c:1802
l2cap_connect_cfm+0x9e6/0xf80 net/bluetooth/l2cap_core.c:7241
hci_connect_cfm include/net/bluetooth/hci_core.h:1960 [inline]
hci_conn_failed+0x1c3/0x370 net/bluetooth/hci_conn.c:1265
hci_abort_conn_sync+0x75a/0xb50 net/bluetooth/hci_sync.c:5583
abort_conn_sync+0x197/0x360 net/bluetooth/hci_conn.c:2917
hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:328
process_one_work+0x9c5/0x1b40 kernel/workqueue.c:3231
process_scheduled_works kernel/workqueue.c:3312 [inline]
worker_thread+0x6c8/0xed0 kernel/workqueue.c:3389
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Reported-by: syzbot+c12e2f941af1feb5632c@syzkaller.appspotmail.com
Tested-by: syzbot+c12e2f941af1feb5632c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c12e2f941af1feb5632c
Fixes:
|
||
|
|
f53e1c9c72 |
Bluetooth: MGMT: Fix possible crash on mgmt_index_removed
If mgmt_index_removed is called while there are commands queued on
cmd_sync it could lead to crashes like the trace below:
0x0000053D: __list_del_entry_valid_or_report+0x98/0xdc
0x0000053D: mgmt_pending_remove+0x18/0x58 [bluetooth]
0x0000053E: mgmt_remove_adv_monitor_complete+0x80/0x108 [bluetooth]
0x0000053E: hci_cmd_sync_work+0xbc/0x164 [bluetooth]
So while handling mgmt_index_removed this attempts to dequeue
commands passed as user_data to cmd_sync.
Fixes:
|
||
|
|
92ceba94de |
netfilter: nf_tables: prevent nf_skb_duplicated corruption
syzbot found that nf_dup_ipv4() or nf_dup_ipv6() could write
per-cpu variable nf_skb_duplicated in an unsafe way [1].
Disabling preemption as hinted by the splat is not enough,
we have to disable soft interrupts as well.
[1]
BUG: using __this_cpu_write() in preemptible [00000000] code: syz.4.282/6316
caller is nf_dup_ipv4+0x651/0x8f0 net/ipv4/netfilter/nf_dup_ipv4.c:87
CPU: 0 UID: 0 PID: 6316 Comm: syz.4.282 Not tainted 6.11.0-rc7-syzkaller-00104-g7052622fccb1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
check_preemption_disabled+0x10e/0x120 lib/smp_processor_id.c:49
nf_dup_ipv4+0x651/0x8f0 net/ipv4/netfilter/nf_dup_ipv4.c:87
nft_dup_ipv4_eval+0x1db/0x300 net/ipv4/netfilter/nft_dup_ipv4.c:30
expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
nft_do_chain+0x4ad/0x1da0 net/netfilter/nf_tables_core.c:288
nft_do_chain_ipv4+0x202/0x320 net/netfilter/nft_chain_filter.c:23
nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
nf_hook+0x2c4/0x450 include/linux/netfilter.h:269
NF_HOOK_COND include/linux/netfilter.h:302 [inline]
ip_output+0x185/0x230 net/ipv4/ip_output.c:433
ip_local_out net/ipv4/ip_output.c:129 [inline]
ip_send_skb+0x74/0x100 net/ipv4/ip_output.c:1495
udp_send_skb+0xacf/0x1650 net/ipv4/udp.c:981
udp_sendmsg+0x1c21/0x2a60 net/ipv4/udp.c:1269
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x1a6/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
___sys_sendmsg net/socket.c:2651 [inline]
__sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
__do_sys_sendmmsg net/socket.c:2766 [inline]
__se_sys_sendmmsg net/socket.c:2763 [inline]
__x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4ce4f7def9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f4ce5d4a038 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f4ce5135f80 RCX: 00007f4ce4f7def9
RDX: 0000000000000001 RSI: 0000000020005d40 RDI: 0000000000000006
RBP: 00007f4ce4ff0b76 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f4ce5135f80 R15: 00007ffd4cbc6d68
</TASK>
Fixes:
|
||
|
|
62a0e2fa40 |
Merge tag 'net-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
It looks like most people are still traveling: both the ML volume
and the processing capacity are low.
Previous releases - regressions:
- netfilter:
- nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put()
- nf_tables: keep deleted flowtable hooks until after RCU
- tcp: check skb is non-NULL in tcp_rto_delta_us()
- phy: aquantia: fix -ETIMEDOUT PHY probe failure when firmware not
present
- eth: virtio_net: fix mismatched buf address when unmapping for
small packets
- eth: stmmac: fix zero-division error when disabling tc cbs
- eth: bonding: fix unnecessary warnings and logs from
bond_xdp_get_xmit_slave()
Previous releases - always broken:
- netfilter:
- fix clash resolution for bidirectional flows
- fix allocation with no memcg accounting
- eth: r8169: add tally counter fields added with RTL8125
- eth: ravb: fix rx and tx frame size limit"
* tag 'net-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
selftests: netfilter: Avoid hanging ipvs.sh
kselftest: add test for nfqueue induced conntrack race
netfilter: nfnetlink_queue: remove old clash resolution logic
netfilter: nf_tables: missing objects with no memcg accounting
netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
docs: tproxy: ignore non-transparent sockets in iptables
netfilter: ctnetlink: Guard possible unused functions
selftests: netfilter: nft_tproxy.sh: add tcp tests
selftests: netfilter: add reverse-clash resolution test case
netfilter: conntrack: add clash resolution for reverse collisions
netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
selftests/net: packetdrill: increase timing tolerance in debug mode
usbnet: fix cyclical race on disconnect with work queue
net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled
virtio_net: Fix mismatched buf address when unmapping for small packets
bonding: Fix unnecessary warnings and logs from bond_xdp_get_xmit_slave()
r8169: add missing MODULE_FIRMWARE entry for RTL8126A rev.b
...
|
||
|
|
4965ddb166 |
Merge tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt updates from Greg KH:
"Here is the large set of USB and Thunderbolt changes for 6.12-rc1.
Nothing "major" in here, except for a new 9p network gadget that has
been worked on for a long time (all of the needed acks are here)
Other than that, it's the usual set of:
- Thunderbolt / USB4 driver updates and additions for new hardware
- dwc3 driver updates and new features added
- xhci driver updates
- typec driver updates
- USB gadget updates and api additions to make some gadgets more
configurable by userspace
- dwc2 driver updates
- usb phy driver updates
- usbip feature additions
- other minor USB driver updates
All of these have been in linux-next for a long time with no reported
issues"
* tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (145 commits)
sub: cdns3: Use predefined PCI vendor ID constant
sub: cdns2: Use predefined PCI vendor ID constant
USB: misc: yurex: fix race between read and write
USB: misc: cypress_cy7c63: check for short transfer
USB: appledisplay: close race between probe and completion handler
USB: class: CDC-ACM: fix race between get_serial and set_serial
usb: r8a66597-hcd: make read-only const arrays static
usb: typec: ucsi: Fix busy loop on ASUS VivoBooks
usb: dwc3: rtk: Clean up error code in __get_dwc3_maximum_speed()
usb: storage: ene_ub6250: Fix right shift warnings
usb: roles: Improve the fix for a false positive recursive locking complaint
locking/mutex: Introduce mutex_init_with_key()
locking/mutex: Define mutex_init() once
net/9p/usbg: fix CONFIG_USB_GADGET dependency
usb: xhci: fix loss of data on Cadence xHC
usb: xHCI: add XHCI_RESET_ON_RESUME quirk for Phytium xHCI host
usb: dwc3: imx8mp: disable SS_CON and U3 wakeup for system sleep
usb: dwc3: imx8mp: add 2 software managed quirk properties for host mode
usb: host: xhci-plat: Parse xhci-missing_cas_quirk and apply quirk
usb: misc: onboard_usb_dev: add Microchip usb5744 SMBus programming support
...
|
||
|
|
0181f8c809 |
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- virtio-balloon supports new stats
- vdpa supports setting mac address
- vdpa/mlx5 suspend/resume as well as MKEY ops are now faster
- virtio_fs supports new sysfs entries for queue info
- virtio/vsock performance has been improved
And fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
vsock/virtio: avoid queuing packets when intermediate queue is empty
vsock/virtio: refactor virtio_transport_send_pkt_work
fw_cfg: Constify struct kobj_type
vdpa/mlx5: Postpone MR deletion
vdpa/mlx5: Introduce init/destroy for MR resources
vdpa/mlx5: Rename mr_mtx -> lock
vdpa/mlx5: Extract mr members in own resource struct
vdpa/mlx5: Rename function
vdpa/mlx5: Delete direct MKEYs in parallel
vdpa/mlx5: Create direct MKEYs in parallel
MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section
virtio_fs: add sysfs entries for queue information
virtio_fs: introduce virtio_fs_put_locked helper
vdpa: Remove unused declarations
vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command
vdpa/mlx5: Small improvement for change_num_qps()
vdpa/mlx5: Keep notifiers during suspend but ignore
vdpa/mlx5: Parallelize device resume
vdpa/mlx5: Parallelize device suspend
vdpa/mlx5: Use async API for vq modify commands
...
|
||
|
|
aef3a58b06 |
Merge tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
v2: with kdoc fixes per Paolo Abeni.
The following patchset contains Netfilter fixes for net:
Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
packets to one another, two packets of the same flow in each direction
handled by different CPUs that result in two conntrack objects in NEW
state, where reply packet loses race. Then, patch #3 adds a testcase for
this scenario. Series from Florian Westphal.
1) NAT engine can falsely detect a port collision if it happens to pick
up a reply packet as NEW rather than ESTABLISHED. Add extra code to
detect this and suppress port reallocation in this case.
2) To complete the clash resolution in the reply direction, extend conntrack
logic to detect clashing conntrack in the reply direction to existing entry.
3) Adds a test case.
Then, an assorted list of fixes follows:
4) Add a selftest for tproxy, from Antonio Ojea.
5) Guard ctnetlink_*_size() functions under
#if defined(CONFIG_NETFILTER_NETLINK_GLUE_CT) || defined(CONFIG_NF_CONNTRACK_EVENTS)
From Andy Shevchenko.
6) Use -m socket --transparent in iptables tproxy documentation.
From XIE Zhibang.
7) Call kfree_rcu() when releasing flowtable hooks to address race with
netlink dump path, from Phil Sutter.
8) Fix compilation warning in nf_reject with CONFIG_BRIDGE_NETFILTER=n.
From Simon Horman.
9) Guard ctnetlink_label_size() under CONFIG_NF_CONNTRACK_EVENTS which
is its only user, to address a compilation warning. From Simon Horman.
10) Use rcu-protected list iteration over basechain hooks from netlink
dump path.
11) Fix memcg accounting for nf_tables; the use of GFP_KERNEL_ACCOUNT was
not complete.
12) Remove old nfqueue conntrack clash resolution. Instead of trying to
use the same destination address consistently, which requires double DNAT,
use the existing clash resolution, which allows clashing packets
to go through with a different destination. Antonio Ojea originally
reported an issue from the postrouting chain, I proposed a fix:
https://lore.kernel.org/netfilter-devel/ZuwSwAqKgCB2a51-@calendula/T/
which he reported did not work for him.
13) Adds a selftest for patch 12.
14) Fixes ipvs.sh selftest.
netfilter pull request 24-09-26
* tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: Avoid hanging ipvs.sh
kselftest: add test for nfqueue induced conntrack race
netfilter: nfnetlink_queue: remove old clash resolution logic
netfilter: nf_tables: missing objects with no memcg accounting
netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
docs: tproxy: ignore non-transparent sockets in iptables
netfilter: ctnetlink: Guard possible unused functions
selftests: netfilter: nft_tproxy.sh: add tcp tests
selftests: netfilter: add reverse-clash resolution test case
netfilter: conntrack: add clash resolution for reverse collisions
netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
====================
Link: https://patch.msgid.link/20240926110717.102194-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
||
|
|
8af79d3edb |
netfilter: nfnetlink_queue: remove old clash resolution logic
For historical reasons there are two clash resolution spots in netfilter, one in nfnetlink_queue and one in conntrack core. The nfnetlink_queue one was added first: if a colliding entry is found, the NAT transformation is reversed by calling the NAT engine again with an altered tuple. See commit |
||
|
|
69e687cea7 |
netfilter: nf_tables: missing objects with no memcg accounting
Several ruleset objects are still not using GFP_KERNEL_ACCOUNT for
memory accounting, update them. This includes:
- catchall elements
- compat match large info area
- log prefix
- meta secctx
- numgen counters
- pipapo set backend datastructure
- tunnel private objects
Fixes:
|
||
|
|
4ffcf5ca81 |
netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
Lockless iteration over hook list is possible from netlink dump path,
use rcu variant to iterate over the hook list as is done with flowtable
hooks.
Fixes:
|
||
|
|
e1f1ee0e9a |
netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
Only provide ctnetlink_label_size when it is used,
which is when CONFIG_NF_CONNTRACK_EVENTS is configured.
Flagged by clang-18 W=1 builds as:
.../nf_conntrack_netlink.c:385:19: warning: unused function 'ctnetlink_label_size' [-Wunused-function]
385 | static inline int ctnetlink_label_size(const struct nf_conn *ct)
| ^~~~~~~~~~~~~~~~~~~~
The condition on CONFIG_NF_CONNTRACK_LABELS being removed by
this patch guards compilation of non-trivial implementations
of ctnetlink_dump_labels() and ctnetlink_label_size().
However, this is not necessary as each of these functions
will always return 0 if CONFIG_NF_CONNTRACK_LABELS is not defined
as each function starts with the equivalent of:
struct nf_conn_labels *labels = nf_ct_labels_find(ct);
if (!labels)
return 0;
And nf_ct_labels_find always returns NULL if CONFIG_NF_CONNTRACK_LABELS
is not enabled. So I believe that the compiler optimises the code away
in such cases anyway.
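The reasoning above can be sketched in plain C. The function names follow the commit text, but the struct layouts and the returned size are simplified assumptions for illustration, not the kernel definitions. With the option off, the lookup helper is a stub that always returns NULL, so the size helper returns 0 without needing its own #ifdef.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures differ. */
struct nf_conn_labels { unsigned long bits[4]; };
struct nf_conn { struct nf_conn_labels *labels; };

#ifdef CONFIG_NF_CONNTRACK_LABELS
static inline struct nf_conn_labels *nf_ct_labels_find(const struct nf_conn *ct)
{
	return ct->labels;
}
#else
/* Option disabled: the lookup is a constant NULL. */
static inline struct nf_conn_labels *nf_ct_labels_find(const struct nf_conn *ct)
{
	(void)ct;
	return NULL;
}
#endif

/* No #ifdef needed here: with the stub above this always returns 0.
 * The non-zero return value is a placeholder, not the kernel's formula. */
static inline int ctnetlink_label_size(const struct nf_conn *ct)
{
	struct nf_conn_labels *labels;

	assert(ct != NULL);
	labels = nf_ct_labels_find(ct);
	if (!labels)
		return 0;
	return (int)sizeof(labels->bits);
}
```

Because the stub is a constant, a compiler is free to drop the code after the early return, which matches the author's expectation that the label handling is optimised away when the option is disabled.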
Found by inspection.
Compile tested only.
Originally split into two patches; Pablo Neira Ayuso collapsed them and
added the Fixes: tag.
Fixes:
|
||
|
|
fc56878ca1 |
netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
If CONFIG_BRIDGE_NETFILTER is not enabled, which is the case for x86_64
defconfig, then building nf_reject_ipv4.c and nf_reject_ipv6.c with W=1
using gcc-14 results in the following warnings, which are treated as
errors:
net/ipv4/netfilter/nf_reject_ipv4.c: In function 'nf_send_reset':
net/ipv4/netfilter/nf_reject_ipv4.c:243:23: error: variable 'niph' set but not used [-Werror=unused-but-set-variable]
243 | struct iphdr *niph;
| ^~~~
cc1: all warnings being treated as errors
net/ipv6/netfilter/nf_reject_ipv6.c: In function 'nf_send_reset6':
net/ipv6/netfilter/nf_reject_ipv6.c:286:25: error: variable 'ip6h' set but not used [-Werror=unused-but-set-variable]
286 | struct ipv6hdr *ip6h;
| ^~~~
cc1: all warnings being treated as errors
Address this by reducing the scope of these local variables to where
they are used, which is code only compiled when CONFIG_BRIDGE_NETFILTER
is enabled.
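A minimal sketch of that kind of change, using a hypothetical option `CONFIG_EXAMPLE_BRIDGE` and made-up helper names rather than the actual nf_reject code: the variable is declared inside the conditional block that reads it, so a build without the option never sees a set-but-unused local.

```c
#include <assert.h>
#include <stddef.h>

struct hdr { int ttl; };

/* Hypothetical helper standing in for header construction. */
static struct hdr *build_header(struct hdr *storage)
{
	storage->ttl = 64;
	return storage;
}

/* Before: 'h' is assigned unconditionally but only read under the
 * option, which triggers -Wunused-but-set-variable when it is off.
 *
 *	struct hdr *h;
 *	h = build_header(storage);
 * #ifdef CONFIG_EXAMPLE_BRIDGE
 *	use(h);
 * #endif
 */

/* After: the variable lives only where it is used. */
int send_reset_sketch(struct hdr *storage)
{
	int used = 0;

	assert(storage != NULL);
#ifdef CONFIG_EXAMPLE_BRIDGE
	{
		struct hdr *h = build_header(storage);

		used = h->ttl;
	}
#else
	build_header(storage);	/* result not needed without the option */
#endif
	return used;
}
```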
Compile tested and run through netfilter selftests.
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Closes: https://lore.kernel.org/netfilter-devel/20240906145513.567781-1-andriy.shevchenko@linux.intel.com/
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
||
|
|
642c89c475 |
netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
Documentation of list_del_rcu() warns callers to not immediately free
the deleted list item. While it seems not necessary to use the
RCU-variant of list_del() here in the first place, doing so seems to
require calling kfree_rcu() on the deleted item as well.
Fixes:
|
||
|
|
2cadd3b177 |
netfilter: ctnetlink: Guard possible unused functions
Some of the functions may be unused (CONFIG_NETFILTER_NETLINK_GLUE_CT=n
and CONFIG_NF_CONNTRACK_EVENTS=n), it prevents kernel builds with clang,
`make W=1` and CONFIG_WERROR=y:
net/netfilter/nf_conntrack_netlink.c:657:22: error: unused function 'ctnetlink_acct_size' [-Werror,-Wunused-function]
657 | static inline size_t ctnetlink_acct_size(const struct nf_conn *ct)
| ^~~~~~~~~~~~~~~~~~~
net/netfilter/nf_conntrack_netlink.c:667:19: error: unused function 'ctnetlink_secctx_size' [-Werror,-Wunused-function]
667 | static inline int ctnetlink_secctx_size(const struct nf_conn *ct)
| ^~~~~~~~~~~~~~~~~~~~~
net/netfilter/nf_conntrack_netlink.c:683:22: error: unused function 'ctnetlink_timestamp_size' [-Werror,-Wunused-function]
683 | static inline size_t ctnetlink_timestamp_size(const struct nf_conn *ct)
| ^~~~~~~~~~~~~~~~~~~~~~~~
Fix this by guarding possible unused functions with ifdeffery.
See also commit
|
||
|
|
a4e6a1031e |
netfilter: conntrack: add clash resolution for reverse collisions
Given existing entry:
ORIGIN: a:b -> c:d
REPLY: c:d -> a:b
And colliding entry:
ORIGIN: c:d -> a:b
REPLY: a:b -> c:d
The colliding ct (and the associated skb) get dropped on insert.
Permit this by checking if the colliding entry matches the reply
direction.
Happens when both ends send packets at same time, both requests are picked
up as NEW, rather than NEW for the 'first' and 'ESTABLISHED' for the second
packet.
This is an esoteric condition, as ruleset must permit NEW connections
in either direction and both peers must already have a bidirectional
traffic flow at the time conntrack gets enabled.
Allow the 'reverse' skb to pass and assign the existing (clashing)
entry.
While at it, also drop the extra 'dying' check, this is already
tested earlier by the calling function.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
||
|
|
d8f84a9bc7 |
netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
A conntrack entry can be inserted to the connection tracking table if there
is no existing entry with an identical tuple in either direction.
Example:
INITIATOR -> NAT/PAT -> RESPONDER
Initiator passes through NAT/PAT ("us") and SNAT is done (saddr rewrite).
Then, later, NAT/PAT machine itself also wants to connect to RESPONDER.
This will not work if the SNAT done earlier has same IP:PORT source pair.
Conntrack table has:
ORIGINAL: $IP_INITIATOR:$SPORT -> $IP_RESPONDER:$DPORT
REPLY: $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT
and new locally originating connection wants:
ORIGINAL: $IP_NAT:$SPORT -> $IP_RESPONDER:$DPORT
REPLY: $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT
This is handled by the NAT engine which will do a source port reallocation
for the locally originating connection that is colliding with an existing
tuple by attempting a source port rewrite.
This is done even if this new connection attempt did not go through a
masquerade/snat rule.
There is a rare race condition with connection-less protocols like UDP,
where we do the port reallocation even though it's not needed.
This happens when new packets from the same, pre-existing flow are received
in both directions at the exact same time on different CPUs after the
conntrack table was flushed (or conntrack becomes active for first time).
With strict ordering/single cpu, the first packet creates new ct entry and
second packet is resolved as established reply packet.
With parallel processing, both packets are picked up as new and both get
their own ct entry.
In this case, the 'reply' packet (picked up as ORIGINAL) can be mangled by
NAT engine because a port collision is detected.
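The tuple relationships in this description can be modelled in a few lines of C. The types and helper names here are hypothetical and far simpler than conntrack's own tuple handling; the sketch only shows how a tuple that looks like a collision can turn out to be the reverse direction of an already tracked flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified flow tuple (addresses as plain integers). */
struct tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Each tracked connection stores both directions. */
struct ct_entry {
	struct tuple orig;
	struct tuple reply;
};

static bool tuple_equal(const struct tuple *a, const struct tuple *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* True if 't' is already used by 'e' in either direction. */
bool tuple_in_use(const struct ct_entry *e, const struct tuple *t)
{
	assert(e != NULL && t != NULL);
	return tuple_equal(&e->orig, t) || tuple_equal(&e->reply, t);
}

/* True if 'fresh' is the same flow as 'existing' seen from the other side. */
bool is_reverse_clash(const struct ct_entry *existing,
		      const struct ct_entry *fresh)
{
	assert(existing != NULL && fresh != NULL);
	return tuple_equal(&existing->orig, &fresh->reply) &&
	       tuple_equal(&existing->reply, &fresh->orig);
}
```

In this model, a tuple that is in use but is not a reverse clash corresponds to the genuine collision from the NAT/PAT example, where a new source port is needed. A reverse clash corresponds to the race described above, where rewriting the port would mangle a packet that belongs to the existing flow.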
This change isn't enough to prevent a packet drop later during
nf_conntrack_confirm(), the existing clash resolution strategy will not
detect such reverse clash case. This is resolved by a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
||
|
|
efcd71af38 |
vsock/virtio: avoid queuing packets when intermediate queue is empty
When the driver needs to send new packets to the device, it always
queues the new sk_buffs into an intermediate queue (send_pkt_queue)
and schedules a worker (send_pkt_work) to then queue them into the
virtqueue exposed to the device.
This increases the chance of batching, but also introduces a lot of
latency into the communication. So we can optimize this path by
adding a fast path to be taken when there is no element in the
intermediate queue, there is space available in the virtqueue,
and no other process that is sending packets (tx_lock held).
The following benchmarks were run to check improvements in latency and
throughput. The test bed is a host with Intel i7-10700KF CPU @ 3.80GHz
and L1 guest running on QEMU/KVM with vhost process and all vCPUs
pinned individually to pCPUs.
- Latency
Tool: Fio version 3.37-56
Mode: pingpong (h-g-h)
Test runs: 50
Runtime-per-test: 50s
Type: SOCK_STREAM
In the following fio benchmark (pingpong mode) the host sends
a payload to the guest and waits for the same payload back.
fio process pinned both inside the host and the guest system.
Before: Linux 6.9.8
Payload 64B:
          1st perc.   overall   99th perc.
Before      12.91      16.78      42.24     us
After        9.77      13.57      39.17     us
Payload 512B:
          1st perc.   overall   99th perc.
Before      13.35      17.35      41.52     us
After       10.25      14.11      39.58     us
Payload 4K:
          1st perc.   overall   99th perc.
Before      14.71      19.87      41.52     us
After       10.51      14.96      40.81     us
- Throughput
Tool: iperf-vsock
The size represents the buffer length (-l) to read/write
P represents the number of parallel streams
P=1
           4K      64K     128K
Before    6.87    29.3    29.5    Gb/s
After     10.5    39.4    39.9    Gb/s
P=2
           4K      64K     128K
Before    10.5    32.8    33.2    Gb/s
After     17.8    47.7    48.5    Gb/s
P=4
           4K      64K     128K
Before    12.7    33.6    34.2    Gb/s
After     16.9    48.1    50.5    Gb/s
The performance improvement is related to this optimization,
I used an eBPF kretprobe on virtio_transport_send_skb to check that
each packet was sent directly to the virtqueue
Co-developed-by: Marco Pinna <marco.pinn95@gmail.com>
Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Message-Id: <20240730-pinna-v4-2-5c9179164db5@outlook.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
|
||
|
|
26618da3b2 |
vsock/virtio: refactor virtio_transport_send_pkt_work
Preliminary patch to introduce an optimization to the
enqueue system.
All the code used to enqueue a packet into the virtqueue
is removed from virtio_transport_send_pkt_work()
and moved to the new virtio_transport_send_skb() function.
Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240730-pinna-v4-1-5c9179164db5@outlook.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
||
|
|
684a64bf32 |
Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add a 'noalignwrite' mount option for lock-less 'lost writes'
prevention
- Add support for the LOCALIO protocol extension
Bugfixes:
- Fix memory leak in error path of nfs4_do_reclaim()
- Simplify and guarantee lock owner uniqueness
- Fix -Wformat-truncation warning
- Fix folio refcounts by using folio_attach_private()
- Fix failing the mount system call when the server is down
- Fix detection of "Proxying of Times" server support
Cleanups:
- Annotate struct nfs_cache_array with __counted_by()
- Remove unnecessary NULL checks before kfree()
- Convert RPC_TASK_* constants to an enum
- Remove obsolete or misleading comments and declarations"
* tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits)
nfs: Fix `make htmldocs` warnings in the localio documentation
nfs: add "NFS Client and Server Interlock" section to localio.rst
nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
nfs: add Documentation/filesystems/nfs/localio.rst
nfs: implement client support for NFS_LOCALIO_PROGRAM
nfs/localio: use dedicated workqueues for filesystem read and write
pnfs/flexfiles: enable localio support
nfs: enable localio for non-pNFS IO
nfs: add LOCALIO support
nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit
nfsd: implement server support for NFS_LOCALIO_PROGRAM
nfsd: add LOCALIO support
nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
nfs_common: add NFS LOCALIO auxiliary protocol enablement
SUNRPC: replace program list with program array
SUNRPC: add svcauth_map_clnt_to_svc_cred_local
SUNRPC: remove call_allocate() BUG_ONs
nfsd: add nfsd_serv_try_get and nfsd_serv_put
nfsd: add nfsd_file_acquire_local()
nfsd: factor out __fh_verify to allow NULL rqstp to be passed
...
|
||
|
|
fa8380a06b |
Merge tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf 'struct fd' updates from Alexei Starovoitov:
"This includes struct_fd BPF changes from Al and Andrii"
* tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
bpf: convert bpf_token_create() to CLASS(fd, ...)
security,bpf: constify struct path in bpf_token_create() LSM hook
bpf: more trivial fdget() conversions
bpf: trivial conversions for fdget()
bpf: switch maps to CLASS(fd, ...)
bpf: factor out fetching bpf_map from FD and adding it to used_maps list
bpf: switch fdget_raw() uses to CLASS(fd_raw, ...)
bpf: convert __bpf_prog_get() to CLASS(fd, ...)
|