linux/net
Mark Gray b83d23a2a3 openvswitch: Introduce per-cpu upcall dispatch
The Open vSwitch kernel module uses the upcall mechanism to send
packets from kernel space to user space when it misses in the kernel
space flow table. The upcall sends packets via a Netlink socket.
Currently, a Netlink socket is created for every vport. In this way,
there is a 1:1 mapping between a vport and a Netlink socket.
When a packet is received by a vport, if it needs to be sent to
user space, it is sent via the corresponding Netlink socket.

This mechanism, with various iterations of the corresponding user
space code, has seen some limitations and issues:

* On systems with a large number of vports, there is a correspondingly
large number of Netlink sockets which can limit scaling.
(https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
* Packet reordering on upcalls.
(https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
* A thundering herd issue.
(https://bugzilla.redhat.com/show_bug.cgi?id=1834444)

This patch introduces an alternative, feature-negotiated, upcall
mode using a per-cpu dispatch rather than a per-vport dispatch.

In this mode, the Netlink socket to be used for the upcall is
selected based on the CPU of the thread that is executing the upcall.
In this way, it resolves the issues above as:

a) The number of Netlink sockets scales with the number of CPUs
rather than the number of vports.
b) Ordering per-flow is maintained as packets are distributed to
CPUs based on mechanisms such as RSS and flows are distributed
to a single user space thread.
c) Packets from a flow can only wake up one user space thread.

The corresponding user space code can be found at:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html

Bugzilla: https://bugzilla.redhat.com/1844576
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-16 11:06:33 -07:00
..
6lowpan
9p 9p/trans_virtio: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
802 net/802/garp: fix memleak in garp_request_join() 2021-07-01 11:21:57 -07:00
8021q net: vlan: pass thru all GSO_SOFTWARE in hw_enc_features 2021-06-18 11:58:03 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
atm atm: Use list_for_each_entry() to simplify code in resources.c 2021-06-10 14:08:09 -07:00
ax25
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
bluetooth TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
bpf bpf: Support specifying ingress via xdp_md context in BPF_PROG_TEST_RUN 2021-07-07 19:51:13 -07:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge net: bridge: multicast: fix MRD advertisement router port marking race 2021-07-11 12:11:06 -07:00
caif net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
ceph Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
core rtnetlink: use nlmsg_notify() in rtnetlink_send() 2021-07-16 10:46:35 -07:00
dcb net: dcb: Return the correct errno code 2021-06-01 17:01:33 -07:00
dccp net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
decnet decnet: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
dns_resolver
dsa net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() 2021-07-13 14:47:10 -07:00
ethernet of: net: pass the dst buffer to of_get_mac_address() 2021-04-13 14:35:02 -07:00
ethtool net: sock: extend SO_TIMESTAMPING for PHC binding 2021-07-01 13:08:18 -07:00
hsr net: hsr: don't check sequence number if tag removal is offloaded 2021-06-16 12:13:01 -07:00
ieee802154 ieee802154: fix error return code in ieee802154_llsec_getparams() 2021-06-03 10:59:49 +02:00
ife
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-07-15 22:40:10 -07:00
ipv6 ipv6: remove unnecessary local variable 2021-07-15 10:26:03 -07:00
iucv s390: iucv: Avoid field over-reading memcpy() 2021-07-01 15:54:01 -07:00
kcm net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
key net: Remove unnecessary variables 2021-05-26 07:03:39 +02:00
l2tp l2tp: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
l3mdev
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc llc2: Remove redundant assignment to rc 2021-04-27 14:16:14 -07:00
mac80211 mac80211: Switch to a virtual time-based airtime scheduler 2021-06-23 18:12:00 +02:00
mac802154 net: mac802154: Fix general protection fault 2021-04-06 22:42:16 +02:00
mpls mpls: Remove redundant assignment to err 2021-04-27 14:17:00 -07:00
mptcp net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
ncsi net/ncsi: add dummy response handler for Intel boards 2021-07-08 14:16:39 -07:00
netfilter netfilter: nft_last: incorrect arithmetics when restoring last used 2021-07-06 14:15:13 +02:00
netlabel netlabel: Fix memory leak in netlbl_mgmt_add_common 2021-06-15 11:19:04 -07:00
netlink net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
netrom net: netrom: Fix fall-through warnings for Clang 2021-05-17 19:57:08 -05:00
nfc TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
nsh
openvswitch openvswitch: Introduce per-cpu upcall dispatch 2021-07-16 11:06:33 -07:00
packet Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
phonet
psample
qrtr net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
rds Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
rfkill Another set of updates, all over the map: 2021-04-20 16:44:04 -07:00
rose
rxrpc Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
sched net/sched: Remove unnecessary if statement 2021-07-16 10:46:35 -07:00
sctp net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
smc net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
strparser net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
sunrpc NFS client updates for Linux 5.14 2021-07-09 09:43:57 -07:00
switchdev net: switchdev: add a context void pointer to struct switchdev_notifier_info 2021-06-28 14:09:03 -07:00
tipc Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-07-15 22:40:10 -07:00
vmw_vsock Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
wireless cfg80211: Support hidden AP discovery over 6GHz band 2021-06-23 13:05:09 +02:00
x25 net: x25: Use list_for_each_entry() to simplify code in x25_route.c 2021-06-10 14:08:09 -07:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
xfrm Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
Kconfig bpf, kconfig: Add consolidated menu entry for bpf with core options 2021-05-11 13:56:16 -07:00
Makefile
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
socket.c net: socket: support hardware timestamp conversion to PHC bound 2021-07-01 13:08:18 -07:00
sysctl_net.c net: Ensure net namespace isolation of sysctls 2021-04-12 13:27:11 -07:00