linux/include/net
Jakub Kicinski 759ab1edb5 net: store netdevs in an xarray
Iterating over the netdev hash table for netlink dumps is hard.
Dumps are done in "chunks" so we need to save the position
after each chunk, so we know where to restart from. Because
netdevs are stored in a hash table we remember which bucket
we were in and how many devices we dumped.

Since we don't hold any locks across the "chunks" - devices may
come and go while we're dumping. If that happens we may miss
a device (if device is deleted from the bucket we were in).
We indicate to user space that this may have happened by setting
NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
if it sees that. Somehow I doubt most user space gets this right.

To illustrate let's look at an example:

               System state:
  start:       # [A, B, C]
  del:  B      # [A, C]

With the hash table we may dump [A, B], missing C completely even
though it existed both before and after the "del B".
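
The failure mode above can be reproduced with a small user-space model.
This is only a sketch, not kernel code: the names struct bucket,
bucket_del(), dump_chunk() and replay_hash_dump() are made up for
illustration, a single bucket stands in for the hash table, and the saved
position is simply "how many devices of this bucket were already dumped".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVS 8

/* One hash bucket, modelled as a small ordered list of device names. */
struct bucket {
	const char *devs[MAX_DEVS];
	size_t len;
};

/* Remove a device by name, shifting later entries down by one. */
static void bucket_del(struct bucket *b, const char *name)
{
	for (size_t i = 0; i < b->len; i++) {
		if (strcmp(b->devs[i], name) == 0) {
			memmove(&b->devs[i], &b->devs[i + 1],
				(b->len - i - 1) * sizeof(b->devs[0]));
			b->len--;
			return;
		}
	}
}

/*
 * Dump up to `chunk` devices, skipping the first `*skip` entries
 * (the saved position). Returns the number of names written to `out`.
 */
static size_t dump_chunk(const struct bucket *b, size_t *skip, size_t chunk,
			 const char **out)
{
	size_t n = 0;

	for (size_t i = *skip; i < b->len && n < chunk; i++)
		out[n++] = b->devs[i];
	*skip += n;
	return n;
}

/* Replay the example: dump [A, B], delete B, resume. Returns total dumped. */
static size_t replay_hash_dump(const char **out)
{
	struct bucket b = { .devs = { "A", "B", "C" }, .len = 3 };
	size_t skip = 0, n;

	n = dump_chunk(&b, &skip, 2, out);	/* first chunk: A, B */
	assert(skip == 2);
	bucket_del(&b, "B");			/* bucket is now [A, C] */
	n += dump_chunk(&b, &skip, 2, out + n);	/* skips 2 entries: C is lost */
	return n;
}
```

Calling replay_hash_dump() with an array of at least four pointers fills
in A and B only. The count-based position no longer points at the same
place once B is gone, which is why C is passed over.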

Add an xarray and use it to allocate ifindexes. This way we
can iterate ifindexes in order, without the worry that we'll
skip one. We may still generate a dump of a state which "never
existed", for example for a set of values and sequence of ops:

               System state:
  start:       # [A, B]
  add:  C      # [A, C, B]
  del:  B      # [A, C]

we may generate a dump of [A], if C got an index between A and B.
The system has never been in such a state. But I'm 90% sure that's perfectly
fine; the important part is that we can't _miss_ devices which exist before
and after. User space which wants to mirror kernel's state subscribes
to notifications and does periodic dumps so it will know that C exists
from the notification about its creation or from the next dump
(next dump is _guaranteed_ to include C, if it doesn't get removed).
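
Both properties can be checked with a toy model of index-ordered
iteration. This is a sketch under stated assumptions: the kernel's xarray
is not modelled, a plain array indexed by ifindex stands in for it, and
struct dev_index, dump_chunk_by_index() and the two replay helpers are
hypothetical names. The saved position is the ifindex at which the next
chunk should restart.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFINDEX 16

/* Toy index-ordered store: slot i holds the device with ifindex i, or NULL. */
struct dev_index {
	const char *devs[MAX_IFINDEX];
};

/*
 * Dump up to `chunk` devices with ifindex >= *next, in index order.
 * If a device does not fit, *next is left at its index so that the
 * following chunk restarts there.
 */
static size_t dump_chunk_by_index(const struct dev_index *t, size_t *next,
				  size_t chunk, const char **out)
{
	size_t n = 0;
	size_t i;

	for (i = *next; i < MAX_IFINDEX; i++) {
		if (!t->devs[i])
			continue;
		if (n == chunk)
			break;	/* chunk full: restart at this device */
		out[n++] = t->devs[i];
	}
	*next = i;
	return n;
}

/* start [A, B, C], delete B between chunks: C is still dumped. */
static size_t replay_del_between_chunks(const char **out)
{
	struct dev_index t = { .devs = { [1] = "A", [2] = "B", [3] = "C" } };
	size_t next = 0, n;

	n = dump_chunk_by_index(&t, &next, 2, out);	 /* A, B */
	assert(next == 3);
	t.devs[2] = NULL;				 /* del B */
	n += dump_chunk_by_index(&t, &next, 2, out + n); /* resumes at 3: C */
	return n;
}

/* start [A, B], C is added between A and B, B is deleted: dump is [A]. */
static size_t replay_never_existed(const char **out)
{
	struct dev_index t = { .devs = { [1] = "A", [3] = "B" } };
	size_t next = 0, n;

	n = dump_chunk_by_index(&t, &next, 1, out);	 /* A; B did not fit */
	assert(next == 3);
	t.devs[2] = "C";				 /* add C at index 2 */
	t.devs[3] = NULL;				 /* del B */
	n += dump_chunk_by_index(&t, &next, 1, out + n); /* nothing at >= 3 */
	return n;
}
```

The first helper yields A, B, C: deleting B cannot shift the restart
position, so a device that existed throughout is not skipped. The second
yields only A, the "never existed" state described above, because C
appeared behind the saved position.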

To avoid any perf regressions keep the hash table for now. Most
net namespaces have very few devices, and microbenchmarking 1M lookups
on Skylake I get the following results (loopback is not counted
in the number of devs):

 #devs | hash |  xa  | delta
    2  | 18.3 | 20.1 | + 9.8%
   16  | 18.3 | 20.1 | + 9.5%
   64  | 18.3 | 26.3 | +43.8%
  128  | 20.4 | 26.3 | +28.6%
  256  | 20.0 | 26.4 | +32.1%
 1024  | 26.6 | 26.7 | + 0.2%
 8192  |541.3 | 33.5 | -93.8%

No surprises, since the hash table has 256 entries.
The microbenchmark scans indexes in order; if the pattern is more
random, xa starts to win at 512 devices already. But that's a lot
of devices, in practice.
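
A back-of-the-envelope sketch of why the hash numbers stay flat up to a
few hundred devices and then degrade. It assumes consecutive ifindexes
and a hash that simply masks the low bits of the ifindex; HASH_ENTRIES
and max_chain_len() are illustrative names, and only the table size of
256 comes from the text above.

```c
#include <assert.h>

#define HASH_ENTRIES 256u	/* table size stated in the text */

static_assert((HASH_ENTRIES & (HASH_ENTRIES - 1)) == 0,
	      "mask-based hashing needs a power-of-two table");

/* Longest chain when `ndevs` consecutive ifindexes are hashed by low bits. */
static unsigned int max_chain_len(unsigned int ndevs)
{
	return (ndevs + HASH_ENTRIES - 1) / HASH_ENTRIES;
}
```

Under these assumptions every bucket holds at most one device up to 256
devices, 4 at 1024 and 32 at 8192, so a lookup at the largest size has to
walk a much longer list.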

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230726185530.2247698-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 11:35:58 -07:00
9p 9p: Add additional debug flags and open modes 2023-03-27 02:33:48 +00:00
bluetooth Bluetooth: coredump: fix building with coredump disabled 2023-07-20 11:25:24 -07:00
caif
iucv
mana Linux 6.4 2023-06-27 14:06:29 -03:00
netfilter netfilter: allow exp not to be removed in nf_ct_find_expectation 2023-07-20 10:06:36 +02:00
netns tcp: get rid of sysctl_tcp_adv_win_scale 2023-07-18 18:41:18 -07:00
nfc
phonet net: ioctl: Use kernel memory on protocol ioctl callbacks 2023-06-15 22:33:26 -07:00
sctp sctp: delete the nested flexible array peer_init 2023-04-21 08:19:30 +01:00
tc_act net/sched: act_connmark: transition to percpu stats and rcu 2023-02-16 10:39:28 +01:00
6lowpan.h
Space.h
act_api.h net/sched: Rename user cookie and act cookie 2023-02-20 16:46:10 -08:00
addrconf.h ipv6: constify inet6_mc_check() 2023-03-17 08:56:37 +00:00
af_ieee802154.h
af_rxrpc.h rxrpc: Fix timeout of a call that hasn't yet been granted a channel 2023-05-01 07:43:19 +01:00
af_unix.h af_unix: preserve const qualifier in unix_sk() 2023-03-18 12:23:33 +00:00
af_vsock.h vsock: support sockmap 2023-03-29 08:19:38 +01:00
ah.h
amt.h
arp.h neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
atmclip.h
ax25.h x25: preserve const qualifier in [a]x25_sk() 2023-03-18 12:23:34 +00:00
ax88796.h
bareudp.h
bond_3ad.h
bond_alb.h
bond_options.h
bonding.h net: bonding: remove kernel-doc comment marker 2023-07-14 20:39:29 -07:00
bpf_sk_storage.h
busy_poll.h
calipso.h
cfg80211-wext.h
cfg80211.h wifi: cfg80211: Retrieve PSD information from RNR AP information 2023-06-21 14:01:29 +02:00
cfg802154.h net: cfg802154: fix kernel-doc notation warnings 2023-07-14 20:39:29 -07:00
checksum.h net: checksum: drop the linux/uaccess.h include 2023-01-27 11:19:46 +00:00
cipso_ipv4.h
cls_cgroup.h
codel.h codel: fix kernel-doc notation warnings 2023-07-14 20:39:29 -07:00
codel_impl.h
codel_qdisc.h
compat.h
datalink.h net: datalink: Remove unused declarations 2023-07-27 17:17:32 -07:00
dcbevent.h
dcbnl.h net: dcb: add helper functions to retrieve PCP and DSCP rewrite maps 2023-01-20 09:33:22 +00:00
devlink.h devlink: fix kernel-doc notation warnings 2023-07-14 20:39:29 -07:00
dropreason-core.h tcp: add TCP_OLD_SEQUENCE drop reason 2023-07-20 12:49:40 +02:00
dropreason.h mac80211: use the new drop reasons infrastructure 2023-04-20 20:20:49 -07:00
dsa.h net: dsa: remove legacy_pre_march2020 detection 2023-07-18 09:47:08 +02:00
dsa_stubs.h net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub 2023-04-09 15:35:49 +01:00
dsfield.h
dst.h net: dst: Switch to rcuref_t reference counting 2023-03-28 18:52:28 -07:00
dst_cache.h
dst_metadata.h xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF 2022-12-05 21:58:27 -08:00
dst_ops.h ipv6: remove max_size check inline with ipv4 2023-01-13 20:59:14 -08:00
erspan.h
esp.h
espintcp.h
ethoc.h
failover.h
fib_notifier.h
fib_rules.h
firewire.h
flow.h ipv4: Drop tos parameter from flowi4_update_output() 2023-06-02 10:52:38 +01:00
flow_dissector.h net: flow_dissector: add support for cfm packets 2023-06-12 17:01:45 -07:00
flow_offload.h net/sched: cls_api: Support hardware miss to tc action 2023-02-20 16:46:10 -08:00
fou.h bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fq.h
fq_impl.h
garp.h
gen_stats.h
genetlink.h
geneve.h
gre.h
gro.h net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
gro_cells.h
gso.h net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
gtp.h
gue.h
handshake.h net/handshake: Enable the SNI extension to work properly 2023-05-24 22:05:24 -07:00
hwbm.h
icmp.h
ieee80211_radiotap.h wifi: iwlwifi: mvm: support U-SIG EHT validate checks 2023-06-14 12:32:19 +02:00
ieee802154_netdev.h mac802154: Handle received BEACON_REQ 2023-03-23 21:51:30 +01:00
if_inet6.h
ife.h
ila.h
inet6_connection_sock.h
inet6_hashtables.h
inet_common.h sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
inet_connection_sock.h
inet_dscp.h
inet_ecn.h
inet_frag.h inet: frags: eliminate kernel-doc warning 2023-07-14 20:39:29 -07:00
inet_hashtables.h tcp: Add TIME_WAIT sockets in bhash2. 2022-12-30 07:25:52 +00:00
inet_sock.h inet: preserve const qualifier in inet_sk() 2023-03-17 08:56:37 +00:00
inet_timewait_sock.h tcp: Add TIME_WAIT sockets in bhash2. 2022-12-30 07:25:52 +00:00
inetpeer.h
ioam6.h
ip.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-05-25 19:57:39 -07:00
ip6_checksum.h
ip6_fib.h ipv6: Remove in6addr_any alternatives. 2023-03-29 08:22:52 +01:00
ip6_route.h IPv6: add extack info for IPv6 address add/delete 2023-07-28 11:01:56 +01:00
ip6_tunnel.h
ip_fib.h
ip_tunnels.h ip_tunnels: Add nexthop ID field to ip_tunnel_key 2023-07-19 10:53:48 +01:00
ip_vs.h ipvs: Correct spelling in comments 2023-04-22 01:39:41 +02:00
ipcomp.h
ipconfig.h
ipv6.h tcp: Reduce chance of collisions in inet6_hashfn(). 2023-07-24 16:52:37 -07:00
ipv6_frag.h
ipv6_stubs.h
iw_handler.h
kcm.h kcm: Send multiple frags in one sendmsg() 2023-06-12 21:13:23 -07:00
l3mdev.h
lag.h
lapb.h
lib80211.h
llc.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h llc: Check netns in llc_estab_match() and llc_listener_match(). 2023-07-20 10:46:28 +02:00
llc_if.h
llc_pdu.h net: llc: fix kernel-doc notation warnings 2023-07-14 20:39:29 -07:00
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
lwtunnel.h
mac80211.h wifi: mac80211: fix documentation config reference 2023-06-21 09:16:57 +02:00
mac802154.h
macsec.h macsec: Use helper macsec_netdev_priv for offload drivers 2023-05-10 11:32:09 +01:00
mctp.h mctp: Reorder fields in 'struct mctp_route' 2023-06-20 20:06:16 -07:00
mctpdevice.h
mip6.h
mld.h
mpls.h
mpls_iptunnel.h
mptcp.h mptcp: remove MPTCP 'ifdef' in TCP SYN cookies 2022-12-12 13:11:24 -08:00
mrp.h
ncsi.h
ndisc.h neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
neighbour.h neighbour: fix unaligned access to pneigh_entry 2023-06-01 21:36:37 -07:00
net_debug.h
net_failover.h
net_namespace.h net: store netdevs in an xarray 2023-07-28 11:35:58 -07:00
net_ratelimit.h
net_trackers.h
netdev_queues.h net: add macro netif_subqueue_completed_wake 2023-04-18 12:59:01 +02:00
netevent.h
netlabel.h
netlink.h netlink: allow be16 and be32 types in all uint policy checks 2023-07-27 13:45:51 +02:00
netprio_cgroup.h
netrom.h
nexthop.h ipv6: remove nexthop_fib6_nh_bh() 2023-05-11 18:07:05 -07:00
nl802154.h ieee802154: Add support for user beaconing requests 2023-01-28 13:51:22 +01:00
nsh.h net: NSH: fix kernel-doc notation warning 2023-07-14 20:39:29 -07:00
p8022.h
page_pool.h net: page_pool: hide page_pool_release_page() 2023-07-21 18:50:18 -07:00
pie.h pie: fix kernel-doc notation warning 2023-07-14 20:39:30 -07:00
ping.h net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294 2023-06-02 09:55:22 +01:00
pkt_cls.h sch_htb: Allow HTB quantum parameter in offload mode 2023-07-21 09:55:53 +01:00
pkt_sched.h net/sched: make psched_mtu() RTNL-less safe 2023-07-12 15:59:33 -07:00
pptp.h
protocol.h
psample.h
psnap.h
raw.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
rawv6.h ipv6: raw: constify raw_v6_match() socket argument 2023-03-17 08:56:37 +00:00
red.h
regulatory.h wifi: cfg80211: fix regulatory disconnect with OCB/NAN 2023-06-19 12:05:29 +02:00
request_sock.h
rose.h
route.h ipv4: Constify the sk parameter of ip_route_output_*(). 2023-07-14 08:27:33 +01:00
rpl.h ipv6: rpl: Remove pskb(_may)?_pull() in ipv6_rpl_srh_rcv(). 2023-06-19 11:32:58 -07:00
rsi_91x.h rsi: remove kernel-doc comment marker 2023-07-14 20:39:30 -07:00
rtnetlink.h
rtnh.h
sch_generic.h bpf: Add fd-based tcx multi-prog infra with link support 2023-07-19 10:07:27 -07:00
scm.h net: scm: introduce and use scm_recv_unix helper 2023-06-27 10:50:22 -07:00
secure_seq.h
seg6.h
seg6_hmac.h
seg6_local.h
selftests.h
slhc_vj.h
smc.h net/smc: Introduce explicit check for v2 support 2023-03-15 08:18:35 +00:00
snmp.h
sock.h ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
sock_reuseport.h
stp.h
strparser.h
switchdev.h net: switchdev: Add a helper to replay objects on a bridge port 2023-07-21 08:54:03 +01:00
tc_wrapper.h net/sched: Retire rsvp classifier 2023-02-16 09:27:07 +01:00
tcp.h mptcp: fix rcv buffer auto-tuning 2023-07-24 16:36:05 -07:00
tcp_states.h
tcx.h bpf: Add fd-based tcx multi-prog infra with link support 2023-07-19 10:07:27 -07:00
timewait_sock.h
tipc.h
tls.h net: tls: make the offload check helper take skb not socket 2023-06-15 09:01:05 +01:00
tls_toe.h
transp_v6.h
tso.h net: tso: inline tso_count_descs() 2022-12-12 15:04:39 -08:00
tun_proto.h
udp.h net: ioctl: Use kernel memory on protocol ioctl callbacks 2023-06-15 22:33:26 -07:00
udp_tunnel.h
udplite.h
vsock_addr.h
vxlan.h vxlan: calculate correct header length for GPE 2023-07-24 09:37:32 +01:00
wext.h
x25.h x25: preserve const qualifier in [a]x25_sk() 2023-03-18 12:23:34 +00:00
x25device.h
xdp.h bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
xdp_priv.h
xdp_sock.h xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path 2023-07-19 09:56:49 -07:00
xdp_sock_drv.h xsk: support mbuf on ZC RX 2023-07-19 09:56:49 -07:00
xfrm.h xfrm: Treat already-verified secpath entries as optional 2023-05-21 09:21:37 +02:00
xsk_buff_pool.h xsk: support mbuf on ZC RX 2023-07-19 09:56:49 -07:00