Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performance
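As an illustration of the knob above (a sketch, not taken from the series
itself): drivers keep testing against the MAX_SKB_FRAGS macro, only its
value now comes from the Kconfig option, e.g.:

/* Hypothetical driver-side check: make sure the TX ring has room for a
 * worst-case skb (its linear part plus up to MAX_SKB_FRAGS page
 * fragments). Only the value of MAX_SKB_FRAGS changes with the new knob.
 */
#include <linux/skbuff.h>

static bool example_tx_ring_has_room(unsigned int free_descs)
{
	return free_descs >= MAX_SKB_FRAGS + 1;
}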
- Reduce compound page head access for zero-copy data transfers
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible
- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimized refcount tracking
- Add lockless access annotations to sk_err[_soft]
- Further optimize the skb struct layout
- Extend the skb drop reasons to make them usable by multiple
subsystems
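For context, a minimal sketch using one of the pre-existing core reason
codes (not the new per-subsystem ones this work enables):

#include <linux/skbuff.h>

/* Free an skb with an explicit drop reason instead of a plain kfree_skb(),
 * so the kfree_skb tracepoint / drop monitor can attribute the drop.
 */
static void example_drop_without_socket(struct sk_buff *skb)
{
	kfree_skb_reason(skb, SKB_DROP_REASON_NO_SOCKET);
}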
- Better const qualifier awareness for socket casts
BPF:
- Add skb and XDP typed dynptrs which allow BPF programs more
ergonomic and less brittle iteration through data and
variable-sized accesses
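A hedged sketch of what skb dynptr usage looks like from a TC program; the
kfunc extern declarations mirror what the kernel is expected to export via
BTF and their exact parameter names are assumptions:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>

extern int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags,
			       struct bpf_dynptr *ptr) __ksym;
extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, __u32 offset,
			      void *buffer, __u32 buffer__sz) __ksym;

SEC("tc")
int drop_low_ttl(struct __sk_buff *skb)
{
	struct bpf_dynptr ptr;
	struct iphdr buf, *iph;

	if (bpf_dynptr_from_skb(skb, 0, &ptr))
		return TC_ACT_OK;

	/* Direct pointer when the bytes are linear, otherwise a copy into
	 * 'buf' -- no manual data/data_end bounds checks needed.
	 */
	iph = bpf_dynptr_slice(&ptr, ETH_HLEN, &buf, sizeof(buf));
	if (iph && iph->ttl <= 1)
		return TC_ACT_SHOT;
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";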
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward
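A minimal sketch of the new netfilter program type; the SEC() name and the
bpf_nf_ctx layout are assumptions based on this series, and attachment goes
through a BPF link carrying pf/hooknum/priority (not shown):

// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#define NF_DROP		0
#define NF_ACCEPT	1

SEC("netfilter")
int accept_everything(struct bpf_nf_ctx *ctx)
{
	/* ctx->state describes the hook, ctx->skb is the packet */
	return NF_ACCEPT;
}

char _license[] SEC("license") = "GPL";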
- Add more precise memory usage reporting for all BPF map types
- Add support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params
- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton
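A sketch of load-time kfunc detection from the BPF side: declare the kfunc
as a weak ksym and gate its use with bpf_ksym_exists() (assumed to come
from libbpf's bpf_helpers.h), so the program still loads on kernels that
lack the kfunc:

// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

extern struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym __weak;
extern void bpf_task_release(struct task_struct *p) __ksym __weak;

SEC("tp_btf/task_newtask")
int BPF_PROG(handle_newtask, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *t;

	if (!bpf_ksym_exists(bpf_task_acquire))
		return 0;	/* kernel too old: skip the optional logic */

	t = bpf_task_acquire(task);
	if (t)
		bpf_task_release(t);
	return 0;
}

char _license[] SEC("license") = "GPL";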
- A bigger batch of BPF verifier improvements to prepare for the
upcoming BPF open-coded iterators, allowing for less restrictive
looping capabilities
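For reference, the numbers iterator this prepares for (its kernel side is
visible in the kernel/bpf/bpf_iter.c hunk of this merge) is used roughly
as below; a sketch, not a verified program:

// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

extern int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) __ksym;
extern int *bpf_iter_num_next(struct bpf_iter_num *it) __ksym;
extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __ksym;

SEC("raw_tp/sys_enter")
int sum_range(void *ctx)
{
	struct bpf_iter_num it;
	int sum = 0, *v;

	bpf_iter_num_new(&it, 0, 100);
	while ((v = bpf_iter_num_next(&it)))
		sum += *v;	/* loop bound is proven via the iterator state */
	bpf_iter_num_destroy(&it);
	return sum;
}

char _license[] SEC("license") = "GPL";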
- Rework RCU enforcement in the verifier, add kptr_rcu and require
BPF programs to NULL-check such pointers before passing them into
kfuncs
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps
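A hedged sketch of stashing a referenced kptr in a percpu hash map value
via the __kptr field tag and bpf_kptr_xchg(); the task kfunc names follow
the task-kptr items below, and the snippet may need tweaks to satisfy the
verifier:

// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct val {
	struct task_struct __kptr *task;
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
	__uint(max_entries, 128);
	__type(key, u32);
	__type(value, struct val);
} stash SEC(".maps");

extern struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
extern void bpf_task_release(struct task_struct *p) __ksym;

SEC("tp_btf/task_newtask")
int BPF_PROG(stash_task, struct task_struct *task, u64 clone_flags)
{
	struct val zero = {}, *v;
	struct task_struct *old;
	u32 key = task->pid;

	bpf_map_update_elem(&stash, &key, &zero, BPF_NOEXIST);
	v = bpf_map_lookup_elem(&stash, &key);
	if (!v)
		return 0;

	/* move ownership of the acquired reference into the map */
	old = bpf_kptr_xchg(&v->task, bpf_task_acquire(task));
	if (old)
		bpf_task_release(old);
	return 0;
}

char _license[] SEC("license") = "GPL";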
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps
- Add verifier support for refcounted local kptrs, allowing shared
ownership; useful for adding a node to both a BPF list and an
rbtree
- Add BPF verifier support for ST instructions in
convert_ctx_access(), which will help the new -mcpu=v4 clang flag
start emitting them
- Add ARM32 USDT support to libbpf
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations
Protocols:
- IPv4: Allow adding a 'protocol' tag to an IPv4 address. Such a
value indicates the provenance of the IP address
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
- Add the handshake upcall mechanism, allowing user space to
implement a generic TLS handshake on the kernel's behalf
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to node failures
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers
- MPTCP: delay the first subflow allocation until its first use. This
will allow for better LSM interaction later on
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter:
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged
- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from the
hop-by-hop extension header. This is needed for BIG TCP support
- The iptables 32bit compat interface isn't compiled in by default
anymore
- Move the ip(6)tables builtin icmp matches to the udptcp one. This
has the advantage that the icmp/icmpv6 match no longer loads the
iptables/ip6tables modules when iptables-nft is used
- Extended netlink error reporting for netdevices in flowtables and
netdev chains. Allow incrementally adding/deleting devices to a
netdev basechain, and allow creating a netdev chain without a device
Driver API:
- Remove redundant Device Control Error Reporting Enable, as the PCI
core already has error reporting enabled at enumeration time
- Move Multicast DB netlink handlers to the core, allowing devices
other than bridge to use them
- Allow the page_pool to directly recycle the pages from safely
localized NAPI
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization
- Add YNL support for user headers and struct attrs
- Add partial YNL specification for devlink
- Add partial YNL specification for ethtool
- Add tc-mqprio and tc-taprio support for preemptible traffic classes
- Add a tx push buf len param to ethtool; it specifies the maximum
number of bytes of a transmitted packet a driver can push directly
to the underlying device
- Add basic LED support for switch/phy
- Add NAPI documentation, stop relying on external links
- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space
- Add transceiver support and improve the error messages for CAN-FD
controllers
New hardware / drivers:
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- CAN:
- STMicroelectronics bxcan stm32f429
Drivers:
- Ethernet NICs:
- Intel (1G, igc):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and ingress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to address ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frames
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates
- Ethernet embedded switches:
- Broadcom (b53):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performance
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firmwares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"
* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...
@@ -6,7 +6,8 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
|
||||
endif
|
||||
CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
|
||||
|
||||
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
|
||||
|
||||
@@ -307,8 +307,8 @@ static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
static long array_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
{
|
||||
struct bpf_array *array = container_of(map, struct bpf_array, map);
|
||||
u32 index = *(u32 *)key;
|
||||
@@ -386,7 +386,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int array_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long array_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
return -EINVAL;
|
||||
}
|
||||
@@ -686,8 +686,8 @@ static const struct bpf_iter_seq_info iter_seq_info = {
|
||||
.seq_priv_size = sizeof(struct bpf_iter_seq_array_map_info),
|
||||
};
|
||||
|
||||
static int bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_fn,
|
||||
void *callback_ctx, u64 flags)
|
||||
static long bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_fn,
|
||||
void *callback_ctx, u64 flags)
|
||||
{
|
||||
u32 i, key, num_elems = 0;
|
||||
struct bpf_array *array;
|
||||
@@ -721,6 +721,28 @@ static int bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_
|
||||
return num_elems;
|
||||
}
|
||||
|
||||
static u64 array_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct bpf_array *array = container_of(map, struct bpf_array, map);
|
||||
bool percpu = map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
|
||||
u32 elem_size = array->elem_size;
|
||||
u64 entries = map->max_entries;
|
||||
u64 usage = sizeof(*array);
|
||||
|
||||
if (percpu) {
|
||||
usage += entries * sizeof(void *);
|
||||
usage += entries * elem_size * num_possible_cpus();
|
||||
} else {
|
||||
if (map->map_flags & BPF_F_MMAPABLE) {
|
||||
usage = PAGE_ALIGN(usage);
|
||||
usage += PAGE_ALIGN(entries * elem_size);
|
||||
} else {
|
||||
usage += entries * elem_size;
|
||||
}
|
||||
}
|
||||
return usage;
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(array_map_btf_ids, struct, bpf_array)
|
||||
const struct bpf_map_ops array_map_ops = {
|
||||
.map_meta_equal = array_map_meta_equal,
|
||||
@@ -742,6 +764,7 @@ const struct bpf_map_ops array_map_ops = {
|
||||
.map_update_batch = generic_map_update_batch,
|
||||
.map_set_for_each_callback_args = map_set_for_each_callback_args,
|
||||
.map_for_each_callback = bpf_for_each_array_elem,
|
||||
.map_mem_usage = array_map_mem_usage,
|
||||
.map_btf_id = &array_map_btf_ids[0],
|
||||
.iter_seq_info = &iter_seq_info,
|
||||
};
|
||||
@@ -762,6 +785,7 @@ const struct bpf_map_ops percpu_array_map_ops = {
|
||||
.map_update_batch = generic_map_update_batch,
|
||||
.map_set_for_each_callback_args = map_set_for_each_callback_args,
|
||||
.map_for_each_callback = bpf_for_each_array_elem,
|
||||
.map_mem_usage = array_map_mem_usage,
|
||||
.map_btf_id = &array_map_btf_ids[0],
|
||||
.iter_seq_info = &iter_seq_info,
|
||||
};
|
||||
@@ -847,7 +871,7 @@ int bpf_fd_array_map_update_elem(struct bpf_map *map, struct file *map_file,
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int fd_array_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long fd_array_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct bpf_array *array = container_of(map, struct bpf_array, map);
|
||||
void *old_ptr;
|
||||
@@ -1156,6 +1180,7 @@ const struct bpf_map_ops prog_array_map_ops = {
|
||||
.map_fd_sys_lookup_elem = prog_fd_array_sys_lookup_elem,
|
||||
.map_release_uref = prog_array_map_clear,
|
||||
.map_seq_show_elem = prog_array_map_seq_show_elem,
|
||||
.map_mem_usage = array_map_mem_usage,
|
||||
.map_btf_id = &array_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -1257,6 +1282,7 @@ const struct bpf_map_ops perf_event_array_map_ops = {
|
||||
.map_fd_put_ptr = perf_event_fd_array_put_ptr,
|
||||
.map_release = perf_event_fd_array_release,
|
||||
.map_check_btf = map_check_no_btf,
|
||||
.map_mem_usage = array_map_mem_usage,
|
||||
.map_btf_id = &array_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -1291,6 +1317,7 @@ const struct bpf_map_ops cgroup_array_map_ops = {
|
||||
.map_fd_get_ptr = cgroup_fd_array_get_ptr,
|
||||
.map_fd_put_ptr = cgroup_fd_array_put_ptr,
|
||||
.map_check_btf = map_check_no_btf,
|
||||
.map_mem_usage = array_map_mem_usage,
|
||||
.map_btf_id = &array_map_btf_ids[0],
|
||||
};
|
||||
#endif
|
||||
@@ -1379,5 +1406,6 @@ const struct bpf_map_ops array_of_maps_map_ops = {
|
||||
.map_lookup_batch = generic_map_lookup_batch,
|
||||
.map_update_batch = generic_map_update_batch,
|
||||
.map_check_btf = map_check_no_btf,
|
||||
.map_mem_usage = array_map_mem_usage,
|
||||
.map_btf_id = &array_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -16,13 +16,6 @@ struct bpf_bloom_filter {
|
||||
struct bpf_map map;
|
||||
u32 bitset_mask;
|
||||
u32 hash_seed;
|
||||
/* If the size of the values in the bloom filter is u32 aligned,
|
||||
* then it is more performant to use jhash2 as the underlying hash
|
||||
* function, else we use jhash. This tracks the number of u32s
|
||||
* in an u32-aligned value size. If the value size is not u32 aligned,
|
||||
* this will be 0.
|
||||
*/
|
||||
u32 aligned_u32_count;
|
||||
u32 nr_hash_funcs;
|
||||
unsigned long bitset[];
|
||||
};
|
||||
@@ -32,16 +25,15 @@ static u32 hash(struct bpf_bloom_filter *bloom, void *value,
|
||||
{
|
||||
u32 h;
|
||||
|
||||
if (bloom->aligned_u32_count)
|
||||
h = jhash2(value, bloom->aligned_u32_count,
|
||||
bloom->hash_seed + index);
|
||||
if (likely(value_size % 4 == 0))
|
||||
h = jhash2(value, value_size / 4, bloom->hash_seed + index);
|
||||
else
|
||||
h = jhash(value, value_size, bloom->hash_seed + index);
|
||||
|
||||
return h & bloom->bitset_mask;
|
||||
}
|
||||
|
||||
static int bloom_map_peek_elem(struct bpf_map *map, void *value)
|
||||
static long bloom_map_peek_elem(struct bpf_map *map, void *value)
|
||||
{
|
||||
struct bpf_bloom_filter *bloom =
|
||||
container_of(map, struct bpf_bloom_filter, map);
|
||||
@@ -56,7 +48,7 @@ static int bloom_map_peek_elem(struct bpf_map *map, void *value)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int bloom_map_push_elem(struct bpf_map *map, void *value, u64 flags)
|
||||
static long bloom_map_push_elem(struct bpf_map *map, void *value, u64 flags)
|
||||
{
|
||||
struct bpf_bloom_filter *bloom =
|
||||
container_of(map, struct bpf_bloom_filter, map);
|
||||
@@ -73,12 +65,12 @@ static int bloom_map_push_elem(struct bpf_map *map, void *value, u64 flags)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int bloom_map_pop_elem(struct bpf_map *map, void *value)
|
||||
static long bloom_map_pop_elem(struct bpf_map *map, void *value)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static int bloom_map_delete_elem(struct bpf_map *map, void *value)
|
||||
static long bloom_map_delete_elem(struct bpf_map *map, void *value)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
@@ -152,11 +144,6 @@ static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
|
||||
bloom->nr_hash_funcs = nr_hash_funcs;
|
||||
bloom->bitset_mask = bitset_mask;
|
||||
|
||||
/* Check whether the value size is u32-aligned */
|
||||
if ((attr->value_size & (sizeof(u32) - 1)) == 0)
|
||||
bloom->aligned_u32_count =
|
||||
attr->value_size / sizeof(u32);
|
||||
|
||||
if (!(attr->map_flags & BPF_F_ZERO_SEED))
|
||||
bloom->hash_seed = get_random_u32();
|
||||
|
||||
@@ -177,8 +164,8 @@ static void *bloom_map_lookup_elem(struct bpf_map *map, void *key)
|
||||
return ERR_PTR(-EINVAL);
|
||||
}
|
||||
|
||||
static int bloom_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 flags)
|
||||
static long bloom_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 flags)
|
||||
{
|
||||
/* The eBPF program should use map_push_elem instead */
|
||||
return -EINVAL;
|
||||
@@ -193,6 +180,17 @@ static int bloom_map_check_btf(const struct bpf_map *map,
|
||||
return btf_type_is_void(key_type) ? 0 : -EINVAL;
|
||||
}
|
||||
|
||||
static u64 bloom_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct bpf_bloom_filter *bloom;
|
||||
u64 bitset_bytes;
|
||||
|
||||
bloom = container_of(map, struct bpf_bloom_filter, map);
|
||||
bitset_bytes = BITS_TO_BYTES((u64)bloom->bitset_mask + 1);
|
||||
bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
|
||||
return sizeof(*bloom) + bitset_bytes;
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(bpf_bloom_map_btf_ids, struct, bpf_bloom_filter)
|
||||
const struct bpf_map_ops bloom_filter_map_ops = {
|
||||
.map_meta_equal = bpf_map_meta_equal,
|
||||
@@ -206,5 +204,6 @@ const struct bpf_map_ops bloom_filter_map_ops = {
|
||||
.map_update_elem = bloom_map_update_elem,
|
||||
.map_delete_elem = bloom_map_delete_elem,
|
||||
.map_check_btf = bloom_map_check_btf,
|
||||
.map_mem_usage = bloom_map_mem_usage,
|
||||
.map_btf_id = &bpf_bloom_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -46,8 +46,6 @@ static struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
|
||||
void bpf_cgrp_storage_free(struct cgroup *cgroup)
|
||||
{
|
||||
struct bpf_local_storage *local_storage;
|
||||
bool free_cgroup_storage = false;
|
||||
unsigned long flags;
|
||||
|
||||
rcu_read_lock();
|
||||
local_storage = rcu_dereference(cgroup->bpf_cgrp_storage);
|
||||
@@ -57,14 +55,9 @@ void bpf_cgrp_storage_free(struct cgroup *cgroup)
|
||||
}
|
||||
|
||||
bpf_cgrp_storage_lock();
|
||||
raw_spin_lock_irqsave(&local_storage->lock, flags);
|
||||
free_cgroup_storage = bpf_local_storage_unlink_nolock(local_storage);
|
||||
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
|
||||
bpf_local_storage_destroy(local_storage);
|
||||
bpf_cgrp_storage_unlock();
|
||||
rcu_read_unlock();
|
||||
|
||||
if (free_cgroup_storage)
|
||||
kfree_rcu(local_storage, rcu);
|
||||
}
|
||||
|
||||
static struct bpf_local_storage_data *
|
||||
@@ -100,8 +93,8 @@ static void *bpf_cgrp_storage_lookup_elem(struct bpf_map *map, void *key)
|
||||
return sdata ? sdata->data : NULL;
|
||||
}
|
||||
|
||||
static int bpf_cgrp_storage_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
static long bpf_cgrp_storage_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
{
|
||||
struct bpf_local_storage_data *sdata;
|
||||
struct cgroup *cgroup;
|
||||
@@ -128,11 +121,11 @@ static int cgroup_storage_delete(struct cgroup *cgroup, struct bpf_map *map)
|
||||
if (!sdata)
|
||||
return -ENOENT;
|
||||
|
||||
bpf_selem_unlink(SELEM(sdata), true);
|
||||
bpf_selem_unlink(SELEM(sdata), false);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int bpf_cgrp_storage_delete_elem(struct bpf_map *map, void *key)
|
||||
static long bpf_cgrp_storage_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct cgroup *cgroup;
|
||||
int err, fd;
|
||||
@@ -156,7 +149,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
|
||||
|
||||
static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
|
||||
{
|
||||
return bpf_local_storage_map_alloc(attr, &cgroup_cache);
|
||||
return bpf_local_storage_map_alloc(attr, &cgroup_cache, true);
|
||||
}
|
||||
|
||||
static void cgroup_storage_map_free(struct bpf_map *map)
|
||||
@@ -221,6 +214,7 @@ const struct bpf_map_ops cgrp_storage_map_ops = {
|
||||
.map_update_elem = bpf_cgrp_storage_update_elem,
|
||||
.map_delete_elem = bpf_cgrp_storage_delete_elem,
|
||||
.map_check_btf = bpf_local_storage_map_check_btf,
|
||||
.map_mem_usage = bpf_local_storage_map_mem_usage,
|
||||
.map_btf_id = &bpf_local_storage_map_btf_id[0],
|
||||
.map_owner_storage_ptr = cgroup_storage_ptr,
|
||||
};
|
||||
@@ -230,7 +224,7 @@ const struct bpf_func_proto bpf_cgrp_storage_get_proto = {
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
|
||||
.arg1_type = ARG_CONST_MAP_PTR,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
|
||||
.arg2_btf_id = &bpf_cgroup_btf_id[0],
|
||||
.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
|
||||
.arg4_type = ARG_ANYTHING,
|
||||
@@ -241,6 +235,6 @@ const struct bpf_func_proto bpf_cgrp_storage_delete_proto = {
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_INTEGER,
|
||||
.arg1_type = ARG_CONST_MAP_PTR,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
|
||||
.arg2_btf_id = &bpf_cgroup_btf_id[0],
|
||||
};
|
||||
|
||||
@@ -57,7 +57,6 @@ static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
|
||||
void bpf_inode_storage_free(struct inode *inode)
|
||||
{
|
||||
struct bpf_local_storage *local_storage;
|
||||
bool free_inode_storage = false;
|
||||
struct bpf_storage_blob *bsb;
|
||||
|
||||
bsb = bpf_inode(inode);
|
||||
@@ -72,13 +71,8 @@ void bpf_inode_storage_free(struct inode *inode)
|
||||
return;
|
||||
}
|
||||
|
||||
raw_spin_lock_bh(&local_storage->lock);
|
||||
free_inode_storage = bpf_local_storage_unlink_nolock(local_storage);
|
||||
raw_spin_unlock_bh(&local_storage->lock);
|
||||
bpf_local_storage_destroy(local_storage);
|
||||
rcu_read_unlock();
|
||||
|
||||
if (free_inode_storage)
|
||||
kfree_rcu(local_storage, rcu);
|
||||
}
|
||||
|
||||
static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
|
||||
@@ -94,8 +88,8 @@ static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
|
||||
return sdata ? sdata->data : NULL;
|
||||
}
|
||||
|
||||
static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
{
|
||||
struct bpf_local_storage_data *sdata;
|
||||
struct fd f = fdget_raw(*(int *)key);
|
||||
@@ -122,12 +116,12 @@ static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
|
||||
if (!sdata)
|
||||
return -ENOENT;
|
||||
|
||||
bpf_selem_unlink(SELEM(sdata), true);
|
||||
bpf_selem_unlink(SELEM(sdata), false);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
|
||||
static long bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct fd f = fdget_raw(*(int *)key);
|
||||
int err;
|
||||
@@ -197,7 +191,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
|
||||
|
||||
static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
|
||||
{
|
||||
return bpf_local_storage_map_alloc(attr, &inode_cache);
|
||||
return bpf_local_storage_map_alloc(attr, &inode_cache, false);
|
||||
}
|
||||
|
||||
static void inode_storage_map_free(struct bpf_map *map)
|
||||
@@ -215,6 +209,7 @@ const struct bpf_map_ops inode_storage_map_ops = {
|
||||
.map_update_elem = bpf_fd_inode_storage_update_elem,
|
||||
.map_delete_elem = bpf_fd_inode_storage_delete_elem,
|
||||
.map_check_btf = bpf_local_storage_map_check_btf,
|
||||
.map_mem_usage = bpf_local_storage_map_mem_usage,
|
||||
.map_btf_id = &bpf_local_storage_map_btf_id[0],
|
||||
.map_owner_storage_ptr = inode_storage_ptr,
|
||||
};
|
||||
@@ -226,7 +221,7 @@ const struct bpf_func_proto bpf_inode_storage_get_proto = {
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
|
||||
.arg1_type = ARG_CONST_MAP_PTR,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
|
||||
.arg2_btf_id = &bpf_inode_storage_btf_ids[0],
|
||||
.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
|
||||
.arg4_type = ARG_ANYTHING,
|
||||
@@ -237,6 +232,6 @@ const struct bpf_func_proto bpf_inode_storage_delete_proto = {
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_INTEGER,
|
||||
.arg1_type = ARG_CONST_MAP_PTR,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
|
||||
.arg2_btf_id = &bpf_inode_storage_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -776,3 +776,73 @@ const struct bpf_func_proto bpf_loop_proto = {
|
||||
.arg3_type = ARG_PTR_TO_STACK_OR_NULL,
|
||||
.arg4_type = ARG_ANYTHING,
|
||||
};
|
||||
|
||||
struct bpf_iter_num_kern {
|
||||
int cur; /* current value, inclusive */
|
||||
int end; /* final value, exclusive */
|
||||
} __aligned(8);
|
||||
|
||||
__diag_push();
|
||||
__diag_ignore_all("-Wmissing-prototypes",
|
||||
"Global functions as their definitions will be in vmlinux BTF");
|
||||
|
||||
__bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end)
|
||||
{
|
||||
struct bpf_iter_num_kern *s = (void *)it;
|
||||
|
||||
BUILD_BUG_ON(sizeof(struct bpf_iter_num_kern) != sizeof(struct bpf_iter_num));
|
||||
BUILD_BUG_ON(__alignof__(struct bpf_iter_num_kern) != __alignof__(struct bpf_iter_num));
|
||||
|
||||
BTF_TYPE_EMIT(struct btf_iter_num);
|
||||
|
||||
/* start == end is legit, it's an empty range and we'll just get NULL
|
||||
* on first (and any subsequent) bpf_iter_num_next() call
|
||||
*/
|
||||
if (start > end) {
|
||||
s->cur = s->end = 0;
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/* avoid overflows, e.g., if start == INT_MIN and end == INT_MAX */
|
||||
if ((s64)end - (s64)start > BPF_MAX_LOOPS) {
|
||||
s->cur = s->end = 0;
|
||||
return -E2BIG;
|
||||
}
|
||||
|
||||
/* user will call bpf_iter_num_next() first,
|
||||
* which will set s->cur to exactly start value;
|
||||
* underflow shouldn't matter
|
||||
*/
|
||||
s->cur = start - 1;
|
||||
s->end = end;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
__bpf_kfunc int *bpf_iter_num_next(struct bpf_iter_num* it)
|
||||
{
|
||||
struct bpf_iter_num_kern *s = (void *)it;
|
||||
|
||||
/* check failed initialization or if we are done (same behavior);
|
||||
* need to be careful about overflow, so convert to s64 for checks,
|
||||
* e.g., if s->cur == s->end == INT_MAX, we can't just do
|
||||
* s->cur + 1 >= s->end
|
||||
*/
|
||||
if ((s64)(s->cur + 1) >= s->end) {
|
||||
s->cur = s->end = 0;
|
||||
return NULL;
|
||||
}
|
||||
|
||||
s->cur++;
|
||||
|
||||
return &s->cur;
|
||||
}
|
||||
|
||||
__bpf_kfunc void bpf_iter_num_destroy(struct bpf_iter_num *it)
|
||||
{
|
||||
struct bpf_iter_num_kern *s = (void *)it;
|
||||
|
||||
s->cur = s->end = 0;
|
||||
}
|
||||
|
||||
__diag_pop();
|
||||
|
||||
@@ -51,11 +51,21 @@ owner_storage(struct bpf_local_storage_map *smap, void *owner)
|
||||
return map->ops->map_owner_storage_ptr(owner);
|
||||
}
|
||||
|
||||
static bool selem_linked_to_storage_lockless(const struct bpf_local_storage_elem *selem)
|
||||
{
|
||||
return !hlist_unhashed_lockless(&selem->snode);
|
||||
}
|
||||
|
||||
static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
|
||||
{
|
||||
return !hlist_unhashed(&selem->snode);
|
||||
}
|
||||
|
||||
static bool selem_linked_to_map_lockless(const struct bpf_local_storage_elem *selem)
|
||||
{
|
||||
return !hlist_unhashed_lockless(&selem->map_node);
|
||||
}
|
||||
|
||||
static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
|
||||
{
|
||||
return !hlist_unhashed(&selem->map_node);
|
||||
@@ -70,11 +80,28 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
|
||||
if (charge_mem && mem_charge(smap, owner, smap->elem_size))
|
||||
return NULL;
|
||||
|
||||
selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
|
||||
gfp_flags | __GFP_NOWARN);
|
||||
if (smap->bpf_ma) {
|
||||
migrate_disable();
|
||||
selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
|
||||
migrate_enable();
|
||||
if (selem)
|
||||
/* Keep the original bpf_map_kzalloc behavior
|
||||
* before started using the bpf_mem_cache_alloc.
|
||||
*
|
||||
* No need to use zero_map_value. The bpf_selem_free()
|
||||
* only does bpf_mem_cache_free when there is
|
||||
* no other bpf prog is using the selem.
|
||||
*/
|
||||
memset(SDATA(selem)->data, 0, smap->map.value_size);
|
||||
} else {
|
||||
selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
|
||||
gfp_flags | __GFP_NOWARN);
|
||||
}
|
||||
|
||||
if (selem) {
|
||||
if (value)
|
||||
copy_map_value(&smap->map, SDATA(selem)->data, value);
|
||||
/* No need to call check_and_init_map_value as memory is zero init */
|
||||
return selem;
|
||||
}
|
||||
|
||||
@@ -84,7 +111,8 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
|
||||
return NULL;
|
||||
}
|
||||
|
||||
void bpf_local_storage_free_rcu(struct rcu_head *rcu)
|
||||
/* rcu tasks trace callback for bpf_ma == false */
|
||||
static void __bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
|
||||
{
|
||||
struct bpf_local_storage *local_storage;
|
||||
|
||||
@@ -98,7 +126,66 @@ void bpf_local_storage_free_rcu(struct rcu_head *rcu)
|
||||
kfree_rcu(local_storage, rcu);
|
||||
}
|
||||
|
||||
static void bpf_selem_free_rcu(struct rcu_head *rcu)
|
||||
static void bpf_local_storage_free_rcu(struct rcu_head *rcu)
|
||||
{
|
||||
struct bpf_local_storage *local_storage;
|
||||
|
||||
local_storage = container_of(rcu, struct bpf_local_storage, rcu);
|
||||
bpf_mem_cache_raw_free(local_storage);
|
||||
}
|
||||
|
||||
static void bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
|
||||
{
|
||||
if (rcu_trace_implies_rcu_gp())
|
||||
bpf_local_storage_free_rcu(rcu);
|
||||
else
|
||||
call_rcu(rcu, bpf_local_storage_free_rcu);
|
||||
}
|
||||
|
||||
/* Handle bpf_ma == false */
|
||||
static void __bpf_local_storage_free(struct bpf_local_storage *local_storage,
|
||||
bool vanilla_rcu)
|
||||
{
|
||||
if (vanilla_rcu)
|
||||
kfree_rcu(local_storage, rcu);
|
||||
else
|
||||
call_rcu_tasks_trace(&local_storage->rcu,
|
||||
__bpf_local_storage_free_trace_rcu);
|
||||
}
|
||||
|
||||
static void bpf_local_storage_free(struct bpf_local_storage *local_storage,
|
||||
struct bpf_local_storage_map *smap,
|
||||
bool bpf_ma, bool reuse_now)
|
||||
{
|
||||
if (!local_storage)
|
||||
return;
|
||||
|
||||
if (!bpf_ma) {
|
||||
__bpf_local_storage_free(local_storage, reuse_now);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!reuse_now) {
|
||||
call_rcu_tasks_trace(&local_storage->rcu,
|
||||
bpf_local_storage_free_trace_rcu);
|
||||
return;
|
||||
}
|
||||
|
||||
if (smap) {
|
||||
migrate_disable();
|
||||
bpf_mem_cache_free(&smap->storage_ma, local_storage);
|
||||
migrate_enable();
|
||||
} else {
|
||||
/* smap could be NULL if the selem that triggered
|
||||
* this 'local_storage' creation had been long gone.
|
||||
* In this case, directly do call_rcu().
|
||||
*/
|
||||
call_rcu(&local_storage->rcu, bpf_local_storage_free_rcu);
|
||||
}
|
||||
}
|
||||
|
||||
/* rcu tasks trace callback for bpf_ma == false */
|
||||
static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
|
||||
{
|
||||
struct bpf_local_storage_elem *selem;
|
||||
|
||||
@@ -109,13 +196,63 @@ static void bpf_selem_free_rcu(struct rcu_head *rcu)
|
||||
kfree_rcu(selem, rcu);
|
||||
}
|
||||
|
||||
/* Handle bpf_ma == false */
|
||||
static void __bpf_selem_free(struct bpf_local_storage_elem *selem,
|
||||
bool vanilla_rcu)
|
||||
{
|
||||
if (vanilla_rcu)
|
||||
kfree_rcu(selem, rcu);
|
||||
else
|
||||
call_rcu_tasks_trace(&selem->rcu, __bpf_selem_free_trace_rcu);
|
||||
}
|
||||
|
||||
static void bpf_selem_free_rcu(struct rcu_head *rcu)
|
||||
{
|
||||
struct bpf_local_storage_elem *selem;
|
||||
|
||||
selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
|
||||
bpf_mem_cache_raw_free(selem);
|
||||
}
|
||||
|
||||
static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
|
||||
{
|
||||
if (rcu_trace_implies_rcu_gp())
|
||||
bpf_selem_free_rcu(rcu);
|
||||
else
|
||||
call_rcu(rcu, bpf_selem_free_rcu);
|
||||
}
|
||||
|
||||
void bpf_selem_free(struct bpf_local_storage_elem *selem,
|
||||
struct bpf_local_storage_map *smap,
|
||||
bool reuse_now)
|
||||
{
|
||||
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
|
||||
|
||||
if (!smap->bpf_ma) {
|
||||
__bpf_selem_free(selem, reuse_now);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!reuse_now) {
|
||||
call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
|
||||
} else {
|
||||
/* Instead of using the vanilla call_rcu(),
|
||||
* bpf_mem_cache_free will be able to reuse selem
|
||||
* immediately.
|
||||
*/
|
||||
migrate_disable();
|
||||
bpf_mem_cache_free(&smap->selem_ma, selem);
|
||||
migrate_enable();
|
||||
}
|
||||
}
|
||||
|
||||
/* local_storage->lock must be held and selem->local_storage == local_storage.
|
||||
* The caller must ensure selem->smap is still valid to be
|
||||
* dereferenced for its smap->elem_size and smap->cache_idx.
|
||||
*/
|
||||
static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
|
||||
struct bpf_local_storage_elem *selem,
|
||||
bool uncharge_mem, bool use_trace_rcu)
|
||||
bool uncharge_mem, bool reuse_now)
|
||||
{
|
||||
struct bpf_local_storage_map *smap;
|
||||
bool free_local_storage;
|
||||
@@ -159,40 +296,75 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
|
||||
SDATA(selem))
|
||||
RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
|
||||
|
||||
if (use_trace_rcu)
|
||||
call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);
|
||||
else
|
||||
kfree_rcu(selem, rcu);
|
||||
bpf_selem_free(selem, smap, reuse_now);
|
||||
|
||||
if (rcu_access_pointer(local_storage->smap) == smap)
|
||||
RCU_INIT_POINTER(local_storage->smap, NULL);
|
||||
|
||||
return free_local_storage;
|
||||
}
|
||||
|
||||
static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
|
||||
bool use_trace_rcu)
|
||||
static bool check_storage_bpf_ma(struct bpf_local_storage *local_storage,
|
||||
struct bpf_local_storage_map *storage_smap,
|
||||
struct bpf_local_storage_elem *selem)
|
||||
{
|
||||
|
||||
struct bpf_local_storage_map *selem_smap;
|
||||
|
||||
/* local_storage->smap may be NULL. If it is, get the bpf_ma
|
||||
* from any selem in the local_storage->list. The bpf_ma of all
|
||||
* local_storage and selem should have the same value
|
||||
* for the same map type.
|
||||
*
|
||||
* If the local_storage->list is already empty, the caller will not
|
||||
* care about the bpf_ma value also because the caller is not
|
||||
* responsibile to free the local_storage.
|
||||
*/
|
||||
|
||||
if (storage_smap)
|
||||
return storage_smap->bpf_ma;
|
||||
|
||||
if (!selem) {
|
||||
struct hlist_node *n;
|
||||
|
||||
n = rcu_dereference_check(hlist_first_rcu(&local_storage->list),
|
||||
bpf_rcu_lock_held());
|
||||
if (!n)
|
||||
return false;
|
||||
|
||||
selem = hlist_entry(n, struct bpf_local_storage_elem, snode);
|
||||
}
|
||||
selem_smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
|
||||
|
||||
return selem_smap->bpf_ma;
|
||||
}
|
||||
|
||||
static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
|
||||
bool reuse_now)
|
||||
{
|
||||
struct bpf_local_storage_map *storage_smap;
|
||||
struct bpf_local_storage *local_storage;
|
||||
bool free_local_storage = false;
|
||||
bool bpf_ma, free_local_storage = false;
|
||||
unsigned long flags;
|
||||
|
||||
if (unlikely(!selem_linked_to_storage(selem)))
|
||||
if (unlikely(!selem_linked_to_storage_lockless(selem)))
|
||||
/* selem has already been unlinked from sk */
|
||||
return;
|
||||
|
||||
local_storage = rcu_dereference_check(selem->local_storage,
|
||||
bpf_rcu_lock_held());
|
||||
storage_smap = rcu_dereference_check(local_storage->smap,
|
||||
bpf_rcu_lock_held());
|
||||
bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
|
||||
|
||||
raw_spin_lock_irqsave(&local_storage->lock, flags);
|
||||
if (likely(selem_linked_to_storage(selem)))
|
||||
free_local_storage = bpf_selem_unlink_storage_nolock(
|
||||
local_storage, selem, true, use_trace_rcu);
|
||||
local_storage, selem, true, reuse_now);
|
||||
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
|
||||
|
||||
if (free_local_storage) {
|
||||
if (use_trace_rcu)
|
||||
call_rcu_tasks_trace(&local_storage->rcu,
|
||||
bpf_local_storage_free_rcu);
|
||||
else
|
||||
kfree_rcu(local_storage, rcu);
|
||||
}
|
||||
if (free_local_storage)
|
||||
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
|
||||
}
|
||||
|
||||
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
|
||||
@@ -202,13 +374,13 @@ void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
|
||||
hlist_add_head_rcu(&selem->snode, &local_storage->list);
|
||||
}
|
||||
|
||||
void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
|
||||
static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
|
||||
{
|
||||
struct bpf_local_storage_map *smap;
|
||||
struct bpf_local_storage_map_bucket *b;
|
||||
unsigned long flags;
|
||||
|
||||
if (unlikely(!selem_linked_to_map(selem)))
|
||||
if (unlikely(!selem_linked_to_map_lockless(selem)))
|
||||
/* selem has already be unlinked from smap */
|
||||
return;
|
||||
|
||||
@@ -232,14 +404,14 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
|
||||
raw_spin_unlock_irqrestore(&b->lock, flags);
|
||||
}
|
||||
|
||||
void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool use_trace_rcu)
|
||||
void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
|
||||
{
|
||||
/* Always unlink from map before unlinking from local_storage
|
||||
* because selem will be freed after successfully unlinked from
|
||||
* the local_storage.
|
||||
*/
|
||||
bpf_selem_unlink_map(selem);
|
||||
__bpf_selem_unlink_storage(selem, use_trace_rcu);
|
||||
bpf_selem_unlink_storage(selem, reuse_now);
|
||||
}
|
||||
|
||||
/* If cacheit_lockit is false, this lookup function is lockless */
|
||||
@@ -312,13 +484,21 @@ int bpf_local_storage_alloc(void *owner,
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
|
||||
gfp_flags | __GFP_NOWARN);
|
||||
if (smap->bpf_ma) {
|
||||
migrate_disable();
|
||||
storage = bpf_mem_cache_alloc_flags(&smap->storage_ma, gfp_flags);
|
||||
migrate_enable();
|
||||
} else {
|
||||
storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
|
||||
gfp_flags | __GFP_NOWARN);
|
||||
}
|
||||
|
||||
if (!storage) {
|
||||
err = -ENOMEM;
|
||||
goto uncharge;
|
||||
}
|
||||
|
||||
RCU_INIT_POINTER(storage->smap, smap);
|
||||
INIT_HLIST_HEAD(&storage->list);
|
||||
raw_spin_lock_init(&storage->lock);
|
||||
storage->owner = owner;
|
||||
@@ -358,7 +538,7 @@ int bpf_local_storage_alloc(void *owner,
|
||||
return 0;
|
||||
|
||||
uncharge:
|
||||
kfree(storage);
|
||||
bpf_local_storage_free(storage, smap, smap->bpf_ma, true);
|
||||
mem_uncharge(smap, owner, sizeof(*storage));
|
||||
return err;
|
||||
}
|
||||
@@ -402,7 +582,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
|
||||
|
||||
err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
|
||||
if (err) {
|
||||
kfree(selem);
|
||||
bpf_selem_free(selem, smap, true);
|
||||
mem_uncharge(smap, owner, smap->elem_size);
|
||||
return ERR_PTR(err);
|
||||
}
|
||||
@@ -420,7 +600,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
|
||||
err = check_flags(old_sdata, map_flags);
|
||||
if (err)
|
||||
return ERR_PTR(err);
|
||||
if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
|
||||
if (old_sdata && selem_linked_to_storage_lockless(SELEM(old_sdata))) {
|
||||
copy_map_value_locked(&smap->map, old_sdata->data,
|
||||
value, false);
|
||||
return old_sdata;
|
||||
@@ -485,7 +665,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
|
||||
if (old_sdata) {
|
||||
bpf_selem_unlink_map(SELEM(old_sdata));
|
||||
bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
|
||||
false, true);
|
||||
false, false);
|
||||
}
|
||||
|
||||
unlock:
|
||||
@@ -496,7 +676,7 @@ unlock_err:
|
||||
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
|
||||
if (selem) {
|
||||
mem_uncharge(smap, owner, smap->elem_size);
|
||||
kfree(selem);
|
||||
bpf_selem_free(selem, smap, true);
|
||||
}
|
||||
return ERR_PTR(err);
|
||||
}
|
||||
@@ -552,40 +732,6 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static struct bpf_local_storage_map *__bpf_local_storage_map_alloc(union bpf_attr *attr)
|
||||
{
|
||||
struct bpf_local_storage_map *smap;
|
||||
unsigned int i;
|
||||
u32 nbuckets;
|
||||
|
||||
smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
|
||||
if (!smap)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
bpf_map_init_from_attr(&smap->map, attr);
|
||||
|
||||
nbuckets = roundup_pow_of_two(num_possible_cpus());
|
||||
/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
|
||||
nbuckets = max_t(u32, 2, nbuckets);
|
||||
smap->bucket_log = ilog2(nbuckets);
|
||||
|
||||
smap->buckets = bpf_map_kvcalloc(&smap->map, sizeof(*smap->buckets),
|
||||
nbuckets, GFP_USER | __GFP_NOWARN);
|
||||
if (!smap->buckets) {
|
||||
bpf_map_area_free(smap);
|
||||
return ERR_PTR(-ENOMEM);
|
||||
}
|
||||
|
||||
for (i = 0; i < nbuckets; i++) {
|
||||
INIT_HLIST_HEAD(&smap->buckets[i].list);
|
||||
raw_spin_lock_init(&smap->buckets[i].lock);
|
||||
}
|
||||
|
||||
smap->elem_size = offsetof(struct bpf_local_storage_elem,
|
||||
sdata.data[attr->value_size]);
|
||||
|
||||
return smap;
|
||||
}
|
||||
|
||||
int bpf_local_storage_map_check_btf(const struct bpf_map *map,
|
||||
const struct btf *btf,
|
||||
const struct btf_type *key_type,
|
||||
@@ -603,11 +749,16 @@ int bpf_local_storage_map_check_btf(const struct bpf_map *map,
|
||||
return 0;
|
||||
}
|
||||
|
||||
bool bpf_local_storage_unlink_nolock(struct bpf_local_storage *local_storage)
|
||||
void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
|
||||
{
|
||||
struct bpf_local_storage_map *storage_smap;
|
||||
struct bpf_local_storage_elem *selem;
|
||||
bool free_storage = false;
|
||||
bool bpf_ma, free_storage = false;
|
||||
struct hlist_node *n;
|
||||
unsigned long flags;
|
||||
|
||||
storage_smap = rcu_dereference_check(local_storage->smap, bpf_rcu_lock_held());
|
||||
bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, NULL);
|
||||
|
||||
/* Neither the bpf_prog nor the bpf_map's syscall
|
||||
* could be modifying the local_storage->list now.
|
||||
@@ -618,6 +769,7 @@ bool bpf_local_storage_unlink_nolock(struct bpf_local_storage *local_storage)
|
||||
* when unlinking elem from the local_storage->list and
|
||||
* the map's bucket->list.
|
||||
*/
|
||||
raw_spin_lock_irqsave(&local_storage->lock, flags);
|
||||
hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
|
||||
/* Always unlink from map before unlinking from
|
||||
* local_storage.
|
||||
@@ -630,24 +782,89 @@ bool bpf_local_storage_unlink_nolock(struct bpf_local_storage *local_storage)
|
||||
* of the loop will set the free_cgroup_storage to true.
|
||||
*/
|
||||
free_storage = bpf_selem_unlink_storage_nolock(
|
||||
local_storage, selem, false, false);
|
||||
local_storage, selem, false, true);
|
||||
}
|
||||
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
|
||||
|
||||
return free_storage;
|
||||
if (free_storage)
|
||||
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, true);
|
||||
}
|
||||
|
||||
u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct bpf_local_storage_map *smap = (struct bpf_local_storage_map *)map;
|
||||
u64 usage = sizeof(*smap);
|
||||
|
||||
/* The dynamically callocated selems are not counted currently. */
|
||||
usage += sizeof(*smap->buckets) * (1ULL << smap->bucket_log);
|
||||
return usage;
|
||||
}
|
||||
|
||||
/* When bpf_ma == true, the bpf_mem_alloc is used to allocate and free memory.
|
||||
* A deadlock free allocator is useful for storage that the bpf prog can easily
|
||||
* get a hold of the owner PTR_TO_BTF_ID in any context. eg. bpf_get_current_task_btf.
|
||||
* The task and cgroup storage fall into this case. The bpf_mem_alloc reuses
|
||||
* memory immediately. To be reuse-immediate safe, the owner destruction
|
||||
* code path needs to go through a rcu grace period before calling
|
||||
* bpf_local_storage_destroy().
|
||||
*
|
||||
* When bpf_ma == false, the kmalloc and kfree are used.
|
||||
*/
|
||||
struct bpf_map *
|
||||
bpf_local_storage_map_alloc(union bpf_attr *attr,
|
||||
struct bpf_local_storage_cache *cache)
|
||||
struct bpf_local_storage_cache *cache,
|
||||
bool bpf_ma)
|
||||
{
|
||||
struct bpf_local_storage_map *smap;
|
||||
unsigned int i;
|
||||
u32 nbuckets;
|
||||
int err;
|
||||
|
||||
smap = __bpf_local_storage_map_alloc(attr);
|
||||
if (IS_ERR(smap))
|
||||
return ERR_CAST(smap);
|
||||
smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
|
||||
if (!smap)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
bpf_map_init_from_attr(&smap->map, attr);
|
||||
|
||||
nbuckets = roundup_pow_of_two(num_possible_cpus());
|
||||
/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
|
||||
nbuckets = max_t(u32, 2, nbuckets);
|
||||
smap->bucket_log = ilog2(nbuckets);
|
||||
|
||||
smap->buckets = bpf_map_kvcalloc(&smap->map, sizeof(*smap->buckets),
|
||||
nbuckets, GFP_USER | __GFP_NOWARN);
|
||||
if (!smap->buckets) {
|
||||
err = -ENOMEM;
|
||||
goto free_smap;
|
||||
}
|
||||
|
||||
for (i = 0; i < nbuckets; i++) {
|
||||
INIT_HLIST_HEAD(&smap->buckets[i].list);
|
||||
raw_spin_lock_init(&smap->buckets[i].lock);
|
||||
}
|
||||
|
||||
smap->elem_size = offsetof(struct bpf_local_storage_elem,
|
||||
sdata.data[attr->value_size]);
|
||||
|
||||
smap->bpf_ma = bpf_ma;
|
||||
if (bpf_ma) {
|
||||
err = bpf_mem_alloc_init(&smap->selem_ma, smap->elem_size, false);
|
||||
if (err)
|
||||
goto free_smap;
|
||||
|
||||
err = bpf_mem_alloc_init(&smap->storage_ma, sizeof(struct bpf_local_storage), false);
|
||||
if (err) {
|
||||
bpf_mem_alloc_destroy(&smap->selem_ma);
|
||||
goto free_smap;
|
||||
}
|
||||
}
|
||||
|
||||
smap->cache_idx = bpf_local_storage_cache_idx_get(cache);
|
||||
return &smap->map;
|
||||
|
||||
free_smap:
|
||||
kvfree(smap->buckets);
|
||||
bpf_map_area_free(smap);
|
||||
return ERR_PTR(err);
|
||||
}
|
||||
|
||||
void bpf_local_storage_map_free(struct bpf_map *map,
|
||||
@@ -689,7 +906,7 @@ void bpf_local_storage_map_free(struct bpf_map *map,
|
||||
migrate_disable();
|
||||
this_cpu_inc(*busy_counter);
|
||||
}
|
||||
bpf_selem_unlink(selem, false);
|
||||
bpf_selem_unlink(selem, true);
|
||||
if (busy_counter) {
|
||||
this_cpu_dec(*busy_counter);
|
||||
migrate_enable();
|
||||
@@ -713,6 +930,10 @@ void bpf_local_storage_map_free(struct bpf_map *map,
|
||||
*/
|
||||
synchronize_rcu();
|
||||
|
||||
if (smap->bpf_ma) {
|
||||
bpf_mem_alloc_destroy(&smap->selem_ma);
|
||||
bpf_mem_alloc_destroy(&smap->storage_ma);
|
||||
}
|
||||
kvfree(smap->buckets);
|
||||
bpf_map_area_free(smap);
|
||||
}
|
||||
|
||||
@@ -11,11 +11,13 @@
|
||||
#include <linux/refcount.h>
|
||||
#include <linux/mutex.h>
|
||||
#include <linux/btf_ids.h>
|
||||
#include <linux/rcupdate_wait.h>
|
||||
|
||||
enum bpf_struct_ops_state {
|
||||
BPF_STRUCT_OPS_STATE_INIT,
|
||||
BPF_STRUCT_OPS_STATE_INUSE,
|
||||
BPF_STRUCT_OPS_STATE_TOBEFREE,
|
||||
BPF_STRUCT_OPS_STATE_READY,
|
||||
};
|
||||
|
||||
#define BPF_STRUCT_OPS_COMMON_VALUE \
|
||||
@@ -58,6 +60,13 @@ struct bpf_struct_ops_map {
|
||||
struct bpf_struct_ops_value kvalue;
|
||||
};
|
||||
|
||||
struct bpf_struct_ops_link {
|
||||
struct bpf_link link;
|
||||
struct bpf_map __rcu *map;
|
||||
};
|
||||
|
||||
static DEFINE_MUTEX(update_mutex);
|
||||
|
||||
#define VALUE_PREFIX "bpf_struct_ops_"
|
||||
#define VALUE_PREFIX_LEN (sizeof(VALUE_PREFIX) - 1)
|
||||
|
||||
@@ -249,6 +258,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
|
||||
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
|
||||
struct bpf_struct_ops_value *uvalue, *kvalue;
|
||||
enum bpf_struct_ops_state state;
|
||||
s64 refcnt;
|
||||
|
||||
if (unlikely(*(u32 *)key != 0))
|
||||
return -ENOENT;
|
||||
@@ -267,7 +277,14 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
|
||||
uvalue = value;
|
||||
memcpy(uvalue, st_map->uvalue, map->value_size);
|
||||
uvalue->state = state;
|
||||
refcount_set(&uvalue->refcnt, refcount_read(&kvalue->refcnt));
|
||||
|
||||
/* This value offers the user space a general estimate of how
|
||||
* many sockets are still utilizing this struct_ops for TCP
|
||||
* congestion control. The number might not be exact, but it
|
||||
* should sufficiently meet our present goals.
|
||||
*/
|
||||
refcnt = atomic64_read(&map->refcnt) - atomic64_read(&map->usercnt);
|
||||
refcount_set(&uvalue->refcnt, max_t(s64, refcnt, 0));
|
||||
|
||||
return 0;
|
||||
}
|
||||
@@ -349,8 +366,8 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
|
||||
model, flags, tlinks, NULL);
|
||||
}
|
||||
|
||||
static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 flags)
|
||||
static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 flags)
|
||||
{
|
||||
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
|
||||
const struct bpf_struct_ops *st_ops = st_map->st_ops;
|
||||
@@ -491,12 +508,29 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
|
||||
*(unsigned long *)(udata + moff) = prog->aux->id;
|
||||
}
|
||||
|
||||
refcount_set(&kvalue->refcnt, 1);
|
||||
bpf_map_inc(map);
|
||||
if (st_map->map.map_flags & BPF_F_LINK) {
|
||||
err = st_ops->validate(kdata);
|
||||
if (err)
|
||||
goto reset_unlock;
|
||||
set_memory_rox((long)st_map->image, 1);
|
||||
/* Let bpf_link handle registration & unregistration.
|
||||
*
|
||||
* Pair with smp_load_acquire() during lookup_elem().
|
||||
*/
|
||||
smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_READY);
|
||||
goto unlock;
|
||||
}
|
||||
|
||||
set_memory_rox((long)st_map->image, 1);
|
||||
err = st_ops->reg(kdata);
|
||||
if (likely(!err)) {
|
||||
/* This refcnt increment on the map here after
|
||||
* 'st_ops->reg()' is secure since the state of the
|
||||
* map must be set to INIT at this moment, and thus
|
||||
* bpf_struct_ops_map_delete_elem() can't unregister
|
||||
* or transition it to TOBEFREE concurrently.
|
||||
*/
|
||||
bpf_map_inc(map);
|
||||
/* Pair with smp_load_acquire() during lookup_elem().
|
||||
* It ensures the above udata updates (e.g. prog->aux->id)
|
||||
* can be seen once BPF_STRUCT_OPS_STATE_INUSE is set.
|
||||
@@ -512,7 +546,6 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
|
||||
*/
|
||||
set_memory_nx((long)st_map->image, 1);
|
||||
set_memory_rw((long)st_map->image, 1);
|
||||
bpf_map_put(map);
|
||||
|
||||
reset_unlock:
|
||||
bpf_struct_ops_map_put_progs(st_map);
|
||||
@@ -524,20 +557,22 @@ unlock:
|
||||
return err;
|
||||
}
|
||||
|
||||
static int bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
enum bpf_struct_ops_state prev_state;
|
||||
struct bpf_struct_ops_map *st_map;
|
||||
|
||||
st_map = (struct bpf_struct_ops_map *)map;
|
||||
if (st_map->map.map_flags & BPF_F_LINK)
|
||||
return -EOPNOTSUPP;
|
||||
|
||||
prev_state = cmpxchg(&st_map->kvalue.state,
|
||||
BPF_STRUCT_OPS_STATE_INUSE,
|
||||
BPF_STRUCT_OPS_STATE_TOBEFREE);
|
||||
switch (prev_state) {
|
||||
case BPF_STRUCT_OPS_STATE_INUSE:
|
||||
st_map->st_ops->unreg(&st_map->kvalue.data);
|
||||
if (refcount_dec_and_test(&st_map->kvalue.refcnt))
|
||||
bpf_map_put(map);
|
||||
bpf_map_put(map);
|
||||
return 0;
|
||||
case BPF_STRUCT_OPS_STATE_TOBEFREE:
|
||||
return -EINPROGRESS;
|
||||
@@ -570,7 +605,7 @@ static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key,
|
||||
kfree(value);
|
||||
}
|
||||
|
||||
static void bpf_struct_ops_map_free(struct bpf_map *map)
|
||||
static void __bpf_struct_ops_map_free(struct bpf_map *map)
|
||||
{
|
||||
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
|
||||
|
||||
@@ -582,10 +617,32 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
|
||||
bpf_map_area_free(st_map);
|
||||
}
|
||||
|
||||
static void bpf_struct_ops_map_free(struct bpf_map *map)
|
||||
{
|
||||
/* The struct_ops's function may switch to another struct_ops.
|
||||
*
|
||||
* For example, bpf_tcp_cc_x->init() may switch to
|
||||
* another tcp_cc_y by calling
|
||||
* setsockopt(TCP_CONGESTION, "tcp_cc_y").
|
||||
* During the switch, bpf_struct_ops_put(tcp_cc_x) is called
|
||||
* and its refcount may reach 0 which then free its
|
||||
* trampoline image while tcp_cc_x is still running.
|
||||
*
|
||||
* A vanilla rcu gp is to wait for all bpf-tcp-cc prog
|
||||
* to finish. bpf-tcp-cc prog is non sleepable.
|
||||
* A rcu_tasks gp is to wait for the last few insn
|
||||
* in the tramopline image to finish before releasing
|
||||
* the trampoline image.
|
||||
*/
|
||||
synchronize_rcu_mult(call_rcu, call_rcu_tasks);
|
||||
|
||||
__bpf_struct_ops_map_free(map);
|
||||
}
|
||||
|
||||
static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr)
|
||||
{
|
||||
if (attr->key_size != sizeof(unsigned int) || attr->max_entries != 1 ||
|
||||
attr->map_flags || !attr->btf_vmlinux_value_type_id)
|
||||
(attr->map_flags & ~BPF_F_LINK) || !attr->btf_vmlinux_value_type_id)
|
||||
return -EINVAL;
|
||||
return 0;
|
||||
}
|
||||
@@ -609,6 +666,9 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
|
||||
if (attr->value_size != vt->size)
|
||||
return ERR_PTR(-EINVAL);
|
||||
|
||||
if (attr->map_flags & BPF_F_LINK && (!st_ops->validate || !st_ops->update))
|
||||
return ERR_PTR(-EOPNOTSUPP);
|
||||
|
||||
t = st_ops->type;
|
||||
|
||||
st_map_size = sizeof(*st_map) +
|
||||
@@ -630,7 +690,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
|
||||
NUMA_NO_NODE);
|
||||
st_map->image = bpf_jit_alloc_exec(PAGE_SIZE);
|
||||
if (!st_map->uvalue || !st_map->links || !st_map->image) {
|
||||
bpf_struct_ops_map_free(map);
|
||||
__bpf_struct_ops_map_free(map);
|
||||
return ERR_PTR(-ENOMEM);
|
||||
}
|
||||
|
||||
@@ -641,6 +701,21 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
|
||||
return map;
|
||||
}
|
||||
|
||||
static u64 bpf_struct_ops_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
|
||||
const struct bpf_struct_ops *st_ops = st_map->st_ops;
|
||||
const struct btf_type *vt = st_ops->value_type;
|
||||
u64 usage;
|
||||
|
||||
usage = sizeof(*st_map) +
|
||||
vt->size - sizeof(struct bpf_struct_ops_value);
|
||||
usage += vt->size;
|
||||
usage += btf_type_vlen(vt) * sizeof(struct bpf_links *);
|
||||
usage += PAGE_SIZE;
|
||||
return usage;
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(bpf_struct_ops_map_btf_ids, struct, bpf_struct_ops_map)
|
||||
const struct bpf_map_ops bpf_struct_ops_map_ops = {
|
||||
.map_alloc_check = bpf_struct_ops_map_alloc_check,
|
||||
@@ -651,6 +726,7 @@ const struct bpf_map_ops bpf_struct_ops_map_ops = {
|
||||
.map_delete_elem = bpf_struct_ops_map_delete_elem,
|
||||
.map_update_elem = bpf_struct_ops_map_update_elem,
|
||||
.map_seq_show_elem = bpf_struct_ops_map_seq_show_elem,
|
||||
.map_mem_usage = bpf_struct_ops_map_mem_usage,
|
||||
.map_btf_id = &bpf_struct_ops_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -660,41 +736,175 @@ const struct bpf_map_ops bpf_struct_ops_map_ops = {
|
||||
bool bpf_struct_ops_get(const void *kdata)
|
||||
{
|
||||
struct bpf_struct_ops_value *kvalue;
|
||||
struct bpf_struct_ops_map *st_map;
|
||||
struct bpf_map *map;
|
||||
|
||||
kvalue = container_of(kdata, struct bpf_struct_ops_value, data);
|
||||
st_map = container_of(kvalue, struct bpf_struct_ops_map, kvalue);
|
||||
|
||||
return refcount_inc_not_zero(&kvalue->refcnt);
|
||||
}
|
||||
|
||||
static void bpf_struct_ops_put_rcu(struct rcu_head *head)
|
||||
{
|
||||
struct bpf_struct_ops_map *st_map;
|
||||
|
||||
st_map = container_of(head, struct bpf_struct_ops_map, rcu);
|
||||
bpf_map_put(&st_map->map);
|
||||
map = __bpf_map_inc_not_zero(&st_map->map, false);
|
||||
return !IS_ERR(map);
|
||||
}
|
||||
|
||||
void bpf_struct_ops_put(const void *kdata)
|
||||
{
|
||||
struct bpf_struct_ops_value *kvalue;
|
||||
struct bpf_struct_ops_map *st_map;
|
||||
|
||||
kvalue = container_of(kdata, struct bpf_struct_ops_value, data);
|
||||
if (refcount_dec_and_test(&kvalue->refcnt)) {
|
||||
struct bpf_struct_ops_map *st_map;
|
||||
st_map = container_of(kvalue, struct bpf_struct_ops_map, kvalue);
|
||||
|
||||
st_map = container_of(kvalue, struct bpf_struct_ops_map,
|
||||
kvalue);
|
||||
/* The struct_ops's function may switch to another struct_ops.
|
||||
*
|
||||
* For example, bpf_tcp_cc_x->init() may switch to
|
||||
* another tcp_cc_y by calling
|
||||
* setsockopt(TCP_CONGESTION, "tcp_cc_y").
|
||||
* During the switch, bpf_struct_ops_put(tcp_cc_x) is called
|
||||
* and its map->refcnt may reach 0 which then frees its
|
||||
* trampoline image while tcp_cc_x is still running.
|
||||
*
|
||||
* Thus, a rcu grace period is needed here.
|
||||
*/
|
||||
call_rcu(&st_map->rcu, bpf_struct_ops_put_rcu);
|
||||
}
|
||||
bpf_map_put(&st_map->map);
|
||||
}
|
||||
|
||||
static bool bpf_struct_ops_valid_to_reg(struct bpf_map *map)
|
||||
{
|
||||
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
|
||||
|
||||
return map->map_type == BPF_MAP_TYPE_STRUCT_OPS &&
|
||||
map->map_flags & BPF_F_LINK &&
|
||||
/* Pair with smp_store_release() during map_update */
|
||||
smp_load_acquire(&st_map->kvalue.state) == BPF_STRUCT_OPS_STATE_READY;
|
||||
}
|
||||
|
||||
static void bpf_struct_ops_map_link_dealloc(struct bpf_link *link)
|
||||
{
|
||||
struct bpf_struct_ops_link *st_link;
|
||||
struct bpf_struct_ops_map *st_map;
|
||||
|
||||
st_link = container_of(link, struct bpf_struct_ops_link, link);
|
||||
st_map = (struct bpf_struct_ops_map *)
|
||||
rcu_dereference_protected(st_link->map, true);
|
||||
if (st_map) {
|
||||
/* st_link->map can be NULL if
|
||||
* bpf_struct_ops_link_create() fails to register.
|
||||
*/
|
||||
st_map->st_ops->unreg(&st_map->kvalue.data);
|
||||
bpf_map_put(&st_map->map);
|
||||
}
|
||||
kfree(st_link);
|
||||
}
|
||||
|
||||
static void bpf_struct_ops_map_link_show_fdinfo(const struct bpf_link *link,
|
||||
struct seq_file *seq)
|
||||
{
|
||||
struct bpf_struct_ops_link *st_link;
|
||||
struct bpf_map *map;
|
||||
|
||||
st_link = container_of(link, struct bpf_struct_ops_link, link);
|
||||
rcu_read_lock();
|
||||
map = rcu_dereference(st_link->map);
|
||||
seq_printf(seq, "map_id:\t%d\n", map->id);
|
||||
rcu_read_unlock();
|
||||
}
|
||||
|
||||
static int bpf_struct_ops_map_link_fill_link_info(const struct bpf_link *link,
|
||||
struct bpf_link_info *info)
|
||||
{
|
||||
struct bpf_struct_ops_link *st_link;
|
||||
struct bpf_map *map;
|
||||
|
||||
st_link = container_of(link, struct bpf_struct_ops_link, link);
|
||||
rcu_read_lock();
|
||||
map = rcu_dereference(st_link->map);
|
||||
info->struct_ops.map_id = map->id;
|
||||
rcu_read_unlock();
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int bpf_struct_ops_map_link_update(struct bpf_link *link, struct bpf_map *new_map,
|
||||
struct bpf_map *expected_old_map)
|
||||
{
|
||||
struct bpf_struct_ops_map *st_map, *old_st_map;
|
||||
struct bpf_map *old_map;
|
||||
struct bpf_struct_ops_link *st_link;
|
||||
int err = 0;
|
||||
|
||||
st_link = container_of(link, struct bpf_struct_ops_link, link);
|
||||
st_map = container_of(new_map, struct bpf_struct_ops_map, map);
|
||||
|
||||
if (!bpf_struct_ops_valid_to_reg(new_map))
|
||||
return -EINVAL;
|
||||
|
||||
mutex_lock(&update_mutex);
|
||||
|
||||
old_map = rcu_dereference_protected(st_link->map, lockdep_is_held(&update_mutex));
|
||||
if (expected_old_map && old_map != expected_old_map) {
|
||||
err = -EPERM;
|
||||
goto err_out;
|
||||
}
|
||||
|
||||
old_st_map = container_of(old_map, struct bpf_struct_ops_map, map);
|
||||
/* The new and old struct_ops must be the same type. */
|
||||
if (st_map->st_ops != old_st_map->st_ops) {
|
||||
err = -EINVAL;
|
||||
goto err_out;
|
||||
}
|
||||
|
||||
err = st_map->st_ops->update(st_map->kvalue.data, old_st_map->kvalue.data);
|
||||
if (err)
|
||||
goto err_out;
|
||||
|
||||
bpf_map_inc(new_map);
|
||||
rcu_assign_pointer(st_link->map, new_map);
|
||||
bpf_map_put(old_map);
|
||||
|
||||
err_out:
|
||||
mutex_unlock(&update_mutex);
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
static const struct bpf_link_ops bpf_struct_ops_map_lops = {
|
||||
.dealloc = bpf_struct_ops_map_link_dealloc,
|
||||
.show_fdinfo = bpf_struct_ops_map_link_show_fdinfo,
|
||||
.fill_link_info = bpf_struct_ops_map_link_fill_link_info,
|
||||
.update_map = bpf_struct_ops_map_link_update,
|
||||
};
|
||||
|
||||
int bpf_struct_ops_link_create(union bpf_attr *attr)
|
||||
{
|
||||
struct bpf_struct_ops_link *link = NULL;
|
||||
struct bpf_link_primer link_primer;
|
||||
struct bpf_struct_ops_map *st_map;
|
||||
struct bpf_map *map;
|
||||
int err;
|
||||
|
||||
map = bpf_map_get(attr->link_create.map_fd);
|
||||
if (IS_ERR(map))
|
||||
return PTR_ERR(map);
|
||||
|
||||
st_map = (struct bpf_struct_ops_map *)map;
|
||||
|
||||
if (!bpf_struct_ops_valid_to_reg(map)) {
|
||||
err = -EINVAL;
|
||||
goto err_out;
|
||||
}
|
||||
|
||||
link = kzalloc(sizeof(*link), GFP_USER);
|
||||
if (!link) {
|
||||
err = -ENOMEM;
|
||||
goto err_out;
|
||||
}
|
||||
bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS, &bpf_struct_ops_map_lops, NULL);
|
||||
|
||||
err = bpf_link_prime(&link->link, &link_primer);
|
||||
if (err)
|
||||
goto err_out;
|
||||
|
||||
err = st_map->st_ops->reg(st_map->kvalue.data);
|
||||
if (err) {
|
||||
bpf_link_cleanup(&link_primer);
|
||||
link = NULL;
|
||||
goto err_out;
|
||||
}
|
||||
RCU_INIT_POINTER(link->map, map);
|
||||
|
||||
return bpf_link_settle(&link_primer);
|
||||
|
||||
err_out:
|
||||
bpf_map_put(map);
|
||||
kfree(link);
|
||||
return err;
|
||||
}
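
For context, the new BPF_F_LINK struct_ops flow above is driven from user space roughly as in the sketch below; the libbpf call names and the skeleton/map names (skel->maps.cc_v1, cc_v2) are assumptions for illustration, not part of this diff.

    /* Sketch: attach a struct_ops map created with BPF_F_LINK, then
     * atomically replace it with a compatible map of the same st_ops
     * type, which lands in bpf_struct_ops_map_link_update() above. */
    struct bpf_link *link;
    int err;

    link = bpf_map__attach_struct_ops(skel->maps.cc_v1);
    if (!link)
        return -errno;

    err = bpf_link__update_map(link, skel->maps.cc_v2);
    if (err)
        fprintf(stderr, "struct_ops update failed: %d\n", err);
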
|
||||
|
||||
|
||||
@@ -72,8 +72,6 @@ task_storage_lookup(struct task_struct *task, struct bpf_map *map,
|
||||
void bpf_task_storage_free(struct task_struct *task)
|
||||
{
|
||||
struct bpf_local_storage *local_storage;
|
||||
bool free_task_storage = false;
|
||||
unsigned long flags;
|
||||
|
||||
rcu_read_lock();
|
||||
|
||||
@@ -84,14 +82,9 @@ void bpf_task_storage_free(struct task_struct *task)
|
||||
}
|
||||
|
||||
bpf_task_storage_lock();
|
||||
raw_spin_lock_irqsave(&local_storage->lock, flags);
|
||||
free_task_storage = bpf_local_storage_unlink_nolock(local_storage);
|
||||
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
|
||||
bpf_local_storage_destroy(local_storage);
|
||||
bpf_task_storage_unlock();
|
||||
rcu_read_unlock();
|
||||
|
||||
if (free_task_storage)
|
||||
kfree_rcu(local_storage, rcu);
|
||||
}
|
||||
|
||||
static void *bpf_pid_task_storage_lookup_elem(struct bpf_map *map, void *key)
|
||||
@@ -127,8 +120,8 @@ out:
|
||||
return ERR_PTR(err);
|
||||
}
|
||||
|
||||
static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
static long bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
{
|
||||
struct bpf_local_storage_data *sdata;
|
||||
struct task_struct *task;
|
||||
@@ -175,12 +168,12 @@ static int task_storage_delete(struct task_struct *task, struct bpf_map *map,
|
||||
if (!nobusy)
|
||||
return -EBUSY;
|
||||
|
||||
bpf_selem_unlink(SELEM(sdata), true);
|
||||
bpf_selem_unlink(SELEM(sdata), false);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
|
||||
static long bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct task_struct *task;
|
||||
unsigned int f_flags;
|
||||
@@ -316,7 +309,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
|
||||
|
||||
static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
|
||||
{
|
||||
return bpf_local_storage_map_alloc(attr, &task_cache);
|
||||
return bpf_local_storage_map_alloc(attr, &task_cache, true);
|
||||
}
|
||||
|
||||
static void task_storage_map_free(struct bpf_map *map)
|
||||
@@ -335,6 +328,7 @@ const struct bpf_map_ops task_storage_map_ops = {
|
||||
.map_update_elem = bpf_pid_task_storage_update_elem,
|
||||
.map_delete_elem = bpf_pid_task_storage_delete_elem,
|
||||
.map_check_btf = bpf_local_storage_map_check_btf,
|
||||
.map_mem_usage = bpf_local_storage_map_mem_usage,
|
||||
.map_btf_id = &bpf_local_storage_map_btf_id[0],
|
||||
.map_owner_storage_ptr = task_storage_ptr,
|
||||
};
|
||||
@@ -344,7 +338,7 @@ const struct bpf_func_proto bpf_task_storage_get_recur_proto = {
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
|
||||
.arg1_type = ARG_CONST_MAP_PTR,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
|
||||
.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
|
||||
.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
|
||||
.arg4_type = ARG_ANYTHING,
|
||||
@@ -355,7 +349,7 @@ const struct bpf_func_proto bpf_task_storage_get_proto = {
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
|
||||
.arg1_type = ARG_CONST_MAP_PTR,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
|
||||
.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
|
||||
.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
|
||||
.arg4_type = ARG_ANYTHING,
|
||||
@@ -366,7 +360,7 @@ const struct bpf_func_proto bpf_task_storage_delete_recur_proto = {
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_INTEGER,
|
||||
.arg1_type = ARG_CONST_MAP_PTR,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
|
||||
.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
|
||||
};
|
||||
|
||||
@@ -375,6 +369,6 @@ const struct bpf_func_proto bpf_task_storage_delete_proto = {
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_INTEGER,
|
||||
.arg1_type = ARG_CONST_MAP_PTR,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID,
|
||||
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
|
||||
.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
|
||||
};
|
||||
|
||||
kernel/bpf/btf.c
@@ -25,6 +25,9 @@
|
||||
#include <linux/bsearch.h>
|
||||
#include <linux/kobject.h>
|
||||
#include <linux/sysfs.h>
|
||||
|
||||
#include <net/netfilter/nf_bpf_link.h>
|
||||
|
||||
#include <net/sock.h>
|
||||
#include "../tools/lib/bpf/relo_core.h"
|
||||
|
||||
@@ -207,6 +210,12 @@ enum btf_kfunc_hook {
|
||||
BTF_KFUNC_HOOK_TRACING,
|
||||
BTF_KFUNC_HOOK_SYSCALL,
|
||||
BTF_KFUNC_HOOK_FMODRET,
|
||||
BTF_KFUNC_HOOK_CGROUP_SKB,
|
||||
BTF_KFUNC_HOOK_SCHED_ACT,
|
||||
BTF_KFUNC_HOOK_SK_SKB,
|
||||
BTF_KFUNC_HOOK_SOCKET_FILTER,
|
||||
BTF_KFUNC_HOOK_LWT,
|
||||
BTF_KFUNC_HOOK_NETFILTER,
|
||||
BTF_KFUNC_HOOK_MAX,
|
||||
};
|
||||
|
||||
@@ -572,8 +581,8 @@ static s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p)
|
||||
*btf_p = btf;
|
||||
return ret;
|
||||
}
|
||||
spin_lock_bh(&btf_idr_lock);
|
||||
btf_put(btf);
|
||||
spin_lock_bh(&btf_idr_lock);
|
||||
}
|
||||
spin_unlock_bh(&btf_idr_lock);
|
||||
return ret;
|
||||
@@ -1661,10 +1670,8 @@ static void btf_struct_metas_free(struct btf_struct_metas *tab)
|
||||
|
||||
if (!tab)
|
||||
return;
|
||||
for (i = 0; i < tab->cnt; i++) {
|
||||
for (i = 0; i < tab->cnt; i++)
|
||||
btf_record_free(tab->types[i].record);
|
||||
kfree(tab->types[i].field_offs);
|
||||
}
|
||||
kfree(tab);
|
||||
}
|
||||
|
||||
@@ -3226,12 +3233,6 @@ static void btf_struct_log(struct btf_verifier_env *env,
|
||||
btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
|
||||
}
|
||||
|
||||
enum btf_field_info_type {
|
||||
BTF_FIELD_SPIN_LOCK,
|
||||
BTF_FIELD_TIMER,
|
||||
BTF_FIELD_KPTR,
|
||||
};
|
||||
|
||||
enum {
|
||||
BTF_FIELD_IGNORE = 0,
|
||||
BTF_FIELD_FOUND = 1,
|
||||
@@ -3283,9 +3284,9 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
|
||||
/* Reject extra tags */
|
||||
if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
|
||||
return -EINVAL;
|
||||
if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
|
||||
if (!strcmp("kptr_untrusted", __btf_name_by_offset(btf, t->name_off)))
|
||||
type = BPF_KPTR_UNREF;
|
||||
else if (!strcmp("kptr_ref", __btf_name_by_offset(btf, t->name_off)))
|
||||
else if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
|
||||
type = BPF_KPTR_REF;
|
||||
else
|
||||
return -EINVAL;
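
In BPF program source this tag rename corresponds to the __kptr and __kptr_untrusted annotations; a minimal sketch, assuming the btf_type_tag macros from a current bpf_helpers.h:

    /* "kptr" now means a referenced (trusted) kptr; the old unreferenced
     * flavour is spelled "kptr_untrusted". */
    struct map_value {
        struct task_struct __kptr *owner;            /* BPF_KPTR_REF */
        struct task_struct __kptr_untrusted *cached;  /* BPF_KPTR_UNREF */
    };
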
|
||||
@@ -3394,6 +3395,7 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
|
||||
field_mask_test_name(BPF_LIST_NODE, "bpf_list_node");
|
||||
field_mask_test_name(BPF_RB_ROOT, "bpf_rb_root");
|
||||
field_mask_test_name(BPF_RB_NODE, "bpf_rb_node");
|
||||
field_mask_test_name(BPF_REFCOUNT, "bpf_refcount");
|
||||
|
||||
/* Only return BPF_KPTR when all other types with matchable names fail */
|
||||
if (field_mask & BPF_KPTR) {
|
||||
@@ -3442,6 +3444,7 @@ static int btf_find_struct_field(const struct btf *btf,
|
||||
case BPF_TIMER:
|
||||
case BPF_LIST_NODE:
|
||||
case BPF_RB_NODE:
|
||||
case BPF_REFCOUNT:
|
||||
ret = btf_find_struct(btf, member_type, off, sz, field_type,
|
||||
idx < info_cnt ? &info[idx] : &tmp);
|
||||
if (ret < 0)
|
||||
@@ -3507,6 +3510,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
|
||||
case BPF_TIMER:
|
||||
case BPF_LIST_NODE:
|
||||
case BPF_RB_NODE:
|
||||
case BPF_REFCOUNT:
|
||||
ret = btf_find_struct(btf, var_type, off, sz, field_type,
|
||||
idx < info_cnt ? &info[idx] : &tmp);
|
||||
if (ret < 0)
|
||||
@@ -3557,7 +3561,10 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
|
||||
{
|
||||
struct module *mod = NULL;
|
||||
const struct btf_type *t;
|
||||
struct btf *kernel_btf;
|
||||
/* If a matching btf type is found in kernel or module BTFs, kptr_ref
|
||||
* is that BTF, otherwise it's program BTF
|
||||
*/
|
||||
struct btf *kptr_btf;
|
||||
int ret;
|
||||
s32 id;
|
||||
|
||||
@@ -3566,7 +3573,20 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
|
||||
*/
|
||||
t = btf_type_by_id(btf, info->kptr.type_id);
|
||||
id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
|
||||
&kernel_btf);
|
||||
&kptr_btf);
|
||||
if (id == -ENOENT) {
|
||||
/* btf_parse_kptr should only be called w/ btf = program BTF */
|
||||
WARN_ON_ONCE(btf_is_kernel(btf));
|
||||
|
||||
/* Type exists only in program BTF. Assume that it's a MEM_ALLOC
|
||||
* kptr allocated via bpf_obj_new
|
||||
*/
|
||||
field->kptr.dtor = NULL;
|
||||
id = info->kptr.type_id;
|
||||
kptr_btf = (struct btf *)btf;
|
||||
btf_get(kptr_btf);
|
||||
goto found_dtor;
|
||||
}
|
||||
if (id < 0)
|
||||
return id;
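
The -ENOENT branch above covers types that exist only in the program's own BTF, i.e. objects allocated with bpf_obj_new() and stored as kptrs without a kernel destructor. A rough BPF-side sketch (type and field names are invented, kfunc wrappers as in bpf_experimental.h):

    struct foo {
        int data;
    };

    struct map_value {
        struct foo __kptr *f;
    };

    /* inside a BPF program, v points at a map value */
    struct foo *p = bpf_obj_new(typeof(*p));
    if (p) {
        p = bpf_kptr_xchg(&v->f, p);   /* store, get old value back */
        if (p)
            bpf_obj_drop(p);           /* no kernel dtor involved */
    }
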
|
||||
|
||||
@@ -3583,20 +3603,20 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
|
||||
* can be used as a referenced pointer and be stored in a map at
|
||||
* the same time.
|
||||
*/
|
||||
dtor_btf_id = btf_find_dtor_kfunc(kernel_btf, id);
|
||||
dtor_btf_id = btf_find_dtor_kfunc(kptr_btf, id);
|
||||
if (dtor_btf_id < 0) {
|
||||
ret = dtor_btf_id;
|
||||
goto end_btf;
|
||||
}
|
||||
|
||||
dtor_func = btf_type_by_id(kernel_btf, dtor_btf_id);
|
||||
dtor_func = btf_type_by_id(kptr_btf, dtor_btf_id);
|
||||
if (!dtor_func) {
|
||||
ret = -ENOENT;
|
||||
goto end_btf;
|
||||
}
|
||||
|
||||
if (btf_is_module(kernel_btf)) {
|
||||
mod = btf_try_get_module(kernel_btf);
|
||||
if (btf_is_module(kptr_btf)) {
|
||||
mod = btf_try_get_module(kptr_btf);
|
||||
if (!mod) {
|
||||
ret = -ENXIO;
|
||||
goto end_btf;
|
||||
@@ -3606,7 +3626,7 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
|
||||
/* We already verified dtor_func to be btf_type_is_func
|
||||
* in register_btf_id_dtor_kfuncs.
|
||||
*/
|
||||
dtor_func_name = __btf_name_by_offset(kernel_btf, dtor_func->name_off);
|
||||
dtor_func_name = __btf_name_by_offset(kptr_btf, dtor_func->name_off);
|
||||
addr = kallsyms_lookup_name(dtor_func_name);
|
||||
if (!addr) {
|
||||
ret = -EINVAL;
|
||||
@@ -3615,14 +3635,15 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
|
||||
field->kptr.dtor = (void *)addr;
|
||||
}
|
||||
|
||||
found_dtor:
|
||||
field->kptr.btf_id = id;
|
||||
field->kptr.btf = kernel_btf;
|
||||
field->kptr.btf = kptr_btf;
|
||||
field->kptr.module = mod;
|
||||
return 0;
|
||||
end_mod:
|
||||
module_put(mod);
|
||||
end_btf:
|
||||
btf_put(kernel_btf);
|
||||
btf_put(kptr_btf);
|
||||
return ret;
|
||||
}
|
||||
|
||||
@@ -3684,12 +3705,24 @@ static int btf_parse_rb_root(const struct btf *btf, struct btf_field *field,
|
||||
__alignof__(struct bpf_rb_node));
|
||||
}
|
||||
|
||||
static int btf_field_cmp(const void *_a, const void *_b, const void *priv)
|
||||
{
|
||||
const struct btf_field *a = (const struct btf_field *)_a;
|
||||
const struct btf_field *b = (const struct btf_field *)_b;
|
||||
|
||||
if (a->offset < b->offset)
|
||||
return -1;
|
||||
else if (a->offset > b->offset)
|
||||
return 1;
|
||||
return 0;
|
||||
}
|
||||
|
||||
struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
|
||||
u32 field_mask, u32 value_size)
|
||||
{
|
||||
struct btf_field_info info_arr[BTF_FIELDS_MAX];
|
||||
u32 next_off = 0, field_type_size;
|
||||
struct btf_record *rec;
|
||||
u32 next_off = 0;
|
||||
int ret, i, cnt;
|
||||
|
||||
ret = btf_find_field(btf, t, field_mask, info_arr, ARRAY_SIZE(info_arr));
|
||||
@@ -3708,8 +3741,10 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
|
||||
|
||||
rec->spin_lock_off = -EINVAL;
|
||||
rec->timer_off = -EINVAL;
|
||||
rec->refcount_off = -EINVAL;
|
||||
for (i = 0; i < cnt; i++) {
|
||||
if (info_arr[i].off + btf_field_type_size(info_arr[i].type) > value_size) {
|
||||
field_type_size = btf_field_type_size(info_arr[i].type);
|
||||
if (info_arr[i].off + field_type_size > value_size) {
|
||||
WARN_ONCE(1, "verifier bug off %d size %d", info_arr[i].off, value_size);
|
||||
ret = -EFAULT;
|
||||
goto end;
|
||||
@@ -3718,11 +3753,12 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
|
||||
ret = -EEXIST;
|
||||
goto end;
|
||||
}
|
||||
next_off = info_arr[i].off + btf_field_type_size(info_arr[i].type);
|
||||
next_off = info_arr[i].off + field_type_size;
|
||||
|
||||
rec->field_mask |= info_arr[i].type;
|
||||
rec->fields[i].offset = info_arr[i].off;
|
||||
rec->fields[i].type = info_arr[i].type;
|
||||
rec->fields[i].size = field_type_size;
|
||||
|
||||
switch (info_arr[i].type) {
|
||||
case BPF_SPIN_LOCK:
|
||||
@@ -3735,6 +3771,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
|
||||
/* Cache offset for faster lookup at runtime */
|
||||
rec->timer_off = rec->fields[i].offset;
|
||||
break;
|
||||
case BPF_REFCOUNT:
|
||||
WARN_ON_ONCE(rec->refcount_off >= 0);
|
||||
/* Cache offset for faster lookup at runtime */
|
||||
rec->refcount_off = rec->fields[i].offset;
|
||||
break;
|
||||
case BPF_KPTR_UNREF:
|
||||
case BPF_KPTR_REF:
|
||||
ret = btf_parse_kptr(btf, &rec->fields[i], &info_arr[i]);
|
||||
@@ -3768,30 +3809,16 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
|
||||
goto end;
|
||||
}
|
||||
|
||||
/* need collection identity for non-owning refs before allowing this
|
||||
*
|
||||
* Consider a node type w/ both list and rb_node fields:
|
||||
* struct node {
|
||||
* struct bpf_list_node l;
|
||||
* struct bpf_rb_node r;
|
||||
* }
|
||||
*
|
||||
* Used like so:
|
||||
* struct node *n = bpf_obj_new(....);
|
||||
* bpf_list_push_front(&list_head, &n->l);
|
||||
* bpf_rbtree_remove(&rb_root, &n->r);
|
||||
*
|
||||
* It should not be possible to rbtree_remove the node since it hasn't
|
||||
* been added to a tree. But push_front converts n to a non-owning
|
||||
* reference, and rbtree_remove accepts the non-owning reference to
|
||||
* a type w/ bpf_rb_node field.
|
||||
*/
|
||||
if (btf_record_has_field(rec, BPF_LIST_NODE) &&
|
||||
if (rec->refcount_off < 0 &&
|
||||
btf_record_has_field(rec, BPF_LIST_NODE) &&
|
||||
btf_record_has_field(rec, BPF_RB_NODE)) {
|
||||
ret = -EINVAL;
|
||||
goto end;
|
||||
}
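
With the new bpf_refcount field the list-node/rb-node exclusion above only applies to non-refcounted types; a refcounted local kptr such as the sketch below (names invented) can be placed in both a list and an rbtree through shared ownership via bpf_refcount_acquire().

    struct node_data {
        long key;
        struct bpf_refcount ref;   /* enables shared ownership */
        struct bpf_list_node l;
        struct bpf_rb_node r;
    };
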
|
||||
|
||||
sort_r(rec->fields, rec->cnt, sizeof(struct btf_field), btf_field_cmp,
|
||||
NULL, rec);
|
||||
|
||||
return rec;
|
||||
end:
|
||||
btf_record_free(rec);
|
||||
@@ -3873,61 +3900,6 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int btf_field_offs_cmp(const void *_a, const void *_b, const void *priv)
|
||||
{
|
||||
const u32 a = *(const u32 *)_a;
|
||||
const u32 b = *(const u32 *)_b;
|
||||
|
||||
if (a < b)
|
||||
return -1;
|
||||
else if (a > b)
|
||||
return 1;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void btf_field_offs_swap(void *_a, void *_b, int size, const void *priv)
|
||||
{
|
||||
struct btf_field_offs *foffs = (void *)priv;
|
||||
u32 *off_base = foffs->field_off;
|
||||
u32 *a = _a, *b = _b;
|
||||
u8 *sz_a, *sz_b;
|
||||
|
||||
sz_a = foffs->field_sz + (a - off_base);
|
||||
sz_b = foffs->field_sz + (b - off_base);
|
||||
|
||||
swap(*a, *b);
|
||||
swap(*sz_a, *sz_b);
|
||||
}
|
||||
|
||||
struct btf_field_offs *btf_parse_field_offs(struct btf_record *rec)
|
||||
{
|
||||
struct btf_field_offs *foffs;
|
||||
u32 i, *off;
|
||||
u8 *sz;
|
||||
|
||||
BUILD_BUG_ON(ARRAY_SIZE(foffs->field_off) != ARRAY_SIZE(foffs->field_sz));
|
||||
if (IS_ERR_OR_NULL(rec))
|
||||
return NULL;
|
||||
|
||||
foffs = kzalloc(sizeof(*foffs), GFP_KERNEL | __GFP_NOWARN);
|
||||
if (!foffs)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
off = foffs->field_off;
|
||||
sz = foffs->field_sz;
|
||||
for (i = 0; i < rec->cnt; i++) {
|
||||
off[i] = rec->fields[i].offset;
|
||||
sz[i] = btf_field_type_size(rec->fields[i].type);
|
||||
}
|
||||
foffs->cnt = rec->cnt;
|
||||
|
||||
if (foffs->cnt == 1)
|
||||
return foffs;
|
||||
sort_r(foffs->field_off, foffs->cnt, sizeof(foffs->field_off[0]),
|
||||
btf_field_offs_cmp, btf_field_offs_swap, foffs);
|
||||
return foffs;
|
||||
}
|
||||
|
||||
static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
|
||||
u32 type_id, void *data, u8 bits_offset,
|
||||
struct btf_show *show)
|
||||
@@ -5332,6 +5304,7 @@ static const char *alloc_obj_fields[] = {
|
||||
"bpf_list_node",
|
||||
"bpf_rb_root",
|
||||
"bpf_rb_node",
|
||||
"bpf_refcount",
|
||||
};
|
||||
|
||||
static struct btf_struct_metas *
|
||||
@@ -5370,7 +5343,6 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
|
||||
for (i = 1; i < n; i++) {
|
||||
struct btf_struct_metas *new_tab;
|
||||
const struct btf_member *member;
|
||||
struct btf_field_offs *foffs;
|
||||
struct btf_struct_meta *type;
|
||||
struct btf_record *record;
|
||||
const struct btf_type *t;
|
||||
@@ -5406,23 +5378,13 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
|
||||
type = &tab->types[tab->cnt];
|
||||
type->btf_id = i;
|
||||
record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE |
|
||||
BPF_RB_ROOT | BPF_RB_NODE, t->size);
|
||||
BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT, t->size);
|
||||
/* The record cannot be unset, treat it as an error if so */
|
||||
if (IS_ERR_OR_NULL(record)) {
|
||||
ret = PTR_ERR_OR_ZERO(record) ?: -EFAULT;
|
||||
goto free;
|
||||
}
|
||||
foffs = btf_parse_field_offs(record);
|
||||
/* We need the field_offs to be valid for a valid record,
|
||||
* either both should be set or both should be unset.
|
||||
*/
|
||||
if (IS_ERR_OR_NULL(foffs)) {
|
||||
btf_record_free(record);
|
||||
ret = -EFAULT;
|
||||
goto free;
|
||||
}
|
||||
type->record = record;
|
||||
type->field_offs = foffs;
|
||||
tab->cnt++;
|
||||
}
|
||||
return tab;
|
||||
@@ -5489,38 +5451,45 @@ static int btf_check_type_tags(struct btf_verifier_env *env,
|
||||
return 0;
|
||||
}
|
||||
|
||||
static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
|
||||
u32 log_level, char __user *log_ubuf, u32 log_size)
|
||||
static int finalize_log(struct bpf_verifier_log *log, bpfptr_t uattr, u32 uattr_size)
|
||||
{
|
||||
struct btf_struct_metas *struct_meta_tab;
|
||||
struct btf_verifier_env *env = NULL;
|
||||
struct bpf_verifier_log *log;
|
||||
struct btf *btf = NULL;
|
||||
u8 *data;
|
||||
u32 log_true_size;
|
||||
int err;
|
||||
|
||||
if (btf_data_size > BTF_MAX_SIZE)
|
||||
err = bpf_vlog_finalize(log, &log_true_size);
|
||||
|
||||
if (uattr_size >= offsetofend(union bpf_attr, btf_log_true_size) &&
|
||||
copy_to_bpfptr_offset(uattr, offsetof(union bpf_attr, btf_log_true_size),
|
||||
&log_true_size, sizeof(log_true_size)))
|
||||
err = -EFAULT;
|
||||
|
||||
return err;
|
||||
}
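
btf_log_true_size lets user space size the log buffer on retry; a minimal sketch of the expected usage via the raw syscall (buffer variables are illustrative):

    union bpf_attr attr;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.btf = (__u64)(uintptr_t)btf_data;
    attr.btf_size = btf_data_sz;
    attr.btf_log_buf = (__u64)(uintptr_t)log_buf;
    attr.btf_log_size = log_sz;
    attr.btf_log_level = 1;

    fd = syscall(__NR_bpf, BPF_BTF_LOAD, &attr, sizeof(attr));
    if (fd < 0 && attr.btf_log_true_size > log_sz) {
        /* reallocate log_buf with attr.btf_log_true_size bytes and retry */
    }
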
|
||||
|
||||
static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
|
||||
{
|
||||
bpfptr_t btf_data = make_bpfptr(attr->btf, uattr.is_kernel);
|
||||
char __user *log_ubuf = u64_to_user_ptr(attr->btf_log_buf);
|
||||
struct btf_struct_metas *struct_meta_tab;
|
||||
struct btf_verifier_env *env = NULL;
|
||||
struct btf *btf = NULL;
|
||||
u8 *data;
|
||||
int err, ret;
|
||||
|
||||
if (attr->btf_size > BTF_MAX_SIZE)
|
||||
return ERR_PTR(-E2BIG);
|
||||
|
||||
env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
|
||||
if (!env)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
log = &env->log;
|
||||
if (log_level || log_ubuf || log_size) {
|
||||
/* user requested verbose verifier output
|
||||
* and supplied buffer to store the verification trace
|
||||
*/
|
||||
log->level = log_level;
|
||||
log->ubuf = log_ubuf;
|
||||
log->len_total = log_size;
|
||||
|
||||
/* log attributes have to be sane */
|
||||
if (!bpf_verifier_log_attr_valid(log)) {
|
||||
err = -EINVAL;
|
||||
goto errout;
|
||||
}
|
||||
}
|
||||
/* user could have requested verbose verifier output
|
||||
* and supplied buffer to store the verification trace
|
||||
*/
|
||||
err = bpf_vlog_init(&env->log, attr->btf_log_level,
|
||||
log_ubuf, attr->btf_log_size);
|
||||
if (err)
|
||||
goto errout_free;
|
||||
|
||||
btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
|
||||
if (!btf) {
|
||||
@@ -5529,16 +5498,16 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
|
||||
}
|
||||
env->btf = btf;
|
||||
|
||||
data = kvmalloc(btf_data_size, GFP_KERNEL | __GFP_NOWARN);
|
||||
data = kvmalloc(attr->btf_size, GFP_KERNEL | __GFP_NOWARN);
|
||||
if (!data) {
|
||||
err = -ENOMEM;
|
||||
goto errout;
|
||||
}
|
||||
|
||||
btf->data = data;
|
||||
btf->data_size = btf_data_size;
|
||||
btf->data_size = attr->btf_size;
|
||||
|
||||
if (copy_from_bpfptr(data, btf_data, btf_data_size)) {
|
||||
if (copy_from_bpfptr(data, btf_data, attr->btf_size)) {
|
||||
err = -EFAULT;
|
||||
goto errout;
|
||||
}
|
||||
@@ -5561,7 +5530,7 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
|
||||
if (err)
|
||||
goto errout;
|
||||
|
||||
struct_meta_tab = btf_parse_struct_metas(log, btf);
|
||||
struct_meta_tab = btf_parse_struct_metas(&env->log, btf);
|
||||
if (IS_ERR(struct_meta_tab)) {
|
||||
err = PTR_ERR(struct_meta_tab);
|
||||
goto errout;
|
||||
@@ -5578,10 +5547,9 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
|
||||
}
|
||||
}
|
||||
|
||||
if (log->level && bpf_verifier_log_full(log)) {
|
||||
err = -ENOSPC;
|
||||
goto errout_meta;
|
||||
}
|
||||
err = finalize_log(&env->log, uattr, uattr_size);
|
||||
if (err)
|
||||
goto errout_free;
|
||||
|
||||
btf_verifier_env_free(env);
|
||||
refcount_set(&btf->refcnt, 1);
|
||||
@@ -5590,6 +5558,11 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
|
||||
errout_meta:
|
||||
btf_free_struct_meta_tab(btf);
|
||||
errout:
|
||||
/* overwrite err with -ENOSPC or -EFAULT */
|
||||
ret = finalize_log(&env->log, uattr, uattr_size);
|
||||
if (ret)
|
||||
err = ret;
|
||||
errout_free:
|
||||
btf_verifier_env_free(env);
|
||||
if (btf)
|
||||
btf_free(btf);
|
||||
@@ -5684,6 +5657,10 @@ again:
|
||||
* int socket_filter_bpf_prog(struct __sk_buff *skb)
|
||||
* { // no fields of skb are ever used }
|
||||
*/
|
||||
if (strcmp(ctx_tname, "__sk_buff") == 0 && strcmp(tname, "sk_buff") == 0)
|
||||
return ctx_type;
|
||||
if (strcmp(ctx_tname, "xdp_md") == 0 && strcmp(tname, "xdp_buff") == 0)
|
||||
return ctx_type;
|
||||
if (strcmp(ctx_tname, tname)) {
|
||||
/* bpf_user_pt_regs_t is a typedef, so resolve it to
|
||||
* underlying struct and check name again
|
||||
@@ -5891,12 +5868,8 @@ struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog)
|
||||
|
||||
static bool is_int_ptr(struct btf *btf, const struct btf_type *t)
|
||||
{
|
||||
/* t comes in already as a pointer */
|
||||
t = btf_type_by_id(btf, t->type);
|
||||
|
||||
/* allow const */
|
||||
if (BTF_INFO_KIND(t->info) == BTF_KIND_CONST)
|
||||
t = btf_type_by_id(btf, t->type);
|
||||
/* skip modifiers */
|
||||
t = btf_type_skip_modifiers(btf, t->type, NULL);
|
||||
|
||||
return btf_type_is_int(t);
|
||||
}
|
||||
@@ -6147,7 +6120,8 @@ enum bpf_struct_walk_result {
|
||||
|
||||
static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
|
||||
const struct btf_type *t, int off, int size,
|
||||
u32 *next_btf_id, enum bpf_type_flag *flag)
|
||||
u32 *next_btf_id, enum bpf_type_flag *flag,
|
||||
const char **field_name)
|
||||
{
|
||||
u32 i, moff, mtrue_end, msize = 0, total_nelems = 0;
|
||||
const struct btf_type *mtype, *elem_type = NULL;
|
||||
@@ -6155,6 +6129,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
|
||||
const char *tname, *mname, *tag_value;
|
||||
u32 vlen, elem_id, mid;
|
||||
|
||||
*flag = 0;
|
||||
again:
|
||||
tname = __btf_name_by_offset(btf, t->name_off);
|
||||
if (!btf_type_is_struct(t)) {
|
||||
@@ -6186,11 +6161,13 @@ again:
|
||||
if (off < moff)
|
||||
goto error;
|
||||
|
||||
/* Only allow structure for now, can be relaxed for
|
||||
* other types later.
|
||||
*/
|
||||
/* allow structure and integer */
|
||||
t = btf_type_skip_modifiers(btf, array_elem->type,
|
||||
NULL);
|
||||
|
||||
if (btf_type_is_int(t))
|
||||
return WALK_SCALAR;
|
||||
|
||||
if (!btf_type_is_struct(t))
|
||||
goto error;
|
||||
|
||||
@@ -6321,6 +6298,15 @@ error:
|
||||
* of this field or inside of this struct
|
||||
*/
|
||||
if (btf_type_is_struct(mtype)) {
|
||||
if (BTF_INFO_KIND(mtype->info) == BTF_KIND_UNION &&
|
||||
btf_type_vlen(mtype) != 1)
|
||||
/*
|
||||
* walking unions yields untrusted pointers
|
||||
* with exception of __bpf_md_ptr and other
|
||||
* unions with a single member
|
||||
*/
|
||||
*flag |= PTR_UNTRUSTED;
|
||||
|
||||
/* our field must be inside that union or struct */
|
||||
t = mtype;
|
||||
|
||||
@@ -6365,7 +6351,9 @@ error:
|
||||
stype = btf_type_skip_modifiers(btf, mtype->type, &id);
|
||||
if (btf_type_is_struct(stype)) {
|
||||
*next_btf_id = id;
|
||||
*flag = tmp_flag;
|
||||
*flag |= tmp_flag;
|
||||
if (field_name)
|
||||
*field_name = mname;
|
||||
return WALK_PTR;
|
||||
}
|
||||
}
|
||||
@@ -6392,7 +6380,8 @@ error:
|
||||
int btf_struct_access(struct bpf_verifier_log *log,
|
||||
const struct bpf_reg_state *reg,
|
||||
int off, int size, enum bpf_access_type atype __maybe_unused,
|
||||
u32 *next_btf_id, enum bpf_type_flag *flag)
|
||||
u32 *next_btf_id, enum bpf_type_flag *flag,
|
||||
const char **field_name)
|
||||
{
|
||||
const struct btf *btf = reg->btf;
|
||||
enum bpf_type_flag tmp_flag = 0;
|
||||
@@ -6424,7 +6413,7 @@ int btf_struct_access(struct bpf_verifier_log *log,
|
||||
|
||||
t = btf_type_by_id(btf, id);
|
||||
do {
|
||||
err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
|
||||
err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag, field_name);
|
||||
|
||||
switch (err) {
|
||||
case WALK_PTR:
|
||||
@@ -6499,7 +6488,7 @@ again:
|
||||
type = btf_type_by_id(btf, id);
|
||||
if (!type)
|
||||
return false;
|
||||
err = btf_struct_walk(log, btf, type, off, 1, &id, &flag);
|
||||
err = btf_struct_walk(log, btf, type, off, 1, &id, &flag, NULL);
|
||||
if (err != WALK_STRUCT)
|
||||
return false;
|
||||
|
||||
@@ -7180,15 +7169,12 @@ static int __btf_new_fd(struct btf *btf)
|
||||
return anon_inode_getfd("btf", &btf_fops, btf, O_RDONLY | O_CLOEXEC);
|
||||
}
|
||||
|
||||
int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr)
|
||||
int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
|
||||
{
|
||||
struct btf *btf;
|
||||
int ret;
|
||||
|
||||
btf = btf_parse(make_bpfptr(attr->btf, uattr.is_kernel),
|
||||
attr->btf_size, attr->btf_log_level,
|
||||
u64_to_user_ptr(attr->btf_log_buf),
|
||||
attr->btf_log_size);
|
||||
btf = btf_parse(attr, uattr, uattr_size);
|
||||
if (IS_ERR(btf))
|
||||
return PTR_ERR(btf);
|
||||
|
||||
@@ -7578,6 +7564,108 @@ BTF_ID_LIST_GLOBAL(btf_tracing_ids, MAX_BTF_TRACING_TYPE)
|
||||
BTF_TRACING_TYPE_xxx
|
||||
#undef BTF_TRACING_TYPE
|
||||
|
||||
static int btf_check_iter_kfuncs(struct btf *btf, const char *func_name,
|
||||
const struct btf_type *func, u32 func_flags)
|
||||
{
|
||||
u32 flags = func_flags & (KF_ITER_NEW | KF_ITER_NEXT | KF_ITER_DESTROY);
|
||||
const char *name, *sfx, *iter_name;
|
||||
const struct btf_param *arg;
|
||||
const struct btf_type *t;
|
||||
char exp_name[128];
|
||||
u32 nr_args;
|
||||
|
||||
/* exactly one of KF_ITER_{NEW,NEXT,DESTROY} can be set */
|
||||
if (!flags || (flags & (flags - 1)))
|
||||
return -EINVAL;
|
||||
|
||||
/* any BPF iter kfunc should have `struct bpf_iter_<type> *` first arg */
|
||||
nr_args = btf_type_vlen(func);
|
||||
if (nr_args < 1)
|
||||
return -EINVAL;
|
||||
|
||||
arg = &btf_params(func)[0];
|
||||
t = btf_type_skip_modifiers(btf, arg->type, NULL);
|
||||
if (!t || !btf_type_is_ptr(t))
|
||||
return -EINVAL;
|
||||
t = btf_type_skip_modifiers(btf, t->type, NULL);
|
||||
if (!t || !__btf_type_is_struct(t))
|
||||
return -EINVAL;
|
||||
|
||||
name = btf_name_by_offset(btf, t->name_off);
|
||||
if (!name || strncmp(name, ITER_PREFIX, sizeof(ITER_PREFIX) - 1))
|
||||
return -EINVAL;
|
||||
|
||||
/* sizeof(struct bpf_iter_<type>) should be a multiple of 8 to
|
||||
* fit nicely in stack slots
|
||||
*/
|
||||
if (t->size == 0 || (t->size % 8))
|
||||
return -EINVAL;
|
||||
|
||||
/* validate bpf_iter_<type>_{new,next,destroy}(struct bpf_iter_<type> *)
|
||||
* naming pattern
|
||||
*/
|
||||
iter_name = name + sizeof(ITER_PREFIX) - 1;
|
||||
if (flags & KF_ITER_NEW)
|
||||
sfx = "new";
|
||||
else if (flags & KF_ITER_NEXT)
|
||||
sfx = "next";
|
||||
else /* (flags & KF_ITER_DESTROY) */
|
||||
sfx = "destroy";
|
||||
|
||||
snprintf(exp_name, sizeof(exp_name), "bpf_iter_%s_%s", iter_name, sfx);
|
||||
if (strcmp(func_name, exp_name))
|
||||
return -EINVAL;
|
||||
|
||||
/* only iter constructor should have extra arguments */
|
||||
if (!(flags & KF_ITER_NEW) && nr_args != 1)
|
||||
return -EINVAL;
|
||||
|
||||
if (flags & KF_ITER_NEXT) {
|
||||
/* bpf_iter_<type>_next() should return pointer */
|
||||
t = btf_type_skip_modifiers(btf, func->type, NULL);
|
||||
if (!t || !btf_type_is_ptr(t))
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (flags & KF_ITER_DESTROY) {
|
||||
/* bpf_iter_<type>_destroy() should return void */
|
||||
t = btf_type_by_id(btf, func->type);
|
||||
if (!t || !btf_type_is_void(t))
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
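
As a concrete instance of the naming and signature rules enforced above, the open-coded numbers iterator added in the same cycle follows this pattern (shown for illustration):

    /* struct bpf_iter_num is 8-byte sized/aligned, name starts with "bpf_iter_" */
    __bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end);
    __bpf_kfunc int *bpf_iter_num_next(struct bpf_iter_num *it);   /* KF_ITER_NEXT returns a pointer */
    __bpf_kfunc void bpf_iter_num_destroy(struct bpf_iter_num *it); /* KF_ITER_DESTROY returns void */
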
|
||||
|
||||
static int btf_check_kfunc_protos(struct btf *btf, u32 func_id, u32 func_flags)
|
||||
{
|
||||
const struct btf_type *func;
|
||||
const char *func_name;
|
||||
int err;
|
||||
|
||||
/* any kfunc should be FUNC -> FUNC_PROTO */
|
||||
func = btf_type_by_id(btf, func_id);
|
||||
if (!func || !btf_type_is_func(func))
|
||||
return -EINVAL;
|
||||
|
||||
/* sanity check kfunc name */
|
||||
func_name = btf_name_by_offset(btf, func->name_off);
|
||||
if (!func_name || !func_name[0])
|
||||
return -EINVAL;
|
||||
|
||||
func = btf_type_by_id(btf, func->type);
|
||||
if (!func || !btf_type_is_func_proto(func))
|
||||
return -EINVAL;
|
||||
|
||||
if (func_flags & (KF_ITER_NEW | KF_ITER_NEXT | KF_ITER_DESTROY)) {
|
||||
err = btf_check_iter_kfuncs(btf, func_name, func, func_flags);
|
||||
if (err)
|
||||
return err;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Kernel Function (kfunc) BTF ID set registration API */
|
||||
|
||||
static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
|
||||
@@ -7705,6 +7793,21 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
|
||||
return BTF_KFUNC_HOOK_TRACING;
|
||||
case BPF_PROG_TYPE_SYSCALL:
|
||||
return BTF_KFUNC_HOOK_SYSCALL;
|
||||
case BPF_PROG_TYPE_CGROUP_SKB:
|
||||
return BTF_KFUNC_HOOK_CGROUP_SKB;
|
||||
case BPF_PROG_TYPE_SCHED_ACT:
|
||||
return BTF_KFUNC_HOOK_SCHED_ACT;
|
||||
case BPF_PROG_TYPE_SK_SKB:
|
||||
return BTF_KFUNC_HOOK_SK_SKB;
|
||||
case BPF_PROG_TYPE_SOCKET_FILTER:
|
||||
return BTF_KFUNC_HOOK_SOCKET_FILTER;
|
||||
case BPF_PROG_TYPE_LWT_OUT:
|
||||
case BPF_PROG_TYPE_LWT_IN:
|
||||
case BPF_PROG_TYPE_LWT_XMIT:
|
||||
case BPF_PROG_TYPE_LWT_SEG6LOCAL:
|
||||
return BTF_KFUNC_HOOK_LWT;
|
||||
case BPF_PROG_TYPE_NETFILTER:
|
||||
return BTF_KFUNC_HOOK_NETFILTER;
|
||||
default:
|
||||
return BTF_KFUNC_HOOK_MAX;
|
||||
}
|
||||
@@ -7741,7 +7844,7 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
|
||||
const struct btf_kfunc_id_set *kset)
|
||||
{
|
||||
struct btf *btf;
|
||||
int ret;
|
||||
int ret, i;
|
||||
|
||||
btf = btf_get_module_btf(kset->owner);
|
||||
if (!btf) {
|
||||
@@ -7758,7 +7861,15 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
|
||||
if (IS_ERR(btf))
|
||||
return PTR_ERR(btf);
|
||||
|
||||
for (i = 0; i < kset->set->cnt; i++) {
|
||||
ret = btf_check_kfunc_protos(btf, kset->set->pairs[i].id,
|
||||
kset->set->pairs[i].flags);
|
||||
if (ret)
|
||||
goto err_out;
|
||||
}
|
||||
|
||||
ret = btf_populate_kfunc_set(btf, hook, kset->set);
|
||||
err_out:
|
||||
btf_put(btf);
|
||||
return ret;
|
||||
}
|
||||
@@ -8249,12 +8360,10 @@ check_modules:
|
||||
btf_get(mod_btf);
|
||||
spin_unlock_bh(&btf_idr_lock);
|
||||
cands = bpf_core_add_cands(cands, mod_btf, btf_nr_types(main_btf));
|
||||
if (IS_ERR(cands)) {
|
||||
btf_put(mod_btf);
|
||||
return ERR_CAST(cands);
|
||||
}
|
||||
spin_lock_bh(&btf_idr_lock);
|
||||
btf_put(mod_btf);
|
||||
if (IS_ERR(cands))
|
||||
return ERR_CAST(cands);
|
||||
spin_lock_bh(&btf_idr_lock);
|
||||
}
|
||||
spin_unlock_bh(&btf_idr_lock);
|
||||
/* cands is a pointer to kmalloced memory here if cands->cnt > 0
|
||||
@@ -8336,16 +8445,15 @@ out:
|
||||
|
||||
bool btf_nested_type_is_trusted(struct bpf_verifier_log *log,
|
||||
const struct bpf_reg_state *reg,
|
||||
int off)
|
||||
const char *field_name, u32 btf_id, const char *suffix)
|
||||
{
|
||||
struct btf *btf = reg->btf;
|
||||
const struct btf_type *walk_type, *safe_type;
|
||||
const char *tname;
|
||||
char safe_tname[64];
|
||||
long ret, safe_id;
|
||||
const struct btf_member *member, *m_walk = NULL;
|
||||
const struct btf_member *member;
|
||||
u32 i;
|
||||
const char *walk_name;
|
||||
|
||||
walk_type = btf_type_by_id(btf, reg->btf_id);
|
||||
if (!walk_type)
|
||||
@@ -8353,7 +8461,7 @@ bool btf_nested_type_is_trusted(struct bpf_verifier_log *log,
|
||||
|
||||
tname = btf_name_by_offset(btf, walk_type->name_off);
|
||||
|
||||
ret = snprintf(safe_tname, sizeof(safe_tname), "%s__safe_fields", tname);
|
||||
ret = snprintf(safe_tname, sizeof(safe_tname), "%s%s", tname, suffix);
|
||||
if (ret < 0)
|
||||
return false;
|
||||
|
||||
@@ -8365,30 +8473,17 @@ bool btf_nested_type_is_trusted(struct bpf_verifier_log *log,
|
||||
if (!safe_type)
|
||||
return false;
|
||||
|
||||
for_each_member(i, walk_type, member) {
|
||||
u32 moff;
|
||||
|
||||
/* We're looking for the PTR_TO_BTF_ID member in the struct
|
||||
* type we're walking which matches the specified offset.
|
||||
* Below, we'll iterate over the fields in the safe variant of
|
||||
* the struct and see if any of them has a matching type /
|
||||
* name.
|
||||
*/
|
||||
moff = __btf_member_bit_offset(walk_type, member) / 8;
|
||||
if (off == moff) {
|
||||
m_walk = member;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (m_walk == NULL)
|
||||
return false;
|
||||
|
||||
walk_name = __btf_name_by_offset(btf, m_walk->name_off);
|
||||
for_each_member(i, safe_type, member) {
|
||||
const char *m_name = __btf_name_by_offset(btf, member->name_off);
|
||||
const struct btf_type *mtype = btf_type_by_id(btf, member->type);
|
||||
u32 id;
|
||||
|
||||
if (!btf_type_is_ptr(mtype))
|
||||
continue;
|
||||
|
||||
btf_type_skip_modifiers(btf, mtype->type, &id);
|
||||
/* If we match on both type and name, the field is considered trusted. */
|
||||
if (m_walk->type == member->type && !strcmp(walk_name, m_name))
|
||||
if (btf_id == id && !strcmp(field_name, m_name))
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
@@ -1921,14 +1921,17 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
|
||||
if (ret < 0)
|
||||
goto out;
|
||||
|
||||
if (ctx.optlen > max_optlen || ctx.optlen < 0) {
|
||||
if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
|
||||
ret = -EFAULT;
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (ctx.optlen != 0) {
|
||||
if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
|
||||
put_user(ctx.optlen, optlen)) {
|
||||
if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) {
|
||||
ret = -EFAULT;
|
||||
goto out;
|
||||
}
|
||||
if (put_user(ctx.optlen, optlen)) {
|
||||
ret = -EFAULT;
|
||||
goto out;
|
||||
}
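
The optval checks are relaxed so that a NULL buffer from user space no longer makes the cgroup hook fail with -EFAULT; a caller probing only for the length would look roughly like this (SOL_CUSTOM/OPT_X are placeholders):

    socklen_t len = 0;

    /* The BPF getsockopt program still runs and may set ctx.optlen;
     * only the length is copied back since optval is NULL. */
    if (getsockopt(fd, SOL_CUSTOM, OPT_X, NULL, &len) == 0)
        printf("option needs %u bytes\n", len);
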
|
||||
@@ -2223,10 +2226,12 @@ static u32 sysctl_convert_ctx_access(enum bpf_access_type type,
|
||||
BPF_FIELD_SIZEOF(struct bpf_sysctl_kern, ppos),
|
||||
treg, si->dst_reg,
|
||||
offsetof(struct bpf_sysctl_kern, ppos));
|
||||
*insn++ = BPF_STX_MEM(
|
||||
BPF_SIZEOF(u32), treg, si->src_reg,
|
||||
*insn++ = BPF_RAW_INSN(
|
||||
BPF_CLASS(si->code) | BPF_MEM | BPF_SIZEOF(u32),
|
||||
treg, si->src_reg,
|
||||
bpf_ctx_narrow_access_offset(
|
||||
0, sizeof(u32), sizeof(loff_t)));
|
||||
0, sizeof(u32), sizeof(loff_t)),
|
||||
si->imm);
|
||||
*insn++ = BPF_LDX_MEM(
|
||||
BPF_DW, treg, si->dst_reg,
|
||||
offsetof(struct bpf_sysctl_kern, tmp_reg));
|
||||
@@ -2376,10 +2381,17 @@ static bool cg_sockopt_is_valid_access(int off, int size,
|
||||
return true;
|
||||
}
|
||||
|
||||
#define CG_SOCKOPT_ACCESS_FIELD(T, F) \
|
||||
T(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F), \
|
||||
si->dst_reg, si->src_reg, \
|
||||
offsetof(struct bpf_sockopt_kern, F))
|
||||
#define CG_SOCKOPT_READ_FIELD(F) \
|
||||
BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F), \
|
||||
si->dst_reg, si->src_reg, \
|
||||
offsetof(struct bpf_sockopt_kern, F))
|
||||
|
||||
#define CG_SOCKOPT_WRITE_FIELD(F) \
|
||||
BPF_RAW_INSN((BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F) | \
|
||||
BPF_MEM | BPF_CLASS(si->code)), \
|
||||
si->dst_reg, si->src_reg, \
|
||||
offsetof(struct bpf_sockopt_kern, F), \
|
||||
si->imm)
|
||||
|
||||
static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
|
||||
const struct bpf_insn *si,
|
||||
@@ -2391,25 +2403,25 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
|
||||
|
||||
switch (si->off) {
|
||||
case offsetof(struct bpf_sockopt, sk):
|
||||
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, sk);
|
||||
*insn++ = CG_SOCKOPT_READ_FIELD(sk);
|
||||
break;
|
||||
case offsetof(struct bpf_sockopt, level):
|
||||
if (type == BPF_WRITE)
|
||||
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, level);
|
||||
*insn++ = CG_SOCKOPT_WRITE_FIELD(level);
|
||||
else
|
||||
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, level);
|
||||
*insn++ = CG_SOCKOPT_READ_FIELD(level);
|
||||
break;
|
||||
case offsetof(struct bpf_sockopt, optname):
|
||||
if (type == BPF_WRITE)
|
||||
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optname);
|
||||
*insn++ = CG_SOCKOPT_WRITE_FIELD(optname);
|
||||
else
|
||||
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optname);
|
||||
*insn++ = CG_SOCKOPT_READ_FIELD(optname);
|
||||
break;
|
||||
case offsetof(struct bpf_sockopt, optlen):
|
||||
if (type == BPF_WRITE)
|
||||
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optlen);
|
||||
*insn++ = CG_SOCKOPT_WRITE_FIELD(optlen);
|
||||
else
|
||||
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optlen);
|
||||
*insn++ = CG_SOCKOPT_READ_FIELD(optlen);
|
||||
break;
|
||||
case offsetof(struct bpf_sockopt, retval):
|
||||
BUILD_BUG_ON(offsetof(struct bpf_cg_run_ctx, run_ctx) != 0);
|
||||
@@ -2429,9 +2441,11 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
|
||||
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct task_struct, bpf_ctx),
|
||||
treg, treg,
|
||||
offsetof(struct task_struct, bpf_ctx));
|
||||
*insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(struct bpf_cg_run_ctx, retval),
|
||||
treg, si->src_reg,
|
||||
offsetof(struct bpf_cg_run_ctx, retval));
|
||||
*insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_MEM |
|
||||
BPF_FIELD_SIZEOF(struct bpf_cg_run_ctx, retval),
|
||||
treg, si->src_reg,
|
||||
offsetof(struct bpf_cg_run_ctx, retval),
|
||||
si->imm);
|
||||
*insn++ = BPF_LDX_MEM(BPF_DW, treg, si->dst_reg,
|
||||
offsetof(struct bpf_sockopt_kern, tmp_reg));
|
||||
} else {
|
||||
@@ -2447,10 +2461,10 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
|
||||
}
|
||||
break;
|
||||
case offsetof(struct bpf_sockopt, optval):
|
||||
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval);
|
||||
*insn++ = CG_SOCKOPT_READ_FIELD(optval);
|
||||
break;
|
||||
case offsetof(struct bpf_sockopt, optval_end):
|
||||
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval_end);
|
||||
*insn++ = CG_SOCKOPT_READ_FIELD(optval_end);
|
||||
break;
|
||||
}
|
||||
|
||||
@@ -2529,10 +2543,6 @@ cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
|
||||
return &bpf_get_current_pid_tgid_proto;
|
||||
case BPF_FUNC_get_current_comm:
|
||||
return &bpf_get_current_comm_proto;
|
||||
case BPF_FUNC_get_current_cgroup_id:
|
||||
return &bpf_get_current_cgroup_id_proto;
|
||||
case BPF_FUNC_get_current_ancestor_cgroup_id:
|
||||
return &bpf_get_current_ancestor_cgroup_id_proto;
|
||||
#ifdef CONFIG_CGROUP_NET_CLASSID
|
||||
case BPF_FUNC_get_cgroup_classid:
|
||||
return &bpf_get_cgroup_classid_curr_proto;
|
||||
|
||||
@@ -1187,6 +1187,7 @@ int bpf_jit_get_func_addr(const struct bpf_prog *prog,
|
||||
s16 off = insn->off;
|
||||
s32 imm = insn->imm;
|
||||
u8 *addr;
|
||||
int err;
|
||||
|
||||
*func_addr_fixed = insn->src_reg != BPF_PSEUDO_CALL;
|
||||
if (!*func_addr_fixed) {
|
||||
@@ -1201,6 +1202,11 @@ int bpf_jit_get_func_addr(const struct bpf_prog *prog,
|
||||
addr = (u8 *)prog->aux->func[off]->bpf_func;
|
||||
else
|
||||
return -EINVAL;
|
||||
} else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
|
||||
bpf_jit_supports_far_kfunc_call()) {
|
||||
err = bpf_get_kfunc_addr(prog, insn->imm, insn->off, &addr);
|
||||
if (err)
|
||||
return err;
|
||||
} else {
|
||||
/* Address of a BPF helper call. Since part of the core
|
||||
* kernel, it's always at a fixed location. __bpf_call_base
|
||||
@@ -2732,6 +2738,11 @@ bool __weak bpf_jit_supports_kfunc_call(void)
|
||||
return false;
|
||||
}
|
||||
|
||||
bool __weak bpf_jit_supports_far_kfunc_call(void)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
|
||||
* skb_copy_bits(), so provide a weak definition of it for NET-less config.
|
||||
*/
|
||||
|
||||
@@ -540,7 +540,7 @@ static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
|
||||
}
|
||||
}
|
||||
|
||||
static int cpu_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long cpu_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
|
||||
u32 key_cpu = *(u32 *)key;
|
||||
@@ -553,8 +553,8 @@ static int cpu_map_delete_elem(struct bpf_map *map, void *key)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int cpu_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
static long cpu_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
{
|
||||
struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
|
||||
struct bpf_cpumap_val cpumap_value = {};
|
||||
@@ -667,12 +667,21 @@ static int cpu_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int cpu_map_redirect(struct bpf_map *map, u64 index, u64 flags)
|
||||
static long cpu_map_redirect(struct bpf_map *map, u64 index, u64 flags)
|
||||
{
|
||||
return __bpf_xdp_redirect_map(map, index, flags, 0,
|
||||
__cpu_map_lookup_elem);
|
||||
}
|
||||
|
||||
static u64 cpu_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
u64 usage = sizeof(struct bpf_cpu_map);
|
||||
|
||||
/* Currently the dynamically allocated elements are not counted */
|
||||
usage += (u64)map->max_entries * sizeof(struct bpf_cpu_map_entry *);
|
||||
return usage;
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(cpu_map_btf_ids, struct, bpf_cpu_map)
|
||||
const struct bpf_map_ops cpu_map_ops = {
|
||||
.map_meta_equal = bpf_map_meta_equal,
|
||||
@@ -683,6 +692,7 @@ const struct bpf_map_ops cpu_map_ops = {
|
||||
.map_lookup_elem = cpu_map_lookup_elem,
|
||||
.map_get_next_key = cpu_map_get_next_key,
|
||||
.map_check_btf = map_check_no_btf,
|
||||
.map_mem_usage = cpu_map_mem_usage,
|
||||
.map_btf_id = &cpu_map_btf_ids[0],
|
||||
.map_redirect = cpu_map_redirect,
|
||||
};
|
||||
|
||||
@@ -9,6 +9,7 @@
|
||||
/**
|
||||
* struct bpf_cpumask - refcounted BPF cpumask wrapper structure
|
||||
* @cpumask: The actual cpumask embedded in the struct.
|
||||
* @rcu: The RCU head used to free the cpumask with RCU safety.
|
||||
* @usage: Object reference counter. When the refcount goes to 0, the
|
||||
* memory is released back to the BPF allocator, which provides
|
||||
* RCU safety.
|
||||
@@ -24,6 +25,7 @@
|
||||
*/
|
||||
struct bpf_cpumask {
|
||||
cpumask_t cpumask;
|
||||
struct rcu_head rcu;
|
||||
refcount_t usage;
|
||||
};
|
||||
|
||||
@@ -55,7 +57,7 @@ __bpf_kfunc struct bpf_cpumask *bpf_cpumask_create(void)
|
||||
/* cpumask must be the first element so struct bpf_cpumask can be cast to struct cpumask. */
|
||||
BUILD_BUG_ON(offsetof(struct bpf_cpumask, cpumask) != 0);
|
||||
|
||||
cpumask = bpf_mem_alloc(&bpf_cpumask_ma, sizeof(*cpumask));
|
||||
cpumask = bpf_mem_cache_alloc(&bpf_cpumask_ma);
|
||||
if (!cpumask)
|
||||
return NULL;
|
||||
|
||||
@@ -80,32 +82,14 @@ __bpf_kfunc struct bpf_cpumask *bpf_cpumask_acquire(struct bpf_cpumask *cpumask)
|
||||
return cpumask;
|
||||
}
|
||||
|
||||
/**
|
||||
* bpf_cpumask_kptr_get() - Attempt to acquire a reference to a BPF cpumask
|
||||
* stored in a map.
|
||||
* @cpumaskp: A pointer to a BPF cpumask map value.
|
||||
*
|
||||
* Attempts to acquire a reference to a BPF cpumask stored in a map value. The
|
||||
* cpumask returned by this function must either be embedded in a map as a
|
||||
* kptr, or freed with bpf_cpumask_release(). This function may return NULL if
|
||||
* no BPF cpumask was found in the specified map value.
|
||||
*/
|
||||
__bpf_kfunc struct bpf_cpumask *bpf_cpumask_kptr_get(struct bpf_cpumask **cpumaskp)
|
||||
static void cpumask_free_cb(struct rcu_head *head)
|
||||
{
|
||||
struct bpf_cpumask *cpumask;
|
||||
|
||||
/* The BPF memory allocator frees memory backing its caches in an RCU
|
||||
* callback. Thus, we can safely use RCU to ensure that the cpumask is
|
||||
* safe to read.
|
||||
*/
|
||||
rcu_read_lock();
|
||||
|
||||
cpumask = READ_ONCE(*cpumaskp);
|
||||
if (cpumask && !refcount_inc_not_zero(&cpumask->usage))
|
||||
cpumask = NULL;
|
||||
|
||||
rcu_read_unlock();
|
||||
return cpumask;
|
||||
cpumask = container_of(head, struct bpf_cpumask, rcu);
|
||||
migrate_disable();
|
||||
bpf_mem_cache_free(&bpf_cpumask_ma, cpumask);
|
||||
migrate_enable();
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -118,14 +102,8 @@ __bpf_kfunc struct bpf_cpumask *bpf_cpumask_kptr_get(struct bpf_cpumask **cpumas
|
||||
*/
|
||||
__bpf_kfunc void bpf_cpumask_release(struct bpf_cpumask *cpumask)
|
||||
{
|
||||
if (!cpumask)
|
||||
return;
|
||||
|
||||
if (refcount_dec_and_test(&cpumask->usage)) {
|
||||
migrate_disable();
|
||||
bpf_mem_free(&bpf_cpumask_ma, cpumask);
|
||||
migrate_enable();
|
||||
}
|
||||
if (refcount_dec_and_test(&cpumask->usage))
|
||||
call_rcu(&cpumask->rcu, cpumask_free_cb);
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -424,29 +402,28 @@ __diag_pop();
|
||||
|
||||
BTF_SET8_START(cpumask_kfunc_btf_ids)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_create, KF_ACQUIRE | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_release, KF_RELEASE | KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_release, KF_RELEASE)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_first, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_first_zero, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_set_cpu, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_clear_cpu, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_test_cpu, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_test_and_set_cpu, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_test_and_clear_cpu, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_setall, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_clear, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_and, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_or, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_xor, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_equal, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_intersects, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_subset, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_full, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_any, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_any_and, KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_first, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_first_zero, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_set_cpu, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_clear_cpu, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_test_cpu, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_test_and_set_cpu, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_test_and_clear_cpu, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_setall, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_clear, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_and, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_or, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_xor, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_equal, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_intersects, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_subset, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_full, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_any, KF_RCU)
|
||||
BTF_ID_FLAGS(func, bpf_cpumask_any_and, KF_RCU)
|
||||
BTF_SET8_END(cpumask_kfunc_btf_ids)
|
||||
|
||||
static const struct btf_kfunc_id_set cpumask_kfunc_set = {
|
||||
@@ -468,7 +445,7 @@ static int __init cpumask_kfunc_init(void)
|
||||
},
|
||||
};
|
||||
|
||||
ret = bpf_mem_alloc_init(&bpf_cpumask_ma, 0, false);
|
||||
ret = bpf_mem_alloc_init(&bpf_cpumask_ma, sizeof(struct bpf_cpumask), false);
|
||||
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &cpumask_kfunc_set);
|
||||
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &cpumask_kfunc_set);
|
||||
return ret ?: register_btf_id_dtor_kfuncs(cpumask_dtors,
|
||||
|
||||
@@ -809,7 +809,7 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
|
||||
kfree(dev);
|
||||
}
|
||||
|
||||
static int dev_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long dev_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
|
||||
struct bpf_dtab_netdev *old_dev;
|
||||
@@ -819,12 +819,14 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
|
||||
return -EINVAL;
|
||||
|
||||
old_dev = unrcu_pointer(xchg(&dtab->netdev_map[k], NULL));
|
||||
if (old_dev)
|
||||
if (old_dev) {
|
||||
call_rcu(&old_dev->rcu, __dev_map_entry_free);
|
||||
atomic_dec((atomic_t *)&dtab->items);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int dev_map_hash_delete_elem(struct bpf_map *map, void *key)
|
||||
static long dev_map_hash_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
|
||||
struct bpf_dtab_netdev *old_dev;
|
||||
@@ -895,8 +897,8 @@ err_out:
|
||||
return ERR_PTR(-EINVAL);
|
||||
}
|
||||
|
||||
static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
|
||||
void *key, void *value, u64 map_flags)
|
||||
static long __dev_map_update_elem(struct net *net, struct bpf_map *map,
|
||||
void *key, void *value, u64 map_flags)
|
||||
{
|
||||
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
|
||||
struct bpf_dtab_netdev *dev, *old_dev;
|
||||
@@ -931,19 +933,21 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
|
||||
old_dev = unrcu_pointer(xchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev)));
|
||||
if (old_dev)
|
||||
call_rcu(&old_dev->rcu, __dev_map_entry_free);
|
||||
else
|
||||
atomic_inc((atomic_t *)&dtab->items);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
static long dev_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
{
|
||||
return __dev_map_update_elem(current->nsproxy->net_ns,
|
||||
map, key, value, map_flags);
|
||||
}
|
||||
|
||||
static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
|
||||
void *key, void *value, u64 map_flags)
|
||||
static long __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
|
||||
void *key, void *value, u64 map_flags)
|
||||
{
|
||||
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
|
||||
struct bpf_dtab_netdev *dev, *old_dev;
|
||||
@@ -995,27 +999,41 @@ out_err:
|
||||
return err;
|
||||
}
|
||||
|
||||
static int dev_map_hash_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
static long dev_map_hash_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
{
|
||||
return __dev_map_hash_update_elem(current->nsproxy->net_ns,
|
||||
map, key, value, map_flags);
|
||||
}

static int dev_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
static long dev_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
{
return __bpf_xdp_redirect_map(map, ifindex, flags,
BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS,
__dev_map_lookup_elem);
}

static int dev_hash_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
static long dev_hash_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
{
return __bpf_xdp_redirect_map(map, ifindex, flags,
BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS,
__dev_map_hash_lookup_elem);
}
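These two callbacks back bpf_redirect_map() for DEVMAP and DEVMAP_HASH; only the kernel-internal return type changes to long, so program-side usage stays the same. A short XDP sketch (the map name, sizes and section are illustrative):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));	/* target ifindex */
	__uint(max_entries, 64);
} tx_ports SEC(".maps");

SEC("xdp")
int xdp_broadcast(struct xdp_md *ctx)
{
	/* With BPF_F_BROADCAST the key is ignored and the frame is cloned to
	 * every map entry; BPF_F_EXCLUDE_INGRESS skips the receiving device.
	 */
	return bpf_redirect_map(&tx_ports, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
}

char _license[] SEC("license") = "GPL";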
|
||||
|
||||
static u64 dev_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
|
||||
u64 usage = sizeof(struct bpf_dtab);
|
||||
|
||||
if (map->map_type == BPF_MAP_TYPE_DEVMAP_HASH)
|
||||
usage += (u64)dtab->n_buckets * sizeof(struct hlist_head);
|
||||
else
|
||||
usage += (u64)map->max_entries * sizeof(struct bpf_dtab_netdev *);
|
||||
usage += atomic_read((atomic_t *)&dtab->items) *
|
||||
(u64)sizeof(struct bpf_dtab_netdev);
|
||||
return usage;
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(dev_map_btf_ids, struct, bpf_dtab)
|
||||
const struct bpf_map_ops dev_map_ops = {
|
||||
.map_meta_equal = bpf_map_meta_equal,
|
||||
@@ -1026,6 +1044,7 @@ const struct bpf_map_ops dev_map_ops = {
|
||||
.map_update_elem = dev_map_update_elem,
|
||||
.map_delete_elem = dev_map_delete_elem,
|
||||
.map_check_btf = map_check_no_btf,
|
||||
.map_mem_usage = dev_map_mem_usage,
|
||||
.map_btf_id = &dev_map_btf_ids[0],
|
||||
.map_redirect = dev_map_redirect,
|
||||
};
|
||||
@@ -1039,6 +1058,7 @@ const struct bpf_map_ops dev_map_hash_ops = {
|
||||
.map_update_elem = dev_map_hash_update_elem,
|
||||
.map_delete_elem = dev_map_hash_delete_elem,
|
||||
.map_check_btf = map_check_no_btf,
|
||||
.map_mem_usage = dev_map_mem_usage,
|
||||
.map_btf_id = &dev_map_btf_ids[0],
|
||||
.map_redirect = dev_hash_map_redirect,
|
||||
};
|
||||
@@ -1109,9 +1129,11 @@ static int dev_map_notification(struct notifier_block *notifier,
|
||||
if (!dev || netdev != dev->dev)
|
||||
continue;
|
||||
odev = unrcu_pointer(cmpxchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev), NULL));
|
||||
if (dev == odev)
|
||||
if (dev == odev) {
|
||||
call_rcu(&dev->rcu,
|
||||
__dev_map_entry_free);
|
||||
atomic_dec((atomic_t *)&dtab->items);
|
||||
}
|
||||
}
|
||||
}
|
||||
rcu_read_unlock();
|
||||
|
||||
@@ -249,7 +249,18 @@ static void htab_free_prealloced_fields(struct bpf_htab *htab)
|
||||
struct htab_elem *elem;
|
||||
|
||||
elem = get_htab_elem(htab, i);
|
||||
bpf_obj_free_fields(htab->map.record, elem->key + round_up(htab->map.key_size, 8));
|
||||
if (htab_is_percpu(htab)) {
|
||||
void __percpu *pptr = htab_elem_get_ptr(elem, htab->map.key_size);
|
||||
int cpu;
|
||||
|
||||
for_each_possible_cpu(cpu) {
|
||||
bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
|
||||
cond_resched();
|
||||
}
|
||||
} else {
|
||||
bpf_obj_free_fields(htab->map.record, elem->key + round_up(htab->map.key_size, 8));
|
||||
cond_resched();
|
||||
}
|
||||
cond_resched();
|
||||
}
|
||||
}
|
||||
@@ -596,6 +607,8 @@ free_htab:
|
||||
|
||||
static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
|
||||
{
|
||||
if (likely(key_len % 4 == 0))
|
||||
return jhash2(key, key_len / 4, hashrnd);
|
||||
return jhash(key, key_len, hashrnd);
|
||||
}
|
||||
|
||||
@@ -759,9 +772,17 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
|
||||
static void check_and_free_fields(struct bpf_htab *htab,
|
||||
struct htab_elem *elem)
|
||||
{
|
||||
void *map_value = elem->key + round_up(htab->map.key_size, 8);
|
||||
if (htab_is_percpu(htab)) {
|
||||
void __percpu *pptr = htab_elem_get_ptr(elem, htab->map.key_size);
|
||||
int cpu;
|
||||
|
||||
bpf_obj_free_fields(htab->map.record, map_value);
|
||||
for_each_possible_cpu(cpu)
|
||||
bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
|
||||
} else {
|
||||
void *map_value = elem->key + round_up(htab->map.key_size, 8);
|
||||
|
||||
bpf_obj_free_fields(htab->map.record, map_value);
|
||||
}
|
||||
}
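The per-CPU branch above exists because kptrs can now live in PERCPU_HASH / LRU_PERCPU_HASH values, so every CPU's copy must have its special fields dropped when the element is freed. A sketch of a map and program exercising that path (the __kptr tag comes from libbpf's bpf_helpers.h; the tracepoint, field names and key choice are illustrative):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;

struct val {
	struct task_struct __kptr *task;	/* special field in a percpu value */
	long counter;
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
	__uint(max_entries, 128);
	__type(key, __u32);
	__type(value, struct val);
} percpu_vals SEC(".maps");

SEC("tp_btf/task_newtask")
int BPF_PROG(store_task, struct task_struct *task, u64 clone_flags)
{
	__u32 key = task->tgid;
	struct val zero = {}, *v;
	struct task_struct *acq, *old;

	bpf_map_update_elem(&percpu_vals, &key, &zero, BPF_NOEXIST);
	v = bpf_map_lookup_elem(&percpu_vals, &key);
	if (!v)
		return 0;

	acq = bpf_task_acquire(task);
	if (!acq)
		return 0;

	/* Stash the acquired task in this CPU's copy of the value; whatever
	 * was there before must be released by the program.
	 */
	old = bpf_kptr_xchg(&v->task, acq);
	if (old)
		bpf_task_release(old);
	return 0;
}

char _license[] SEC("license") = "GPL";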
|
||||
|
||||
/* It is called from the bpf_lru_list when the LRU needs to delete
|
||||
@@ -858,9 +879,9 @@ find_first_elem:
|
||||
|
||||
static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
|
||||
{
|
||||
check_and_free_fields(htab, l);
|
||||
if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
|
||||
bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
|
||||
check_and_free_fields(htab, l);
|
||||
bpf_mem_cache_free(&htab->ma, l);
|
||||
}
|
||||
|
||||
@@ -918,14 +939,13 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
|
||||
{
|
||||
if (!onallcpus) {
|
||||
/* copy true value_size bytes */
|
||||
memcpy(this_cpu_ptr(pptr), value, htab->map.value_size);
|
||||
copy_map_value(&htab->map, this_cpu_ptr(pptr), value);
|
||||
} else {
|
||||
u32 size = round_up(htab->map.value_size, 8);
|
||||
int off = 0, cpu;
|
||||
|
||||
for_each_possible_cpu(cpu) {
|
||||
bpf_long_memcpy(per_cpu_ptr(pptr, cpu),
|
||||
value + off, size);
|
||||
copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value + off);
|
||||
off += size;
|
||||
}
|
||||
}
|
||||
@@ -940,16 +960,14 @@ static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr,
|
||||
* (onallcpus=false always when coming from bpf prog).
|
||||
*/
|
||||
if (!onallcpus) {
|
||||
u32 size = round_up(htab->map.value_size, 8);
|
||||
int current_cpu = raw_smp_processor_id();
|
||||
int cpu;
|
||||
|
||||
for_each_possible_cpu(cpu) {
|
||||
if (cpu == current_cpu)
|
||||
bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value,
|
||||
size);
|
||||
else
|
||||
memset(per_cpu_ptr(pptr, cpu), 0, size);
|
||||
copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value);
|
||||
else /* Since elem is preallocated, we cannot touch special fields */
|
||||
zero_map_value(&htab->map, per_cpu_ptr(pptr, cpu));
|
||||
}
|
||||
} else {
|
||||
pcpu_copy_value(htab, pptr, value, onallcpus);
|
||||
@@ -1057,8 +1075,8 @@ static int check_flags(struct bpf_htab *htab, struct htab_elem *l_old,
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
{
|
||||
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
|
||||
struct htab_elem *l_new = NULL, *l_old;
|
||||
@@ -1159,8 +1177,8 @@ static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
|
||||
bpf_lru_push_free(&htab->lru, &elem->lru_node);
|
||||
}
|
||||
|
||||
static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
{
|
||||
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
|
||||
struct htab_elem *l_new, *l_old = NULL;
|
||||
@@ -1226,9 +1244,9 @@ err:
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags,
|
||||
bool onallcpus)
|
||||
static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags,
|
||||
bool onallcpus)
|
||||
{
|
||||
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
|
||||
struct htab_elem *l_new = NULL, *l_old;
|
||||
@@ -1281,9 +1299,9 @@ err:
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags,
|
||||
bool onallcpus)
|
||||
static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags,
|
||||
bool onallcpus)
|
||||
{
|
||||
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
|
||||
struct htab_elem *l_new = NULL, *l_old;
|
||||
@@ -1348,21 +1366,21 @@ err:
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int htab_percpu_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
static long htab_percpu_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
{
|
||||
return __htab_percpu_map_update_elem(map, key, value, map_flags, false);
|
||||
}
|
||||
|
||||
static int htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
static long htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 map_flags)
|
||||
{
|
||||
return __htab_lru_percpu_map_update_elem(map, key, value, map_flags,
|
||||
false);
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int htab_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long htab_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
|
||||
struct hlist_nulls_head *head;
|
||||
@@ -1398,7 +1416,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long htab_lru_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
|
||||
struct hlist_nulls_head *head;
|
||||
@@ -1575,9 +1593,8 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
|
||||
|
||||
pptr = htab_elem_get_ptr(l, key_size);
|
||||
for_each_possible_cpu(cpu) {
|
||||
bpf_long_memcpy(value + off,
|
||||
per_cpu_ptr(pptr, cpu),
|
||||
roundup_value_size);
|
||||
copy_map_value_long(&htab->map, value + off, per_cpu_ptr(pptr, cpu));
|
||||
check_and_init_map_value(&htab->map, value + off);
|
||||
off += roundup_value_size;
|
||||
}
|
||||
} else {
|
||||
@@ -1772,8 +1789,8 @@ again_nocopy:
|
||||
|
||||
pptr = htab_elem_get_ptr(l, map->key_size);
|
||||
for_each_possible_cpu(cpu) {
|
||||
bpf_long_memcpy(dst_val + off,
|
||||
per_cpu_ptr(pptr, cpu), size);
|
||||
copy_map_value_long(&htab->map, dst_val + off, per_cpu_ptr(pptr, cpu));
|
||||
check_and_init_map_value(&htab->map, dst_val + off);
|
||||
off += size;
|
||||
}
|
||||
} else {
|
||||
@@ -2046,9 +2063,9 @@ static int __bpf_hash_map_seq_show(struct seq_file *seq, struct htab_elem *elem)
|
||||
roundup_value_size = round_up(map->value_size, 8);
|
||||
pptr = htab_elem_get_ptr(elem, map->key_size);
|
||||
for_each_possible_cpu(cpu) {
|
||||
bpf_long_memcpy(info->percpu_value_buf + off,
|
||||
per_cpu_ptr(pptr, cpu),
|
||||
roundup_value_size);
|
||||
copy_map_value_long(map, info->percpu_value_buf + off,
|
||||
per_cpu_ptr(pptr, cpu));
|
||||
check_and_init_map_value(map, info->percpu_value_buf + off);
|
||||
off += roundup_value_size;
|
||||
}
|
||||
ctx.value = info->percpu_value_buf;
|
||||
@@ -2119,8 +2136,8 @@ static const struct bpf_iter_seq_info iter_seq_info = {
|
||||
.seq_priv_size = sizeof(struct bpf_iter_seq_hash_map_info),
|
||||
};
|
||||
|
||||
static int bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
|
||||
void *callback_ctx, u64 flags)
|
||||
static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
|
||||
void *callback_ctx, u64 flags)
|
||||
{
|
||||
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
|
||||
struct hlist_nulls_head *head;
|
||||
@@ -2175,6 +2192,44 @@ out:
|
||||
return num_elems;
|
||||
}
|
||||
|
||||
static u64 htab_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
|
||||
u32 value_size = round_up(htab->map.value_size, 8);
|
||||
bool prealloc = htab_is_prealloc(htab);
|
||||
bool percpu = htab_is_percpu(htab);
|
||||
bool lru = htab_is_lru(htab);
|
||||
u64 num_entries;
|
||||
u64 usage = sizeof(struct bpf_htab);
|
||||
|
||||
usage += sizeof(struct bucket) * htab->n_buckets;
|
||||
usage += sizeof(int) * num_possible_cpus() * HASHTAB_MAP_LOCK_COUNT;
|
||||
if (prealloc) {
|
||||
num_entries = map->max_entries;
|
||||
if (htab_has_extra_elems(htab))
|
||||
num_entries += num_possible_cpus();
|
||||
|
||||
usage += htab->elem_size * num_entries;
|
||||
|
||||
if (percpu)
|
||||
usage += value_size * num_possible_cpus() * num_entries;
|
||||
else if (!lru)
|
||||
usage += sizeof(struct htab_elem *) * num_possible_cpus();
|
||||
} else {
|
||||
#define LLIST_NODE_SZ sizeof(struct llist_node)
|
||||
|
||||
num_entries = htab->use_percpu_counter ?
|
||||
percpu_counter_sum(&htab->pcount) :
|
||||
atomic_read(&htab->count);
|
||||
usage += (htab->elem_size + LLIST_NODE_SZ) * num_entries;
|
||||
if (percpu) {
|
||||
usage += (LLIST_NODE_SZ + sizeof(void *)) * num_entries;
|
||||
usage += value_size * num_possible_cpus() * num_entries;
|
||||
}
|
||||
}
|
||||
return usage;
|
||||
}
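Read as a formula, the preallocated, non-percpu, non-LRU case above works out to roughly the following; this is an illustrative summary of the code, not an exact accounting, since real numbers depend on struct sizes and allocator overhead:

/*
 *   usage = sizeof(struct bpf_htab)
 *         + htab->n_buckets * sizeof(struct bucket)
 *         + num_possible_cpus() * HASHTAB_MAP_LOCK_COUNT * sizeof(int)
 *         + num_entries * htab->elem_size      (max_entries, plus one extra
 *                                               element per CPU if applicable)
 *         + num_possible_cpus() * sizeof(struct htab_elem *)
 */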
|
||||
|
||||
BTF_ID_LIST_SINGLE(htab_map_btf_ids, struct, bpf_htab)
|
||||
const struct bpf_map_ops htab_map_ops = {
|
||||
.map_meta_equal = bpf_map_meta_equal,
|
||||
@@ -2191,6 +2246,7 @@ const struct bpf_map_ops htab_map_ops = {
|
||||
.map_seq_show_elem = htab_map_seq_show_elem,
|
||||
.map_set_for_each_callback_args = map_set_for_each_callback_args,
|
||||
.map_for_each_callback = bpf_for_each_hash_elem,
|
||||
.map_mem_usage = htab_map_mem_usage,
|
||||
BATCH_OPS(htab),
|
||||
.map_btf_id = &htab_map_btf_ids[0],
|
||||
.iter_seq_info = &iter_seq_info,
|
||||
@@ -2212,6 +2268,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
|
||||
.map_seq_show_elem = htab_map_seq_show_elem,
|
||||
.map_set_for_each_callback_args = map_set_for_each_callback_args,
|
||||
.map_for_each_callback = bpf_for_each_hash_elem,
|
||||
.map_mem_usage = htab_map_mem_usage,
|
||||
BATCH_OPS(htab_lru),
|
||||
.map_btf_id = &htab_map_btf_ids[0],
|
||||
.iter_seq_info = &iter_seq_info,
|
||||
@@ -2292,8 +2349,8 @@ int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value)
|
||||
*/
|
||||
pptr = htab_elem_get_ptr(l, map->key_size);
|
||||
for_each_possible_cpu(cpu) {
|
||||
bpf_long_memcpy(value + off,
|
||||
per_cpu_ptr(pptr, cpu), size);
|
||||
copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
|
||||
check_and_init_map_value(map, value + off);
|
||||
off += size;
|
||||
}
|
||||
ret = 0;
|
||||
@@ -2363,6 +2420,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
|
||||
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
|
||||
.map_set_for_each_callback_args = map_set_for_each_callback_args,
|
||||
.map_for_each_callback = bpf_for_each_hash_elem,
|
||||
.map_mem_usage = htab_map_mem_usage,
|
||||
BATCH_OPS(htab_percpu),
|
||||
.map_btf_id = &htab_map_btf_ids[0],
|
||||
.iter_seq_info = &iter_seq_info,
|
||||
@@ -2382,6 +2440,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
|
||||
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
|
||||
.map_set_for_each_callback_args = map_set_for_each_callback_args,
|
||||
.map_for_each_callback = bpf_for_each_hash_elem,
|
||||
.map_mem_usage = htab_map_mem_usage,
|
||||
BATCH_OPS(htab_lru_percpu),
|
||||
.map_btf_id = &htab_map_btf_ids[0],
|
||||
.iter_seq_info = &iter_seq_info,
|
||||
@@ -2519,6 +2578,7 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
|
||||
.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
|
||||
.map_gen_lookup = htab_of_map_gen_lookup,
|
||||
.map_check_btf = map_check_no_btf,
|
||||
.map_mem_usage = htab_map_mem_usage,
|
||||
BATCH_OPS(htab),
|
||||
.map_btf_id = &htab_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -18,6 +18,7 @@
|
||||
#include <linux/pid_namespace.h>
|
||||
#include <linux/poison.h>
|
||||
#include <linux/proc_ns.h>
|
||||
#include <linux/sched/task.h>
|
||||
#include <linux/security.h>
|
||||
#include <linux/btf_ids.h>
|
||||
#include <linux/bpf_mem_alloc.h>
|
||||
@@ -257,7 +258,7 @@ BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size)
|
||||
goto err_clear;
|
||||
|
||||
/* Verifier guarantees that size > 0 */
|
||||
strscpy(buf, task->comm, size);
|
||||
strscpy_pad(buf, task->comm, size);
|
||||
return 0;
|
||||
err_clear:
|
||||
memset(buf, 0, size);
|
||||
@@ -571,7 +572,7 @@ static const struct bpf_func_proto bpf_strncmp_proto = {
|
||||
.func = bpf_strncmp,
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_INTEGER,
|
||||
.arg1_type = ARG_PTR_TO_MEM,
|
||||
.arg1_type = ARG_PTR_TO_MEM | MEM_RDONLY,
|
||||
.arg2_type = ARG_CONST_SIZE,
|
||||
.arg3_type = ARG_PTR_TO_CONST_STR,
|
||||
};
|
||||
@@ -1264,10 +1265,11 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
|
||||
{
|
||||
struct bpf_hrtimer *t;
|
||||
int ret = 0;
|
||||
enum hrtimer_mode mode;
|
||||
|
||||
if (in_nmi())
|
||||
return -EOPNOTSUPP;
|
||||
if (flags)
|
||||
if (flags > BPF_F_TIMER_ABS)
|
||||
return -EINVAL;
|
||||
__bpf_spin_lock_irqsave(&timer->lock);
|
||||
t = timer->timer;
|
||||
@@ -1275,7 +1277,13 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
|
||||
ret = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
hrtimer_start(&t->timer, ns_to_ktime(nsecs), HRTIMER_MODE_REL_SOFT);
|
||||
|
||||
if (flags & BPF_F_TIMER_ABS)
|
||||
mode = HRTIMER_MODE_ABS_SOFT;
|
||||
else
|
||||
mode = HRTIMER_MODE_REL_SOFT;
|
||||
|
||||
hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
|
||||
out:
|
||||
__bpf_spin_unlock_irqrestore(&timer->lock);
|
||||
return ret;
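With BPF_F_TIMER_ABS the nsecs argument is interpreted as an absolute time on the timer's clock base instead of a delay from now. A hedged program sketch; the section, map layout and the literal clock id are assumptions of the example:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct elem {
	struct bpf_timer t;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} timers SEC(".maps");

static int timer_cb(void *map, int *key, struct bpf_timer *timer)
{
	bpf_printk("timer fired");
	return 0;
}

SEC("fentry/bpf_fentry_test1")
int BPF_PROG(arm_abs_timer)
{
	int key = 0;
	struct elem *e = bpf_map_lookup_elem(&timers, &key);

	if (!e)
		return 0;

	bpf_timer_init(&e->t, &timers, 1 /* CLOCK_MONOTONIC */);
	bpf_timer_set_callback(&e->t, timer_cb);

	/* Absolute expiry: "now + 1ms" expressed on CLOCK_MONOTONIC. */
	bpf_timer_start(&e->t, bpf_ktime_get_ns() + 1000000, BPF_F_TIMER_ABS);
	return 0;
}

char _license[] SEC("license") = "GPL";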
|
||||
@@ -1420,11 +1428,21 @@ static bool bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr)
|
||||
return ptr->size & DYNPTR_RDONLY_BIT;
|
||||
}
|
||||
|
||||
void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
|
||||
{
|
||||
ptr->size |= DYNPTR_RDONLY_BIT;
|
||||
}
|
||||
|
||||
static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
|
||||
{
|
||||
ptr->size |= type << DYNPTR_TYPE_SHIFT;
|
||||
}
|
||||
|
||||
static enum bpf_dynptr_type bpf_dynptr_get_type(const struct bpf_dynptr_kern *ptr)
|
||||
{
|
||||
return (ptr->size & ~(DYNPTR_RDONLY_BIT)) >> DYNPTR_TYPE_SHIFT;
|
||||
}
|
||||
|
||||
u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr)
|
||||
{
|
||||
return ptr->size & DYNPTR_SIZE_MASK;
|
||||
@@ -1497,6 +1515,7 @@ static const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
|
||||
BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, const struct bpf_dynptr_kern *, src,
|
||||
u32, offset, u64, flags)
|
||||
{
|
||||
enum bpf_dynptr_type type;
|
||||
int err;
|
||||
|
||||
if (!src->data || flags)
|
||||
@@ -1506,13 +1525,25 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, const struct bpf_dynptr_kern
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
/* Source and destination may possibly overlap, hence use memmove to
|
||||
* copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
|
||||
* pointing to overlapping PTR_TO_MAP_VALUE regions.
|
||||
*/
|
||||
memmove(dst, src->data + src->offset + offset, len);
|
||||
type = bpf_dynptr_get_type(src);
|
||||
|
||||
return 0;
|
||||
switch (type) {
|
||||
case BPF_DYNPTR_TYPE_LOCAL:
|
||||
case BPF_DYNPTR_TYPE_RINGBUF:
|
||||
/* Source and destination may possibly overlap, hence use memmove to
|
||||
* copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
|
||||
* pointing to overlapping PTR_TO_MAP_VALUE regions.
|
||||
*/
|
||||
memmove(dst, src->data + src->offset + offset, len);
|
||||
return 0;
|
||||
case BPF_DYNPTR_TYPE_SKB:
|
||||
return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
|
||||
case BPF_DYNPTR_TYPE_XDP:
|
||||
return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len);
|
||||
default:
|
||||
WARN_ONCE(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);
|
||||
return -EFAULT;
|
||||
}
|
||||
}
|
||||
|
||||
static const struct bpf_func_proto bpf_dynptr_read_proto = {
|
||||
@@ -1529,22 +1560,40 @@ static const struct bpf_func_proto bpf_dynptr_read_proto = {
|
||||
BPF_CALL_5(bpf_dynptr_write, const struct bpf_dynptr_kern *, dst, u32, offset, void *, src,
|
||||
u32, len, u64, flags)
|
||||
{
|
||||
enum bpf_dynptr_type type;
|
||||
int err;
|
||||
|
||||
if (!dst->data || flags || bpf_dynptr_is_rdonly(dst))
|
||||
if (!dst->data || bpf_dynptr_is_rdonly(dst))
|
||||
return -EINVAL;
|
||||
|
||||
err = bpf_dynptr_check_off_len(dst, offset, len);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
/* Source and destination may possibly overlap, hence use memmove to
|
||||
* copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
|
||||
* pointing to overlapping PTR_TO_MAP_VALUE regions.
|
||||
*/
|
||||
memmove(dst->data + dst->offset + offset, src, len);
|
||||
type = bpf_dynptr_get_type(dst);
|
||||
|
||||
return 0;
|
||||
switch (type) {
|
||||
case BPF_DYNPTR_TYPE_LOCAL:
|
||||
case BPF_DYNPTR_TYPE_RINGBUF:
|
||||
if (flags)
|
||||
return -EINVAL;
|
||||
/* Source and destination may possibly overlap, hence use memmove to
|
||||
* copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
|
||||
* pointing to overlapping PTR_TO_MAP_VALUE regions.
|
||||
*/
|
||||
memmove(dst->data + dst->offset + offset, src, len);
|
||||
return 0;
|
||||
case BPF_DYNPTR_TYPE_SKB:
|
||||
return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len,
|
||||
flags);
|
||||
case BPF_DYNPTR_TYPE_XDP:
|
||||
if (flags)
|
||||
return -EINVAL;
|
||||
return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len);
|
||||
default:
|
||||
WARN_ONCE(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
|
||||
return -EFAULT;
|
||||
}
|
||||
}
|
||||
|
||||
static const struct bpf_func_proto bpf_dynptr_write_proto = {
|
||||
@@ -1560,6 +1609,7 @@ static const struct bpf_func_proto bpf_dynptr_write_proto = {
|
||||
|
||||
BPF_CALL_3(bpf_dynptr_data, const struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
|
||||
{
|
||||
enum bpf_dynptr_type type;
|
||||
int err;
|
||||
|
||||
if (!ptr->data)
|
||||
@@ -1572,7 +1622,20 @@ BPF_CALL_3(bpf_dynptr_data, const struct bpf_dynptr_kern *, ptr, u32, offset, u3
|
||||
if (bpf_dynptr_is_rdonly(ptr))
|
||||
return 0;
|
||||
|
||||
return (unsigned long)(ptr->data + ptr->offset + offset);
|
||||
type = bpf_dynptr_get_type(ptr);
|
||||
|
||||
switch (type) {
|
||||
case BPF_DYNPTR_TYPE_LOCAL:
|
||||
case BPF_DYNPTR_TYPE_RINGBUF:
|
||||
return (unsigned long)(ptr->data + ptr->offset + offset);
|
||||
case BPF_DYNPTR_TYPE_SKB:
|
||||
case BPF_DYNPTR_TYPE_XDP:
|
||||
/* skb and xdp dynptrs should use bpf_dynptr_slice / bpf_dynptr_slice_rdwr */
|
||||
return 0;
|
||||
default:
|
||||
WARN_ONCE(true, "bpf_dynptr_data: unknown dynptr type %d\n", type);
|
||||
return 0;
|
||||
}
|
||||
}
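Because bpf_dynptr_data() deliberately returns 0 for skb and xdp dynptrs, programs reach packet bytes through bpf_dynptr_read()/bpf_dynptr_write() or the slice kfuncs instead. A TC-side sketch; the bpf_dynptr_from_skb declaration is modeled on the kernel selftests and is an assumption here:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

int bpf_dynptr_from_skb(struct __sk_buff *skb, u64 flags,
			struct bpf_dynptr *ptr__uninit) __ksym;

SEC("tc")
int read_eth_proto(struct __sk_buff *skb)
{
	struct bpf_dynptr ptr;
	struct ethhdr eth;

	if (bpf_dynptr_from_skb(skb, 0, &ptr))
		return 0;	/* TC_ACT_OK */

	/* Routed through __bpf_skb_load_bytes() for skb dynptrs, so it works
	 * even when the header is not in the linear area.
	 */
	if (bpf_dynptr_read(&eth, sizeof(eth), &ptr, 0, 0))
		return 0;	/* TC_ACT_OK */

	bpf_printk("h_proto: 0x%x", bpf_ntohs(eth.h_proto));
	return 0;		/* TC_ACT_OK */
}

char _license[] SEC("license") = "GPL";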
|
||||
|
||||
static const struct bpf_func_proto bpf_dynptr_data_proto = {
|
||||
@@ -1693,6 +1756,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
|
||||
return &bpf_cgrp_storage_get_proto;
|
||||
case BPF_FUNC_cgrp_storage_delete:
|
||||
return &bpf_cgrp_storage_delete_proto;
|
||||
case BPF_FUNC_get_current_cgroup_id:
|
||||
return &bpf_get_current_cgroup_id_proto;
|
||||
case BPF_FUNC_get_current_ancestor_cgroup_id:
|
||||
return &bpf_get_current_ancestor_cgroup_id_proto;
|
||||
#endif
|
||||
default:
|
||||
break;
|
||||
@@ -1731,6 +1798,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
|
||||
}
|
||||
}
|
||||
|
||||
void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
|
||||
|
||||
void bpf_list_head_free(const struct btf_field *field, void *list_head,
|
||||
struct bpf_spin_lock *spin_lock)
|
||||
{
|
||||
@@ -1761,13 +1830,8 @@ unlock:
|
||||
/* The contained type can also have resources, including a
|
||||
* bpf_list_head which needs to be freed.
|
||||
*/
|
||||
bpf_obj_free_fields(field->graph_root.value_rec, obj);
|
||||
/* bpf_mem_free requires migrate_disable(), since we can be
|
||||
* called from map free path as well apart from BPF program (as
|
||||
* part of map ops doing bpf_obj_free_fields).
|
||||
*/
|
||||
migrate_disable();
|
||||
bpf_mem_free(&bpf_global_ma, obj);
|
||||
__bpf_obj_drop_impl(obj, field->graph_root.value_rec);
|
||||
migrate_enable();
|
||||
}
|
||||
}
|
||||
@@ -1804,10 +1868,9 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
|
||||
obj = pos;
|
||||
obj -= field->graph_root.node_offset;
|
||||
|
||||
bpf_obj_free_fields(field->graph_root.value_rec, obj);
|
||||
|
||||
migrate_disable();
|
||||
bpf_mem_free(&bpf_global_ma, obj);
|
||||
__bpf_obj_drop_impl(obj, field->graph_root.value_rec);
|
||||
migrate_enable();
|
||||
}
|
||||
}
|
||||
@@ -1826,45 +1889,96 @@ __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
|
||||
if (!p)
|
||||
return NULL;
|
||||
if (meta)
|
||||
bpf_obj_init(meta->field_offs, p);
|
||||
bpf_obj_init(meta->record, p);
|
||||
return p;
|
||||
}
|
||||
|
||||
/* Must be called under migrate_disable(), as required by bpf_mem_free */
|
||||
void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
|
||||
{
|
||||
if (rec && rec->refcount_off >= 0 &&
|
||||
!refcount_dec_and_test((refcount_t *)(p + rec->refcount_off))) {
|
||||
/* Object is refcounted and refcount_dec didn't result in 0
|
||||
* refcount. Return without freeing the object
|
||||
*/
|
||||
return;
|
||||
}
|
||||
|
||||
if (rec)
|
||||
bpf_obj_free_fields(rec, p);
|
||||
bpf_mem_free(&bpf_global_ma, p);
|
||||
}
|
||||
|
||||
__bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
|
||||
{
|
||||
struct btf_struct_meta *meta = meta__ign;
|
||||
void *p = p__alloc;
|
||||
|
||||
if (meta)
|
||||
bpf_obj_free_fields(meta->record, p);
|
||||
bpf_mem_free(&bpf_global_ma, p);
|
||||
__bpf_obj_drop_impl(p, meta ? meta->record : NULL);
|
||||
}
|
||||
|
||||
static void __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head, bool tail)
|
||||
__bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
|
||||
{
|
||||
struct btf_struct_meta *meta = meta__ign;
|
||||
struct bpf_refcount *ref;
|
||||
|
||||
/* Could just cast directly to refcount_t *, but need some code using
|
||||
* bpf_refcount type so that it is emitted in vmlinux BTF
|
||||
*/
|
||||
ref = (struct bpf_refcount *)(p__refcounted_kptr + meta->record->refcount_off);
|
||||
|
||||
refcount_inc((refcount_t *)ref);
|
||||
return (void *)p__refcounted_kptr;
|
||||
}
|
||||
|
||||
static int __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head,
|
||||
bool tail, struct btf_record *rec, u64 off)
|
||||
{
|
||||
struct list_head *n = (void *)node, *h = (void *)head;
|
||||
|
||||
/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
|
||||
* called on its fields, so init here
|
||||
*/
|
||||
if (unlikely(!h->next))
|
||||
INIT_LIST_HEAD(h);
|
||||
if (unlikely(!n->next))
|
||||
INIT_LIST_HEAD(n);
|
||||
if (!list_empty(n)) {
|
||||
/* Only called from BPF prog, no need to migrate_disable */
|
||||
__bpf_obj_drop_impl(n - off, rec);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
tail ? list_add_tail(n, h) : list_add(n, h);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
__bpf_kfunc void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node)
|
||||
__bpf_kfunc int bpf_list_push_front_impl(struct bpf_list_head *head,
|
||||
struct bpf_list_node *node,
|
||||
void *meta__ign, u64 off)
|
||||
{
|
||||
return __bpf_list_add(node, head, false);
|
||||
struct btf_struct_meta *meta = meta__ign;
|
||||
|
||||
return __bpf_list_add(node, head, false,
|
||||
meta ? meta->record : NULL, off);
|
||||
}
|
||||
|
||||
__bpf_kfunc void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node)
|
||||
__bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head,
|
||||
struct bpf_list_node *node,
|
||||
void *meta__ign, u64 off)
|
||||
{
|
||||
return __bpf_list_add(node, head, true);
|
||||
struct btf_struct_meta *meta = meta__ign;
|
||||
|
||||
return __bpf_list_add(node, head, true,
|
||||
meta ? meta->record : NULL, off);
|
||||
}
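From a program's point of view only the return type is new: the bpf_list_push_front()/bpf_list_push_back() wrappers (in the selftests' bpf_experimental.h) supply the hidden meta/off arguments, and a node that is already on a list is dropped and reported as -EINVAL instead of being silently ignored. A hedged sketch; the private()/__contains() annotations and bpf_obj_new() are taken from those selftest headers and are assumptions of the example:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"	/* bpf_obj_new(), bpf_list_push_front(), __contains() */

#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))

struct node {
	long data;
	struct bpf_list_node node;
};

private(LIST) struct bpf_spin_lock list_lock;
private(LIST) struct bpf_list_head head __contains(node, node);

SEC("tc")
int list_push_demo(void *ctx)
{
	struct node *n = bpf_obj_new(typeof(*n));
	int ret;

	if (!n)
		return 0;

	n->data = 42;
	bpf_spin_lock(&list_lock);
	ret = bpf_list_push_front(&head, &n->node);
	bpf_spin_unlock(&list_lock);
	if (ret)	/* the kfunc has already dropped the node for us */
		bpf_printk("node was already on a list");
	return 0;
}

char _license[] SEC("license") = "GPL";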
|
||||
|
||||
static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail)
|
||||
{
|
||||
struct list_head *n, *h = (void *)head;
|
||||
|
||||
/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
|
||||
* called on its fields, so init here
|
||||
*/
|
||||
if (unlikely(!h->next))
|
||||
INIT_LIST_HEAD(h);
|
||||
if (list_empty(h))
|
||||
@@ -1890,6 +2004,9 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
|
||||
struct rb_root_cached *r = (struct rb_root_cached *)root;
|
||||
struct rb_node *n = (struct rb_node *)node;
|
||||
|
||||
if (RB_EMPTY_NODE(n))
|
||||
return NULL;
|
||||
|
||||
rb_erase_cached(n, r);
|
||||
RB_CLEAR_NODE(n);
|
||||
return (struct bpf_rb_node *)n;
|
||||
@@ -1898,14 +2015,20 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
|
||||
/* Need to copy rbtree_add_cached's logic here because our 'less' is a BPF
|
||||
* program
|
||||
*/
|
||||
static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
|
||||
void *less)
|
||||
static int __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
|
||||
void *less, struct btf_record *rec, u64 off)
|
||||
{
|
||||
struct rb_node **link = &((struct rb_root_cached *)root)->rb_root.rb_node;
|
||||
struct rb_node *parent = NULL, *n = (struct rb_node *)node;
|
||||
bpf_callback_t cb = (bpf_callback_t)less;
|
||||
struct rb_node *parent = NULL;
|
||||
bool leftmost = true;
|
||||
|
||||
if (!RB_EMPTY_NODE(n)) {
|
||||
/* Only called from BPF prog, no need to migrate_disable */
|
||||
__bpf_obj_drop_impl(n - off, rec);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
while (*link) {
|
||||
parent = *link;
|
||||
if (cb((uintptr_t)node, (uintptr_t)parent, 0, 0, 0)) {
|
||||
@@ -1916,15 +2039,18 @@ static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
|
||||
}
|
||||
}
|
||||
|
||||
rb_link_node((struct rb_node *)node, parent, link);
|
||||
rb_insert_color_cached((struct rb_node *)node,
|
||||
(struct rb_root_cached *)root, leftmost);
|
||||
rb_link_node(n, parent, link);
|
||||
rb_insert_color_cached(n, (struct rb_root_cached *)root, leftmost);
|
||||
return 0;
|
||||
}
|
||||
|
||||
__bpf_kfunc void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
|
||||
bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
|
||||
__bpf_kfunc int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *node,
|
||||
bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b),
|
||||
void *meta__ign, u64 off)
|
||||
{
|
||||
__bpf_rbtree_add(root, node, (void *)less);
|
||||
struct btf_struct_meta *meta = meta__ign;
|
||||
|
||||
return __bpf_rbtree_add(root, node, (void *)less, meta ? meta->record : NULL, off);
|
||||
}
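Same pattern for the rbtree: bpf_rbtree_add() (a wrapper over bpf_rbtree_add_impl()) now returns an int, and the less() comparator is an ordinary BPF function invoked through the bpf_callback_t cast above. A sketch continuing the list example, with the same assumed bpf_experimental.h helpers; bpf_rb_node is placed first so a plain cast stands in for container_of:

struct rb_elem {
	struct bpf_rb_node rb;	/* first member: a cast from bpf_rb_node suffices */
	long key;
};

private(TREE) struct bpf_spin_lock tree_lock;
private(TREE) struct bpf_rb_root tree __contains(rb_elem, rb);

static bool rb_less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
{
	struct rb_elem *ea = (struct rb_elem *)a;
	const struct rb_elem *eb = (const struct rb_elem *)b;

	return ea->key < eb->key;
}

SEC("tc")
int rbtree_add_demo(void *ctx)
{
	struct rb_elem *e = bpf_obj_new(typeof(*e));

	if (!e)
		return 0;

	e->key = bpf_get_prandom_u32();
	bpf_spin_lock(&tree_lock);
	bpf_rbtree_add(&tree, &e->rb, rb_less);	/* -EINVAL if already linked */
	bpf_spin_unlock(&tree_lock);
	return 0;
}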
|
||||
|
||||
__bpf_kfunc struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root)
|
||||
@@ -1942,73 +2068,8 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root)
|
||||
*/
|
||||
__bpf_kfunc struct task_struct *bpf_task_acquire(struct task_struct *p)
|
||||
{
|
||||
return get_task_struct(p);
|
||||
}
|
||||
|
||||
/**
|
||||
* bpf_task_acquire_not_zero - Acquire a reference to a rcu task object. A task
|
||||
* acquired by this kfunc which is not stored in a map as a kptr, must be
|
||||
* released by calling bpf_task_release().
|
||||
* @p: The task on which a reference is being acquired.
|
||||
*/
|
||||
__bpf_kfunc struct task_struct *bpf_task_acquire_not_zero(struct task_struct *p)
|
||||
{
|
||||
/* For the time being this function returns NULL, as it's not currently
|
||||
* possible to safely acquire a reference to a task with RCU protection
|
||||
* using get_task_struct() and put_task_struct(). This is due to the
|
||||
* slightly odd mechanics of p->rcu_users, and how task RCU protection
|
||||
* works.
|
||||
*
|
||||
* A struct task_struct is refcounted by two different refcount_t
|
||||
* fields:
|
||||
*
|
||||
* 1. p->usage: The "true" refcount field which tracks a task's
|
||||
* lifetime. The task is freed as soon as this
|
||||
* refcount drops to 0.
|
||||
*
|
||||
* 2. p->rcu_users: An "RCU users" refcount field which is statically
|
||||
* initialized to 2, and is co-located in a union with
|
||||
* a struct rcu_head field (p->rcu). p->rcu_users
|
||||
* essentially encapsulates a single p->usage
|
||||
* refcount, and when p->rcu_users goes to 0, an RCU
|
||||
* callback is scheduled on the struct rcu_head which
|
||||
* decrements the p->usage refcount.
|
||||
*
|
||||
* There are two important implications to this task refcounting logic
|
||||
* described above. The first is that
|
||||
* refcount_inc_not_zero(&p->rcu_users) cannot be used anywhere, as
|
||||
* after the refcount goes to 0, the RCU callback being scheduled will
|
||||
* cause the memory backing the refcount to again be nonzero due to the
|
||||
* fields sharing a union. The other is that we can't rely on RCU to
|
||||
* guarantee that a task is valid in a BPF program. This is because a
|
||||
* task could have already transitioned to being in the TASK_DEAD
|
||||
* state, had its rcu_users refcount go to 0, and its rcu callback
|
||||
* invoked in which it drops its single p->usage reference. At this
|
||||
* point the task will be freed as soon as the last p->usage reference
|
||||
* goes to 0, without waiting for another RCU gp to elapse. The only
|
||||
* way that a BPF program can guarantee that a task is valid in this
* scenario is to hold a p->usage refcount itself.
|
||||
*
|
||||
* Until we're able to resolve this issue, either by pulling
|
||||
* p->rcu_users and p->rcu out of the union, or by getting rid of
|
||||
* p->usage and just using p->rcu_users for refcounting, we'll just
|
||||
* return NULL here.
|
||||
*/
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/**
|
||||
* bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
|
||||
* kptr acquired by this kfunc which is not subsequently stored in a map, must
|
||||
* be released by calling bpf_task_release().
|
||||
* @pp: A pointer to a task kptr on which a reference is being acquired.
|
||||
*/
|
||||
__bpf_kfunc struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
|
||||
{
|
||||
/* We must return NULL here until we have clarity on how to properly
|
||||
* leverage RCU for ensuring a task's lifetime. See the comment above
|
||||
* in bpf_task_acquire_not_zero() for more details.
|
||||
*/
|
||||
if (refcount_inc_not_zero(&p->rcu_users))
|
||||
return p;
|
||||
return NULL;
|
||||
}
|
||||
|
||||
@@ -2018,10 +2079,7 @@ __bpf_kfunc struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
|
||||
*/
|
||||
__bpf_kfunc void bpf_task_release(struct task_struct *p)
|
||||
{
|
||||
if (!p)
|
||||
return;
|
||||
|
||||
put_task_struct(p);
|
||||
put_task_struct_rcu_user(p);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_CGROUPS
|
||||
@@ -2033,39 +2091,7 @@ __bpf_kfunc void bpf_task_release(struct task_struct *p)
|
||||
*/
|
||||
__bpf_kfunc struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp)
|
||||
{
|
||||
cgroup_get(cgrp);
|
||||
return cgrp;
|
||||
}
|
||||
|
||||
/**
|
||||
* bpf_cgroup_kptr_get - Acquire a reference on a struct cgroup kptr. A cgroup
|
||||
* kptr acquired by this kfunc which is not subsequently stored in a map, must
|
||||
* be released by calling bpf_cgroup_release().
|
||||
* @cgrpp: A pointer to a cgroup kptr on which a reference is being acquired.
|
||||
*/
|
||||
__bpf_kfunc struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp)
|
||||
{
|
||||
struct cgroup *cgrp;
|
||||
|
||||
rcu_read_lock();
|
||||
/* Another context could remove the cgroup from the map and release it
|
||||
* at any time, including after we've done the lookup above. This is
|
||||
* safe because we're in an RCU read region, so the cgroup is
|
||||
* guaranteed to remain valid until at least the rcu_read_unlock()
|
||||
* below.
|
||||
*/
|
||||
cgrp = READ_ONCE(*cgrpp);
|
||||
|
||||
if (cgrp && !cgroup_tryget(cgrp))
|
||||
/* If the cgroup had been removed from the map and freed as
|
||||
* described above, cgroup_tryget() will return false. The
|
||||
* cgroup will be freed at some point after the current RCU gp
|
||||
* has ended, so just return NULL to the user.
|
||||
*/
|
||||
cgrp = NULL;
|
||||
rcu_read_unlock();
|
||||
|
||||
return cgrp;
|
||||
return cgroup_tryget(cgrp) ? cgrp : NULL;
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -2077,9 +2103,6 @@ __bpf_kfunc struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp)
|
||||
*/
|
||||
__bpf_kfunc void bpf_cgroup_release(struct cgroup *cgrp)
|
||||
{
|
||||
if (!cgrp)
|
||||
return;
|
||||
|
||||
cgroup_put(cgrp);
|
||||
}
|
||||
|
||||
@@ -2097,10 +2120,28 @@ __bpf_kfunc struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level)
|
||||
if (level > cgrp->level || level < 0)
|
||||
return NULL;
|
||||
|
||||
/* cgrp's refcnt could be 0 here, but ancestors can still be accessed */
|
||||
ancestor = cgrp->ancestors[level];
|
||||
cgroup_get(ancestor);
|
||||
if (!cgroup_tryget(ancestor))
|
||||
return NULL;
|
||||
return ancestor;
|
||||
}
|
||||
|
||||
/**
|
||||
* bpf_cgroup_from_id - Find a cgroup from its ID. A cgroup returned by this
|
||||
* kfunc which is not subsequently stored in a map, must be released by calling
|
||||
* bpf_cgroup_release().
|
||||
* @cgid: cgroup id.
|
||||
*/
|
||||
__bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
|
||||
{
|
||||
struct cgroup *cgrp;
|
||||
|
||||
cgrp = cgroup_get_from_id(cgid);
|
||||
if (IS_ERR(cgrp))
|
||||
return NULL;
|
||||
return cgrp;
|
||||
}
|
||||
#endif /* CONFIG_CGROUPS */

/**
@@ -2116,12 +2157,146 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
rcu_read_lock();
p = find_task_by_pid_ns(pid, &init_pid_ns);
if (p)
bpf_task_acquire(p);
p = bpf_task_acquire(p);
rcu_read_unlock();

return p;
}
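The extra assignment is needed because bpf_task_acquire() is now KF_RCU | KF_RET_NULL, so the acquire can fail and bpf_task_from_pid() must hand that NULL back to the caller. On the program side the acquired task still has to be balanced with bpf_task_release(); a short sketch (kfunc declarations as in the selftests, tracepoint choice illustrative):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct task_struct *bpf_task_from_pid(s32 pid) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;

SEC("tp_btf/task_newtask")
int BPF_PROG(find_init, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *p = bpf_task_from_pid(1);

	if (!p)		/* pid not found, or the internal acquire returned NULL */
		return 0;

	bpf_printk("pid 1 tgid: %d", p->tgid);
	bpf_task_release(p);
	return 0;
}

char _license[] SEC("license") = "GPL";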
|
||||
|
||||
/**
|
||||
* bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data.
|
||||
* @ptr: The dynptr whose data slice to retrieve
|
||||
* @offset: Offset into the dynptr
|
||||
* @buffer: User-provided buffer to copy contents into
|
||||
* @buffer__szk: Size (in bytes) of the buffer. This is the length of the
|
||||
* requested slice. This must be a constant.
|
||||
*
|
||||
* For non-skb and non-xdp type dynptrs, there is no difference between
|
||||
* bpf_dynptr_slice and bpf_dynptr_data.
|
||||
*
|
||||
* If the intention is to write to the data slice, please use
|
||||
* bpf_dynptr_slice_rdwr.
|
||||
*
|
||||
* The user must check that the returned pointer is not null before using it.
|
||||
*
|
||||
* Please note that in the case of skb and xdp dynptrs, bpf_dynptr_slice
|
||||
* does not change the underlying packet data pointers, so a call to
|
||||
* bpf_dynptr_slice will not invalidate any ctx->data/data_end pointers in
|
||||
* the bpf program.
|
||||
*
|
||||
* Return: NULL if the call failed (eg invalid dynptr), pointer to a read-only
|
||||
* data slice (can be either direct pointer to the data or a pointer to the user
|
||||
* provided buffer, with its contents containing the data, if unable to obtain
|
||||
* direct pointer)
|
||||
*/
|
||||
__bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset,
|
||||
void *buffer, u32 buffer__szk)
|
||||
{
|
||||
enum bpf_dynptr_type type;
|
||||
u32 len = buffer__szk;
|
||||
int err;
|
||||
|
||||
if (!ptr->data)
|
||||
return NULL;
|
||||
|
||||
err = bpf_dynptr_check_off_len(ptr, offset, len);
|
||||
if (err)
|
||||
return NULL;
|
||||
|
||||
type = bpf_dynptr_get_type(ptr);
|
||||
|
||||
switch (type) {
|
||||
case BPF_DYNPTR_TYPE_LOCAL:
|
||||
case BPF_DYNPTR_TYPE_RINGBUF:
|
||||
return ptr->data + ptr->offset + offset;
|
||||
case BPF_DYNPTR_TYPE_SKB:
|
||||
return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer);
|
||||
case BPF_DYNPTR_TYPE_XDP:
|
||||
{
|
||||
void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len);
|
||||
if (xdp_ptr)
|
||||
return xdp_ptr;
|
||||
|
||||
bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer, len, false);
|
||||
return buffer;
|
||||
}
|
||||
default:
|
||||
WARN_ONCE(true, "unknown dynptr type %d\n", type);
|
||||
return NULL;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* bpf_dynptr_slice_rdwr() - Obtain a writable pointer to the dynptr data.
|
||||
* @ptr: The dynptr whose data slice to retrieve
|
||||
* @offset: Offset into the dynptr
|
||||
* @buffer: User-provided buffer to copy contents into
|
||||
* @buffer__szk: Size (in bytes) of the buffer. This is the length of the
|
||||
* requested slice. This must be a constant.
|
||||
*
|
||||
* For non-skb and non-xdp type dynptrs, there is no difference between
|
||||
* bpf_dynptr_slice and bpf_dynptr_data.
|
||||
*
|
||||
* The returned pointer is writable and may point to either directly the dynptr
|
||||
* data at the requested offset or to the buffer if unable to obtain a direct
|
||||
* data pointer to (example: the requested slice is to the paged area of an skb
|
||||
* packet). In the case where the returned pointer is to the buffer, the user
|
||||
* is responsible for persisting writes through calling bpf_dynptr_write(). This
|
||||
* usually looks something like this pattern:
|
||||
*
|
||||
* struct eth_hdr *eth = bpf_dynptr_slice_rdwr(&dynptr, 0, buffer, sizeof(buffer));
|
||||
* if (!eth)
|
||||
* return TC_ACT_SHOT;
|
||||
*
|
||||
* // mutate eth header //
|
||||
*
|
||||
* if (eth == buffer)
|
||||
* bpf_dynptr_write(&ptr, 0, buffer, sizeof(buffer), 0);
|
||||
*
|
||||
* Please note that, as in the example above, the user must check that the
|
||||
* returned pointer is not null before using it.
|
||||
*
|
||||
* Please also note that in the case of skb and xdp dynptrs, bpf_dynptr_slice_rdwr
|
||||
* does not change the underlying packet data pointers, so a call to
|
||||
* bpf_dynptr_slice_rdwr will not invalidate any ctx->data/data_end pointers in
|
||||
* the bpf program.
|
||||
*
|
||||
* Return: NULL if the call failed (eg invalid dynptr), pointer to a
|
||||
* data slice (can be either direct pointer to the data or a pointer to the user
|
||||
* provided buffer, with its contents containing the data, if unable to obtain
|
||||
* direct pointer)
|
||||
*/
|
||||
__bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 offset,
|
||||
void *buffer, u32 buffer__szk)
|
||||
{
|
||||
if (!ptr->data || bpf_dynptr_is_rdonly(ptr))
|
||||
return NULL;
|
||||
|
||||
/* bpf_dynptr_slice_rdwr is the same logic as bpf_dynptr_slice.
|
||||
*
|
||||
* For skb-type dynptrs, it is safe to write into the returned pointer
|
||||
* if the bpf program allows skb data writes. There are two possibilities
|
||||
* that may occur when calling bpf_dynptr_slice_rdwr:
|
||||
*
|
||||
* 1) The requested slice is in the head of the skb. In this case, the
|
||||
* returned pointer is directly to skb data, and if the skb is cloned, the
|
||||
* verifier will have uncloned it (see bpf_unclone_prologue()) already.
|
||||
* The pointer can be directly written into.
|
||||
*
|
||||
* 2) Some portion of the requested slice is in the paged buffer area.
|
||||
* In this case, the requested data will be copied out into the buffer
|
||||
* and the returned pointer will be a pointer to the buffer. The skb
|
||||
* will not be pulled. To persist the write, the user will need to call
|
||||
* bpf_dynptr_write(), which will pull the skb and commit the write.
|
||||
*
|
||||
* Similarly for xdp programs, if the requested slice is not across xdp
|
||||
* fragments, then a direct pointer will be returned, otherwise the data
|
||||
* will be copied out into the buffer and the user will need to call
|
||||
* bpf_dynptr_write() to commit changes.
|
||||
*/
|
||||
return bpf_dynptr_slice(ptr, offset, buffer, buffer__szk);
|
||||
}
|
||||
|
||||
__bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
|
||||
{
|
||||
return obj;
|
||||
@@ -2150,23 +2325,22 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
|
||||
#endif
|
||||
BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
|
||||
BTF_ID_FLAGS(func, bpf_list_push_front)
|
||||
BTF_ID_FLAGS(func, bpf_list_push_back)
|
||||
BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE)
|
||||
BTF_ID_FLAGS(func, bpf_list_push_front_impl)
|
||||
BTF_ID_FLAGS(func, bpf_list_push_back_impl)
|
||||
BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_task_acquire_not_zero, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
|
||||
BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE)
|
||||
BTF_ID_FLAGS(func, bpf_rbtree_add)
|
||||
BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_rbtree_add_impl)
|
||||
BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
|
||||
|
||||
#ifdef CONFIG_CGROUPS
|
||||
BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
|
||||
BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
|
||||
BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
|
||||
#endif
|
||||
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
|
||||
BTF_SET8_END(generic_btf_ids)
|
||||
@@ -2190,6 +2364,11 @@ BTF_ID_FLAGS(func, bpf_cast_to_kern_ctx)
|
||||
BTF_ID_FLAGS(func, bpf_rdonly_cast)
|
||||
BTF_ID_FLAGS(func, bpf_rcu_read_lock)
|
||||
BTF_ID_FLAGS(func, bpf_rcu_read_unlock)
|
||||
BTF_ID_FLAGS(func, bpf_dynptr_slice, KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
|
||||
BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
|
||||
BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
|
||||
BTF_SET8_END(common_btf_ids)
|
||||
|
||||
static const struct btf_kfunc_id_set common_kfunc_set = {
|
||||
|
||||
@@ -141,8 +141,8 @@ static void *cgroup_storage_lookup_elem(struct bpf_map *_map, void *key)
|
||||
return &READ_ONCE(storage->buf)->data[0];
|
||||
}
|
||||
|
||||
static int cgroup_storage_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 flags)
|
||||
static long cgroup_storage_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 flags)
|
||||
{
|
||||
struct bpf_cgroup_storage *storage;
|
||||
struct bpf_storage_buffer *new;
|
||||
@@ -348,7 +348,7 @@ static void cgroup_storage_map_free(struct bpf_map *_map)
|
||||
bpf_map_area_free(map);
|
||||
}
|
||||
|
||||
static int cgroup_storage_delete_elem(struct bpf_map *map, void *key)
|
||||
static long cgroup_storage_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
return -EINVAL;
|
||||
}
|
||||
@@ -446,6 +446,12 @@ static void cgroup_storage_seq_show_elem(struct bpf_map *map, void *key,
|
||||
rcu_read_unlock();
|
||||
}
|
||||
|
||||
static u64 cgroup_storage_map_usage(const struct bpf_map *map)
|
||||
{
|
||||
/* Currently the dynamically allocated elements are not counted. */
|
||||
return sizeof(struct bpf_cgroup_storage_map);
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(cgroup_storage_map_btf_ids, struct,
|
||||
bpf_cgroup_storage_map)
|
||||
const struct bpf_map_ops cgroup_storage_map_ops = {
|
||||
@@ -457,6 +463,7 @@ const struct bpf_map_ops cgroup_storage_map_ops = {
|
||||
.map_delete_elem = cgroup_storage_delete_elem,
|
||||
.map_check_btf = cgroup_storage_check_btf,
|
||||
.map_seq_show_elem = cgroup_storage_seq_show_elem,
|
||||
.map_mem_usage = cgroup_storage_map_usage,
|
||||
.map_btf_id = &cgroup_storage_map_btf_ids[0],
|
||||
};
|

330  kernel/bpf/log.c  (new file)
@@ -0,0 +1,330 @@
|
||||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
|
||||
* Copyright (c) 2016 Facebook
|
||||
* Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
|
||||
*/
|
||||
#include <uapi/linux/btf.h>
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/types.h>
|
||||
#include <linux/bpf.h>
|
||||
#include <linux/bpf_verifier.h>
|
||||
#include <linux/math64.h>
|
||||
|
||||
static bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
|
||||
{
|
||||
/* ubuf and len_total should both be specified (or not) together */
|
||||
if (!!log->ubuf != !!log->len_total)
|
||||
return false;
|
||||
/* log buf without log_level is meaningless */
|
||||
if (log->ubuf && log->level == 0)
|
||||
return false;
|
||||
if (log->level & ~BPF_LOG_MASK)
|
||||
return false;
|
||||
if (log->len_total > UINT_MAX >> 2)
|
||||
return false;
|
||||
return true;
|
||||
}
|
||||
|
||||
int bpf_vlog_init(struct bpf_verifier_log *log, u32 log_level,
|
||||
char __user *log_buf, u32 log_size)
|
||||
{
|
||||
log->level = log_level;
|
||||
log->ubuf = log_buf;
|
||||
log->len_total = log_size;
|
||||
|
||||
/* log attributes have to be sane */
|
||||
if (!bpf_verifier_log_attr_valid(log))
|
||||
return -EINVAL;
|
||||
|
||||
return 0;
|
||||
}

static void bpf_vlog_update_len_max(struct bpf_verifier_log *log, u32 add_len)
{
        /* add_len includes terminal \0, so no need for +1. */
        u64 len = log->end_pos + add_len;

        /* log->len_max could be larger than our current len due to
         * bpf_vlog_reset() calls, so we maintain the max of any length at any
         * previous point
         */
        if (len > UINT_MAX)
                log->len_max = UINT_MAX;
        else if (len > log->len_max)
                log->len_max = len;
}

void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
                       va_list args)
{
        u64 cur_pos;
        u32 new_n, n;

        n = vscnprintf(log->kbuf, BPF_VERIFIER_TMP_LOG_SIZE, fmt, args);

        WARN_ONCE(n >= BPF_VERIFIER_TMP_LOG_SIZE - 1,
                  "verifier log line truncated - local buffer too short\n");

        if (log->level == BPF_LOG_KERNEL) {
                bool newline = n > 0 && log->kbuf[n - 1] == '\n';

                pr_err("BPF: %s%s", log->kbuf, newline ? "" : "\n");
                return;
        }

        n += 1; /* include terminating zero */
        bpf_vlog_update_len_max(log, n);

        if (log->level & BPF_LOG_FIXED) {
                /* check if we have at least something to put into user buf */
                new_n = 0;
                if (log->end_pos < log->len_total) {
                        new_n = min_t(u32, log->len_total - log->end_pos, n);
                        log->kbuf[new_n - 1] = '\0';
                }

                cur_pos = log->end_pos;
                log->end_pos += n - 1; /* don't count terminating '\0' */

                if (log->ubuf && new_n &&
                    copy_to_user(log->ubuf + cur_pos, log->kbuf, new_n))
                        goto fail;
        } else {
                u64 new_end, new_start;
                u32 buf_start, buf_end, new_n;

                new_end = log->end_pos + n;
                if (new_end - log->start_pos >= log->len_total)
                        new_start = new_end - log->len_total;
                else
                        new_start = log->start_pos;

                log->start_pos = new_start;
                log->end_pos = new_end - 1; /* don't count terminating '\0' */

                if (!log->ubuf)
                        return;

                new_n = min(n, log->len_total);
                cur_pos = new_end - new_n;
                div_u64_rem(cur_pos, log->len_total, &buf_start);
                div_u64_rem(new_end, log->len_total, &buf_end);
                /* new_end and buf_end are exclusive indices, so if buf_end is
                 * exactly zero, then it actually points right to the end of
                 * ubuf and there is no wrap around
                 */
                if (buf_end == 0)
                        buf_end = log->len_total;

                /* if buf_start > buf_end, we wrapped around;
                 * if buf_start == buf_end, then we fill ubuf completely; we
                 * can't have buf_start == buf_end to mean that there is
                 * nothing to write, because we always write at least
                 * something, even if terminal '\0'
                 */
                if (buf_start < buf_end) {
                        /* message fits within contiguous chunk of ubuf */
                        if (copy_to_user(log->ubuf + buf_start,
                                         log->kbuf + n - new_n,
                                         buf_end - buf_start))
                                goto fail;
                } else {
                        /* message wraps around the end of ubuf, copy in two chunks */
                        if (copy_to_user(log->ubuf + buf_start,
                                         log->kbuf + n - new_n,
                                         log->len_total - buf_start))
                                goto fail;
                        if (copy_to_user(log->ubuf,
                                         log->kbuf + n - buf_end,
                                         buf_end))
                                goto fail;
                }
        }

        return;
fail:
        log->ubuf = NULL;
}
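
The rotating (non-BPF_LOG_FIXED) branch above is easier to follow with the positions written out in user space. Below is a small, self-contained sketch of the same scheme, assuming nothing beyond standard C: logical positions grow as 64-bit counters and only their remainder modulo the buffer size decides where bytes land, including the split copy on wrap-around. The names (rolling_log, rolling_append) are illustrative, not kernel API.

#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct rolling_log {
        char    *buf;       /* fixed-size destination buffer        */
        uint32_t len_total; /* its size                             */
        uint64_t start_pos; /* logical position of oldest kept byte */
        uint64_t end_pos;   /* logical position one past last byte  */
};

/* Append 'n' bytes, keeping only the newest len_total bytes overall. */
static void rolling_append(struct rolling_log *log, const char *data, uint32_t n)
{
        uint64_t new_end = log->end_pos + n;
        uint32_t keep = n < log->len_total ? n : log->len_total;
        uint32_t buf_start = (uint32_t)((new_end - keep) % log->len_total);
        uint32_t buf_end = (uint32_t)(new_end % log->len_total);

        if (!n)
                return;
        if (new_end - log->start_pos >= log->len_total)
                log->start_pos = new_end - log->len_total;
        log->end_pos = new_end;

        if (buf_end == 0)
                buf_end = log->len_total; /* exclusive index right at the end */

        if (buf_start < buf_end) {
                /* fits in one contiguous chunk */
                memcpy(log->buf + buf_start, data + n - keep, buf_end - buf_start);
        } else {
                /* wraps: tail of the buffer first, then the beginning */
                memcpy(log->buf + buf_start, data + n - keep, log->len_total - buf_start);
                memcpy(log->buf, data + n - buf_end, buf_end);
        }
}

int main(void)
{
        char ring[8] = {0};
        struct rolling_log log = { ring, sizeof(ring), 0, 0 };

        rolling_append(&log, "hello ", 6);
        rolling_append(&log, "world", 5);
        /* The newest 8 bytes ("lo world") are kept, but stored in rotated
         * order; the raw buffer prints as "rldlo wo".
         */
        printf("%.*s\n", (int)sizeof(ring), ring);
        return 0;
}

This is exactly why bpf_vlog_finalize() below still has to rotate the user buffer once verification is done: the kept bytes are correct, but they start somewhere in the middle of the buffer.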

void bpf_vlog_reset(struct bpf_verifier_log *log, u64 new_pos)
{
        char zero = 0;
        u32 pos;

        if (WARN_ON_ONCE(new_pos > log->end_pos))
                return;

        if (!bpf_verifier_log_needed(log) || log->level == BPF_LOG_KERNEL)
                return;

        /* if position to which we reset is beyond current log window,
         * then we didn't preserve any useful content and should adjust
         * start_pos to end up with an empty log (start_pos == end_pos)
         */
        log->end_pos = new_pos;
        if (log->end_pos < log->start_pos)
                log->start_pos = log->end_pos;

        if (!log->ubuf)
                return;

        if (log->level & BPF_LOG_FIXED)
                pos = log->end_pos + 1;
        else
                div_u64_rem(new_pos, log->len_total, &pos);

        if (pos < log->len_total && put_user(zero, log->ubuf + pos))
                log->ubuf = NULL;
}

static void bpf_vlog_reverse_kbuf(char *buf, int len)
{
        int i, j;

        for (i = 0, j = len - 1; i < j; i++, j--)
                swap(buf[i], buf[j]);
}

static int bpf_vlog_reverse_ubuf(struct bpf_verifier_log *log, int start, int end)
{
        /* we split log->kbuf into two equal parts for both ends of array */
        int n = sizeof(log->kbuf) / 2, nn;
        char *lbuf = log->kbuf, *rbuf = log->kbuf + n;

        /* Read ubuf's section [start, end) two chunks at a time, from left
         * and right side; within each chunk, swap all the bytes; after that
         * reverse the order of lbuf and rbuf and write result back to ubuf.
         * This way we'll end up with swapped contents of specified
         * [start, end) ubuf segment.
         */
        while (end - start > 1) {
                nn = min(n, (end - start) / 2);

                if (copy_from_user(lbuf, log->ubuf + start, nn))
                        return -EFAULT;
                if (copy_from_user(rbuf, log->ubuf + end - nn, nn))
                        return -EFAULT;

                bpf_vlog_reverse_kbuf(lbuf, nn);
                bpf_vlog_reverse_kbuf(rbuf, nn);

                /* we write lbuf to the right end of ubuf, while rbuf to the
                 * left one to end up with properly reversed overall ubuf
                 */
                if (copy_to_user(log->ubuf + start, rbuf, nn))
                        return -EFAULT;
                if (copy_to_user(log->ubuf + end - nn, lbuf, nn))
                        return -EFAULT;

                start += nn;
                end -= nn;
        }

        return 0;
}

int bpf_vlog_finalize(struct bpf_verifier_log *log, u32 *log_size_actual)
{
        u32 sublen;
        int err;

        *log_size_actual = 0;
        if (!log || log->level == 0 || log->level == BPF_LOG_KERNEL)
                return 0;

        if (!log->ubuf)
                goto skip_log_rotate;
        /* If we never truncated log, there is nothing to move around. */
        if (log->start_pos == 0)
                goto skip_log_rotate;

        /* Otherwise we need to rotate log contents to make it start from the
         * buffer beginning and be a continuous zero-terminated string. Note
         * that if log->start_pos != 0 then we definitely filled up entire log
         * buffer with no gaps, and we just need to shift buffer contents to
         * the left by (log->start_pos % log->len_total) bytes.
         *
         * Unfortunately, user buffer could be huge and we don't want to
         * allocate temporary kernel memory of the same size just to shift
         * contents in a straightforward fashion. Instead, we'll be clever and
         * do in-place array rotation. This is a leetcode-style problem, which
         * could be solved by three rotations.
         *
         * Let's say we have log buffer that has to be shifted left by 7 bytes
         * (spaces and vertical bar is just for demonstrative purposes):
         *   E F G H I J K | A B C D
         *
         * First, we reverse entire array:
         *   D C B A | K J I H G F E
         *
         * Then we rotate first 4 bytes (DCBA) and separately last 7 bytes
         * (KJIHGFE), resulting in a properly rotated array:
         *   A B C D | E F G H I J K
         *
         * We'll utilize log->kbuf to read user memory chunk by chunk, swap
         * bytes, and write them back. Doing it byte-by-byte would be
         * unnecessarily inefficient. Altogether we are going to read and
         * write each byte twice, for total 4 memory copies between kernel and
         * user space.
         */

        /* length of the chopped off part that will be the beginning;
         * len(ABCD) in the example above
         */
        div_u64_rem(log->start_pos, log->len_total, &sublen);
        sublen = log->len_total - sublen;

        err = bpf_vlog_reverse_ubuf(log, 0, log->len_total);
        err = err ?: bpf_vlog_reverse_ubuf(log, 0, sublen);
        err = err ?: bpf_vlog_reverse_ubuf(log, sublen, log->len_total);
        if (err)
                log->ubuf = NULL;

skip_log_rotate:
        *log_size_actual = log->len_max;

        /* properly initialized log has either both ubuf!=NULL and len_total>0
         * or ubuf==NULL and len_total==0, so if this condition doesn't hold,
         * we got a fault somewhere along the way, so report it back
         */
        if (!!log->ubuf != !!log->len_total)
                return -EFAULT;

        /* did truncation actually happen? */
        if (log->ubuf && log->len_max > log->len_total)
                return -ENOSPC;

        return 0;
}
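
The three-reversal rotation described in the comment above is easy to verify outside the kernel. A minimal user-space sketch, using the same example as the comment (shift left by 7); the function names are illustrative:

#include <stdio.h>
#include <string.h>

static void reverse(char *buf, int start, int end) /* reverses [start, end) */
{
        for (int i = start, j = end - 1; i < j; i++, j--) {
                char tmp = buf[i];
                buf[i] = buf[j];
                buf[j] = tmp;
        }
}

/* Shift 'buf' of length 'len' left by 'shift' bytes, in place. */
static void rotate_left(char *buf, int len, int shift)
{
        int sublen = len - shift;   /* length of the part that ends up first */

        reverse(buf, 0, len);       /* EFGHIJK|ABCD -> DCBA|KJIHGFE */
        reverse(buf, 0, sublen);    /* DCBA -> ABCD                 */
        reverse(buf, sublen, len);  /* KJIHGFE -> EFGHIJK           */
}

int main(void)
{
        char buf[] = "EFGHIJKABCD";

        rotate_left(buf, (int)strlen(buf), 7);
        puts(buf);                  /* prints ABCDEFGHIJK */
        return 0;
}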

/* log_level controls verbosity level of eBPF verifier.
 * bpf_verifier_log_write() is used to dump the verification trace to the log,
 * so the user can figure out what's wrong with the program
 */
__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
                                           const char *fmt, ...)
{
        va_list args;

        if (!bpf_verifier_log_needed(&env->log))
                return;

        va_start(args, fmt);
        bpf_verifier_vlog(&env->log, fmt, args);
        va_end(args);
}
EXPORT_SYMBOL_GPL(bpf_verifier_log_write);

__printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
                            const char *fmt, ...)
{
        va_list args;

        if (!bpf_verifier_log_needed(log))
                return;

        va_start(args, fmt);
        bpf_verifier_vlog(log, fmt, args);
        va_end(args);
}
EXPORT_SYMBOL_GPL(bpf_log);

@@ -300,8 +300,8 @@ static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int trie_update_elem(struct bpf_map *map,
|
||||
void *_key, void *value, u64 flags)
|
||||
static long trie_update_elem(struct bpf_map *map,
|
||||
void *_key, void *value, u64 flags)
|
||||
{
|
||||
struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
|
||||
struct lpm_trie_node *node, *im_node = NULL, *new_node = NULL;
|
||||
@@ -431,7 +431,7 @@ out:
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int trie_delete_elem(struct bpf_map *map, void *_key)
|
||||
static long trie_delete_elem(struct bpf_map *map, void *_key)
|
||||
{
|
||||
struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
|
||||
struct bpf_lpm_trie_key *key = _key;
|
||||
@@ -720,6 +720,16 @@ static int trie_check_btf(const struct bpf_map *map,
|
||||
-EINVAL : 0;
|
||||
}
|
||||
|
||||
static u64 trie_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
|
||||
u64 elem_size;
|
||||
|
||||
elem_size = sizeof(struct lpm_trie_node) + trie->data_size +
|
||||
trie->map.value_size;
|
||||
return elem_size * READ_ONCE(trie->n_entries);
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(trie_map_btf_ids, struct, lpm_trie)
|
||||
const struct bpf_map_ops trie_map_ops = {
|
||||
.map_meta_equal = bpf_map_meta_equal,
|
||||
@@ -733,5 +743,6 @@ const struct bpf_map_ops trie_map_ops = {
|
||||
.map_update_batch = generic_map_update_batch,
|
||||
.map_delete_batch = generic_map_delete_batch,
|
||||
.map_check_btf = trie_check_btf,
|
||||
.map_mem_usage = trie_mem_usage,
|
||||
.map_btf_id = &trie_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -56,18 +56,6 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
|
||||
ret = PTR_ERR(inner_map_meta->record);
|
||||
goto free;
|
||||
}
|
||||
if (inner_map_meta->record) {
|
||||
struct btf_field_offs *field_offs;
|
||||
/* If btf_record is !IS_ERR_OR_NULL, then field_offs is always
|
||||
* valid.
|
||||
*/
|
||||
field_offs = kmemdup(inner_map->field_offs, sizeof(*inner_map->field_offs), GFP_KERNEL | __GFP_NOWARN);
|
||||
if (!field_offs) {
|
||||
ret = -ENOMEM;
|
||||
goto free_rec;
|
||||
}
|
||||
inner_map_meta->field_offs = field_offs;
|
||||
}
|
||||
/* Note: We must use the same BTF, as we also used btf_record_dup above
|
||||
* which relies on BTF being same for both maps, as some members like
|
||||
* record->fields.list_head have pointers like value_rec pointing into
|
||||
@@ -88,8 +76,6 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
|
||||
|
||||
fdput(f);
|
||||
return inner_map_meta;
|
||||
free_rec:
|
||||
btf_record_free(inner_map_meta->record);
|
||||
free:
|
||||
kfree(inner_map_meta);
|
||||
put:
|
||||
@@ -99,7 +85,6 @@ put:
|
||||
|
||||
void bpf_map_meta_free(struct bpf_map *map_meta)
|
||||
{
|
||||
kfree(map_meta->field_offs);
|
||||
bpf_map_free_record(map_meta);
|
||||
btf_put(map_meta->btf);
|
||||
kfree(map_meta);
|
||||
|
||||
@@ -121,15 +121,8 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head)
|
||||
return entry;
|
||||
}
|
||||
|
||||
static void *__alloc(struct bpf_mem_cache *c, int node)
|
||||
static void *__alloc(struct bpf_mem_cache *c, int node, gfp_t flags)
|
||||
{
|
||||
/* Allocate, but don't deplete atomic reserves that typical
|
||||
* GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
|
||||
* will allocate from the current numa node which is what we
|
||||
* want here.
|
||||
*/
|
||||
gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
|
||||
|
||||
if (c->percpu_size) {
|
||||
void **obj = kmalloc_node(c->percpu_size, flags, node);
|
||||
void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
|
||||
@@ -185,7 +178,12 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
|
||||
*/
|
||||
obj = __llist_del_first(&c->free_by_rcu);
|
||||
if (!obj) {
|
||||
obj = __alloc(c, node);
|
||||
/* Allocate, but don't deplete atomic reserves that typical
|
||||
* GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
|
||||
* will allocate from the current numa node which is what we
|
||||
* want here.
|
||||
*/
|
||||
obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT);
|
||||
if (!obj)
|
||||
break;
|
||||
}
|
||||
@@ -676,3 +674,46 @@ void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
|
||||
|
||||
unit_free(this_cpu_ptr(ma->cache), ptr);
|
||||
}
|
||||
|
||||
/* Directly does a kfree() without putting 'ptr' back to the free_llist
|
||||
* for reuse and without waiting for a rcu_tasks_trace gp.
|
||||
* The caller must first go through the rcu_tasks_trace gp for 'ptr'
|
||||
* before calling bpf_mem_cache_raw_free().
|
||||
* It could be used when the rcu_tasks_trace callback does not have
|
||||
* a hold on the original bpf_mem_alloc object that allocated the
|
||||
* 'ptr'. This should only be used in the uncommon code path.
|
||||
* Otherwise, the bpf_mem_alloc's free_llist cannot be refilled
|
||||
* and may affect performance.
|
||||
*/
|
||||
void bpf_mem_cache_raw_free(void *ptr)
|
||||
{
|
||||
if (!ptr)
|
||||
return;
|
||||
|
||||
kfree(ptr - LLIST_NODE_SZ);
|
||||
}
|
||||
|
||||
/* When flags == GFP_KERNEL, it signals that the caller will not cause
|
||||
* deadlock when using kmalloc. bpf_mem_cache_alloc_flags() will use
|
||||
* kmalloc if the free_llist is empty.
|
||||
*/
|
||||
void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags)
|
||||
{
|
||||
struct bpf_mem_cache *c;
|
||||
void *ret;
|
||||
|
||||
c = this_cpu_ptr(ma->cache);
|
||||
|
||||
ret = unit_alloc(c);
|
||||
if (!ret && flags == GFP_KERNEL) {
|
||||
struct mem_cgroup *memcg, *old_memcg;
|
||||
|
||||
memcg = get_memcg(c);
|
||||
old_memcg = set_active_memcg(memcg);
|
||||
ret = __alloc(c, NUMA_NO_NODE, GFP_KERNEL | __GFP_NOWARN | __GFP_ACCOUNT);
|
||||
set_active_memcg(old_memcg);
|
||||
mem_cgroup_put(memcg);
|
||||
}
|
||||
|
||||
return !ret ? NULL : ret + LLIST_NODE_SZ;
|
||||
}
|
||||
|
||||
@@ -563,6 +563,12 @@ void bpf_map_offload_map_free(struct bpf_map *map)
|
||||
bpf_map_area_free(offmap);
|
||||
}
|
||||
|
||||
u64 bpf_map_offload_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
/* The memory dynamically allocated in netdev dev_ops is not counted */
|
||||
return sizeof(struct bpf_offloaded_map);
|
||||
}
|
||||
|
||||
int bpf_map_offload_lookup_elem(struct bpf_map *map, void *key, void *value)
|
||||
{
|
||||
struct bpf_offloaded_map *offmap = map_to_offmap(map);
|
||||
|
||||
@@ -95,7 +95,7 @@ static void queue_stack_map_free(struct bpf_map *map)
|
||||
bpf_map_area_free(qs);
|
||||
}
|
||||
|
||||
static int __queue_map_get(struct bpf_map *map, void *value, bool delete)
|
||||
static long __queue_map_get(struct bpf_map *map, void *value, bool delete)
|
||||
{
|
||||
struct bpf_queue_stack *qs = bpf_queue_stack(map);
|
||||
unsigned long flags;
|
||||
@@ -124,7 +124,7 @@ out:
|
||||
}
|
||||
|
||||
|
||||
static int __stack_map_get(struct bpf_map *map, void *value, bool delete)
|
||||
static long __stack_map_get(struct bpf_map *map, void *value, bool delete)
|
||||
{
|
||||
struct bpf_queue_stack *qs = bpf_queue_stack(map);
|
||||
unsigned long flags;
|
||||
@@ -156,32 +156,32 @@ out:
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int queue_map_peek_elem(struct bpf_map *map, void *value)
|
||||
static long queue_map_peek_elem(struct bpf_map *map, void *value)
|
||||
{
|
||||
return __queue_map_get(map, value, false);
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int stack_map_peek_elem(struct bpf_map *map, void *value)
|
||||
static long stack_map_peek_elem(struct bpf_map *map, void *value)
|
||||
{
|
||||
return __stack_map_get(map, value, false);
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int queue_map_pop_elem(struct bpf_map *map, void *value)
|
||||
static long queue_map_pop_elem(struct bpf_map *map, void *value)
|
||||
{
|
||||
return __queue_map_get(map, value, true);
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int stack_map_pop_elem(struct bpf_map *map, void *value)
|
||||
static long stack_map_pop_elem(struct bpf_map *map, void *value)
|
||||
{
|
||||
return __stack_map_get(map, value, true);
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int queue_stack_map_push_elem(struct bpf_map *map, void *value,
|
||||
u64 flags)
|
||||
static long queue_stack_map_push_elem(struct bpf_map *map, void *value,
|
||||
u64 flags)
|
||||
{
|
||||
struct bpf_queue_stack *qs = bpf_queue_stack(map);
|
||||
unsigned long irq_flags;
|
||||
@@ -227,14 +227,14 @@ static void *queue_stack_map_lookup_elem(struct bpf_map *map, void *key)
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int queue_stack_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 flags)
|
||||
static long queue_stack_map_update_elem(struct bpf_map *map, void *key,
|
||||
void *value, u64 flags)
|
||||
{
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int queue_stack_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long queue_stack_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
return -EINVAL;
|
||||
}
|
||||
@@ -246,6 +246,14 @@ static int queue_stack_map_get_next_key(struct bpf_map *map, void *key,
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
static u64 queue_stack_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
u64 usage = sizeof(struct bpf_queue_stack);
|
||||
|
||||
usage += ((u64)map->max_entries + 1) * map->value_size;
|
||||
return usage;
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(queue_map_btf_ids, struct, bpf_queue_stack)
|
||||
const struct bpf_map_ops queue_map_ops = {
|
||||
.map_meta_equal = bpf_map_meta_equal,
|
||||
@@ -259,6 +267,7 @@ const struct bpf_map_ops queue_map_ops = {
|
||||
.map_pop_elem = queue_map_pop_elem,
|
||||
.map_peek_elem = queue_map_peek_elem,
|
||||
.map_get_next_key = queue_stack_map_get_next_key,
|
||||
.map_mem_usage = queue_stack_map_mem_usage,
|
||||
.map_btf_id = &queue_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -274,5 +283,6 @@ const struct bpf_map_ops stack_map_ops = {
|
||||
.map_pop_elem = stack_map_pop_elem,
|
||||
.map_peek_elem = stack_map_peek_elem,
|
||||
.map_get_next_key = queue_stack_map_get_next_key,
|
||||
.map_mem_usage = queue_stack_map_mem_usage,
|
||||
.map_btf_id = &queue_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -59,7 +59,7 @@ static void *reuseport_array_lookup_elem(struct bpf_map *map, void *key)
|
||||
}
|
||||
|
||||
/* Called from syscall only */
|
||||
static int reuseport_array_delete_elem(struct bpf_map *map, void *key)
|
||||
static long reuseport_array_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct reuseport_array *array = reuseport_array(map);
|
||||
u32 index = *(u32 *)key;
|
||||
@@ -335,6 +335,13 @@ static int reuseport_array_get_next_key(struct bpf_map *map, void *key,
|
||||
return 0;
|
||||
}
|
||||
|
||||
static u64 reuseport_array_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct reuseport_array *array;
|
||||
|
||||
return struct_size(array, ptrs, map->max_entries);
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(reuseport_array_map_btf_ids, struct, reuseport_array)
|
||||
const struct bpf_map_ops reuseport_array_ops = {
|
||||
.map_meta_equal = bpf_map_meta_equal,
|
||||
@@ -344,5 +351,6 @@ const struct bpf_map_ops reuseport_array_ops = {
|
||||
.map_lookup_elem = reuseport_array_lookup_elem,
|
||||
.map_get_next_key = reuseport_array_get_next_key,
|
||||
.map_delete_elem = reuseport_array_delete_elem,
|
||||
.map_mem_usage = reuseport_array_mem_usage,
|
||||
.map_btf_id = &reuseport_array_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -19,6 +19,7 @@
|
||||
(offsetof(struct bpf_ringbuf, consumer_pos) >> PAGE_SHIFT)
|
||||
/* consumer page and producer page */
|
||||
#define RINGBUF_POS_PAGES 2
|
||||
#define RINGBUF_NR_META_PAGES (RINGBUF_PGOFF + RINGBUF_POS_PAGES)
|
||||
|
||||
#define RINGBUF_MAX_RECORD_SZ (UINT_MAX/4)
|
||||
|
||||
@@ -96,7 +97,7 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
|
||||
{
|
||||
const gfp_t flags = GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL |
|
||||
__GFP_NOWARN | __GFP_ZERO;
|
||||
int nr_meta_pages = RINGBUF_PGOFF + RINGBUF_POS_PAGES;
|
||||
int nr_meta_pages = RINGBUF_NR_META_PAGES;
|
||||
int nr_data_pages = data_sz >> PAGE_SHIFT;
|
||||
int nr_pages = nr_meta_pages + nr_data_pages;
|
||||
struct page **pages, *page;
|
||||
@@ -241,13 +242,13 @@ static void *ringbuf_map_lookup_elem(struct bpf_map *map, void *key)
|
||||
return ERR_PTR(-ENOTSUPP);
|
||||
}
|
||||
|
||||
static int ringbuf_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 flags)
|
||||
static long ringbuf_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 flags)
|
||||
{
|
||||
return -ENOTSUPP;
|
||||
}
|
||||
|
||||
static int ringbuf_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long ringbuf_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
return -ENOTSUPP;
|
||||
}
|
||||
@@ -336,6 +337,21 @@ static __poll_t ringbuf_map_poll_user(struct bpf_map *map, struct file *filp,
|
||||
return 0;
|
||||
}
|
||||
|
||||
static u64 ringbuf_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct bpf_ringbuf *rb;
|
||||
int nr_data_pages;
|
||||
int nr_meta_pages;
|
||||
u64 usage = sizeof(struct bpf_ringbuf_map);
|
||||
|
||||
rb = container_of(map, struct bpf_ringbuf_map, map)->rb;
|
||||
usage += (u64)rb->nr_pages << PAGE_SHIFT;
|
||||
nr_meta_pages = RINGBUF_NR_META_PAGES;
|
||||
nr_data_pages = map->max_entries >> PAGE_SHIFT;
|
||||
usage += (nr_meta_pages + 2 * nr_data_pages) * sizeof(struct page *);
|
||||
return usage;
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(ringbuf_map_btf_ids, struct, bpf_ringbuf_map)
|
||||
const struct bpf_map_ops ringbuf_map_ops = {
|
||||
.map_meta_equal = bpf_map_meta_equal,
|
||||
@@ -347,6 +363,7 @@ const struct bpf_map_ops ringbuf_map_ops = {
|
||||
.map_update_elem = ringbuf_map_update_elem,
|
||||
.map_delete_elem = ringbuf_map_delete_elem,
|
||||
.map_get_next_key = ringbuf_map_get_next_key,
|
||||
.map_mem_usage = ringbuf_map_mem_usage,
|
||||
.map_btf_id = &ringbuf_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -361,6 +378,7 @@ const struct bpf_map_ops user_ringbuf_map_ops = {
|
||||
.map_update_elem = ringbuf_map_update_elem,
|
||||
.map_delete_elem = ringbuf_map_delete_elem,
|
||||
.map_get_next_key = ringbuf_map_get_next_key,
|
||||
.map_mem_usage = ringbuf_map_mem_usage,
|
||||
.map_btf_id = &user_ringbuf_map_btf_ids[0],
|
||||
};
|
||||
|
||||
|
||||
@@ -618,14 +618,14 @@ static int stack_map_get_next_key(struct bpf_map *map, void *key,
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
static long stack_map_update_elem(struct bpf_map *map, void *key, void *value,
|
||||
u64 map_flags)
|
||||
{
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/* Called from syscall or from eBPF program */
|
||||
static int stack_map_delete_elem(struct bpf_map *map, void *key)
|
||||
static long stack_map_delete_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
|
||||
struct stack_map_bucket *old_bucket;
|
||||
@@ -654,6 +654,19 @@ static void stack_map_free(struct bpf_map *map)
|
||||
put_callchain_buffers();
|
||||
}
|
||||
|
||||
static u64 stack_map_mem_usage(const struct bpf_map *map)
|
||||
{
|
||||
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
|
||||
u64 value_size = map->value_size;
|
||||
u64 n_buckets = smap->n_buckets;
|
||||
u64 enties = map->max_entries;
|
||||
u64 usage = sizeof(*smap);
|
||||
|
||||
usage += n_buckets * sizeof(struct stack_map_bucket *);
|
||||
usage += enties * (sizeof(struct stack_map_bucket) + value_size);
|
||||
return usage;
|
||||
}
|
||||
|
||||
BTF_ID_LIST_SINGLE(stack_trace_map_btf_ids, struct, bpf_stack_map)
|
||||
const struct bpf_map_ops stack_trace_map_ops = {
|
||||
.map_meta_equal = bpf_map_meta_equal,
|
||||
@@ -664,5 +677,6 @@ const struct bpf_map_ops stack_trace_map_ops = {
|
||||
.map_update_elem = stack_map_update_elem,
|
||||
.map_delete_elem = stack_map_delete_elem,
|
||||
.map_check_btf = map_check_no_btf,
|
||||
.map_mem_usage = stack_map_mem_usage,
|
||||
.map_btf_id = &stack_trace_map_btf_ids[0],
|
||||
};
|
||||
|
||||
@@ -35,6 +35,7 @@
|
||||
#include <linux/rcupdate_trace.h>
|
||||
#include <linux/memcontrol.h>
|
||||
#include <linux/trace_events.h>
|
||||
#include <net/netfilter/nf_bpf_link.h>
|
||||
|
||||
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
|
||||
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
|
||||
@@ -105,6 +106,7 @@ const struct bpf_map_ops bpf_map_offload_ops = {
|
||||
.map_alloc = bpf_map_offload_map_alloc,
|
||||
.map_free = bpf_map_offload_map_free,
|
||||
.map_check_btf = map_check_no_btf,
|
||||
.map_mem_usage = bpf_map_offload_map_mem_usage,
|
||||
};
|
||||
|
||||
static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
|
||||
@@ -128,6 +130,8 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
|
||||
}
|
||||
if (attr->map_ifindex)
|
||||
ops = &bpf_map_offload_ops;
|
||||
if (!ops->map_mem_usage)
|
||||
return ERR_PTR(-EINVAL);
|
||||
map = ops->map_alloc(attr);
|
||||
if (IS_ERR(map))
|
||||
return map;
|
||||
@@ -517,14 +521,14 @@ static int btf_field_cmp(const void *a, const void *b)
|
||||
}
|
||||
|
||||
struct btf_field *btf_record_find(const struct btf_record *rec, u32 offset,
|
||||
enum btf_field_type type)
|
||||
u32 field_mask)
|
||||
{
|
||||
struct btf_field *field;
|
||||
|
||||
if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & type))
|
||||
if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & field_mask))
|
||||
return NULL;
|
||||
field = bsearch(&offset, rec->fields, rec->cnt, sizeof(rec->fields[0]), btf_field_cmp);
|
||||
if (!field || !(field->type & type))
|
||||
if (!field || !(field->type & field_mask))
|
||||
return NULL;
|
||||
return field;
|
||||
}
|
||||
@@ -549,6 +553,7 @@ void btf_record_free(struct btf_record *rec)
|
||||
case BPF_RB_NODE:
|
||||
case BPF_SPIN_LOCK:
|
||||
case BPF_TIMER:
|
||||
case BPF_REFCOUNT:
|
||||
/* Nothing to release */
|
||||
break;
|
||||
default:
|
||||
@@ -596,6 +601,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
|
||||
case BPF_RB_NODE:
|
||||
case BPF_SPIN_LOCK:
|
||||
case BPF_TIMER:
|
||||
case BPF_REFCOUNT:
|
||||
/* Nothing to acquire */
|
||||
break;
|
||||
default:
|
||||
@@ -647,6 +653,8 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
|
||||
bpf_timer_cancel_and_free(obj + rec->timer_off);
|
||||
}
|
||||
|
||||
extern void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
|
||||
|
||||
void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
|
||||
{
|
||||
const struct btf_field *fields;
|
||||
@@ -656,8 +664,10 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
|
||||
return;
|
||||
fields = rec->fields;
|
||||
for (i = 0; i < rec->cnt; i++) {
|
||||
struct btf_struct_meta *pointee_struct_meta;
|
||||
const struct btf_field *field = &fields[i];
|
||||
void *field_ptr = obj + field->offset;
|
||||
void *xchgd_field;
|
||||
|
||||
switch (fields[i].type) {
|
||||
case BPF_SPIN_LOCK:
|
||||
@@ -669,7 +679,22 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
|
||||
WRITE_ONCE(*(u64 *)field_ptr, 0);
|
||||
break;
|
||||
case BPF_KPTR_REF:
|
||||
field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
|
||||
xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
|
||||
if (!xchgd_field)
|
||||
break;
|
||||
|
||||
if (!btf_is_kernel(field->kptr.btf)) {
|
||||
pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
|
||||
field->kptr.btf_id);
|
||||
WARN_ON_ONCE(!pointee_struct_meta);
|
||||
migrate_disable();
|
||||
__bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
|
||||
pointee_struct_meta->record :
|
||||
NULL);
|
||||
migrate_enable();
|
||||
} else {
|
||||
field->kptr.dtor(xchgd_field);
|
||||
}
|
||||
break;
|
||||
case BPF_LIST_HEAD:
|
||||
if (WARN_ON_ONCE(rec->spin_lock_off < 0))
|
||||
@@ -683,6 +708,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
|
||||
break;
|
||||
case BPF_LIST_NODE:
|
||||
case BPF_RB_NODE:
|
||||
case BPF_REFCOUNT:
|
||||
break;
|
||||
default:
|
||||
WARN_ON_ONCE(1);
|
||||
@@ -695,14 +721,13 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
|
||||
static void bpf_map_free_deferred(struct work_struct *work)
|
||||
{
|
||||
struct bpf_map *map = container_of(work, struct bpf_map, work);
|
||||
struct btf_field_offs *foffs = map->field_offs;
|
||||
struct btf_record *rec = map->record;
|
||||
|
||||
security_bpf_map_free(map);
|
||||
bpf_map_release_memcg(map);
|
||||
/* implementation dependent freeing */
|
||||
map->ops->map_free(map);
|
||||
/* Delay freeing of field_offs and btf_record for maps, as map_free
|
||||
/* Delay freeing of btf_record for maps, as map_free
|
||||
* callback usually needs access to them. It is better to do it here
|
||||
* than require each callback to do the free itself manually.
|
||||
*
|
||||
@@ -711,7 +736,6 @@ static void bpf_map_free_deferred(struct work_struct *work)
|
||||
* eventually calls bpf_map_free_meta, since inner_map_meta is only a
|
||||
* template bpf_map struct used during verification.
|
||||
*/
|
||||
kfree(foffs);
|
||||
btf_record_free(rec);
|
||||
}
|
||||
|
||||
@@ -771,17 +795,10 @@ static fmode_t map_get_sys_perms(struct bpf_map *map, struct fd f)
|
||||
}
|
||||
|
||||
#ifdef CONFIG_PROC_FS
|
||||
/* Provides an approximation of the map's memory footprint.
|
||||
* Used only to provide a backward compatibility and display
|
||||
* a reasonable "memlock" info.
|
||||
*/
|
||||
static unsigned long bpf_map_memory_footprint(const struct bpf_map *map)
|
||||
/* Show the memory usage of a bpf map */
|
||||
static u64 bpf_map_memory_usage(const struct bpf_map *map)
|
||||
{
|
||||
unsigned long size;
|
||||
|
||||
size = round_up(map->key_size + bpf_map_value_size(map), 8);
|
||||
|
||||
return round_up(map->max_entries * size, PAGE_SIZE);
|
||||
return map->ops->map_mem_usage(map);
|
||||
}
|
||||
|
||||
static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
|
||||
@@ -803,7 +820,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
|
||||
"max_entries:\t%u\n"
|
||||
"map_flags:\t%#x\n"
|
||||
"map_extra:\t%#llx\n"
|
||||
"memlock:\t%lu\n"
|
||||
"memlock:\t%llu\n"
|
||||
"map_id:\t%u\n"
|
||||
"frozen:\t%u\n",
|
||||
map->map_type,
|
||||
@@ -812,7 +829,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
|
||||
map->max_entries,
|
||||
map->map_flags,
|
||||
(unsigned long long)map->map_extra,
|
||||
bpf_map_memory_footprint(map),
|
||||
bpf_map_memory_usage(map),
|
||||
map->id,
|
||||
READ_ONCE(map->frozen));
|
||||
if (type) {
|
||||
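
Since the fdinfo "memlock" value shown above is now backed by the per-map .map_mem_usage callback, user space can read the estimate straight from procfs. A small sketch; only the "memlock:" field name comes from the code above, the helper itself is illustrative:

#include <stdio.h>

/* Return the "memlock" estimate reported for a BPF map fd, or -1 on error. */
static long long map_memlock_bytes(int map_fd)
{
        char path[64], line[256];
        long long memlock = -1;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", map_fd);
        f = fopen(path, "r");
        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "memlock: %lld", &memlock) == 1)
                        break;
        }
        fclose(f);
        return memlock;
}
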
@@ -1019,7 +1036,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
|
||||
|
||||
map->record = btf_parse_fields(btf, value_type,
|
||||
BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
|
||||
BPF_RB_ROOT,
|
||||
BPF_RB_ROOT | BPF_REFCOUNT,
|
||||
map->value_size);
|
||||
if (!IS_ERR_OR_NULL(map->record)) {
|
||||
int i;
|
||||
@@ -1058,10 +1075,17 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
|
||||
break;
|
||||
case BPF_KPTR_UNREF:
|
||||
case BPF_KPTR_REF:
|
||||
case BPF_REFCOUNT:
|
||||
if (map->map_type != BPF_MAP_TYPE_HASH &&
|
||||
map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
|
||||
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
|
||||
map->map_type != BPF_MAP_TYPE_LRU_PERCPU_HASH &&
|
||||
map->map_type != BPF_MAP_TYPE_ARRAY &&
|
||||
map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY) {
|
||||
map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY &&
|
||||
map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
|
||||
map->map_type != BPF_MAP_TYPE_INODE_STORAGE &&
|
||||
map->map_type != BPF_MAP_TYPE_TASK_STORAGE &&
|
||||
map->map_type != BPF_MAP_TYPE_CGRP_STORAGE) {
|
||||
ret = -EOPNOTSUPP;
|
||||
goto free_map_tab;
|
||||
}
|
||||
@@ -1104,7 +1128,6 @@ free_map_tab:
|
||||
static int map_create(union bpf_attr *attr)
|
||||
{
|
||||
int numa_node = bpf_map_attr_numa_node(attr);
|
||||
struct btf_field_offs *foffs;
|
||||
struct bpf_map *map;
|
||||
int f_flags;
|
||||
int err;
|
||||
@@ -1184,17 +1207,9 @@ static int map_create(union bpf_attr *attr)
|
||||
attr->btf_vmlinux_value_type_id;
|
||||
}
|
||||
|
||||
|
||||
foffs = btf_parse_field_offs(map->record);
|
||||
if (IS_ERR(foffs)) {
|
||||
err = PTR_ERR(foffs);
|
||||
goto free_map;
|
||||
}
|
||||
map->field_offs = foffs;
|
||||
|
||||
err = security_bpf_map_alloc(map);
|
||||
if (err)
|
||||
goto free_map_field_offs;
|
||||
goto free_map;
|
||||
|
||||
err = bpf_map_alloc_id(map);
|
||||
if (err)
|
||||
@@ -1218,8 +1233,6 @@ static int map_create(union bpf_attr *attr)
|
||||
|
||||
free_map_sec:
|
||||
security_bpf_map_free(map);
|
||||
free_map_field_offs:
|
||||
kfree(map->field_offs);
|
||||
free_map:
|
||||
btf_put(map->btf);
|
||||
map->ops->map_free(map);
|
||||
@@ -1285,8 +1298,10 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
|
||||
return map;
|
||||
}
|
||||
|
||||
/* map_idr_lock should have been held */
|
||||
static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
|
||||
/* map_idr_lock should have been held or the map should have been
|
||||
* protected by rcu read lock.
|
||||
*/
|
||||
struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
|
||||
{
|
||||
int refold;
|
||||
|
||||
@@ -2049,6 +2064,7 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
|
||||
{
|
||||
bpf_prog_kallsyms_del_all(prog);
|
||||
btf_put(prog->aux->btf);
|
||||
module_put(prog->aux->mod);
|
||||
kvfree(prog->aux->jited_linfo);
|
||||
kvfree(prog->aux->linfo);
|
||||
kfree(prog->aux->kfunc_tab);
|
||||
@@ -2439,7 +2455,6 @@ static bool is_net_admin_prog_type(enum bpf_prog_type prog_type)
|
||||
case BPF_PROG_TYPE_LWT_SEG6LOCAL:
|
||||
case BPF_PROG_TYPE_SK_SKB:
|
||||
case BPF_PROG_TYPE_SK_MSG:
|
||||
case BPF_PROG_TYPE_LIRC_MODE2:
|
||||
case BPF_PROG_TYPE_FLOW_DISSECTOR:
|
||||
case BPF_PROG_TYPE_CGROUP_DEVICE:
|
||||
case BPF_PROG_TYPE_CGROUP_SOCK:
|
||||
@@ -2448,6 +2463,7 @@ static bool is_net_admin_prog_type(enum bpf_prog_type prog_type)
|
||||
case BPF_PROG_TYPE_CGROUP_SYSCTL:
|
||||
case BPF_PROG_TYPE_SOCK_OPS:
|
||||
case BPF_PROG_TYPE_EXT: /* extends any prog */
|
||||
case BPF_PROG_TYPE_NETFILTER:
|
||||
return true;
|
||||
case BPF_PROG_TYPE_CGROUP_SKB:
|
||||
/* always unpriv */
|
||||
@@ -2477,9 +2493,9 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
|
||||
}
|
||||
|
||||
/* last field in 'union bpf_attr' used by this command */
|
||||
#define BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size
|
||||
#define BPF_PROG_LOAD_LAST_FIELD log_true_size
|
||||
|
||||
static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
|
||||
static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
|
||||
{
|
||||
enum bpf_prog_type type = attr->prog_type;
|
||||
struct bpf_prog *prog, *dst_prog = NULL;
|
||||
@@ -2629,7 +2645,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
|
||||
goto free_prog_sec;
|
||||
|
||||
/* run eBPF verifier */
|
||||
err = bpf_check(&prog, attr, uattr);
|
||||
err = bpf_check(&prog, attr, uattr, uattr_size);
|
||||
if (err < 0)
|
||||
goto free_used_maps;
|
||||
|
||||
@@ -2804,16 +2820,19 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
|
||||
const struct bpf_prog *prog = link->prog;
|
||||
char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
|
||||
|
||||
bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
|
||||
seq_printf(m,
|
||||
"link_type:\t%s\n"
|
||||
"link_id:\t%u\n"
|
||||
"prog_tag:\t%s\n"
|
||||
"prog_id:\t%u\n",
|
||||
"link_id:\t%u\n",
|
||||
bpf_link_type_strs[link->type],
|
||||
link->id,
|
||||
prog_tag,
|
||||
prog->aux->id);
|
||||
link->id);
|
||||
if (prog) {
|
||||
bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
|
||||
seq_printf(m,
|
||||
"prog_tag:\t%s\n"
|
||||
"prog_id:\t%u\n",
|
||||
prog_tag,
|
||||
prog->aux->id);
|
||||
}
|
||||
if (link->ops->show_fdinfo)
|
||||
link->ops->show_fdinfo(link, m);
|
||||
}
|
||||
@@ -3095,6 +3114,11 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
|
||||
if (err)
|
||||
goto out_unlock;
|
||||
|
||||
if (tgt_info.tgt_mod) {
|
||||
module_put(prog->aux->mod);
|
||||
prog->aux->mod = tgt_info.tgt_mod;
|
||||
}
|
||||
|
||||
tr = bpf_trampoline_get(key, &tgt_info);
|
||||
if (!tr) {
|
||||
err = -ENOMEM;
|
||||
@@ -4288,7 +4312,8 @@ static int bpf_link_get_info_by_fd(struct file *file,
|
||||
|
||||
info.type = link->type;
|
||||
info.id = link->id;
|
||||
info.prog_id = link->prog->aux->id;
|
||||
if (link->prog)
|
||||
info.prog_id = link->prog->aux->id;
|
||||
|
||||
if (link->ops->fill_link_info) {
|
||||
err = link->ops->fill_link_info(link, &info);
|
||||
@@ -4338,9 +4363,9 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
|
||||
return err;
|
||||
}
|
||||
|
||||
#define BPF_BTF_LOAD_LAST_FIELD btf_log_level
|
||||
#define BPF_BTF_LOAD_LAST_FIELD btf_log_true_size
|
||||
|
||||
static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr)
|
||||
static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
|
||||
{
|
||||
if (CHECK_ATTR(BPF_BTF_LOAD))
|
||||
return -EINVAL;
|
||||
@@ -4348,7 +4373,7 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr)
|
||||
if (!bpf_capable())
|
||||
return -EPERM;
|
||||
|
||||
return btf_new_fd(attr, uattr);
|
||||
return btf_new_fd(attr, uattr, uattr_size);
|
||||
}
|
||||
|
||||
#define BPF_BTF_GET_FD_BY_ID_LAST_FIELD btf_id
|
||||
@@ -4551,6 +4576,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
|
||||
if (CHECK_ATTR(BPF_LINK_CREATE))
|
||||
return -EINVAL;
|
||||
|
||||
if (attr->link_create.attach_type == BPF_STRUCT_OPS)
|
||||
return bpf_struct_ops_link_create(attr);
|
||||
|
||||
prog = bpf_prog_get(attr->link_create.prog_fd);
|
||||
if (IS_ERR(prog))
|
||||
return PTR_ERR(prog);
|
||||
@@ -4562,6 +4590,7 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
|
||||
|
||||
switch (prog->type) {
|
||||
case BPF_PROG_TYPE_EXT:
|
||||
case BPF_PROG_TYPE_NETFILTER:
|
||||
break;
|
||||
case BPF_PROG_TYPE_PERF_EVENT:
|
||||
case BPF_PROG_TYPE_TRACEPOINT:
|
||||
@@ -4628,6 +4657,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
|
||||
case BPF_PROG_TYPE_XDP:
|
||||
ret = bpf_xdp_link_attach(attr, prog);
|
||||
break;
|
||||
case BPF_PROG_TYPE_NETFILTER:
|
||||
ret = bpf_nf_link_attach(attr, prog);
|
||||
break;
|
||||
#endif
|
||||
case BPF_PROG_TYPE_PERF_EVENT:
|
||||
case BPF_PROG_TYPE_TRACEPOINT:
|
||||
@@ -4649,6 +4681,35 @@ out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int link_update_map(struct bpf_link *link, union bpf_attr *attr)
|
||||
{
|
||||
struct bpf_map *new_map, *old_map = NULL;
|
||||
int ret;
|
||||
|
||||
new_map = bpf_map_get(attr->link_update.new_map_fd);
|
||||
if (IS_ERR(new_map))
|
||||
return PTR_ERR(new_map);
|
||||
|
||||
if (attr->link_update.flags & BPF_F_REPLACE) {
|
||||
old_map = bpf_map_get(attr->link_update.old_map_fd);
|
||||
if (IS_ERR(old_map)) {
|
||||
ret = PTR_ERR(old_map);
|
||||
goto out_put;
|
||||
}
|
||||
} else if (attr->link_update.old_map_fd) {
|
||||
ret = -EINVAL;
|
||||
goto out_put;
|
||||
}
|
||||
|
||||
ret = link->ops->update_map(link, new_map, old_map);
|
||||
|
||||
if (old_map)
|
||||
bpf_map_put(old_map);
|
||||
out_put:
|
||||
bpf_map_put(new_map);
|
||||
return ret;
|
||||
}
|
||||
|
||||
#define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd
|
||||
|
||||
static int link_update(union bpf_attr *attr)
|
||||
@@ -4669,6 +4730,11 @@ static int link_update(union bpf_attr *attr)
|
||||
if (IS_ERR(link))
|
||||
return PTR_ERR(link);
|
||||
|
||||
if (link->ops->update_map) {
|
||||
ret = link_update_map(link, attr);
|
||||
goto out_put_link;
|
||||
}
|
||||
|
||||
new_prog = bpf_prog_get(attr->link_update.new_prog_fd);
|
||||
if (IS_ERR(new_prog)) {
|
||||
ret = PTR_ERR(new_prog);
|
||||
@@ -4989,7 +5055,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
|
||||
err = map_freeze(&attr);
|
||||
break;
|
||||
case BPF_PROG_LOAD:
|
||||
err = bpf_prog_load(&attr, uattr);
|
||||
err = bpf_prog_load(&attr, uattr, size);
|
||||
break;
|
||||
case BPF_OBJ_PIN:
|
||||
err = bpf_obj_pin(&attr);
|
||||
@@ -5034,7 +5100,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
|
||||
err = bpf_raw_tracepoint_open(&attr);
|
||||
break;
|
||||
case BPF_BTF_LOAD:
|
||||
err = bpf_btf_load(&attr, uattr);
|
||||
err = bpf_btf_load(&attr, uattr, size);
|
||||
break;
|
||||
case BPF_BTF_GET_FD_BY_ID:
|
||||
err = bpf_btf_get_fd_by_id(&attr);
|
||||
|
||||
@@ -9,7 +9,6 @@
|
||||
#include <linux/btf.h>
|
||||
#include <linux/rcupdate_trace.h>
|
||||
#include <linux/rcupdate_wait.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/static_call.h>
|
||||
#include <linux/bpf_verifier.h>
|
||||
#include <linux/bpf_lsm.h>
|
||||
@@ -172,26 +171,6 @@ out:
|
||||
return tr;
|
||||
}
|
||||
|
||||
static int bpf_trampoline_module_get(struct bpf_trampoline *tr)
|
||||
{
|
||||
struct module *mod;
|
||||
int err = 0;
|
||||
|
||||
preempt_disable();
|
||||
mod = __module_text_address((unsigned long) tr->func.addr);
|
||||
if (mod && !try_module_get(mod))
|
||||
err = -ENOENT;
|
||||
preempt_enable();
|
||||
tr->mod = mod;
|
||||
return err;
|
||||
}
|
||||
|
||||
static void bpf_trampoline_module_put(struct bpf_trampoline *tr)
|
||||
{
|
||||
module_put(tr->mod);
|
||||
tr->mod = NULL;
|
||||
}
|
||||
|
||||
static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
|
||||
{
|
||||
void *ip = tr->func.addr;
|
||||
@@ -202,8 +181,6 @@ static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
|
||||
else
|
||||
ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, NULL);
|
||||
|
||||
if (!ret)
|
||||
bpf_trampoline_module_put(tr);
|
||||
return ret;
|
||||
}
|
||||
|
||||
@@ -238,9 +215,6 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
|
||||
tr->func.ftrace_managed = true;
|
||||
}
|
||||
|
||||
if (bpf_trampoline_module_get(tr))
|
||||
return -ENOENT;
|
||||
|
||||
if (tr->func.ftrace_managed) {
|
||||
ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 1);
|
||||
ret = register_ftrace_direct(tr->fops, (long)new_addr);
|
||||
@@ -248,8 +222,6 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
|
||||
ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
|
||||
}
|
||||
|
||||
if (ret)
|
||||
bpf_trampoline_module_put(tr);
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
||||
File diff suppressed because it is too large
@@ -1465,8 +1465,18 @@ static struct cgroup *current_cgns_cgroup_dfl(void)
|
||||
{
|
||||
struct css_set *cset;
|
||||
|
||||
cset = current->nsproxy->cgroup_ns->root_cset;
|
||||
return __cset_cgroup_from_root(cset, &cgrp_dfl_root);
|
||||
if (current->nsproxy) {
|
||||
cset = current->nsproxy->cgroup_ns->root_cset;
|
||||
return __cset_cgroup_from_root(cset, &cgrp_dfl_root);
|
||||
} else {
|
||||
/*
|
||||
* NOTE: This function may be called from bpf_cgroup_from_id()
|
||||
* on a task which has already passed exit_task_namespaces() and
|
||||
* nsproxy == NULL. Fall back to cgrp_dfl_root which will make all
|
||||
* cgroups visible for lookups.
|
||||
*/
|
||||
return &cgrp_dfl_root.cgrp;
|
||||
}
|
||||
}
|
||||
|
||||
/* look up cgroup associated with given css_set on the specified hierarchy */
|
||||
|
||||
@@ -246,7 +246,6 @@ static inline void kmemleak_load_module(const struct module *mod,
|
||||
void init_build_id(struct module *mod, const struct load_info *info);
|
||||
void layout_symtab(struct module *mod, struct load_info *info);
|
||||
void add_kallsyms(struct module *mod, const struct load_info *info);
|
||||
unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name);
|
||||
|
||||
static inline bool sect_empty(const Elf_Shdr *sect)
|
||||
{
|
||||
|
||||
@@ -442,7 +442,7 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
|
||||
}
|
||||
|
||||
/* Given a module and name of symbol, find and return the symbol's value */
|
||||
unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name)
|
||||
static unsigned long __find_kallsyms_symbol_value(struct module *mod, const char *name)
|
||||
{
|
||||
unsigned int i;
|
||||
struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms);
|
||||
@@ -466,7 +466,7 @@ static unsigned long __module_kallsyms_lookup_name(const char *name)
|
||||
if (colon) {
|
||||
mod = find_module_all(name, colon - name, false);
|
||||
if (mod)
|
||||
return find_kallsyms_symbol_value(mod, colon + 1);
|
||||
return __find_kallsyms_symbol_value(mod, colon + 1);
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -475,7 +475,7 @@ static unsigned long __module_kallsyms_lookup_name(const char *name)
|
||||
|
||||
if (mod->state == MODULE_STATE_UNFORMED)
|
||||
continue;
|
||||
ret = find_kallsyms_symbol_value(mod, name);
|
||||
ret = __find_kallsyms_symbol_value(mod, name);
|
||||
if (ret)
|
||||
return ret;
|
||||
}
|
||||
@@ -494,6 +494,16 @@ unsigned long module_kallsyms_lookup_name(const char *name)
|
||||
return ret;
|
||||
}
|
||||
|
||||
unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name)
|
||||
{
|
||||
unsigned long ret;
|
||||
|
||||
preempt_disable();
|
||||
ret = __find_kallsyms_symbol_value(mod, name);
|
||||
preempt_enable();
|
||||
return ret;
|
||||
}
|
||||
|
||||
int module_kallsyms_on_each_symbol(const char *modname,
|
||||
int (*fn)(void *, const char *,
|
||||
struct module *, unsigned long),
|
||||
|
||||
@@ -1453,10 +1453,6 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
|
||||
NULL : &bpf_probe_read_compat_str_proto;
|
||||
#endif
|
||||
#ifdef CONFIG_CGROUPS
|
||||
case BPF_FUNC_get_current_cgroup_id:
|
||||
return &bpf_get_current_cgroup_id_proto;
|
||||
case BPF_FUNC_get_current_ancestor_cgroup_id:
|
||||
return &bpf_get_current_ancestor_cgroup_id_proto;
|
||||
case BPF_FUNC_cgrp_storage_get:
|
||||
return &bpf_cgrp_storage_get_proto;
|
||||
case BPF_FUNC_cgrp_storage_delete:
|
||||
|
||||