Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core:

   - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
     default value allows for better BIG TCP performances

   - Reduce compound page head access for zero-copy data transfers

   - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
     possible

   - Threaded NAPI improvements, adding defer skb free support and
     unneeded softirq avoidance

   - Address dst_entry reference count scalability issues, via false
     sharing avoidance and optimize refcount tracking

   - Add lockless accesses annotation to sk_err[_soft]

   - Optimize again the skb struct layout

   - Extends the skb drop reasons to make it usable by multiple
     subsystems

   - Better const qualifier awareness for socket casts

  BPF:

   - Add skb and XDP typed dynptrs which allow BPF programs for more
     ergonomic and less brittle iteration through data and
     variable-sized accesses

   - Add a new BPF netfilter program type and minimal support to hook
     BPF programs to netfilter hooks such as prerouting or forward

   - Add more precise memory usage reporting for all BPF map types

   - Adds support for using {FOU,GUE} encap with an ipip device
     operating in collect_md mode and add a set of BPF kfuncs for
     controlling encap params

   - Allow BPF programs to detect at load time whether a particular
     kfunc exists or not, and also add support for this in light
     skeleton

   - Bigger batch of BPF verifier improvements to prepare for upcoming
     BPF open-coded iterators allowing for less restrictive looping
     capabilities

   - Rework RCU enforcement in the verifier, add kptr_rcu and enforce
     BPF programs to NULL-check before passing such pointers into kfunc

   - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
     in local storage maps

   - Enable RCU semantics for task BPF kptrs and allow referenced kptr
     tasks to be stored in BPF maps

   - Add support for refcounted local kptrs to the verifier for allowing
     shared ownership, useful for adding a node to both the BPF list and
     rbtree

   - Add BPF verifier support for ST instructions in
     convert_ctx_access() which will help new -mcpu=v4 clang flag to
     start emitting them

   - Add ARM32 USDT support to libbpf

   - Improve bpftool's visual program dump which produces the control
     flow graph in a DOT format by adding C source inline annotations

  Protocols:

   - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
     indicates the provenance of the IP address

   - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition

   - Add the handshake upcall mechanism, allowing the user-space to
     implement generic TLS handshake on kernel's behalf

   - Bridge: support per-{Port, VLAN} neighbor suppression, increasing
     resilience to nodes failures

   - SCTP: add support for Fair Capacity and Weighted Fair Queueing
     schedulers

   - MPTCP: delay first subflow allocation up to its first usage. This
     will allow for later better LSM interaction

   - xfrm: Remove inner/outer modes from input/output path. These are
     not needed anymore

   - WiFi:
      - reduced neighbor report (RNR) handling for AP mode
      - HW timestamping support
      - support for randomized auth/deauth TA for PASN privacy
      - per-link debugfs for multi-link
      - TC offload support for mac80211 drivers
      - mac80211 mesh fast-xmit and fast-rx support
      - enable Wi-Fi 7 (EHT) mesh support

  Netfilter:

   - Add nf_tables 'brouting' support, to force a packet to be routed
     instead of being bridged

   - Update bridge netfilter and ovs conntrack helpers to handle IPv6
     Jumbo packets properly, i.e. fetch the packet length from
     hop-by-hop extension header. This is needed for BIT TCP support

   - The iptables 32bit compat interface isn't compiled in by default
     anymore

   - Move ip(6)tables builtin icmp matches to the udptcp one. This has
     the advantage that icmp/icmpv6 match doesn't load the
     iptables/ip6tables modules anymore when iptables-nft is used

   - Extended netlink error report for netdevice in flowtables and
     netdev/chains. Allow for incrementally add/delete devices to netdev
     basechain. Allow to create netdev chain without device

  Driver API:

   - Remove redundant Device Control Error Reporting Enable, as PCI core
     has already error reporting enabled at enumeration time

   - Move Multicast DB netlink handlers to core, allowing devices other
     then bridge to use them

   - Allow the page_pool to directly recycle the pages from safely
     localized NAPI

   - Implement lockless TX queue stop/wake combo macros, allowing for
     further code de-duplication and sanitization

   - Add YNL support for user headers and struct attrs

   - Add partial YNL specification for devlink

   - Add partial YNL specification for ethtool

   - Add tc-mqprio and tc-taprio support for preemptible traffic classes

   - Add tx push buf len param to ethtool, specifies the maximum number
     of bytes of a transmitted packet a driver can push directly to the
     underlying device

   - Add basic LED support for switch/phy

   - Add NAPI documentation, stop relaying on external links

   - Convert dsa_master_ioctl() to netdev notifier. This is a
     preparatory work to make the hardware timestamping layer selectable
     by user space

   - Add transceiver support and improve the error messages for CAN-FD
     controllers

  New hardware / drivers:

   - Ethernet:
      - AMD/Pensando core device support
      - MediaTek MT7981 SoC
      - MediaTek MT7988 SoC
      - Broadcom BCM53134 embedded switch
      - Texas Instruments CPSW9G ethernet switch
      - Qualcomm EMAC3 DWMAC ethernet
      - StarFive JH7110 SoC
      - NXP CBTX ethernet PHY

   - WiFi:
      - Apple M1 Pro/Max devices
      - RealTek rtl8710bu/rtl8188gu
      - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset

   - Bluetooth:
      - Realtek RTL8821CS, RTL8851B, RTL8852BS
      - Mediatek MT7663, MT7922
      - NXP w8997
      - Actions Semi ATS2851
      - QTI WCN6855
      - Marvell 88W8997

   - Can:
      - STMicroelectronics bxcan stm32f429

  Drivers:

   - Ethernet NICs:
      - Intel (1G, icg):
         - add tracking and reporting of QBV config errors
         - add support for configuring max SDU for each Tx queue
      - Intel (100G, ice):
         - refactor mailbox overflow detection to support Scalable IOV
         - GNSS interface optimization
      - Intel (i40e):
         - support XDP multi-buffer
      - nVidia/Mellanox:
         - add the support for linux bridge multicast offload
         - enable TC offload for egress and engress MACVLAN over bond
         - add support for VxLAN GBP encap/decap flows offload
         - extend packet offload to fully support libreswan
         - support tunnel mode in mlx5 IPsec packet offload
         - extend XDP multi-buffer support
         - support MACsec VLAN offload
         - add support for dynamic msix vectors allocation
         - drop RX page_cache and fully use page_pool
         - implement thermal zone to report NIC temperature
      - Netronome/Corigine:
         - add support for multi-zone conntrack offload
      - Solarflare/Xilinx:
         - support offloading TC VLAN push/pop actions to the MAE
         - support TC decap rules
         - support unicast PTP

   - Other NICs:
      - Broadcom (bnxt): enforce software based freq adjustments only on
        shared PHC NIC
      - RealTek (r8169): refactor to addess ASPM issues during NAPI poll
      - Micrel (lan8841): add support for PTP_PF_PEROUT
      - Cadence (macb): enable PTP unicast
      - Engleder (tsnep): add XDP socket zero-copy support
      - virtio-net: implement exact header length guest feature
      - veth: add page_pool support for page recycling
      - vxlan: add MDB data path support
      - gve: add XDP support for GQI-QPL format
      - geneve: accept every ethertype
      - macvlan: allow some packets to bypass broadcast queue
      - mana: add support for jumbo frame

   - Ethernet high-speed switches:
      - Microchip (sparx5): Add support for TC flower templates

   - Ethernet embedded switches:
      - Broadcom (b54):
         - configure 6318 and 63268 RGMII ports
      - Marvell (mv88e6xxx):
         - faster C45 bus scan
      - Microchip:
         - lan966x:
            - add support for IS1 VCAP
            - better TX/RX from/to CPU performances
         - ksz9477: add ETS Qdisc support
         - ksz8: enhance static MAC table operations and error handling
         - sama7g5: add PTP capability
      - NXP (ocelot):
         - add support for external ports
         - add support for preemptible traffic classes
      - Texas Instruments:
         - add CPSWxG SGMII support for J7200 and J721E

   - Intel WiFi (iwlwifi):
      - preparation for Wi-Fi 7 EHT and multi-link support
      - EHT (Wi-Fi 7) sniffer support
      - hardware timestamping support for some devices/firwmares
      - TX beacon protection on newer hardware

   - Qualcomm 802.11ax WiFi (ath11k):
      - MU-MIMO parameters support
      - ack signal support for management packets

   - RealTek WiFi (rtw88):
      - SDIO bus support
      - better support for some SDIO devices (e.g. MAC address from
        efuse)

   - RealTek WiFi (rtw89):
      - HW scan support for 8852b
      - better support for 6 GHz scanning
      - support for various newer firmware APIs
      - framework firmware backwards compatibility

   - MediaTek WiFi (mt76):
      - P2P support
      - mesh A-MSDU support
      - EHT (Wi-Fi 7) support
      - coredump support"

* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
  net: phy: hide the PHYLIB_LEDS knob
  net: phy: marvell-88x2222: remove unnecessary (void*) conversions
  tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
  net: amd: Fix link leak when verifying config failed
  net: phy: marvell: Fix inconsistent indenting in led_blink_set
  lan966x: Don't use xdp_frame when action is XDP_TX
  tsnep: Add XDP socket zero-copy TX support
  tsnep: Add XDP socket zero-copy RX support
  tsnep: Move skb receive action to separate function
  tsnep: Add functions for queue enable/disable
  tsnep: Rework TX/RX queue initialization
  tsnep: Replace modulo operation with mask
  net: phy: dp83867: Add led_brightness_set support
  net: phy: Fix reading LED reg property
  drivers: nfc: nfcsim: remove return value check of `dev_dir`
  net: phy: dp83867: Remove unnecessary (void*) conversions
  net: ethtool: coalesce: try to make user settings stick twice
  net: mana: Check if netdev/napi_alloc_frag returns single page
  net: mana: Rename mana_refill_rxoob and remove some empty lines
  net: veth: add page_pool stats
  ...
This commit is contained in:
Linus Torvalds
2023-04-26 16:07:23 -07:00
1925 changed files with 138826 additions and 47261 deletions

View File

@@ -6,7 +6,8 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
endif
CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o

View File

@@ -307,8 +307,8 @@ static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key
}
/* Called from syscall or from eBPF program */
static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
static long array_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
u32 index = *(u32 *)key;
@@ -386,7 +386,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
}
/* Called from syscall or from eBPF program */
static int array_map_delete_elem(struct bpf_map *map, void *key)
static long array_map_delete_elem(struct bpf_map *map, void *key)
{
return -EINVAL;
}
@@ -686,8 +686,8 @@ static const struct bpf_iter_seq_info iter_seq_info = {
.seq_priv_size = sizeof(struct bpf_iter_seq_array_map_info),
};
static int bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_fn,
void *callback_ctx, u64 flags)
static long bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_fn,
void *callback_ctx, u64 flags)
{
u32 i, key, num_elems = 0;
struct bpf_array *array;
@@ -721,6 +721,28 @@ static int bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_
return num_elems;
}
static u64 array_map_mem_usage(const struct bpf_map *map)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
bool percpu = map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
u32 elem_size = array->elem_size;
u64 entries = map->max_entries;
u64 usage = sizeof(*array);
if (percpu) {
usage += entries * sizeof(void *);
usage += entries * elem_size * num_possible_cpus();
} else {
if (map->map_flags & BPF_F_MMAPABLE) {
usage = PAGE_ALIGN(usage);
usage += PAGE_ALIGN(entries * elem_size);
} else {
usage += entries * elem_size;
}
}
return usage;
}
BTF_ID_LIST_SINGLE(array_map_btf_ids, struct, bpf_array)
const struct bpf_map_ops array_map_ops = {
.map_meta_equal = array_map_meta_equal,
@@ -742,6 +764,7 @@ const struct bpf_map_ops array_map_ops = {
.map_update_batch = generic_map_update_batch,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_array_elem,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
};
@@ -762,6 +785,7 @@ const struct bpf_map_ops percpu_array_map_ops = {
.map_update_batch = generic_map_update_batch,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_array_elem,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
};
@@ -847,7 +871,7 @@ int bpf_fd_array_map_update_elem(struct bpf_map *map, struct file *map_file,
return 0;
}
static int fd_array_map_delete_elem(struct bpf_map *map, void *key)
static long fd_array_map_delete_elem(struct bpf_map *map, void *key)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
void *old_ptr;
@@ -1156,6 +1180,7 @@ const struct bpf_map_ops prog_array_map_ops = {
.map_fd_sys_lookup_elem = prog_fd_array_sys_lookup_elem,
.map_release_uref = prog_array_map_clear,
.map_seq_show_elem = prog_array_map_seq_show_elem,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
};
@@ -1257,6 +1282,7 @@ const struct bpf_map_ops perf_event_array_map_ops = {
.map_fd_put_ptr = perf_event_fd_array_put_ptr,
.map_release = perf_event_fd_array_release,
.map_check_btf = map_check_no_btf,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
};
@@ -1291,6 +1317,7 @@ const struct bpf_map_ops cgroup_array_map_ops = {
.map_fd_get_ptr = cgroup_fd_array_get_ptr,
.map_fd_put_ptr = cgroup_fd_array_put_ptr,
.map_check_btf = map_check_no_btf,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
};
#endif
@@ -1379,5 +1406,6 @@ const struct bpf_map_ops array_of_maps_map_ops = {
.map_lookup_batch = generic_map_lookup_batch,
.map_update_batch = generic_map_update_batch,
.map_check_btf = map_check_no_btf,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
};

View File

@@ -16,13 +16,6 @@ struct bpf_bloom_filter {
struct bpf_map map;
u32 bitset_mask;
u32 hash_seed;
/* If the size of the values in the bloom filter is u32 aligned,
* then it is more performant to use jhash2 as the underlying hash
* function, else we use jhash. This tracks the number of u32s
* in an u32-aligned value size. If the value size is not u32 aligned,
* this will be 0.
*/
u32 aligned_u32_count;
u32 nr_hash_funcs;
unsigned long bitset[];
};
@@ -32,16 +25,15 @@ static u32 hash(struct bpf_bloom_filter *bloom, void *value,
{
u32 h;
if (bloom->aligned_u32_count)
h = jhash2(value, bloom->aligned_u32_count,
bloom->hash_seed + index);
if (likely(value_size % 4 == 0))
h = jhash2(value, value_size / 4, bloom->hash_seed + index);
else
h = jhash(value, value_size, bloom->hash_seed + index);
return h & bloom->bitset_mask;
}
static int bloom_map_peek_elem(struct bpf_map *map, void *value)
static long bloom_map_peek_elem(struct bpf_map *map, void *value)
{
struct bpf_bloom_filter *bloom =
container_of(map, struct bpf_bloom_filter, map);
@@ -56,7 +48,7 @@ static int bloom_map_peek_elem(struct bpf_map *map, void *value)
return 0;
}
static int bloom_map_push_elem(struct bpf_map *map, void *value, u64 flags)
static long bloom_map_push_elem(struct bpf_map *map, void *value, u64 flags)
{
struct bpf_bloom_filter *bloom =
container_of(map, struct bpf_bloom_filter, map);
@@ -73,12 +65,12 @@ static int bloom_map_push_elem(struct bpf_map *map, void *value, u64 flags)
return 0;
}
static int bloom_map_pop_elem(struct bpf_map *map, void *value)
static long bloom_map_pop_elem(struct bpf_map *map, void *value)
{
return -EOPNOTSUPP;
}
static int bloom_map_delete_elem(struct bpf_map *map, void *value)
static long bloom_map_delete_elem(struct bpf_map *map, void *value)
{
return -EOPNOTSUPP;
}
@@ -152,11 +144,6 @@ static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
bloom->nr_hash_funcs = nr_hash_funcs;
bloom->bitset_mask = bitset_mask;
/* Check whether the value size is u32-aligned */
if ((attr->value_size & (sizeof(u32) - 1)) == 0)
bloom->aligned_u32_count =
attr->value_size / sizeof(u32);
if (!(attr->map_flags & BPF_F_ZERO_SEED))
bloom->hash_seed = get_random_u32();
@@ -177,8 +164,8 @@ static void *bloom_map_lookup_elem(struct bpf_map *map, void *key)
return ERR_PTR(-EINVAL);
}
static int bloom_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
static long bloom_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
{
/* The eBPF program should use map_push_elem instead */
return -EINVAL;
@@ -193,6 +180,17 @@ static int bloom_map_check_btf(const struct bpf_map *map,
return btf_type_is_void(key_type) ? 0 : -EINVAL;
}
static u64 bloom_map_mem_usage(const struct bpf_map *map)
{
struct bpf_bloom_filter *bloom;
u64 bitset_bytes;
bloom = container_of(map, struct bpf_bloom_filter, map);
bitset_bytes = BITS_TO_BYTES((u64)bloom->bitset_mask + 1);
bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
return sizeof(*bloom) + bitset_bytes;
}
BTF_ID_LIST_SINGLE(bpf_bloom_map_btf_ids, struct, bpf_bloom_filter)
const struct bpf_map_ops bloom_filter_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -206,5 +204,6 @@ const struct bpf_map_ops bloom_filter_map_ops = {
.map_update_elem = bloom_map_update_elem,
.map_delete_elem = bloom_map_delete_elem,
.map_check_btf = bloom_map_check_btf,
.map_mem_usage = bloom_map_mem_usage,
.map_btf_id = &bpf_bloom_map_btf_ids[0],
};

View File

@@ -46,8 +46,6 @@ static struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
void bpf_cgrp_storage_free(struct cgroup *cgroup)
{
struct bpf_local_storage *local_storage;
bool free_cgroup_storage = false;
unsigned long flags;
rcu_read_lock();
local_storage = rcu_dereference(cgroup->bpf_cgrp_storage);
@@ -57,14 +55,9 @@ void bpf_cgrp_storage_free(struct cgroup *cgroup)
}
bpf_cgrp_storage_lock();
raw_spin_lock_irqsave(&local_storage->lock, flags);
free_cgroup_storage = bpf_local_storage_unlink_nolock(local_storage);
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_local_storage_destroy(local_storage);
bpf_cgrp_storage_unlock();
rcu_read_unlock();
if (free_cgroup_storage)
kfree_rcu(local_storage, rcu);
}
static struct bpf_local_storage_data *
@@ -100,8 +93,8 @@ static void *bpf_cgrp_storage_lookup_elem(struct bpf_map *map, void *key)
return sdata ? sdata->data : NULL;
}
static int bpf_cgrp_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
static long bpf_cgrp_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
struct bpf_local_storage_data *sdata;
struct cgroup *cgroup;
@@ -128,11 +121,11 @@ static int cgroup_storage_delete(struct cgroup *cgroup, struct bpf_map *map)
if (!sdata)
return -ENOENT;
bpf_selem_unlink(SELEM(sdata), true);
bpf_selem_unlink(SELEM(sdata), false);
return 0;
}
static int bpf_cgrp_storage_delete_elem(struct bpf_map *map, void *key)
static long bpf_cgrp_storage_delete_elem(struct bpf_map *map, void *key)
{
struct cgroup *cgroup;
int err, fd;
@@ -156,7 +149,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
{
return bpf_local_storage_map_alloc(attr, &cgroup_cache);
return bpf_local_storage_map_alloc(attr, &cgroup_cache, true);
}
static void cgroup_storage_map_free(struct bpf_map *map)
@@ -221,6 +214,7 @@ const struct bpf_map_ops cgrp_storage_map_ops = {
.map_update_elem = bpf_cgrp_storage_update_elem,
.map_delete_elem = bpf_cgrp_storage_delete_elem,
.map_check_btf = bpf_local_storage_map_check_btf,
.map_mem_usage = bpf_local_storage_map_mem_usage,
.map_btf_id = &bpf_local_storage_map_btf_id[0],
.map_owner_storage_ptr = cgroup_storage_ptr,
};
@@ -230,7 +224,7 @@ const struct bpf_func_proto bpf_cgrp_storage_get_proto = {
.gpl_only = false,
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
.arg2_btf_id = &bpf_cgroup_btf_id[0],
.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
.arg4_type = ARG_ANYTHING,
@@ -241,6 +235,6 @@ const struct bpf_func_proto bpf_cgrp_storage_delete_proto = {
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
.arg2_btf_id = &bpf_cgroup_btf_id[0],
};

View File

@@ -57,7 +57,6 @@ static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
void bpf_inode_storage_free(struct inode *inode)
{
struct bpf_local_storage *local_storage;
bool free_inode_storage = false;
struct bpf_storage_blob *bsb;
bsb = bpf_inode(inode);
@@ -72,13 +71,8 @@ void bpf_inode_storage_free(struct inode *inode)
return;
}
raw_spin_lock_bh(&local_storage->lock);
free_inode_storage = bpf_local_storage_unlink_nolock(local_storage);
raw_spin_unlock_bh(&local_storage->lock);
bpf_local_storage_destroy(local_storage);
rcu_read_unlock();
if (free_inode_storage)
kfree_rcu(local_storage, rcu);
}
static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
@@ -94,8 +88,8 @@ static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
return sdata ? sdata->data : NULL;
}
static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
struct bpf_local_storage_data *sdata;
struct fd f = fdget_raw(*(int *)key);
@@ -122,12 +116,12 @@ static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
if (!sdata)
return -ENOENT;
bpf_selem_unlink(SELEM(sdata), true);
bpf_selem_unlink(SELEM(sdata), false);
return 0;
}
static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
static long bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
{
struct fd f = fdget_raw(*(int *)key);
int err;
@@ -197,7 +191,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
{
return bpf_local_storage_map_alloc(attr, &inode_cache);
return bpf_local_storage_map_alloc(attr, &inode_cache, false);
}
static void inode_storage_map_free(struct bpf_map *map)
@@ -215,6 +209,7 @@ const struct bpf_map_ops inode_storage_map_ops = {
.map_update_elem = bpf_fd_inode_storage_update_elem,
.map_delete_elem = bpf_fd_inode_storage_delete_elem,
.map_check_btf = bpf_local_storage_map_check_btf,
.map_mem_usage = bpf_local_storage_map_mem_usage,
.map_btf_id = &bpf_local_storage_map_btf_id[0],
.map_owner_storage_ptr = inode_storage_ptr,
};
@@ -226,7 +221,7 @@ const struct bpf_func_proto bpf_inode_storage_get_proto = {
.gpl_only = false,
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
.arg2_btf_id = &bpf_inode_storage_btf_ids[0],
.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
.arg4_type = ARG_ANYTHING,
@@ -237,6 +232,6 @@ const struct bpf_func_proto bpf_inode_storage_delete_proto = {
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
.arg2_btf_id = &bpf_inode_storage_btf_ids[0],
};

View File

@@ -776,3 +776,73 @@ const struct bpf_func_proto bpf_loop_proto = {
.arg3_type = ARG_PTR_TO_STACK_OR_NULL,
.arg4_type = ARG_ANYTHING,
};
struct bpf_iter_num_kern {
int cur; /* current value, inclusive */
int end; /* final value, exclusive */
} __aligned(8);
__diag_push();
__diag_ignore_all("-Wmissing-prototypes",
"Global functions as their definitions will be in vmlinux BTF");
__bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end)
{
struct bpf_iter_num_kern *s = (void *)it;
BUILD_BUG_ON(sizeof(struct bpf_iter_num_kern) != sizeof(struct bpf_iter_num));
BUILD_BUG_ON(__alignof__(struct bpf_iter_num_kern) != __alignof__(struct bpf_iter_num));
BTF_TYPE_EMIT(struct btf_iter_num);
/* start == end is legit, it's an empty range and we'll just get NULL
* on first (and any subsequent) bpf_iter_num_next() call
*/
if (start > end) {
s->cur = s->end = 0;
return -EINVAL;
}
/* avoid overflows, e.g., if start == INT_MIN and end == INT_MAX */
if ((s64)end - (s64)start > BPF_MAX_LOOPS) {
s->cur = s->end = 0;
return -E2BIG;
}
/* user will call bpf_iter_num_next() first,
* which will set s->cur to exactly start value;
* underflow shouldn't matter
*/
s->cur = start - 1;
s->end = end;
return 0;
}
__bpf_kfunc int *bpf_iter_num_next(struct bpf_iter_num* it)
{
struct bpf_iter_num_kern *s = (void *)it;
/* check failed initialization or if we are done (same behavior);
* need to be careful about overflow, so convert to s64 for checks,
* e.g., if s->cur == s->end == INT_MAX, we can't just do
* s->cur + 1 >= s->end
*/
if ((s64)(s->cur + 1) >= s->end) {
s->cur = s->end = 0;
return NULL;
}
s->cur++;
return &s->cur;
}
__bpf_kfunc void bpf_iter_num_destroy(struct bpf_iter_num *it)
{
struct bpf_iter_num_kern *s = (void *)it;
s->cur = s->end = 0;
}
__diag_pop();

View File

@@ -51,11 +51,21 @@ owner_storage(struct bpf_local_storage_map *smap, void *owner)
return map->ops->map_owner_storage_ptr(owner);
}
static bool selem_linked_to_storage_lockless(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed_lockless(&selem->snode);
}
static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed(&selem->snode);
}
static bool selem_linked_to_map_lockless(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed_lockless(&selem->map_node);
}
static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed(&selem->map_node);
@@ -70,11 +80,28 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
if (charge_mem && mem_charge(smap, owner, smap->elem_size))
return NULL;
selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
gfp_flags | __GFP_NOWARN);
if (smap->bpf_ma) {
migrate_disable();
selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
migrate_enable();
if (selem)
/* Keep the original bpf_map_kzalloc behavior
* before started using the bpf_mem_cache_alloc.
*
* No need to use zero_map_value. The bpf_selem_free()
* only does bpf_mem_cache_free when there is
* no other bpf prog is using the selem.
*/
memset(SDATA(selem)->data, 0, smap->map.value_size);
} else {
selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
gfp_flags | __GFP_NOWARN);
}
if (selem) {
if (value)
copy_map_value(&smap->map, SDATA(selem)->data, value);
/* No need to call check_and_init_map_value as memory is zero init */
return selem;
}
@@ -84,7 +111,8 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
return NULL;
}
void bpf_local_storage_free_rcu(struct rcu_head *rcu)
/* rcu tasks trace callback for bpf_ma == false */
static void __bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage *local_storage;
@@ -98,7 +126,66 @@ void bpf_local_storage_free_rcu(struct rcu_head *rcu)
kfree_rcu(local_storage, rcu);
}
static void bpf_selem_free_rcu(struct rcu_head *rcu)
static void bpf_local_storage_free_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage *local_storage;
local_storage = container_of(rcu, struct bpf_local_storage, rcu);
bpf_mem_cache_raw_free(local_storage);
}
static void bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
{
if (rcu_trace_implies_rcu_gp())
bpf_local_storage_free_rcu(rcu);
else
call_rcu(rcu, bpf_local_storage_free_rcu);
}
/* Handle bpf_ma == false */
static void __bpf_local_storage_free(struct bpf_local_storage *local_storage,
bool vanilla_rcu)
{
if (vanilla_rcu)
kfree_rcu(local_storage, rcu);
else
call_rcu_tasks_trace(&local_storage->rcu,
__bpf_local_storage_free_trace_rcu);
}
static void bpf_local_storage_free(struct bpf_local_storage *local_storage,
struct bpf_local_storage_map *smap,
bool bpf_ma, bool reuse_now)
{
if (!local_storage)
return;
if (!bpf_ma) {
__bpf_local_storage_free(local_storage, reuse_now);
return;
}
if (!reuse_now) {
call_rcu_tasks_trace(&local_storage->rcu,
bpf_local_storage_free_trace_rcu);
return;
}
if (smap) {
migrate_disable();
bpf_mem_cache_free(&smap->storage_ma, local_storage);
migrate_enable();
} else {
/* smap could be NULL if the selem that triggered
* this 'local_storage' creation had been long gone.
* In this case, directly do call_rcu().
*/
call_rcu(&local_storage->rcu, bpf_local_storage_free_rcu);
}
}
/* rcu tasks trace callback for bpf_ma == false */
static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage_elem *selem;
@@ -109,13 +196,63 @@ static void bpf_selem_free_rcu(struct rcu_head *rcu)
kfree_rcu(selem, rcu);
}
/* Handle bpf_ma == false */
static void __bpf_selem_free(struct bpf_local_storage_elem *selem,
bool vanilla_rcu)
{
if (vanilla_rcu)
kfree_rcu(selem, rcu);
else
call_rcu_tasks_trace(&selem->rcu, __bpf_selem_free_trace_rcu);
}
static void bpf_selem_free_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage_elem *selem;
selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
bpf_mem_cache_raw_free(selem);
}
static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
{
if (rcu_trace_implies_rcu_gp())
bpf_selem_free_rcu(rcu);
else
call_rcu(rcu, bpf_selem_free_rcu);
}
void bpf_selem_free(struct bpf_local_storage_elem *selem,
struct bpf_local_storage_map *smap,
bool reuse_now)
{
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
if (!smap->bpf_ma) {
__bpf_selem_free(selem, reuse_now);
return;
}
if (!reuse_now) {
call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
} else {
/* Instead of using the vanilla call_rcu(),
* bpf_mem_cache_free will be able to reuse selem
* immediately.
*/
migrate_disable();
bpf_mem_cache_free(&smap->selem_ma, selem);
migrate_enable();
}
}
/* local_storage->lock must be held and selem->local_storage == local_storage.
* The caller must ensure selem->smap is still valid to be
* dereferenced for its smap->elem_size and smap->cache_idx.
*/
static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem,
bool uncharge_mem, bool use_trace_rcu)
bool uncharge_mem, bool reuse_now)
{
struct bpf_local_storage_map *smap;
bool free_local_storage;
@@ -159,40 +296,75 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
SDATA(selem))
RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
if (use_trace_rcu)
call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);
else
kfree_rcu(selem, rcu);
bpf_selem_free(selem, smap, reuse_now);
if (rcu_access_pointer(local_storage->smap) == smap)
RCU_INIT_POINTER(local_storage->smap, NULL);
return free_local_storage;
}
static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
bool use_trace_rcu)
static bool check_storage_bpf_ma(struct bpf_local_storage *local_storage,
struct bpf_local_storage_map *storage_smap,
struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage_map *selem_smap;
/* local_storage->smap may be NULL. If it is, get the bpf_ma
* from any selem in the local_storage->list. The bpf_ma of all
* local_storage and selem should have the same value
* for the same map type.
*
* If the local_storage->list is already empty, the caller will not
* care about the bpf_ma value also because the caller is not
* responsibile to free the local_storage.
*/
if (storage_smap)
return storage_smap->bpf_ma;
if (!selem) {
struct hlist_node *n;
n = rcu_dereference_check(hlist_first_rcu(&local_storage->list),
bpf_rcu_lock_held());
if (!n)
return false;
selem = hlist_entry(n, struct bpf_local_storage_elem, snode);
}
selem_smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
return selem_smap->bpf_ma;
}
static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
bool reuse_now)
{
struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage *local_storage;
bool free_local_storage = false;
bool bpf_ma, free_local_storage = false;
unsigned long flags;
if (unlikely(!selem_linked_to_storage(selem)))
if (unlikely(!selem_linked_to_storage_lockless(selem)))
/* selem has already been unlinked from sk */
return;
local_storage = rcu_dereference_check(selem->local_storage,
bpf_rcu_lock_held());
storage_smap = rcu_dereference_check(local_storage->smap,
bpf_rcu_lock_held());
bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
raw_spin_lock_irqsave(&local_storage->lock, flags);
if (likely(selem_linked_to_storage(selem)))
free_local_storage = bpf_selem_unlink_storage_nolock(
local_storage, selem, true, use_trace_rcu);
local_storage, selem, true, reuse_now);
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
if (free_local_storage) {
if (use_trace_rcu)
call_rcu_tasks_trace(&local_storage->rcu,
bpf_local_storage_free_rcu);
else
kfree_rcu(local_storage, rcu);
}
if (free_local_storage)
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
}
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
@@ -202,13 +374,13 @@ void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
hlist_add_head_rcu(&selem->snode, &local_storage->list);
}
void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage_map *smap;
struct bpf_local_storage_map_bucket *b;
unsigned long flags;
if (unlikely(!selem_linked_to_map(selem)))
if (unlikely(!selem_linked_to_map_lockless(selem)))
/* selem has already be unlinked from smap */
return;
@@ -232,14 +404,14 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
raw_spin_unlock_irqrestore(&b->lock, flags);
}
void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool use_trace_rcu)
void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
{
/* Always unlink from map before unlinking from local_storage
* because selem will be freed after successfully unlinked from
* the local_storage.
*/
bpf_selem_unlink_map(selem);
__bpf_selem_unlink_storage(selem, use_trace_rcu);
bpf_selem_unlink_storage(selem, reuse_now);
}
/* If cacheit_lockit is false, this lookup function is lockless */
@@ -312,13 +484,21 @@ int bpf_local_storage_alloc(void *owner,
if (err)
return err;
storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
gfp_flags | __GFP_NOWARN);
if (smap->bpf_ma) {
migrate_disable();
storage = bpf_mem_cache_alloc_flags(&smap->storage_ma, gfp_flags);
migrate_enable();
} else {
storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
gfp_flags | __GFP_NOWARN);
}
if (!storage) {
err = -ENOMEM;
goto uncharge;
}
RCU_INIT_POINTER(storage->smap, smap);
INIT_HLIST_HEAD(&storage->list);
raw_spin_lock_init(&storage->lock);
storage->owner = owner;
@@ -358,7 +538,7 @@ int bpf_local_storage_alloc(void *owner,
return 0;
uncharge:
kfree(storage);
bpf_local_storage_free(storage, smap, smap->bpf_ma, true);
mem_uncharge(smap, owner, sizeof(*storage));
return err;
}
@@ -402,7 +582,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
if (err) {
kfree(selem);
bpf_selem_free(selem, smap, true);
mem_uncharge(smap, owner, smap->elem_size);
return ERR_PTR(err);
}
@@ -420,7 +600,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
err = check_flags(old_sdata, map_flags);
if (err)
return ERR_PTR(err);
if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
if (old_sdata && selem_linked_to_storage_lockless(SELEM(old_sdata))) {
copy_map_value_locked(&smap->map, old_sdata->data,
value, false);
return old_sdata;
@@ -485,7 +665,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (old_sdata) {
bpf_selem_unlink_map(SELEM(old_sdata));
bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
false, true);
false, false);
}
unlock:
@@ -496,7 +676,7 @@ unlock_err:
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
if (selem) {
mem_uncharge(smap, owner, smap->elem_size);
kfree(selem);
bpf_selem_free(selem, smap, true);
}
return ERR_PTR(err);
}
@@ -552,40 +732,6 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
return 0;
}
static struct bpf_local_storage_map *__bpf_local_storage_map_alloc(union bpf_attr *attr)
{
struct bpf_local_storage_map *smap;
unsigned int i;
u32 nbuckets;
smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
if (!smap)
return ERR_PTR(-ENOMEM);
bpf_map_init_from_attr(&smap->map, attr);
nbuckets = roundup_pow_of_two(num_possible_cpus());
/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
nbuckets = max_t(u32, 2, nbuckets);
smap->bucket_log = ilog2(nbuckets);
smap->buckets = bpf_map_kvcalloc(&smap->map, sizeof(*smap->buckets),
nbuckets, GFP_USER | __GFP_NOWARN);
if (!smap->buckets) {
bpf_map_area_free(smap);
return ERR_PTR(-ENOMEM);
}
for (i = 0; i < nbuckets; i++) {
INIT_HLIST_HEAD(&smap->buckets[i].list);
raw_spin_lock_init(&smap->buckets[i].lock);
}
smap->elem_size = offsetof(struct bpf_local_storage_elem,
sdata.data[attr->value_size]);
return smap;
}
int bpf_local_storage_map_check_btf(const struct bpf_map *map,
const struct btf *btf,
const struct btf_type *key_type,
@@ -603,11 +749,16 @@ int bpf_local_storage_map_check_btf(const struct bpf_map *map,
return 0;
}
bool bpf_local_storage_unlink_nolock(struct bpf_local_storage *local_storage)
void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
{
struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage_elem *selem;
bool free_storage = false;
bool bpf_ma, free_storage = false;
struct hlist_node *n;
unsigned long flags;
storage_smap = rcu_dereference_check(local_storage->smap, bpf_rcu_lock_held());
bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, NULL);
/* Neither the bpf_prog nor the bpf_map's syscall
* could be modifying the local_storage->list now.
@@ -618,6 +769,7 @@ bool bpf_local_storage_unlink_nolock(struct bpf_local_storage *local_storage)
* when unlinking elem from the local_storage->list and
* the map's bucket->list.
*/
raw_spin_lock_irqsave(&local_storage->lock, flags);
hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
/* Always unlink from map before unlinking from
* local_storage.
@@ -630,24 +782,89 @@ bool bpf_local_storage_unlink_nolock(struct bpf_local_storage *local_storage)
* of the loop will set the free_cgroup_storage to true.
*/
free_storage = bpf_selem_unlink_storage_nolock(
local_storage, selem, false, false);
local_storage, selem, false, true);
}
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
return free_storage;
if (free_storage)
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, true);
}
u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map)
{
struct bpf_local_storage_map *smap = (struct bpf_local_storage_map *)map;
u64 usage = sizeof(*smap);
/* The dynamically callocated selems are not counted currently. */
usage += sizeof(*smap->buckets) * (1ULL << smap->bucket_log);
return usage;
}
/* When bpf_ma == true, the bpf_mem_alloc is used to allocate and free memory.
* A deadlock free allocator is useful for storage that the bpf prog can easily
* get a hold of the owner PTR_TO_BTF_ID in any context. eg. bpf_get_current_task_btf.
* The task and cgroup storage fall into this case. The bpf_mem_alloc reuses
* memory immediately. To be reuse-immediate safe, the owner destruction
* code path needs to go through a rcu grace period before calling
* bpf_local_storage_destroy().
*
* When bpf_ma == false, the kmalloc and kfree are used.
*/
struct bpf_map *
bpf_local_storage_map_alloc(union bpf_attr *attr,
struct bpf_local_storage_cache *cache)
struct bpf_local_storage_cache *cache,
bool bpf_ma)
{
struct bpf_local_storage_map *smap;
unsigned int i;
u32 nbuckets;
int err;
smap = __bpf_local_storage_map_alloc(attr);
if (IS_ERR(smap))
return ERR_CAST(smap);
smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
if (!smap)
return ERR_PTR(-ENOMEM);
bpf_map_init_from_attr(&smap->map, attr);
nbuckets = roundup_pow_of_two(num_possible_cpus());
/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
nbuckets = max_t(u32, 2, nbuckets);
smap->bucket_log = ilog2(nbuckets);
smap->buckets = bpf_map_kvcalloc(&smap->map, sizeof(*smap->buckets),
nbuckets, GFP_USER | __GFP_NOWARN);
if (!smap->buckets) {
err = -ENOMEM;
goto free_smap;
}
for (i = 0; i < nbuckets; i++) {
INIT_HLIST_HEAD(&smap->buckets[i].list);
raw_spin_lock_init(&smap->buckets[i].lock);
}
smap->elem_size = offsetof(struct bpf_local_storage_elem,
sdata.data[attr->value_size]);
smap->bpf_ma = bpf_ma;
if (bpf_ma) {
err = bpf_mem_alloc_init(&smap->selem_ma, smap->elem_size, false);
if (err)
goto free_smap;
err = bpf_mem_alloc_init(&smap->storage_ma, sizeof(struct bpf_local_storage), false);
if (err) {
bpf_mem_alloc_destroy(&smap->selem_ma);
goto free_smap;
}
}
smap->cache_idx = bpf_local_storage_cache_idx_get(cache);
return &smap->map;
free_smap:
kvfree(smap->buckets);
bpf_map_area_free(smap);
return ERR_PTR(err);
}
void bpf_local_storage_map_free(struct bpf_map *map,
@@ -689,7 +906,7 @@ void bpf_local_storage_map_free(struct bpf_map *map,
migrate_disable();
this_cpu_inc(*busy_counter);
}
bpf_selem_unlink(selem, false);
bpf_selem_unlink(selem, true);
if (busy_counter) {
this_cpu_dec(*busy_counter);
migrate_enable();
@@ -713,6 +930,10 @@ void bpf_local_storage_map_free(struct bpf_map *map,
*/
synchronize_rcu();
if (smap->bpf_ma) {
bpf_mem_alloc_destroy(&smap->selem_ma);
bpf_mem_alloc_destroy(&smap->storage_ma);
}
kvfree(smap->buckets);
bpf_map_area_free(smap);
}

View File

@@ -11,11 +11,13 @@
#include <linux/refcount.h>
#include <linux/mutex.h>
#include <linux/btf_ids.h>
#include <linux/rcupdate_wait.h>
enum bpf_struct_ops_state {
BPF_STRUCT_OPS_STATE_INIT,
BPF_STRUCT_OPS_STATE_INUSE,
BPF_STRUCT_OPS_STATE_TOBEFREE,
BPF_STRUCT_OPS_STATE_READY,
};
#define BPF_STRUCT_OPS_COMMON_VALUE \
@@ -58,6 +60,13 @@ struct bpf_struct_ops_map {
struct bpf_struct_ops_value kvalue;
};
struct bpf_struct_ops_link {
struct bpf_link link;
struct bpf_map __rcu *map;
};
static DEFINE_MUTEX(update_mutex);
#define VALUE_PREFIX "bpf_struct_ops_"
#define VALUE_PREFIX_LEN (sizeof(VALUE_PREFIX) - 1)
@@ -249,6 +258,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
struct bpf_struct_ops_value *uvalue, *kvalue;
enum bpf_struct_ops_state state;
s64 refcnt;
if (unlikely(*(u32 *)key != 0))
return -ENOENT;
@@ -267,7 +277,14 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
uvalue = value;
memcpy(uvalue, st_map->uvalue, map->value_size);
uvalue->state = state;
refcount_set(&uvalue->refcnt, refcount_read(&kvalue->refcnt));
/* This value offers the user space a general estimate of how
* many sockets are still utilizing this struct_ops for TCP
* congestion control. The number might not be exact, but it
* should sufficiently meet our present goals.
*/
refcnt = atomic64_read(&map->refcnt) - atomic64_read(&map->usercnt);
refcount_set(&uvalue->refcnt, max_t(s64, refcnt, 0));
return 0;
}
@@ -349,8 +366,8 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
model, flags, tlinks, NULL);
}
static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
const struct bpf_struct_ops *st_ops = st_map->st_ops;
@@ -491,12 +508,29 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
*(unsigned long *)(udata + moff) = prog->aux->id;
}
refcount_set(&kvalue->refcnt, 1);
bpf_map_inc(map);
if (st_map->map.map_flags & BPF_F_LINK) {
err = st_ops->validate(kdata);
if (err)
goto reset_unlock;
set_memory_rox((long)st_map->image, 1);
/* Let bpf_link handle registration & unregistration.
*
* Pair with smp_load_acquire() during lookup_elem().
*/
smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_READY);
goto unlock;
}
set_memory_rox((long)st_map->image, 1);
err = st_ops->reg(kdata);
if (likely(!err)) {
/* This refcnt increment on the map here after
* 'st_ops->reg()' is secure since the state of the
* map must be set to INIT at this moment, and thus
* bpf_struct_ops_map_delete_elem() can't unregister
* or transition it to TOBEFREE concurrently.
*/
bpf_map_inc(map);
/* Pair with smp_load_acquire() during lookup_elem().
* It ensures the above udata updates (e.g. prog->aux->id)
* can be seen once BPF_STRUCT_OPS_STATE_INUSE is set.
@@ -512,7 +546,6 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
*/
set_memory_nx((long)st_map->image, 1);
set_memory_rw((long)st_map->image, 1);
bpf_map_put(map);
reset_unlock:
bpf_struct_ops_map_put_progs(st_map);
@@ -524,20 +557,22 @@ unlock:
return err;
}
static int bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key)
static long bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key)
{
enum bpf_struct_ops_state prev_state;
struct bpf_struct_ops_map *st_map;
st_map = (struct bpf_struct_ops_map *)map;
if (st_map->map.map_flags & BPF_F_LINK)
return -EOPNOTSUPP;
prev_state = cmpxchg(&st_map->kvalue.state,
BPF_STRUCT_OPS_STATE_INUSE,
BPF_STRUCT_OPS_STATE_TOBEFREE);
switch (prev_state) {
case BPF_STRUCT_OPS_STATE_INUSE:
st_map->st_ops->unreg(&st_map->kvalue.data);
if (refcount_dec_and_test(&st_map->kvalue.refcnt))
bpf_map_put(map);
bpf_map_put(map);
return 0;
case BPF_STRUCT_OPS_STATE_TOBEFREE:
return -EINPROGRESS;
@@ -570,7 +605,7 @@ static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key,
kfree(value);
}
static void bpf_struct_ops_map_free(struct bpf_map *map)
static void __bpf_struct_ops_map_free(struct bpf_map *map)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
@@ -582,10 +617,32 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
bpf_map_area_free(st_map);
}
static void bpf_struct_ops_map_free(struct bpf_map *map)
{
/* The struct_ops's function may switch to another struct_ops.
*
* For example, bpf_tcp_cc_x->init() may switch to
* another tcp_cc_y by calling
* setsockopt(TCP_CONGESTION, "tcp_cc_y").
* During the switch, bpf_struct_ops_put(tcp_cc_x) is called
* and its refcount may reach 0 which then free its
* trampoline image while tcp_cc_x is still running.
*
* A vanilla rcu gp is to wait for all bpf-tcp-cc prog
* to finish. bpf-tcp-cc prog is non sleepable.
* A rcu_tasks gp is to wait for the last few insn
* in the tramopline image to finish before releasing
* the trampoline image.
*/
synchronize_rcu_mult(call_rcu, call_rcu_tasks);
__bpf_struct_ops_map_free(map);
}
static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr)
{
if (attr->key_size != sizeof(unsigned int) || attr->max_entries != 1 ||
attr->map_flags || !attr->btf_vmlinux_value_type_id)
(attr->map_flags & ~BPF_F_LINK) || !attr->btf_vmlinux_value_type_id)
return -EINVAL;
return 0;
}
@@ -609,6 +666,9 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
if (attr->value_size != vt->size)
return ERR_PTR(-EINVAL);
if (attr->map_flags & BPF_F_LINK && (!st_ops->validate || !st_ops->update))
return ERR_PTR(-EOPNOTSUPP);
t = st_ops->type;
st_map_size = sizeof(*st_map) +
@@ -630,7 +690,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
NUMA_NO_NODE);
st_map->image = bpf_jit_alloc_exec(PAGE_SIZE);
if (!st_map->uvalue || !st_map->links || !st_map->image) {
bpf_struct_ops_map_free(map);
__bpf_struct_ops_map_free(map);
return ERR_PTR(-ENOMEM);
}
@@ -641,6 +701,21 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
return map;
}
static u64 bpf_struct_ops_map_mem_usage(const struct bpf_map *map)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
const struct bpf_struct_ops *st_ops = st_map->st_ops;
const struct btf_type *vt = st_ops->value_type;
u64 usage;
usage = sizeof(*st_map) +
vt->size - sizeof(struct bpf_struct_ops_value);
usage += vt->size;
usage += btf_type_vlen(vt) * sizeof(struct bpf_links *);
usage += PAGE_SIZE;
return usage;
}
BTF_ID_LIST_SINGLE(bpf_struct_ops_map_btf_ids, struct, bpf_struct_ops_map)
const struct bpf_map_ops bpf_struct_ops_map_ops = {
.map_alloc_check = bpf_struct_ops_map_alloc_check,
@@ -651,6 +726,7 @@ const struct bpf_map_ops bpf_struct_ops_map_ops = {
.map_delete_elem = bpf_struct_ops_map_delete_elem,
.map_update_elem = bpf_struct_ops_map_update_elem,
.map_seq_show_elem = bpf_struct_ops_map_seq_show_elem,
.map_mem_usage = bpf_struct_ops_map_mem_usage,
.map_btf_id = &bpf_struct_ops_map_btf_ids[0],
};
@@ -660,41 +736,175 @@ const struct bpf_map_ops bpf_struct_ops_map_ops = {
bool bpf_struct_ops_get(const void *kdata)
{
struct bpf_struct_ops_value *kvalue;
struct bpf_struct_ops_map *st_map;
struct bpf_map *map;
kvalue = container_of(kdata, struct bpf_struct_ops_value, data);
st_map = container_of(kvalue, struct bpf_struct_ops_map, kvalue);
return refcount_inc_not_zero(&kvalue->refcnt);
}
static void bpf_struct_ops_put_rcu(struct rcu_head *head)
{
struct bpf_struct_ops_map *st_map;
st_map = container_of(head, struct bpf_struct_ops_map, rcu);
bpf_map_put(&st_map->map);
map = __bpf_map_inc_not_zero(&st_map->map, false);
return !IS_ERR(map);
}
void bpf_struct_ops_put(const void *kdata)
{
struct bpf_struct_ops_value *kvalue;
struct bpf_struct_ops_map *st_map;
kvalue = container_of(kdata, struct bpf_struct_ops_value, data);
if (refcount_dec_and_test(&kvalue->refcnt)) {
struct bpf_struct_ops_map *st_map;
st_map = container_of(kvalue, struct bpf_struct_ops_map, kvalue);
st_map = container_of(kvalue, struct bpf_struct_ops_map,
kvalue);
/* The struct_ops's function may switch to another struct_ops.
*
* For example, bpf_tcp_cc_x->init() may switch to
* another tcp_cc_y by calling
* setsockopt(TCP_CONGESTION, "tcp_cc_y").
* During the switch, bpf_struct_ops_put(tcp_cc_x) is called
* and its map->refcnt may reach 0 which then free its
* trampoline image while tcp_cc_x is still running.
*
* Thus, a rcu grace period is needed here.
*/
call_rcu(&st_map->rcu, bpf_struct_ops_put_rcu);
}
bpf_map_put(&st_map->map);
}
static bool bpf_struct_ops_valid_to_reg(struct bpf_map *map)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
return map->map_type == BPF_MAP_TYPE_STRUCT_OPS &&
map->map_flags & BPF_F_LINK &&
/* Pair with smp_store_release() during map_update */
smp_load_acquire(&st_map->kvalue.state) == BPF_STRUCT_OPS_STATE_READY;
}
static void bpf_struct_ops_map_link_dealloc(struct bpf_link *link)
{
struct bpf_struct_ops_link *st_link;
struct bpf_struct_ops_map *st_map;
st_link = container_of(link, struct bpf_struct_ops_link, link);
st_map = (struct bpf_struct_ops_map *)
rcu_dereference_protected(st_link->map, true);
if (st_map) {
/* st_link->map can be NULL if
* bpf_struct_ops_link_create() fails to register.
*/
st_map->st_ops->unreg(&st_map->kvalue.data);
bpf_map_put(&st_map->map);
}
kfree(st_link);
}
static void bpf_struct_ops_map_link_show_fdinfo(const struct bpf_link *link,
struct seq_file *seq)
{
struct bpf_struct_ops_link *st_link;
struct bpf_map *map;
st_link = container_of(link, struct bpf_struct_ops_link, link);
rcu_read_lock();
map = rcu_dereference(st_link->map);
seq_printf(seq, "map_id:\t%d\n", map->id);
rcu_read_unlock();
}
static int bpf_struct_ops_map_link_fill_link_info(const struct bpf_link *link,
struct bpf_link_info *info)
{
struct bpf_struct_ops_link *st_link;
struct bpf_map *map;
st_link = container_of(link, struct bpf_struct_ops_link, link);
rcu_read_lock();
map = rcu_dereference(st_link->map);
info->struct_ops.map_id = map->id;
rcu_read_unlock();
return 0;
}
static int bpf_struct_ops_map_link_update(struct bpf_link *link, struct bpf_map *new_map,
struct bpf_map *expected_old_map)
{
struct bpf_struct_ops_map *st_map, *old_st_map;
struct bpf_map *old_map;
struct bpf_struct_ops_link *st_link;
int err = 0;
st_link = container_of(link, struct bpf_struct_ops_link, link);
st_map = container_of(new_map, struct bpf_struct_ops_map, map);
if (!bpf_struct_ops_valid_to_reg(new_map))
return -EINVAL;
mutex_lock(&update_mutex);
old_map = rcu_dereference_protected(st_link->map, lockdep_is_held(&update_mutex));
if (expected_old_map && old_map != expected_old_map) {
err = -EPERM;
goto err_out;
}
old_st_map = container_of(old_map, struct bpf_struct_ops_map, map);
/* The new and old struct_ops must be the same type. */
if (st_map->st_ops != old_st_map->st_ops) {
err = -EINVAL;
goto err_out;
}
err = st_map->st_ops->update(st_map->kvalue.data, old_st_map->kvalue.data);
if (err)
goto err_out;
bpf_map_inc(new_map);
rcu_assign_pointer(st_link->map, new_map);
bpf_map_put(old_map);
err_out:
mutex_unlock(&update_mutex);
return err;
}
static const struct bpf_link_ops bpf_struct_ops_map_lops = {
.dealloc = bpf_struct_ops_map_link_dealloc,
.show_fdinfo = bpf_struct_ops_map_link_show_fdinfo,
.fill_link_info = bpf_struct_ops_map_link_fill_link_info,
.update_map = bpf_struct_ops_map_link_update,
};
int bpf_struct_ops_link_create(union bpf_attr *attr)
{
struct bpf_struct_ops_link *link = NULL;
struct bpf_link_primer link_primer;
struct bpf_struct_ops_map *st_map;
struct bpf_map *map;
int err;
map = bpf_map_get(attr->link_create.map_fd);
if (IS_ERR(map))
return PTR_ERR(map);
st_map = (struct bpf_struct_ops_map *)map;
if (!bpf_struct_ops_valid_to_reg(map)) {
err = -EINVAL;
goto err_out;
}
link = kzalloc(sizeof(*link), GFP_USER);
if (!link) {
err = -ENOMEM;
goto err_out;
}
bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS, &bpf_struct_ops_map_lops, NULL);
err = bpf_link_prime(&link->link, &link_primer);
if (err)
goto err_out;
err = st_map->st_ops->reg(st_map->kvalue.data);
if (err) {
bpf_link_cleanup(&link_primer);
link = NULL;
goto err_out;
}
RCU_INIT_POINTER(link->map, map);
return bpf_link_settle(&link_primer);
err_out:
bpf_map_put(map);
kfree(link);
return err;
}

View File

@@ -72,8 +72,6 @@ task_storage_lookup(struct task_struct *task, struct bpf_map *map,
void bpf_task_storage_free(struct task_struct *task)
{
struct bpf_local_storage *local_storage;
bool free_task_storage = false;
unsigned long flags;
rcu_read_lock();
@@ -84,14 +82,9 @@ void bpf_task_storage_free(struct task_struct *task)
}
bpf_task_storage_lock();
raw_spin_lock_irqsave(&local_storage->lock, flags);
free_task_storage = bpf_local_storage_unlink_nolock(local_storage);
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_local_storage_destroy(local_storage);
bpf_task_storage_unlock();
rcu_read_unlock();
if (free_task_storage)
kfree_rcu(local_storage, rcu);
}
static void *bpf_pid_task_storage_lookup_elem(struct bpf_map *map, void *key)
@@ -127,8 +120,8 @@ out:
return ERR_PTR(err);
}
static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
static long bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
struct bpf_local_storage_data *sdata;
struct task_struct *task;
@@ -175,12 +168,12 @@ static int task_storage_delete(struct task_struct *task, struct bpf_map *map,
if (!nobusy)
return -EBUSY;
bpf_selem_unlink(SELEM(sdata), true);
bpf_selem_unlink(SELEM(sdata), false);
return 0;
}
static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
static long bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
{
struct task_struct *task;
unsigned int f_flags;
@@ -316,7 +309,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
{
return bpf_local_storage_map_alloc(attr, &task_cache);
return bpf_local_storage_map_alloc(attr, &task_cache, true);
}
static void task_storage_map_free(struct bpf_map *map)
@@ -335,6 +328,7 @@ const struct bpf_map_ops task_storage_map_ops = {
.map_update_elem = bpf_pid_task_storage_update_elem,
.map_delete_elem = bpf_pid_task_storage_delete_elem,
.map_check_btf = bpf_local_storage_map_check_btf,
.map_mem_usage = bpf_local_storage_map_mem_usage,
.map_btf_id = &bpf_local_storage_map_btf_id[0],
.map_owner_storage_ptr = task_storage_ptr,
};
@@ -344,7 +338,7 @@ const struct bpf_func_proto bpf_task_storage_get_recur_proto = {
.gpl_only = false,
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
.arg4_type = ARG_ANYTHING,
@@ -355,7 +349,7 @@ const struct bpf_func_proto bpf_task_storage_get_proto = {
.gpl_only = false,
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
.arg4_type = ARG_ANYTHING,
@@ -366,7 +360,7 @@ const struct bpf_func_proto bpf_task_storage_delete_recur_proto = {
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
};
@@ -375,6 +369,6 @@ const struct bpf_func_proto bpf_task_storage_delete_proto = {
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
};

View File

@@ -25,6 +25,9 @@
#include <linux/bsearch.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <net/netfilter/nf_bpf_link.h>
#include <net/sock.h>
#include "../tools/lib/bpf/relo_core.h"
@@ -207,6 +210,12 @@ enum btf_kfunc_hook {
BTF_KFUNC_HOOK_TRACING,
BTF_KFUNC_HOOK_SYSCALL,
BTF_KFUNC_HOOK_FMODRET,
BTF_KFUNC_HOOK_CGROUP_SKB,
BTF_KFUNC_HOOK_SCHED_ACT,
BTF_KFUNC_HOOK_SK_SKB,
BTF_KFUNC_HOOK_SOCKET_FILTER,
BTF_KFUNC_HOOK_LWT,
BTF_KFUNC_HOOK_NETFILTER,
BTF_KFUNC_HOOK_MAX,
};
@@ -572,8 +581,8 @@ static s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p)
*btf_p = btf;
return ret;
}
spin_lock_bh(&btf_idr_lock);
btf_put(btf);
spin_lock_bh(&btf_idr_lock);
}
spin_unlock_bh(&btf_idr_lock);
return ret;
@@ -1661,10 +1670,8 @@ static void btf_struct_metas_free(struct btf_struct_metas *tab)
if (!tab)
return;
for (i = 0; i < tab->cnt; i++) {
for (i = 0; i < tab->cnt; i++)
btf_record_free(tab->types[i].record);
kfree(tab->types[i].field_offs);
}
kfree(tab);
}
@@ -3226,12 +3233,6 @@ static void btf_struct_log(struct btf_verifier_env *env,
btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
}
enum btf_field_info_type {
BTF_FIELD_SPIN_LOCK,
BTF_FIELD_TIMER,
BTF_FIELD_KPTR,
};
enum {
BTF_FIELD_IGNORE = 0,
BTF_FIELD_FOUND = 1,
@@ -3283,9 +3284,9 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
/* Reject extra tags */
if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
return -EINVAL;
if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
if (!strcmp("kptr_untrusted", __btf_name_by_offset(btf, t->name_off)))
type = BPF_KPTR_UNREF;
else if (!strcmp("kptr_ref", __btf_name_by_offset(btf, t->name_off)))
else if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
type = BPF_KPTR_REF;
else
return -EINVAL;
@@ -3394,6 +3395,7 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
field_mask_test_name(BPF_LIST_NODE, "bpf_list_node");
field_mask_test_name(BPF_RB_ROOT, "bpf_rb_root");
field_mask_test_name(BPF_RB_NODE, "bpf_rb_node");
field_mask_test_name(BPF_REFCOUNT, "bpf_refcount");
/* Only return BPF_KPTR when all other types with matchable names fail */
if (field_mask & BPF_KPTR) {
@@ -3442,6 +3444,7 @@ static int btf_find_struct_field(const struct btf *btf,
case BPF_TIMER:
case BPF_LIST_NODE:
case BPF_RB_NODE:
case BPF_REFCOUNT:
ret = btf_find_struct(btf, member_type, off, sz, field_type,
idx < info_cnt ? &info[idx] : &tmp);
if (ret < 0)
@@ -3507,6 +3510,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
case BPF_TIMER:
case BPF_LIST_NODE:
case BPF_RB_NODE:
case BPF_REFCOUNT:
ret = btf_find_struct(btf, var_type, off, sz, field_type,
idx < info_cnt ? &info[idx] : &tmp);
if (ret < 0)
@@ -3557,7 +3561,10 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
{
struct module *mod = NULL;
const struct btf_type *t;
struct btf *kernel_btf;
/* If a matching btf type is found in kernel or module BTFs, kptr_ref
* is that BTF, otherwise it's program BTF
*/
struct btf *kptr_btf;
int ret;
s32 id;
@@ -3566,7 +3573,20 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
*/
t = btf_type_by_id(btf, info->kptr.type_id);
id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
&kernel_btf);
&kptr_btf);
if (id == -ENOENT) {
/* btf_parse_kptr should only be called w/ btf = program BTF */
WARN_ON_ONCE(btf_is_kernel(btf));
/* Type exists only in program BTF. Assume that it's a MEM_ALLOC
* kptr allocated via bpf_obj_new
*/
field->kptr.dtor = NULL;
id = info->kptr.type_id;
kptr_btf = (struct btf *)btf;
btf_get(kptr_btf);
goto found_dtor;
}
if (id < 0)
return id;
@@ -3583,20 +3603,20 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
* can be used as a referenced pointer and be stored in a map at
* the same time.
*/
dtor_btf_id = btf_find_dtor_kfunc(kernel_btf, id);
dtor_btf_id = btf_find_dtor_kfunc(kptr_btf, id);
if (dtor_btf_id < 0) {
ret = dtor_btf_id;
goto end_btf;
}
dtor_func = btf_type_by_id(kernel_btf, dtor_btf_id);
dtor_func = btf_type_by_id(kptr_btf, dtor_btf_id);
if (!dtor_func) {
ret = -ENOENT;
goto end_btf;
}
if (btf_is_module(kernel_btf)) {
mod = btf_try_get_module(kernel_btf);
if (btf_is_module(kptr_btf)) {
mod = btf_try_get_module(kptr_btf);
if (!mod) {
ret = -ENXIO;
goto end_btf;
@@ -3606,7 +3626,7 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
/* We already verified dtor_func to be btf_type_is_func
* in register_btf_id_dtor_kfuncs.
*/
dtor_func_name = __btf_name_by_offset(kernel_btf, dtor_func->name_off);
dtor_func_name = __btf_name_by_offset(kptr_btf, dtor_func->name_off);
addr = kallsyms_lookup_name(dtor_func_name);
if (!addr) {
ret = -EINVAL;
@@ -3615,14 +3635,15 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
field->kptr.dtor = (void *)addr;
}
found_dtor:
field->kptr.btf_id = id;
field->kptr.btf = kernel_btf;
field->kptr.btf = kptr_btf;
field->kptr.module = mod;
return 0;
end_mod:
module_put(mod);
end_btf:
btf_put(kernel_btf);
btf_put(kptr_btf);
return ret;
}
@@ -3684,12 +3705,24 @@ static int btf_parse_rb_root(const struct btf *btf, struct btf_field *field,
__alignof__(struct bpf_rb_node));
}
static int btf_field_cmp(const void *_a, const void *_b, const void *priv)
{
const struct btf_field *a = (const struct btf_field *)_a;
const struct btf_field *b = (const struct btf_field *)_b;
if (a->offset < b->offset)
return -1;
else if (a->offset > b->offset)
return 1;
return 0;
}
struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
u32 field_mask, u32 value_size)
{
struct btf_field_info info_arr[BTF_FIELDS_MAX];
u32 next_off = 0, field_type_size;
struct btf_record *rec;
u32 next_off = 0;
int ret, i, cnt;
ret = btf_find_field(btf, t, field_mask, info_arr, ARRAY_SIZE(info_arr));
@@ -3708,8 +3741,10 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
rec->spin_lock_off = -EINVAL;
rec->timer_off = -EINVAL;
rec->refcount_off = -EINVAL;
for (i = 0; i < cnt; i++) {
if (info_arr[i].off + btf_field_type_size(info_arr[i].type) > value_size) {
field_type_size = btf_field_type_size(info_arr[i].type);
if (info_arr[i].off + field_type_size > value_size) {
WARN_ONCE(1, "verifier bug off %d size %d", info_arr[i].off, value_size);
ret = -EFAULT;
goto end;
@@ -3718,11 +3753,12 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
ret = -EEXIST;
goto end;
}
next_off = info_arr[i].off + btf_field_type_size(info_arr[i].type);
next_off = info_arr[i].off + field_type_size;
rec->field_mask |= info_arr[i].type;
rec->fields[i].offset = info_arr[i].off;
rec->fields[i].type = info_arr[i].type;
rec->fields[i].size = field_type_size;
switch (info_arr[i].type) {
case BPF_SPIN_LOCK:
@@ -3735,6 +3771,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
/* Cache offset for faster lookup at runtime */
rec->timer_off = rec->fields[i].offset;
break;
case BPF_REFCOUNT:
WARN_ON_ONCE(rec->refcount_off >= 0);
/* Cache offset for faster lookup at runtime */
rec->refcount_off = rec->fields[i].offset;
break;
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
ret = btf_parse_kptr(btf, &rec->fields[i], &info_arr[i]);
@@ -3768,30 +3809,16 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
goto end;
}
/* need collection identity for non-owning refs before allowing this
*
* Consider a node type w/ both list and rb_node fields:
* struct node {
* struct bpf_list_node l;
* struct bpf_rb_node r;
* }
*
* Used like so:
* struct node *n = bpf_obj_new(....);
* bpf_list_push_front(&list_head, &n->l);
* bpf_rbtree_remove(&rb_root, &n->r);
*
* It should not be possible to rbtree_remove the node since it hasn't
* been added to a tree. But push_front converts n to a non-owning
* reference, and rbtree_remove accepts the non-owning reference to
* a type w/ bpf_rb_node field.
*/
if (btf_record_has_field(rec, BPF_LIST_NODE) &&
if (rec->refcount_off < 0 &&
btf_record_has_field(rec, BPF_LIST_NODE) &&
btf_record_has_field(rec, BPF_RB_NODE)) {
ret = -EINVAL;
goto end;
}
sort_r(rec->fields, rec->cnt, sizeof(struct btf_field), btf_field_cmp,
NULL, rec);
return rec;
end:
btf_record_free(rec);
@@ -3873,61 +3900,6 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
return 0;
}
static int btf_field_offs_cmp(const void *_a, const void *_b, const void *priv)
{
const u32 a = *(const u32 *)_a;
const u32 b = *(const u32 *)_b;
if (a < b)
return -1;
else if (a > b)
return 1;
return 0;
}
static void btf_field_offs_swap(void *_a, void *_b, int size, const void *priv)
{
struct btf_field_offs *foffs = (void *)priv;
u32 *off_base = foffs->field_off;
u32 *a = _a, *b = _b;
u8 *sz_a, *sz_b;
sz_a = foffs->field_sz + (a - off_base);
sz_b = foffs->field_sz + (b - off_base);
swap(*a, *b);
swap(*sz_a, *sz_b);
}
struct btf_field_offs *btf_parse_field_offs(struct btf_record *rec)
{
struct btf_field_offs *foffs;
u32 i, *off;
u8 *sz;
BUILD_BUG_ON(ARRAY_SIZE(foffs->field_off) != ARRAY_SIZE(foffs->field_sz));
if (IS_ERR_OR_NULL(rec))
return NULL;
foffs = kzalloc(sizeof(*foffs), GFP_KERNEL | __GFP_NOWARN);
if (!foffs)
return ERR_PTR(-ENOMEM);
off = foffs->field_off;
sz = foffs->field_sz;
for (i = 0; i < rec->cnt; i++) {
off[i] = rec->fields[i].offset;
sz[i] = btf_field_type_size(rec->fields[i].type);
}
foffs->cnt = rec->cnt;
if (foffs->cnt == 1)
return foffs;
sort_r(foffs->field_off, foffs->cnt, sizeof(foffs->field_off[0]),
btf_field_offs_cmp, btf_field_offs_swap, foffs);
return foffs;
}
static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
u32 type_id, void *data, u8 bits_offset,
struct btf_show *show)
@@ -5332,6 +5304,7 @@ static const char *alloc_obj_fields[] = {
"bpf_list_node",
"bpf_rb_root",
"bpf_rb_node",
"bpf_refcount",
};
static struct btf_struct_metas *
@@ -5370,7 +5343,6 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
for (i = 1; i < n; i++) {
struct btf_struct_metas *new_tab;
const struct btf_member *member;
struct btf_field_offs *foffs;
struct btf_struct_meta *type;
struct btf_record *record;
const struct btf_type *t;
@@ -5406,23 +5378,13 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
type = &tab->types[tab->cnt];
type->btf_id = i;
record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE |
BPF_RB_ROOT | BPF_RB_NODE, t->size);
BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT, t->size);
/* The record cannot be unset, treat it as an error if so */
if (IS_ERR_OR_NULL(record)) {
ret = PTR_ERR_OR_ZERO(record) ?: -EFAULT;
goto free;
}
foffs = btf_parse_field_offs(record);
/* We need the field_offs to be valid for a valid record,
* either both should be set or both should be unset.
*/
if (IS_ERR_OR_NULL(foffs)) {
btf_record_free(record);
ret = -EFAULT;
goto free;
}
type->record = record;
type->field_offs = foffs;
tab->cnt++;
}
return tab;
@@ -5489,38 +5451,45 @@ static int btf_check_type_tags(struct btf_verifier_env *env,
return 0;
}
static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
u32 log_level, char __user *log_ubuf, u32 log_size)
static int finalize_log(struct bpf_verifier_log *log, bpfptr_t uattr, u32 uattr_size)
{
struct btf_struct_metas *struct_meta_tab;
struct btf_verifier_env *env = NULL;
struct bpf_verifier_log *log;
struct btf *btf = NULL;
u8 *data;
u32 log_true_size;
int err;
if (btf_data_size > BTF_MAX_SIZE)
err = bpf_vlog_finalize(log, &log_true_size);
if (uattr_size >= offsetofend(union bpf_attr, btf_log_true_size) &&
copy_to_bpfptr_offset(uattr, offsetof(union bpf_attr, btf_log_true_size),
&log_true_size, sizeof(log_true_size)))
err = -EFAULT;
return err;
}
static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
{
bpfptr_t btf_data = make_bpfptr(attr->btf, uattr.is_kernel);
char __user *log_ubuf = u64_to_user_ptr(attr->btf_log_buf);
struct btf_struct_metas *struct_meta_tab;
struct btf_verifier_env *env = NULL;
struct btf *btf = NULL;
u8 *data;
int err, ret;
if (attr->btf_size > BTF_MAX_SIZE)
return ERR_PTR(-E2BIG);
env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
if (!env)
return ERR_PTR(-ENOMEM);
log = &env->log;
if (log_level || log_ubuf || log_size) {
/* user requested verbose verifier output
* and supplied buffer to store the verification trace
*/
log->level = log_level;
log->ubuf = log_ubuf;
log->len_total = log_size;
/* log attributes have to be sane */
if (!bpf_verifier_log_attr_valid(log)) {
err = -EINVAL;
goto errout;
}
}
/* user could have requested verbose verifier output
* and supplied buffer to store the verification trace
*/
err = bpf_vlog_init(&env->log, attr->btf_log_level,
log_ubuf, attr->btf_log_size);
if (err)
goto errout_free;
btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
if (!btf) {
@@ -5529,16 +5498,16 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
}
env->btf = btf;
data = kvmalloc(btf_data_size, GFP_KERNEL | __GFP_NOWARN);
data = kvmalloc(attr->btf_size, GFP_KERNEL | __GFP_NOWARN);
if (!data) {
err = -ENOMEM;
goto errout;
}
btf->data = data;
btf->data_size = btf_data_size;
btf->data_size = attr->btf_size;
if (copy_from_bpfptr(data, btf_data, btf_data_size)) {
if (copy_from_bpfptr(data, btf_data, attr->btf_size)) {
err = -EFAULT;
goto errout;
}
@@ -5561,7 +5530,7 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
if (err)
goto errout;
struct_meta_tab = btf_parse_struct_metas(log, btf);
struct_meta_tab = btf_parse_struct_metas(&env->log, btf);
if (IS_ERR(struct_meta_tab)) {
err = PTR_ERR(struct_meta_tab);
goto errout;
@@ -5578,10 +5547,9 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
}
}
if (log->level && bpf_verifier_log_full(log)) {
err = -ENOSPC;
goto errout_meta;
}
err = finalize_log(&env->log, uattr, uattr_size);
if (err)
goto errout_free;
btf_verifier_env_free(env);
refcount_set(&btf->refcnt, 1);
@@ -5590,6 +5558,11 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
errout_meta:
btf_free_struct_meta_tab(btf);
errout:
/* overwrite err with -ENOSPC or -EFAULT */
ret = finalize_log(&env->log, uattr, uattr_size);
if (ret)
err = ret;
errout_free:
btf_verifier_env_free(env);
if (btf)
btf_free(btf);
@@ -5684,6 +5657,10 @@ again:
* int socket_filter_bpf_prog(struct __sk_buff *skb)
* { // no fields of skb are ever used }
*/
if (strcmp(ctx_tname, "__sk_buff") == 0 && strcmp(tname, "sk_buff") == 0)
return ctx_type;
if (strcmp(ctx_tname, "xdp_md") == 0 && strcmp(tname, "xdp_buff") == 0)
return ctx_type;
if (strcmp(ctx_tname, tname)) {
/* bpf_user_pt_regs_t is a typedef, so resolve it to
* underlying struct and check name again
@@ -5891,12 +5868,8 @@ struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog)
static bool is_int_ptr(struct btf *btf, const struct btf_type *t)
{
/* t comes in already as a pointer */
t = btf_type_by_id(btf, t->type);
/* allow const */
if (BTF_INFO_KIND(t->info) == BTF_KIND_CONST)
t = btf_type_by_id(btf, t->type);
/* skip modifiers */
t = btf_type_skip_modifiers(btf, t->type, NULL);
return btf_type_is_int(t);
}
@@ -6147,7 +6120,8 @@ enum bpf_struct_walk_result {
static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
const struct btf_type *t, int off, int size,
u32 *next_btf_id, enum bpf_type_flag *flag)
u32 *next_btf_id, enum bpf_type_flag *flag,
const char **field_name)
{
u32 i, moff, mtrue_end, msize = 0, total_nelems = 0;
const struct btf_type *mtype, *elem_type = NULL;
@@ -6155,6 +6129,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
const char *tname, *mname, *tag_value;
u32 vlen, elem_id, mid;
*flag = 0;
again:
tname = __btf_name_by_offset(btf, t->name_off);
if (!btf_type_is_struct(t)) {
@@ -6186,11 +6161,13 @@ again:
if (off < moff)
goto error;
/* Only allow structure for now, can be relaxed for
* other types later.
*/
/* allow structure and integer */
t = btf_type_skip_modifiers(btf, array_elem->type,
NULL);
if (btf_type_is_int(t))
return WALK_SCALAR;
if (!btf_type_is_struct(t))
goto error;
@@ -6321,6 +6298,15 @@ error:
* of this field or inside of this struct
*/
if (btf_type_is_struct(mtype)) {
if (BTF_INFO_KIND(mtype->info) == BTF_KIND_UNION &&
btf_type_vlen(mtype) != 1)
/*
* walking unions yields untrusted pointers
* with exception of __bpf_md_ptr and other
* unions with a single member
*/
*flag |= PTR_UNTRUSTED;
/* our field must be inside that union or struct */
t = mtype;
@@ -6365,7 +6351,9 @@ error:
stype = btf_type_skip_modifiers(btf, mtype->type, &id);
if (btf_type_is_struct(stype)) {
*next_btf_id = id;
*flag = tmp_flag;
*flag |= tmp_flag;
if (field_name)
*field_name = mname;
return WALK_PTR;
}
}
@@ -6392,7 +6380,8 @@ error:
int btf_struct_access(struct bpf_verifier_log *log,
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype __maybe_unused,
u32 *next_btf_id, enum bpf_type_flag *flag)
u32 *next_btf_id, enum bpf_type_flag *flag,
const char **field_name)
{
const struct btf *btf = reg->btf;
enum bpf_type_flag tmp_flag = 0;
@@ -6424,7 +6413,7 @@ int btf_struct_access(struct bpf_verifier_log *log,
t = btf_type_by_id(btf, id);
do {
err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag, field_name);
switch (err) {
case WALK_PTR:
@@ -6499,7 +6488,7 @@ again:
type = btf_type_by_id(btf, id);
if (!type)
return false;
err = btf_struct_walk(log, btf, type, off, 1, &id, &flag);
err = btf_struct_walk(log, btf, type, off, 1, &id, &flag, NULL);
if (err != WALK_STRUCT)
return false;
@@ -7180,15 +7169,12 @@ static int __btf_new_fd(struct btf *btf)
return anon_inode_getfd("btf", &btf_fops, btf, O_RDONLY | O_CLOEXEC);
}
int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr)
int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
{
struct btf *btf;
int ret;
btf = btf_parse(make_bpfptr(attr->btf, uattr.is_kernel),
attr->btf_size, attr->btf_log_level,
u64_to_user_ptr(attr->btf_log_buf),
attr->btf_log_size);
btf = btf_parse(attr, uattr, uattr_size);
if (IS_ERR(btf))
return PTR_ERR(btf);
@@ -7578,6 +7564,108 @@ BTF_ID_LIST_GLOBAL(btf_tracing_ids, MAX_BTF_TRACING_TYPE)
BTF_TRACING_TYPE_xxx
#undef BTF_TRACING_TYPE
static int btf_check_iter_kfuncs(struct btf *btf, const char *func_name,
const struct btf_type *func, u32 func_flags)
{
u32 flags = func_flags & (KF_ITER_NEW | KF_ITER_NEXT | KF_ITER_DESTROY);
const char *name, *sfx, *iter_name;
const struct btf_param *arg;
const struct btf_type *t;
char exp_name[128];
u32 nr_args;
/* exactly one of KF_ITER_{NEW,NEXT,DESTROY} can be set */
if (!flags || (flags & (flags - 1)))
return -EINVAL;
/* any BPF iter kfunc should have `struct bpf_iter_<type> *` first arg */
nr_args = btf_type_vlen(func);
if (nr_args < 1)
return -EINVAL;
arg = &btf_params(func)[0];
t = btf_type_skip_modifiers(btf, arg->type, NULL);
if (!t || !btf_type_is_ptr(t))
return -EINVAL;
t = btf_type_skip_modifiers(btf, t->type, NULL);
if (!t || !__btf_type_is_struct(t))
return -EINVAL;
name = btf_name_by_offset(btf, t->name_off);
if (!name || strncmp(name, ITER_PREFIX, sizeof(ITER_PREFIX) - 1))
return -EINVAL;
/* sizeof(struct bpf_iter_<type>) should be a multiple of 8 to
* fit nicely in stack slots
*/
if (t->size == 0 || (t->size % 8))
return -EINVAL;
/* validate bpf_iter_<type>_{new,next,destroy}(struct bpf_iter_<type> *)
* naming pattern
*/
iter_name = name + sizeof(ITER_PREFIX) - 1;
if (flags & KF_ITER_NEW)
sfx = "new";
else if (flags & KF_ITER_NEXT)
sfx = "next";
else /* (flags & KF_ITER_DESTROY) */
sfx = "destroy";
snprintf(exp_name, sizeof(exp_name), "bpf_iter_%s_%s", iter_name, sfx);
if (strcmp(func_name, exp_name))
return -EINVAL;
/* only iter constructor should have extra arguments */
if (!(flags & KF_ITER_NEW) && nr_args != 1)
return -EINVAL;
if (flags & KF_ITER_NEXT) {
/* bpf_iter_<type>_next() should return pointer */
t = btf_type_skip_modifiers(btf, func->type, NULL);
if (!t || !btf_type_is_ptr(t))
return -EINVAL;
}
if (flags & KF_ITER_DESTROY) {
/* bpf_iter_<type>_destroy() should return void */
t = btf_type_by_id(btf, func->type);
if (!t || !btf_type_is_void(t))
return -EINVAL;
}
return 0;
}
static int btf_check_kfunc_protos(struct btf *btf, u32 func_id, u32 func_flags)
{
const struct btf_type *func;
const char *func_name;
int err;
/* any kfunc should be FUNC -> FUNC_PROTO */
func = btf_type_by_id(btf, func_id);
if (!func || !btf_type_is_func(func))
return -EINVAL;
/* sanity check kfunc name */
func_name = btf_name_by_offset(btf, func->name_off);
if (!func_name || !func_name[0])
return -EINVAL;
func = btf_type_by_id(btf, func->type);
if (!func || !btf_type_is_func_proto(func))
return -EINVAL;
if (func_flags & (KF_ITER_NEW | KF_ITER_NEXT | KF_ITER_DESTROY)) {
err = btf_check_iter_kfuncs(btf, func_name, func, func_flags);
if (err)
return err;
}
return 0;
}
/* Kernel Function (kfunc) BTF ID set registration API */
static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
@@ -7705,6 +7793,21 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
return BTF_KFUNC_HOOK_TRACING;
case BPF_PROG_TYPE_SYSCALL:
return BTF_KFUNC_HOOK_SYSCALL;
case BPF_PROG_TYPE_CGROUP_SKB:
return BTF_KFUNC_HOOK_CGROUP_SKB;
case BPF_PROG_TYPE_SCHED_ACT:
return BTF_KFUNC_HOOK_SCHED_ACT;
case BPF_PROG_TYPE_SK_SKB:
return BTF_KFUNC_HOOK_SK_SKB;
case BPF_PROG_TYPE_SOCKET_FILTER:
return BTF_KFUNC_HOOK_SOCKET_FILTER;
case BPF_PROG_TYPE_LWT_OUT:
case BPF_PROG_TYPE_LWT_IN:
case BPF_PROG_TYPE_LWT_XMIT:
case BPF_PROG_TYPE_LWT_SEG6LOCAL:
return BTF_KFUNC_HOOK_LWT;
case BPF_PROG_TYPE_NETFILTER:
return BTF_KFUNC_HOOK_NETFILTER;
default:
return BTF_KFUNC_HOOK_MAX;
}
@@ -7741,7 +7844,7 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
const struct btf_kfunc_id_set *kset)
{
struct btf *btf;
int ret;
int ret, i;
btf = btf_get_module_btf(kset->owner);
if (!btf) {
@@ -7758,7 +7861,15 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
if (IS_ERR(btf))
return PTR_ERR(btf);
for (i = 0; i < kset->set->cnt; i++) {
ret = btf_check_kfunc_protos(btf, kset->set->pairs[i].id,
kset->set->pairs[i].flags);
if (ret)
goto err_out;
}
ret = btf_populate_kfunc_set(btf, hook, kset->set);
err_out:
btf_put(btf);
return ret;
}
@@ -8249,12 +8360,10 @@ check_modules:
btf_get(mod_btf);
spin_unlock_bh(&btf_idr_lock);
cands = bpf_core_add_cands(cands, mod_btf, btf_nr_types(main_btf));
if (IS_ERR(cands)) {
btf_put(mod_btf);
return ERR_CAST(cands);
}
spin_lock_bh(&btf_idr_lock);
btf_put(mod_btf);
if (IS_ERR(cands))
return ERR_CAST(cands);
spin_lock_bh(&btf_idr_lock);
}
spin_unlock_bh(&btf_idr_lock);
/* cands is a pointer to kmalloced memory here if cands->cnt > 0
@@ -8336,16 +8445,15 @@ out:
bool btf_nested_type_is_trusted(struct bpf_verifier_log *log,
const struct bpf_reg_state *reg,
int off)
const char *field_name, u32 btf_id, const char *suffix)
{
struct btf *btf = reg->btf;
const struct btf_type *walk_type, *safe_type;
const char *tname;
char safe_tname[64];
long ret, safe_id;
const struct btf_member *member, *m_walk = NULL;
const struct btf_member *member;
u32 i;
const char *walk_name;
walk_type = btf_type_by_id(btf, reg->btf_id);
if (!walk_type)
@@ -8353,7 +8461,7 @@ bool btf_nested_type_is_trusted(struct bpf_verifier_log *log,
tname = btf_name_by_offset(btf, walk_type->name_off);
ret = snprintf(safe_tname, sizeof(safe_tname), "%s__safe_fields", tname);
ret = snprintf(safe_tname, sizeof(safe_tname), "%s%s", tname, suffix);
if (ret < 0)
return false;
@@ -8365,30 +8473,17 @@ bool btf_nested_type_is_trusted(struct bpf_verifier_log *log,
if (!safe_type)
return false;
for_each_member(i, walk_type, member) {
u32 moff;
/* We're looking for the PTR_TO_BTF_ID member in the struct
* type we're walking which matches the specified offset.
* Below, we'll iterate over the fields in the safe variant of
* the struct and see if any of them has a matching type /
* name.
*/
moff = __btf_member_bit_offset(walk_type, member) / 8;
if (off == moff) {
m_walk = member;
break;
}
}
if (m_walk == NULL)
return false;
walk_name = __btf_name_by_offset(btf, m_walk->name_off);
for_each_member(i, safe_type, member) {
const char *m_name = __btf_name_by_offset(btf, member->name_off);
const struct btf_type *mtype = btf_type_by_id(btf, member->type);
u32 id;
if (!btf_type_is_ptr(mtype))
continue;
btf_type_skip_modifiers(btf, mtype->type, &id);
/* If we match on both type and name, the field is considered trusted. */
if (m_walk->type == member->type && !strcmp(walk_name, m_name))
if (btf_id == id && !strcmp(field_name, m_name))
return true;
}

View File

@@ -1921,14 +1921,17 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
if (ret < 0)
goto out;
if (ctx.optlen > max_optlen || ctx.optlen < 0) {
if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
ret = -EFAULT;
goto out;
}
if (ctx.optlen != 0) {
if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
put_user(ctx.optlen, optlen)) {
if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) {
ret = -EFAULT;
goto out;
}
if (put_user(ctx.optlen, optlen)) {
ret = -EFAULT;
goto out;
}
@@ -2223,10 +2226,12 @@ static u32 sysctl_convert_ctx_access(enum bpf_access_type type,
BPF_FIELD_SIZEOF(struct bpf_sysctl_kern, ppos),
treg, si->dst_reg,
offsetof(struct bpf_sysctl_kern, ppos));
*insn++ = BPF_STX_MEM(
BPF_SIZEOF(u32), treg, si->src_reg,
*insn++ = BPF_RAW_INSN(
BPF_CLASS(si->code) | BPF_MEM | BPF_SIZEOF(u32),
treg, si->src_reg,
bpf_ctx_narrow_access_offset(
0, sizeof(u32), sizeof(loff_t)));
0, sizeof(u32), sizeof(loff_t)),
si->imm);
*insn++ = BPF_LDX_MEM(
BPF_DW, treg, si->dst_reg,
offsetof(struct bpf_sysctl_kern, tmp_reg));
@@ -2376,10 +2381,17 @@ static bool cg_sockopt_is_valid_access(int off, int size,
return true;
}
#define CG_SOCKOPT_ACCESS_FIELD(T, F) \
T(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F), \
si->dst_reg, si->src_reg, \
offsetof(struct bpf_sockopt_kern, F))
#define CG_SOCKOPT_READ_FIELD(F) \
BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F), \
si->dst_reg, si->src_reg, \
offsetof(struct bpf_sockopt_kern, F))
#define CG_SOCKOPT_WRITE_FIELD(F) \
BPF_RAW_INSN((BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F) | \
BPF_MEM | BPF_CLASS(si->code)), \
si->dst_reg, si->src_reg, \
offsetof(struct bpf_sockopt_kern, F), \
si->imm)
static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
@@ -2391,25 +2403,25 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
switch (si->off) {
case offsetof(struct bpf_sockopt, sk):
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, sk);
*insn++ = CG_SOCKOPT_READ_FIELD(sk);
break;
case offsetof(struct bpf_sockopt, level):
if (type == BPF_WRITE)
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, level);
*insn++ = CG_SOCKOPT_WRITE_FIELD(level);
else
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, level);
*insn++ = CG_SOCKOPT_READ_FIELD(level);
break;
case offsetof(struct bpf_sockopt, optname):
if (type == BPF_WRITE)
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optname);
*insn++ = CG_SOCKOPT_WRITE_FIELD(optname);
else
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optname);
*insn++ = CG_SOCKOPT_READ_FIELD(optname);
break;
case offsetof(struct bpf_sockopt, optlen):
if (type == BPF_WRITE)
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optlen);
*insn++ = CG_SOCKOPT_WRITE_FIELD(optlen);
else
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optlen);
*insn++ = CG_SOCKOPT_READ_FIELD(optlen);
break;
case offsetof(struct bpf_sockopt, retval):
BUILD_BUG_ON(offsetof(struct bpf_cg_run_ctx, run_ctx) != 0);
@@ -2429,9 +2441,11 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct task_struct, bpf_ctx),
treg, treg,
offsetof(struct task_struct, bpf_ctx));
*insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(struct bpf_cg_run_ctx, retval),
treg, si->src_reg,
offsetof(struct bpf_cg_run_ctx, retval));
*insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_MEM |
BPF_FIELD_SIZEOF(struct bpf_cg_run_ctx, retval),
treg, si->src_reg,
offsetof(struct bpf_cg_run_ctx, retval),
si->imm);
*insn++ = BPF_LDX_MEM(BPF_DW, treg, si->dst_reg,
offsetof(struct bpf_sockopt_kern, tmp_reg));
} else {
@@ -2447,10 +2461,10 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
}
break;
case offsetof(struct bpf_sockopt, optval):
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval);
*insn++ = CG_SOCKOPT_READ_FIELD(optval);
break;
case offsetof(struct bpf_sockopt, optval_end):
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval_end);
*insn++ = CG_SOCKOPT_READ_FIELD(optval_end);
break;
}
@@ -2529,10 +2543,6 @@ cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_current_pid_tgid_proto;
case BPF_FUNC_get_current_comm:
return &bpf_get_current_comm_proto;
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
case BPF_FUNC_get_current_ancestor_cgroup_id:
return &bpf_get_current_ancestor_cgroup_id_proto;
#ifdef CONFIG_CGROUP_NET_CLASSID
case BPF_FUNC_get_cgroup_classid:
return &bpf_get_cgroup_classid_curr_proto;

View File

@@ -1187,6 +1187,7 @@ int bpf_jit_get_func_addr(const struct bpf_prog *prog,
s16 off = insn->off;
s32 imm = insn->imm;
u8 *addr;
int err;
*func_addr_fixed = insn->src_reg != BPF_PSEUDO_CALL;
if (!*func_addr_fixed) {
@@ -1201,6 +1202,11 @@ int bpf_jit_get_func_addr(const struct bpf_prog *prog,
addr = (u8 *)prog->aux->func[off]->bpf_func;
else
return -EINVAL;
} else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
bpf_jit_supports_far_kfunc_call()) {
err = bpf_get_kfunc_addr(prog, insn->imm, insn->off, &addr);
if (err)
return err;
} else {
/* Address of a BPF helper call. Since part of the core
* kernel, it's always at a fixed location. __bpf_call_base
@@ -2732,6 +2738,11 @@ bool __weak bpf_jit_supports_kfunc_call(void)
return false;
}
bool __weak bpf_jit_supports_far_kfunc_call(void)
{
return false;
}
/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
* skb_copy_bits(), so provide a weak definition of it for NET-less config.
*/

View File

@@ -540,7 +540,7 @@ static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
}
}
static int cpu_map_delete_elem(struct bpf_map *map, void *key)
static long cpu_map_delete_elem(struct bpf_map *map, void *key)
{
struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
u32 key_cpu = *(u32 *)key;
@@ -553,8 +553,8 @@ static int cpu_map_delete_elem(struct bpf_map *map, void *key)
return 0;
}
static int cpu_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
static long cpu_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
{
struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
struct bpf_cpumap_val cpumap_value = {};
@@ -667,12 +667,21 @@ static int cpu_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
return 0;
}
static int cpu_map_redirect(struct bpf_map *map, u64 index, u64 flags)
static long cpu_map_redirect(struct bpf_map *map, u64 index, u64 flags)
{
return __bpf_xdp_redirect_map(map, index, flags, 0,
__cpu_map_lookup_elem);
}
static u64 cpu_map_mem_usage(const struct bpf_map *map)
{
u64 usage = sizeof(struct bpf_cpu_map);
/* Currently the dynamically allocated elements are not counted */
usage += (u64)map->max_entries * sizeof(struct bpf_cpu_map_entry *);
return usage;
}
BTF_ID_LIST_SINGLE(cpu_map_btf_ids, struct, bpf_cpu_map)
const struct bpf_map_ops cpu_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -683,6 +692,7 @@ const struct bpf_map_ops cpu_map_ops = {
.map_lookup_elem = cpu_map_lookup_elem,
.map_get_next_key = cpu_map_get_next_key,
.map_check_btf = map_check_no_btf,
.map_mem_usage = cpu_map_mem_usage,
.map_btf_id = &cpu_map_btf_ids[0],
.map_redirect = cpu_map_redirect,
};

View File

@@ -9,6 +9,7 @@
/**
* struct bpf_cpumask - refcounted BPF cpumask wrapper structure
* @cpumask: The actual cpumask embedded in the struct.
* @rcu: The RCU head used to free the cpumask with RCU safety.
* @usage: Object reference counter. When the refcount goes to 0, the
* memory is released back to the BPF allocator, which provides
* RCU safety.
@@ -24,6 +25,7 @@
*/
struct bpf_cpumask {
cpumask_t cpumask;
struct rcu_head rcu;
refcount_t usage;
};
@@ -55,7 +57,7 @@ __bpf_kfunc struct bpf_cpumask *bpf_cpumask_create(void)
/* cpumask must be the first element so struct bpf_cpumask be cast to struct cpumask. */
BUILD_BUG_ON(offsetof(struct bpf_cpumask, cpumask) != 0);
cpumask = bpf_mem_alloc(&bpf_cpumask_ma, sizeof(*cpumask));
cpumask = bpf_mem_cache_alloc(&bpf_cpumask_ma);
if (!cpumask)
return NULL;
@@ -80,32 +82,14 @@ __bpf_kfunc struct bpf_cpumask *bpf_cpumask_acquire(struct bpf_cpumask *cpumask)
return cpumask;
}
/**
* bpf_cpumask_kptr_get() - Attempt to acquire a reference to a BPF cpumask
* stored in a map.
* @cpumaskp: A pointer to a BPF cpumask map value.
*
* Attempts to acquire a reference to a BPF cpumask stored in a map value. The
* cpumask returned by this function must either be embedded in a map as a
* kptr, or freed with bpf_cpumask_release(). This function may return NULL if
* no BPF cpumask was found in the specified map value.
*/
__bpf_kfunc struct bpf_cpumask *bpf_cpumask_kptr_get(struct bpf_cpumask **cpumaskp)
static void cpumask_free_cb(struct rcu_head *head)
{
struct bpf_cpumask *cpumask;
/* The BPF memory allocator frees memory backing its caches in an RCU
* callback. Thus, we can safely use RCU to ensure that the cpumask is
* safe to read.
*/
rcu_read_lock();
cpumask = READ_ONCE(*cpumaskp);
if (cpumask && !refcount_inc_not_zero(&cpumask->usage))
cpumask = NULL;
rcu_read_unlock();
return cpumask;
cpumask = container_of(head, struct bpf_cpumask, rcu);
migrate_disable();
bpf_mem_cache_free(&bpf_cpumask_ma, cpumask);
migrate_enable();
}
/**
@@ -118,14 +102,8 @@ __bpf_kfunc struct bpf_cpumask *bpf_cpumask_kptr_get(struct bpf_cpumask **cpumas
*/
__bpf_kfunc void bpf_cpumask_release(struct bpf_cpumask *cpumask)
{
if (!cpumask)
return;
if (refcount_dec_and_test(&cpumask->usage)) {
migrate_disable();
bpf_mem_free(&bpf_cpumask_ma, cpumask);
migrate_enable();
}
if (refcount_dec_and_test(&cpumask->usage))
call_rcu(&cpumask->rcu, cpumask_free_cb);
}
/**
@@ -424,29 +402,28 @@ __diag_pop();
BTF_SET8_START(cpumask_kfunc_btf_ids)
BTF_ID_FLAGS(func, bpf_cpumask_create, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cpumask_release, KF_RELEASE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_cpumask_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cpumask_first, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_first_zero, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_set_cpu, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_clear_cpu, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_test_cpu, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_test_and_set_cpu, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_test_and_clear_cpu, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_setall, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_clear, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_and, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_or, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_xor, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_equal, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_intersects, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_subset, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_full, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_any, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_any_and, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cpumask_first, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_first_zero, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_set_cpu, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_clear_cpu, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_test_cpu, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_test_and_set_cpu, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_test_and_clear_cpu, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_setall, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_clear, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_and, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_or, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_xor, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_equal, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_intersects, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_subset, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_full, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_any, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_any_and, KF_RCU)
BTF_SET8_END(cpumask_kfunc_btf_ids)
static const struct btf_kfunc_id_set cpumask_kfunc_set = {
@@ -468,7 +445,7 @@ static int __init cpumask_kfunc_init(void)
},
};
ret = bpf_mem_alloc_init(&bpf_cpumask_ma, 0, false);
ret = bpf_mem_alloc_init(&bpf_cpumask_ma, sizeof(struct bpf_cpumask), false);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &cpumask_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &cpumask_kfunc_set);
return ret ?: register_btf_id_dtor_kfuncs(cpumask_dtors,

View File

@@ -809,7 +809,7 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
kfree(dev);
}
static int dev_map_delete_elem(struct bpf_map *map, void *key)
static long dev_map_delete_elem(struct bpf_map *map, void *key)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct bpf_dtab_netdev *old_dev;
@@ -819,12 +819,14 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
return -EINVAL;
old_dev = unrcu_pointer(xchg(&dtab->netdev_map[k], NULL));
if (old_dev)
if (old_dev) {
call_rcu(&old_dev->rcu, __dev_map_entry_free);
atomic_dec((atomic_t *)&dtab->items);
}
return 0;
}
static int dev_map_hash_delete_elem(struct bpf_map *map, void *key)
static long dev_map_hash_delete_elem(struct bpf_map *map, void *key)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct bpf_dtab_netdev *old_dev;
@@ -895,8 +897,8 @@ err_out:
return ERR_PTR(-EINVAL);
}
static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
void *key, void *value, u64 map_flags)
static long __dev_map_update_elem(struct net *net, struct bpf_map *map,
void *key, void *value, u64 map_flags)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct bpf_dtab_netdev *dev, *old_dev;
@@ -931,19 +933,21 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
old_dev = unrcu_pointer(xchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev)));
if (old_dev)
call_rcu(&old_dev->rcu, __dev_map_entry_free);
else
atomic_inc((atomic_t *)&dtab->items);
return 0;
}
static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
static long dev_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
{
return __dev_map_update_elem(current->nsproxy->net_ns,
map, key, value, map_flags);
}
static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
void *key, void *value, u64 map_flags)
static long __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
void *key, void *value, u64 map_flags)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct bpf_dtab_netdev *dev, *old_dev;
@@ -995,27 +999,41 @@ out_err:
return err;
}
static int dev_map_hash_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
static long dev_map_hash_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
{
return __dev_map_hash_update_elem(current->nsproxy->net_ns,
map, key, value, map_flags);
}
static int dev_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
static long dev_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
{
return __bpf_xdp_redirect_map(map, ifindex, flags,
BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS,
__dev_map_lookup_elem);
}
static int dev_hash_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
static long dev_hash_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
{
return __bpf_xdp_redirect_map(map, ifindex, flags,
BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS,
__dev_map_hash_lookup_elem);
}
static u64 dev_map_mem_usage(const struct bpf_map *map)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
u64 usage = sizeof(struct bpf_dtab);
if (map->map_type == BPF_MAP_TYPE_DEVMAP_HASH)
usage += (u64)dtab->n_buckets * sizeof(struct hlist_head);
else
usage += (u64)map->max_entries * sizeof(struct bpf_dtab_netdev *);
usage += atomic_read((atomic_t *)&dtab->items) *
(u64)sizeof(struct bpf_dtab_netdev);
return usage;
}
BTF_ID_LIST_SINGLE(dev_map_btf_ids, struct, bpf_dtab)
const struct bpf_map_ops dev_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -1026,6 +1044,7 @@ const struct bpf_map_ops dev_map_ops = {
.map_update_elem = dev_map_update_elem,
.map_delete_elem = dev_map_delete_elem,
.map_check_btf = map_check_no_btf,
.map_mem_usage = dev_map_mem_usage,
.map_btf_id = &dev_map_btf_ids[0],
.map_redirect = dev_map_redirect,
};
@@ -1039,6 +1058,7 @@ const struct bpf_map_ops dev_map_hash_ops = {
.map_update_elem = dev_map_hash_update_elem,
.map_delete_elem = dev_map_hash_delete_elem,
.map_check_btf = map_check_no_btf,
.map_mem_usage = dev_map_mem_usage,
.map_btf_id = &dev_map_btf_ids[0],
.map_redirect = dev_hash_map_redirect,
};
@@ -1109,9 +1129,11 @@ static int dev_map_notification(struct notifier_block *notifier,
if (!dev || netdev != dev->dev)
continue;
odev = unrcu_pointer(cmpxchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev), NULL));
if (dev == odev)
if (dev == odev) {
call_rcu(&dev->rcu,
__dev_map_entry_free);
atomic_dec((atomic_t *)&dtab->items);
}
}
}
rcu_read_unlock();

View File

@@ -249,7 +249,18 @@ static void htab_free_prealloced_fields(struct bpf_htab *htab)
struct htab_elem *elem;
elem = get_htab_elem(htab, i);
bpf_obj_free_fields(htab->map.record, elem->key + round_up(htab->map.key_size, 8));
if (htab_is_percpu(htab)) {
void __percpu *pptr = htab_elem_get_ptr(elem, htab->map.key_size);
int cpu;
for_each_possible_cpu(cpu) {
bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
cond_resched();
}
} else {
bpf_obj_free_fields(htab->map.record, elem->key + round_up(htab->map.key_size, 8));
cond_resched();
}
cond_resched();
}
}
@@ -596,6 +607,8 @@ free_htab:
static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
{
if (likely(key_len % 4 == 0))
return jhash2(key, key_len / 4, hashrnd);
return jhash(key, key_len, hashrnd);
}
@@ -759,9 +772,17 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
static void check_and_free_fields(struct bpf_htab *htab,
struct htab_elem *elem)
{
void *map_value = elem->key + round_up(htab->map.key_size, 8);
if (htab_is_percpu(htab)) {
void __percpu *pptr = htab_elem_get_ptr(elem, htab->map.key_size);
int cpu;
bpf_obj_free_fields(htab->map.record, map_value);
for_each_possible_cpu(cpu)
bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
} else {
void *map_value = elem->key + round_up(htab->map.key_size, 8);
bpf_obj_free_fields(htab->map.record, map_value);
}
}
/* It is called from the bpf_lru_list when the LRU needs to delete
@@ -858,9 +879,9 @@ find_first_elem:
static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
{
check_and_free_fields(htab, l);
if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
check_and_free_fields(htab, l);
bpf_mem_cache_free(&htab->ma, l);
}
@@ -918,14 +939,13 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
{
if (!onallcpus) {
/* copy true value_size bytes */
memcpy(this_cpu_ptr(pptr), value, htab->map.value_size);
copy_map_value(&htab->map, this_cpu_ptr(pptr), value);
} else {
u32 size = round_up(htab->map.value_size, 8);
int off = 0, cpu;
for_each_possible_cpu(cpu) {
bpf_long_memcpy(per_cpu_ptr(pptr, cpu),
value + off, size);
copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value + off);
off += size;
}
}
@@ -940,16 +960,14 @@ static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr,
* (onallcpus=false always when coming from bpf prog).
*/
if (!onallcpus) {
u32 size = round_up(htab->map.value_size, 8);
int current_cpu = raw_smp_processor_id();
int cpu;
for_each_possible_cpu(cpu) {
if (cpu == current_cpu)
bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value,
size);
else
memset(per_cpu_ptr(pptr, cpu), 0, size);
copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value);
else /* Since elem is preallocated, we cannot touch special fields */
zero_map_value(&htab->map, per_cpu_ptr(pptr, cpu));
}
} else {
pcpu_copy_value(htab, pptr, value, onallcpus);
@@ -1057,8 +1075,8 @@ static int check_flags(struct bpf_htab *htab, struct htab_elem *l_old,
}
/* Called from syscall or from eBPF program */
static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct htab_elem *l_new = NULL, *l_old;
@@ -1159,8 +1177,8 @@ static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
bpf_lru_push_free(&htab->lru, &elem->lru_node);
}
static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct htab_elem *l_new, *l_old = NULL;
@@ -1226,9 +1244,9 @@ err:
return ret;
}
static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags,
bool onallcpus)
static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags,
bool onallcpus)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct htab_elem *l_new = NULL, *l_old;
@@ -1281,9 +1299,9 @@ err:
return ret;
}
static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags,
bool onallcpus)
static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags,
bool onallcpus)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct htab_elem *l_new = NULL, *l_old;
@@ -1348,21 +1366,21 @@ err:
return ret;
}
static int htab_percpu_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
static long htab_percpu_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
return __htab_percpu_map_update_elem(map, key, value, map_flags, false);
}
static int htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
static long htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
return __htab_lru_percpu_map_update_elem(map, key, value, map_flags,
false);
}
/* Called from syscall or from eBPF program */
static int htab_map_delete_elem(struct bpf_map *map, void *key)
static long htab_map_delete_elem(struct bpf_map *map, void *key)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct hlist_nulls_head *head;
@@ -1398,7 +1416,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
return ret;
}
static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
static long htab_lru_map_delete_elem(struct bpf_map *map, void *key)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct hlist_nulls_head *head;
@@ -1575,9 +1593,8 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
pptr = htab_elem_get_ptr(l, key_size);
for_each_possible_cpu(cpu) {
bpf_long_memcpy(value + off,
per_cpu_ptr(pptr, cpu),
roundup_value_size);
copy_map_value_long(&htab->map, value + off, per_cpu_ptr(pptr, cpu));
check_and_init_map_value(&htab->map, value + off);
off += roundup_value_size;
}
} else {
@@ -1772,8 +1789,8 @@ again_nocopy:
pptr = htab_elem_get_ptr(l, map->key_size);
for_each_possible_cpu(cpu) {
bpf_long_memcpy(dst_val + off,
per_cpu_ptr(pptr, cpu), size);
copy_map_value_long(&htab->map, dst_val + off, per_cpu_ptr(pptr, cpu));
check_and_init_map_value(&htab->map, dst_val + off);
off += size;
}
} else {
@@ -2046,9 +2063,9 @@ static int __bpf_hash_map_seq_show(struct seq_file *seq, struct htab_elem *elem)
roundup_value_size = round_up(map->value_size, 8);
pptr = htab_elem_get_ptr(elem, map->key_size);
for_each_possible_cpu(cpu) {
bpf_long_memcpy(info->percpu_value_buf + off,
per_cpu_ptr(pptr, cpu),
roundup_value_size);
copy_map_value_long(map, info->percpu_value_buf + off,
per_cpu_ptr(pptr, cpu));
check_and_init_map_value(map, info->percpu_value_buf + off);
off += roundup_value_size;
}
ctx.value = info->percpu_value_buf;
@@ -2119,8 +2136,8 @@ static const struct bpf_iter_seq_info iter_seq_info = {
.seq_priv_size = sizeof(struct bpf_iter_seq_hash_map_info),
};
static int bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
void *callback_ctx, u64 flags)
static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
void *callback_ctx, u64 flags)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct hlist_nulls_head *head;
@@ -2175,6 +2192,44 @@ out:
return num_elems;
}
static u64 htab_map_mem_usage(const struct bpf_map *map)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
u32 value_size = round_up(htab->map.value_size, 8);
bool prealloc = htab_is_prealloc(htab);
bool percpu = htab_is_percpu(htab);
bool lru = htab_is_lru(htab);
u64 num_entries;
u64 usage = sizeof(struct bpf_htab);
usage += sizeof(struct bucket) * htab->n_buckets;
usage += sizeof(int) * num_possible_cpus() * HASHTAB_MAP_LOCK_COUNT;
if (prealloc) {
num_entries = map->max_entries;
if (htab_has_extra_elems(htab))
num_entries += num_possible_cpus();
usage += htab->elem_size * num_entries;
if (percpu)
usage += value_size * num_possible_cpus() * num_entries;
else if (!lru)
usage += sizeof(struct htab_elem *) * num_possible_cpus();
} else {
#define LLIST_NODE_SZ sizeof(struct llist_node)
num_entries = htab->use_percpu_counter ?
percpu_counter_sum(&htab->pcount) :
atomic_read(&htab->count);
usage += (htab->elem_size + LLIST_NODE_SZ) * num_entries;
if (percpu) {
usage += (LLIST_NODE_SZ + sizeof(void *)) * num_entries;
usage += value_size * num_possible_cpus() * num_entries;
}
}
return usage;
}
BTF_ID_LIST_SINGLE(htab_map_btf_ids, struct, bpf_htab)
const struct bpf_map_ops htab_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -2191,6 +2246,7 @@ const struct bpf_map_ops htab_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab),
.map_btf_id = &htab_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
@@ -2212,6 +2268,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru),
.map_btf_id = &htab_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
@@ -2292,8 +2349,8 @@ int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value)
*/
pptr = htab_elem_get_ptr(l, map->key_size);
for_each_possible_cpu(cpu) {
bpf_long_memcpy(value + off,
per_cpu_ptr(pptr, cpu), size);
copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
check_and_init_map_value(map, value + off);
off += size;
}
ret = 0;
@@ -2363,6 +2420,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_percpu),
.map_btf_id = &htab_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
@@ -2382,6 +2440,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru_percpu),
.map_btf_id = &htab_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
@@ -2519,6 +2578,7 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
.map_gen_lookup = htab_of_map_gen_lookup,
.map_check_btf = map_check_no_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab),
.map_btf_id = &htab_map_btf_ids[0],
};

View File

@@ -18,6 +18,7 @@
#include <linux/pid_namespace.h>
#include <linux/poison.h>
#include <linux/proc_ns.h>
#include <linux/sched/task.h>
#include <linux/security.h>
#include <linux/btf_ids.h>
#include <linux/bpf_mem_alloc.h>
@@ -257,7 +258,7 @@ BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size)
goto err_clear;
/* Verifier guarantees that size > 0 */
strscpy(buf, task->comm, size);
strscpy_pad(buf, task->comm, size);
return 0;
err_clear:
memset(buf, 0, size);
@@ -571,7 +572,7 @@ static const struct bpf_func_proto bpf_strncmp_proto = {
.func = bpf_strncmp,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_MEM,
.arg1_type = ARG_PTR_TO_MEM | MEM_RDONLY,
.arg2_type = ARG_CONST_SIZE,
.arg3_type = ARG_PTR_TO_CONST_STR,
};
@@ -1264,10 +1265,11 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
{
struct bpf_hrtimer *t;
int ret = 0;
enum hrtimer_mode mode;
if (in_nmi())
return -EOPNOTSUPP;
if (flags)
if (flags > BPF_F_TIMER_ABS)
return -EINVAL;
__bpf_spin_lock_irqsave(&timer->lock);
t = timer->timer;
@@ -1275,7 +1277,13 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
ret = -EINVAL;
goto out;
}
hrtimer_start(&t->timer, ns_to_ktime(nsecs), HRTIMER_MODE_REL_SOFT);
if (flags & BPF_F_TIMER_ABS)
mode = HRTIMER_MODE_ABS_SOFT;
else
mode = HRTIMER_MODE_REL_SOFT;
hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
return ret;
@@ -1420,11 +1428,21 @@ static bool bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr)
return ptr->size & DYNPTR_RDONLY_BIT;
}
void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
{
ptr->size |= DYNPTR_RDONLY_BIT;
}
static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
{
ptr->size |= type << DYNPTR_TYPE_SHIFT;
}
static enum bpf_dynptr_type bpf_dynptr_get_type(const struct bpf_dynptr_kern *ptr)
{
return (ptr->size & ~(DYNPTR_RDONLY_BIT)) >> DYNPTR_TYPE_SHIFT;
}
u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr)
{
return ptr->size & DYNPTR_SIZE_MASK;
@@ -1497,6 +1515,7 @@ static const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, const struct bpf_dynptr_kern *, src,
u32, offset, u64, flags)
{
enum bpf_dynptr_type type;
int err;
if (!src->data || flags)
@@ -1506,13 +1525,25 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, const struct bpf_dynptr_kern
if (err)
return err;
/* Source and destination may possibly overlap, hence use memmove to
* copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
* pointing to overlapping PTR_TO_MAP_VALUE regions.
*/
memmove(dst, src->data + src->offset + offset, len);
type = bpf_dynptr_get_type(src);
return 0;
switch (type) {
case BPF_DYNPTR_TYPE_LOCAL:
case BPF_DYNPTR_TYPE_RINGBUF:
/* Source and destination may possibly overlap, hence use memmove to
* copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
* pointing to overlapping PTR_TO_MAP_VALUE regions.
*/
memmove(dst, src->data + src->offset + offset, len);
return 0;
case BPF_DYNPTR_TYPE_SKB:
return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
case BPF_DYNPTR_TYPE_XDP:
return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len);
default:
WARN_ONCE(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);
return -EFAULT;
}
}
static const struct bpf_func_proto bpf_dynptr_read_proto = {
@@ -1529,22 +1560,40 @@ static const struct bpf_func_proto bpf_dynptr_read_proto = {
BPF_CALL_5(bpf_dynptr_write, const struct bpf_dynptr_kern *, dst, u32, offset, void *, src,
u32, len, u64, flags)
{
enum bpf_dynptr_type type;
int err;
if (!dst->data || flags || bpf_dynptr_is_rdonly(dst))
if (!dst->data || bpf_dynptr_is_rdonly(dst))
return -EINVAL;
err = bpf_dynptr_check_off_len(dst, offset, len);
if (err)
return err;
/* Source and destination may possibly overlap, hence use memmove to
* copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
* pointing to overlapping PTR_TO_MAP_VALUE regions.
*/
memmove(dst->data + dst->offset + offset, src, len);
type = bpf_dynptr_get_type(dst);
return 0;
switch (type) {
case BPF_DYNPTR_TYPE_LOCAL:
case BPF_DYNPTR_TYPE_RINGBUF:
if (flags)
return -EINVAL;
/* Source and destination may possibly overlap, hence use memmove to
* copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
* pointing to overlapping PTR_TO_MAP_VALUE regions.
*/
memmove(dst->data + dst->offset + offset, src, len);
return 0;
case BPF_DYNPTR_TYPE_SKB:
return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len,
flags);
case BPF_DYNPTR_TYPE_XDP:
if (flags)
return -EINVAL;
return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len);
default:
WARN_ONCE(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
return -EFAULT;
}
}
static const struct bpf_func_proto bpf_dynptr_write_proto = {
@@ -1560,6 +1609,7 @@ static const struct bpf_func_proto bpf_dynptr_write_proto = {
BPF_CALL_3(bpf_dynptr_data, const struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
{
enum bpf_dynptr_type type;
int err;
if (!ptr->data)
@@ -1572,7 +1622,20 @@ BPF_CALL_3(bpf_dynptr_data, const struct bpf_dynptr_kern *, ptr, u32, offset, u3
if (bpf_dynptr_is_rdonly(ptr))
return 0;
return (unsigned long)(ptr->data + ptr->offset + offset);
type = bpf_dynptr_get_type(ptr);
switch (type) {
case BPF_DYNPTR_TYPE_LOCAL:
case BPF_DYNPTR_TYPE_RINGBUF:
return (unsigned long)(ptr->data + ptr->offset + offset);
case BPF_DYNPTR_TYPE_SKB:
case BPF_DYNPTR_TYPE_XDP:
/* skb and xdp dynptrs should use bpf_dynptr_slice / bpf_dynptr_slice_rdwr */
return 0;
default:
WARN_ONCE(true, "bpf_dynptr_data: unknown dynptr type %d\n", type);
return 0;
}
}
static const struct bpf_func_proto bpf_dynptr_data_proto = {
@@ -1693,6 +1756,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_cgrp_storage_get_proto;
case BPF_FUNC_cgrp_storage_delete:
return &bpf_cgrp_storage_delete_proto;
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
case BPF_FUNC_get_current_ancestor_cgroup_id:
return &bpf_get_current_ancestor_cgroup_id_proto;
#endif
default:
break;
@@ -1731,6 +1798,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
}
}
void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
void bpf_list_head_free(const struct btf_field *field, void *list_head,
struct bpf_spin_lock *spin_lock)
{
@@ -1761,13 +1830,8 @@ unlock:
/* The contained type can also have resources, including a
* bpf_list_head which needs to be freed.
*/
bpf_obj_free_fields(field->graph_root.value_rec, obj);
/* bpf_mem_free requires migrate_disable(), since we can be
* called from map free path as well apart from BPF program (as
* part of map ops doing bpf_obj_free_fields).
*/
migrate_disable();
bpf_mem_free(&bpf_global_ma, obj);
__bpf_obj_drop_impl(obj, field->graph_root.value_rec);
migrate_enable();
}
}
@@ -1804,10 +1868,9 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
obj = pos;
obj -= field->graph_root.node_offset;
bpf_obj_free_fields(field->graph_root.value_rec, obj);
migrate_disable();
bpf_mem_free(&bpf_global_ma, obj);
__bpf_obj_drop_impl(obj, field->graph_root.value_rec);
migrate_enable();
}
}
@@ -1826,45 +1889,96 @@ __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
if (!p)
return NULL;
if (meta)
bpf_obj_init(meta->field_offs, p);
bpf_obj_init(meta->record, p);
return p;
}
/* Must be called under migrate_disable(), as required by bpf_mem_free */
void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
{
if (rec && rec->refcount_off >= 0 &&
!refcount_dec_and_test((refcount_t *)(p + rec->refcount_off))) {
/* Object is refcounted and refcount_dec didn't result in 0
* refcount. Return without freeing the object
*/
return;
}
if (rec)
bpf_obj_free_fields(rec, p);
bpf_mem_free(&bpf_global_ma, p);
}
__bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
{
struct btf_struct_meta *meta = meta__ign;
void *p = p__alloc;
if (meta)
bpf_obj_free_fields(meta->record, p);
bpf_mem_free(&bpf_global_ma, p);
__bpf_obj_drop_impl(p, meta ? meta->record : NULL);
}
static void __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head, bool tail)
__bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
{
struct btf_struct_meta *meta = meta__ign;
struct bpf_refcount *ref;
/* Could just cast directly to refcount_t *, but need some code using
* bpf_refcount type so that it is emitted in vmlinux BTF
*/
ref = (struct bpf_refcount *)(p__refcounted_kptr + meta->record->refcount_off);
refcount_inc((refcount_t *)ref);
return (void *)p__refcounted_kptr;
}
static int __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head,
bool tail, struct btf_record *rec, u64 off)
{
struct list_head *n = (void *)node, *h = (void *)head;
/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
* called on its fields, so init here
*/
if (unlikely(!h->next))
INIT_LIST_HEAD(h);
if (unlikely(!n->next))
INIT_LIST_HEAD(n);
if (!list_empty(n)) {
/* Only called from BPF prog, no need to migrate_disable */
__bpf_obj_drop_impl(n - off, rec);
return -EINVAL;
}
tail ? list_add_tail(n, h) : list_add(n, h);
return 0;
}
__bpf_kfunc void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node)
__bpf_kfunc int bpf_list_push_front_impl(struct bpf_list_head *head,
struct bpf_list_node *node,
void *meta__ign, u64 off)
{
return __bpf_list_add(node, head, false);
struct btf_struct_meta *meta = meta__ign;
return __bpf_list_add(node, head, false,
meta ? meta->record : NULL, off);
}
__bpf_kfunc void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node)
__bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head,
struct bpf_list_node *node,
void *meta__ign, u64 off)
{
return __bpf_list_add(node, head, true);
struct btf_struct_meta *meta = meta__ign;
return __bpf_list_add(node, head, true,
meta ? meta->record : NULL, off);
}
static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail)
{
struct list_head *n, *h = (void *)head;
/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
* called on its fields, so init here
*/
if (unlikely(!h->next))
INIT_LIST_HEAD(h);
if (list_empty(h))
@@ -1890,6 +2004,9 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
struct rb_root_cached *r = (struct rb_root_cached *)root;
struct rb_node *n = (struct rb_node *)node;
if (RB_EMPTY_NODE(n))
return NULL;
rb_erase_cached(n, r);
RB_CLEAR_NODE(n);
return (struct bpf_rb_node *)n;
@@ -1898,14 +2015,20 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
/* Need to copy rbtree_add_cached's logic here because our 'less' is a BPF
* program
*/
static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
void *less)
static int __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
void *less, struct btf_record *rec, u64 off)
{
struct rb_node **link = &((struct rb_root_cached *)root)->rb_root.rb_node;
struct rb_node *parent = NULL, *n = (struct rb_node *)node;
bpf_callback_t cb = (bpf_callback_t)less;
struct rb_node *parent = NULL;
bool leftmost = true;
if (!RB_EMPTY_NODE(n)) {
/* Only called from BPF prog, no need to migrate_disable */
__bpf_obj_drop_impl(n - off, rec);
return -EINVAL;
}
while (*link) {
parent = *link;
if (cb((uintptr_t)node, (uintptr_t)parent, 0, 0, 0)) {
@@ -1916,15 +2039,18 @@ static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
}
}
rb_link_node((struct rb_node *)node, parent, link);
rb_insert_color_cached((struct rb_node *)node,
(struct rb_root_cached *)root, leftmost);
rb_link_node(n, parent, link);
rb_insert_color_cached(n, (struct rb_root_cached *)root, leftmost);
return 0;
}
__bpf_kfunc void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
__bpf_kfunc int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *node,
bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b),
void *meta__ign, u64 off)
{
__bpf_rbtree_add(root, node, (void *)less);
struct btf_struct_meta *meta = meta__ign;
return __bpf_rbtree_add(root, node, (void *)less, meta ? meta->record : NULL, off);
}
__bpf_kfunc struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root)
@@ -1942,73 +2068,8 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root)
*/
__bpf_kfunc struct task_struct *bpf_task_acquire(struct task_struct *p)
{
return get_task_struct(p);
}
/**
* bpf_task_acquire_not_zero - Acquire a reference to a rcu task object. A task
* acquired by this kfunc which is not stored in a map as a kptr, must be
* released by calling bpf_task_release().
* @p: The task on which a reference is being acquired.
*/
__bpf_kfunc struct task_struct *bpf_task_acquire_not_zero(struct task_struct *p)
{
/* For the time being this function returns NULL, as it's not currently
* possible to safely acquire a reference to a task with RCU protection
* using get_task_struct() and put_task_struct(). This is due to the
* slightly odd mechanics of p->rcu_users, and how task RCU protection
* works.
*
* A struct task_struct is refcounted by two different refcount_t
* fields:
*
* 1. p->usage: The "true" refcount field which tracks a task's
* lifetime. The task is freed as soon as this
* refcount drops to 0.
*
* 2. p->rcu_users: An "RCU users" refcount field which is statically
* initialized to 2, and is co-located in a union with
* a struct rcu_head field (p->rcu). p->rcu_users
* essentially encapsulates a single p->usage
* refcount, and when p->rcu_users goes to 0, an RCU
* callback is scheduled on the struct rcu_head which
* decrements the p->usage refcount.
*
* There are two important implications to this task refcounting logic
* described above. The first is that
* refcount_inc_not_zero(&p->rcu_users) cannot be used anywhere, as
* after the refcount goes to 0, the RCU callback being scheduled will
* cause the memory backing the refcount to again be nonzero due to the
* fields sharing a union. The other is that we can't rely on RCU to
* guarantee that a task is valid in a BPF program. This is because a
* task could have already transitioned to being in the TASK_DEAD
* state, had its rcu_users refcount go to 0, and its rcu callback
* invoked in which it drops its single p->usage reference. At this
* point the task will be freed as soon as the last p->usage reference
* goes to 0, without waiting for another RCU gp to elapse. The only
* way that a BPF program can guarantee that a task is valid is in this
* scenario is to hold a p->usage refcount itself.
*
* Until we're able to resolve this issue, either by pulling
* p->rcu_users and p->rcu out of the union, or by getting rid of
* p->usage and just using p->rcu_users for refcounting, we'll just
* return NULL here.
*/
return NULL;
}
/**
* bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
* kptr acquired by this kfunc which is not subsequently stored in a map, must
* be released by calling bpf_task_release().
* @pp: A pointer to a task kptr on which a reference is being acquired.
*/
__bpf_kfunc struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
{
/* We must return NULL here until we have clarity on how to properly
* leverage RCU for ensuring a task's lifetime. See the comment above
* in bpf_task_acquire_not_zero() for more details.
*/
if (refcount_inc_not_zero(&p->rcu_users))
return p;
return NULL;
}
@@ -2018,10 +2079,7 @@ __bpf_kfunc struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
*/
__bpf_kfunc void bpf_task_release(struct task_struct *p)
{
if (!p)
return;
put_task_struct(p);
put_task_struct_rcu_user(p);
}
#ifdef CONFIG_CGROUPS
@@ -2033,39 +2091,7 @@ __bpf_kfunc void bpf_task_release(struct task_struct *p)
*/
__bpf_kfunc struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp)
{
cgroup_get(cgrp);
return cgrp;
}
/**
* bpf_cgroup_kptr_get - Acquire a reference on a struct cgroup kptr. A cgroup
* kptr acquired by this kfunc which is not subsequently stored in a map, must
* be released by calling bpf_cgroup_release().
* @cgrpp: A pointer to a cgroup kptr on which a reference is being acquired.
*/
__bpf_kfunc struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp)
{
struct cgroup *cgrp;
rcu_read_lock();
/* Another context could remove the cgroup from the map and release it
* at any time, including after we've done the lookup above. This is
* safe because we're in an RCU read region, so the cgroup is
* guaranteed to remain valid until at least the rcu_read_unlock()
* below.
*/
cgrp = READ_ONCE(*cgrpp);
if (cgrp && !cgroup_tryget(cgrp))
/* If the cgroup had been removed from the map and freed as
* described above, cgroup_tryget() will return false. The
* cgroup will be freed at some point after the current RCU gp
* has ended, so just return NULL to the user.
*/
cgrp = NULL;
rcu_read_unlock();
return cgrp;
return cgroup_tryget(cgrp) ? cgrp : NULL;
}
/**
@@ -2077,9 +2103,6 @@ __bpf_kfunc struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp)
*/
__bpf_kfunc void bpf_cgroup_release(struct cgroup *cgrp)
{
if (!cgrp)
return;
cgroup_put(cgrp);
}
@@ -2097,10 +2120,28 @@ __bpf_kfunc struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level)
if (level > cgrp->level || level < 0)
return NULL;
/* cgrp's refcnt could be 0 here, but ancestors can still be accessed */
ancestor = cgrp->ancestors[level];
cgroup_get(ancestor);
if (!cgroup_tryget(ancestor))
return NULL;
return ancestor;
}
/**
* bpf_cgroup_from_id - Find a cgroup from its ID. A cgroup returned by this
* kfunc which is not subsequently stored in a map, must be released by calling
* bpf_cgroup_release().
* @cgid: cgroup id.
*/
__bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
{
struct cgroup *cgrp;
cgrp = cgroup_get_from_id(cgid);
if (IS_ERR(cgrp))
return NULL;
return cgrp;
}
#endif /* CONFIG_CGROUPS */
/**
@@ -2116,12 +2157,146 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
rcu_read_lock();
p = find_task_by_pid_ns(pid, &init_pid_ns);
if (p)
bpf_task_acquire(p);
p = bpf_task_acquire(p);
rcu_read_unlock();
return p;
}
/**
* bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data.
* @ptr: The dynptr whose data slice to retrieve
* @offset: Offset into the dynptr
* @buffer: User-provided buffer to copy contents into
* @buffer__szk: Size (in bytes) of the buffer. This is the length of the
* requested slice. This must be a constant.
*
* For non-skb and non-xdp type dynptrs, there is no difference between
* bpf_dynptr_slice and bpf_dynptr_data.
*
* If the intention is to write to the data slice, please use
* bpf_dynptr_slice_rdwr.
*
* The user must check that the returned pointer is not null before using it.
*
* Please note that in the case of skb and xdp dynptrs, bpf_dynptr_slice
* does not change the underlying packet data pointers, so a call to
* bpf_dynptr_slice will not invalidate any ctx->data/data_end pointers in
* the bpf program.
*
* Return: NULL if the call failed (eg invalid dynptr), pointer to a read-only
* data slice (can be either direct pointer to the data or a pointer to the user
* provided buffer, with its contents containing the data, if unable to obtain
* direct pointer)
*/
__bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset,
void *buffer, u32 buffer__szk)
{
enum bpf_dynptr_type type;
u32 len = buffer__szk;
int err;
if (!ptr->data)
return NULL;
err = bpf_dynptr_check_off_len(ptr, offset, len);
if (err)
return NULL;
type = bpf_dynptr_get_type(ptr);
switch (type) {
case BPF_DYNPTR_TYPE_LOCAL:
case BPF_DYNPTR_TYPE_RINGBUF:
return ptr->data + ptr->offset + offset;
case BPF_DYNPTR_TYPE_SKB:
return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer);
case BPF_DYNPTR_TYPE_XDP:
{
void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len);
if (xdp_ptr)
return xdp_ptr;
bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer, len, false);
return buffer;
}
default:
WARN_ONCE(true, "unknown dynptr type %d\n", type);
return NULL;
}
}
/**
* bpf_dynptr_slice_rdwr() - Obtain a writable pointer to the dynptr data.
* @ptr: The dynptr whose data slice to retrieve
* @offset: Offset into the dynptr
* @buffer: User-provided buffer to copy contents into
* @buffer__szk: Size (in bytes) of the buffer. This is the length of the
* requested slice. This must be a constant.
*
* For non-skb and non-xdp type dynptrs, there is no difference between
* bpf_dynptr_slice and bpf_dynptr_data.
*
* The returned pointer is writable and may point to either directly the dynptr
* data at the requested offset or to the buffer if unable to obtain a direct
* data pointer to (example: the requested slice is to the paged area of an skb
* packet). In the case where the returned pointer is to the buffer, the user
* is responsible for persisting writes through calling bpf_dynptr_write(). This
* usually looks something like this pattern:
*
* struct eth_hdr *eth = bpf_dynptr_slice_rdwr(&dynptr, 0, buffer, sizeof(buffer));
* if (!eth)
* return TC_ACT_SHOT;
*
* // mutate eth header //
*
* if (eth == buffer)
* bpf_dynptr_write(&ptr, 0, buffer, sizeof(buffer), 0);
*
* Please note that, as in the example above, the user must check that the
* returned pointer is not null before using it.
*
* Please also note that in the case of skb and xdp dynptrs, bpf_dynptr_slice_rdwr
* does not change the underlying packet data pointers, so a call to
* bpf_dynptr_slice_rdwr will not invalidate any ctx->data/data_end pointers in
* the bpf program.
*
* Return: NULL if the call failed (eg invalid dynptr), pointer to a
* data slice (can be either direct pointer to the data or a pointer to the user
* provided buffer, with its contents containing the data, if unable to obtain
* direct pointer)
*/
__bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 offset,
void *buffer, u32 buffer__szk)
{
if (!ptr->data || bpf_dynptr_is_rdonly(ptr))
return NULL;
/* bpf_dynptr_slice_rdwr is the same logic as bpf_dynptr_slice.
*
* For skb-type dynptrs, it is safe to write into the returned pointer
* if the bpf program allows skb data writes. There are two possiblities
* that may occur when calling bpf_dynptr_slice_rdwr:
*
* 1) The requested slice is in the head of the skb. In this case, the
* returned pointer is directly to skb data, and if the skb is cloned, the
* verifier will have uncloned it (see bpf_unclone_prologue()) already.
* The pointer can be directly written into.
*
* 2) Some portion of the requested slice is in the paged buffer area.
* In this case, the requested data will be copied out into the buffer
* and the returned pointer will be a pointer to the buffer. The skb
* will not be pulled. To persist the write, the user will need to call
* bpf_dynptr_write(), which will pull the skb and commit the write.
*
* Similarly for xdp programs, if the requested slice is not across xdp
* fragments, then a direct pointer will be returned, otherwise the data
* will be copied out into the buffer and the user will need to call
* bpf_dynptr_write() to commit changes.
*/
return bpf_dynptr_slice(ptr, offset, buffer, buffer__szk);
}
__bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
{
return obj;
@@ -2150,23 +2325,22 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
#endif
BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_list_push_front)
BTF_ID_FLAGS(func, bpf_list_push_back)
BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE)
BTF_ID_FLAGS(func, bpf_list_push_front_impl)
BTF_ID_FLAGS(func, bpf_list_push_back_impl)
BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_task_acquire_not_zero, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE)
BTF_ID_FLAGS(func, bpf_rbtree_add)
BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_rbtree_add_impl)
BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
#ifdef CONFIG_CGROUPS
BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_SET8_END(generic_btf_ids)
@@ -2190,6 +2364,11 @@ BTF_ID_FLAGS(func, bpf_cast_to_kern_ctx)
BTF_ID_FLAGS(func, bpf_rdonly_cast)
BTF_ID_FLAGS(func, bpf_rcu_read_lock)
BTF_ID_FLAGS(func, bpf_rcu_read_unlock)
BTF_ID_FLAGS(func, bpf_dynptr_slice, KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
BTF_SET8_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = {

View File

@@ -141,8 +141,8 @@ static void *cgroup_storage_lookup_elem(struct bpf_map *_map, void *key)
return &READ_ONCE(storage->buf)->data[0];
}
static int cgroup_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
static long cgroup_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
{
struct bpf_cgroup_storage *storage;
struct bpf_storage_buffer *new;
@@ -348,7 +348,7 @@ static void cgroup_storage_map_free(struct bpf_map *_map)
bpf_map_area_free(map);
}
static int cgroup_storage_delete_elem(struct bpf_map *map, void *key)
static long cgroup_storage_delete_elem(struct bpf_map *map, void *key)
{
return -EINVAL;
}
@@ -446,6 +446,12 @@ static void cgroup_storage_seq_show_elem(struct bpf_map *map, void *key,
rcu_read_unlock();
}
static u64 cgroup_storage_map_usage(const struct bpf_map *map)
{
/* Currently the dynamically allocated elements are not counted. */
return sizeof(struct bpf_cgroup_storage_map);
}
BTF_ID_LIST_SINGLE(cgroup_storage_map_btf_ids, struct,
bpf_cgroup_storage_map)
const struct bpf_map_ops cgroup_storage_map_ops = {
@@ -457,6 +463,7 @@ const struct bpf_map_ops cgroup_storage_map_ops = {
.map_delete_elem = cgroup_storage_delete_elem,
.map_check_btf = cgroup_storage_check_btf,
.map_seq_show_elem = cgroup_storage_seq_show_elem,
.map_mem_usage = cgroup_storage_map_usage,
.map_btf_id = &cgroup_storage_map_btf_ids[0],
};

330
kernel/bpf/log.c Normal file
View File

@@ -0,0 +1,330 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
* Copyright (c) 2016 Facebook
* Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
*/
#include <uapi/linux/btf.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/bpf.h>
#include <linux/bpf_verifier.h>
#include <linux/math64.h>
static bool bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
{
/* ubuf and len_total should both be specified (or not) together */
if (!!log->ubuf != !!log->len_total)
return false;
/* log buf without log_level is meaningless */
if (log->ubuf && log->level == 0)
return false;
if (log->level & ~BPF_LOG_MASK)
return false;
if (log->len_total > UINT_MAX >> 2)
return false;
return true;
}
int bpf_vlog_init(struct bpf_verifier_log *log, u32 log_level,
char __user *log_buf, u32 log_size)
{
log->level = log_level;
log->ubuf = log_buf;
log->len_total = log_size;
/* log attributes have to be sane */
if (!bpf_verifier_log_attr_valid(log))
return -EINVAL;
return 0;
}
static void bpf_vlog_update_len_max(struct bpf_verifier_log *log, u32 add_len)
{
/* add_len includes terminal \0, so no need for +1. */
u64 len = log->end_pos + add_len;
/* log->len_max could be larger than our current len due to
* bpf_vlog_reset() calls, so we maintain the max of any length at any
* previous point
*/
if (len > UINT_MAX)
log->len_max = UINT_MAX;
else if (len > log->len_max)
log->len_max = len;
}
void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
va_list args)
{
u64 cur_pos;
u32 new_n, n;
n = vscnprintf(log->kbuf, BPF_VERIFIER_TMP_LOG_SIZE, fmt, args);
WARN_ONCE(n >= BPF_VERIFIER_TMP_LOG_SIZE - 1,
"verifier log line truncated - local buffer too short\n");
if (log->level == BPF_LOG_KERNEL) {
bool newline = n > 0 && log->kbuf[n - 1] == '\n';
pr_err("BPF: %s%s", log->kbuf, newline ? "" : "\n");
return;
}
n += 1; /* include terminating zero */
bpf_vlog_update_len_max(log, n);
if (log->level & BPF_LOG_FIXED) {
/* check if we have at least something to put into user buf */
new_n = 0;
if (log->end_pos < log->len_total) {
new_n = min_t(u32, log->len_total - log->end_pos, n);
log->kbuf[new_n - 1] = '\0';
}
cur_pos = log->end_pos;
log->end_pos += n - 1; /* don't count terminating '\0' */
if (log->ubuf && new_n &&
copy_to_user(log->ubuf + cur_pos, log->kbuf, new_n))
goto fail;
} else {
u64 new_end, new_start;
u32 buf_start, buf_end, new_n;
new_end = log->end_pos + n;
if (new_end - log->start_pos >= log->len_total)
new_start = new_end - log->len_total;
else
new_start = log->start_pos;
log->start_pos = new_start;
log->end_pos = new_end - 1; /* don't count terminating '\0' */
if (!log->ubuf)
return;
new_n = min(n, log->len_total);
cur_pos = new_end - new_n;
div_u64_rem(cur_pos, log->len_total, &buf_start);
div_u64_rem(new_end, log->len_total, &buf_end);
/* new_end and buf_end are exclusive indices, so if buf_end is
* exactly zero, then it actually points right to the end of
* ubuf and there is no wrap around
*/
if (buf_end == 0)
buf_end = log->len_total;
/* if buf_start > buf_end, we wrapped around;
* if buf_start == buf_end, then we fill ubuf completely; we
* can't have buf_start == buf_end to mean that there is
* nothing to write, because we always write at least
* something, even if terminal '\0'
*/
if (buf_start < buf_end) {
/* message fits within contiguous chunk of ubuf */
if (copy_to_user(log->ubuf + buf_start,
log->kbuf + n - new_n,
buf_end - buf_start))
goto fail;
} else {
/* message wraps around the end of ubuf, copy in two chunks */
if (copy_to_user(log->ubuf + buf_start,
log->kbuf + n - new_n,
log->len_total - buf_start))
goto fail;
if (copy_to_user(log->ubuf,
log->kbuf + n - buf_end,
buf_end))
goto fail;
}
}
return;
fail:
log->ubuf = NULL;
}
void bpf_vlog_reset(struct bpf_verifier_log *log, u64 new_pos)
{
char zero = 0;
u32 pos;
if (WARN_ON_ONCE(new_pos > log->end_pos))
return;
if (!bpf_verifier_log_needed(log) || log->level == BPF_LOG_KERNEL)
return;
/* if position to which we reset is beyond current log window,
* then we didn't preserve any useful content and should adjust
* start_pos to end up with an empty log (start_pos == end_pos)
*/
log->end_pos = new_pos;
if (log->end_pos < log->start_pos)
log->start_pos = log->end_pos;
if (!log->ubuf)
return;
if (log->level & BPF_LOG_FIXED)
pos = log->end_pos + 1;
else
div_u64_rem(new_pos, log->len_total, &pos);
if (pos < log->len_total && put_user(zero, log->ubuf + pos))
log->ubuf = NULL;
}
static void bpf_vlog_reverse_kbuf(char *buf, int len)
{
int i, j;
for (i = 0, j = len - 1; i < j; i++, j--)
swap(buf[i], buf[j]);
}
static int bpf_vlog_reverse_ubuf(struct bpf_verifier_log *log, int start, int end)
{
/* we split log->kbuf into two equal parts for both ends of array */
int n = sizeof(log->kbuf) / 2, nn;
char *lbuf = log->kbuf, *rbuf = log->kbuf + n;
/* Read ubuf's section [start, end) two chunks at a time, from left
* and right side; within each chunk, swap all the bytes; after that
* reverse the order of lbuf and rbuf and write result back to ubuf.
* This way we'll end up with swapped contents of specified
* [start, end) ubuf segment.
*/
while (end - start > 1) {
nn = min(n, (end - start ) / 2);
if (copy_from_user(lbuf, log->ubuf + start, nn))
return -EFAULT;
if (copy_from_user(rbuf, log->ubuf + end - nn, nn))
return -EFAULT;
bpf_vlog_reverse_kbuf(lbuf, nn);
bpf_vlog_reverse_kbuf(rbuf, nn);
/* we write lbuf to the right end of ubuf, while rbuf to the
* left one to end up with properly reversed overall ubuf
*/
if (copy_to_user(log->ubuf + start, rbuf, nn))
return -EFAULT;
if (copy_to_user(log->ubuf + end - nn, lbuf, nn))
return -EFAULT;
start += nn;
end -= nn;
}
return 0;
}
int bpf_vlog_finalize(struct bpf_verifier_log *log, u32 *log_size_actual)
{
u32 sublen;
int err;
*log_size_actual = 0;
if (!log || log->level == 0 || log->level == BPF_LOG_KERNEL)
return 0;
if (!log->ubuf)
goto skip_log_rotate;
/* If we never truncated log, there is nothing to move around. */
if (log->start_pos == 0)
goto skip_log_rotate;
/* Otherwise we need to rotate log contents to make it start from the
* buffer beginning and be a continuous zero-terminated string. Note
* that if log->start_pos != 0 then we definitely filled up entire log
* buffer with no gaps, and we just need to shift buffer contents to
* the left by (log->start_pos % log->len_total) bytes.
*
* Unfortunately, user buffer could be huge and we don't want to
* allocate temporary kernel memory of the same size just to shift
* contents in a straightforward fashion. Instead, we'll be clever and
* do in-place array rotation. This is a leetcode-style problem, which
* could be solved by three rotations.
*
* Let's say we have log buffer that has to be shifted left by 7 bytes
* (spaces and vertical bar is just for demonstrative purposes):
* E F G H I J K | A B C D
*
* First, we reverse entire array:
* D C B A | K J I H G F E
*
* Then we rotate first 4 bytes (DCBA) and separately last 7 bytes
* (KJIHGFE), resulting in a properly rotated array:
* A B C D | E F G H I J K
*
* We'll utilize log->kbuf to read user memory chunk by chunk, swap
* bytes, and write them back. Doing it byte-by-byte would be
* unnecessarily inefficient. Altogether we are going to read and
* write each byte twice, for total 4 memory copies between kernel and
* user space.
*/
/* length of the chopped off part that will be the beginning;
* len(ABCD) in the example above
*/
div_u64_rem(log->start_pos, log->len_total, &sublen);
sublen = log->len_total - sublen;
err = bpf_vlog_reverse_ubuf(log, 0, log->len_total);
err = err ?: bpf_vlog_reverse_ubuf(log, 0, sublen);
err = err ?: bpf_vlog_reverse_ubuf(log, sublen, log->len_total);
if (err)
log->ubuf = NULL;
skip_log_rotate:
*log_size_actual = log->len_max;
/* properly initialized log has either both ubuf!=NULL and len_total>0
* or ubuf==NULL and len_total==0, so if this condition doesn't hold,
* we got a fault somewhere along the way, so report it back
*/
if (!!log->ubuf != !!log->len_total)
return -EFAULT;
/* did truncation actually happen? */
if (log->ubuf && log->len_max > log->len_total)
return -ENOSPC;
return 0;
}
/* log_level controls verbosity level of eBPF verifier.
* bpf_verifier_log_write() is used to dump the verification trace to the log,
* so the user can figure out what's wrong with the program
*/
__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
const char *fmt, ...)
{
va_list args;
if (!bpf_verifier_log_needed(&env->log))
return;
va_start(args, fmt);
bpf_verifier_vlog(&env->log, fmt, args);
va_end(args);
}
EXPORT_SYMBOL_GPL(bpf_verifier_log_write);
__printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
const char *fmt, ...)
{
va_list args;
if (!bpf_verifier_log_needed(log))
return;
va_start(args, fmt);
bpf_verifier_vlog(log, fmt, args);
va_end(args);
}
EXPORT_SYMBOL_GPL(bpf_log);

View File

@@ -300,8 +300,8 @@ static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
}
/* Called from syscall or from eBPF program */
static int trie_update_elem(struct bpf_map *map,
void *_key, void *value, u64 flags)
static long trie_update_elem(struct bpf_map *map,
void *_key, void *value, u64 flags)
{
struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
struct lpm_trie_node *node, *im_node = NULL, *new_node = NULL;
@@ -431,7 +431,7 @@ out:
}
/* Called from syscall or from eBPF program */
static int trie_delete_elem(struct bpf_map *map, void *_key)
static long trie_delete_elem(struct bpf_map *map, void *_key)
{
struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
struct bpf_lpm_trie_key *key = _key;
@@ -720,6 +720,16 @@ static int trie_check_btf(const struct bpf_map *map,
-EINVAL : 0;
}
static u64 trie_mem_usage(const struct bpf_map *map)
{
struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
u64 elem_size;
elem_size = sizeof(struct lpm_trie_node) + trie->data_size +
trie->map.value_size;
return elem_size * READ_ONCE(trie->n_entries);
}
BTF_ID_LIST_SINGLE(trie_map_btf_ids, struct, lpm_trie)
const struct bpf_map_ops trie_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -733,5 +743,6 @@ const struct bpf_map_ops trie_map_ops = {
.map_update_batch = generic_map_update_batch,
.map_delete_batch = generic_map_delete_batch,
.map_check_btf = trie_check_btf,
.map_mem_usage = trie_mem_usage,
.map_btf_id = &trie_map_btf_ids[0],
};

View File

@@ -56,18 +56,6 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
ret = PTR_ERR(inner_map_meta->record);
goto free;
}
if (inner_map_meta->record) {
struct btf_field_offs *field_offs;
/* If btf_record is !IS_ERR_OR_NULL, then field_offs is always
* valid.
*/
field_offs = kmemdup(inner_map->field_offs, sizeof(*inner_map->field_offs), GFP_KERNEL | __GFP_NOWARN);
if (!field_offs) {
ret = -ENOMEM;
goto free_rec;
}
inner_map_meta->field_offs = field_offs;
}
/* Note: We must use the same BTF, as we also used btf_record_dup above
* which relies on BTF being same for both maps, as some members like
* record->fields.list_head have pointers like value_rec pointing into
@@ -88,8 +76,6 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
fdput(f);
return inner_map_meta;
free_rec:
btf_record_free(inner_map_meta->record);
free:
kfree(inner_map_meta);
put:
@@ -99,7 +85,6 @@ put:
void bpf_map_meta_free(struct bpf_map *map_meta)
{
kfree(map_meta->field_offs);
bpf_map_free_record(map_meta);
btf_put(map_meta->btf);
kfree(map_meta);

View File

@@ -121,15 +121,8 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head)
return entry;
}
static void *__alloc(struct bpf_mem_cache *c, int node)
static void *__alloc(struct bpf_mem_cache *c, int node, gfp_t flags)
{
/* Allocate, but don't deplete atomic reserves that typical
* GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
* will allocate from the current numa node which is what we
* want here.
*/
gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
if (c->percpu_size) {
void **obj = kmalloc_node(c->percpu_size, flags, node);
void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
@@ -185,7 +178,12 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
*/
obj = __llist_del_first(&c->free_by_rcu);
if (!obj) {
obj = __alloc(c, node);
/* Allocate, but don't deplete atomic reserves that typical
* GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
* will allocate from the current numa node which is what we
* want here.
*/
obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT);
if (!obj)
break;
}
@@ -676,3 +674,46 @@ void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
unit_free(this_cpu_ptr(ma->cache), ptr);
}
/* Directly does a kfree() without putting 'ptr' back to the free_llist
* for reuse and without waiting for a rcu_tasks_trace gp.
* The caller must first go through the rcu_tasks_trace gp for 'ptr'
* before calling bpf_mem_cache_raw_free().
* It could be used when the rcu_tasks_trace callback does not have
* a hold on the original bpf_mem_alloc object that allocated the
* 'ptr'. This should only be used in the uncommon code path.
* Otherwise, the bpf_mem_alloc's free_llist cannot be refilled
* and may affect performance.
*/
void bpf_mem_cache_raw_free(void *ptr)
{
if (!ptr)
return;
kfree(ptr - LLIST_NODE_SZ);
}
/* When flags == GFP_KERNEL, it signals that the caller will not cause
* deadlock when using kmalloc. bpf_mem_cache_alloc_flags() will use
* kmalloc if the free_llist is empty.
*/
void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags)
{
struct bpf_mem_cache *c;
void *ret;
c = this_cpu_ptr(ma->cache);
ret = unit_alloc(c);
if (!ret && flags == GFP_KERNEL) {
struct mem_cgroup *memcg, *old_memcg;
memcg = get_memcg(c);
old_memcg = set_active_memcg(memcg);
ret = __alloc(c, NUMA_NO_NODE, GFP_KERNEL | __GFP_NOWARN | __GFP_ACCOUNT);
set_active_memcg(old_memcg);
mem_cgroup_put(memcg);
}
return !ret ? NULL : ret + LLIST_NODE_SZ;
}

View File

@@ -563,6 +563,12 @@ void bpf_map_offload_map_free(struct bpf_map *map)
bpf_map_area_free(offmap);
}
u64 bpf_map_offload_map_mem_usage(const struct bpf_map *map)
{
/* The memory dynamically allocated in netdev dev_ops is not counted */
return sizeof(struct bpf_offloaded_map);
}
int bpf_map_offload_lookup_elem(struct bpf_map *map, void *key, void *value)
{
struct bpf_offloaded_map *offmap = map_to_offmap(map);

View File

@@ -95,7 +95,7 @@ static void queue_stack_map_free(struct bpf_map *map)
bpf_map_area_free(qs);
}
static int __queue_map_get(struct bpf_map *map, void *value, bool delete)
static long __queue_map_get(struct bpf_map *map, void *value, bool delete)
{
struct bpf_queue_stack *qs = bpf_queue_stack(map);
unsigned long flags;
@@ -124,7 +124,7 @@ out:
}
static int __stack_map_get(struct bpf_map *map, void *value, bool delete)
static long __stack_map_get(struct bpf_map *map, void *value, bool delete)
{
struct bpf_queue_stack *qs = bpf_queue_stack(map);
unsigned long flags;
@@ -156,32 +156,32 @@ out:
}
/* Called from syscall or from eBPF program */
static int queue_map_peek_elem(struct bpf_map *map, void *value)
static long queue_map_peek_elem(struct bpf_map *map, void *value)
{
return __queue_map_get(map, value, false);
}
/* Called from syscall or from eBPF program */
static int stack_map_peek_elem(struct bpf_map *map, void *value)
static long stack_map_peek_elem(struct bpf_map *map, void *value)
{
return __stack_map_get(map, value, false);
}
/* Called from syscall or from eBPF program */
static int queue_map_pop_elem(struct bpf_map *map, void *value)
static long queue_map_pop_elem(struct bpf_map *map, void *value)
{
return __queue_map_get(map, value, true);
}
/* Called from syscall or from eBPF program */
static int stack_map_pop_elem(struct bpf_map *map, void *value)
static long stack_map_pop_elem(struct bpf_map *map, void *value)
{
return __stack_map_get(map, value, true);
}
/* Called from syscall or from eBPF program */
static int queue_stack_map_push_elem(struct bpf_map *map, void *value,
u64 flags)
static long queue_stack_map_push_elem(struct bpf_map *map, void *value,
u64 flags)
{
struct bpf_queue_stack *qs = bpf_queue_stack(map);
unsigned long irq_flags;
@@ -227,14 +227,14 @@ static void *queue_stack_map_lookup_elem(struct bpf_map *map, void *key)
}
/* Called from syscall or from eBPF program */
static int queue_stack_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
static long queue_stack_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
{
return -EINVAL;
}
/* Called from syscall or from eBPF program */
static int queue_stack_map_delete_elem(struct bpf_map *map, void *key)
static long queue_stack_map_delete_elem(struct bpf_map *map, void *key)
{
return -EINVAL;
}
@@ -246,6 +246,14 @@ static int queue_stack_map_get_next_key(struct bpf_map *map, void *key,
return -EINVAL;
}
static u64 queue_stack_map_mem_usage(const struct bpf_map *map)
{
u64 usage = sizeof(struct bpf_queue_stack);
usage += ((u64)map->max_entries + 1) * map->value_size;
return usage;
}
BTF_ID_LIST_SINGLE(queue_map_btf_ids, struct, bpf_queue_stack)
const struct bpf_map_ops queue_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -259,6 +267,7 @@ const struct bpf_map_ops queue_map_ops = {
.map_pop_elem = queue_map_pop_elem,
.map_peek_elem = queue_map_peek_elem,
.map_get_next_key = queue_stack_map_get_next_key,
.map_mem_usage = queue_stack_map_mem_usage,
.map_btf_id = &queue_map_btf_ids[0],
};
@@ -274,5 +283,6 @@ const struct bpf_map_ops stack_map_ops = {
.map_pop_elem = stack_map_pop_elem,
.map_peek_elem = stack_map_peek_elem,
.map_get_next_key = queue_stack_map_get_next_key,
.map_mem_usage = queue_stack_map_mem_usage,
.map_btf_id = &queue_map_btf_ids[0],
};

View File

@@ -59,7 +59,7 @@ static void *reuseport_array_lookup_elem(struct bpf_map *map, void *key)
}
/* Called from syscall only */
static int reuseport_array_delete_elem(struct bpf_map *map, void *key)
static long reuseport_array_delete_elem(struct bpf_map *map, void *key)
{
struct reuseport_array *array = reuseport_array(map);
u32 index = *(u32 *)key;
@@ -335,6 +335,13 @@ static int reuseport_array_get_next_key(struct bpf_map *map, void *key,
return 0;
}
static u64 reuseport_array_mem_usage(const struct bpf_map *map)
{
struct reuseport_array *array;
return struct_size(array, ptrs, map->max_entries);
}
BTF_ID_LIST_SINGLE(reuseport_array_map_btf_ids, struct, reuseport_array)
const struct bpf_map_ops reuseport_array_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -344,5 +351,6 @@ const struct bpf_map_ops reuseport_array_ops = {
.map_lookup_elem = reuseport_array_lookup_elem,
.map_get_next_key = reuseport_array_get_next_key,
.map_delete_elem = reuseport_array_delete_elem,
.map_mem_usage = reuseport_array_mem_usage,
.map_btf_id = &reuseport_array_map_btf_ids[0],
};

View File

@@ -19,6 +19,7 @@
(offsetof(struct bpf_ringbuf, consumer_pos) >> PAGE_SHIFT)
/* consumer page and producer page */
#define RINGBUF_POS_PAGES 2
#define RINGBUF_NR_META_PAGES (RINGBUF_PGOFF + RINGBUF_POS_PAGES)
#define RINGBUF_MAX_RECORD_SZ (UINT_MAX/4)
@@ -96,7 +97,7 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
{
const gfp_t flags = GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL |
__GFP_NOWARN | __GFP_ZERO;
int nr_meta_pages = RINGBUF_PGOFF + RINGBUF_POS_PAGES;
int nr_meta_pages = RINGBUF_NR_META_PAGES;
int nr_data_pages = data_sz >> PAGE_SHIFT;
int nr_pages = nr_meta_pages + nr_data_pages;
struct page **pages, *page;
@@ -241,13 +242,13 @@ static void *ringbuf_map_lookup_elem(struct bpf_map *map, void *key)
return ERR_PTR(-ENOTSUPP);
}
static int ringbuf_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 flags)
static long ringbuf_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 flags)
{
return -ENOTSUPP;
}
static int ringbuf_map_delete_elem(struct bpf_map *map, void *key)
static long ringbuf_map_delete_elem(struct bpf_map *map, void *key)
{
return -ENOTSUPP;
}
@@ -336,6 +337,21 @@ static __poll_t ringbuf_map_poll_user(struct bpf_map *map, struct file *filp,
return 0;
}
static u64 ringbuf_map_mem_usage(const struct bpf_map *map)
{
struct bpf_ringbuf *rb;
int nr_data_pages;
int nr_meta_pages;
u64 usage = sizeof(struct bpf_ringbuf_map);
rb = container_of(map, struct bpf_ringbuf_map, map)->rb;
usage += (u64)rb->nr_pages << PAGE_SHIFT;
nr_meta_pages = RINGBUF_NR_META_PAGES;
nr_data_pages = map->max_entries >> PAGE_SHIFT;
usage += (nr_meta_pages + 2 * nr_data_pages) * sizeof(struct page *);
return usage;
}
BTF_ID_LIST_SINGLE(ringbuf_map_btf_ids, struct, bpf_ringbuf_map)
const struct bpf_map_ops ringbuf_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -347,6 +363,7 @@ const struct bpf_map_ops ringbuf_map_ops = {
.map_update_elem = ringbuf_map_update_elem,
.map_delete_elem = ringbuf_map_delete_elem,
.map_get_next_key = ringbuf_map_get_next_key,
.map_mem_usage = ringbuf_map_mem_usage,
.map_btf_id = &ringbuf_map_btf_ids[0],
};
@@ -361,6 +378,7 @@ const struct bpf_map_ops user_ringbuf_map_ops = {
.map_update_elem = ringbuf_map_update_elem,
.map_delete_elem = ringbuf_map_delete_elem,
.map_get_next_key = ringbuf_map_get_next_key,
.map_mem_usage = ringbuf_map_mem_usage,
.map_btf_id = &user_ringbuf_map_btf_ids[0],
};

View File

@@ -618,14 +618,14 @@ static int stack_map_get_next_key(struct bpf_map *map, void *key,
return 0;
}
static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
static long stack_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
{
return -EINVAL;
}
/* Called from syscall or from eBPF program */
static int stack_map_delete_elem(struct bpf_map *map, void *key)
static long stack_map_delete_elem(struct bpf_map *map, void *key)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *old_bucket;
@@ -654,6 +654,19 @@ static void stack_map_free(struct bpf_map *map)
put_callchain_buffers();
}
static u64 stack_map_mem_usage(const struct bpf_map *map)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
u64 value_size = map->value_size;
u64 n_buckets = smap->n_buckets;
u64 enties = map->max_entries;
u64 usage = sizeof(*smap);
usage += n_buckets * sizeof(struct stack_map_bucket *);
usage += enties * (sizeof(struct stack_map_bucket) + value_size);
return usage;
}
BTF_ID_LIST_SINGLE(stack_trace_map_btf_ids, struct, bpf_stack_map)
const struct bpf_map_ops stack_trace_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -664,5 +677,6 @@ const struct bpf_map_ops stack_trace_map_ops = {
.map_update_elem = stack_map_update_elem,
.map_delete_elem = stack_map_delete_elem,
.map_check_btf = map_check_no_btf,
.map_mem_usage = stack_map_mem_usage,
.map_btf_id = &stack_trace_map_btf_ids[0],
};

View File

@@ -35,6 +35,7 @@
#include <linux/rcupdate_trace.h>
#include <linux/memcontrol.h>
#include <linux/trace_events.h>
#include <net/netfilter/nf_bpf_link.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -105,6 +106,7 @@ const struct bpf_map_ops bpf_map_offload_ops = {
.map_alloc = bpf_map_offload_map_alloc,
.map_free = bpf_map_offload_map_free,
.map_check_btf = map_check_no_btf,
.map_mem_usage = bpf_map_offload_map_mem_usage,
};
static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
@@ -128,6 +130,8 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
}
if (attr->map_ifindex)
ops = &bpf_map_offload_ops;
if (!ops->map_mem_usage)
return ERR_PTR(-EINVAL);
map = ops->map_alloc(attr);
if (IS_ERR(map))
return map;
@@ -517,14 +521,14 @@ static int btf_field_cmp(const void *a, const void *b)
}
struct btf_field *btf_record_find(const struct btf_record *rec, u32 offset,
enum btf_field_type type)
u32 field_mask)
{
struct btf_field *field;
if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & type))
if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & field_mask))
return NULL;
field = bsearch(&offset, rec->fields, rec->cnt, sizeof(rec->fields[0]), btf_field_cmp);
if (!field || !(field->type & type))
if (!field || !(field->type & field_mask))
return NULL;
return field;
}
@@ -549,6 +553,7 @@ void btf_record_free(struct btf_record *rec)
case BPF_RB_NODE:
case BPF_SPIN_LOCK:
case BPF_TIMER:
case BPF_REFCOUNT:
/* Nothing to release */
break;
default:
@@ -596,6 +601,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
case BPF_RB_NODE:
case BPF_SPIN_LOCK:
case BPF_TIMER:
case BPF_REFCOUNT:
/* Nothing to acquire */
break;
default:
@@ -647,6 +653,8 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
bpf_timer_cancel_and_free(obj + rec->timer_off);
}
extern void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
{
const struct btf_field *fields;
@@ -656,8 +664,10 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
return;
fields = rec->fields;
for (i = 0; i < rec->cnt; i++) {
struct btf_struct_meta *pointee_struct_meta;
const struct btf_field *field = &fields[i];
void *field_ptr = obj + field->offset;
void *xchgd_field;
switch (fields[i].type) {
case BPF_SPIN_LOCK:
@@ -669,7 +679,22 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
WRITE_ONCE(*(u64 *)field_ptr, 0);
break;
case BPF_KPTR_REF:
field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
if (!xchgd_field)
break;
if (!btf_is_kernel(field->kptr.btf)) {
pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
field->kptr.btf_id);
WARN_ON_ONCE(!pointee_struct_meta);
migrate_disable();
__bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
pointee_struct_meta->record :
NULL);
migrate_enable();
} else {
field->kptr.dtor(xchgd_field);
}
break;
case BPF_LIST_HEAD:
if (WARN_ON_ONCE(rec->spin_lock_off < 0))
@@ -683,6 +708,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
break;
case BPF_LIST_NODE:
case BPF_RB_NODE:
case BPF_REFCOUNT:
break;
default:
WARN_ON_ONCE(1);
@@ -695,14 +721,13 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
static void bpf_map_free_deferred(struct work_struct *work)
{
struct bpf_map *map = container_of(work, struct bpf_map, work);
struct btf_field_offs *foffs = map->field_offs;
struct btf_record *rec = map->record;
security_bpf_map_free(map);
bpf_map_release_memcg(map);
/* implementation dependent freeing */
map->ops->map_free(map);
/* Delay freeing of field_offs and btf_record for maps, as map_free
/* Delay freeing of btf_record for maps, as map_free
* callback usually needs access to them. It is better to do it here
* than require each callback to do the free itself manually.
*
@@ -711,7 +736,6 @@ static void bpf_map_free_deferred(struct work_struct *work)
* eventually calls bpf_map_free_meta, since inner_map_meta is only a
* template bpf_map struct used during verification.
*/
kfree(foffs);
btf_record_free(rec);
}
@@ -771,17 +795,10 @@ static fmode_t map_get_sys_perms(struct bpf_map *map, struct fd f)
}
#ifdef CONFIG_PROC_FS
/* Provides an approximation of the map's memory footprint.
* Used only to provide a backward compatibility and display
* a reasonable "memlock" info.
*/
static unsigned long bpf_map_memory_footprint(const struct bpf_map *map)
/* Show the memory usage of a bpf map */
static u64 bpf_map_memory_usage(const struct bpf_map *map)
{
unsigned long size;
size = round_up(map->key_size + bpf_map_value_size(map), 8);
return round_up(map->max_entries * size, PAGE_SIZE);
return map->ops->map_mem_usage(map);
}
static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
@@ -803,7 +820,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
"max_entries:\t%u\n"
"map_flags:\t%#x\n"
"map_extra:\t%#llx\n"
"memlock:\t%lu\n"
"memlock:\t%llu\n"
"map_id:\t%u\n"
"frozen:\t%u\n",
map->map_type,
@@ -812,7 +829,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
map->max_entries,
map->map_flags,
(unsigned long long)map->map_extra,
bpf_map_memory_footprint(map),
bpf_map_memory_usage(map),
map->id,
READ_ONCE(map->frozen));
if (type) {
@@ -1019,7 +1036,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
map->record = btf_parse_fields(btf, value_type,
BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
BPF_RB_ROOT,
BPF_RB_ROOT | BPF_REFCOUNT,
map->value_size);
if (!IS_ERR_OR_NULL(map->record)) {
int i;
@@ -1058,10 +1075,17 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
break;
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_REFCOUNT:
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_PERCPU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY &&
map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY) {
map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY &&
map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
map->map_type != BPF_MAP_TYPE_INODE_STORAGE &&
map->map_type != BPF_MAP_TYPE_TASK_STORAGE &&
map->map_type != BPF_MAP_TYPE_CGRP_STORAGE) {
ret = -EOPNOTSUPP;
goto free_map_tab;
}
@@ -1104,7 +1128,6 @@ free_map_tab:
static int map_create(union bpf_attr *attr)
{
int numa_node = bpf_map_attr_numa_node(attr);
struct btf_field_offs *foffs;
struct bpf_map *map;
int f_flags;
int err;
@@ -1184,17 +1207,9 @@ static int map_create(union bpf_attr *attr)
attr->btf_vmlinux_value_type_id;
}
foffs = btf_parse_field_offs(map->record);
if (IS_ERR(foffs)) {
err = PTR_ERR(foffs);
goto free_map;
}
map->field_offs = foffs;
err = security_bpf_map_alloc(map);
if (err)
goto free_map_field_offs;
goto free_map;
err = bpf_map_alloc_id(map);
if (err)
@@ -1218,8 +1233,6 @@ static int map_create(union bpf_attr *attr)
free_map_sec:
security_bpf_map_free(map);
free_map_field_offs:
kfree(map->field_offs);
free_map:
btf_put(map->btf);
map->ops->map_free(map);
@@ -1285,8 +1298,10 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
return map;
}
/* map_idr_lock should have been held */
static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
/* map_idr_lock should have been held or the map should have been
* protected by rcu read lock.
*/
struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
{
int refold;
@@ -2049,6 +2064,7 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
{
bpf_prog_kallsyms_del_all(prog);
btf_put(prog->aux->btf);
module_put(prog->aux->mod);
kvfree(prog->aux->jited_linfo);
kvfree(prog->aux->linfo);
kfree(prog->aux->kfunc_tab);
@@ -2439,7 +2455,6 @@ static bool is_net_admin_prog_type(enum bpf_prog_type prog_type)
case BPF_PROG_TYPE_LWT_SEG6LOCAL:
case BPF_PROG_TYPE_SK_SKB:
case BPF_PROG_TYPE_SK_MSG:
case BPF_PROG_TYPE_LIRC_MODE2:
case BPF_PROG_TYPE_FLOW_DISSECTOR:
case BPF_PROG_TYPE_CGROUP_DEVICE:
case BPF_PROG_TYPE_CGROUP_SOCK:
@@ -2448,6 +2463,7 @@ static bool is_net_admin_prog_type(enum bpf_prog_type prog_type)
case BPF_PROG_TYPE_CGROUP_SYSCTL:
case BPF_PROG_TYPE_SOCK_OPS:
case BPF_PROG_TYPE_EXT: /* extends any prog */
case BPF_PROG_TYPE_NETFILTER:
return true;
case BPF_PROG_TYPE_CGROUP_SKB:
/* always unpriv */
@@ -2477,9 +2493,9 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
}
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size
#define BPF_PROG_LOAD_LAST_FIELD log_true_size
static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
{
enum bpf_prog_type type = attr->prog_type;
struct bpf_prog *prog, *dst_prog = NULL;
@@ -2629,7 +2645,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
goto free_prog_sec;
/* run eBPF verifier */
err = bpf_check(&prog, attr, uattr);
err = bpf_check(&prog, attr, uattr, uattr_size);
if (err < 0)
goto free_used_maps;
@@ -2804,16 +2820,19 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
const struct bpf_prog *prog = link->prog;
char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
seq_printf(m,
"link_type:\t%s\n"
"link_id:\t%u\n"
"prog_tag:\t%s\n"
"prog_id:\t%u\n",
"link_id:\t%u\n",
bpf_link_type_strs[link->type],
link->id,
prog_tag,
prog->aux->id);
link->id);
if (prog) {
bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
seq_printf(m,
"prog_tag:\t%s\n"
"prog_id:\t%u\n",
prog_tag,
prog->aux->id);
}
if (link->ops->show_fdinfo)
link->ops->show_fdinfo(link, m);
}
@@ -3095,6 +3114,11 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
if (err)
goto out_unlock;
if (tgt_info.tgt_mod) {
module_put(prog->aux->mod);
prog->aux->mod = tgt_info.tgt_mod;
}
tr = bpf_trampoline_get(key, &tgt_info);
if (!tr) {
err = -ENOMEM;
@@ -4288,7 +4312,8 @@ static int bpf_link_get_info_by_fd(struct file *file,
info.type = link->type;
info.id = link->id;
info.prog_id = link->prog->aux->id;
if (link->prog)
info.prog_id = link->prog->aux->id;
if (link->ops->fill_link_info) {
err = link->ops->fill_link_info(link, &info);
@@ -4338,9 +4363,9 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
return err;
}
#define BPF_BTF_LOAD_LAST_FIELD btf_log_level
#define BPF_BTF_LOAD_LAST_FIELD btf_log_true_size
static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr)
static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
{
if (CHECK_ATTR(BPF_BTF_LOAD))
return -EINVAL;
@@ -4348,7 +4373,7 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr)
if (!bpf_capable())
return -EPERM;
return btf_new_fd(attr, uattr);
return btf_new_fd(attr, uattr, uattr_size);
}
#define BPF_BTF_GET_FD_BY_ID_LAST_FIELD btf_id
@@ -4551,6 +4576,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
if (CHECK_ATTR(BPF_LINK_CREATE))
return -EINVAL;
if (attr->link_create.attach_type == BPF_STRUCT_OPS)
return bpf_struct_ops_link_create(attr);
prog = bpf_prog_get(attr->link_create.prog_fd);
if (IS_ERR(prog))
return PTR_ERR(prog);
@@ -4562,6 +4590,7 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
switch (prog->type) {
case BPF_PROG_TYPE_EXT:
case BPF_PROG_TYPE_NETFILTER:
break;
case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_TRACEPOINT:
@@ -4628,6 +4657,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
case BPF_PROG_TYPE_XDP:
ret = bpf_xdp_link_attach(attr, prog);
break;
case BPF_PROG_TYPE_NETFILTER:
ret = bpf_nf_link_attach(attr, prog);
break;
#endif
case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_TRACEPOINT:
@@ -4649,6 +4681,35 @@ out:
return ret;
}
static int link_update_map(struct bpf_link *link, union bpf_attr *attr)
{
struct bpf_map *new_map, *old_map = NULL;
int ret;
new_map = bpf_map_get(attr->link_update.new_map_fd);
if (IS_ERR(new_map))
return PTR_ERR(new_map);
if (attr->link_update.flags & BPF_F_REPLACE) {
old_map = bpf_map_get(attr->link_update.old_map_fd);
if (IS_ERR(old_map)) {
ret = PTR_ERR(old_map);
goto out_put;
}
} else if (attr->link_update.old_map_fd) {
ret = -EINVAL;
goto out_put;
}
ret = link->ops->update_map(link, new_map, old_map);
if (old_map)
bpf_map_put(old_map);
out_put:
bpf_map_put(new_map);
return ret;
}
#define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd
static int link_update(union bpf_attr *attr)
@@ -4669,6 +4730,11 @@ static int link_update(union bpf_attr *attr)
if (IS_ERR(link))
return PTR_ERR(link);
if (link->ops->update_map) {
ret = link_update_map(link, attr);
goto out_put_link;
}
new_prog = bpf_prog_get(attr->link_update.new_prog_fd);
if (IS_ERR(new_prog)) {
ret = PTR_ERR(new_prog);
@@ -4989,7 +5055,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
err = map_freeze(&attr);
break;
case BPF_PROG_LOAD:
err = bpf_prog_load(&attr, uattr);
err = bpf_prog_load(&attr, uattr, size);
break;
case BPF_OBJ_PIN:
err = bpf_obj_pin(&attr);
@@ -5034,7 +5100,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
err = bpf_raw_tracepoint_open(&attr);
break;
case BPF_BTF_LOAD:
err = bpf_btf_load(&attr, uattr);
err = bpf_btf_load(&attr, uattr, size);
break;
case BPF_BTF_GET_FD_BY_ID:
err = bpf_btf_get_fd_by_id(&attr);

View File

@@ -9,7 +9,6 @@
#include <linux/btf.h>
#include <linux/rcupdate_trace.h>
#include <linux/rcupdate_wait.h>
#include <linux/module.h>
#include <linux/static_call.h>
#include <linux/bpf_verifier.h>
#include <linux/bpf_lsm.h>
@@ -172,26 +171,6 @@ out:
return tr;
}
static int bpf_trampoline_module_get(struct bpf_trampoline *tr)
{
struct module *mod;
int err = 0;
preempt_disable();
mod = __module_text_address((unsigned long) tr->func.addr);
if (mod && !try_module_get(mod))
err = -ENOENT;
preempt_enable();
tr->mod = mod;
return err;
}
static void bpf_trampoline_module_put(struct bpf_trampoline *tr)
{
module_put(tr->mod);
tr->mod = NULL;
}
static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
{
void *ip = tr->func.addr;
@@ -202,8 +181,6 @@ static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
else
ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, NULL);
if (!ret)
bpf_trampoline_module_put(tr);
return ret;
}
@@ -238,9 +215,6 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
tr->func.ftrace_managed = true;
}
if (bpf_trampoline_module_get(tr))
return -ENOENT;
if (tr->func.ftrace_managed) {
ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 1);
ret = register_ftrace_direct(tr->fops, (long)new_addr);
@@ -248,8 +222,6 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
}
if (ret)
bpf_trampoline_module_put(tr);
return ret;
}

File diff suppressed because it is too large Load Diff

View File

@@ -1465,8 +1465,18 @@ static struct cgroup *current_cgns_cgroup_dfl(void)
{
struct css_set *cset;
cset = current->nsproxy->cgroup_ns->root_cset;
return __cset_cgroup_from_root(cset, &cgrp_dfl_root);
if (current->nsproxy) {
cset = current->nsproxy->cgroup_ns->root_cset;
return __cset_cgroup_from_root(cset, &cgrp_dfl_root);
} else {
/*
* NOTE: This function may be called from bpf_cgroup_from_id()
* on a task which has already passed exit_task_namespaces() and
* nsproxy == NULL. Fall back to cgrp_dfl_root which will make all
* cgroups visible for lookups.
*/
return &cgrp_dfl_root.cgrp;
}
}
/* look up cgroup associated with given css_set on the specified hierarchy */

View File

@@ -246,7 +246,6 @@ static inline void kmemleak_load_module(const struct module *mod,
void init_build_id(struct module *mod, const struct load_info *info);
void layout_symtab(struct module *mod, struct load_info *info);
void add_kallsyms(struct module *mod, const struct load_info *info);
unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name);
static inline bool sect_empty(const Elf_Shdr *sect)
{

View File

@@ -442,7 +442,7 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
}
/* Given a module and name of symbol, find and return the symbol's value */
unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name)
static unsigned long __find_kallsyms_symbol_value(struct module *mod, const char *name)
{
unsigned int i;
struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms);
@@ -466,7 +466,7 @@ static unsigned long __module_kallsyms_lookup_name(const char *name)
if (colon) {
mod = find_module_all(name, colon - name, false);
if (mod)
return find_kallsyms_symbol_value(mod, colon + 1);
return __find_kallsyms_symbol_value(mod, colon + 1);
return 0;
}
@@ -475,7 +475,7 @@ static unsigned long __module_kallsyms_lookup_name(const char *name)
if (mod->state == MODULE_STATE_UNFORMED)
continue;
ret = find_kallsyms_symbol_value(mod, name);
ret = __find_kallsyms_symbol_value(mod, name);
if (ret)
return ret;
}
@@ -494,6 +494,16 @@ unsigned long module_kallsyms_lookup_name(const char *name)
return ret;
}
unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name)
{
unsigned long ret;
preempt_disable();
ret = __find_kallsyms_symbol_value(mod, name);
preempt_enable();
return ret;
}
int module_kallsyms_on_each_symbol(const char *modname,
int (*fn)(void *, const char *,
struct module *, unsigned long),

View File

@@ -1453,10 +1453,6 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
NULL : &bpf_probe_read_compat_str_proto;
#endif
#ifdef CONFIG_CGROUPS
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
case BPF_FUNC_get_current_ancestor_cgroup_id:
return &bpf_get_current_ancestor_cgroup_id_proto;
case BPF_FUNC_cgrp_storage_get:
return &bpf_cgrp_storage_get_proto;
case BPF_FUNC_cgrp_storage_delete: