Commit Graph

8532 Commits

Author SHA1 Message Date
Arkadiusz Kubalewski d7999f5ea6 ice: implement dpll interface to control cgu
Control over clock generation unit is required for further development
of Synchronous Ethernet feature. Interface provides ability to obtain
current state of a dpll, its sources and outputs which are pins, and
allows their configuration.

Co-developed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Co-developed-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 11:50:20 +01:00
Arkadiusz Kubalewski 8a3a565ff2 ice: add admin commands to access cgu configuration
Add firmware admin command to access clock generation unit
configuration, it is required to enable Extended PTP and SyncE features
in the driver.
Add definitions of possible hardware variations of input and output pins
related to clock generation unit and functions to access the data.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 11:50:20 +01:00
David S. Miller 319d9c2a93 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next
-queue

Tony Nguyen says:

====================
Support rx-fcs on/off for VFs

Ahmed Zaki says:

Allow the user to turn on/off the CRC/FCS stripping through ethtool. We
first add the CRC offload capability in the virtchannel, then the feature
is enabled in ice and iavf drivers.

We make sure that the netdev features are fixed such that CRC stripping
cannot be disabled if VLAN rx offload (VLAN strip) is enabled. Also, VLAN
stripping cannot be enabled unless CRC stripping is ON.

Testing was done using tcpdump to make sure that the CRC is included in
the frame after:

    # ethtool -K <interface> rx-fcs on

and is not included when it is back "off". Also, ethtool should return an
error for the above command if "rx-vlan-offload" is already on and at least
one VLAN interface/filter exists on the VF.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 11:46:19 +01:00
David S. Miller b6a7eeb44a Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Introduce Intel IDPF driver

Pavan Kumar Linga says:

This patch series introduces the Intel Infrastructure Data Path Function
(IDPF) driver. It is used for both physical and virtual functions. Except
for some of the device operations the rest of the functionality is the
same for both PF and VF. IDPF uses virtchnl version2 opcodes and
structures defined in the virtchnl2 header file which helps the driver
to learn the capabilities and register offsets from the device
Control Plane (CP) instead of assuming the default values.

The format of the series follows the driver init flow to interface open.
To start with, probe gets called and kicks off the driver initialization
by spawning the 'vc_event_task' work queue which in turn calls the
'hard reset' function. As part of that, the mailbox is initialized which
is used to send/receive the virtchnl messages to/from the CP. Once that is
done, 'core init' kicks in which requests all the required global resources
from the CP and spawns the 'init_task' work queue to create the vports.

Based on the capability information received, the driver creates the said
number of vports (one or many) where each vport is associated to a netdev.
Also, each vport has its own resources such as queues, vectors etc.
From there, rest of the netdev_ops and data path are added.

IDPF implements both single queue which is traditional queueing model
as well as split queue model. In split queue model, it uses separate queue
for both completion descriptors and buffers which helps to implement
out-of-order completions. It also helps to implement asymmetric queues,
for example multiple RX completion queues can be processed by a single
RX buffer queue and multiple TX buffer queues can be processed by a
single TX completion queue. In single queue model, same queue is used
for both descriptor completions as well as buffer completions. It also
supports features such as generic checksum offload, generic receive
offload (hardware GRO) etc.
---
v7:
Patch 2:
 * removed pci_[disable|enable]_pcie_error_reporting as they are dropped
   from the core
Patch 4, 9:
 * used 'kasprintf' instead of 'snprintf' to avoid providing explicit
   character string size which also fixes "-Wformat-truncation" warnings
Patch 14:
 * used 'ethtool_sprintf' instead of 'snprintf' to avoid providing explicit
   character string size which also fixes "-Wformat-truncation" warning
 * add string format argument to the 'ethtool_sprintf' to avoid warning on
   "-Wformat-security"

v6: https://lore.kernel.org/netdev/20230825235954.894050-1-pavan.kumar.linga@intel.com/
Note: 'Acked-by' was only added to patches 1, 2, 12 and not to the other
   patches because of the changes in v6

Patch 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15:
 * renamed 'reset_lock' to 'vport_ctrl_lock' to reflect the lock usage
 * to avoid defensive programming, used 'vport_ctrl_lock' for the user
   callbacks that access the 'vport' to prevent the hardware reset thread
   from releasing the 'vport', when the user callback is in progress
 * added some variables to netdev private structure to avoid vport access
   if possible from ethtool and ndo callbacks
 * moved 'mac_filter_list_lock' and MAC related flags to vport_config
   structure and refactored mac filter flow to handle asynchronous
   ndo mac filter callbacks
 * stop the queues before starting the reset flow to avoid TX hangs
 * removed 'sw_mutex' and 'stop_mutex' as they are not needed anymore
 * added missing clear bit in 'init_task' error path
 * renamed labels appropriately
Patch 8:
 * replaced page_pool_put_page with page_pool_put_full_page
 * for the page pool max_len, used PAGE_SIZE
Patch 10, 11, 13:
 * made use of the 'netif_txq_maybe_stop', '__netif_txq_completed_wake'
   helper macros
Patch 13:
 * removed IDPF_HR_RESET_IN_PROG flag check in idpf_tx_singleq_start
   as it is defensive
Patch 14:
 * removed max descriptor check as the core does that
 * removed unnecessary error messages
 * removed the stats that are common between the ones reported by ethtool
   and ip link
 * replaced snprintf with ethtool_sprintf
 * added a comment to explain the reason for the max queue check
 * as the netdev queues are set on alloc, there is no need to set
   them again on reset unless there is a queue change, so move the
   'idpf_set_real_num_queues' to 'idpf_initiate_soft_reset'
 Patch 15:
 * reworded the 'configure SRIOV' in the commit message

v5: https://lore.kernel.org/netdev/20230816004305.216136-1-anthony.l.nguyen@intel.com/
Most Patches:
 * wrapped line limit to 80 chars to those which don't effect readability
Patch 12:
 * in skb_add_rx_frag, offset 'headlen' w.r.t page_offset when adding a
   frag to avoid adding the header again
Patch 14:
 * added NULL check for 'rxq' when dereferencing it in page_pool_get_stats

v4: https://lore.kernel.org/netdev/20230808003416.3805142-1-anthony.l.nguyen@intel.com/
Patch 1:
 * s/virtcnl/virtchnl
 * removed the kernel doc for the error code definitions that don't exist
 * reworded the summary part in the virtchnl2 header
Patch 3:
 * don't set local variable to NULL on error
 * renamed sq_send_command_out label with err_unlock
 * don't use __GFP_ZERO in dma_alloc_coherent
Patch 4:
 * introduced mailbox workqueue to process mailbox interrupts
Patch 3, 4, 5, 6, 7, 8, 9, 11, 15:
 * removed unnecessary variable 0-init
Patch 3, 5, 7, 8, 9, 15:
 * removed defensive programming checks wherever applicable
 * removed IDPF_CAP_FIELD_LAST as it can be treated as defensive
   programming
Patch 3, 4, 5, 6, 7:
 * replaced IDPF_DFLT_MBX_BUF_SIZE with IDPF_CTLQ_MAX_BUF_LEN
Patch 2 to 15:
 * add kernel-doc for idpf.h and idpf_txrx.h enums and structures
Patch 4, 5, 15:
 * adjusted the destroy sequence of the workqueues as per the alloc
   sequence
Patch 4, 5, 9, 15:
 * scrub unnecessary flags in 'idpf_flags'
   - IDPF_REMOVE_IN_PROG flag can take care of the cases where
     IDPF_REL_RES_IN_PROG is used, removed the later one
   - IDPF_REQ_[TX|RX]_SPLITQ are replaced with struct variables
   - IDPF_CANCEL_[SERVICE|STATS]_TASK are redundant as the work queue
     doesn't get rescheduled again after 'cancel_delayed_work_sync'
   - IDPF_HR_CORE_RESET is removed as there is no set_bit for this flag
   - IDPF_MB_INTR_TRIGGER is removed as it is not needed anymore with the
     mailbox workqueue implementation
Patch 7 to 15:
 * replaced the custom buffer recycling code with page pool API
 * switched the header split buffer allocations from using a bunch of
   pages to using one large chunk of DMA memory
 * reordered some of the flows in vport_open to support page pool
Patch 8, 12:
 * don't suppress the alloc errors by using __GFP_NOWARN
Patch 9:
 * removed dyn_ctl_clrpba_m as it is not being used
Patch 14:
 * introduced enum idpf_vport_reset_cause instead of using vport flags
 * introduced page pool stats

v3: https://lore.kernel.org/netdev/20230616231341.2885622-1-anthony.l.nguyen@intel.com/
Patch 5:
 * instead of void, used 'struct virtchnl2_create_vport' type for
   vport_params_recvd and vport_params_reqd and removed the typecasting
 * used u16/u32 as needed instead of int for variables which cannot be
   negative and updated in all the places whereever applicable
Patch 6:
 * changed the commit message to "add ptypes and MAC filter support"
 * used the sender Signed-off-by as the last tag on all the patches
 * removed unnecessary variables 0-init
 * instead of fixing the code in this commit, fixed it in the commit
   where the change was introduced first
 * moved get_type_info struct on to the stack instead of memory alloc
 * moved mutex_lock and ptype_info memory alloc outside while loop and
   adjusted the return flow
 * used 'break' instead of 'continue' in ptype id switch case

v2: https://lore.kernel.org/netdev/20230614171428.1504179-1-anthony.l.nguyen@intel.com/
Patch 2:
 * added "Intel(R)" to the DRV_SUMMARY and Makefile.
Patch 4, 5, 6, 15:
 * replaced IDPF_VC_MSG_PENDING flag with mutex 'vc_buf_lock' for the
   adapter related virtchnl opcodes.
 * get the mutex lock in the virtchnl send thread itself instead of
   in receive thread.
Patch 5, 6, 7, 8, 9, 11, 14, 15:
 * replaced IDPF_VPORT_VC_MSG_PENDING flag with mutex 'vc_buf_lock' for
   the vport related virtchnl opcodes.
 * get the mutex lock in the virtchnl send thread itself instead of
   in receive thread.
Patch 6:
 * converted get_ptype_info logic from 1:N to 1:1 message exchange for
   better handling of mutex lock.
Patch 15:
 * introduced 'stats_lock' spinlock to avoid concurrent stats update.

v1: https://lore.kernel.org/netdev/20230530234501.2680230-1-anthony.l.nguyen@intel.com/

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16 12:00:56 +01:00
Paolo Abeni f2fa1c812c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 19:48:23 +02:00
Joshua Hay a251eee621 idpf: add SRIOV support and other ndo_ops
Add support for SRIOV: send the requested number of VFs
to the device Control Plane, via the virtchnl message
and then enable the VFs using 'pci_enable_sriov'.

Add other ndo ops supported by the driver such as features_check,
set_rx_mode, validate_addr, set_mac_address, change_mtu, get_stats64,
set_features, and tx_timeout. Initialize the statistics task which
requests the queue related statistics to the CP. Add loopback
and promiscuous mode support and the respective virtchnl messages.

Finally, add documentation and build support for the driver.

Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:24 -07:00
Alan Brady 02cbfba1ad idpf: add ethtool callbacks
Initialize all the ethtool ops that are supported by the driver and
add the necessary support for the ethtool callbacks. Also add
asynchronous link notification virtchnl support where the device
Control Plane sends the link status and link speed as an
asynchronous event message. Driver report the link speed on
ethtool .idpf_get_link_ksettings query.

Introduce soft reset function which is used by some of the ethtool
callbacks such as .set_channels, .set_ringparam etc. to change the
existing queue configuration. It deletes the existing queues by sending
delete queues virtchnl message to the CP and calls the 'vport_stop' flow
which disables the queues, vport etc. New set of queues are requested to
the CP and reconfigure the queue context by calling the 'vport_open'
flow. Soft reset flow also adjusts the number of vectors associated to a
vport if .set_channels is called.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:24 -07:00
Joshua Hay a5ab9ee0df idpf: add singleq start_xmit and napi poll
Add the start_xmit, TX and RX napi poll support for the single queue
model. Unlike split queue model, single queue uses same queue to post
buffer descriptors and completed descriptors.

Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:24 -07:00
Alan Brady 3a8845af66 idpf: add RX splitq napi poll support
Add support to handle interrupts for the RX completion queue and
RX buffer queue. When the interrupt fires on RX completion queue,
process the RX descriptors that are received. Allocate and prepare
the SKB with the RX packet info, for both data and header buffer.

IDPF uses software maintained refill queues to manage buffers between
RX queue producer and the buffer queue consumer. They are required in
order to maintain a lockless buffer management system and are strictly
software only constructs. Instead of updating the RX buffer queue tail
with available buffers right after the clean routine, it posts the
buffer ids to the refill queues, only to post them to the HW later.

If the generic receive offload (GRO) is enabled in the capabilities
and turned on by default or via ethtool, then HW performs the
packet coalescing if certain criteria are met by the incoming
packets and updates the RX descriptor. Similar to GRO, if generic
checksum is enabled, HW computes the checksum and updates the
respective fields in the descriptor. Add support to update the
SKB fields with the GRO and the generic checksum received.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:24 -07:00
Joshua Hay c2d548cad1 idpf: add TX splitq napi poll support
Add support to handle the interrupts for the TX completion queue and
process the various completion types.

In the flow scheduling mode, the driver processes primarily buffer
completions as well as descriptor completions occasionally. This mode
supports out of order TX completions. To do so, HW generates one buffer
completion per packet. Each of those completions contains the unique tag
provided during the TX encoding which is used to locate the packet either
on the TX buffer ring or in a hash table. The hash table is used to track
TX buffer information so the descriptor(s) for a given packet can be
reused while the driver is still waiting on the buffer completion(s).

Packets end up in the hash table in one of 2 ways: 1) a packet was
stashed during descriptor completion cleaning, or 2) because an out of
order buffer completion was processed. A descriptor completion arrives
only every so often and is primarily used to guarantee the TX descriptor
ring can be reused without having to wait on the individual buffer
completions. E.g. a descriptor completion for N+16 guarantees HW read all
of the descriptors for packets N through N+15, therefore all of the
buffers for packets N through N+15 are stashed into the hash table and the
descriptors can be reused for more TX packets. Similarly, a packet can be
stashed in the hash table because an out an order buffer completion was
processed. E.g. processing a buffer completion for packet N+3 implies that
HW read all of the descriptors for packets N through N+3 and they can be
reused. However, the HW did not do the DMA yet. The buffers for packets N
through N+2 cannot be freed, so they are stashed in the hash table.
In either case, the buffer completions will eventually be processed for
all of the stashed packets, and all of the buffers will be cleaned from
the hash table.

In queue based scheduling mode, the driver processes primarily descriptor
completions and cleans the TX ring the conventional way.

Finally, the driver triggers a TX queue drain after sending the disable
queues virtchnl message. When the HW completes the queue draining, it
sends the driver a queue marker packet completion. The driver determines
when all TX queues have been drained and proceeds with the disable flow.

With this, the driver can send TX packets and clean up the resources
properly.

Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:24 -07:00
Joshua Hay 6818c4d5b3 idpf: add splitq start_xmit
Add start_xmit support for split queue model. To start with, add the
necessary checks to linearize the skb if it uses more number of
buffers than the hardware supported limit. Stop the transmit queue
if there are no enough descriptors available for the skb to use or
if there we're going to potentially overrun the completion queue.
Finally prepare the descriptor with all the required
information and update the tail.

Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:24 -07:00
Pavan Kumar Linga d4d5587182 idpf: initialize interrupts and enable vport
To further continue 'vport open', initialize all the resources
required for the interrupts. To start with, initialize the
queue vector indices with the ones received from the device
Control Plane. Now that all the TX and RX queues are initialized,
map the RX descriptor and buffer queues as well as TX completion
queues to the allocated vectors. Initialize and enable the napi
handler for the napi polling. Finally, request the IRQs for the
interrupt vectors from the stack and setup the interrupt handler.

Once the interrupt init is done, send 'map queue vector', 'enable
queues' and 'enable vport' virtchnl messages to the CP to complete
the 'vport open' flow.

Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:24 -07:00
Alan Brady 95af467d9a idpf: configure resources for RX queues
Similar to the TX, RX also supports both single and split queue models.
In single queue model, the same descriptor queue is used by SW to post
buffer descriptors to HW and by HW to post completed descriptors
to SW. In split queue model, "RX buffer queues" are used to pass
descriptor buffers from SW to HW whereas "RX queues" are used to
post the descriptor completions i.e. descriptors that point to
completed buffers, from HW to SW. "RX queue group" is a set of
RX queues grouped together and will be serviced by a "RX buffer queue
group". IDPF supports 2 buffer queues i.e. large buffer (4KB) queue
and small buffer (2KB) queue per buffer queue group. HW uses large
buffers for 'hardware gro' feature and also if the packet size is
more than 2KB, if not 2KB buffers are used.

Add all the resources required for the RX queues initialization.
Allocate memory for the RX queue and RX buffer queue groups. Initialize
the software maintained refill queues for buffer management algorithm.

Same like the TX queues, initialize the queue parameters for the RX
queues and send the config RX queue virtchnl message to the device
Control Plane.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:24 -07:00
Alan Brady 1c325aac10 idpf: configure resources for TX queues
IDPF supports two queue models i.e. single queue which is a traditional
queueing model as well as split queue model. In single queue model,
the same descriptor queue is used by SW to post descriptors to the HW,
HW to post completed descriptors to SW. In split queue model, "TX Queues"
are used to pass buffers from SW to HW and "TX Completion Queues"
are used to post descriptor completions from HW to SW. Device supports
asymmetric ratio of TX queues to TX completion queues. Considering
this, queue group mechanism is used i.e. some TX queues are grouped
together which will be serviced by only one TX completion queue
per TX queue group.

Add all the resources required for the TX queues initialization.
To start with, allocate memory for the TX queue groups, TX queues and
TX completion queues. Then, allocate the descriptors for both TX and
TX completion queues, and bookkeeping buffers for TX queues alone.
Also, allocate queue vectors for the vport and initialize the TX queue
related fields for each queue vector.

Initialize the queue parameters such as q_id, q_type and tail register
offset with the info received from the device control plane (CP).
Once all the TX queues are configured, send config TX queue virtchnl
message to the CP with all the TX queue context information.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:23 -07:00
Pavan Kumar Linga ce1b75d063 idpf: add ptypes and MAC filter support
Add the virtchnl support to request the packet types. Parse the responses
received from CP and based on the protocol headers, populate the packet
type structure with necessary information. Initialize the MAC address
and add the virtchnl support to add and del MAC address.

Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:23 -07:00
Pavan Kumar Linga 0fe45467a1 idpf: add create vport and netdev configuration
Add the required support to create a vport by spawning
the init task. Once the vport is created, initialize and
allocate the resources needed for it. Configure and register
a netdev for each vport with all the features supported
by the device based on the capabilities received from the
device Control Plane. Spawn the init task till all the default
vports are created.

Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:23 -07:00
Pavan Kumar Linga 4930fbf419 idpf: add core init and interrupt request
As the mailbox is setup, add the necessary send and receive
mailbox message framework to support the virtchnl communication
between the driver and device Control Plane (CP).

Add the core initialization. To start with, driver confirms the
virtchnl version with the CP. Once that is done, it requests
and gets the required capabilities and resources needed such as
max vectors, queues etc.

Based on the vector information received in 'VIRTCHNL2_OP_GET_CAPS',
request the stack to allocate the required vectors. Finally add
the interrupt handling mechanism for the mailbox queue and enable
the interrupt.

Note: Checkpatch issues a warning about IDPF_FOREACH_VPORT_VC_STATE and
IDPF_GEN_STRING being complex macros and should be enclosed in parentheses
but it's not the case. They are never used as a statement and instead only
used to define the enum and array.

Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:23 -07:00
Joshua Hay 8077c72756 idpf: add controlq init and reset checks
At the end of the probe, initialize and schedule the event workqueue.
It calls the hard reset function where reset checks are done to find
if the device is out of the reset. Control queue initialization and
the necessary control queue support is added.

Introduce function pointers for the register operations which are
different between PF and VF devices.

Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:23 -07:00
Phani Burra e850efed5e idpf: add module register and probe functionality
Add the required support to register IDPF PCI driver, as well as
probe and remove call backs. Enable the PCI device and request
the kernel to reserve the memory resources that will be used by the
driver. Finally map the BAR0 address space.

Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:23 -07:00
Pavan Kumar Linga 0d7502a9b4 virtchnl: add virtchnl version 2 ops
Virtchnl version 1 is an interface used by the current generation of
foundational NICs to negotiate the capabilities and configure the
HW resources such as queues, vectors, RSS LUT, etc between the PF
and VF drivers. It is not extensible to enable new features supported
in the next generation of NICs/IPUs and to negotiate descriptor types,
packet types and register offsets.

To overcome the limitations of the existing interface, introduce
the virtchnl version 2 and add the necessary opcodes, structures,
definitions, and descriptor formats. The driver also learns the
data queue and other register offsets to use instead of hardcoding
them. The advantage of this approach is that it gives the flexibility
to modify the register offsets if needed, restrict the use of
certain descriptor types and negotiate the supported packet types.

Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Co-developed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 14:59:23 -07:00
Norbert Zulinski 7559d67242 iavf: Add ability to turn off CRC stripping for VF
Previously CRC stripping was always enabled for VF.

Now it is possible to turn off CRC stripping via ethtool:

    #ethtool -K <interface> rx-fcs on

To turn off CRC stripping, first VLAN stripping must be disabled:

    #ethtool -K <interface> rx-vlan-offload off

if any VLAN interfaces exists, otherwise VLAN stripping will be turned
off by the driver.

In iavf_configure_queues add check if CRC stripping is enabled for
VF, if it's enabled then set crc_disabled to false on every VF's
queue. In iavf_set_features add check if CRC stripping setting was
changed then schedule reset.

Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 08:59:14 -07:00
Haiyue Wang 7bd48d8d41 ice: Check CRC strip requirement for VLAN strip
When VLAN strip is enabled, the CRC strip must not be disabled. And when
the CRC strip is disabled, the VLAN strip should not be enabled.

The driver needs to check CRC strip disable setting parameter before
configuring the Rx/Tx queues, otherwise, in current error handling,
the already set Tx queue context doesn't roll back correctly, it will
cause the Tx queue setup failure next time:
"Failed to set LAN Tx queue context"

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 08:58:23 -07:00
Haiyue Wang 730cb74181 ice: Support FCS/CRC strip disable for VF
To support CRC strip enable/disable functionality, VF needs the explicit
request VIRTCHNL_VF_OFFLOAD_CRC offload. Then according to crc_disable
flag of Rx queue configuration information to set up the queue context.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13 08:57:28 -07:00
Corinna Vinschen bc6ed2fa24 igb: clean up in all error paths when enabling SR-IOV
After commit 50f303496d ("igb: Enable SR-IOV after reinit"), removing
the igb module could hang or crash (depending on the machine) when the
module has been loaded with the max_vfs parameter set to some value != 0.

In case of one test machine with a dual port 82580, this hang occurred:

[  232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
[  233.093257] igb 0000:41:00.1: IOV Disabled
[  233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
[  233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[  233.352248] igb 0000:41:00.0:   device [8086:1516] error status/mask=00100000
[  233.361088] igb 0000:41:00.0:    [20] UnsupReq               (First)
[  233.368183] igb 0000:41:00.0: AER:   TLP Header: 40000001 0000040f cdbfc00c c
[  233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[  233.388779] igb 0000:41:00.1:   device [8086:1516] error status/mask=00100000
[  233.397629] igb 0000:41:00.1:    [20] UnsupReq               (First)
[  233.404736] igb 0000:41:00.1: AER:   TLP Header: 40000001 0000040f cdbfc00c c
[  233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
[  233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
[  233.546197] pcieport 0000:40:01.0: AER: device recovery failed
[  234.157244] igb 0000:41:00.0: IOV Disabled
[  371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
[  371.627489]       Not tainted 6.4.0-dirty #2
[  371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
[  371.641000] task:irq/35-aerdrv   state:D stack:0     pid:257   ppid:2      f0
[  371.650330] Call Trace:
[  371.653061]  <TASK>
[  371.655407]  __schedule+0x20e/0x660
[  371.659313]  schedule+0x5a/0xd0
[  371.662824]  schedule_preempt_disabled+0x11/0x20
[  371.667983]  __mutex_lock.constprop.0+0x372/0x6c0
[  371.673237]  ? __pfx_aer_root_reset+0x10/0x10
[  371.678105]  report_error_detected+0x25/0x1c0
[  371.682974]  ? __pfx_report_normal_detected+0x10/0x10
[  371.688618]  pci_walk_bus+0x72/0x90
[  371.692519]  pcie_do_recovery+0xb2/0x330
[  371.696899]  aer_process_err_devices+0x117/0x170
[  371.702055]  aer_isr+0x1c0/0x1e0
[  371.705661]  ? __set_cpus_allowed_ptr+0x54/0xa0
[  371.710723]  ? __pfx_irq_thread_fn+0x10/0x10
[  371.715496]  irq_thread_fn+0x20/0x60
[  371.719491]  irq_thread+0xe6/0x1b0
[  371.723291]  ? __pfx_irq_thread_dtor+0x10/0x10
[  371.728255]  ? __pfx_irq_thread+0x10/0x10
[  371.732731]  kthread+0xe2/0x110
[  371.736243]  ? __pfx_kthread+0x10/0x10
[  371.740430]  ret_from_fork+0x2c/0x50
[  371.744428]  </TASK>

The reproducer was a simple script:

  #!/bin/sh
  for i in `seq 1 5`; do
    modprobe -rv igb
    modprobe -v igb max_vfs=1
    sleep 1
    modprobe -rv igb
  done

It turned out that this could only be reproduce on 82580 (quad and
dual-port), but not on 82576, i350 and i210.  Further debugging showed
that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because
dev->is_physfn is 0 on 82580.

Prior to commit 50f303496d ("igb: Enable SR-IOV after reinit"),
igb_enable_sriov() jumped into the "err_out" cleanup branch.  After this
commit it only returned the error code.

So the cleanup didn't take place, and the incorrect VF setup in the
igb_adapter structure fooled the igb driver into assuming that VFs have
been set up where no VF actually existed.

Fix this problem by cleaning up again if pci_enable_sriov() fails.

Fixes: 50f303496d ("igb: Enable SR-IOV after reinit")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:24:29 +01:00
Vadim Fedorenko 3c44191dd7 ixgbe: fix timestamp configuration code
The commit in fixes introduced flags to control the status of hardware
configuration while processing packets. At the same time another structure
is used to provide configuration of timestamper to user-space applications.
The way it was coded makes this structures go out of sync easily. The
repro is easy for 82599 chips:

[root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 1
rx_filter 12

The eth0 device is properly configured to timestamp any PTPv2 events.

[root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1
current settings:
tx_type 1
rx_filter 12
SIOCSHWTSTAMP failed: Numerical result out of range
The requested time stamping mode is not supported by the hardware.

The error is properly returned because HW doesn't support all packets
timestamping. But the adapter->flags is cleared of timestamp flags
even though no HW configuration was done. From that point no RX timestamps
are received by user-space application. But configuration shows good
values:

[root@hostname ~]# hwstamp_ctl -i eth0
current settings:
tx_type 1
rx_filter 12

Fix the issue by applying new flags only when the HW was actually
configured.

Fixes: a9763f3cb5 ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:23:46 +01:00
Brett Creeley 221465de6b iavf: Fix promiscuous mode configuration flow messages
Currently when configuring promiscuous mode on the AVF we detect a
change in the netdev->flags. We use IFF_PROMISC and IFF_ALLMULTI to
determine whether or not we need to request/release promiscuous mode
and/or multicast promiscuous mode. The problem is that the AQ calls for
setting/clearing promiscuous/multicast mode are treated separately. This
leads to a case where we can trigger two promiscuous mode AQ calls in
a row with the incorrect state. To fix this make a few changes.

Use IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE instead of the previous
IAVF_FLAG_AQ_[REQUEST|RELEASE]_[PROMISC|ALLMULTI] flags.

In iavf_set_rx_mode() detect if there is a change in the
netdev->flags in comparison with adapter->flags and set the
IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE aq_required bit. Then in
iavf_process_aq_command() only check for IAVF_FLAG_CONFIGURE_PROMISC_MODE
and call iavf_set_promiscuous() if it's set.

In iavf_set_promiscuous() check again to see which (if any) promiscuous
mode bits have changed when comparing the netdev->flags with the
adapter->flags. Use this to set the flags which get sent to the PF
driver.

Add a spinlock that is used for updating current_netdev_promisc_flags
and only allows one promiscuous mode AQ at a time.

[1] Fixes the fact that we will only have one AQ call in the aq_required
queue at any one time.

[2] Streamlines the change in promiscuous mode to only set one AQ
required bit.

[3] This allows us to keep track of the current state of the flags and
also makes it so we can take the most recent netdev->flags promiscuous
mode state.

[4] This fixes the problem where a change in the netdev->flags can cause
IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE to be set in iavf_set_rx_mode(),
but cleared in iavf_set_promiscuous() before the change is ever made via
AQ call.

Fixes: 47d3483988 ("i40evf: Add driver support for promiscuous mode")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-11 08:53:20 -07:00
Andrii Staikov 5ca636d927 i40e: fix potential memory leaks in i40e_remove()
Instead of freeing memory of a single VSI, make sure
the memory for all VSIs is cleared before releasing VSIs.
Add releasing of their resources in a loop with the iteration
number equal to the number of allocated VSIs.

Fixes: 41c445ff0f ("i40e: main driver core")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-11 08:53:08 -07:00
Olga Zaborska 6319685bdc igb: Change IGB_MIN to allow set rx/tx value between 64 and 80
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igb devices can use as low as 64 descriptors.
This change will unify igb with other drivers.
Based on commit 7b1be1987c ("e1000e: lower ring minimum size to 64")

Fixes: 9d5c824399 ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Olga Zaborska <olga.zaborska@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-05 10:52:25 -07:00
Olga Zaborska 8360717524 igbvf: Change IGBVF_MIN to allow set rx/tx value between 64 and 80
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igbvf devices can use as low as 64 descriptors.
This change will unify igbvf with other drivers.
Based on commit 7b1be1987c ("e1000e: lower ring minimum size to 64")

Fixes: d4e0fe01a3 ("igbvf: add new driver to support 82576 virtual functions")
Signed-off-by: Olga Zaborska <olga.zaborska@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-05 10:52:08 -07:00
Olga Zaborska 5aa4827971 igc: Change IGC_MIN to allow set rx/tx value between 64 and 80
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igc devices can use as low as 64 descriptors.
This change will unify igc with other drivers.
Based on commit 7b1be1987c ("e1000e: lower ring minimum size to 64")

Fixes: 0507ef8a03 ("igc: Add transmit and receive fastpath and interrupt handlers")
Signed-off-by: Olga Zaborska <olga.zaborska@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-05 10:52:00 -07:00
Corinna Vinschen fa09bc40b2 igb: disable virtualization features on 82580
Disable virtualization features on 82580 just as on i210/i211.
This avoids that virt functions are acidentally called on 82850.

Fixes: 55cac248ca ("igb: Add full support for 82580 devices")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-03 14:07:40 +01:00
Paolo Abeni c873512ef3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.6 net-next PR.

No conflicts.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-29 07:44:56 +02:00
Radoslaw Tyl bb5ed01cd2 igb: set max size RX buffer when store bad packet is enabled
Increase the RX buffer size to 3K when the SBP bit is on. The size of
the RX buffer determines the number of pages allocated which may not
be sufficient for receive frames larger than the set MTU size.

Cc: stable@vger.kernel.org
Fixes: 89eaefb61d ("igb: Support RX-ALL feature flag.")
Reported-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28 07:02:44 +01:00
Jacob Keller 0aacec49c2 ice: avoid executing commands on other ports when driving sync
The ice hardware has a synchronization mechanism used to drive the
simultaneous application of commands on both PHY ports and the source timer
in the MAC.

When issuing a sync via ice_ptp_exec_tmr_cmd(), the hardware will
simultaneously apply the commands programmed for the main timer and each
PHY port. Neither the main timer command register, nor the PHY port command
registers auto clear on command execution.

During the execution of a timer command intended for a single port on E822
devices, such as those used to configure a PHY during link up, the driver
is not correctly clearing the previous commands.

This results in unintentionally executing the last programmed command on
the main timer and other PHY ports whenever performing reconfiguration on
E822 ports after link up. This results in unintended side effects on other
timers, depending on what command was previously programmed.

To fix this, the driver must ensure that the main timer and all other PHY
ports are properly initialized to perform no action.

The enumeration for timer commands does not include an enumeration value
for doing nothing. Introduce ICE_PTP_NOP for this purpose. When writing a
timer command to hardware, leave the command bits set to zero which
indicates that no operation should be performed on that port.

Modify ice_ptp_one_port_cmd() to always initialize all ports. For all ports
other than the one being configured, write their timer command register to
ICE_PTP_NOP. This ensures that no side effect happens on the timer command.

To fix this for the PHY ports, modify ice_ptp_one_port_cmd() to always
initialize all other ports to ICE_PTP_NOP. This ensures that no side
effects happen on the other ports.

Call ice_ptp_src_cmd() with a command value if ICE_PTP_NOP in
ice_sync_phy_timer_e822() and ice_start_phy_timer_e822().

With both of these changes, the driver should no longer execute a stale
command on the main timer or another PHY port when reconfiguring one of the
PHY ports after link up.

Fixes: 3a7496234d ("ice: implement basic E822 PTP support")
Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25 10:06:27 +01:00
Sasha Neftin 1fe4f45ea4 e1000e: Add support for the next LOM generation
Add devices IDs for the next LOM generations that will be available on the
next Intel Client platforms. This patch provides the initial support for
these devices.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24 12:55:25 -07:00
Sasha Neftin 6b8aa753a9 igc: Decrease PTM short interval from 10 us to 1 us
With the 10us interval, we were seeing PTM transactions take around 12us.
Hardware team suggested this interval could be lowered to 1us which was
confirmed with PCIe sniffer. With the 1us interval, PTM dialogs took
around 2us.

Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24 12:55:25 -07:00
Vinicius Costa Gomes 3ed247e789 igc: Add support for multiple in-flight TX timestamps
Add support for using the four sets of timestamping registers that
i225/i226 have available for TX.

In some workloads, where multiple applications request hardware
transmission timestamps, it was possible that some of those requests
were denied because the only in use register was already occupied.

This is also in preparation to future support for hardware
timestamping with multiple PTP domains. With multiple domains chances
of multiple TX timestamps being requested at the same time increase.

Before:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37
               |          responses            |     TX timestamp offset (ns)
rate   clients |  lost invalid   basic  xleave |    min    mean     max stddev
1000       100   0.00%   0.00%   0.00% 100.00%       +1     +41     +73     13
1500       150   0.00%   0.00%   0.00% 100.00%       +9     +49     +87     15
2250       225   0.00%   0.00%   0.00% 100.00%       +9     +42     +79     13
3375       337   0.00%   0.00%   0.00% 100.00%      +11     +46     +81     13
5062       506   0.00%   0.00%   0.00% 100.00%       +7     +44     +80     13
7593       759   0.00%   0.00%   0.00% 100.00%       +9     +44     +79     12
11389     1138   0.00%   0.00%   0.00% 100.00%      +14     +51     +87     13
17083     1708   0.00%   0.00%   0.00% 100.00%       +1     +41     +80     14
25624     2562   0.00%   0.00%   0.00% 100.00%      +11     +50   +5107     51
38436     3843   0.00%   0.00%   0.00% 100.00%       -2     +36   +7843     38
57654     5765   0.00%   0.00%   0.00% 100.00%       +4     +42  +10503     69
86481     8648   0.00%   0.00%   0.00% 100.00%      +11     +54   +5492     65
129721   12972   0.00%   0.00%   0.00% 100.00%      +31   +2680   +6942   2606
194581   16384  16.79%   0.00%   0.87%  82.34%      +73   +4444  +15879   3116
291871   16384  35.05%   0.00%   1.53%  63.42%     +188   +5381  +17019   3035
437806   16384  54.95%   0.00%   2.55%  42.50%     +233   +6302  +13885   2846

After:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37
               |          responses            |     TX timestamp offset (ns)
rate   clients |  lost invalid   basic  xleave |    min    mean     max stddev
1000       100   0.00%   0.00%   0.00% 100.00%      -20     +12     +43     13
1500       150   0.00%   0.00%   0.00% 100.00%      -23     +18     +57     14
2250       225   0.00%   0.00%   0.00% 100.00%       -2     +33     +67     13
3375       337   0.00%   0.00%   0.00% 100.00%       +1     +38     +76     13
5062       506   0.00%   0.00%   0.00% 100.00%       +9     +52     +93     14
7593       759   0.00%   0.00%   0.00% 100.00%      +11     +47     +82     13
11389     1138   0.00%   0.00%   0.00% 100.00%       -9     +27     +74     13
17083     1708   0.00%   0.00%   0.00% 100.00%      -13     +25     +66     14
25624     2562   0.00%   0.00%   0.00% 100.00%       -8     +28     +65     13
38436     3843   0.00%   0.00%   0.00% 100.00%      -13     +28     +69     13
57654     5765   0.00%   0.00%   0.00% 100.00%      -11     +32     +71     14
86481     8648   0.00%   0.00%   0.00% 100.00%       +2     +44     +83     14
129721   12972  15.36%   0.00%   0.35%  84.29%       -2   +2248  +22907   4252
194581   16384  42.98%   0.00%   1.98%  55.04%       -4   +5278  +65039   5856
291871   16384  54.33%   0.00%   2.21%  43.46%       -3   +6306  +22608   5665

We can see that with 4 registers, as expected, we are able to handle a
increasing number of requests more consistently, but as soon as all
registers are in use, the decrease in quality of service happens in a
sharp step.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24 12:55:24 -07:00
Jakub Kicinski 57ce6427e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

include/net/inet_sock.h
  f866fbc842 ("ipv4: fix data-races around inet->inet_id")
  c274af2242 ("inet: introduce inet->inet_flags")
https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/

Adjacent changes:

drivers/net/bonding/bond_alb.c
  e74216b8de ("bonding: fix macvlan over alb bond support")
  f11e5bd159 ("bonding: support balance-alb with openvswitch")

drivers/net/ethernet/broadcom/bgmac.c
  d6499f0b7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()")
  23a14488ea ("net: bgmac: Fix return value check for fixed_phy_register()")

drivers/net/ethernet/broadcom/genet/bcmmii.c
  32bbe64a13 ("net: bcmgenet: Fix return value check for fixed_phy_register()")
  acf50d1adb ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()")

net/sctp/socket.c
  f866fbc842 ("ipv4: fix data-races around inet->inet_id")
  b09bde5c35 ("inet: move inet->mc_loop to inet->inet_frags")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24 10:51:39 -07:00
Andrii Staikov 9525a3c38a i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
Add check for pf->vf not being NULL before dereferencing
pf->vf[vsi->vf_id] in updating VSI filter sync.
Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
in the condition for clearing promisc mode bit.

Fixes: c87c938f62 ("i40e: Add VF VLAN pruning")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 11:48:04 +01:00
Sasha Neftin de43975721 igc: Fix the typo in the PTM Control macro
The IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM
requests. The bit resolution of this field is six bits. That bit five was
missing in the mask. This patch comes to correct the typo in the
IGC_PTM_CTRL_SHRT_CYC macro.

Fixes: a90ec84837 ("igc: Add support for PTP getcrosststamp()")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:32:15 -07:00
Alessio Igor Bogani b888c510f7 igb: Avoid starting unnecessary workqueues
If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting
PTP related workqueues.

In this way we can fix this:
 BUG: unable to handle page fault for address: ffffc9000440b6f8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0
 Oops: 0000 [#1] PREEMPT SMP
 [...]
 Workqueue: events igb_ptp_overflow_check
 RIP: 0010:igb_rd32+0x1f/0x60
 [...]
 Call Trace:
  igb_ptp_read_82580+0x20/0x50
  timecounter_read+0x15/0x60
  igb_ptp_overflow_check+0x1a/0x50
  process_one_work+0x1cb/0x3c0
  worker_thread+0x53/0x3f0
  ? rescuer_thread+0x370/0x370
  kthread+0x142/0x160
  ? kthread_associate_blkcg+0xc0/0xc0
  ret_from_fork+0x1f/0x30

Fixes: 1f6e8178d6 ("igb: Prevent dropped Tx timestamps via work items and interrupts.")
Fixes: d339b13316 ("igb: add PTP Hardware Clock code")
Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 17:24:28 -07:00
Petr Oros 67f6317dfa ice: Fix NULL pointer deref during VF reset
During stress test with attaching and detaching VF from KVM and
simultaneously changing VFs spoofcheck and trust there was a
NULL pointer dereference in ice_reset_vf that VF's VSI is null.

More than one instance of ice_reset_vf() can be running at a given
time. When we rebuild the VSI in ice_reset_vf, another reset can be
triaged from ice_service_task. In this case we can access the currently
uninitialized VSI and cause panic. The window for this racing condition
has been around for a long time but it's much worse after commit
227bf4500a ("ice: move VSI delete outside deconfig") because
the reset runs faster. ice_reset_vf() using vf->cfg_lock and when
we move this lock before accessing to the VF VSI, we can fix
BUG for all cases.

Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
in ice_vsi_stop_all_rx_rings()

With our reproducer, we can hit BUG:
~8h before commit 227bf4500a ("ice: move VSI delete outside deconfig").
~20m after commit 227bf4500a ("ice: move VSI delete outside deconfig").
After this fix we are not able to reproduce it after ~48h

There was commit cf90b74341 ("ice: Fix call trace with null VSI during
VF reset") which also tried to fix this issue, but it was only
partially resolved and the bug still exists.

[ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 6420.665382] #PF: supervisor read access in kernel mode
[ 6420.670521] #PF: error_code(0x0000) - not-present page
[ 6420.675659] PGD 0
[ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
[ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
[ 6420.698729] Workqueue: ice ice_service_task [ice]
[ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
[ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
[ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
[ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
[ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
[ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
[ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
[ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
[ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
[ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
[ 6420.783742] FS:  0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
[ 6420.791829] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
[ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
[ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6420.811841] PKRU: 55555554
[ 6420.811842] Call Trace:
[ 6420.811843]  <TASK>
[ 6420.811844]  ice_reset_vf+0x9a/0x450 [ice]
[ 6420.811876]  ice_process_vflr_event+0x8f/0xc0 [ice]
[ 6420.841343]  ice_service_task+0x23b/0x600 [ice]
[ 6420.845884]  ? __schedule+0x212/0x550
[ 6420.849550]  process_one_work+0x1e2/0x3b0
[ 6420.853563]  ? rescuer_thread+0x390/0x390
[ 6420.857577]  worker_thread+0x50/0x3a0
[ 6420.861242]  ? rescuer_thread+0x390/0x390
[ 6420.865253]  kthread+0xdd/0x100
[ 6420.868400]  ? kthread_complete_and_exit+0x20/0x20
[ 6420.873194]  ret_from_fork+0x1f/0x30
[ 6420.876774]  </TASK>
[ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk
 r

Fixes: efe4186000 ("ice: Fix memory corruption in VF driver")
Fixes: f23df5220d ("ice: Fix spurious interrupt during removal of trusted VF")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-21 09:27:02 -07:00
Petr Oros 0ecff05e6c Revert "ice: Fix ice VF reset during iavf initialization"
This reverts commit 7255355a06.

After this commit we are not able to attach VF to VM:
virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
error: Failed to attach interface
error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable

ice_check_vf_ready_for_cfg() already contain waiting for reset.
New condition in ice_check_vf_ready_for_reset() causing only problems.

Fixes: 7255355a06 ("ice: Fix ice VF reset during iavf initialization")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-21 09:25:59 -07:00
Jesse Brandeburg 10083aef78 ice: fix receive buffer size miscalculation
The driver is misconfiguring the hardware for some values of MTU such that
it could use multiple descriptors to receive a packet when it could have
simply used one.

Change the driver to use a round-up instead of the result of a shift, as
the shift can truncate the lower bits of the size, and result in the
problem noted above. It also aligns this driver with similar code in i40e.

The insidiousness of this problem is that everything works with the wrong
size, it's just not working as well as it could, as some MTU sizes end up
using two or more descriptors, and there is no way to tell that is
happening without looking at ice_trace or a bus analyzer.

Fixes: efc2214b60 ("ice: Add support for XDP")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-21 09:19:47 -07:00
Jakub Kicinski c6cfc6cd76 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-08-17 (ice)

This series contains updates to ice driver only.

Jan removes unused functions and refactors code to make, possible,
functions static.

Jake rearranges some functions to be logically grouped.

Marcin removes an unnecessary call to disable VLAN stripping.

Yang Yingliang utilizes list_for_each_entry() helper for a couple list
traversals.

Przemek removes some parameters from ice_aq_alloc_free_res() which were
always the same and reworks ice_aq_wait_for_event() to reduce chance of
race.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: split ice_aq_wait_for_event() func into two
  ice: embed &ice_rq_event_info event into struct ice_aq_task
  ice: ice_aq_check_events: fix off-by-one check when filling buffer
  ice: drop two params from ice_aq_alloc_free_res()
  ice: use list_for_each_entry() helper
  ice: Remove redundant VSI configuration in eswitch setup
  ice: move E810T functions to before device agnostic ones
  ice: refactor ice_vsi_is_vlan_pruning_ena
  ice: refactor ice_ptp_hw to make functions static
  ice: refactor ice_sched to make functions static
  ice: Utilize assign_bit() helper
  ice: refactor ice_vf_lib to make functions static
  ice: refactor ice_lib to make functions static
  ice: refactor ice_ddp to make functions static
  ice: remove unused methods
====================

Link: https://lore.kernel.org/r/20230817212239.2601543-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:09:02 -07:00
Jakub Kicinski 74f9d556f9 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
virtchnl: fix fake 1-elem arrays

Alexander Lobakin says:

6.5-rc1 started spitting warning splats when composing virtchnl
messages, precisely on virtchnl_rss_key and virtchnl_lut:

[   84.167709] memcpy: detected field-spanning write (size 52) of single
field "vrk->key" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1095
(size 1)
[   84.169915] WARNING: CPU: 3 PID: 11 at drivers/net/ethernet/intel/
iavf/iavf_virtchnl.c:1095 iavf_set_rss_key+0x123/0x140 [iavf]
...
[   84.191982] Call Trace:
[   84.192439]  <TASK>
[   84.192900]  ? __warn+0xc9/0x1a0
[   84.193353]  ? iavf_set_rss_key+0x123/0x140 [iavf]
[   84.193818]  ? report_bug+0x12c/0x1b0
[   84.194266]  ? handle_bug+0x42/0x70
[   84.194714]  ? exc_invalid_op+0x1a/0x50
[   84.195149]  ? asm_exc_invalid_op+0x1a/0x20
[   84.195592]  ? iavf_set_rss_key+0x123/0x140 [iavf]
[   84.196033]  iavf_watchdog_task+0xb0c/0xe00 [iavf]
...
[   84.225476] memcpy: detected field-spanning write (size 64) of single
field "vrl->lut" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1127
(size 1)
[   84.227190] WARNING: CPU: 27 PID: 1044 at drivers/net/ethernet/intel/
iavf/iavf_virtchnl.c:1127 iavf_set_rss_lut+0x123/0x140 [iavf]
...
[   84.246601] Call Trace:
[   84.247228]  <TASK>
[   84.247840]  ? __warn+0xc9/0x1a0
[   84.248263]  ? iavf_set_rss_lut+0x123/0x140 [iavf]
[   84.248698]  ? report_bug+0x12c/0x1b0
[   84.249122]  ? handle_bug+0x42/0x70
[   84.249549]  ? exc_invalid_op+0x1a/0x50
[   84.249970]  ? asm_exc_invalid_op+0x1a/0x20
[   84.250390]  ? iavf_set_rss_lut+0x123/0x140 [iavf]
[   84.250820]  iavf_watchdog_task+0xb16/0xe00 [iavf]

Gustavo already tried to fix those back in 2021[0][1]. Unfortunately,
a VM can run a different kernel than the host, meaning that those
structures are sorta ABI.
However, it is possible to have proper flex arrays + struct_size()
calculations and still send the very same messages with the same sizes.
The common rule is:

elem[1] -> elem[]
size = struct_size() + <difference between the old and the new msg size>

The "old" size in the current code is calculated 3 different ways for
10 virtchnl structures total. Each commit addresses one of the ways
cumulatively instead of per-structure.

I was planning to send it to -net initially, but given that virtchnl was
renamed from i40evf and got some fat style cleanup commits in the past,
it's not very straightforward to even pick appropriate SHAs, not
speaking of automatic portability. I may send manual backports for
a couple of the latest supported kernels later on if anyone needs it
at all.

[0] https://lore.kernel.org/all/20210525230912.GA175802@embeddedor
[1] https://lore.kernel.org/all/20210525231851.GA176647@embeddedor

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  virtchnl: fix fake 1-elem arrays for structures allocated as `nents`
  virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1`
  virtchnl: fix fake 1-elem arrays in structs allocated as `nents + 1` - 1
====================

Link: https://lore.kernel.org/r/20230816210657.1326772-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 15:22:05 -07:00
Jakub Kicinski 7ff57803d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/sfc/tc.c
  fa165e1949 ("sfc: don't unregister flow_indr if it was never registered")
  3bf969e88a ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 12:44:56 -07:00
Jakub Kicinski 820a38d8f2 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-08-16 (iavf, i40e)

This series contains updates to iavf and i40e drivers.

Piotr adds checks for unsupported Flow Director rules on iavf.

Andrii replaces incorrect 'write' messaging on read operations for i40e.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  i40e: fix misleading debug logs
  iavf: fix FDIR rule fields masks validation
====================

Link: https://lore.kernel.org/r/20230816193308.1307535-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 14:35:34 -07:00
Przemek Kitszel fb9840c4ec ice: split ice_aq_wait_for_event() func into two
Mitigate race between registering on wait list and receiving
AQ Response from FW.

ice_aq_prep_for_event() should be called before sending AQ command,
ice_aq_wait_for_event() should be called after sending AQ command,
to wait for AQ Response.

Please note, that this was found by reading the code,
an actual race has not yet materialized.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17 13:59:46 -07:00
Przemek Kitszel b214b98a7f ice: embed &ice_rq_event_info event into struct ice_aq_task
Expose struct ice_aq_task to callers,
what takes burden of memory ownership out from AQ-wait family of functions,
and reduces need for heap-based allocations.

Embed struct ice_rq_event_info event into struct ice_aq_task
(instead of it being a ptr) to remove some more code from the callers.

Subsequent commit will improve more based on this one.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17 13:59:46 -07:00