linux

Author	SHA1	Message	Date
Nicolas Dichtel	3632679d9e	ipv{4,6}/raw: fix output xfrm lookup wrt protocol With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the protocol field of the flow structure, build by raw_sendmsg() / rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy lookup when some policies are defined with a protocol in the selector. For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to specify the protocol. Just accept all values for IPPROTO_RAW socket. For ipv4, the sin_port field of 'struct sockaddr_in' could not be used without breaking backward compatibility (the value of this field was never checked). Let's add a new kind of control message, so that the userland could specify which protocol is used. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") CC: stable@vger.kernel.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-05-23 15:38:59 +02:00
Takashi Iwai	03a58514d4	Merge branch 'topic/midi20' into for-next This is a (largish) patch set for adding the support of MIDI 2.0 functionality, mainly targeted for USB devices. MIDI 2.0 is a complete overhaul of the 40-years old MIDI 1.0. Unlike MIDI 1.0 byte stream, MIDI 2.0 uses packets in 32bit words for Universal MIDI Packet (UMP) protocol. It supports both MIDI 1.0 commands for compatibility and the extended MIDI 2.0 commands for higher resolutions and more functions. For supporting the UMP, the patch set extends the existing ALSA rawmidi and sequencer interfaces, and adds the USB MIDI 2.0 support to the standard USB-audio driver. The rawmidi for UMP has a different device name (/dev/snd/umpCD) and it reads/writes UMP packet data in 32bit CPU-native endianness. For the old MIDI 1.0 applications, the legacy rawmidi interface is provided, too. As default, USB-audio driver will take the alternate setting for MIDI 2.0 interface, and the compatibility with MIDI 1.0 is provided via the rawmidi common layer. However, user may let the driver falling back to the old MIDI 1.0 interface by a module option, too. A UMP-capable rawmidi device can create the corresponding ALSA sequencer client(s) to support the UMP Endpoint and UMP Group connections. As a nature of ALSA sequencer, arbitrary connections between clients/ports are allowed, and the ALSA sequencer core performs the automatic conversions for the connections between a new UMP sequencer client and a legacy MIDI 1.0 sequencer client. It allows the existing application to use MIDI 2.0 devices without changes. The MIDI-CI, which is another major extension in MIDI 2.0, isn't covered by this patch set. It would be implemented rather in user-space. Roughly speaking, the first half of this patch set is for extending the rawmidi and USB-audio, and the second half is for extending the ALSA sequencer interface. The patch set is based on 6.4-rc2 kernel, but all patches can be cleanly applicable on 6.2 and 6.3 kernels, too (while 6.1 and older kernels would need minor adjustment for uapi header changes). The updates for alsa-lib and alsa-utils will follow shortly later. The author thanks members of MIDI Association OS/API Working Group, especially Andrew Mee, for great helps for the initial design and debugging / testing the drivers. Link: https://lore.kernel.org/r/20230523075358.9672-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 14:14:47 +02:00
Takashi Iwai	d2b7060777	ALSA: seq: Add UMP group filter Add a new filter bitmap for UMP groups for reducing the unnecessary read/write when the client is connected to UMP EP seq port. The new group_filter field contains the bitmap for the groups, i.e. when the bit is set, the corresponding group is filtered out and the messages to that group won't be delivered. The filter bitmap consists of each bit of 1-based UMP Group number. The bit 0 is reserved for the future use. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-37-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:38 +02:00
Takashi Iwai	d2d247e35e	ALSA: seq: Add ioctls for client UMP info query and setup Add new ioctls for sequencer clients to query and set the UMP endpoint and block information. As a sequencer client corresponds to a UMP Endpoint, one UMP Endpoint information can be assigned at most to a single sequencer client while multiple UMP block infos can be assigned by passing the type with the offset of block id (i.e. type = block_id + 1). For the kernel client, only SNDRV_SEQ_IOCTL_GET_CLIENT_UMP_INFO is allowed. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-35-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:36 +02:00
Takashi Iwai	81fd444aa3	ALSA: seq: Bind UMP device This patch introduces a new ALSA sequencer client for the kernel UMP object, snd-seq-ump-client. It's a UMP version of snd-seq-midi driver, while this driver creates a sequencer client per UMP endpoint which contains (fixed) 16 ports. The UMP rawmidi device is opened in APPEND mode for output, so that multiple sequencer clients can share the same UMP endpoint, as well as the legacy UMP rawmidi devices that are opened in APPEND mode, too. For input, on the other hand, the incoming data is processed on the fly in the dedicated hook, hence it doesn't open a rawmidi device. The UMP packet group is updated upon delivery depending on the target sequencer port (which corresponds to the actual UMP group). Each sequencer port sets a new port type bit, SNDRV_SEQ_PORT_TYPE_MIDI_UMP, in addition to the other standard types for MIDI. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-33-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:33 +02:00
Takashi Iwai	329ffe11a0	ALSA: seq: Allow suppressing UMP conversions A sequencer client like seq_dummy rather doesn't want to convert UMP events but receives / sends as is. Add a new event filter flag to suppress the automatic UMP conversion and applies accordingly. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-32-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:32 +02:00
Takashi Iwai	a3ca3b3080	ALSA: seq: Add UMP group number to snd_seq_port_info Add yet more new filed "ump_group" to snd_seq_port_info for specifying the associated UMP Group number for each sequencer port. This will be referred in the upcoming automatic UMP conversion in sequencer core. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-30-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:27 +02:00
Takashi Iwai	ff166a9d19	ALSA: seq: Add port direction to snd_seq_port_info Add a new field "direction" to snd_seq_port_info for allowing a client to tell the expected direction of the port access. A port might still allow subscriptions for read/write (e.g. for MIDI-CI) even if the primary usage of the port is a single direction (either input or output only). This new "direction" field can help to indicate such cases. When the direction is unspecified at creating a port and the port has either read or write capability, the corresponding direction bits are set automatically as default. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-29-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:25 +02:00
Takashi Iwai	177ccf811d	ALSA: seq: Support MIDI 2.0 UMP Endpoint port This is an extension to ALSA sequencer infrastructure to support the MIDI 2.0 UMP Endpoint port. It's a "catch-all" port that is supposed to be present for each UMP Endpoint. When this port is read via subscription, it sends any events from all ports (UMP Groups) found in the same client. A UMP Endpoint port can be created with the new capability bit SNDRV_SEQ_PORT_CAP_UMP_ENDPOINT. Although the port assignment isn't strictly defined, it should be the port number 0. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-28-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:24 +02:00
Takashi Iwai	74661932ac	ALSA: seq: Add port inactive flag This extends the ALSA sequencer port capability bit to indicate the "inactive" flag. When this flag is set, the port is essentially invisible, and doesn't appear in the port query ioctls, while the direct access and the connection to this port are still allowed. The active/inactive state can be flipped dynamically, so that it can be visible at any time later. This feature is introduced basically for UMP; some UMP Groups in a UMP Block may be unassigned, hence those are practically invisible. On ALSA sequencer, the corresponding sequencer ports will get this new "inactive" flag to indicate the invisible state. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-27-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:23 +02:00
Takashi Iwai	46397622a3	ALSA: seq: Add UMP support Starting from this commit, we add the basic support of UMP (Universal MIDI Packet) events on ALSA sequencer infrastructure. The biggest change here is that, for transferring UMP packets that are up to 128 bits, we extend the data payload of ALSA sequencer event record when the client is declared to support for the new UMP events. A new event flag bit, SNDRV_SEQ_EVENT_UMP, is defined and it shall be set for the UMP packet events that have the larger payload of 128 bits, defined as struct snd_seq_ump_event. For controlling the UMP feature enablement in kernel, a new Kconfig, CONFIG_SND_SEQ_UMP is introduced. The extended event for UMP is available only when this Kconfig item is set. Similarly, the size of the internal snd_seq_event_cell also increases (in 4 bytes) when the Kconfig item is set. (But the size increase is effective only for 32bit architectures; 64bit archs already have padding there.) Overall, when CONFIG_SND_SEQ_UMP isn't set, there is no change in the event and cell, keeping the old sizes. For applications that want to access the UMP packets, first of all, a sequencer client has to declare the user-protocol to match with the latest one via the new SNDRV_SEQ_IOCTL_USER_PVERSION; otherwise it's treated as if a legacy client without UMP support. Then the client can switch to the new UMP mode (MIDI 1.0 or MIDI 2.0) with a new field, midi_version, in snd_seq_client_info. When switched to UMP mode (midi_version = 1 or 2), the client can write the UMP events with SNDRV_SEQ_EVENT_UMP flag. For reads, the alignment size is changed from snd_seq_event (28 bytes) to snd_seq_ump_event (32 bytes). When a UMP sequencer event is delivered to a legacy sequencer client, it's ignored or handled as an error. Conceptually, ALSA sequencer client and port correspond to the UMP Endpoint and Group, respectively; each client may have multiple ports and each port has the fixed number (16) of channels, total up to 256 channels. As of this commit, ALSA sequencer core just sends and receives the UMP events as-is from/to clients. The automatic conversions between the legacy events and the new UMP events will be implemented in a later patch. Along with this commit, bump the sequencer protocol version to 1.0.3. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-26-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:21 +02:00
Takashi Iwai	afb72505e4	ALSA: seq: Introduce SNDRV_SEQ_IOCTL_USER_PVERSION ioctl For the future extension of ALSA sequencer ABI, introduce a new ioctl SNDRV_SEQ_IOCTL_USER_PVERSION. This is similar like the ioctls used in PCM and other interfaces, for an application to specify its supporting ABI version. The use of this ioctl will be mandatory for the upcoming UMP support. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-25-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:11:19 +02:00
Takashi Iwai	30fc139260	ALSA: ump: Add ioctls to inquiry UMP EP and Block info via control API It'd be convenient to have ioctls to inquiry the UMP Endpoint and UMP Block information directly via the control API without opening the rawmidi interface, just like SNDRV_CTL_IOCTL_RAWMIDI_INFO. This patch extends the rawmidi ioctl handler to support those; new ioctls, SNDRV_CTL_IOCTL_UMP_ENDPOINT_INFO and SNDRV_CTL_IOCTL_UMP_BLOCK_INFO, return the snd_ump_endpoint and snd_ump_block data that is specified by the device field, respectively. Suggested-by: Jaroslav Kysela <perex@perex.cz> Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-6-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:10:58 +02:00
Takashi Iwai	127ae6f6da	ALSA: rawmidi: Skip UMP devices at SNDRV_CTL_IOCTL_RAWMIDI_NEXT_DEVICE Applications may look for rawmidi devices with the ioctl SNDRV_CTL_IOCTL_RAWMIDI_NEXT_DEVICE. Returning a UMP device from this ioctl may confuse the existing applications that support only the legacy rawmidi. This patch changes the code to skip the UMP devices from the lookup for avoiding the confusion, and introduces a new ioctl to look for the UMP devices instead. Along with this change, bump the CTL protocol version to 2.0.9. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-5-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:10:57 +02:00
Takashi Iwai	e3a8a5b726	ALSA: rawmidi: UMP support This patch adds the support helpers for UMP (Universal MIDI Packet) in ALSA core. The basic design is that a rawmidi instance is assigned to each UMP Endpoint. A UMP Endpoint provides a UMP stream, typically bidirectional (but can be also uni-directional, too), which may hold up to 16 UMP Groups, where each UMP (input/output) Group corresponds to the traditional MIDI I/O Endpoint. Additionally, the ALSA UMP abstraction provides the multiple UMP Blocks that can be assigned to each UMP Endpoint. A UMP Block is a metadata to hold the UMP Group clusters, and can represent the functions assigned to each UMP Group. A typical implementation of UMP Block is the Group Terminal Blocks of USB MIDI 2.0 specification. For distinguishing from the legacy byte-stream MIDI device, a new device "umpCD" will be created, instead of the standard (MIDI 1.0) devices "midiCD". The UMP instance can be identified by the new rawmidi info bit SNDRV_RAWMIDI_INFO_UMP, too. A UMP rawmidi device reads/writes only in 4-bytes words alignment, stored in CPU native endianness. The transmit and receive functions take care of the input/out data alignment, and may return zero or aligned size, and the params ioctl may return -EINVAL when the given input/output buffer size isn't aligned. A few new UMP-specific ioctls are added for obtaining the new UMP endpoint and block information. As of this commit, no ALSA sequencer instance is attached to UMP devices yet. They will be supported by later patches. Along with those changes, the protocol version for rawmidi is bumped to 2.0.3. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-4-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-23 12:10:54 +02:00
Damien Le Moal	6c91325722	scsi: block: Introduce ioprio hints I/O priorities currently only use 6-bits of the 16-bits ioprio value: the 3-upper bits are used to define up to 8 priority classes (4 of which are valid) and the 3 lower bits of the value are used to define a priority level for the real-time and best-effort class. The remaining 10-bits between the I/O priority class and level are unused, and in fact, cannot be used by the user as doing so would either result in the value being completely ignored, or in an error returned by ioprio_check_cap(). Use these 10-bits of an ioprio value to allow a user to specify I/O hints. An I/O hint is defined as a 10-bitsvalue, allowing up to 1023 different hints to be specified, with the value 0 being reserved as the "no hint" case. An I/O hint can apply to any I/O that specifies a valid priority class other than NONE, regardless of the I/O priority level specified. To do so, the macros IOPRIO_PRIO_HINT() and IOPRIO_PRIO_VALUE_HINT() are introduced in include/uapi/linux/ioprio.h to respectively allow a user to get and set a hint in an ioprio value. To support the ATA and SCSI command duration limits feature, 7 hints are defined: IOPRIO_HINT_DEV_DURATION_LIMIT_1 to IOPRIO_HINT_DEV_DURATION_LIMIT_7, allowing a user to specify which command duration limit descriptor should be applied to the commands serving an I/O. Specifying these hints has for now no effect whatsoever if the target block devices do not support the command duration limits feature. However, in the future, block I/O schedulers can be modified to optimize I/O issuing order based on these hints, even for devices that do not support the command duration limits feature. Given that the 7 duration limits hints defined have no effect on any block layer component, the actual definition of the duration limits implied by these hints remains at the device level. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-3-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-05-22 17:05:18 -04:00
Damien Le Moal	eca2040972	scsi: block: ioprio: Clean up interface definition The I/O priority user interface defines the 16-bits ioprio values as the combination of the upper 3-bits for an I/O priority class and the lower 13-bits as priority data. However, the kernel only uses the lower 3-bits of the priority data to define priority levels for the RT and BE priority classes. The data part of an ioprio value is completely ignored for the IDLE and NONE classes. This is enforced by checks done in ioprio_check_cap(), which is called for all paths that allow defining an I/O priority for I/Os: the per-context ioprio_set() system call, aio interface and io_uring interface. Clarify this fact in the uapi ioprio.h header file and introduce the IOPRIO_PRIO_LEVEL_MASK and IOPRIO_PRIO_LEVEL() macros for users to define and get priority levels in an ioprio value. The coarser macro IOPRIO_PRIO_DATA() is retained for backward compatibility with old applications already using it. There is no functional change introduced with this. In-kernel users of the IOPRIO_PRIO_DATA() macro which are explicitly handling I/O priority data as a priority level are modified to use the new IOPRIO_PRIO_LEVEL() macro without any functional change. Since f2fs is the only user of this macro not explicitly using that value as a priority level, it is left unchanged. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-2-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-05-22 17:05:18 -04:00
Tvrtko Ursulin	bc4be0a38b	drm/i915/pmu: Prepare for multi-tile non-engine counters Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. v3: (Ashutosh, Tvrtko) - Drop BUG_ON that would never fire - Make enable u64 - Pull in some code from next patch v4: Set I915_PMU_MAX_GTS to 2 (Tvrtko) v5: s/u64/u32 where needed (Ashutosh) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230519154946.3751971-7-umesh.nerlige.ramappa@intel.com	2023-05-22 11:07:52 -07:00
Cezary Rojewski	9510965747	ASoC: Intel: Skylake: Fix declaration of enum skl_ch_cfg Constant 'C4_CHANNEL' does not exist on the firmware side. Value 0xC is reserved for 'C7_1' instead. Fixes: `04afbbbb1c` ("ASoC: Intel: Skylake: Update the topology interface structure") Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Link: https://lore.kernel.org/r/20230519201711.4073845-4-amadeuszx.slawinski@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>	2023-05-22 11:18:22 +01:00
Ming Lei	1172d5b8be	ublk: support user copy Currently copy between io request buffer(pages) and userspace buffer is done inside ublk_map_io() or ublk_unmap_io(). This way performs very well in case of pre-allocated userspace io buffer. For dynamically allocated or external userspace backend io buffer, UBLK_F_NEED_GET_DATA is added for ublk server to provide buffer by one extra command communication for WRITE request. For READ, userspace simply provides buffer, but can't know when the buffer is done[1]. Add UBLK_F_USER_COPY by moving io data copy out of kernel by providing read()/write() on /dev/ublkcN, and simply let ublk server do the io data copy. This way makes both side cleaner, the cost is that one extra syscall for copy io data between request and backend buffer. With UBLK_F_USER_COPY, it actually becomes possible to run per-io zero copy now, such as, only do zero copy for big size IO, so it can be thought as one prep patch for supporting zero copy. Meantime zero copy still needs to expose read()/write() buffer for some corner case, such as passthrough IO. [1] READ buffer in UBLK_F_NEED_GET_DATA https://lore.kernel.org/linux-block/116d8a56-0881-56d3-9bcc-78ff3e1dc4e5@linux.alibaba.com/T/#m23bd4b8634c0a054e6797063167b469949a247bb ublksrv loop usercopy code: https://github.com/ming1/ubdsrv/commits/usercopy Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-19 19:59:17 -06:00
Ming Lei	62fe99cef9	ublk: add read()/write() support for ublk char device Support pread()/pwrite() on ublk char device for reading/writing request io buffer, so data copy between io request buffer and userspace buffer can be moved to ublk server from ublk driver. Then UBLK_F_NEED_GET_DATA becomes not necessary, so ublk server can allocate buffer without one extra round uring command communication for userspace to provide buffer. IO buffer can be located by iocb->ki_pos which encodes buffer offset, io tag and queue id info, and type of iocb->ki_pos is u64, so it is big enough for holding reasonable queue depth, nr_queues and max io buffer size. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-19 19:59:17 -06:00
Christian Brauner	6ac3928156	fs: allow to mount beneath top mount Various distributions are adding or are in the process of adding support for system extensions and in the future configuration extensions through various tools. A more detailed explanation on system and configuration extensions can be found on the manpage which is listed below at [1]. System extension images may – dynamically at runtime — extend the /usr/ and /opt/ directory hierarchies with additional files. This is particularly useful on immutable system images where a /usr/ and/or /opt/ hierarchy residing on a read-only file system shall be extended temporarily at runtime without making any persistent modifications. When one or more system extension images are activated, their /usr/ and /opt/ hierarchies are combined via overlayfs with the same hierarchies of the host OS, and the host /usr/ and /opt/ overmounted with it ("merging"). When they are deactivated, the mount point is disassembled — again revealing the unmodified original host version of the hierarchy ("unmerging"). Merging thus makes the extension's resources suddenly appear below the /usr/ and /opt/ hierarchies as if they were included in the base OS image itself. Unmerging makes them disappear again, leaving in place only the files that were shipped with the base OS image itself. System configuration images are similar but operate on directories containing system or service configuration. On nearly all modern distributions mount propagation plays a crucial role and the rootfs of the OS is a shared mount in a peer group (usually with peer group id 1): TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID / / ext4 shared:1 29 1 On such systems all services and containers run in a separate mount namespace and are pivot_root()ed into their rootfs. A separate mount namespace is almost always used as it is the minimal isolation mechanism services have. But usually they are even much more isolated up to the point where they almost become indistinguishable from containers. Mount propagation again plays a crucial role here. The rootfs of all these services is a slave mount to the peer group of the host rootfs. This is done so the service will receive mount propagation events from the host when certain files or directories are updated. In addition, the rootfs of each service, container, and sandbox is also a shared mount in its separate peer group: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID / / ext4 shared:24 master:1 71 47 For people not too familiar with mount propagation, the master:1 means that this is a slave mount to peer group 1. Which as one can see is the host rootfs as indicated by shared:1 above. The shared:24 indicates that the service rootfs is a shared mount in a separate peer group with peer group id 24. A service may run other services. Such nested services will also have a rootfs mount that is a slave to the peer group of the outer service rootfs mount. For containers things are just slighly different. A container's rootfs isn't a slave to the service's or host rootfs' peer group. The rootfs mount of a container is simply a shared mount in its own peer group: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID /home/ubuntu/debian-tree / ext4 shared:99 61 60 So whereas services are isolated OS components a container is treated like a separate world and mount propagation into it is restricted to a single well known mount that is a slave to the peer group of the shared mount /run on the host: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID /propagate/debian-tree /run/host/incoming tmpfs master:5 71 68 Here, the master:5 indicates that this mount is a slave to the peer group with peer group id 5. This allows to propagate mounts into the container and served as a workaround for not being able to insert mounts into mount namespaces directly. But the new mount api does support inserting mounts directly. For the interested reader the blogpost in [2] might be worth reading where I explain the old and the new approach to inserting mounts into mount namespaces. Containers of course, can themselves be run as services. They often run full systems themselves which means they again run services and containers with the exact same propagation settings explained above. The whole system is designed so that it can be easily updated, including all services in various fine-grained ways without having to enter every single service's mount namespace which would be prohibitively expensive. The mount propagation layout has been carefully chosen so it is possible to propagate updates for system extensions and configurations from the host into all services. The simplest model to update the whole system is to mount on top of /usr, /opt, or /etc on the host. The new mount on /usr, /opt, or /etc will then propagate into every service. This works cleanly the first time. However, when the system is updated multiple times it becomes necessary to unmount the first update on /opt, /usr, /etc and then propagate the new update. But this means, there's an interval where the old base system is accessible. This has to be avoided to protect against downgrade attacks. The vfs already exposes a mechanism to userspace whereby mounts can be mounted beneath an existing mount. Such mounts are internally referred to as "tucked". The patch series exposes the ability to mount beneath a top mount through the new MOVE_MOUNT_BENEATH flag for the move_mount() system call. This allows userspace to seamlessly upgrade mounts. After this series the only thing that will have changed is that mounting beneath an existing mount can be done explicitly instead of just implicitly. Today, there are two scenarios where a mount can be mounted beneath an existing mount instead of on top of it: (1) When a service or container is started in a new mount namespace and pivot_root()s into its new rootfs. The way this is done is by mounting the new rootfs beneath the old rootfs: fd_newroot = open("/var/lib/machines/fedora", ...); fd_oldroot = open("/", ...); fchdir(fd_newroot); pivot_root(".", "."); After the pivot_root(".", ".") call the new rootfs is mounted beneath the old rootfs which can then be unmounted to reveal the underlying mount: fchdir(fd_oldroot); umount2(".", MNT_DETACH); Since pivot_root() moves the caller into a new rootfs no mounts must be propagated out of the new rootfs as a consequence of the pivot_root() call. Thus, the mounts cannot be shared. (2) When a mount is propagated to a mount that already has another mount mounted on the same dentry. The easiest example for this is to create a new mount namespace. The following commands will create a mount namespace where the rootfs mount / will be a slave to the peer group of the host rootfs / mount's peer group. IOW, it will receive propagation from the host: mount --make-shared / unshare --mount --propagation=slave Now a new mount on the /mnt dentry in that mount namespace is created. (As it can be confusing it should be spelled out that the tmpfs mount on the /mnt dentry that was just created doesn't propagate back to the host because the rootfs mount / of the mount namespace isn't a peer of the host rootfs.): mount -t tmpfs tmpfs /mnt TARGET SOURCE FSTYPE PROPAGATION └─/mnt tmpfs tmpfs Now another terminal in the host mount namespace can observe that the mount indeed hasn't propagated back to into the host mount namespace. A new mount can now be created on top of the /mnt dentry with the rootfs mount / as its parent: mount --bind /opt /mnt TARGET SOURCE FSTYPE PROPAGATION └─/mnt /dev/sda2[/opt] ext4 shared:1 The mount namespace that was created earlier can now observe that the bind mount created on the host has propagated into it: TARGET SOURCE FSTYPE PROPAGATION └─/mnt /dev/sda2[/opt] ext4 master:1 └─/mnt tmpfs tmpfs But instead of having been mounted on top of the tmpfs mount at the /mnt dentry the /opt mount has been mounted on top of the rootfs mount at the /mnt dentry. And the tmpfs mount has been remounted on top of the propagated /opt mount at the /opt dentry. So in other words, the propagated mount has been mounted beneath the preexisting mount in that mount namespace. Mount namespaces make this easy to illustrate but it's also easy to mount beneath an existing mount in the same mount namespace (The following example assumes a shared rootfs mount / with peer group id 1): mount --bind /opt /opt TARGET SOURCE FSTYPE MNT_ID PARENT_ID PROPAGATION └─/opt /dev/sda2[/opt] ext4 188 29 shared:1 If another mount is mounted on top of the /opt mount at the /opt dentry: mount --bind /tmp /opt The following clunky mount tree will result: TARGET SOURCE FSTYPE MNT_ID PARENT_ID PROPAGATION └─/opt /dev/sda2[/tmp] ext4 405 29 shared:1 └─/opt /dev/sda2[/opt] ext4 188 405 shared:1 └─/opt /dev/sda2[/tmp] ext4 404 188 shared:1 The /tmp mount is mounted beneath the /opt mount and another copy is mounted on top of the /opt mount. This happens because the rootfs / and the /opt mount are shared mounts in the same peer group. When the new /tmp mount is supposed to be mounted at the /opt dentry then the /tmp mount first propagates to the root mount at the /opt dentry. But there already is the /opt mount mounted at the /opt dentry. So the old /opt mount at the /opt dentry will be mounted on top of the new /tmp mount at the /tmp dentry, i.e. @opt->mnt_parent is @tmp and @opt->mnt_mountpoint is /tmp (Note that @opt->mnt_root is /opt which is what shows up as /opt under SOURCE). So again, a mount will be mounted beneath a preexisting mount. (Fwiw, a few iterations of mount --bind /opt /opt in a loop on a shared rootfs is a good example of what could be referred to as mount explosion.) The main point is that such mounts allows userspace to umount a top mount and reveal an underlying mount. So for example, umounting the tmpfs mount on /mnt that was created in example (1) using mount namespaces reveals the /opt mount which was mounted beneath it. In (2) where a mount was mounted beneath the top mount in the same mount namespace unmounting the top mount would unmount both the top mount and the mount beneath. In the process the original mount would be remounted on top of the rootfs mount / at the /opt dentry again. This again, is a result of mount propagation only this time it's umount propagation. However, this can be avoided by simply making the parent mount / of the @opt mount a private or slave mount. Then the top mount and the original mount can be unmounted to reveal the mount beneath. These two examples are fairly arcane and are merely added to make it clear how mount propagation has effects on current and future features. More common use-cases will just be things like: mount -t btrfs /dev/sdA /mnt mount -t xfs /dev/sdB --beneath /mnt umount /mnt after which we'll have updated from a btrfs filesystem to a xfs filesystem without ever revealing the underlying mountpoint. The crux is that the proposed mechanism already exists and that it is so powerful as to cover cases where mounts are supposed to be updated with new versions. Crucially, it offers an important flexibility. Namely that updates to a system may either be forced or can be delayed and the umount of the top mount be left to a service if it is a cooperative one. This adds a new flag to move_mount() that allows to explicitly move a beneath the top mount adhering to the following semantics: * Mounts cannot be mounted beneath the rootfs. This restriction encompasses the rootfs but also chroots via chroot() and pivot_root(). To mount a mount beneath the rootfs or a chroot, pivot_root() can be used as illustrated above. * The source mount must be a private mount to force the kernel to allocate a new, unused peer group id. This isn't a required restriction but a voluntary one. It avoids repeating a semantical quirk that already exists today. If bind mounts which already have a peer group id are inserted into mount trees that have the same peer group id this can cause a lot of mount propagation events to be generated (For example, consider running mount --bind /opt /opt in a loop where the parent mount is a shared mount.). * Avoid getting rid of the top mount in the kernel. Cooperative services need to be able to unmount the top mount themselves. This also avoids a good deal of additional complexity. The umount would have to be propagated which would be another rather expensive operation. So namespace_lock() and lock_mount_hash() would potentially have to be held for a long time for both a mount and umount propagation. That should be avoided. * The path to mount beneath must be mounted and attached. * The top mount and its parent must be in the caller's mount namespace and the caller must be able to mount in that mount namespace. * The caller must be able to unmount the top mount to prove that they could reveal the underlying mount. * The propagation tree is calculated based on the destination mount's parent mount and the destination mount's mountpoint on the parent mount. Of course, if the parent of the destination mount and the destination mount are shared mounts in the same peer group and the mountpoint of the new mount to be mounted is a subdir of their ->mnt_root then both will receive a mount of /opt. That's probably easier to understand with an example. Assuming a standard shared rootfs /: mount --bind /opt /opt mount --bind /tmp /opt will cause the same mount tree as: mount --bind /opt /opt mount --beneath /tmp /opt because both / and /opt are shared mounts/peers in the same peer group and the /opt dentry is a subdirectory of both the parent's and the child's ->mnt_root. If a mount tree like that is created it almost always is an accident or abuse of mount propagation. Realistically what most people probably mean in this scenarios is: mount --bind /opt /opt mount --make-private /opt mount --make-shared /opt This forces the allocation of a new separate peer group for the /opt mount. Aferwards a mount --bind or mount --beneath actually makes sense as the / and /opt mount belong to different peer groups. Before that it's likely just confusion about what the user wanted to achieve. * Refuse MOVE_MOUNT_BENEATH if: (1) the @mnt_from has been overmounted in between path resolution and acquiring @namespace_sem when locking @mnt_to. This avoids the proliferation of shadow mounts. (2) if @to_mnt is moved to a different mountpoint while acquiring @namespace_sem to lock @to_mnt. (3) if @to_mnt is unmounted while acquiring @namespace_sem to lock @to_mnt. (4) if the parent of the target mount propagates to the target mount at the same mountpoint. This would mean mounting @mnt_from on @mnt_to->mnt_parent and then propagating a copy @c of @mnt_from onto @mnt_to. This defeats the whole purpose of mounting @mnt_from beneath @mnt_to. (5) if the parent mount @mnt_to->mnt_parent propagates to @mnt_from at the same mountpoint. If @mnt_to->mnt_parent propagates to @mnt_from this would mean propagating a copy @c of @mnt_from on top of @mnt_from. Afterwards @mnt_from would be mounted on top of @mnt_to->mnt_parent and @mnt_to would be unmounted from @mnt->mnt_parent and remounted on @mnt_from. But since @c is already mounted on @mnt_from, @mnt_to would ultimately be remounted on top of @c. Afterwards, @mnt_from would be covered by a copy @c of @mnt_from and @c would be covered by @mnt_from itself. This defeats the whole purpose of mounting @mnt_from beneath @mnt_to. Cases (1) to (3) are required as they deal with races that would cause bugs or unexpected behavior for users. Cases (4) and (5) refuse semantical quirks that would not be a bug but would cause weird mount trees to be created. While they can already be created via other means (mount --bind /opt /opt x n) there's no reason to repeat past mistakes in new features. Link: https://man7.org/linux/man-pages/man8/systemd-sysext.8.html [1] Link: https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html [2] Link: https://github.com/flatcar/sysext-bakery Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_1 Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_2 Link: https://github.com/systemd/systemd/pull/26013 Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Message-Id: <20230202-fs-move-mount-replace-v4-4-98f3d80d7eaa@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-05-19 04:30:22 +02:00
Jeremy Sowden	b9f9a485fb	netfilter: nft_exthdr: add boolean DCCP option matching The xt_dccp iptables module supports the matching of DCCP packets based on the presence or absence of DCCP options. Extend nft_exthdr to add this functionality to nftables. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930 Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-05-18 08:48:54 +02:00
Rodrigo Vivi	9c3a985f88	Merge drm/drm-next into drm-intel-next Backmerge to get some hwmon dependencies. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-05-17 09:30:24 -04:00
Ricardo Koller	2f440b72e8	KVM: arm64: Add KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE Add a capability for userspace to specify the eager split chunk size. The chunk size specifies how many pages to break at a time, using a single allocation. Bigger the chunk size, more pages need to be allocated ahead of time. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20230426172330.1439644-6-ricarkol@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-05-16 17:39:18 +00:00
Josh Triplett	6e76ac5958	io_uring: Add io_uring_setup flag to pre-register ring fd and never install it With IORING_REGISTER_USE_REGISTERED_RING, an application can register the ring fd and use it via registered index rather than installed fd. This allows using a registered ring for everything except the initial mmap. With IORING_SETUP_NO_MMAP, io_uring_setup uses buffers allocated by the user, rather than requiring a subsequent mmap. The combination of the two allows a user to operate entirely via a registered ring fd, making it unnecessary to ever install the fd in the first place. So, add a flag IORING_SETUP_REGISTERED_FD_ONLY to make io_uring_setup register the fd and return a registered index, without installing the fd. This allows an application to avoid touching the fd table at all, and allows a library to never even momentarily install a file descriptor. This splits out an io_ring_add_registered_file helper from io_ring_add_registered_fd, for use by io_uring_setup. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/bc8f431bada371c183b95a83399628b605e978a3.1682699803.git.josh@joshtriplett.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-16 08:06:00 -06:00
Jens Axboe	03d89a2de2	io_uring: support for user allocated memory for rings/sqes Currently io_uring applications must call mmap(2) twice to map the rings themselves, and the sqes array. This works fine, but it does not support using huge pages to back the rings/sqes. Provide a way for the application to pass in pre-allocated memory for the rings/sqes, which can then suitably be allocated from shmfs or via mmap to get huge page support. Particularly for larger rings, this reduces the TLBs needed. If an application wishes to take advantage of that, it must pre-allocate the memory needed for the sq/cq ring, and the sqes. The former must be passed in via the io_uring_params->cq_off.user_data field, while the latter is passed in via the io_uring_params->sq_off.user_data field. Then it must set IORING_SETUP_NO_MMAP in the io_uring_params->flags field, and io_uring will then map the existing memory into the kernel for shared use. The application must not call mmap(2) to map rings as it otherwise would have, that will now fail with -EINVAL if this setup flag was used. The pages used for the rings and sqes must be contigious. The intent here is clearly that huge pages should be used, otherwise the normal setup procedure works fine as-is. The application may use one huge page for both the rings and sqes. Outside of those initialization changes, everything works like it did before. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-16 08:04:55 -06:00
Oswald Buddenhagen	1298bc978a	ALSA: emu10k1: enable bit-exact playback, part 1: DSP attenuation Fractional multiplication with the maximal value 2^31-1 causes some tiny distortion. Instead, we want to multiply with the full 2^31. The catch is of course that this cannot be represented in the DSP's signed 32 bit registers. One way to deal with this is to encode 1.0 as a negative number and special-case it. As a matter of fact, the SbLive! code path already contained such code, though the controls never actually exercised it. A more efficient approach is to use negative values, which actually extend to -2^31. Accordingly, for all the volume adjustments we now use the MAC1 instruction which negates the X operand. The range of the controls in highres mode is extended downwards, so -1 is the new zero/mute. At maximal excursion, real zero is not mute any more, but I don't think anyone will notice this behavior change. ;-) That also required making the min/max/values in the control structs signed. This technically changes the user space interface, but it seems implausible that someone would notice - the numbers were actually treated as if they were signed anyway (and in the actual mixer iface they _are_). And without this change, the min value didn't even make sense in the first place (and no-one noticed, because it was always 0). Tested-by: Jonathan Dowland <jon@dow.land> Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Link: https://lore.kernel.org/r/20230514170323.3408834-7-oswald.buddenhagen@gmx.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-15 22:06:21 +02:00
Ranjani Sridharan	be3c215342	ASoC: SOF: Separate the tokens for input and output pin index Using the same token ID for both input and output format pin index results in collisions and incorrect pin index getting parsed from topology. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com Reviewed-by: Paul Olaru <paul.olaru@oss.nxp.com Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com Link: https://lore.kernel.org/r/20230515104403.32207-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org	2023-05-15 20:10:17 +09:00
Juha-Pekka Heikkila	c7c12de893	drm/fourcc: define Intel Meteorlake related ccs modifiers Add Tile4 type ccs modifiers with aux buffer needed for MTL Bspec: 49251, 49252, 49253 Cc: dri-devel@lists.freedesktop.org Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230514184240.6184-1-juhapekka.heikkila@gmail.com	2023-05-15 11:33:12 +03:00
Athanasios Oikonomou	2a3f9d4def	media: dvb: bump DVB API version Bump the DVB API version in order userspace to be aware of the changes recently implemented in enumerations for DVB-S2(X) and DVB-C2. Related: commit `6508a50fe8` ("media: dvb: add DVB-C2 and DVB-S2X parameter values") Link: https://lore.kernel.org/linux-media/20230110071421.31498-1-athoik@gmail.com Cc: Robert Schlabbach <robert_s@gmx.net> Signed-off-by: Athanasios Oikonomou <athoik@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>	2023-05-14 16:05:28 +01:00
Athanasios Oikonomou	1825788e2a	media: dvb: add missing DVB-S2X FEC parameter values This commit is adding the missing short FEC Missed on commit `6508a50fe8` More info: https://dvb.org/wp-content/uploads/2021/02/A083-2r2_DVB-S2X_Draft-EN-302-307-2-v131_Feb_2021.pdf Table 1: S2X System configurations and application areas Please note that 128APSK, 256APSK and 256APSK-L and FEC 29/45, 31/45 are still missing from enums. Link: https://lore.kernel.org/linux-media/20230111194608.1853-1-athoik@gmail.com Cc: Robert Schlabbach <robert_s@gmx.net> Cc: Tom Richardson <trichardson@availink.com> Signed-off-by: Athanasios Oikonomou <athoik@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>	2023-05-14 16:05:28 +01:00
Vladimir Nikishkin	69474a8a58	net: vxlan: Add nolocalbypass option to vxlan. If a packet needs to be encapsulated towards a local destination IP, the packet will undergo a "local bypass" and be injected into the Rx path as if it was received by the target VXLAN device without undergoing encapsulation. If such a device does not exist, the packet will be dropped. There are scenarios where we do not want to perform such a bypass, but instead want the packet to be encapsulated and locally received by a user space program for post-processing. To that end, add a new VXLAN device attribute that controls whether a "local bypass" is performed or not. Default to performing a bypass to maintain existing behavior. Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-13 17:02:33 +01:00
Chuck Lever	eefca7ec51	net/handshake: Enable the SNI extension to work properly Enable the upper layer protocol to specify the SNI peername. This avoids the need for tlshd to use a DNS lookup, which can return a hostname that doesn't match the incoming certificate's SubjectName. Fixes: `2fd5532044` ("net/handshake: Add a kernel API for requesting a TLSv1.3 handshake") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-12 09:24:08 +01:00
Alan Previn	d1da138f24	drm/i915/uapi/pxp: Add a GET_PARAM for PXP Because of the additional firmware, component-driver and initialization depedencies required on MTL platform before a PXP context can be created, UMD calling for PXP creation as a way to get-caps can take a long time. An actual real world customer stack has seen this happen in the 4-to-8 second range after the kernel starts (which sees MESA's init appear in the middle of this range as the compositor comes up). To avoid unncessary delays experienced by the UMD for get-caps purposes, add a GET_PARAM for I915_PARAM_PXP_SUPPORT. However, some failures can still occur after all the depedencies are met (such as firmware init flow failure, bios configurations or SOC fusing not allowing PXP enablement). Those scenarios will only be known to user space when it attempts creating a PXP context and is documented in the GEM UAPI headers. While making this change, create a helper that is common to both GET_PARAM caller and intel_pxp_start since the latter does similar checks. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230511231738.1077674-7-alan.previn.teres.alexis@intel.com	2023-05-11 17:26:30 -07:00
Alan Previn	99afb7cc8c	drm/i915/pxp: Add ARB session creation and cleanup Add MTL's function for ARB session creation using PXP firmware version 4.3 ABI structure format. While relooking at the ARB session creation flow in intel_pxp_start, let's address missing UAPI documentation. Without actually changing backward compatible behavior, update i915's drm-uapi comments that describe the possible error values when creating a context with I915_CONTEXT_PARAM_PROTECTED_CONTENT: Since the first merge of PXP support on ADL, i915 returns -ENXIO if a dependency such as firmware or component driver was yet to be loaded or returns -EIO if the creation attempt failed when requested by the PXP firmware (specific firmware error responses are reported in dmesg). Add MTL's function for ARB session invalidation but this reuses PXP firmware version 4.2 ABI structure format. For both cases, in the back-end gsccs functions for sending messages to the firmware inspect the GSC-CS-Mem-Header's pending-bit which means the GSC firmware is busy and we should retry. Given the last hw requirement, lets also update functions in front-end layer that wait for session creation or teardown completion to use new worst case timeout periods. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230511231738.1077674-6-alan.previn.teres.alexis@intel.com	2023-05-11 17:26:29 -07:00
Maxime Ripard	ff32fcca64	Merge drm/drm-next into drm-misc-next Start the 6.5 release cycle. Signed-off-by: Maxime Ripard <maxime@cerno.tech>	2023-05-09 15:03:40 +02:00
Jaroslav Kysela	a4bb75c4f1	ALSA: uapi: pcm: control the filling of the silence samples for drain Introduce SNDRV_PCM_INFO_PERFECT_DRAIN and SNDRV_PCM_HW_PARAMS_NO_DRAIN_SILENCE flags to fully control the filling of the silence samples in the drain ioctl. Actually, the configurable silencing is going to be implemented in the user space [1], but drivers (hardware) may not require this operation. Those flags do the bidirectional setup for this operation: 1) driver may notify the presence of the perfect drain 2) user space may not require the filling of the silence samples to inhibit clicks If we decide to move this operation to the kernel space in future, the SNDRV_PCM_INFO_PERFECT_DRAIN flag may handle this situation without double "silence" processing (user + kernel space). The ALSA API should be universal, so forcing the behaviour (modifying of the ring buffer with any samples) for the drain operation is not ideal. [1] https://lore.kernel.org/alsa-devel/20230502115010.986325-1-perex@perex.cz/ [ fixed a typo in comment by tiwai ] Signed-off-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230502115536.986900-1-perex@perex.cz Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-05-08 15:23:01 +02:00
Linus Torvalds	a3b111b046	Merge tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux Pull more block updates from Jens Axboe: - MD pull request via Song: - Improve raid5 sequential IO performance on spinning disks, which fixes a regression since v6.0 (Jan Kara) - Fix bitmap offset types, which fixes an issue introduced in this merge window (Jonathan Derrick) - Cleanup of hweight type used for cgroup writeback (Maxim) - Fix a regression with the "has_submit_bio" changes across partitions (Ming) - Cleanup of QUEUE_FLAG_ADD_RANDOM clearing. We used to set this flag on queues non blk-mq queues, and hence some drivers clear it unconditionally. Since all of these have since been converted to true blk-mq drivers, drop the useless clear as the bit is not set (Chaitanya) - Fix the flags being set in a bio for a flush for drbd (Christoph) - Cleanup and deduplication of the code handling setting block device capacity (Damien) - Fix for ublk handling IO timeouts (Ming) - Fix for a regression in blk-cgroup teardown (Tao) - NBD documentation and code fixes (Eric) - Convert blk-integrity to using device_attributes rather than a second kobject to manage lifetimes (Thomas) * tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux: ublk: add timeout handler drbd: correctly submit flush bio on barrier mailmap: add mailmap entries for Jens Axboe block: Skip destroyed blkg when restart in blkg_destroy_all() writeback: fix call of incorrect macro md: Fix bitmap offset type in sb writer md/raid5: Improve performance for sequential IO docs nbd: userspace NBD now favors github over sourceforge block nbd: use req.cookie instead of req.handle uapi nbd: add cookie alias to handle uapi nbd: improve doc links to userspace spec blk-integrity: register sysfs attributes on struct device blk-integrity: convert to struct device_attribute blk-integrity: use sysfs_emit block/drivers: remove dead clear of random flag block: sync part's ->bd_has_submit_bio with disk's block: Cleanup set_capacity()/bdev_set_nr_sectors()	2023-05-06 08:28:58 -07:00
Linus Torvalds	7994beabfb	Merge tag 'dmaengine-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "New support: - Apple admac t8112 device support - StarFive JH7110 DMA controller Updates: - Big pile of idxd updates to support IAA 2.0 device capabilities, DSA 2.0 Event Log and completion record faulting features and new DSA operations - at_xdmac supend & resume updates and driver code cleanup - k3-udma supend & resume support - k3-psil thread support for J784s4" * tag 'dmaengine-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (57 commits) dmaengine: idxd: add per wq PRS disable dmaengine: idxd: add pid to exported sysfs attribute for opened file dmaengine: idxd: expose fault counters to sysfs dmaengine: idxd: add a device to represent the file opened dmaengine: idxd: add per file user counters for completion record faults dmaengine: idxd: process batch descriptor completion record faults dmaengine: idxd: add descs_completed field for completion record dmaengine: idxd: process user page faults for completion record dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling dmaengine: idxd: create kmem cache for event log fault items dmaengine: idxd: add per DSA wq workqueue for processing cr faults dmanegine: idxd: add debugfs for event log dump dmaengine: idxd: add interrupt handling for event log dmaengine: idxd: setup event log configuration dmaengine: idxd: add event log size sysfs attribute dmaengine: idxd: make misc interrupt one shot dt-bindings: dma: snps,dw-axi-dmac: constrain the items of resets for JH7110 dma dt-bindings: dma: Drop unneeded quotes dmaengine: at_xdmac: align declaration of ret with the rest of variables dmaengine: at_xdmac: add a warning message regarding for unpaused channels ...	2023-05-03 11:11:56 -07:00
Linus Torvalds	c8c655c34e	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "s390: - More phys_to_virt conversions - Improvement of AP management for VSIE (nested virtualization) ARM64: - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements. x86: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Avoid unnecessary writes+flushes when the guest is only adding new PTEs - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations when emulating invalidations - Clean up the range-based flushing APIs - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork() - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features - Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - AMD SVM: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts - Intel AMX: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Fix a bug in emulation of ENCLS in compatibility mode - Allow emulation of NOP and PAUSE for L2 - AMX selftests improvements - Misc cleanups MIPS: - Constify MIPS's internal callbacks (a leftover from the hardware enabling rework that landed in 6.3) Generic: - Drop unnecessary casts from "void " throughout kvm_main.c - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct size by 8 bytes on 64-bit kernels by utilizing a padding hole Documentation: - Fix goof introduced by the conversion to rST" tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits) KVM: s390: pci: fix virtual-physical confusion on module unload/load KVM: s390: vsie: clarifications on setting the APCB KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() KVM: selftests: Test the PMU event "Instructions retired" KVM: selftests: Copy full counter values from guest in PMU event filter test KVM: selftests: Use error codes to signal errors in PMU event filter test KVM: selftests: Print detailed info in PMU event filter asserts KVM: selftests: Add helpers for PMC asserts in PMU event filter test KVM: selftests: Add a common helper for the PMU event filter guest code KVM: selftests: Fix spelling mistake "perrmited" -> "permitted" KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: nvhe: Synchronise with page table walker on vcpu run KVM: arm64: vgic: Don't acquire its_lock before config_lock KVM: selftests: Add test to verify KVM's supported XCR0 ...	2023-05-01 12:06:20 -07:00
Linus Torvalds	7acc137211	Merge tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull compute express link updates from Dan Williams: "DOE support is promoted from drivers/cxl/ to drivers/pci/ with Bjorn's blessing, and the CXL core continues to mature its media management capabilities with support for listing and injecting media errors. Some late fixes that missed v6.3-final are also included: - Refactor the DOE infrastructure (Data Object Exchange PCI-config-cycle mailbox) to be a facility of the PCI core rather than the CXL core. This is foundational for upcoming support for PCI device-attestation and PCIe / CXL link encryption. - Add support for retrieving and injecting poison for CXL memory expanders. This enabling uses trace-events to convey CXL media error records to user tooling. It includes translation of device-local addresses (DPA) to system physical addresses (SPA) and their corresponding CXL region. - Fixes for decoder enumeration that missed v6.3-final - Miscellaneous fixups" * tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (38 commits) cxl/test: Add mock test for set_timestamp cxl/mbox: Update CMD_RC_TABLE tools/testing/cxl: Require CONFIG_DEBUG_FS tools/testing/cxl: Add a sysfs attr to test poison inject limits tools/testing/cxl: Use injected poison for get poison list tools/testing/cxl: Mock the Clear Poison mailbox command tools/testing/cxl: Mock the Inject Poison mailbox command cxl/mem: Add debugfs attributes for poison inject and clear cxl/memdev: Trace inject and clear poison as cxl_poison events cxl/memdev: Warn of poison inject or clear to a mapped region cxl/memdev: Add support for the Clear Poison mailbox command cxl/memdev: Add support for the Inject Poison mailbox command tools/testing/cxl: Mock support for Get Poison List cxl/trace: Add an HPA to cxl_poison trace events cxl/region: Provide region info to the cxl_poison trace event cxl/memdev: Add trigger_poison_list sysfs attribute cxl/trace: Add TRACE support for CXL media-error records cxl/mbox: Add GET_POISON_LIST mailbox command cxl/mbox: Initialize the poison state cxl/mbox: Restrict poison cmds to debugfs cxl_raw_allow_all ...	2023-04-30 11:51:51 -07:00
Linus Torvalds	af3877265d	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Usual wide collection of unrelated items in drivers: - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5, rxe, usnic, usnic, bnxt_re, ocrdma, iser: - remove unnecessary NULL checks - kmap obsolescence - pci_enable_pcie_error_reporting() obsolescence - unused variables and macros - trace event related warnings - casting warnings - Code cleanups for irdm and erdma - EFA reporting of 128 byte PCIe TLP support - mlx5 more agressively uses the out of order HW feature - Big rework of how state machines and tasks work in rxe - Fix a syzkaller found crash netdev refcount leak in siw - bnxt_re revises their HW description header - Congestion control for bnxt_re - Use mmu_notifiers more safely in hfi1 - mlx5 gets better support for PCIe relaxed ordering inside VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (81 commits) RDMA/efa: Add rdma write capability to device caps RDMA/mlx5: Use correct device num_ports when modify DC RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call RDMA/rxe: Fix spinlock recursion deadlock on requester RDMA/mlx5: Fix flow counter query via DEVX RDMA/rxe: Protect QP state with qp->state_lock RDMA/rxe: Move code to check if drained to subroutine RDMA/rxe: Remove qp->req.state RDMA/rxe: Remove qp->comp.state RDMA/rxe: Remove qp->resp.state RDMA/mlx5: Allow relaxed ordering read in VFs and VMs net/mlx5: Update relaxed ordering read HCA capabilities RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write RDMA: Add ib_virt_dma_to_page() RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame() RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c IB/hfi1: Place struct mmu_rb_handler on cache line start IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests ...	2023-04-29 17:21:24 -07:00
Linus Torvalds	4e1c80ae5c	Merge tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "The big ticket item for this release is that support for RPC-with-TLS [RFC 9289] has been added to the Linux NFS server. The goal is to provide a simple-to-deploy, low-overhead in-transit confidentiality and peer authentication mechanism. It can supplement NFS Kerberos and it can protect the use of legacy non-cryptographic user authentication flavors such as AUTH_SYS. The TLS Record protocol is handled entirely by kTLS, meaning it can use either software encryption or offload encryption to smart NICs. Aside from that, work continues on improving NFSD's open file cache. Among the many clean-ups in that area is a patch to convert the rhashtable to use the list-hashing version of that data structure" * tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits) NFSD: Handle new xprtsec= export option SUNRPC: Support TLS handshake in the server-side TCP socket code NFSD: Clean up xattr memory allocation flags NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop SUNRPC: Clear rq_xid when receiving a new RPC Call SUNRPC: Recognize control messages in server-side TCP socket code SUNRPC: Be even lazier about releasing pages SUNRPC: Convert svc_xprt_release() to the release_pages() API SUNRPC: Relocate svc_free_res_pages() nfsd: simplify the delayed disposal list code SUNRPC: Ignore return value of ->xpo_sendto SUNRPC: Ensure server-side sockets have a sock->file NFSD: Watch for rq_pages bounds checking errors in nfsd_splice_actor() sunrpc: simplify two-level sysctl registration for svcrdma_parm_table SUNRPC: return proper error from get_expiry() lockd: add some client-side tracepoints nfs: move nfs_fhandle_hash to common include file lockd: server should unlock lock if client rejects the grant lockd: fix races in client GRANTED_MSG wait logic lockd: move struct nlm_wait to lockd.h ...	2023-04-29 11:04:14 -07:00
Linus Torvalds	d579c468d7	Merge tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - User events are finally ready! After lots of collaboration between various parties, we finally locked down on a stable interface for user events that can also work with user space only tracing. This is implemented by telling the kernel (or user space library, but that part is user space only and not part of this patch set), where the variable is that the application uses to know if something is listening to the trace. There's also an interface to tell the kernel about these events, which will show up in the /sys/kernel/tracing/events/user_events/ directory, where it can be enabled. When it's enabled, the kernel will update the variable, to tell the application to start writing to the kernel. See https://lwn.net/Articles/927595/ - Cleaned up the direct trampolines code to simplify arm64 addition of direct trampolines. Direct trampolines use the ftrace interface but instead of jumping to the ftrace trampoline, applications (mostly BPF) can register their own trampoline for performance reasons. - Some updates to the fprobe infrastructure. fprobes are more efficient than kprobes, as it does not need to save all the registers that kprobes on ftrace do. More work needs to be done before the fprobes will be exposed as dynamic events. - More updates to references to the obsolete path of /sys/kernel/debug/tracing for the new /sys/kernel/tracing path. - Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer line by line instead of all at once. There are users in production kernels that have a large data dump that originally used printk() directly, but the data dump was larger than what printk() allowed as a single print. Using seq_buf() to do the printing fixes that. - Add /sys/kernel/tracing/touched_functions that shows all functions that was every traced by ftrace or a direct trampoline. This is used for debugging issues where a traced function could have caused a crash by a bpf program or live patching. - Add a "fields" option that is similar to "raw" but outputs the fields of the events. It's easier to read by humans. - Some minor fixes and clean ups. * tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits) ring-buffer: Sync IRQ works before buffer destruction tracing: Add missing spaces in trace_print_hex_seq() ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus recordmcount: Fix memory leaks in the uwrite function tracing/user_events: Limit max fault-in attempts tracing/user_events: Prevent same address and bit per process tracing/user_events: Ensure bit is cleared on unregister tracing/user_events: Ensure write index cannot be negative seq_buf: Add seq_buf_do_printk() helper tracing: Fix print_fields() for __dyn_loc/__rel_loc tracing/user_events: Set event filter_type from type ring-buffer: Clearly check null ptr returned by rb_set_head_page() tracing: Unbreak user events tracing/user_events: Use print_format_fields() for trace output tracing/user_events: Align structs with tabs for readability tracing/user_events: Limit global user_event count tracing/user_events: Charge event allocs to cgroups tracing/user_events: Update documentation for ABI tracing/user_events: Use write ABI in example tracing/user_events: Add ABI self-test ...	2023-04-28 15:57:53 -07:00
Ville Syrjälä	99e7e3b600	drm/uapi: Document CTM matrix better Document in which order the CTM matrix elements are stored. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230411222931.15127-2-ville.syrjala@linux.intel.com Reviewed-by: Xaver Hugl <xaver.hugl@gmail.com> Acked-by: Simon Ser <contact@emersion.fr>	2023-04-28 14:21:09 +03:00
Linus Torvalds	33afd4b763	Merge tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly singleton patches all over the place. Series of note are: - updates to scripts/gdb from Glenn Washburn - kexec cleanups from Bjorn Helgaas" * tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (50 commits) mailmap: add entries for Paul Mackerras libgcc: add forward declarations for generic library routines mailmap: add entry for Oleksandr ocfs2: reduce ioctl stack usage fs/proc: add Kthread flag to /proc/$pid/status ia64: fix an addr to taddr in huge_pte_offset() checkpatch: introduce proper bindings license check epoll: rename global epmutex scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry() scripts/gdb: create linux/vfs.py for VFS related GDB helpers uapi/linux/const.h: prefer ISO-friendly __typeof__ delayacct: track delays from IRQ/SOFTIRQ scripts/gdb: timerlist: convert int chunks to str scripts/gdb: print interrupts scripts/gdb: raise error with reduced debugging information scripts/gdb: add a Radix Tree Parser lib/rbtree: use '+' instead of '\|' for setting color. proc/stat: remove arch_idle_time() checkpatch: check for misuse of the link tags checkpatch: allow Closes tags with links ...	2023-04-27 19:57:00 -07:00
Linus Torvalds	7fa8a8ee94	Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ...	2023-04-27 19:42:02 -07:00
Eric Blake	2686eb845d	uapi nbd: add cookie alias to handle The uapi <linux/nbd.h> header declares a 'char handle[8]' per request; which is overloaded in English (are you referring to "handle" the verb, such as handling a signal or writing a callback handler, or "handle" the noun, the value used in a lookup table to correlate a response back to the request). Many user-space NBD implementations (both servers and clients) have instead used 'uint64_t cookie' or similar, as it is easier to directly assign an integer than to futz around with memcpy. In fact, upstream documentation is now encouraging this shift in terminology: https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b Accomplish this by use of an anonymous union to provide the alias for anyone getting the definition from the uapi; this does not break existing clients, while exposing the nicer name for those who prefer it. Note that block/nbd.c still uses the term handle (in fact, it actually combines a 32-bit cookie and a 32-bit tag into the 64-bit handle), but that internal usage is not changed by the public uapi, since no compliant NBD server has any reason to inspect or alter the 64 bits sent over the socket. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20230410180611.1051618-3-eblake@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-27 19:15:11 -06:00
Eric Blake	daf376a366	uapi nbd: improve doc links to userspace spec The uapi <linux/nbd.h> header intentionally documents only the NBD server features that the kernel module will utilize as a client. But while it already had one mention of skipped bits due to userspace extensions, it did not actually direct the reader to the canonical source to learn about those extensions. While touching comments, fix an outdated reference that listed only READ and WRITE as commands. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20230410180611.1051618-2-eblake@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-27 19:15:11 -06:00

... 10 11 12 13 14 ...

13198 Commits