Pull x86 fix from Thomas Gleixner:
"A single fix for x86.
Revert the recent change to the MTRR code which aimed to support
SEV-SNP guests on Hyper-V. It caused a regression on XEN Dom0 kernels.
The underlying issue of MTTR (mis)handling in the x86 code needs some
deeper investigation and is definitely not 6.2 material"
* tag 'x86-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Revert 90b926e68f ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
Pull timer fix from Thomas Gleixner:
"A fix for a long standing issue in the alarmtimer code.
Posix-timers armed with a short interval with an ignored signal result
in an unpriviledged DoS. Due to the ignored signal the timer switches
into self rearm mode. This issue had been "fixed" before but a rework
of the alarmtimer code 5 years ago lost that workaround.
There is no real good solution for this issue, which is also worked
around in the core posix-timer code in the same way, but it certainly
moved way up on the ever growing todo list"
* tag 'timers-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Prevent starvation by small intervals and SIG_IGN
Pull irq fix from Thomas Gleixner:
"A single build fix for the PCI/MSI infrastructure.
The addition of the new alloc/free interfaces in this cycle forgot to
add stub functions for pci_msix_alloc_irq_at() and pci_msix_free_irq()
for the CONFIG_PCI_MSI=n case"
* tag 'irq-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Provide missing stubs for CONFIG_PCI_MSI=n
Pull kvm/x86 fixes from Paolo Bonzini:
- zero all padding for KVM_GET_DEBUGREGS
- fix rST warning
- disable vPMU support on hybrid CPUs
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: initialize all of the kvm_debugregs structure before sending it to userspace
perf/x86: Refuse to export capabilities for hybrid PMUs
KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)
Documentation/hw-vuln: Fix rST warning
Pull arm64 regression fix from Will Deacon:
"Apologies for the _extremely_ late pull request here, but we had a
'perf' (i.e. CPU PMU) regression on the Apple M1 reported on Wednesday
[1] which was introduced by bd27568117 ("perf: Rewrite core context
handling") during the merge window.
Mark and I looked into this and noticed an additional problem caused
by the same patch, where the 'CHAIN' event (used to combine two
adjacent 32-bit counters into a single 64-bit counter) was not being
filtered correctly. Mark posted a series on Thursday [2] which
addresses both of these regressions and I queued it the same day.
The changes are small, self-contained and have been confirmed to fix
the original regression.
Summary:
- Fix 'perf' regression for non-standard CPU PMU hardware (i.e. Apple
M1)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: perf: reject CHAIN events at creation time
arm_pmu: fix event CPU filtering
Pull block fix from Jens Axboe:
"I guess this is what can happen when you prep things early for going
away, something else comes in last minute. This one fixes another
regression in 6.2 for NVMe, from this release, and hence we should
probably get it submitted for 6.2.
Still waiting for the original reporter (see bugzilla linked in the
commit) to test this, but Keith managed to setup and recreate the
issue and tested the patch that way"
* tag 'block-6.2-2023-02-17' of git://git.kernel.dk/linux:
nvme-pci: refresh visible attrs for cmb attributes
Pull misc fixes from Andrew Morton:
"Six hotfixes. Five are cc:stable: four for MM, one for nilfs2.
Also a MAINTAINERS update"
* tag 'mm-hotfixes-stable-2023-02-17-15-16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix underflow in second superblock position calculations
hugetlb: check for undefined shift on 32 bit architectures
mm/migrate: fix wrongly apply write bit after mkdirty on sparc64
MAINTAINERS: update FPU EMULATOR web page
mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
mm/filemap: fix page end in filemap_get_read_batch
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b@syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Users can specify the hugetlb page size in the mmap, shmget and
memfd_create system calls. This is done by using 6 bits within the flags
argument to encode the base-2 logarithm of the desired page size. The
routine hstate_sizelog() uses the log2 value to find the corresponding
hugetlb hstate structure. Converting the log2 value (page_size_log) to
potential hugetlb page size is the simple statement:
1UL << page_size_log
Because only 6 bits are used for page_size_log, the left shift can not be
greater than 63. This is fine on 64 bit architectures where a long is 64
bits. However, if a value greater than 31 is passed on a 32 bit
architecture (where long is 32 bits) the shift will result in undefined
behavior. This was generally not an issue as the result of the undefined
shift had to exactly match hugetlb page size to proceed.
Recent improvements in runtime checking have resulted in this undefined
behavior throwing errors such as reported below.
Fix by comparing page_size_log to BITS_PER_LONG before doing shift.
Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com
Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4MiPgVg@mail.gmail.com/
Fixes: 42d7395feb ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull powerpc fix from Michael Ellerman:
- Prevent fallthrough to hash TLB flush when using radix
Thanks to Benjamin Gray and Erhard Furtner.
* tag 'powerpc-6.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Prevent fallthrough to hash TLB flush when using radix
Pull NFS client fix from Trond Myklebust:
"Unfortunately, we found another bug in the NFSv4.2 READ_PLUS code.
Since it has not been possible to fix the bug in time for the 6.2
release, let's just revert the Kconfig change that enables it:
- Revert 'NFSv4.2: Change the default KConfig value for READ_PLUS'"
* tag 'nfs-for-6.2-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
Revert "NFSv4.2: Change the default KConfig value for READ_PLUS"
Pull sound fixes from Takashi Iwai:
"A few last-minute fixes. The significant ones are two ASoC SOF
regression fixes while the rest are trivial HD-audio quirks.
All are small / one-liners and should be pretty safe to take"
* tag 'sound-fix-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: SOF: Intel: hda-dai: fix possible stream_tag leak
ALSA: hda/realtek: Enable mute/micmute LEDs and speaker support for HP Laptops
ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform.
ALSA: hda/realtek - fixed wrong gpio assigned
ALSA: hda: Fix codec device field initializan
ALSA: hda/conexant: add a new hda codec SN6180
ASoC: SOF: ops: refine parameters order in function snd_sof_dsp_update8
Pull gpio fix from Bartosz Golaszewski:
- fix a memory leak in gpio-sim that was triggered every time libgpiod
tests are run in user-space
* tag 'gpio-fixes-for-v6.2-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: fix a memory leak
Pull ata fixes from Damien Le Moal:
"Three small fixes for 6.2 final:
- Disable READ LOG DMA EXT for Samsung MZ7LH drives as these drives
choke on that command, from Patrick.
- Add Intel Tiger Lake UP{3,4} to the list of supported AHCI
controllers (this is not technically a bug fix, but it is trivial
enough that I add it here), from Simon.
- Fix code comments in the pata_octeon_cf driver as incorrect
formatting was causing warnings from kernel-doc, from Randy"
* tag 'ata-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_octeon_cf: drop kernel-doc notation
ata: ahci: Add Tiger Lake UP{3,4} AHCI controller
ata: libata-core: Disable READ LOG DMA EXT for Samsung MZ7LH
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix potential resource leaks in SDIO card detection error path
MMC host:
- jz4740: Decrease maximum clock rate to workaround bug on JZ4760(B)
- meson-gx: Fix SDIO support to get some WiFi modules to work again
- mmc_spi: Fix error handling in ->probe()"
* tag 'mmc-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: jz4740: Work around bug on JZ4760(B)
mmc: mmc_spi: fix error handling in mmc_spi_probe()
mmc: sdio: fix possible resource leaks in some error paths
mmc: meson-gx: fix SDIO mode if cap_sdio_irq isn't set
Pull scheduler fixes from Ingo Molnar:
- Fix user-after-free bug in call_usermodehelper_exec()
- Fix missing user_cpus_ptr update in __set_cpus_allowed_ptr_locked()
- Fix PSI use-after-free bug in ep_remove_wait_queue()
* tag 'sched-urgent-2023-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/psi: Fix use-after-free in ep_remove_wait_queue()
sched/core: Fix a missed update of user_cpus_ptr
freezer,umh: Fix call_usermode_helper_exec() vs SIGKILL
Pull NVMe fix from Christoph:
"nvme fix for Linux 6.2
- fix visibility of the CMB sysfs attributes (Keith Busch)"
* tag 'nvme-6.2-2022-02-17' of git://git.infradead.org/nvme:
nvme-pci: refresh visible attrs for cmb attributes
This reverts commit 7fd461c47c.
Unfortunately, it has come to our attention that there is still a bug
somewhere in the READ_PLUS code that can result in nfsroot systems on
ARM to crash during boot.
Let's do the right thing and revert this change so we don't break
people's nfsroot setups.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The sysfs group containing the cmb attributes is registered before the
driver knows if they need to be visible or not. Update the group when
cmb attributes are known to exist so the visibility setting is correct.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217037
Fixes: 86adbf0cdb ("nvme: simplify transport specific device attribute handling")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull drm fixes from Dave Airlie:
"Just a final collection of misc fixes, the biggest disables the
recently added dynamic debugging support, it has a regression that
needs some bigger fixes.
Otherwise a bunch of fixes across the board, vc4, amdgpu and vmwgfx
mostly, with some smaller i915 and ast fixes.
drm:
- dynamic debug disable for now
fbdev:
- deferred i/o device close fix
amdgpu:
- Fix GC11.x suspend warning
- Fix display warning
vc4:
- YUV planes fix
- hdmi display fix
- crtc reduced blanking fix
ast:
- fix start address computation
vmwgfx:
- fix bo/handle races
i915:
- gen11 WA fix"
* tag 'drm-fixes-2023-02-17' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Fail atomic_check early on normalize_zpos error
drm/amd/amdgpu: fix warning during suspend
drm/vmwgfx: Do not drop the reference to the handle too soon
drm/vmwgfx: Stop accessing buffer objects which failed init
drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list
drm: Disable dynamic debug as broken
drm/ast: Fix start address computation
fbdev: Fix invalid page access after closing deferred I/O devices
drm/vc4: crtc: Increase setup cost in core clock calculation to handle extreme reduced blanking
drm/vc4: hdmi: Always enable GCP with AVMUTE cleared
drm/vc4: Fix YUV plane handling when planes are in different buffers
During collapse, in a few places we check to see if a given small page has
any unaccounted references. If the refcount on the page doesn't match our
expectations, it must be there is an unknown user concurrently interested
in the page, and so it's not safe to move the contents elsewhere.
However, the unaccounted pins are likely an ephemeral state.
In this situation, MADV_COLLAPSE returns -EINVAL when it should return
-EAGAIN. This could cause userspace to conclude that the syscall
failed, when it in fact could succeed by retrying.
Link: https://lkml.kernel.org/r/20230125015738.912924-1-zokeefe@google.com
Fixes: 7d8faaf155 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
I was running traces of the read code against an RAID storage system to
understand why read requests were being misaligned against the underlying
RAID strips. I found that the page end offset calculation in
filemap_get_read_batch() was off by one.
When a read is submitted with end offset 1048575, then it calculates the
end page for read of 256 when it should be 255. "last_index" is the index
of the page beyond the end of the read and it should be skipped when get a
batch of pages for read in @filemap_get_read_batch().
The below simple patch fixes the problem. This code was introduced in
kernel 5.12.
Link: https://lkml.kernel.org/r/20230208022400.28962-1-coolqyj@163.com
Fixes: cbd59c48ae ("mm/filemap: use head pages in generic_file_buffered_read")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently it's possible for a user to open CHAIN events arbitrarily,
which we previously tried to rule out in commit:
ca2b497253 ("arm64: perf: Reject stand-alone CHAIN events for PMUv3")
Which allowed the events to be opened, but prevented them from being
scheduled by by using an arm_pmu::filter_match hook to reject the
relevant events.
The CHAIN event filtering in the arm_pmu::filter_match hook was silently
removed in commit:
bd27568117 ("perf: Rewrite core context handling")
As a result, it's now possible for users to open CHAIN events, and for
these to be installed arbitrarily.
Fix this by rejecting CHAIN events at creation time. This avoids the
creation of events which will never count, and doesn't require using the
dynamic filtering.
Attempting to open a CHAIN event (0x1e) will now be rejected:
| # ./perf stat -e armv8_pmuv3/config=0x1e/ ls
| perf
|
| Performance counter stats for 'ls':
|
| <not supported> armv8_pmuv3/config=0x1e/
|
| 0.002197470 seconds time elapsed
|
| 0.000000000 seconds user
| 0.002294000 seconds sys
Other events (e.g. CPU_CYCLES / 0x11) will open as usual:
| # ./perf stat -e armv8_pmuv3/config=0x11/ ls
| perf
|
| Performance counter stats for 'ls':
|
| 2538761 armv8_pmuv3/config=0x11/
|
| 0.002227330 seconds time elapsed
|
| 0.002369000 seconds user
| 0.000000000 seconds sys
Fixes: bd27568117 ("perf: Rewrite core context handling")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230216141240.3833272-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Janne reports that perf has been broken on Apple M1 as of commit:
bd27568117 ("perf: Rewrite core context handling")
That commit replaced the pmu::filter_match() callback with
pmu::filter(), whose return value has the opposite polarity, with true
implying events should be ignored rather than scheduled. While an
attempt was made to update the logic in armv8pmu_filter() and
armpmu_filter() accordingly, the return value remains inverted in a
couple of cases:
* If the arm_pmu does not have an arm_pmu::filter() callback,
armpmu_filter() will always return whether the CPU is supported rather
than whether the CPU is not supported.
As a result, the perf core will not schedule events on supported CPUs,
resulting in a loss of events. Additionally, the perf core will
attempt to schedule events on unsupported CPUs, but this will be
rejected by armpmu_add(), which may result in a loss of events from
other PMUs on those unsupported CPUs.
* If the arm_pmu does have an arm_pmu::filter() callback, and
armpmu_filter() is called on a CPU which is not supported by the
arm_pmu, armpmu_filter() will return false rather than true.
As a result, the perf core will attempt to schedule events on
unsupported CPUs, but this will be rejected by armpmu_add(), which may
result in a loss of events from other PMUs on those unsupported CPUs.
This means a loss of events can be seen with any arm_pmu driver, but
with the ARMv8 PMUv3 driver (which is the only arm_pmu driver with an
arm_pmu::filter() callback) the event loss will be more limited and may
go unnoticed, which is how this issue evaded testing so far.
Fix the CPU filtering by performing this consistently in
armpmu_filter(), and remove the redundant arm_pmu::filter() callback and
armv8pmu_filter() implementation.
Commit bd27568117 also silently removed the CHAIN event filtering from
armv8pmu_filter(), which will be addressed by a separate patch without
using the filter callback.
Fixes: bd27568117 ("perf: Rewrite core context handling")
Reported-by: Janne Grunau <j@jannau.net>
Link: https://lore.kernel.org/asahi/20230215-arm_pmu_m1_regression-v1-1-f5a266577c8d@jannau.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Eric Curtin <ecurtin@redhat.com>
Tested-by: Janne Grunau <j@jannau.net>
Link: https://lore.kernel.org/r/20230216141240.3833272-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Fixes from the main networking tree only, probably because all
sub-trees have backed off and haven't submitted their changes.
None of the fixes here are particularly scary and no outstanding
regressions. In an ideal world the "current release" sections would be
empty at this stage but that never happens.
Current release - regressions:
- fix unwanted sign extension in netdev_stats_to_stats64()
Current release - new code bugs:
- initialize net->notrefcnt_tracker earlier
- devlink: fix netdev notifier chain corruption
- nfp: make sure mbox accesses in IPsec code are atomic
- ice: fix check for weight and priority of a scheduling node
Previous releases - regressions:
- ice: xsk: fix cleaning of XDP_TX frame, prevent inf loop
- igb: fix I2C bit banging config with external thermal sensor
Previous releases - always broken:
- sched: tcindex: update imperfect hash filters respecting rcu
- mpls: fix stale pointer if allocation fails during device rename
- dccp/tcp: avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions
- remove WARN_ON_ONCE(sk->sk_forward_alloc) from
sk_stream_kill_queues()
- af_key: fix heap information leak
- ipv6: fix socket connection with DSCP (correct interpretation of
the tclass field vs fib rule matching)
- tipc: fix kernel warning when sending SYN message
- vmxnet3: read RSS information from the correct descriptor (eop)"
* tag 'net-6.2-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
devlink: Fix netdev notifier chain corruption
igb: conditionalize I2C bit banging on external thermal sensor support
net: mpls: fix stale pointer if allocation fails during device rename
net/sched: tcindex: search key must be 16 bits
tipc: fix kernel warning when sending SYN message
igb: Fix PPS input and output using 3rd and 4th SDP
net: use a bounce buffer for copying skb->mark
ixgbe: add double of VLAN header when computing the max MTU
i40e: add double of VLAN header when computing the max MTU
ixgbe: allow to increase MTU to 3K with XDP enabled
net: stmmac: Restrict warning on disabling DMA store and fwd mode
net/sched: act_ctinfo: use percpu stats
net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence
ice: fix lost multicast packets in promisc mode
ice: Fix check for weight and priority of a scheduling node
bnxt_en: Fix mqprio and XDP ring checking logic
net: Fix unwanted sign extension in netdev_stats_to_stats64()
net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path
net: openvswitch: fix possible memory leak in ovs_meter_cmd_set()
af_key: Fix heap information leak
...
Pull block fixes from Jens Axboe:
"Just a few NVMe fixes that should go into the 6.2 release, adding a
quirk and fixing two issues introduced in this release:
- NVMe fixes via Christoph:
- Always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote)
- Add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner)
- Set the DMA mask earlier (Christoph Hellwig)"
* tag 'block-6.2-2023-02-16' of git://git.kernel.dk/linux:
nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
nvme-pci: set the DMA mask earlier
nvme-pci: add bogus ID quirk for ADATA SX6000PNP
Pull spi fix from Mark Brown:
"One more last minute patch for v6.2 updating the parsing of the newly
added spi-cs-setup-delay-ns.
It's been pointed out that due to the way DT parsing works the change
in property size is ABI visible so let's not let a release go out
without it being fixed. The change got split from some earlier ABI
related fixes to the property since the first version sent had a build
error"
* tag 'spi-v6.2-rc8-abi' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Use a 32-bit DT property for spi-cs-setup-delay-ns
Pull gpio fixes from Bartosz Golaszewski:
- fix a potential Kconfig issue with gpio-mlxbf2 not selecting
GPIOLIB_IRQCHIP
- another immutable irqchip conversion, this time for gpio-vf610
- fix a wakeup issue on Clevo NH5xAx
* tag 'gpio-fixes-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: mlxbf2: select GPIOLIB_IRQCHIP
gpiolib: acpi: Add a ignore wakeup quirk for Clevo NH5xAx
gpio: vf610: make irq_chip immutable
gpiolib: acpi: remove redundant declaration
The uuid code is very low maintainance now that the major overhaul
has completed, and doesn't need it's own tree. All the recent work
has been done by Andy who'd like to stay on as a reviewer without an
explicit tree.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This code has been stale for years and I have no way to test it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cited commit changed devlink to register its netdev notifier block on
the global netdev notifier chain instead of on the per network namespace
one.
However, when changing the network namespace of the devlink instance,
devlink still tries to unregister its notifier block from the chain of
the old namespace and register it on the chain of the new namespace.
This results in corruption of the notifier chains, as the same notifier
block is registered on two different chains: The global one and the per
network namespace one. In turn, this causes other problems such as the
inability to dismantle namespaces due to netdev reference count issues.
Fix by preventing devlink from moving its notifier block between
namespaces.
Reproducer:
# echo "10 1" > /sys/bus/netdevsim/new_device
# ip netns add test123
# devlink dev reload netdevsim/netdevsim10 netns test123
# ip netns del test123
[ 71.935619] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 71.938348] leaked reference.
Fixes: 565b4824c3 ("devlink: change port event netdev notifier from per-net to global")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230215073139.1360108-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit a97f8783a9 ("igb: unbreak I2C bit-banging on i350") introduced
code to change I2C settings to bit banging unconditionally.
However, this patch introduced a regression: On an Intel S2600CWR
Server Board with three NICs:
- 1x dual-port copper
Intel I350 Gigabit Network Connection [8086:1521] (rev 01)
fw 1.63, 0x80000dda
- 2x quad-port SFP+ with copper SFP Avago ABCU-5700RZ
Intel I350 Gigabit Fiber Network Connection [8086:1522] (rev 01)
fw 1.52.0
the SFP NICs no longer get link at all. Reverting commit a97f8783a9
or switching to the Intel out-of-tree driver both fix the problem.
Per the igb out-of-tree driver, I2C bit banging on i350 depends on
support for an external thermal sensor (ETS). However, commit
a97f8783a9 added bit banging unconditionally. Additionally, the
out-of-tree driver always calls init_thermal_sensor_thresh on probe,
while our driver only calls init_thermal_sensor_thresh only in
igb_reset(), and only if an ETS is present, ignoring the internal
thermal sensor. The affected SFPs don't provide an ETS. Per Intel,
the behaviour is a result of i350 firmware requirements.
This patch fixes the problem by aligning the behaviour to the
out-of-tree driver:
- split igb_init_i2c() into two functions:
- igb_init_i2c() only performs the basic I2C initialization.
- igb_set_i2c_bb() makes sure that E1000_CTRL_I2C_ENA is set
and enables bit-banging.
- igb_probe() only calls igb_set_i2c_bb() if an ETS is present.
- igb_probe() calls init_thermal_sensor_thresh() unconditionally.
- igb_reset() aligns its behaviour to igb_probe(), i. e., call
igb_set_i2c_bb() if an ETS is present and call
init_thermal_sensor_thresh() unconditionally.
Fixes: a97f8783a9 ("igb: unbreak I2C bit-banging on i350")
Tested-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Co-developed-by: Jamie Bainbridge <jbainbri@redhat.com>
Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230214185549.1306522-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[Why]
drm_atomic_normalize_zpos() can return an error code when there's
modeset lock contention. This was being ignored.
[How]
Bail out of atomic check if normalize_zpos() returns an error.
Fixes: b261509952 ("drm/amd/display: Fix double cursor on non-video RGB MPO")
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-02-14 (ixgbe, i40e)
This series contains updates to ixgbe and i40e drivers.
Jason Xing corrects comparison of frame sizes for setting MTU with XDP on
ixgbe and adjusts frame size to account for a second VLAN header on ixgbe
and i40e.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ixgbe: add double of VLAN header when computing the max MTU
i40e: add double of VLAN header when computing the max MTU
ixgbe: allow to increase MTU to 3K with XDP enabled
====================
Link: https://lore.kernel.org/r/20230214185146.1305819-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull apparmor fix from John Johansen:
"Regression fix for getattr mediation of old policy"
* tag 'apparmor-v6.2-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix regression in compat permissions for getattr
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.2
- always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote)
- add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner)
- set the DMA mask earlier (Christoph Hellwig)"
* tag 'nvme-6.2-2023-02-15' of git://git.infradead.org/nvme:
nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
nvme-pci: set the DMA mask earlier
nvme-pci: add bogus ID quirk for ADATA SX6000PNP
Pull nfsd fix from Chuck Lever:
- Fix a teardown bug in the new nfs4_file hashtable
* tag 'nfsd-6.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't destroy global nfs4_file table in per-net shutdown
Pull tracing fixlet from Steven Rostedt:
"Make trace_define_field_ext() static.
Just after the fix to TASK_COMM_LEN not converted to its value in
trace_events was pulled, the kernel test robot reported that the
helper function trace_define_field_ext() added to that change was only
used in the file it was defined in but was not declared static.
Make it a local function"
* tag 'trace-v6.2-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Make trace_define_field_ext() static
This fixes a regression in mediation of getattr when old policy built
under an older ABI is loaded and mapped to internal permissions.
The regression does not occur for all getattr permission requests,
only appearing if state zero is the final state in the permission
lookup. This is because despite the first state (index 0) being
guaranteed to not have permissions in both newer and older permission
formats, it may have to carry permissions that were not mediated as
part of an older policy. These backward compat permissions are
mapped here to avoid special casing the mediation code paths.
Since the mapping code already takes into account backwards compat
permission from older formats it can be applied to state 0 to fix
the regression.
Fixes: 408d53e923 ("apparmor: compute file permissions on profile load")
Reported-by: Philip Meulengracht <the_meulengracht@hotmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
The commit 1796f808e4 ("HID: i2c-hid: acpi: Stop setting wakeup_capable")
changed the policy such that I2C touchpads may be able to wake up the
system by default if the system is configured as such.
However for some devices there is a bug, that is causing the touchpad to
instantly wake up the device again once it gets deactivated. The root cause
is still under investigation (see Link tag).
To workaround this problem for the time being, introduce a quirk for this
model that will prevent the wakeup capability for being set for GPIO 16.
Fixes: 1796f808e4 ("HID: i2c-hid: acpi: Stop setting wakeup_capable")
Link: https://lore.kernel.org/linux-acpi/20230210164636.628462-1-wse@tuxedocomputers.com/
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: <stable@vger.kernel.org> # v6.1+
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Since recently, the kernel is nagging about mutable irq_chips:
"not an immutable chip, please consider fixing it!"
Drop the unneeded copy, flag it as IRQCHIP_IMMUTABLE, add the new
helper functions and call the appropriate gpiolib functions.
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until
KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or
gains a mechanism to let userspace opt-in to the dangers of exposing a
hybrid vPMU to KVM guests. Virtualizing a hybrid PMU, or at least part of
a hybrid PMU, is possible, but it requires careful, deliberate
configuration from userspace.
E.g. to expose full functionality, vCPUs need to be pinned to pCPUs to
prevent migrating a vCPU between a big core and a little core, userspace
must enumerate a reasonable topology to the guest, and guest CPUID must be
curated per vCPU to enumerate accurate vPMU capabilities.
The last point is especially problematic, as KVM doesn't control which
pCPU it runs on when enumerating KVM's vPMU capabilities to userspace,
i.e. userspace can't rely on KVM_GET_SUPPORTED_CPUID in it's current form.
Alternatively, userspace could enable vPMU support by enumerating the
set of features that are common and coherent across all cores, e.g. by
filtering PMU events and restricting guest capabilities. But again, that
requires userspace to take action far beyond reflecting KVM's supported
feature set into the guest.
For now, simply disable vPMU support on hybrid CPUs to avoid inducing
seemingly random #GPs in guests, and punt support for hybrid CPUs to a
future enabling effort.
Reported-by: Jianfeng Gao <jianfeng.gao@intel.com>
Cc: stable@vger.kernel.org
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208204230.1360502-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a non-root cgroup gets removed when there is a thread that registered
trigger and is polling on a pressure file within the cgroup, the polling
waitqueue gets freed in the following path:
do_rmdir
cgroup_rmdir
kernfs_drain_open_files
cgroup_file_release
cgroup_pressure_release
psi_trigger_destroy
However, the polling thread still has a reference to the pressure file and
will access the freed waitqueue when the file is closed or upon exit:
fput
ep_eventpoll_release
ep_free
ep_remove_wait_queue
remove_wait_queue
This results in use-after-free as pasted below.
The fundamental problem here is that cgroup_file_release() (and
consequently waitqueue's lifetime) is not tied to the file's real lifetime.
Using wake_up_pollfree() here might be less than ideal, but it is in line
with the comment at commit 42288cb44c ("wait: add wake_up_pollfree()")
since the waitqueue's lifetime is not tied to file's one and can be
considered as another special case. While this would be fixable by somehow
making cgroup_file_release() be tied to the fput(), it would require
sizable refactoring at cgroups or higher layer which might be more
justifiable if we identify more cases like this.
BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0
Write of size 4 at addr ffff88810e625328 by task a.out/4404
CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38
Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017
Call Trace:
<TASK>
dump_stack_lvl+0x73/0xa0
print_report+0x16c/0x4e0
kasan_report+0xc3/0xf0
kasan_check_range+0x2d2/0x310
_raw_spin_lock_irqsave+0x60/0xc0
remove_wait_queue+0x1a/0xa0
ep_free+0x12c/0x170
ep_eventpoll_release+0x26/0x30
__fput+0x202/0x400
task_work_run+0x11d/0x170
do_exit+0x495/0x1130
do_group_exit+0x100/0x100
get_signal+0xd67/0xde0
arch_do_signal_or_restart+0x2a/0x2b0
exit_to_user_mode_prepare+0x94/0x100
syscall_exit_to_user_mode+0x20/0x40
do_syscall_64+0x52/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Allocated by task 4404:
kasan_set_track+0x3d/0x60
__kasan_kmalloc+0x85/0x90
psi_trigger_create+0x113/0x3e0
pressure_write+0x146/0x2e0
cgroup_file_write+0x11c/0x250
kernfs_fop_write_iter+0x186/0x220
vfs_write+0x3d8/0x5c0
ksys_write+0x90/0x110
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 4407:
kasan_set_track+0x3d/0x60
kasan_save_free_info+0x27/0x40
____kasan_slab_free+0x11d/0x170
slab_free_freelist_hook+0x87/0x150
__kmem_cache_free+0xcb/0x180
psi_trigger_destroy+0x2e8/0x310
cgroup_file_release+0x4f/0xb0
kernfs_drain_open_files+0x165/0x1f0
kernfs_drain+0x162/0x1a0
__kernfs_remove+0x1fb/0x310
kernfs_remove_by_name_ns+0x95/0xe0
cgroup_addrm_files+0x67f/0x700
cgroup_destroy_locked+0x283/0x3c0
cgroup_rmdir+0x29/0x100
kernfs_iop_rmdir+0xd1/0x140
vfs_rmdir+0xfe/0x240
do_rmdir+0x13d/0x280
__x64_sys_rmdir+0x2c/0x30
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 0e94682b73 ("psi: introduce psi monitor")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Mengchi Cheng <mengcc@amazon.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/
Link: https://lore.kernel.org/r/20230214212705.4058045-1-kamatam@amazon.com
The following warning:
Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst:92: ERROR: Unexpected indentation.
was introduced by commit 493a2c2d23. Fix it by placing everything in
the same paragraph and also use a monospace font.
Fixes: 493a2c2d23 ("Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions")
Reported-by: Stephen Rothwell <sfr@canb@auug.org.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
lianhui reports that when MPLS fails to register the sysctl table
under new location (during device rename) the old pointers won't
get overwritten and may be freed again (double free).
Handle this gracefully. The best option would be unregistering
the MPLS from the device completely on failure, but unfortunately
mpls_ifdown() can fail. So failing fully is also unreliable.
Another option is to register the new table first then only
remove old one if the new one succeeds. That requires more
code, changes order of notifications and two tables may be
visible at the same time.
sysctl point is not used in the rest of the code - set to NULL
on failures and skip unregister if already NULL.
Reported-by: lianhui tang <bluetlh@gmail.com>
Fixes: 0fae3bf018 ("mpls: handle device renames for per-device sysctls")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending a SYN message, this kernel stack trace is observed:
...
[ 13.396352] RIP: 0010:_copy_from_iter+0xb4/0x550
...
[ 13.398494] Call Trace:
[ 13.398630] <TASK>
[ 13.398630] ? __alloc_skb+0xed/0x1a0
[ 13.398630] tipc_msg_build+0x12c/0x670 [tipc]
[ 13.398630] ? shmem_add_to_page_cache.isra.71+0x151/0x290
[ 13.398630] __tipc_sendmsg+0x2d1/0x710 [tipc]
[ 13.398630] ? tipc_connect+0x1d9/0x230 [tipc]
[ 13.398630] ? __local_bh_enable_ip+0x37/0x80
[ 13.398630] tipc_connect+0x1d9/0x230 [tipc]
[ 13.398630] ? __sys_connect+0x9f/0xd0
[ 13.398630] __sys_connect+0x9f/0xd0
[ 13.398630] ? preempt_count_add+0x4d/0xa0
[ 13.398630] ? fpregs_assert_state_consistent+0x22/0x50
[ 13.398630] __x64_sys_connect+0x16/0x20
[ 13.398630] do_syscall_64+0x42/0x90
[ 13.398630] entry_SYSCALL_64_after_hwframe+0x63/0xcd
It is because commit a41dad905e ("iov_iter: saner checks for attempt
to copy to/from iterator") has introduced sanity check for copying
from/to iov iterator. Lacking of copy direction from the iterator
viewpoint would lead to kernel stack trace like above.
This commit fixes this issue by initializing the iov iterator with
the correct copy direction when sending SYN or ACK without data.
Fixes: f25dcc7687 ("tipc: tipc ->sendmsg() conversion")
Reported-by: syzbot+d43608d061e8847ec9f3@syzkaller.appspotmail.com
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20230214012606.5804-1-tung.q.nguyen@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-02-13 (ice)
This series contains updates to ice driver only.
Michal fixes check of scheduling node weight and priority to be done
against desired value, not current value.
Jesse adds setting of all multicast when adding promiscuous mode to
resolve traffic being lost due to filter settings.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix lost multicast packets in promisc mode
ice: Fix check for weight and priority of a scheduling node
====================
Link: https://lore.kernel.org/r/20230213185259.3959224-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
v3: Fix vmw_user_bo_lookup which was also dropping the gem reference
before the kernel was done with buffer depending on userspace doing
the right thing. Same bug, different spot.
It is possible for userspace to predict the next buffer handle and
to destroy the buffer while it's still used by the kernel. Delay
dropping the internal reference on the buffers until kernel is done
with them.
Instead of immediately dropping the gem reference in vmw_user_bo_lookup
and vmw_gem_object_create_with_handle let the callers decide when they're
ready give the control back to userspace.
Also fixes the second usage of vmw_gem_object_create_with_handle in
vmwgfx_surface.c which wasn't grabbing an explicit reference
to the gem object which could have been destroyed by the userspace
on the owning surface at any point.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Fixes: 8afa13a058 ("drm/vmwgfx: Implement DRIVER_GEM")
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230211050514.2431155-1-zack@kde.org
(cherry picked from commit 9ef8d83e8e)
Cc: <stable@vger.kernel.org> # v5.17+
ttm_bo_init_reserved on failure puts the buffer object back which
causes it to be deleted, but kfree was still being called on the same
buffer in vmw_bo_create leading to a double free.
After the double free the vmw_gem_object_create_with_handle was
setting the gem function objects before checking the return status
of vmw_bo_create leading to null pointer access.
Fix the entire path by relaying on ttm_bo_init_reserved to delete the
buffer objects on failure and making sure the return status is checked
before setting the gem function objects on the buffer object.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Fixes: 8afa13a058 ("drm/vmwgfx: Implement DRIVER_GEM")
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230208180050.2093426-1-zack@kde.org
(cherry picked from commit 36d421e632)
Cc: <stable@vger.kernel.org> # v5.17+
The UNSLICE_UNIT_LEVEL_CLKGATE register programmed by this workaround
has 'BUS' style reset, indicating that it does not lose its value on
engine resets. Furthermore, this register is part of the GT forcewake
domain rather than the RENDER domain, so it should not be impacted by
RCS engine resets. As such, we should implement this on the GT
workaround list rather than an engine list.
Bspec: 19219
Fixes: 3551ff9287 ("drm/i915/gen11: Moving WAs to rcs_engine_wa_init()")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230201222831.608281-2-matthew.d.roper@intel.com
(cherry picked from commit 5f21dc07b5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.
Fixes: fabf1bce10 ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.
Fixes: 0c8493d90b ("i40e: add XDP support for pass and drop actions")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Recently I encountered one case where I cannot increase the MTU size
directly from 1500 to a much bigger value with XDP enabled if the
server is equipped with IXGBE card, which happened on thousands of
servers in production environment. After applying the current patch,
we can set the maximum MTU size to 3K.
This patch follows the behavior of changing MTU as i40e/ice does.
References:
[1] commit 23b44513c3 ("ice: allow 3k MTU for XDP")
[2] commit 0c8493d90b ("i40e: add XDP support for pass and drop actions")
Fixes: fabf1bce10 ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pull power management fix from Rafael Wysocki:
"Add a missing NULL pointer check to the cpufreq drver for Qualcomm
platforms (Manivannan Sadhasivam)"
* tag 'pm-6.2-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: qcom-hw: Add missing null pointer check
Pull kvm fixes from Paolo Bonzini:
"Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling
threads transitions out of C0 state, the other thread gets access to
twice as many entries in the RSB, but unfortunately the predictions of
the now-halted logical processor are not purged. Therefore, the
executing processor could speculatively execute from locations that
the now-halted processor had trained the RSB on.
The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
when context switching to the idle thread. However, KVM allows a VMM
to prevent exiting guest mode when transitioning out of C0 using the
KVM_CAP_X86_DISABLE_EXITS capability can be used by a VMM to change
this behavior. To mitigate the cross-thread return address predictions
bug, a VMM must not be allowed to override the default behavior to
intercept C0 transitions.
These patches introduce a KVM module parameter that, if set, will
prevent the user from disabling the HLT, MWAIT and CSTATE exits"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions
KVM: x86: Mitigate the cross-thread return address predictions bug
x86/speculation: Identify processors vulnerable to SMT RSB predictions
syzbot reported a RCU stall which is caused by setting up an alarmtimer
with a very small interval and ignoring the signal. The reproducer arms the
alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
problem per se, but that's an issue when the signal is ignored because then
the timer is immediately rearmed because there is no way to delay that
rearming to the signal delivery path. See posix_timer_fn() and commit
58229a1899 ("posix-timers: Prevent softirq starvation by small intervals
and SIG_IGN") for details.
The reproducer does not set SIG_IGN explicitely, but it sets up the timers
signal with SIGCONT. That has the same effect as explicitely setting
SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
the task is not ptraced.
The log clearly shows that:
[pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
It works because the tasks are traced and therefore the signal is queued so
the tracer can see it, which delays the restart of the timer to the signal
delivery path. But then the tracer is killed:
[pid 5087] kill(-5102, SIGKILL <unfinished ...>
...
./strace-static-x86_64: Process 5107 detached
and after it's gone the stall can be observed:
syzkaller login: [ 79.439102][ C0] hrtimer: interrupt took 68471 ns
[ 184.460538][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 184.658237][ C1] rcu: Stack dump where RCU GP kthread last ran:
[ 184.664574][ C1] Sending NMI from CPU 1 to CPUs 0:
[ 184.669821][ C0] NMI backtrace for cpu 0
[ 184.669831][ C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
...
[ 184.670036][ C0] Call Trace:
[ 184.670041][ C0] <IRQ>
[ 184.670045][ C0] alarmtimer_fired+0x327/0x670
posix_timer_fn() prevents that by checking whether the interval for
timers which have the signal ignored is smaller than a jiffie and
artifically delay it by shifting the next expiry out by a jiffie. That's
accurate vs. the overrun accounting, but slightly inaccurate
vs. timer_gettimer(2).
The comment in that function says what needs to be done and there was a fix
available for the regular userspace induced SIG_IGN mechanism, but that did
not work due to the implicit ignore for SIGCONT and similar signals. This
needs to be worked on, but for now the only available workaround is to do
exactly what posix_timer_fn() does:
Increase the interval of self-rearming timers, which have their signal
ignored, to at least a jiffie.
Interestingly this has been fixed before via commit ff86bf0c65
("alarmtimer: Rate limit periodic intervals") already, but that fix got
lost in a later rework.
Reported-by: syzbot+b9564ba6e8e00694511b@syzkaller.appspotmail.com
Fixes: f2c45807d3 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
Commit
90b926e68f ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
broke the use case of running Xen dom0 kernels on machines with an
external disk enclosure attached via USB, see Link tag.
What this commit was originally fixing - SEV-SNP guests on Hyper-V - is
a more specialized situation which has other issues at the moment anyway
so reverting this now and addressing the issue properly later is the
prudent thing to do.
So revert it in time for the 6.2 proper release.
[ bp: Rewrite commit message. ]
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/4fe9541e-4d4c-2b2a-f8c8-2d34a7284930@nerdbynature.de
When setting 'snps,force_thresh_dma_mode' DT property, the following
warning is always emitted, regardless the status of force_sf_dma_mode:
dwmac-starfive 10020000.ethernet: force_sf_dma_mode is ignored if force_thresh_dma_mode is set.
Do not print the rather misleading message when DMA store and forward
mode is already disabled.
Fixes: e2a240c7d3 ("driver:net:stmmac: Disable DMA store and forward mode if platform data force_thresh_dma_mode is set.")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20230210202126.877548-1-cristian.ciocaltea@collabora.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Set the DMA mask before calling dma_addressing_limited, which depends on it.
Note that this stop checking the return value of dma_set_mask_and_coherent
as this function can only fail for masks < 32-bit.
Fixes: 3f30a79c2e ("nvme-pci: set constant paramters in nvme_pci_alloc_ctrl")
Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Michael Kelley <mikelley@microsoft.com>
The tc action act_ctinfo was using shared stats, fix it to use percpu stats
since bstats_update() must be called with locks or with a percpu pointer argument.
tdc results:
1..12
ok 1 c826 - Add ctinfo action with default setting
ok 2 0286 - Add ctinfo action with dscp
ok 3 4938 - Add ctinfo action with valid cpmark and zone
ok 4 7593 - Add ctinfo action with drop control
ok 5 2961 - Replace ctinfo action zone and action control
ok 6 e567 - Delete ctinfo action with valid index
ok 7 6a91 - Delete ctinfo action with invalid index
ok 8 5232 - List ctinfo actions
ok 9 7702 - Flush ctinfo actions
ok 10 3201 - Add ctinfo action with duplicate index
ok 11 8295 - Add ctinfo action with invalid index
ok 12 3964 - Replace ctinfo action with invalid goto_chain control
Fixes: 24ec483cec ("net: sched: Introduce act_ctinfo action")
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20230210200824.444856-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
So far changing the period by just setting new period values while
running did not work.
The order as indicated by the publicly available reference manual of the i.MX8MP [1]
indicates a sequence:
* initiate the programming sequence
* set the values for PPS period and start time
* start the pulse train generation.
This is currently not used in dwmac5_flex_pps_config(), which instead does:
* initiate the programming sequence and immediately start the pulse train generation
* set the values for PPS period and start time
This caused the period values written not to take effect until the FlexPPS output was
disabled and re-enabled again.
This patch fix the order and allows the period to be set immediately.
[1] https://www.nxp.com/webapp/Download?colCode=IMX8MPRM
Fixes: 9a8a02c9d4 ("net: stmmac: Add Flexible PPS support")
Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
Link: https://lore.kernel.org/r/20230210143937.3427483-1-j.zink@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mark the Tiger Lake UP{3,4} AHCI controller as "low_power". This enables
S0ix to work out of the box. Otherwise this isn't working unless the
user manually sets /sys/class/scsi_host/*/link_power_management_policy.
Intel lists a total of 4 SATA controller IDs in [1] for those mobile
PCHs. This commit just adds the "AHCI" variant since I only tested
those.
[1]: https://cdrdv2.intel.com/v1/dl/getContent/631119
Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com>
CC: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Samsung MZ7LH drives are spewing messages like this in to dmesg with AMD
SATA controllers:
ata1.00: exception Emask 0x0 SAct 0x7e0000 SErr 0x0 action 0x6 frozen
ata1.00: failed command: SEND FPDMA QUEUED
ata1.00: cmd 64/01:88:00:00:00/00:00:00:00:00/a0 tag 17 ncq dma 512 out
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask
0x4 (timeout)
Since this was seen previously with SSD 840 EVO drives in
https://bugzilla.kernel.org/show_bug.cgi?id=203475 let's add the same
fix for these drives as the EVOs have, since they likely have very
similar firmwares.
Signed-off-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
If sdio_add_func() or sdio_init_func() fails, sdio_remove_func() can
not release the resources, because the sdio function is not presented
in these two cases, it won't call of_node_put() or put_device().
To fix these leaks, make sdio_func_present() only control whether
device_del() needs to be called or not, then always call of_node_put()
and put_device().
In error case in sdio_init_func(), the reference of 'card->dev' is
not get, to avoid redundant put in sdio_free_func_cis(), move the
get_device() to sdio_alloc_func() and put_device() to sdio_release_func(),
it can keep the get/put function be balanced.
Without this patch, while doing fault inject test, it can get the
following leak reports, after this fix, the leak is gone.
unreferenced object 0xffff888112514000 (size 2048):
comm "kworker/3:2", pid 65, jiffies 4294741614 (age 124.774s)
hex dump (first 32 bytes):
00 e0 6f 12 81 88 ff ff 60 58 8d 06 81 88 ff ff ..o.....`X......
10 40 51 12 81 88 ff ff 10 40 51 12 81 88 ff ff .@Q......@Q.....
backtrace:
[<000000009e5931da>] kmalloc_trace+0x21/0x110
[<000000002f839ccb>] mmc_alloc_card+0x38/0xb0 [mmc_core]
[<0000000004adcbf6>] mmc_sdio_init_card+0xde/0x170 [mmc_core]
[<000000007538fea0>] mmc_attach_sdio+0xcb/0x1b0 [mmc_core]
[<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core]
unreferenced object 0xffff888112511000 (size 2048):
comm "kworker/3:2", pid 65, jiffies 4294741623 (age 124.766s)
hex dump (first 32 bytes):
00 40 51 12 81 88 ff ff e0 58 8d 06 81 88 ff ff .@Q......X......
10 10 51 12 81 88 ff ff 10 10 51 12 81 88 ff ff ..Q.......Q.....
backtrace:
[<000000009e5931da>] kmalloc_trace+0x21/0x110
[<00000000fcbe706c>] sdio_alloc_func+0x35/0x100 [mmc_core]
[<00000000c68f4b50>] mmc_attach_sdio.cold.18+0xb1/0x395 [mmc_core]
[<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core]
Fixes: 3d10a1ba0d ("sdio: fix reference counting in sdio_remove_func()")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230130125808.3471254-1-yangyingliang@huawei.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Pull misc fixes from Andrew Morton:
"Twelve hotfixes, mostly against mm/.
Five of these fixes are cc:stable"
* tag 'mm-hotfixes-stable-2023-02-13-13-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem
scripts/gdb: fix 'lx-current' for x86
lib: parser: optimize match_NUMBER apis to use local array
mm: shrinkers: fix deadlock in shrinker debugfs
mm: hwpoison: support recovery from ksm_might_need_to_copy()
kasan: fix Oops due to missing calls to kasan_arch_is_ready()
revert "squashfs: harden sanity check in squashfs_read_xattr_id_table"
fsdax: dax_unshare_iter() should return a valid length
mm/gup: add folio to list when folio_isolate_lru() succeed
aio: fix mremap after fork null-deref
mailmap: add entry for Alexander Mikhalitsyn
mm: extend max struct page size for kmsan
There was a problem reported to us where the addition of a VF with an IPv6
address ending with a particular sequence would cause the parent device on
the PF to no longer be able to respond to neighbor discovery packets.
In this case, we had an ovs-bridge device living on top of a VLAN, which
was on top of a PF, and it would not be able to talk anymore (the neighbor
entry would expire and couldn't be restored).
The root cause of the issue is that if the PF is asked to be in IFF_PROMISC
mode (promiscuous mode) and it had an ipv6 address that needed the
33:33:ff:00:00:04 multicast address to work, then when the VF was added
with the need for the same multicast address, the VF would steal all the
traffic destined for that address.
The ice driver didn't auto-subscribe a request of IFF_PROMISC to the
"multicast replication from other port's traffic" meaning that it won't get
for instance, packets with an exact destination in the VF, as above.
The VF's IPv6 address, which adds a "perfect filter" for 33:33:ff:00:00:04,
results in no packets for that multicast address making it to the PF (which
is in promisc but NOT "multicast replication").
The fix is to enable "multicast promiscuous" whenever the driver is asked
to enable IFF_PROMISC, and make sure to disable it when appropriate.
Fixes: e94d447866 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently checks for weight and priority ranges don't check incoming value
from the devlink. Instead it checks node current weight or priority. This
makes those checks useless.
Change range checks in ice_set_object_tx_priority() and
ice_set_object_tx_weight() to check against incoming priority an weight.
Fixes: 42c2eb6b1f ("ice: Implement devlink-rate API")
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pull x86 platform drivers fix from Hans de Goede:
"Intel vsec driver Meteor Lake PCI ids addition"
* tag 'platform-drivers-x86-v6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel/vsec: Add support for Meteor Lake
Since commit 8f9ea86fdf ("sched: Always preserve the user requested
cpumask"), a successful call to sched_setaffinity() should always save
the user requested cpu affinity mask in a task's user_cpus_ptr. However,
when the given cpu mask is the same as the current one, user_cpus_ptr
is not updated. Fix this by saving the user mask in this case too.
Fixes: 8f9ea86fdf ("sched: Always preserve the user requested cpumask")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230203181849.221943-1-longman@redhat.com
pci_msix_alloc_irq_at() and pci_msix_free_irq() are not declared when
CONFIG_PCI_MSI is disabled.
Users of these two calls do not yet exist but when users do appear (shown
below is an attempt to use the new API in vfio-pci) the following errors
will be encountered when compiling with CONFIG_PCI_MSI disabled:
drivers/vfio/pci/vfio_pci_intrs.c:461:4: error: implicit declaration of\
function 'pci_msix_free_irq' is invalid in C99\
[-Werror,-Wimplicit-function-declaration]
pci_msix_free_irq(pdev, msix_map);
^
drivers/vfio/pci/vfio_pci_intrs.c:511:15: error: implicit declaration of\
function 'pci_msix_alloc_irq_at' is invalid in C99\
[-Werror,-Wimplicit-function-declaration]
msix_map = pci_msix_alloc_irq_at(pdev, vector, NULL);
Provide definitions for pci_msix_alloc_irq_at() and pci_msix_free_irq() in
preparation for users that need to compile when CONFIG_PCI_MSI is
disabled.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 34026364df ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/158e40e1cfcfc58ae30ecb2bbfaf86e5bba7a1ef.1675978686.git.reinette.chatre@intel.com
In bnxt_reserve_rings(), there is logic to check that the number of TX
rings reserved is enough to cover all the mqprio TCs, but it fails to
account for the TX XDP rings. So the check will always fail if there
are mqprio TCs and TX XDP rings. As a result, the driver always fails
to initialize after the XDP program is attached and the device will be
brought down. A subsequent ifconfig up will also fail because the
number of TX rings is set to an inconsistent number. Fix the check to
properly account for TX XDP rings. If the check fails, set the number
of TX rings back to a consistent number after calling netdev_reset_tc().
Fixes: 674f50a5b0 ("bnxt_en: Implement new method to reserve rings.")
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When converting net_device_stats to rtnl_link_stats64 sign extension
is triggered on ILP32 machines as 6c1c509778 changed the previous
"ulong -> u64" conversion to "long -> u64" by accessing the
net_device_stats fields through a (signed) atomic_long_t.
This causes for example the received bytes counter to jump to 16EiB after
having received 2^31 bytes. Casting the atomic value to "unsigned long"
beforehand converting it into u64 avoids this.
Fixes: 6c1c509778 ("net: add atomic_long_t to net_device_stats fields")
Signed-off-by: Felix Riemann <felix.riemann@sma.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported that act_len in kalmia_send_init_packet() is
uninitialized when passing it to the first usb_bulk_msg error path. Jiri
Pirko noted that it's pointless to pass it in the error path, and that
the value that would be printed in the second error path would be the
value of act_len from the first call to usb_bulk_msg.[1]
With this in mind, let's just not pass act_len to the usb_bulk_msg error
paths.
1: https://lore.kernel.org/lkml/Y9pY61y1nwTuzMOa@nanopsycho/
Fixes: d40261236e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730")
Reported-and-tested-by: syzbot+cd80c5ef5121bfe85b55@syzkaller.appspotmail.com
Signed-off-by: Miko Larsson <mikoxyzzz@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
old_meter needs to be free after it is detached regardless of whether
the new meter is successfully attached.
Fixes: c7c4c44c9a ("net: openvswitch: expand the meters supported number")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since x->encap of pfkey_msg2xfrm_state() is not
initialized to 0, kernel heap data can be leaked.
Fix with kzalloc() to prevent this.
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both Rich Felker and Yoshinori Sato haven't done any work on arch/sh
for a while. As I have been maintaining Debian's sh4 port since 2014,
I am interested to keep the architecture alive.
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull tracing fix from Steven Rostedt:
"Fix showing of TASK_COMM_LEN instead of its value
The TASK_COMM_LEN was converted from a macro into an enum so that BTF
would have access to it. But this unfortunately caused TASK_COMM_LEN
to display in the format fields of trace events, as they are created
by the TRACE_EVENT() macro and such, macros convert to their values,
where as enums do not.
To handle this, instead of using the field itself to be display, save
the value of the array size as another field in the trace_event_fields
structure, and use that instead.
Not only does this fix the issue, but also converts the other trace
events that have this same problem (but were not breaking tooling).
With this change, the original work around b3bc8547d3 ("tracing:
Have TRACE_DEFINE_ENUM affect trace event types as well") could be
reverted (but that should be done in the merge window)"
* tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix TASK_COMM_LEN in trace event format file
Pull btrfs fixes from David Sterba:
- one more fix for a tree-log 'write time corruption' report, update
the last dir index directly and don't keep in the log context
- do VFS-level inode lock around FIEMAP to prevent a deadlock with
concurrent fsync, the extent-level lock is not sufficient
- don't cache a single-device filesystem device to avoid cases when a
loop device is reformatted and the entry gets stale
* tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: free device in btrfs_close_devices for a single device filesystem
btrfs: lock the inode in shared mode before starting fiemap
btrfs: simplify update of last_dir_index_offset when logging a directory
Pull USB fixes from Greg KH:
"Here are 2 small USB driver fixes that resolve some reported
regressions and one new device quirk. Specifically these are:
- new quirk for Alcor Link AK9563 smartcard reader
- revert of u_ether gadget change in 6.2-rc1 that caused problems
- typec pin probe fix
All of these have been in linux-next with no reported problems"
* tag 'usb-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: core: add quirk for Alcor Link AK9563 smartcard reader
usb: typec: altmodes/displayport: Fix probe pin assign check
Revert "usb: gadget: u_ether: Do not make UDC parent of the net device"
Pull EFI fix from Ard Biesheuvel:
"A fix from Darren to widen the SMBIOS match for detecting Ampere Altra
machines with problematic firmware. In the mean time, we are working
on a more precise check, but this is still work in progress"
* tag 'efi-fixes-for-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
arm64: efi: Force the use of SetVirtualAddressMap() on eMAG and Altra Max machines
Pull powerpc fixes from Michael Ellerman:
- Fix interrupt exit race with security mitigation switching.
- Don't select ARCH_WANTS_NO_INSTR until warnings are fixed.
- Build fix for CONFIG_NUMA=n.
Thanks to Nicholas Piggin, Randy Dunlap, and Sachin Sant.
* tag 'powerpc-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch
powerpc/kexec_file: fix implicit decl error
powerpc: Don't select ARCH_WANTS_NO_INSTR
When we upgraded our kernel, we started seeing some page corruption like
the following consistently:
BUG: Bad page state in process ganesha.nfsd pfn:1304ca
page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
flags: 0x17ffffc0000000()
raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P B O 5.10.158-1.nutanix.20221209.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Call Trace:
dump_stack+0x74/0x96
bad_page.cold+0x63/0x94
check_new_page_bad+0x6d/0x80
rmqueue+0x46e/0x970
get_page_from_freelist+0xcb/0x3f0
? _cond_resched+0x19/0x40
__alloc_pages_nodemask+0x164/0x300
alloc_pages_current+0x87/0xf0
skb_page_frag_refill+0x84/0x110
...
Sometimes, it would also show up as corruption in the free list pointer
and cause crashes.
After bisecting the issue, we found the issue started from commit
e320d3012d ("mm/page_alloc.c: fix freeing non-compound pages"):
if (put_page_testzero(page))
free_the_page(page, order);
else if (!PageHead(page))
while (order-- > 0)
free_the_page(page + (1 << order), order);
So the problem is the check PageHead is racy because at this point we
already dropped our reference to the page. So even if we came in with
compound page, the page can already be freed and PageHead can return
false and we will end up freeing all the tail pages causing double free.
Fixes: e320d3012d ("mm/page_alloc.c: fix freeing non-compound pages")
Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull spi fixes from Mark Brown:
"A couple of hopefully final fixes for spi: one driver specific fix for
an issue with very large transfers and a fix for an issue with the
locking fixes in spidev merged earlier this release cycle which was
missed"
* tag 'spi-fix-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spidev: fix a recursive locking error
spi: dw: Fix wrong FIFO level setting for long xfers
Pull x86 fixes from Ingo Molnar:
"Fix a kprobes bug, plus add a new Intel model number to the upstream
<asm/intel-family.h> header for drivers to use"
* tag 'x86-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Lunar Lake M
x86/kprobes: Fix 1 byte conditional jump target
Pull locking fix from Ingo Molnar:
"Fix an rtmutex missed-wakeup bug"
* tag 'locking-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Ensure that the top waiter is always woken up
Pull cxl fixes from Dan Williams:
"Two fixups for CXL (Compute Express Link) in presence of passthrough
decoders.
This primarily helps developers using the QEMU CXL emulation, but with
the impending arrival of CXL switches these types of topologies will
be of interest to end users.
- Fix a crash when shutting down regions in the presence of
passthrough decoders
- Fix region creation to understand passthrough decoders instead of
the narrower definition of passthrough ports"
* tag 'cxl-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Fix passthrough-decoder detection
cxl/region: Fix null pointer dereference for resetting decoder
Pull libnvdimm fixes from Dan Williams:
"A fix for an issue that could causes users to inadvertantly reserve
too much capacity when debugging the KMSAN and persistent memory
namespace, a lockdep fix, and a kernel-doc build warning:
- Resolve the conflict between KMSAN and NVDIMM with respect to
reserving pmem namespace / volume capacity for larger sizeof(struct
page)
- Fix a lockdep warning in the the NFIT code
- Fix a kernel-doc build warning"
* tag 'libnvdimm-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
ACPI: NFIT: fix a potential deadlock during NFIT teardown
dax: super.c: fix kernel-doc bad line warning
Pull memblock revert from Mike Rapoport:
"Revert 'mm: Always release pages to the buddy allocator in
memblock_free_late()'
The pages being freed by memblock_free_late() have already been
initialized, but if they are in the deferred init range,
__free_one_page() might access nearby uninitialized pages when trying
to coalesce buddies, which will cause a crash.
A proper fix will be more involved so revert this change for the time
being"
* tag 'fixes-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
The nfs4_file table is global, so shutting it down when a containerized
nfsd is shut down is wrong and can lead to double-frees. Tear down the
nfs4_file_rhltable in nfs4_state_shutdown instead of
nfs4_state_shutdown_net.
Fixes: d47b295e8d ("NFSD: Use rhashtable for managing nfs4_file objects")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017
Reported-by: JianHong Yin <jiyin@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit f2bd1c5ae2 ("ALSA: hda: Fix page fault in
snd_hda_codec_shutdown()") relocated initialization of several codec
device fields. Due to differences between codec_exec_verb() and
snd_hdac_bus_exec_bus() in how they handle VERB execution - the latter
does not touch PM - assigning ->exec_verb to codec_exec_verb() causes PM
to be engaged before it is configured for the device. Configuration of
PM for the ASoC HDAudio sound card is done with snd_hda_set_power_save()
during skl_hda_audio_probe() whereas the assignment happens early, in
snd_hda_codec_device_init().
Revert to previous behavior to avoid problems caused by too early PM
manipulation.
Suggested-by: Jason Montleon <jmontleo@redhat.com>
Link: https://lore.kernel.org/regressions/CALFERdzKUodLsm6=Ub3g2+PxpNpPtPq3bGBLbff=eZr9_S=YVA@mail.gmail.com
Fixes: f2bd1c5ae2 ("ALSA: hda: Fix page fault in snd_hda_codec_shutdown()")
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://lore.kernel.org/r/20230210165541.3543604-1-cezary.rojewski@intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Eric Dumazet pointed out [0] that when we call skb_set_owner_r()
for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called,
resulting in a negative sk_forward_alloc.
We add a new helper which clones a skb and sets its owner only
when sk_rmem_schedule() succeeds.
Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv()
because tcp_send_synack() can make sk_forward_alloc negative before
ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases.
[0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/
Fixes: 323fbd0edf ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()")
Fixes: 3df80d9320 ("[DCCP]: Introduce DCCPv6")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The imperfect hash area can be updated while packets are traversing,
which will cause a use-after-free when 'tcf_exts_exec()' is called
with the destroyed tcf_ext.
CPU 0: CPU 1:
tcindex_set_parms tcindex_classify
tcindex_lookup
tcindex_lookup
tcf_exts_change
tcf_exts_exec [UAF]
Stop operating on the shared area directly, by using a local copy,
and update the filter with 'rcu_replace_pointer()'. Delete the old
filter version only after a rcu grace period elapsed.
Fixes: 9b0d4446b5 ("net: sched: avoid atomic swap in tcf_exts_change")
Reported-by: valis <sec@valis.email>
Suggested-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In TI's AM62x/AM64x SoCs, successful teardown of RX DMA Channel raises an
interrupt. The process of servicing this interrupt involves flushing all
pending RX DMA descriptors and clearing the teardown completion marker
(TDCM). The am65_cpsw_nuss_rx_packets() function invoked from the RX
NAPI callback services the interrupt. Thus, it is necessary to wait for
this handler to run, drain all packets and clear TDCM, before calling
napi_disable() in am65_cpsw_nuss_common_stop() function post channel
teardown. If napi_disable() executes before ensuring that TDCM is
cleared, the TDCM remains set when the interfaces are down, resulting in
an interrupt storm when the interfaces are brought up again.
Since the interrupt raised to indicate the RX DMA Channel teardown is
specific to the AM62x and AM64x SoCs, add a quirk for it.
Fixes: 4f7cce2724 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g")
Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230209084432.189222-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull clk fixes from Stephen Boyd:
"Two clk driver fixes
- Use devm_kasprintf() to avoid overflows when forming clk names in
the Microchip PolarFire driver
- Fix the pretty broken Ingenic JZ4760 M/N/OD calculation to actually
work and find proper divisors"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: ingenic: jz4760: Update M/N/OD calculation algorithm
clk: microchip: mpfs-ccc: Use devm_kasprintf() for allocating formatted strings
Pull pin control fixes from Linus Walleij:
"Some assorted pin control fixes, the most interesting will be the
Intel patch fixing a classic problem: laptop touchpad IRQs...
- Some pin drive register fixes in the Mediatek driver.
- Return proper error code in the Aspeed driver, and revert and
ill-advised force-disablement patch that needs to be reworked.
- Fix AMD driver debug output.
- Fix potential NULL dereference in the Single driver.
- Fix a group definition error in the Qualcomm SM8450 LPASS driver.
- Restore pins used in direct IRQ mode in the Intel driver (This
fixes some laptop touchpads!)"
* tag 'pinctrl-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: intel: Restore the pins that used to be in Direct IRQ mode
pinctrl: qcom: sm8450-lpass-lpi: correct swr_rx_data group
pinctrl: aspeed: Revert "Force to disable the function's signal"
pinctrl: single: fix potential NULL dereference
pinctrl: amd: Fix debug output for debounce time
pinctrl: aspeed: Fix confusing types in return value
pinctrl: mediatek: Fix the drive register definition of some Pins
Pull PCI fixes from Bjorn Helgaas:
- Move to a shared PCI git tree (Bjorn Helgaas)
- Add Krzysztof Wilczyński as another PCI maintainer (Lorenzo
Pieralisi)
- Revert a couple ASPM patches to fix suspend/resume regressions (Bjorn
Helgaas)
* tag 'pci-v6.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"
Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"
MAINTAINERS: Promote Krzysztof to PCI controller maintainer
MAINTAINERS: Move to shared PCI tree
This reverts commit 5e85eba6f5.
Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates
Control Register programming") broke suspend/resume on a Tuxedo
Infinitybook S 14 v5, which seems to use a Clevo L140CU Mainboard.
The main symptom is:
iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible
and the machine is only partially usable after resume. It can't run dmesg
and can't do a clean reboot. This happens on every suspend/resume cycle.
Revert 5e85eba6f5 until we can figure out the root cause.
Fixes: 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates Control Register programming")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
This reverts commit 4ff116d0d5.
Tasev Nikola and Mark Enriquez reported that resume from suspend was broken
in v6.1-rc1. Tasev bisected to a47126ec29 ("PCI/PTM: Cache PTM
Capability offset"), but we can't figure out how that could be related.
Mark saw the same symptoms and bisected to 4ff116d0d5 ("PCI/ASPM: Save L1
PM Substates Capability for suspend/resume"), which does have a connection:
it restores L1 Substates configuration while ASPM L1 may be enabled:
pci_restore_state
pci_restore_aspm_l1ss_state
aspm_program_l1ss
pci_write_config_dword(PCI_L1SS_CTL1, ctl1) # L1SS restore
pci_restore_pcie_state
pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++]) # L1 restore
which is a problem because PCIe r6.0, sec 5.5.4, requires that:
If setting either or both of the enable bits for ASPM L1 PM
Substates, both ports must be configured as described in this
section while ASPM L1 is disabled.
Separately, Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1
PM Substates Control Register programming") broke suspend/resume, and it
depends on 4ff116d0d5.
Revert 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") to fix the resume issue and enable revert of 5e85eba6f5
to fix the issue Thomas reported.
Note that reverting 4ff116d0d5 means L1 Substates config may be lost on
suspend/resume. As far as we know the system will use more power but will
still *work* correctly.
Fixes: 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Tasev Nikola <tasev.stefanoska@skynet.be>
Reported-by: Mark Enriquez <enriquezmark36@gmail.com>
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Mark Enriquez <enriquezmark36@gmail.com>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
Pull ARM SoC fixes from Arnd Bergmann:
"All the changes this time are minor devicetree corrections, the
majority being for 64-bit Rockchip SoC support. These are a couple of
corrections for properties that are in violation of the binding, some
that put the machine into safer operating points for the eMMC and
thermal settings, and missing properties that prevented rk356x PCIe
and ethernet from working correctly.
The changes for amlogic and mediatek address incorrect properties that
were preventing the display support on MT8195 and the MMC support on
various Meson SoCs from working correctly.
The stihxxx-b2120 change fixes the GPIO polarity for the DVB tuner to
allow this to be used correctly after a futre driver change, though it
has no effect on older kernels"
* tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
ARM: dts: stihxxx-b2120: fix polarity of reset line of tsin0 port
arm64: dts: mediatek: mt8195: Fix vdosys* compatible strings
arm64: dts: rockchip: align rk3399 DMC OPP table with bindings
arm64: dts: rockchip: set sdmmc0 speed to sd-uhs-sdr50 on rock-3a
arm64: dts: rockchip: fix probe of analog sound card on rock-3a
arm64: dts: rockchip: add missing #interrupt-cells to rk356x pcie2x1
arm64: dts: rockchip: fix input enable pinconf on rk3399
ARM: dts: rockchip: add power-domains property to dp node on rk3288
arm64: dts: rockchip: add io domain setting to rk3566-box-demo
arm64: dts: rockchip: remove unsupported property from sdmmc2 for rock-3a
arm64: dts: rockchip: drop unused LED mode property from rk3328-roc-cc
arm64: dts: rockchip: reduce thermal limits on rk3399-pinephone-pro
arm64: dts: rockchip: use correct reset names for rk3399 crypto nodes
Pull RISC-V fixes from Palmer Dabbelt:
"This is a little bigger that I'd hope for this late in the cycle, but
they're all pretty concrete fixes and the only one that's bigger than
a few lines is pmdp_collapse_flush() (which is almost all
boilerplate/comment). It's also all bug fixes for issues that have
been around for a while.
So I think it's not all that scary, just bad timing.
- avoid partial TLB fences for huge pages, which are disallowed by
the ISA
- avoid missing a frame when dumping stacks
- avoid misaligned accesses (and possibly overflows) in kprobes
- fix a race condition in tracking page dirtiness"
* tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
riscv: kprobe: Fixup misaligned load text
riscv: stacktrace: Fix missing the first frame
riscv: mm: Implement pmdp_collapse_flush for THP
Pull ceph fix from Ilya Dryomov:
"A fix for a pretty embarrassing omission in the session flush handler
from Xiubo, marked for stable"
* tag 'ceph-for-6.2-rc8' of https://github.com/ceph/ceph-client:
ceph: flush cap releases when the session is flushed
Pull block fix from Jens Axboe:
"A single fix for a smatch regression introduced in this merge window"
* tag 'block-6.2-2023-02-10' of git://git.kernel.dk/linux:
nvme-auth: mark nvme_auth_wq static
Pull sound fixes from Takashi Iwai:
"Hopefully the last one for 6.2, a collection of the fixes that have
been gathered since the last pull.
All changes are small and trivial device-specific fixes"
* tag 'sound-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add Positivo N14KP6-TG
ASoC: topology: Return -ENOMEM on memory allocation failure
ALSA: emux: Avoid potential array out-of-bound in snd_emux_xg_control()
ASoC: fsl_sai: fix getting version from VERID
ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform.
ALSA: hda/realtek: Add quirk for ASUS UM3402 using CS35L41
ASoC: codecs: es8326: Fix DTS properties reading
ASoC: tas5805m: add missing page switch.
ASoC: tas5805m: rework to avoid scheduling while atomic.
ALSA: hda/realtek: Enable mute/micmute LEDs on HP Elitebook, 645 G9
ASoC: SOF: amd: Fix for handling spurious interrupts from DSP
ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro 360
ALSA: pci: lx6464es: fix a debug loop
ASoC: rt715-sdca: fix clock stop prepare timeout issue
During the driver conversion to shmem, the start address for the
scanout buffer was set to the base PCI address.
In most cases it works because only the lower 24bits are used, and
due to alignment it was almost always 0.
But on some unlucky hardware, it's not the case, and some uninitialized
memory is displayed on the BMC.
With shmem, the primary plane is always at offset 0 in GPU memory.
* v2: rewrite the patch to set the offset to 0. (Thomas Zimmermann)
* v3: move the change to plane_init() and also fix the cursor plane.
(Jammy Huang)
Tested on a sr645 affected by this bug.
Fixes: f2fa5a99ca ("drm/ast: Convert ast to SHMEM")
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jammy Huang <jammy_huang@aspeedtech.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230209094417.21630-1-jfalempe@redhat.com
By default, KVM/SVM will intercept attempts by the guest to transition
out of C0. However, the KVM_CAP_X86_DISABLE_EXITS capability can be used
by a VMM to change this behavior. To mitigate the cross-thread return
address predictions bug (X86_BUG_SMT_RSB), a VMM must not be allowed to
override the default behavior to intercept C0 transitions.
Use a module parameter to control the mitigation on processors that are
vulnerable to X86_BUG_SMT_RSB. If the processor is vulnerable to the
X86_BUG_SMT_RSB bug and the module parameter is set to mitigate the bug,
KVM will not allow the disabling of the HLT, MWAIT and CSTATE exits.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4019348b5e07148eb4d593380a5f6713b93c9a16.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
transitions out of C0 state, the other sibling thread could use return
target predictions from the sibling thread that transitioned out of C0.
The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
when context switching to the idle thread. However, KVM allows a VMM to
prevent exiting guest mode when transitioning out of C0. A guest could
act maliciously in this situation, so create a new x86 BUG that can be
used to detect if the processor is vulnerable.
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <91cec885656ca1fcd4f0185ce403a53dd9edecb7.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Amlogic fixes for v6.2-rc, take2:
- Change MMC controllers interrupts flag to level on all families, fixes irq loss & performance issues when cpu loaded
* tag 'amlogic-fixes-v6.2-rc-take2' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux:
arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
Link: https://lore.kernel.org/r/761c2ebc-7c93-8504-35ae-3e84ad216bcf@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
When a fbdev with deferred I/O is once opened and closed, the dirty
pages still remain queued in the pageref list, and eventually later
those may be processed in the delayed work. This may lead to a
corruption of pages, hitting an Oops.
This patch makes sure to cancel the delayed work and clean up the
pageref list at closing the device for addressing the bug. A part of
the cleanup code is factored out as a new helper function that is
called from the common fb_release().
Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Miko Larsson <mikoxyzzz@gmail.com>
Fixes: 56c134f7f1 ("fbdev: Track deferred-I/O pages in pageref struct")
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230129082856.22113-1-tiwai@suse.de
Commit b3973bb400 ("vmxnet3: set correct hash type based on
rss information") added hashType information into skb. However,
rssType field is populated for eop descriptor. This can lead
to incorrectly reporting of hashType for packets which use
multiple rx descriptors. Multiple rx descriptors are used
for Jumbo frame or LRO packets, which can hit this issue.
This patch moves the RSS codeblock under eop descritor.
Cc: stable@vger.kernel.org
Fixes: b3973bb400 ("vmxnet3: set correct hash type based on rss information")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Peng Li <lpeng@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Link: https://lore.kernel.org/r/20230208223900.5794-1-doshir@vmware.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot was able to trigger a warning [1] from net_free()
calling ref_tracker_dir_exit(&net->notrefcnt_tracker)
while the corresponding ref_tracker_dir_init() has not been
done yet.
copy_net_ns() can indeed bypass the call to setup_net()
in some error conditions.
Note:
We might factorize/move more code in preinit_net() in the future.
[1]
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 5817 Comm: syz-executor.3 Not tainted 6.2.0-rc7-next-20230208-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
assign_lock_key kernel/locking/lockdep.c:982 [inline]
register_lock_class+0xdb6/0x1120 kernel/locking/lockdep.c:1295
__lock_acquire+0x10a/0x5df0 kernel/locking/lockdep.c:4951
lock_acquire.part.0+0x11c/0x370 kernel/locking/lockdep.c:5691
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
ref_tracker_dir_exit+0x52/0x600 lib/ref_tracker.c:24
net_free net/core/net_namespace.c:442 [inline]
net_free+0x98/0xd0 net/core/net_namespace.c:436
copy_net_ns+0x4f3/0x6b0 net/core/net_namespace.c:493
create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc1/0x1f0 kernel/nsproxy.c:228
ksys_unshare+0x449/0x920 kernel/fork.c:3205
__do_sys_unshare kernel/fork.c:3276 [inline]
__se_sys_unshare kernel/fork.c:3274 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:3274
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
Fixes: 0cafd77dcd ("net: add a refcount tracker for kernel sockets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230208182123.3821604-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guillaume Nault says:
====================
ipv6: Fix socket connection with DSCP fib-rules.
The "flowlabel" field of struct flowi6 is used to store both the actual
flow label and the DS Field (or Traffic Class). However the .connect
handlers of datagram and TCP sockets don't set the DS Field part when
doing their route lookup. This breaks fib-rules that match on DSCP.
====================
Link: https://lore.kernel.org/r/cover.1675875519.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the fib_rule6_send and fib_rule4_send tests to verify that DSCP
values are properly taken into account when UDP or TCP sockets try to
connect().
Tests are done with nettest, which needs a new option to specify
the DS Field value of the socket being tested. This new option is
named '-Q', in reference to the similar option used by ping.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Take into account the IPV6_TCLASS socket option (DSCP) in
tcp_v6_connect(). Otherwise fib6_rule_match() can't properly
match the DSCP value, resulting in invalid route lookup.
For example:
ip route add unreachable table main 2001:db8::10/124
ip route add table 100 2001:db8::10/124 dev eth0
ip -6 rule add dsfield 0x04 table 100
echo test | socat - TCP6:[2001:db8::11]:54321,ipv6-tclass=0x04
Without this patch, socat fails at connect() time ("No route to host")
because the fib-rule doesn't jump to table 100 and the lookup ends up
being done in the main table.
Fixes: 2cc67cc731 ("[IPV6] ROUTE: Routing by Traffic Class.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Take into account the IPV6_TCLASS socket option (DSCP) in
ip6_datagram_flow_key_init(). Otherwise fib6_rule_match() can't
properly match the DSCP value, resulting in invalid route lookup.
For example:
ip route add unreachable table main 2001:db8::10/124
ip route add table 100 2001:db8::10/124 dev eth0
ip -6 rule add dsfield 0x04 table 100
echo test | socat - UDP6:[2001:db8::11]:54321,ipv6-tclass=0x04
Without this patch, socat fails at connect() time ("No route to host")
because the fib-rule doesn't jump to table 100 and the lookup ends up
being done in the main table.
Fixes: 2cc67cc731 ("[IPV6] ROUTE: Routing by Traffic Class.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman says:
====================
nfp: fix schedule in atomic context when offloading sa
Yinjun Zhang says:
IPsec offloading callbacks may be called in atomic context, sleep is
not allowed in the implementation. Now use workqueue mechanism to
avoid this issue.
Extend existing workqueue mechanism for multicast configuration only
to universal use, so that all configuring through mailbox asynchoronously
can utilize it.
Also fix another two incorrect use of mailbox in IPsec:
1. Need lock for race condition when accessing mbox
2. Offset of mbox access should depends on tlv caps
====================
Link: https://lore.kernel.org/r/20230208102258.29639-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IPsec offloading callbacks may be called in atomic context, sleep is
not allowed in the implementation. Now use workqueue mechanism to
avoid this issue.
Extend existing workqueue mechanism for multicast configuration only
to universal use, so that all configuring through mailbox asynchronously
can utilize it.
Fixes: 859a497fe8 ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mailbox configuration mechanism requires writing several registers,
which shouldn't be interrupted, so need lock to avoid race condition.
The base offset of mailbox configuration registers is not fixed, it
depends on TLV caps read from application firmware.
Fixes: 859a497fe8 ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Code blocks handling BCMA_CHIP_ID_BCM5357 and BCMA_CHIP_ID_BCM53572 were
incorrectly unified. Chip package values are not unique and cannot be
checked independently. They are meaningful only in a context of a given
chip.
Packages BCM5358 and BCM47188 share the same value but then belong to
different chips. Code unification resulted in treating BCM5358 as
BCM47188 and broke its initialization.
Link: https://github.com/openwrt/openwrt/issues/8278
Fixes: cb1b0f90ac ("net: ethernet: bgmac: unify code of the same family")
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230208091637.16291-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull drm fixes from Dave Airlie:
"Weekly fixes.
The amdgpu had a few small fixes to display flicker on certain
configurations, however it was found the the flicker was lessened but
there were other unintended consequences, so for now they've been
reverted and replaced with an option for users to test with so future
fixes can be developed.
Otherwise apart from the usual bunch of i915 and amdgpu, there's a
client, virtio-gpu and an nvidiafb fix that reorders its loading to
avoid failure.
client:
- refcount fix
amdgpu:
- a bunch of attempted flicker fixes that regressed turned into a
user workaround option for now
- Properly fix S/G display with AGP aperture enabled
- Fix cursor offset with 180 rotation
- SMU13 fixes
- Use TGID for GPUVM traces
- Fix oops on in fence error path
- Don't run IB tests on hw rings when sw rings are in use
- memory leak fix
i915:
- Display watermark fix
- fbdev fix for PSR, FBC, DRRS
- Move fd_install after last use of fence
- Initialize the obj flags for shmem objects
- Fix VBT DSI DVO port handling
virtio-gpu:
- fence fix
nvidiafb:
- regression fix for driver load when no hw supported"
* tag 'drm-fixes-2023-02-10' of git://anongit.freedesktop.org/drm/drm: (27 commits)
Revert "drm/amd/display: disable S/G display on DCN 3.1.5"
Revert "drm/amd/display: disable S/G display on DCN 2.1.0"
Revert "drm/amd/display: disable S/G display on DCN 3.1.2/3"
drm/amdgpu: add S/G display parameter
drm/amdgpu/smu: skip pptable init under sriov
amd/amdgpu: remove test ib on hw ring
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
drm/amdgpu: Use the TGID for trace_amdgpu_vm_update_ptes
drm/amdgpu: Add unique_id support for GC 11.0.1/2
drm/amd/pm: bump SMU 13.0.7 driver_if header version
drm/amd/pm: bump SMU 13.0.0 driver_if header version
drm/amd/pm: add SMU 13.0.7 missing GetPptLimit message mapping
drm/amd/display: fix cursor offset on rotation 180
drm/amd/amdgpu: enable athub cg 11.0.3
Revert "drm/amd/display: disable S/G display on DCN 3.1.4"
drm/amd/display: properly handling AGP aperture in vm setup
drm/amd/display: disable S/G display on DCN 3.1.2/3
drm/amd/display: disable S/G display on DCN 2.1.0
drm/i915: Fix VBT DSI DVO port handling
drm/client: fix circular reference counting issue
...
Pull rdma fixes from Jason Gunthorpe:
"The usual collection of small driver bug fixes:
- Fix error unwind bugs in hfi1, irdma rtrs
- Old bug with IPoIB children interfaces possibly using the wrong
number of queues
- Really old bug in usnic calling iommu_map in an atomic context
- Recent regression from the DMABUF locking rework
- Missing user data validation in MANA"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rtrs: Don't call kobject_del for srv_path->kobj
RDMA/mana_ib: Prevent array underflow in mana_ib_create_qp_raw()
IB/hfi1: Assign npages earlier
RDMA/umem: Use dma-buf locked API to solve deadlock
RDMA/usnic: use iommu_map_atomic() under spin_lock()
RDMA/irdma: Fix potential NULL-ptr-dereference
IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
IB/hfi1: Restore allocated resources on failed copyout
Patch series "Fix kmemleak crashes when scanning CMA regions", v2.
When trying to boot a device with an ARM64 kernel with the following
config options enabled:
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_DEBUG_KMEMLEAK=y
a crash is encountered when kmemleak starts to scan the list of gray
or allocated objects that it maintains. Upon closer inspection, it was
observed that these page-faults always occurred when kmemleak attempted
to scan a CMA region.
At the moment, kmemleak is made aware of CMA regions that are specified
through the devicetree to be dynamically allocated within a range of
addresses. However, kmemleak should not need to scan CMA regions or any
reserved memory region, as those regions can be used for DMA transfers
between drivers and peripherals, and thus wouldn't contain anything
useful for kmemleak.
Additionally, since CMA regions are unmapped from the kernel's address
space when they are freed to the buddy allocator at boot when
CONFIG_DEBUG_PAGEALLOC is enabled, kmemleak shouldn't attempt to access
those memory regions, as that will trigger a crash. Thus, kmemleak
should ignore all dynamically allocated reserved memory regions.
This patch (of 1):
Currently, kmemleak ignores dynamically allocated reserved memory regions
that don't have a kernel mapping. However, regions that do retain a
kernel mapping (e.g. CMA regions) do get scanned by kmemleak.
This is not ideal for two reasons:
1 kmemleak works by scanning memory regions for pointers to allocated
objects to determine if those objects have been leaked or not.
However, reserved memory regions can be used between drivers and
peripherals for DMA transfers, and thus, would not contain pointers to
allocated objects, making it unnecessary for kmemleak to scan these
reserved memory regions.
2 When CONFIG_DEBUG_PAGEALLOC is enabled, along with kmemleak, the
CMA reserved memory regions are unmapped from the kernel's address
space when they are freed to buddy at boot. These CMA reserved regions
are still tracked by kmemleak, however, and when kmemleak attempts to
scan them, a crash will happen, as accessing the CMA region will result
in a page-fault, since the regions are unmapped.
Thus, use kmemleak_ignore_phys() for all dynamically allocated reserved
memory regions, instead of those that do not have a kernel mapping
associated with them.
Link: https://lkml.kernel.org/r/20230208232001.2052777-1-isaacmanjarres@google.com
Link: https://lkml.kernel.org/r/20230208232001.2052777-2-isaacmanjarres@google.com
Fixes: a7259df767 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
Cc: Nick Kossifidis <mick@ics.forth.gr>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Saravana Kannan <saravanak@google.com>
Cc: <stable@vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The debugfs_remove_recursive() is invoked by unregister_shrinker(), which
is holding the write lock of shrinker_rwsem. It will waits for the
handler of debugfs file complete. The handler also needs to hold the read
lock of shrinker_rwsem to do something. So it may cause the following
deadlock:
CPU0 CPU1
debugfs_file_get()
shrinker_debugfs_count_show()/shrinker_debugfs_scan_write()
unregister_shrinker()
--> down_write(&shrinker_rwsem);
debugfs_remove_recursive()
// wait for (A)
--> wait_for_completion();
// wait for (B)
--> down_read_killable(&shrinker_rwsem)
debugfs_file_put() -- (A)
up_write() -- (B)
The down_read_killable() can be killed, so that the above deadlock can be
recovered. But it still requires an extra kill action, otherwise it will
block all subsequent shrinker-related operations, so it's better to fix
it.
[akpm@linux-foundation.org: fix CONFIG_SHRINKER_DEBUG=n stub]
Link: https://lkml.kernel.org/r/20230202105612.64641-1-zhengqi.arch@bytedance.com
Fixes: 5035ebc644 ("mm: shrinkers: introduce debugfs interface for memory shrinkers")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull power management fix from Rafael Wysocki:
"Fix the incorrect value returned by cpufreq driver's ->get() callback
for Qualcomm platforms (Douglas Anderson)"
* tag 'pm-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: qcom-hw: Fix cpufreq_driver->get() for non-LMH systems
Pull networking fixes from Paolo Abeni:
"Including fixes from can and ipsec subtrees.
Current release - regressions:
- sched: fix off by one in htb_activate_prios()
- eth: mana: fix accessing freed irq affinity_hint
- eth: ice: fix out-of-bounds KASAN warning in virtchnl
Current release - new code bugs:
- eth: mtk_eth_soc: enable special tag when any MAC uses DSA
Previous releases - always broken:
- core: fix sk->sk_txrehash default
- neigh: make sure used and confirmed times are valid
- mptcp: be careful on subflow status propagation on errors
- xfrm: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
- phylink: move phy_device_free() to correctly release phy device
- eth: mlx5:
- fix crash unsetting rx-vlan-filter in switchdev mode
- fix hang on firmware reset
- serialize module cleanup with reload and remove"
* tag 'net-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
selftests: forwarding: lib: quote the sysctl values
net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping is used
rds: rds_rm_zerocopy_callback() use list_first_entry()
net: txgbe: Update support email address
selftests: Fix failing VXLAN VNI filtering test
selftests: mptcp: stop tests earlier
selftests: mptcp: allow more slack for slow test-case
mptcp: be careful on subflow status propagation on errors
mptcp: fix locking for in-kernel listener creation
mptcp: fix locking for setsockopt corner-case
mptcp: do not wait for bare sockets' timeout
net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch port 0
nfp: ethtool: fix the bug of setting unsupported port speed
txhash: fix sk->sk_txrehash default
net: ethernet: mtk_eth_soc: fix wrong parameters order in __xdp_rxq_info_reg()
net: ethernet: mtk_eth_soc: enable special tag when any MAC uses DSA
net: sched: sch: Fix off by one in htb_activate_prios()
igc: Add ndo_tx_timeout support
net: mana: Fix accessing freed irq affinity_hint
hv_netvsc: Allocate memory in netvsc_dma_map() with GFP_ATOMIC
...
Pull HID fixes from Benjamin Tissoires:
- fix potential infinite loop with a badly crafted HID device (Xin
Zhao)
- fix regression from 6.1 in USB logitech devices potentially making
their mouse wheel not working (Bastien Nocera)
- clean up in AMD sensors, which fixes a long time resume bug (Mario
Limonciello)
- few device small fixes and quirks
* tag 'for-linus-2023020901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: Ignore battery for ELAN touchscreen 29DF on HP
HID: amd_sfh: if no sensors are enabled, clean up
HID: logitech: Disable hi-res scrolling on USB
HID: core: Fix deadloop in hid_apply_multiplier.
HID: Ignore battery for Elan touchscreen on Asus TP420IA
HID: elecom: add support for TrackBall 056E:011C
Pull cifx fix from Steve French:
"Small fix for use after free"
* tag '6.2-rc8-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix use-after-free in rdata->read_into_pages()
We have this check to make sure we don't accidentally add older devices
that may have disappeared and re-appeared with an older generation from
being added to an fs_devices (such as a replace source device). This
makes sense, we don't want stale disks in our file system. However for
single disks this doesn't really make sense.
I've seen this in testing, but I was provided a reproducer from a
project that builds btrfs images on loopback devices. The loopback
device gets cached with the new generation, and then if it is re-used to
generate a new file system we'll fail to mount it because the new fs is
"older" than what we have in cache.
Fix this by freeing the cache when closing the device for a single device
filesystem. This will ensure that the mount command passed device path is
scanned successfully during the next mount.
CC: stable@vger.kernel.org # 5.10+
Reported-by: Daan De Meyer <daandemeyer@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently fiemap does not take the inode's lock (VFS lock), it only locks
a file range in the inode's io tree. This however can lead to a deadlock
if we have a concurrent fsync on the file and fiemap code triggers a fault
when accessing the user space buffer with fiemap_fill_next_extent(). The
deadlock happens on the inode's i_mmap_lock semaphore, which is taken both
by fsync and btrfs_page_mkwrite(). This deadlock was recently reported by
syzbot and triggers a trace like the following:
task:syz-executor361 state:D stack:20264 pid:5668 ppid:5119 flags:0x00004004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x995/0xe20 kernel/sched/core.c:6606
schedule+0xcb/0x190 kernel/sched/core.c:6682
wait_on_state fs/btrfs/extent-io-tree.c:707 [inline]
wait_extent_bit+0x577/0x6f0 fs/btrfs/extent-io-tree.c:751
lock_extent+0x1c2/0x280 fs/btrfs/extent-io-tree.c:1742
find_lock_delalloc_range+0x4e6/0x9c0 fs/btrfs/extent_io.c:488
writepage_delalloc+0x1ef/0x540 fs/btrfs/extent_io.c:1863
__extent_writepage+0x736/0x14e0 fs/btrfs/extent_io.c:2174
extent_write_cache_pages+0x983/0x1220 fs/btrfs/extent_io.c:3091
extent_writepages+0x219/0x540 fs/btrfs/extent_io.c:3211
do_writepages+0x3c3/0x680 mm/page-writeback.c:2581
filemap_fdatawrite_wbc+0x11e/0x170 mm/filemap.c:388
__filemap_fdatawrite_range mm/filemap.c:421 [inline]
filemap_fdatawrite_range+0x175/0x200 mm/filemap.c:439
btrfs_fdatawrite_range fs/btrfs/file.c:3850 [inline]
start_ordered_ops fs/btrfs/file.c:1737 [inline]
btrfs_sync_file+0x4ff/0x1190 fs/btrfs/file.c:1839
generic_write_sync include/linux/fs.h:2885 [inline]
btrfs_do_write_iter+0xcd3/0x1280 fs/btrfs/file.c:1684
call_write_iter include/linux/fs.h:2189 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x7dc/0xc50 fs/read_write.c:584
ksys_write+0x177/0x2a0 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f7d4054e9b9
RSP: 002b:00007f7d404fa2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f7d405d87a0 RCX: 00007f7d4054e9b9
RDX: 0000000000000090 RSI: 0000000020000000 RDI: 0000000000000006
RBP: 00007f7d405a51d0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 61635f65646f6e69
R13: 65646f7475616f6e R14: 7261637369646f6e R15: 00007f7d405d87a8
</TASK>
INFO: task syz-executor361:5697 blocked for more than 145 seconds.
Not tainted 6.2.0-rc3-syzkaller-00376-g7c6984405241 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor361 state:D stack:21216 pid:5697 ppid:5119 flags:0x00004004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x995/0xe20 kernel/sched/core.c:6606
schedule+0xcb/0x190 kernel/sched/core.c:6682
rwsem_down_read_slowpath+0x5f9/0x930 kernel/locking/rwsem.c:1095
__down_read_common+0x54/0x2a0 kernel/locking/rwsem.c:1260
btrfs_page_mkwrite+0x417/0xc80 fs/btrfs/inode.c:8526
do_page_mkwrite+0x19e/0x5e0 mm/memory.c:2947
wp_page_shared+0x15e/0x380 mm/memory.c:3295
handle_pte_fault mm/memory.c:4949 [inline]
__handle_mm_fault mm/memory.c:5073 [inline]
handle_mm_fault+0x1b79/0x26b0 mm/memory.c:5219
do_user_addr_fault+0x69b/0xcb0 arch/x86/mm/fault.c:1428
handle_page_fault arch/x86/mm/fault.c:1519 [inline]
exc_page_fault+0x7a/0x110 arch/x86/mm/fault.c:1575
asm_exc_page_fault+0x22/0x30 arch/x86/include/asm/idtentry.h:570
RIP: 0010:copy_user_short_string+0xd/0x40 arch/x86/lib/copy_user_64.S:233
Code: 74 0a 89 (...)
RSP: 0018:ffffc9000570f330 EFLAGS: 00050202
RAX: ffffffff843e6601 RBX: 00007fffffffefc8 RCX: 0000000000000007
RDX: 0000000000000000 RSI: ffffc9000570f3e0 RDI: 0000000020000120
RBP: ffffc9000570f490 R08: 0000000000000000 R09: fffff52000ae1e83
R10: fffff52000ae1e83 R11: 1ffff92000ae1e7c R12: 0000000000000038
R13: ffffc9000570f3e0 R14: 0000000020000120 R15: ffffc9000570f3e0
copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline]
raw_copy_to_user arch/x86/include/asm/uaccess_64.h:58 [inline]
_copy_to_user+0xe9/0x130 lib/usercopy.c:34
copy_to_user include/linux/uaccess.h:169 [inline]
fiemap_fill_next_extent+0x22e/0x410 fs/ioctl.c:144
emit_fiemap_extent+0x22d/0x3c0 fs/btrfs/extent_io.c:3458
fiemap_process_hole+0xa00/0xad0 fs/btrfs/extent_io.c:3716
extent_fiemap+0xe27/0x2100 fs/btrfs/extent_io.c:3922
btrfs_fiemap+0x172/0x1e0 fs/btrfs/inode.c:8209
ioctl_fiemap fs/ioctl.c:219 [inline]
do_vfs_ioctl+0x185b/0x2980 fs/ioctl.c:810
__do_sys_ioctl fs/ioctl.c:868 [inline]
__se_sys_ioctl+0x83/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f7d4054e9b9
RSP: 002b:00007f7d390d92f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f7d405d87b0 RCX: 00007f7d4054e9b9
RDX: 0000000020000100 RSI: 00000000c020660b RDI: 0000000000000005
RBP: 00007f7d405a51d0 R08: 00007f7d390d9700 R09: 0000000000000000
R10: 00007f7d390d9700 R11: 0000000000000246 R12: 61635f65646f6e69
R13: 65646f7475616f6e R14: 7261637369646f6e R15: 00007f7d405d87b8
</TASK>
What happens is the following:
1) Task A is doing an fsync, enters btrfs_sync_file() and flushes delalloc
before locking the inode and the i_mmap_lock semaphore, that is, before
calling btrfs_inode_lock();
2) After task A flushes delalloc and before it calls btrfs_inode_lock(),
another task dirties a page;
3) Task B starts a fiemap without FIEMAP_FLAG_SYNC, so the page dirtied
at step 2 remains dirty and unflushed. Then when it enters
extent_fiemap() and it locks a file range that includes the range of
the page dirtied in step 2;
4) Task A calls btrfs_inode_lock() and locks the inode (VFS lock) and the
inode's i_mmap_lock semaphore in write mode. Then it tries to flush
delalloc by calling start_ordered_ops(), which will block, at
find_lock_delalloc_range(), when trying to lock the range of the page
dirtied at step 2, since this range was locked by the fiemap task (at
step 3);
5) Task B generates a page fault when accessing the user space fiemap
buffer with a call to fiemap_fill_next_extent().
The fault handler needs to call btrfs_page_mkwrite() for some other
page of our inode, and there we deadlock when trying to lock the
inode's i_mmap_lock semaphore in read mode, since the fsync task locked
it in write mode (step 4) and the fsync task can not progress because
it's waiting to lock a file range that is currently locked by us (the
fiemap task, step 3).
Fix this by taking the inode's lock (VFS lock) in shared mode when
entering fiemap. This effectively serializes fiemap with fsync (except the
most expensive part of fsync, the log sync), preventing this deadlock.
Reported-by: syzbot+cc35f55c41e34c30dcb5@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000032dc7305f2a66f46@google.com/
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 3cc67fe1b3.
Some users have reported flickerng with S/G display. We've
tried extensively to reproduce and debug the issue on a wide
variety of platform configurations (DRAM bandwidth, etc.) and
a variety of monitors, but so far have not been able to. We
disabled S/G display on a number of platforms to address this
but that leads to failure to pin framebuffers errors and
blank displays when there is memory pressure or no displays
at all on systems with limited carveout (e.g., Chromebooks).
We have a parameter to disable this as a debugging option as a
way for users to disable this, depending on their use case,
and for us to help debug this further. Having this enabled
seems like the lesser of to evils.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 2404f9b0ea.
Some users have reported flickerng with S/G display. We've
tried extensively to reproduce and debug the issue on a wide
variety of platform configurations (DRAM bandwidth, etc.) and
a variety of monitors, but so far have not been able to. We
disabled S/G display on a number of platforms to address this
but that leads to failure to pin framebuffers errors and
blank displays when there is memory pressure or no displays
at all on systems with limited carveout (e.g., Chromebooks).
We have a parameter to disable this as a debugging option as a
way for users to disable this, depending on their use case,
and for us to help debug this further. Having this enabled
seems like the lesser of to evils.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit f081cd4ca2.
Some users have reported flickerng with S/G display. We've
tried extensively to reproduce and debug the issue on a wide
variety of platform configurations (DRAM bandwidth, etc.) and
a variety of monitors, but so far have not been able to. We
disabled S/G display on a number of platforms to address this
but that leads to failure to pin framebuffers errors and
blank displays when there is memory pressure or no displays
at all on systems with limited carveout (e.g., Chromebooks).
We have a parameter to disable this as a debugging option as a
way for users to disable this, depending on their use case,
and for us to help debug this further. Having this enabled
seems like the lesser of to evils.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some users have reported flickerng with S/G display. We've
tried extensively to reproduce and debug the issue on a wide
variety of platform configurations (DRAM bandwidth, etc.) and
a variety of monitors, but so far have not been able to. We
disabled S/G display on a number of platforms to address this
but that leads to failure to pin framebuffers errors and
blank displays when there is memory pressure or no displays
at all on systems with limited carveout (e.g., Chromebooks).
Add a option to disable this as a debugging option as a
way for users to disable this, depending on their use case,
and for us to help debug this further.
v2: fix typo
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull NVMe fix from Christoph:
"nvme fixes for Linux 6.2
- fix a static checker warning for a variable introduces in the last
pull request (Tom Rix)"
* tag 'nvme-6.2-2023-02-09' of git://git.infradead.org/nvme:
nvme-auth: mark nvme_auth_wq static
While checking Pin Assignments of the port and partner during probe, we
don't take into account whether the peripheral is a plug or receptacle.
This manifests itself in a mode entry failure on certain docks and
dongles with captive cables. For instance, the Startech.com Type-C to DP
dongle (Model #CDP2DP) advertises its DP VDO as 0x405. This would fail
the Pin Assignment compatibility check, despite it supporting
Pin Assignment C as a UFP.
Update the check to use the correct DP Pin Assign macros that
take the peripheral's receptacle bit into account.
Fixes: c1e5c2f0cb ("usb: typec: altmodes/displayport: correct pin assignment for UFP receptacles")
Cc: stable@vger.kernel.org
Reported-by: Diana Zigterman <dzigterman@chromium.org>
Signed-off-by: Prashant Malani <pmalani@chromium.org>
Link: https://lore.kernel.org/r/20230208205318.131385-1-pmalani@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 550b33cfd4 ("arm64: efi: Force the use of SetVirtualAddressMap()
on Altra machines") identifies the Altra family via the family field in
the type#1 SMBIOS record. eMAG and Altra Max machines are similarly
affected but not detected with the strict strcmp test.
The type1_family smbios string is not an entirely reliable means of
identifying systems with this issue as OEMs can, and do, use their own
strings for these fields. However, until we have a better solution,
capture the bulk of these systems by adding strcmp matching for "eMAG"
and "Altra Max".
Fixes: 550b33cfd4 ("arm64: efi: Force the use of SetVirtualAddressMap() on Altra machines")
Cc: <stable@vger.kernel.org> # 6.1.x
Cc: Alexandru Elisei <alexandru.elisei@gmail.com>
Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
Tested-by: Justin He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
While running this selftest which usually passes:
~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
TEST: swp0: Unicast IPv4 to primary MAC address [ OK ]
TEST: swp0: Unicast IPv4 to macvlan MAC address [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, promisc [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti [ OK ]
TEST: swp0: Multicast IPv4 to joined group [ OK ]
TEST: swp0: Multicast IPv4 to unknown group [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, promisc [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, allmulti [ OK ]
TEST: swp0: Multicast IPv6 to joined group [ OK ]
TEST: swp0: Multicast IPv6 to unknown group [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, promisc [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, allmulti [ OK ]
if I start PTP timestamping then run it again (debug prints added by me),
the unknown IPv6 MC traffic is seen by the CPU port even when it should
have been dropped:
~/selftests/drivers/net/dsa# ptp4l -i swp0 -2 -P -m
ptp4l[225.410]: selected /dev/ptp1 as PTP clock
[ 225.445746] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_add: port 0 adding L2 PTP trap
[ 225.453815] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP event trap
[ 225.462703] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP general trap
[ 225.471768] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP event trap
[ 225.480651] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP general trap
ptp4l[225.488]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[225.488]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
^C
~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
TEST: swp0: Unicast IPv4 to primary MAC address [ OK ]
TEST: swp0: Unicast IPv4 to macvlan MAC address [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, promisc [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti [ OK ]
TEST: swp0: Multicast IPv4 to joined group [ OK ]
TEST: swp0: Multicast IPv4 to unknown group [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, promisc [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, allmulti [ OK ]
TEST: swp0: Multicast IPv6 to joined group [ OK ]
TEST: swp0: Multicast IPv6 to unknown group [FAIL]
reception succeeded, but should have failed
TEST: swp0: Multicast IPv6 to unknown group, promisc [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, allmulti [ OK ]
The PGID_MCIPV6 is configured correctly to not flood to the CPU,
I checked that.
Furthermore, when I disable back PTP RX timestamping (ptp4l doesn't do
that when it exists), packets are RX filtered again as they should be:
~/selftests/drivers/net/dsa# hwstamp_ctl -i swp0 -r 0
[ 218.202854] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_del: port 0 removing L2 PTP trap
[ 218.212656] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP event trap
[ 218.222975] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP general trap
[ 218.233133] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP event trap
[ 218.242251] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP general trap
current settings:
tx_type 1
rx_filter 12
new settings:
tx_type 1
rx_filter 0
~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
TEST: swp0: Unicast IPv4 to primary MAC address [ OK ]
TEST: swp0: Unicast IPv4 to macvlan MAC address [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, promisc [ OK ]
TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti [ OK ]
TEST: swp0: Multicast IPv4 to joined group [ OK ]
TEST: swp0: Multicast IPv4 to unknown group [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, promisc [ OK ]
TEST: swp0: Multicast IPv4 to unknown group, allmulti [ OK ]
TEST: swp0: Multicast IPv6 to joined group [ OK ]
TEST: swp0: Multicast IPv6 to unknown group [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, promisc [ OK ]
TEST: swp0: Multicast IPv6 to unknown group, allmulti [ OK ]
So it's clear that something in the PTP RX trapping logic went wrong.
Looking a bit at the code, I can see that there are 4 typos, which
populate "ipv4" VCAP IS2 key filter fields for IPv6 keys.
VCAP IS2 keys of type OCELOT_VCAP_KEY_IPV4 and OCELOT_VCAP_KEY_IPV6 are
handled by is2_entry_set(). OCELOT_VCAP_KEY_IPV4 looks at
&filter->key.ipv4, and OCELOT_VCAP_KEY_IPV6 at &filter->key.ipv6.
Simply put, when we populate the wrong key field, &filter->key.ipv6
fields "proto.mask" and "proto.value" remain all zeroes (or "don't care").
So is2_entry_set() will enter the "else" of this "if" condition:
if (msk == 0xff && (val == IPPROTO_TCP || val == IPPROTO_UDP))
and proceed to ignore the "proto" field. The resulting rule will match
on all IPv6 traffic, trapping it to the CPU.
This is the reason why the local_termination.sh selftest sees it,
because control traps are stronger than the PGID_MCIPV6 used for
flooding (from the forwarding data path).
But the problem is in fact much deeper. We trap all IPv6 traffic to the
CPU, but if we're bridged, we set skb->offload_fwd_mark = 1, so software
forwarding will not take place and IPv6 traffic will never reach its
destination.
The fix is simple - correct the typos.
I was intentionally inaccurate in the commit message about the breakage
occurring when any PTP timestamping is enabled. In fact it only happens
when L4 timestamping is requested (HWTSTAMP_FILTER_PTP_V2_EVENT or
HWTSTAMP_FILTER_PTP_V2_L4_EVENT). But ptp4l requests a larger RX
timestamping filter than it needs for "-2": HWTSTAMP_FILTER_PTP_V2_EVENT.
I wanted people skimming through git logs to not think that the bug
doesn't affect them because they only use ptp4l in L2 mode.
Fixes: 96ca08c058 ("net: mscc: ocelot: set up traps for PTP packets")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230207183117.1745754-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The formula that determines the core clock requirement based on pixel
clock and blanking has been determined experimentally to minimise the
clock while supporting all modes we've seen.
A new reduced blanking mode (4kp60 at 533MHz rather than the standard
594MHz) has been seen that doesn't produce a high enough clock and
results in "flip_done timed out" error.
Increase the setup cost in the formula to make this work. The result is
a reduced blanking mode increases by up to 7MHz while leaving the
standard timing
mode untouched
Link: https://github.com/raspberrypi/linux/issues/4446
Fixes: 16e101051f ("drm/vc4: Increase the core clock based on HVS load")
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127145558.446123-1-maxime@cerno.tech
Issue is some displays go blank at the point of firmware to kms
handover. Plugging/unplugging hdmi cable, power cycling display, or
switching standby off/on
typically resolve this case.
Finally managed to find a display that suffers from this, and track down
the issue.
The firmware uses AVMUTE in normal operation. It will set AVMUTE before
disabling hdmi clocks and phy. It will clear AVMUTE after clocks and phy
are set up for a new hdmi mode.
But with the hdmi handover from firmware to kms, AVMUTE will be set by
firmware.
kms driver typically has no GCP packet (except for deep colour modes).
The spec isn't clear on whether to consider the AVMUTE as continuing
indefinitely in the absence of a GCP packet, or to consider that state
to have ended.
Most displays behave as we want, but there are a number (from multiple
manufacturers) which need to see AVMUTE cleared before displaying a
picture.
Lets just always enable GCP packet with AVMUTE cleared. That resolves
the issue on problematic displays.
From HDMI 1.4 spec:
A CD field of zero (Color Depth not indicated) shall be used whenever
the Sink does not indicate support for Deep Color. This value may
also be used in Deep Color mode to transmit a GCP indicating only
non-Deep Color information (e.g. AVMUTE).
So use CD=0 where we were previously not enabling a GCP.
Link: https://forum.libreelec.tv/thread/24780-le-10-0-1-rpi4-no-picture-after-update-from-le-10-0-0
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127161219.457058-1-maxime@cerno.tech
YUV images can either be presented as one allocation with offsets
for the different planes, or multiple allocations with 0 offsets.
The driver only ever calls drm_fb_[dma|cma]_get_gem_obj with plane
index 0, therefore any application using the second approach was
incorrectly rendered.
Correctly determine the address for each plane, removing the
assumption that the base address is the same for each.
Fixes: fc04023faf ("drm/vc4: Add support for YUV planes.")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127155708.454704-1-maxime@cerno.tech
Steffen Klassert says:
====================
ipsec 2023-02-08
1) Fix policy checks for nested IPsec tunnels when using
xfrm interfaces. From Benedict Wong.
2) Fix netlink message expression on 32=>64-bit
messages translators. From Anastasia Belova.
3) Prevent potential spectre v1 gadget in xfrm_xlate32_attr.
From Eric Dumazet.
4) Always consistently use time64_t in xfrm_timer_handler.
From Eric Dumazet.
5) Fix KCSAN reported bug: Multiple cpus can update use_time
at the same time. From Eric Dumazet.
6) Fix SCP copy from IPv4 to IPv6 on interfamily tunnel.
From Christian Hopps.
* tag 'ipsec-2023-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: fix bug with DSCP copy to v6 from v4 tunnel
xfrm: annotate data-race around use_time
xfrm: consistently use time64_t in xfrm_timer_handler()
xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
xfrm: compat: change expression for switch in xfrm_xlate64
Fix XFRM-I support for nested ESP tunnels
====================
Link: https://lore.kernel.org/r/20230208114322.266510-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.
Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:
amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
<TASK>
amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
devm_drm_dev_init_release+0x49/0x70
[...]
To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.
Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].
[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/
Fixes: 067f44c8b4 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The pid field corresponds to the result of gettid() in userspace.
However, userspace cannot reliably attribute PTE events to processes
with just the thread id. This patch allows userspace to easily
attribute PTE update events to specific processes by comparing this
field with the result of getpid().
For attributing events to specific threads, the thread id is also
contained in the common fields of each trace event.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Saeed Mahameed says:
====================
mlx5 fixes 2023-02-07
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Serialize module cleanup with reload and remove
net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
net/mlx5: Expose SF firmware pages counter
net/mlx5: Store page counters in a single array
net/mlx5e: IPoIB, Show unknown speed instead of error
net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode
net/mlx5: Bridge, fix ageing of peer FDB entries
net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic
net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
====================
Link: https://lore.kernel.org/r/20230208030302.95378-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 3bc753c06d ("kbuild: treat char as always unsigned") broke
kprobes. Setting a probe-point on 1 byte conditional jump can cause the
kernel to crash when the (signed) relative jump offset gets treated as
unsigned.
Fix by replacing the unsigned 'immediate.bytes' (plus a cast) with the
signed 'immediate.value' when assigning to the relative jump offset.
[ dhansen: clarified changelog ]
Fixes: 3bc753c06d ("kbuild: treat char as always unsigned")
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230208071708.4048-1-namit%40vmware.com
Matthieu Baerts says:
====================
mptcp: fixes for v6.2
Patch 1 clears resources earlier if there is no more reasons to keep
MPTCP sockets alive.
Patches 2 and 3 fix some locking issues visible in some rare corner
cases: the linked issues should be quite hard to reproduce.
Patch 4 makes sure subflows are correctly cleaned after the end of a
connection.
Patch 5 and 6 improve the selftests stability when running in a slow
environment by transfering data for a longer period on one hand and by
stopping the tests when all expected events have been observed on the
other hand.
All these patches fix issues introduced before v6.2.
====================
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
These 'endpoint' tests from 'mptcp_join.sh' selftest start a transfer in
the background and check the status during this transfer.
Once the expected events have been recorded, there is no reason to wait
for the data transfer to finish. It can be stopped earlier to reduce the
execution time by more than half.
For these tests, the exchanged data were not verified. Errors, if any,
were ignored but that's fine, plenty of other tests are looking at that.
It is then OK to mute stderr now that we are sure errors will be printed
(and still ignored) because the transfer is stopped before the end.
Fixes: e274f71540 ("selftests: mptcp: add subflow limits test-cases")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A test-case is frequently failing on some extremely slow VMs.
The mptcp transfer completes before the script is able to do
all the required PM manipulation.
Address the issue in the simplest possible way, making the
transfer even more slow.
Additionally dump more info in case of failures, to help debugging
similar problems in the future and init dump_stats var.
Fixes: e274f71540 ("selftests: mptcp: add subflow limits test-cases")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/323
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the subflow error report callback unconditionally
propagates the fallback subflow status to the owning msk.
If the msk is already orphaned, the above prevents the code
from correctly tracking the msk moving to the TCP_CLOSE state
and doing the appropriate cleanup.
All the above causes increasing memory usage over time and
sporadic self-tests failures.
There is a great deal of infrastructure trying to propagate
correctly the fallback subflow status to the owning mptcp socket,
e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed():
in the error propagation path we need only to cope with unorphaned
sockets.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/339
Fixes: 15cc104533 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For consistency, in mptcp_pm_nl_create_listen_socket(), we need to
call the __mptcp_nmpc_socket() under the msk socket lock.
Note that as a side effect, mptcp_subflow_create_socket() needs a
'nested' lockdep annotation, as it will acquire the subflow (kernel)
socket lock under the in-kernel listener msk socket lock.
The current lack of locking is almost harmless, because the relevant
socket is not exposed to the user space, but in future we will add
more complexity to the mentioned helper, let's play safe.
Fixes: 1729cf186d ("mptcp: create the listening socket for new port")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to call the __mptcp_nmpc_socket(), and later subflow socket
access under the msk socket lock, or e.g. a racing connect() could
change the socket status under the hood, with unexpected results.
Fixes: 54635bd047 ("mptcp: add TCP_FASTOPEN_CONNECT socket option")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the peer closes all the existing subflows for a given
mptcp socket and later the application closes it, the current
implementation let it survive until the timewait timeout expires.
While the above is allowed by the protocol specification it
consumes resources for almost no reason and additionally
causes sporadic self-tests failures.
Let's move the mptcp socket to the TCP_CLOSE state when there are
no alive subflows at close time, so that the allocated resources
will be freed immediately.
Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arınç reports that on his MT7621AT Unielec U7621-06 board and MT7623NI
Bananapi BPI-R2, packets received by the CPU over mt7530 switch port 0
(of which this driver acts as the DSA master) are not processed
correctly by software. More precisely, they arrive without a DSA tag
(in packet or in the hwaccel area - skb_metadata_dst()), so DSA cannot
demux them towards the switch's interface for port 0. Traffic from other
ports receives a skb_metadata_dst() with the correct port and is demuxed
properly.
Looking at mtk_poll_rx(), it becomes apparent that this driver uses the
skb vlan hwaccel area:
union {
u32 vlan_all;
struct {
__be16 vlan_proto;
__u16 vlan_tci;
};
};
as a temporary storage for the VLAN hwaccel tag, or the DSA hwaccel tag.
If this is a DSA master it's a DSA hwaccel tag, and finally clears up
the skb VLAN hwaccel header.
I'm guessing that the problem is the (mis)use of API.
skb_vlan_tag_present() looks like this:
#define skb_vlan_tag_present(__skb) (!!(__skb)->vlan_all)
So if both vlan_proto and vlan_tci are zeroes, skb_vlan_tag_present()
returns precisely false. I don't know for sure what is the format of the
DSA hwaccel tag, but I surely know that lowermost 3 bits of vlan_proto
are 0 when receiving from port 0:
unsigned int port = vlan_proto & GENMASK(2, 0);
If the RX descriptor has no other bits set to non-zero values in
RX_DMA_VTAG, then the call to __vlan_hwaccel_put_tag() will not, in
fact, make the subsequent skb_vlan_tag_present() return true, because
it's implemented like this:
static inline void __vlan_hwaccel_put_tag(struct sk_buff *skb,
__be16 vlan_proto, u16 vlan_tci)
{
skb->vlan_proto = vlan_proto;
skb->vlan_tci = vlan_tci;
}
What we need to do to fix this problem (assuming this is the problem) is
to stop using skb->vlan_all as temporary storage for driver affairs, and
just create some local variables that serve the same purpose, but
hopefully better. Instead of calling skb_vlan_tag_present(), let's look
at a boolean has_hwaccel_tag which we set to true when the RX DMA
descriptors have something. Disambiguate based on netdev_uses_dsa()
whether this is a VLAN or DSA hwaccel tag, and only call
__vlan_hwaccel_put_tag() if we're certain it's a VLAN tag.
Arınç confirms that the treatment works, so this validates the
assumption.
Link: https://lore.kernel.org/netdev/704f3a72-fc9e-714a-db54-272e17612637@arinc9.com/
Fixes: 2d7605a729 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unsupported port speed can be set and cause error. Now fixing it
and return an error if setting unsupported speed.
This fix depends on the following, which was included in v6.2-rc1:
commit a61474c41e ("nfp: ethtool: support reporting link modes").
Fixes: 7c69873727 ("nfp: add support for .set_link_ksettings()")
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This code fix a bug that sk->sk_txrehash gets its default enable
value from sysctl_txrehash only when the socket is a TCP listener.
We should have sysctl_txrehash to set the default sk->sk_txrehash,
no matter TCP, nor listerner/connector.
Tested by following packetdrill:
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 4
// SO_TXREHASH == 74, default to sysctl_txrehash == 1
+0 getsockopt(3, SOL_SOCKET, 74, [1], [4]) = 0
+0 getsockopt(4, SOL_SOCKET, 74, [1], [4]) = 0
Fixes: 26859240e4 ("txhash: Add socket option to control TX hash rethink behavior")
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parameters 'queue_index' and 'napi_id' are passed in a swapped order.
Fix it here.
Fixes: 23233e577e ("net: ethernet: mtk_eth_soc: rely on page_pool for single page buffers")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The special tag is only enabled when the first MAC uses DSA. However, it
must be enabled when any MAC uses DSA. Change the check accordingly.
This fixes hardware DSA untagging not working on the second MAC of the
MT7621 and MT7623 SoCs, and likely other SoCs too. Therefore, remove the
check that disables hardware DSA untagging for the second MAC of the MT7621
and MT7623 SoCs.
Fixes: a1f47752fd ("net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC")
Co-developed-by: Richard van Schagen <richard@routerhints.com>
Signed-off-by: Richard van Schagen <richard@routerhints.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a smatch report for the newly added nvme_auth_wq.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-02-06 (ice)
This series contains updates to ice driver only.
Ani removes WQ_MEM_RECLAIM flag from workqueue to resolve
check_flush_dependency warning.
Michal fixes KASAN out-of-bounds warning.
Brett corrects behaviour for port VLAN Rx filters to prevent receiving
of unintended traffic.
Dan Carpenter fixes possible off by one issue.
Zhang Changzhong adjusts error path for switch recipe to prevent memory
leak.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: switch: fix potential memleak in ice_add_adv_recipe()
ice: Fix off by one in ice_tc_forward_to_queue()
ice: Fix disabling Rx VLAN filtering with port VLAN enabled
ice: fix out-of-bounds KASAN warning in virtchnl
ice: Do not use WQ_MEM_RECLAIM flag for workqueue
====================
Link: https://lore.kernel.org/r/20230206232934.634298-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On some platforms, 100/1000/2500 speeds seem to have sometimes problems
reporting false positive tx unit hang during stressful UDP traffic. Likely
other Intel drivers introduce responses to a tx hang. Update the 'tx hang'
comparator with the comparison of the head and tail of ring pointers and
restore the tx_timeout_factor to the previous value (one).
This can be test by using netperf or iperf3 applications.
Example:
iperf3 -s -p 5001
iperf3 -c 192.168.0.2 --udp -p 5001 --time 600 -b 0
netserver -p 16604
netperf -H 192.168.0.2 -l 600 -p 16604 -t UDP_STREAM -- -m 64000
Fixes: b27b8dc77b ("igc: Increase timeout value for Speed 100/1000/2500")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230206235818.662384-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After calling irq_set_affinity_and_hint(), the cpumask pointer is
saved in desc->affinity_hint, and will be used later when reading
/proc/irq/<num>/affinity_hint. So the cpumask variable needs to be
persistent. Otherwise, we are accessing freed memory when reading
the affinity_hint file.
Also, need to clear affinity_hint before free_irq(), otherwise there
is a one-time warning and stack trace during module unloading:
[ 243.948687] WARNING: CPU: 10 PID: 1589 at kernel/irq/manage.c:1913 free_irq+0x318/0x360
...
[ 243.948753] Call Trace:
[ 243.948754] <TASK>
[ 243.948760] mana_gd_remove_irqs+0x78/0xc0 [mana]
[ 243.948767] mana_gd_remove+0x3e/0x80 [mana]
[ 243.948773] pci_device_remove+0x3d/0xb0
[ 243.948778] device_remove+0x46/0x70
[ 243.948782] device_release_driver_internal+0x1fe/0x280
[ 243.948785] driver_detach+0x4e/0xa0
[ 243.948787] bus_remove_driver+0x70/0xf0
[ 243.948789] driver_unregister+0x35/0x60
[ 243.948792] pci_unregister_driver+0x44/0x90
[ 243.948794] mana_driver_exit+0x14/0x3fe [mana]
[ 243.948800] __do_sys_delete_module.constprop.0+0x185/0x2f0
To fix the bug, use the persistent mask, cpumask_of(cpu#), and set
affinity_hint to NULL before freeing the IRQ, as required by free_irq().
Cc: stable@vger.kernel.org
Fixes: 71fa6887ee ("net: mana: Assign interrupts to CPUs based on NUMA nodes")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/1675718929-19565-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
can 2023-02-07
The patch is from Devid Antonio Filoni and fixes an address claiming
problem in the J1939 CAN protocol.
* tag 'linux-can-fixes-for-6.2-20230207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: j1939: do not wait 250 ms if the same addr was already claimed
====================
Link: https://lore.kernel.org/r/20230207140514.2885065-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, remove and reload flows can run in parallel to module cleanup.
This design is error prone. For example: aux_drivers callbacks are called
from both cleanup and remove flows with different lockings, which can
cause a deadlock[1].
Hence, serialize module cleanup with reload and remove.
[1]
cleanup remove
------- ------
auxiliary_driver_unregister();
devl_lock()
auxiliary_device_delete(mlx5e_aux)
device_lock(mlx5e_aux)
devl_lock()
device_lock(mlx5e_aux)
Fixes: 912cebf420 ("net/mlx5e: Connect ethernet part to auxiliary bus")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When tracer is reloaded, the device will log the traces at the
beginning of the log buffer. Also, driver is reading the log buffer in
chunks in accordance to the consumer index.
Hence, zero consumer index when reloading the tracer.
Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Whenever the driver is reading the string DBs into buffers, the driver
is setting the load bit, but the driver never clears this bit.
As a result, in case load bit is on and the driver query the device for
new string DBs, the driver won't read again the string DBs.
Fix it by clearing the load bit when query the device for new string
DBs.
Fixes: 2d69356752 ("net/mlx5: Add support for fw live patch event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, each core device has VF pages counter which stores number of
fw pages used by its VFs and SFs.
The current design led to a hang when performing firmware reset on DPU,
where the DPU PFs stalled in sriov unload flow due to waiting on release
of SFs pages instead of waiting on only VFs pages.
Thus, Add a separate counter for SF firmware pages, which will prevent
the stall scenario described above.
Fixes: 1958fc2f07 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, an independent page counter is used for tracking memory usage
for each function type such as VF, PF and host PF (DPU).
For better code-readibilty, use a single array that stores
the number of allocated memory pages for each function type.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
ethtool is returning an error for unknown speeds for the IPoIB interface:
$ ethtool ib0
netlink error: failed to retrieve link settings
netlink error: Invalid argument
netlink error: failed to retrieve link settings
netlink error: Invalid argument
Settings for ib0:
Link detected: no
After this change, ethtool will return success and show "unknown speed":
$ ethtool ib0
Settings for ib0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Full
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: no
Fixes: eb234ee9d5 ("net/mlx5e: IPoIB, Add support for get_link_ksettings in ethtool")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moving to switchdev mode with rx-vlan-filter on and then setting it off
causes the kernel to crash since fs->vlan is freed during nic profile
cleanup flow.
RX VLAN filtering is not supported in switchdev mode so unset it when
changing to switchdev and restore its value when switching back to
legacy.
trace:
[] RIP: 0010:mlx5e_disable_cvlan_filter+0x43/0x70
[] set_feature_cvlan_filter+0x37/0x40 [mlx5_core]
[] mlx5e_handle_feature+0x3a/0x60 [mlx5_core]
[] mlx5e_set_features+0x6d/0x160 [mlx5_core]
[] __netdev_update_features+0x288/0xa70
[] ethnl_set_features+0x309/0x380
[] ? __nla_parse+0x21/0x30
[] genl_family_rcv_msg_doit.isra.17+0x110/0x150
[] genl_rcv_msg+0x112/0x260
[] ? features_reply_size+0xe0/0xe0
[] ? genl_family_rcv_msg_doit.isra.17+0x150/0x150
[] netlink_rcv_skb+0x4e/0x100
[] genl_rcv+0x24/0x40
[] netlink_unicast+0x1ab/0x290
[] netlink_sendmsg+0x257/0x4f0
[] sock_sendmsg+0x5c/0x70
Fixes: cb67b83292 ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SWITCHDEV_FDB_ADD_TO_BRIDGE event handler that updates FDB entry 'lastuse'
field is only executed for eswitch that owns the entry. However, if peer
entry processed packets at least once it will have hardware counter 'used'
value greater than entry 'lastuse' from that point on, which will cause FDB
entry not being aged out.
Process the event on all eswitch instances.
Fixes: ff9b752146 ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Selecting builder should be protected by the lock to prevent the case
where a new rule sets a builder in the nic_matcher while the previous
rule is still using the nic_matcher.
Fixing this issue and cleaning the error flow.
Fixes: b9b81e1e93 ("net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
rq->hw_mtu is used in function en_rx.c/mlx5e_skb_from_cqe_mpwrq_linear()
to catch oversized packets. If FCS is concatenated to the end of the
packet then the check should be updated accordingly.
Rx rings initialization (mlx5e_init_rxq_rq()) invoked for every new set
of channels, as part of mlx5e_safe_switch_params(), unknowingly if it
runs with default configuration or not. Current rq->hw_mtu
initialization assumes default configuration and ignores
params->scatter_fcs_en flag state.
Fix this, by accounting for params->scatter_fcs_en flag state during
rq->hw_mtu initialization.
In addition, updating rq->hw_mtu value during ingress traffic might
lead to packets drop and oversize_pkts_sw_drop counter increase with no
good reason. Hence we remove this optimization and switch the set of
channels with a new one, to make sure we don't get false positives on
the oversize_pkts_sw_drop counter.
Fixes: 102722fc68 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull devicetree fixes from Rob Herring:
- Fix handling of multiple OF framebuffer devices
- Fix booting on Socionext Synquacer with bad 'dma-ranges' entries
- Add DT binding .yamllint to .gitignore
* tag 'devicetree-fixes-for-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: interrupt-controller: arm,gic-v3: Fix typo in description of msi-controller property
dt-bindings: Fix .gitignore
of/address: Return an error when no valid dma-ranges are found
of: Make OF framebuffer device names unique
A passthrough decoder is a decoder that maps only 1 target. It is a
special case because it does not impose any constraints on the
interleave-math as compared to a decoder with multiple targets. Extend
the passthrough case to multi-target-capable decoders that only have one
target selected. I.e. the current code was only considering passthrough
*ports* which are only a subset of the potential passthrough decoder
scenarios.
Fixes: e4f6dfa9ef ("cxl/region: Fix 'distance' calculation with passthrough ports")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167564540422.847146.13816934143225777888.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ASoC: Fixes for v6.2
A few more fixes for v6.2, all driver specific and small. It's larger
than is ideal but we can't really control when people find problems.
Pull tracing fix from Steven Rostedt:
"Fix regression in poll() and select()
With the fix that made poll() and select() block if read would block
caused a slight regression in rasdaemon, as it needed that kind of
behavior. Add a way to make that behavior come back by writing zero
into the 'buffer_percentage', which means to never block on read"
* tag 'trace-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", states:
d) No CF shall begin, or resume, transmission on the network until 250
ms after it has successfully claimed an address except when
responding to a request for address-claimed.
But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
prioritization" show that the CF begins the transmission after 250 ms
from the first AC (address-claimed) message even if it sends another AC
message during that time window to resolve the address contention with
another CF.
As stated in "4.4.2.3 - Address-claimed message":
In order to successfully claim an address, the CF sending an address
claimed message shall not receive a contending claim from another CF
for at least 250 ms.
As stated in "4.4.3.2 - NAME management (NM) message":
1) A commanding CF can
d) request that a CF with a specified NAME transmit the address-
claimed message with its current NAME.
2) A target CF shall
d) send an address-claimed message in response to a request for a
matching NAME
Taking the above arguments into account, the 250 ms wait is requested
only during network initialization.
Do not restart the timer on AC message if both the NAME and the address
match and so if the address has already been claimed (timer has expired)
or the AC message has been sent to resolve the contention with another
CF (timer is still running).
Signed-off-by: Devid Antonio Filoni <devid.filoni@egluetechnologies.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20221125170418.34575-1-devid.filoni@egluetechnologies.com
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Currently only the network namespace of devlink instance is monitored
for port events. If netdev is moved to a different namespace and then
unregistered, NETDEV_PRE_UNINIT is missed which leads to trigger
following WARN_ON in devl_port_unregister().
WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET);
Fix this by changing the netdev notifier from per-net to global so no
event is missed.
Fixes: 02a68a47ea ("net: devlink: track netdev with devlink_port assigned")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230206094151.2557264-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We have two IS1 filters of the OCELOT_VCAP_KEY_ANY key type (the one with
"action vlan pop" and the one with "action vlan modify") and one of the
OCELOT_VCAP_KEY_IPV4 key type (the one with "action skbedit priority").
But we have no IS1 filter with the OCELOT_VCAP_KEY_ETYPE key type, and
there was an uncaught breakage there.
To increase test coverage, convert one of the OCELOT_VCAP_KEY_ANY
filters to OCELOT_VCAP_KEY_ETYPE, by making the filter also match on the
MAC SA of the traffic sent by mausezahn, $h1_mac.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230205192409.1796428-2-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Alternative short title: don't instruct the hardware to match on
EtherType with "protocol 802.1Q" flower filters. It doesn't work for the
reasons detailed below.
With a command such as the following:
tc filter add dev $swp1 ingress chain $(IS1 2) pref 3 \
protocol 802.1Q flower skip_sw vlan_id 200 src_mac $h1_mac \
action vlan modify id 300 \
action goto chain $(IS2 0 0)
the created filter is set by ocelot_flower_parse_key() to be of type
OCELOT_VCAP_KEY_ETYPE, and etype is set to {value=0x8100, mask=0xffff}.
This gets propagated all the way to is1_entry_set() which commits it to
hardware (the VCAP_IS1_HK_ETYPE field of the key). Compare this to the
case where src_mac isn't specified - the key type is OCELOT_VCAP_KEY_ANY,
and is1_entry_set() doesn't populate VCAP_IS1_HK_ETYPE.
The problem is that for VLAN-tagged frames, the hardware interprets the
ETYPE field as holding the encapsulated VLAN protocol. So the above
filter will only match those packets which have an encapsulated protocol
of 0x8100, rather than all packets with VLAN ID 200 and the given src_mac.
The reason why this is allowed to occur is because, although we have a
block of code in ocelot_flower_parse_key() which sets "match_protocol"
to false when VLAN keys are present, that code executes too late.
There is another block of code, which executes for Ethernet addresses,
and has a "goto finished_key_parsing" and skips the VLAN header parsing.
By skipping it, "match_protocol" remains with the value it was
initialized with, i.e. "true", and "proto" is set to f->common.protocol,
or 0x8100.
The concept of ignoring some keys rather than erroring out when they are
present but can't be offloaded is dubious in itself, but is present
since the initial commit fe3490e610 ("net: mscc: ocelot: Hardware
ofload for tc flower filter"), and it's outside of the scope of this
patch to change that.
The problem was introduced when the driver started to interpret the
flower filter's protocol, and populate the VCAP filter's ETYPE field
based on it.
To fix this, it is sufficient to move the code that parses the VLAN keys
earlier than the "goto finished_key_parsing" instruction. This will
ensure that if we have a flower filter with both VLAN and Ethernet
address keys, it won't match on ETYPE 0x8100, because the VLAN key
parsing sets "match_protocol = false".
Fixes: 86b956de11 ("net: mscc: ocelot: support matching on EtherType")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230205192409.1796428-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reverts commit 115d9d77bb.
The pages being freed by memblock_free_late() have already been
initialized, but if they are in the deferred init range,
__free_one_page() might access nearby uninitialized pages when trying to
coalesce buddies. This can, for example, trigger this BUG:
BUG: unable to handle page fault for address: ffffe964c02580c8
RIP: 0010:__list_del_entry_valid+0x3f/0x70
<TASK>
__free_one_page+0x139/0x410
__free_pages_ok+0x21d/0x450
memblock_free_late+0x8c/0xb9
efi_free_boot_services+0x16b/0x25c
efi_enter_virtual_mode+0x403/0x446
start_kernel+0x678/0x714
secondary_startup_64_no_verify+0xd2/0xdb
</TASK>
A proper fix will be more involved so revert this change for the time
being.
Fixes: 115d9d77bb ("mm: Always release pages to the buddy allocator in memblock_free_late().")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/20230207082151.1303-1-dev@aaront.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Frank reports that in a mt7530 setup where some ports are standalone and
some are in a VLAN-aware bridge, 8021q uppers of the standalone ports
lose their VLAN tag on xmit, as seen by the link partner.
This seems to occur because once the other ports join the VLAN-aware
bridge, mt7530_port_vlan_filtering() also calls
mt7530_port_set_vlan_aware(ds, cpu_dp->index), and this affects the way
that the switch processes the traffic of the standalone port.
Relevant is the PVC_EG_TAG bit. The MT7530 documentation says about it:
EG_TAG: Incoming Port Egress Tag VLAN Attribution
0: disabled (system default)
1: consistent (keep the original ingress tag attribute)
My interpretation is that this setting applies on the ingress port, and
"disabled" is basically the normal behavior, where the egress tag format
of the packet (tagged or untagged) is decided by the VLAN table
(MT7530_VLAN_EGRESS_UNTAG or MT7530_VLAN_EGRESS_TAG).
But there is also an option of overriding the system default behavior,
and for the egress tagging format of packets to be decided not by the
VLAN table, but simply by copying the ingress tag format (if ingress was
tagged, egress is tagged; if ingress was untagged, egress is untagged;
aka "consistent). This is useful in 2 scenarios:
- VLAN-unaware bridge ports will always encounter a miss in the VLAN
table. They should forward a packet as-is, though. So we use
"consistent" there. See commit e045124e93 ("net: dsa: mt7530: fix
tagged frames pass-through in VLAN-unaware mode").
- Traffic injected from the CPU port. The operating system is in god
mode; if it wants a packet to exit as VLAN-tagged, it sends it as
VLAN-tagged. Otherwise it sends it as VLAN-untagged*.
*This is true only if we don't consider the bridge TX forwarding offload
feature, which mt7530 doesn't support.
So for now, make the CPU port always stay in "consistent" mode to allow
software VLANs to be forwarded to their egress ports with the VLAN tag
intact, and not stripped.
Link: https://lore.kernel.org/netdev/trinity-e6294d28-636c-4c40-bb8b-b523521b00be-1674233135062@3c-app-gmx-bs36/
Fixes: e045124e93 ("net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode")
Reported-by: Frank Wunderlich <frank-w@public-files.de>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230205140713.1609281-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We reference dump buffers both by their handle as well as their
object. The problem is now that when anybody iterates over the DRM
framebuffers and exports the underlying GEM objects through DMA-buf
we run into a circular reference count situation.
The result is that the fbdev handling holds the GEM handle preventing
the DMA-buf in the GEM object to be released. This DMA-buf in turn
holds a reference to the driver module which on unload would release
the fbdev.
Break that loop by releasing the handle as soon as the DRM
framebuffer object is created. The DRM framebuffer and the DRM client
buffer structure still hold a reference to the underlying GEM object
preventing its destruction.
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: c76f0f7cb5 ("drm: Begin an API for in-kernel clients")
Cc: <stable@vger.kernel.org>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230126102814.8722-1-christian.koenig@amd.com
When the network status is unstable, use-after-free may occur when
read data from the server.
BUG: KASAN: use-after-free in readpages_fill_pages+0x14c/0x7e0
Call Trace:
<TASK>
dump_stack_lvl+0x38/0x4c
print_report+0x16f/0x4a6
kasan_report+0xb7/0x130
readpages_fill_pages+0x14c/0x7e0
cifs_readv_receive+0x46d/0xa40
cifs_demultiplex_thread+0x121c/0x1490
kthread+0x16b/0x1a0
ret_from_fork+0x2c/0x50
</TASK>
Allocated by task 2535:
kasan_save_stack+0x22/0x50
kasan_set_track+0x25/0x30
__kasan_kmalloc+0x82/0x90
cifs_readdata_direct_alloc+0x2c/0x110
cifs_readdata_alloc+0x2d/0x60
cifs_readahead+0x393/0xfe0
read_pages+0x12f/0x470
page_cache_ra_unbounded+0x1b1/0x240
filemap_get_pages+0x1c8/0x9a0
filemap_read+0x1c0/0x540
cifs_strict_readv+0x21b/0x240
vfs_read+0x395/0x4b0
ksys_read+0xb8/0x150
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Freed by task 79:
kasan_save_stack+0x22/0x50
kasan_set_track+0x25/0x30
kasan_save_free_info+0x2e/0x50
__kasan_slab_free+0x10e/0x1a0
__kmem_cache_free+0x7a/0x1a0
cifs_readdata_release+0x49/0x60
process_one_work+0x46c/0x760
worker_thread+0x2a4/0x6f0
kthread+0x16b/0x1a0
ret_from_fork+0x2c/0x50
Last potentially related work creation:
kasan_save_stack+0x22/0x50
__kasan_record_aux_stack+0x95/0xb0
insert_work+0x2b/0x130
__queue_work+0x1fe/0x660
queue_work_on+0x4b/0x60
smb2_readv_callback+0x396/0x800
cifs_abort_connection+0x474/0x6a0
cifs_reconnect+0x5cb/0xa50
cifs_readv_from_socket.cold+0x22/0x6c
cifs_read_page_from_socket+0xc1/0x100
readpages_fill_pages.cold+0x2f/0x46
cifs_readv_receive+0x46d/0xa40
cifs_demultiplex_thread+0x121c/0x1490
kthread+0x16b/0x1a0
ret_from_fork+0x2c/0x50
The following function calls will cause UAF of the rdata pointer.
readpages_fill_pages
cifs_read_page_from_socket
cifs_readv_from_socket
cifs_reconnect
__cifs_reconnect
cifs_abort_connection
mid->callback() --> smb2_readv_callback
queue_work(&rdata->work) # if the worker completes first,
# the rdata is freed
cifs_readv_complete
kref_put
cifs_readdata_release
kfree(rdata)
return rdata->... # UAF in readpages_fill_pages()
Similarly, this problem also occurs in the uncache_fill_pages().
Fix this by adjusts the order of condition judgment in the return
statement.
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Not all decoders have a reset callback.
The CXL specification allows a host bridge with a single root port to
have no explicit HDM decoders. Currently the region driver assumes there
are none. As such the CXL core creates a special pass through decoder
instance without a commit/reset callback.
Prior to this patch, the ->reset() callback was called unconditionally when
calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
1 Root Port, and one directly attached CXL type 3 device or multiple CXL
type 3 devices attached to downstream ports of a switch can cause a null
pointer dereference.
Before the fix, a kernel crash was observed when we destroy the region, and
a pass through decoder is reset.
The issue can be reproduced as below,
1) create a region with a CXL setup which includes a HB with a
single root port under which a memdev is attached directly.
2) destroy the region with cxl destroy-region regionX -f.
Fixes: 176baefb2e ("cxl/hdm: Commit decoder state to hardware")
Cc: <stable@vger.kernel.org>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Link: https://lore.kernel.org/r/20221215170909.2650271-1-fan.ni@samsung.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The RFI and STF security mitigation options can flip the
interrupt_exit_not_reentrant static branch condition concurrently with
the interrupt exit code which tests that branch.
Interrupt exit tests this condition to set MSR[EE|RI] for exit, then
again in the case a soft-masked interrupt is found pending, to recover
the MSR so the interrupt can be replayed before attempting to exit
again. If the condition changes between these two tests, the MSR and irq
soft-mask state will become corrupted, leading to warnings and possible
crashes. For example, if the branch is initially true then false,
MSR[EE] will be 0 but PACA_IRQ_HARD_DIS clear and EE may not get
enabled, leading to warnings in irq_64.c.
Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206042240.92103-1-npiggin@gmail.com
When ice_add_special_words() fails, the 'rm' is not released, which will
lead to a memory leak. Fix this up by going to 'err_unroll' label.
Compile tested only.
Fixes: 8b032a55c1 ("ice: low level support for tunnels")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
The > comparison should be >= to prevent reading one element beyond
the end of the array.
The "vsi->num_rxq" is not strictly speaking the number of elements in
the vsi->rxq_map[] array. The array has "vsi->alloc_rxq" elements and
"vsi->num_rxq" is less than or equal to the number of elements in the
array. The array is allocated in ice_vsi_alloc_arrays(). It's still
an off by one but it might not access outside the end of the array.
Fixes: 143b86f346 ("ice: Enable RX queue selection using skbedit action")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
If the user turns on the vf-true-promiscuous-support flag, then Rx VLAN
filtering will be disabled if the VF requests to enable promiscuous
mode. When the VF is in a port VLAN, this is the incorrect behavior
because it will allow the VF to receive traffic outside of its port VLAN
domain. Fortunately this only resulted in the VF(s) receiving broadcast
traffic outside of the VLAN domain because all of the VLAN promiscuous
rules are based on the port VLAN ID. Fix this by setting the
.disable_rx_filtering VLAN op to a no-op when a port VLAN is enabled on
the VF.
Also, make sure to make this fix for both Single VLAN Mode and Double
VLAN Mode enabled devices.
Fixes: c31af68a1b ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pull cgroup fixes from Tejun Heo:
"During the v6.2 cycle, there were a series of changes to task cpu
affinity handling which fixed cpuset inadvertently clobbering
user-configured affinity masks. Unfortunately, they broke the affinity
handling on hybrid heterogeneous CPUs which have cores that can
execute both 64 and 32bit along with cores that can only execute 32bit
code.
This contains two fix patches for the above issue. While reverting the
changes that caused the regression is definitely an option, the
origial patches do improve how cpuset behave signficantly in some
cases and the fixes seem fairly safe, so I think it'd be better to try
to fix them first"
* tag 'cgroup-for-6.2-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
cgroup/cpuset: Don't filter offline CPUs in cpuset_cpus_allowed() for top cpuset tasks
Pull btrfs fixes from David Sterba:
- explicitly initialize zlib work memory to fix a KCSAN warning
- limit number of send clones by maximum memory allocated
- limit device size extent in case it device shrink races with chunk
allocation
- raid56 fixes:
- fix copy&paste error in RAID6 stripe recovery
- make error bitmap update atomic
* tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: raid56: make error_bitmap update atomic
btrfs: send: limit number of clones and allocated memory size
btrfs: zlib: zero-initialize zlib workspace
btrfs: limit device extents to the device size
btrfs: raid56: fix stripes if vertical errors are found
Merge series from Daniel Beer <daniel.beer@igorinstitute.com>:
This pair of patches fixes two issues which crept in while revising the
original submission, at a time when I no longer had access to test
hardware.
The fixes here have been tested and verified on hardware.
set_cpus_allowed_ptr() will fail with -EINVAL if the requested
affinity mask is not a subset of the task_cpu_possible_mask() for the
task being updated. Consequently, on a heterogeneous system with cpusets
spanning the different CPU types, updates to the cgroup hierarchy can
silently fail to update task affinities when the effective affinity
mask for the cpuset is expanded.
For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
the only cores capable of executing 32-bit tasks. Attaching a 32-bit
task to a cpuset containing CPUs 0-2 will correctly affine the task to
CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
the affinity mask of the 32-bit task because update_tasks_cpumask() will
pass the full 0-3 mask to set_cpus_allowed_ptr().
Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
and use it to mask the 'effective_cpus' mask with the possible mask for
each task being updated.
Fixes: 431c69fac0 ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Since commit 8f9ea86fdf ("sched: Always preserve the user
requested cpumask"), relax_compatible_cpus_allowed_ptr() is calling
__sched_setaffinity() unconditionally. This helps to expose a bug in
the current cpuset hotplug code where the cpumasks of the tasks in
the top cpuset are not updated at all when some CPUs become online or
offline. It is likely caused by the fact that some of the tasks in the
top cpuset, like percpu kthreads, cannot have their cpu affinity changed.
One way to reproduce this as suggested by Peter is:
- boot machine
- offline all CPUs except one
- taskset -p ffffffff $$
- online all CPUs
Fix this by allowing cpuset_cpus_allowed() to return a wider mask that
includes offline CPUs for those tasks that are in the top cpuset. For
tasks not in the top cpuset, the old rule applies and only online CPUs
will be returned in the mask since hotplug events will update their
cpumasks accordingly.
Fixes: 8f9ea86fdf ("sched: Always preserve the user requested cpumask")
Reported-by: Will Deacon <will@kernel.org>
Originally-from: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull an ARM cpufreq fix for 6.2-rc8 from Viresh Kumar:
- Fix the incorrect value returned by cpufreq driver's ->get() callback for
Qualcomm platforms (Douglas Anderson).
* tag 'cpufreq-arm-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: qcom-hw: Fix cpufreq_driver->get() for non-LMH systems
An interrupted dma_fence_wait() becomes an -ERESTARTSYS returned
to userspace ioctl(DRM_IOCTL_VIRTGPU_EXECBUFFER) calls, prompting to
retry the ioctl(), but the passed exbuf->fence_fd has been reset to -1,
making the retry attempt fail at sync_file_get_fence().
The uapi for DRM_IOCTL_VIRTGPU_EXECBUFFER is changed to retain the
passed value for exbuf->fence_fd when returning anything besides a
successful result from the ioctl.
Fixes: 2cd7b6f08b ("drm/virtio: add in/out fence support for explicit synchronization")
Signed-off-by: Ryan Neph <ryanneph@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230203233345.2477767-1-ryanneph@chromium.org
Let L1 and L2 be two spinlocks.
Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
waiter of L2.
Let T2 be the task holding L2.
Let T3 be a task trying to acquire L1.
The following events will lead to a state in which the wait queue of L2
isn't empty, but no task actually holds the lock.
T1 T2 T3
== == ==
spin_lock(L1)
| raw_spin_lock(L1->wait_lock)
| rtlock_slowlock_locked(L1)
| | task_blocks_on_rt_mutex(L1, T3)
| | | orig_waiter->lock = L1
| | | orig_waiter->task = T3
| | | raw_spin_unlock(L1->wait_lock)
| | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)
spin_unlock(L2) | | | |
| rt_mutex_slowunlock(L2) | | | |
| | raw_spin_lock(L2->wait_lock) | | | |
| | wakeup(T1) | | | |
| | raw_spin_unlock(L2->wait_lock) | | | |
| | | | waiter = T1->pi_blocked_on
| | | | waiter == rt_mutex_top_waiter(L2)
| | | | waiter->task == T1
| | | | raw_spin_lock(L2->wait_lock)
| | | | dequeue(L2, waiter)
| | | | update_prio(waiter, T1)
| | | | enqueue(L2, waiter)
| | | | waiter != rt_mutex_top_waiter(L2)
| | | | L2->owner == NULL
| | | | wakeup(T1)
| | | | raw_spin_unlock(L2->wait_lock)
T1 wakes up
T1 != top_waiter(L2)
schedule_rtlock()
If the deadline of T1 is updated before the call to update_prio(), and the
new deadline is greater than the deadline of the second top waiter, then
after the requeue, T1 is no longer the top waiter, and the wrong task is
woken up which will then go back to sleep because it is not the top waiter.
This can be reproduced in PREEMPT_RT with stress-ng:
while true; do
stress-ng --sched deadline --sched-period 1000000000 \
--sched-runtime 800000000 --sched-deadline \
1000000000 --mmapfork 23 -t 20
done
A similar issue was pointed out by Thomas versus the cases where the top
waiter drops out early due to a signal or timeout, which is a general issue
for all regular rtmutex use cases, e.g. futex.
The problematic code is in rt_mutex_adjust_prio_chain():
// Save the top waiter before dequeue/enqueue
prerequeue_top_waiter = rt_mutex_top_waiter(lock);
rt_mutex_dequeue(lock, waiter);
waiter_update_prio(waiter, task);
rt_mutex_enqueue(lock, waiter);
// Lock has no owner?
if (!rt_mutex_owner(lock)) {
// Top waiter changed
----> if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
----> wake_up_state(waiter->task, waiter->wake_state);
This only takes the case into account where @waiter is the new top waiter
due to the requeue operation.
But it fails to handle the case where @waiter is not longer the top
waiter due to the requeue operation.
Ensure that the new top waiter is woken up so in all cases so it can take
over the ownerless lock.
[ tglx: Amend changelog, add Fixes tag ]
Fixes: c014ef69b3 ("locking/rtmutex: Add wake_state to rt_mutex_waiter")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230117172649.52465-1-wander@redhat.com
Link: https://lore.kernel.org/r/20230202123020.14844-1-wander@redhat.com
Due to a workaround we have to make sure the WM1 watermarks block/lines
values are sensible even when WM1 is disabled. To that end we copy those
values from WM0.
However since we now keep each wm level enabled on a per-plane basis
it doesn't seem necessary to do that copy when we already have an
enabled WM1 on the current plane. That is, we might be in a situation
where another plane can only do WM0 (and thus needs the copy) but
the current plane's WM1 is still perfectly valid (ie. fits into the
current DDB allocation).
Skipping the copy could avoid reprogramming the plane's registers
needlessly in some cases.
Fixes: a301cb0fca ("drm/i915: Keep plane watermarks enabled more aggressively")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230131002127.29305-1-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
(cherry picked from commit c580c2d27a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Seems like properties parsing and reading was copy-pasted,
so "everest,interrupt-src" and "everest,interrupt-clk" are saved into
the es8326->jack_pol variable. This might lead to wrong settings
being saved into the reg 57 (ES8326_HP_DET).
Fix this by using proper variables while reading properties.
Signed-off-by: Alexey Firago <a.firago@yadro.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com
Link: https://lore.kernel.org/r/20230204195106.46539-1-a.firago@yadro.com
Signed-off-by: Mark Brown <broonie@kernel.org>
There's some setup we need to do in order to get the DSP initialized,
and this can't be done until a bit-clock is ready. In an earlier version
of this driver, this work was done in a DAPM callback.
The DAPM callback doesn't guarantee that the bit-clock is running, so
the work was moved instead to the trigger callback. Unfortunately this
callback runs in atomic context, and the setup code needs to do I2C
transactions.
Here we use a work_struct to kick off the setup in a thread instead.
Fixes: ec45268467 ("ASoC: add support for TAS5805M digital amplifier")
Signed-off-by: Daniel Beer <daniel.beer@igorinstitute.com>
Link: https://lore.kernel.org/r/85d8ba405cb009a7a3249b556dc8f3bdb1754fdf.1675497326.git.daniel.beer@igorinstitute.com
Signed-off-by: Mark Brown <broonie@kernel.org>
kexec (PPC64) code calls memory_hotplug_max(). Add the header
declaration for it from <asm/mmzone.h>. Using <linux/mmzone.h> does not
work since the #include for <asm/mmzone.h> depends on CONFIG_NUMA=y,
which is not always set.
Fixes this build error/warning:
arch/powerpc/kexec/file_load_64.c: In function 'kexec_extra_fdt_size_ppc64':
arch/powerpc/kexec/file_load_64.c:993:33: error: implicit declaration of function 'memory_hotplug_max'
993 | usm_entries = ((memory_hotplug_max() / drmem_lmb_size()) +
| ^~~~~~~~~~~~~~~~~~
Fixes: fc546faa55 ("powerpc/kexec_file: Count hot-pluggable memory in FDT estimate")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230204172206.7662-1-rdunlap@infradead.org
The "port" comes from the user and if it is zero then the:
ndev = mc->ports[port - 1];
assignment does an out of bounds read. I have changed the if
statement to fix this and to mirror how it is done in
mana_ib_create_qp_rss().
Fixes: 0266a17763 ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/Y8/3Vn8qx00kE9Kk@kili
Acked-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The initial value of hid->collection[].parent_idx if 0. When
Report descriptor doesn't contain "HID Collection", the value
remains as 0.
In the meanwhile, when the Report descriptor fullfill
all following conditions, it will trigger hid_apply_multiplier
function call.
1. Usage page is Generic Desktop Ctrls (0x01)
2. Usage is RESOLUTION_MULTIPLIER (0x48)
3. Contain any FEATURE items
The while loop in hid_apply_multiplier will search the top-most
collection by searching parent_idx == -1. Because all parent_idx
is 0. The loop will run forever.
There is a Report Descriptor triggerring the deadloop
0x05, 0x01, // Usage Page (Generic Desktop Ctrls)
0x09, 0x48, // Usage (0x48)
0x95, 0x01, // Report Count (1)
0x75, 0x08, // Report Size (8)
0xB1, 0x01, // Feature
Signed-off-by: Xin Zhao <xnzhao@google.com>
Link: https://lore.kernel.org/r/20230130212947.1315941-1-xnzhao@google.com
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Check all ports instead of just port_count ports. PTP init was only
checking ports 0 to port_count. If the hardware ports are not mapped
starting from 0 then they would be missed, e.g. if only ports 20-30 were
mapped it would attempt to init ports 0-10, resulting in NULL pointers
when attempting to timestamp. Now it will init all mapped ports.
Fixes: 70dfe25cd8 ("net: sparx5: Update extraction/injection for timestamping")
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 58e0be1ef6 ("net: use struct_group to copy ip/ipv6
header addresses"), ip and ipv6 headers started to use the __struct_group
definition, which is defined at include/uapi/linux/stddef.h. However,
linux/stddef.h isn't explicitly included in include/uapi/linux/{ip,ipv6}.h,
which breaks build of xskxceiver bpf selftest if you install the uapi
headers in the system:
$ make V=1 xskxceiver -C tools/testing/selftests/bpf
...
make: Entering directory '(...)/tools/testing/selftests/bpf'
gcc -g -O0 -rdynamic -Wall -Werror (...)
In file included from xskxceiver.c:79:
/usr/include/linux/ip.h:103:9: error: expected specifier-qualifier-list before ‘__struct_group’
103 | __struct_group(/* no tag */, addrs, /* no attrs */,
| ^~~~~~~~~~~~~~
...
Include the missing <linux/stddef.h> dependency in ip.h and do the
same for the ipv6.h header.
Fixes: 58e0be1ef6 ("net: use struct_group to copy ip/ipv6 header addresses")
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Entries can linger in cache without timer for days, thanks to
the gc_thresh1 limit. As result, without traffic, the confirmed
time can be outdated and to appear to be in the future. Later,
on traffic, NUD_STALE entries can switch to NUD_DELAY and start
the timer which can see the invalid confirmed time and wrongly
switch to NUD_REACHABLE state instead of NUD_PROBE. As result,
timer is set many days in the future. This is more visible on
32-bit platforms, with higher HZ value.
Why this is a problem? While we expect unused entries to expire,
such entries stay in REACHABLE state for too long, locked in
cache. They are not expired normally, only when cache is full.
Problem and the wrong state change reported by Zhang Changzhong:
172.16.1.18 dev bond0 lladdr 0a:0e:0f:01:12:01 ref 1 used 350521/15994171/350520 probes 4 REACHABLE
350520 seconds have elapsed since this entry was last updated, but it is
still in the REACHABLE state (base_reachable_time_ms is 30000),
preventing lladdr from being updated through probe.
Fix it by ensuring timer is started with valid used/confirmed
times. Considering the valid time range is LONG_MAX jiffies,
we try not to go too much in the past while we are in
DELAY/PROBE state. There are also places that need
used/updated times to be validated while timer is not running.
Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HP Elitebook 645 G9 laptop (with motherboard model 89D2) uses the
ALC236 codec and requires the alc236_fixup_hp_mute_led_micmute_vref
fixup in order to enable mute/micmute LEDs.
Note: the alc236_fixup_hp_gpio_led fixup, which is used by the Elitebook
640 G9, does not work with the 645 G9.
[ rearranged the entry in SSID order -- tiwai ]
Signed-off-by: Elvis Angelaccio <elvis.angelaccio@kde.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/4055cb48-e228-8a13-524d-afbb7aaafebe@kde.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
On a sc7180-based Chromebook, when I go to
/sys/devices/system/cpu/cpu0/cpufreq I can see:
cpuinfo_cur_freq:2995200
cpuinfo_max_freq:1804800
scaling_available_frequencies:300000 576000 ... 1708800 1804800
scaling_cur_freq:1804800
scaling_max_freq:1804800
As you can see the `cpuinfo_cur_freq` is bogus. It turns out that this
bogus info started showing up as of commit c72cf0cb1d ("cpufreq:
qcom-hw: Fix the frequency returned by cpufreq_driver->get()"). That
commit seems to assume that everyone is on the LMH bandwagon, but
sc7180 isn't.
Let's go back to the old code in the case where LMH isn't used.
Fixes: c72cf0cb1d ("cpufreq: qcom-hw: Fix the frequency returned by cpufreq_driver->get()")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
[ Viresh: Fixed the 'fixes' tag ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Pull ELF fix from Al Viro:
"One of the many equivalent build warning fixes for !CONFIG_ELF_CORE
configs. Geert's is the earliest one I've been able to find"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coredump: Move dump_emit_page() to kill unused warning
Pull USB fixes from Greg KH:
"Here are some small USB fixes that resolve some reported problems.
These include:
- gadget driver fixes
- dwc3 driver fix
- typec driver fix
- MAINTAINERS file update.
All of these have been in linux-next with no reported problems"
* tag 'usb-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Don't attempt to resume the ports before they exist
usb: gadget: udc: do not clear gadget driver.bus
usb: gadget: f_uac2: Fix incorrect increment of bNumEndpoints
usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_wait
usb: dwc3: qcom: enable vbus override when in OTG dr-mode
MAINTAINERS: Add myself as UVC Gadget Maintainer
Pull tty/serial driver fixes from Greg KH:
"Here are some small serial and vt fixes. These include:
- 8250 driver fixes relating to dma issues
- stm32 serial driver fix for threaded irqs
- vc_screen bugfix for reported problems.
All have been in linux-next for a while with no reported problems"
* tag 'tty-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF
serial: 8250_dma: Fix DMA Rx rearm race
serial: 8250_dma: Fix DMA Rx completion race
serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler
Pull char/misc driver fixes from Greg KH:
"Here are a number of small char/misc/whatever driver fixes. They
include:
- IIO driver fixes for some reported problems
- nvmem driver fixes
- fpga driver fixes
- debugfs memory leak fix in the hv_balloon and irqdomain code
(irqdomain change was acked by the maintainer)
All have been in linux-next with no reported problems"
* tag 'char-misc-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (33 commits)
kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup()
HV: hv_balloon: fix memory leak with using debugfs_lookup()
nvmem: qcom-spmi-sdam: fix module autoloading
nvmem: core: fix return value
nvmem: core: fix cell removal on error
nvmem: core: fix device node refcounting
nvmem: core: fix registration vs use race
nvmem: core: fix cleanup after dev_set_name()
nvmem: core: remove nvmem_config wp_gpio
nvmem: core: initialise nvmem->id early
nvmem: sunxi_sid: Always use 32-bit MMIO reads
nvmem: brcm_nvram: Add check for kzalloc
iio: imu: fxos8700: fix MAGN sensor scale and unit
iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN
iio: imu: fxos8700: fix failed initialization ODR mode assignment
iio: imu: fxos8700: fix incorrect ODR mode readback
iio: light: cm32181: Fix PM support on system with 2 I2C resources
iio: hid: fix the retval in gyro_3d_capture_sample
iio: hid: fix the retval in accel_3d_capture_sample
iio: imu: st_lsm6dsx: fix build when CONFIG_IIO_TRIGGERED_BUFFER=m
...
Pull fbdev fixes from Helge Deller:
- fix fbcon to prevent fonts bigger than 32x32 pixels to avoid
overflows reported by syzbot
- switch omapfb to use kstrtobool()
- switch some fbdev drivers to use the backlight helpers
* tag 'fbdev-for-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbcon: Check font dimension limits
fbdev: omapfb: Use kstrtobool() instead of strtobool()
fbdev: fbmon: fix function name in kernel-doc
fbdev: atmel_lcdfb: Rework backlight status updates
fbdev: riva: Use backlight helper
fbdev: omapfb: panel-dsi-cm: Use backlight helper
fbdev: nvidia: Use backlight helper
fbdev: mx3fb: Use backlight helper
fbdev: radeon: Use backlight helper
fbdev: atyfb: Use backlight helper
fbdev: aty128fb: Use backlight helper
Pull x86 fix from Borislav Petkov:
- Prevent the compiler from reordering accesses to debug regs which
could cause a #VC exception in SEV-ES guests at the wrong place in
the NMI handling path
* tag 'x86_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Fix stack recursion caused by wrongly ordered DR7 accesses
Pull perf fix from Borislav Petkov:
- Lock the proper critical section when dealing with perf event context
* tag 'perf_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix perf_event_pmu_context serialization
Pull powerpc fixes from Michael Ellerman:
"It's a bit of a big batch for rc6, but just because I didn't send any
fixes the last week or two while I was on vacation, next week should
be quieter:
- Fix a few objtool warnings since we recently enabled objtool.
- Fix a deadlock with the hash MMU vs perf record.
- Fix perf profiling of asynchronous interrupt handlers.
- Revert the IMC PMU nest_init_lock to being a mutex.
- Two commits fixing problems with the kexec_file FDT size
estimation.
- Two commits fixing problems with strict RWX vs kernels running at
non-zero.
- Reconnect tlb_flush() to hash__tlb_flush()
Thanks to Kajol Jain, Nicholas Piggin, Sachin Sant Sathvika Vasireddy,
and Sourabh Jain"
* tag 'powerpc-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()
powerpc/kexec_file: Count hot-pluggable memory in FDT estimate
powerpc/64s/radix: Fix RWX mapping with relocated kernel
powerpc/64s/radix: Fix crash with unaligned relocated kernel
powerpc/kexec_file: Fix division by zero in extra size estimation
powerpc/imc-pmu: Revert nest_init_lock to being a mutex
powerpc/64: Fix perf profiling asynchronous interrupt handlers
powerpc/64s: Fix local irq disable when PMIs are disabled
powerpc/kvm: Fix unannotated intra-function call warning
powerpc/85xx: Fix unannotated intra-function call warning
Pull RTC fixes from Alexandre Belloni:
"Here are a few fixes for 6.2. The EFI one is the most important as it
allows some RTCs to actually work. The other two are warnings that are
worth fixing.
- efi: make WAKEUP services optional
- sunplus: fix format string warning"
* tag 'rtc-6.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: sunplus: fix format string for printing resource
dt-bindings: rtc: qcom-pm8xxx: allow 'wakeup-source' property
rtc: efi: Enable SET/GET WAKEUP services as optional
Pull Kbuild fixes from Masahiro Yamada:
- Fix two bugs (for building and for signing) when MODULE_SIG_KEY
contains a PKCS#11 URI
* tag 'kbuild-fixes-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: modinst: Fix build error when CONFIG_MODULE_SIG_KEY is a PKCS#11 URI
certs: Fix build error when PKCS#11 URI contains semicolon
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Yet another fix for non-CPU accesses to the memory backing the
VGICv3 subsystem
- A set of fixes for the setlftest checking for the S1PTW behaviour
after the fix that went in ealier in the cycle"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: aarch64: Test read-only PT memory regions
KVM: selftests: aarch64: Fix check of dirty log PT write
KVM: selftests: aarch64: Do not default to dirty PTE pages on all S1PTWs
KVM: selftests: aarch64: Relax userfaultfd read vs. write checks
KVM: arm64: Allow no running vcpu on saving vgic3 pending table
KVM: arm64: Allow no running vcpu on restoring vgic3 LPI pending status
KVM: arm64: Add helper vgic_write_guest_lock()
Pull parisc architecture fixes from Helge Deller:
- Fix PTRACE_GETREGS/PTRACE_SETREGS for 32-bit userspace on a 64-bit
kernel
- pdc_iodc_print() dropped chars for newline in strings
- Drop constants in favour of PRIV_USER
- use safer strscpy() function in pdc_stable driver
* tag 'parisc-for-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Wire up PTRACE_GETREGS/PTRACE_SETREGS for compat case
parisc: Replace hardcoded value with PRIV_USER constant in ptrace.c
parisc: Fix return code of pdc_iodc_print()
parisc: pdc_stable: use strscpy() to instead of strncpy()
Pull OpenRISC mailing list update from Stafford Horne:
"The old mailing list for OpenRISC died due to some infrastructure
issues and the people in charge decided not to keep it running. We
have migrated this and the users over to kernel.org infrastructure.
Sending this out now to avoid kernel developers getting lots of
bounced mails for using the old list"
* tag 'for-linus' of https://github.com/openrisc/linux:
MAINTAINERS: Update OpenRISC mailing list
KVM/arm64 fixes for 6.2, take #3
- Yet another fix for non-CPU accesses to the memory backing
the VGICv3 subsystem
- A set of fixes for the setlftest checking for the S1PTW
behaviour after the fix that went in ealier in the cycle
In one version of the HW there is a remote possibility that it
will miss the doorbell ring. This adds a bit of protection to
be sure we don't stall a queue from a missed doorbell.
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure the q+cq alloc for NotifyQ is clearly documented
and don't bother with unnecessary local variables.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clear the interrupt credits before enabling the queue rather
than after to be sure that the enabled queue starts at 0 and
that we don't wipe away possible credits after enabling the
queue.
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Neel Patel <neel.patel@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull block fixes from Jens Axboe:
"A bit bigger than I'd like at this point, but mostly a bunch of little
fixes. In detail:
- NVMe pull request via Christoph:
- Fix a missing queue put in nvmet_fc_ls_create_association
(Amit Engel)
- Clear queue pointers on tag_set initialization failure
(Maurizio Lombardi)
- Use workqueue dedicated to authentication (Shin'ichiro
Kawasaki)
- Fix for an overflow in ublk (Liu)
- Fix for leaking a queue reference in block cgroups (Ming)
- Fix for a use-after-free in BFQ (Yu)"
* tag 'block-6.2-2023-02-03' of git://git.kernel.dk/linux:
blk-cgroup: don't update io stat for root cgroup
nvme-auth: use workqueue dedicated to authentication
nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_set
nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_set
nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association
block: Fix the blk_mq_destroy_queue() documentation
block: ublk: extending queue_size to fix overflow
block, bfq: fix uaf for bfqq in bic_set_bfqq()
Pull ceph fix from Ilya Dryomov:
"A safeguard to prevent the kernel client from further damaging the
filesystem after running into a case of an invalid snap trace.
The root cause of this metadata corruption is still being investigated
but it appears to be stemming from the MDS. As such, this is the best
we can do for now"
* tag 'ceph-for-6.2-rc7' of https://github.com/ceph/ceph-client:
ceph: blocklist the kclient when receiving corrupted snap trace
ceph: move mount state enum to super.h
Pull RISC-V fixes from Palmer Dabbelt:
- A build fix to avoid static branches in cpu_relax(), which greatly
inflates the jump tables and breaks at least
CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
- A fix for a kernel panic when probing impossible instruction
positions.
- A fix to disable unwind tables, which are enabled by default for
GCC-13 and result in unhandled relocations in modules.
* tag 'riscv-for-linus-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: disable generation of unwind tables
riscv: kprobe: Fixup kernel panic when probing an illegal position
riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y
Pull drm fixes from Dave Airlie:
"A few more fixes this week, a bit more spread out though.
We have a bunch of nouveau regression and stabilisation fixes, along
with usual amdgpu, and i915. Otherwise just some minor misc ones:
dma-fence:
- fix signaling bit for private fences
panel:
- boe-tv101wum-nl6 disable fix
nouveau:
- gm20b acr regression fix
- tu102 scrub status fix
- tu102 wait for firmware fix
i915:
- Fixes for potential use-after-free and double-free
- GuC locking and refcount fixes
- Display's reference clock value fix
amdgpu:
- GC11 fixes
- DCN 3.1.4 fixes
- NBIO 4.3 fix
- DCN 3.2 fixes
- Properly handle additional cases where DCN is not supported
- SMU13 fixes
vc4:
- fix CEC adapter names
ssd130x:
- fix display init regression"
* tag 'drm-fixes-2023-02-03' of git://anongit.freedesktop.org/drm/drm: (23 commits)
drm/amd/display: Properly handle additional cases where DCN is not supported
drm/amdgpu: Enable vclk dclk node for gc11.0.3
drm/amd: Fix initialization for nbio 4.3.0
drm/amdgpu: enable HDP SD for gfx 11.0.3
drm/amd/pm: drop unneeded dpm features disablement for SMU 13.0.4/11
drm/amd/display: Reset DMUB mailbox SW state after HW reset
drm/amd/display: Unassign does_plane_fit_in_mall function from dcn3.2
drm/amd/display: Adjust downscaling limits for dcn314
drm/amd/display: Add missing brackets in calculation
drm/amdgpu: update wave data type to 3 for gfx11
drm/panel: boe-tv101wum-nl6: Ensure DSI writes succeed during disable
drm/nouveau/acr/gm20b: regression fixes
drm/nouveau/fb/tu102-: fix register used to determine scrub status
drm/nouveau/devinit/tu102-: wait for GFW_BOOT_PROGRESS == COMPLETED
drm/i915/adlp: Fix typo for reference clock
drm/i915: Fix potential bit_17 double-free
drm/i915: Fix up locking around dumping requests lists
drm/i915: Fix request ref counting during error capture & debugfs dump
drm/i915/guc: Fix locking when searching for a hung request
drm/i915: Avoid potential vm use-after-free
...
Pull misc fixes from Andrew Morton:
"25 hotfixes, mainly for MM. 13 are cc:stable"
* tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits)
mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
Kconfig.debug: fix the help description in SCHED_DEBUG
mm/swapfile: add cond_resched() in get_swap_pages()
mm: use stack_depot_early_init for kmemleak
Squashfs: fix handling and sanity checking of xattr_ids count
sh: define RUNTIME_DISCARD_EXIT
highmem: round down the address passed to kunmap_flush_on_unmap()
migrate: hugetlb: check for hugetlb shared PMD in node migration
mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
Revert "mm: kmemleak: alloc gray object for reserved region with direct map"
freevxfs: Kconfig: fix spelling
maple_tree: should get pivots boundary by type
.mailmap: update e-mail address for Eugen Hristev
mm, mremap: fix mremap() expanding for vma's with vm_ops->close()
squashfs: harden sanity check in squashfs_read_xattr_id_table
ia64: fix build error due to switch case label appearing next to declaration
mm: multi-gen LRU: fix crash during cgroup migration
Revert "mm: add nodes= arg to memory.reclaim"
zsmalloc: fix a race with deferred_handles storing
...
When iterating on a linked list, a result of memremap is dereferenced
without checking it for NULL.
This patch adds a check that falls back on allocating a new page in
case memremap doesn't succeed.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 18df7577ad ("efi/memreserve: deal with memreserve entries in unmapped memory")
Signed-off-by: Anton Gusev <aagusev@ispras.ru>
[ardb: return -ENOMEM instead of breaking out of the loop]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
According to c8sectpfe driver code we first drive reset line low and
then high to reset the port, therefore the reset line is supposed to
be annotated as "active low". This will be important when we convert
the driver to gpiod API.
Reviewed-by: Patrice Chotard <patrice.chotard@foss.st.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
When vdosys1 was initially added, it was incorrectly assumed to be
compatible with vdosys0, and thus both had the same mt8195-mmsys
compatible attached.
This has since been corrected in commit b237efd47d ("dt-bindings:
arm: mediatek: mmsys: change compatible for MT8195") and commit
82219cfbef ("dt-bindings: arm: mediatek: mmsys: add vdosys1 compatible
for MT8195"). The device tree needs to be fixed as well, otherwise
the vdosys1 block fails to work, and causes its dependent power domain
controller to not work either.
Change the compatible string of vdosys1 to "mediatek,mt8195-vdosys1".
While at it, also add the new "mediatek,mt8195-vdosys0" compatible to
vdosys0.
Fixes: 6aa5b46d17 ("arm64: dts: mt8195: Add vdosys and vppsys clock nodes")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20230202104014.2931517-1-wenst@chromium.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes to adapt to correct binding behaviour and fixes for devices on some boards
Most notably may be the adaption of lower thermal limits for the pinephone
pro, where the original hiher ones could result in (possibly permanent)
display issues.
* tag 'v6.2-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: align rk3399 DMC OPP table with bindings
arm64: dts: rockchip: set sdmmc0 speed to sd-uhs-sdr50 on rock-3a
arm64: dts: rockchip: fix probe of analog sound card on rock-3a
arm64: dts: rockchip: add missing #interrupt-cells to rk356x pcie2x1
arm64: dts: rockchip: fix input enable pinconf on rk3399
ARM: dts: rockchip: add power-domains property to dp node on rk3288
arm64: dts: rockchip: add io domain setting to rk3566-box-demo
arm64: dts: rockchip: remove unsupported property from sdmmc2 for rock-3a
arm64: dts: rockchip: drop unused LED mode property from rk3328-roc-cc
arm64: dts: rockchip: reduce thermal limits on rk3399-pinephone-pro
arm64: dts: rockchip: use correct reset names for rk3399 crypto nodes
Link: https://lore.kernel.org/r/3514663.mvXUDI8C0e@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
After calling fwnode_phy_find_device(), the phy device refcount is
incremented. Then, when the phy device is attached to a netdev with
phy_attach_direct(), the refcount is also incremented but only
decremented in the caller if phy_attach_direct() fails. Move
phy_device_free() before the "if" to always release it correctly.
Indeed, either phy_attach_direct() failed and we don't want to keep a
reference to the phydev or it succeeded and a reference has been taken
internally.
Fixes: 25396f680d ("net: phylink: introduce phylink_fwnode_phy_connect()")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running kfence_test, I found some testcases failed like this:
# test_out_of_bounds_read: EXPECTATION FAILED at mm/kfence/kfence_test.c:346
Expected report_matches(&expect) to be true, but is false
not ok 1 - test_out_of_bounds_read
The corresponding call-trace is:
BUG: KFENCE: out-of-bounds read in kunit_try_run_case+0x38/0x84
Out-of-bounds read at 0x(____ptrval____) (32B right of kfence-#10):
kunit_try_run_case+0x38/0x84
kunit_generic_run_threadfn_adapter+0x12/0x1e
kthread+0xc8/0xde
ret_from_exception+0x0/0xc
The kfence_test using the first frame of call trace to check whether the
testcase is succeed or not. Commit 6a00ef4493 ("riscv: eliminate
unreliable __builtin_frame_address(1)") skip first frame for all
case, which results the kfence_test failed. Indeed, we only need to skip
the first frame for case (task==NULL || task==current).
With this patch, the call-trace will be:
BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0x88/0x19e
Out-of-bounds read at 0x(____ptrval____) (1B left of kfence-#7):
test_out_of_bounds_read+0x88/0x19e
kunit_try_run_case+0x38/0x84
kunit_generic_run_threadfn_adapter+0x12/0x1e
kthread+0xc8/0xde
ret_from_exception+0x0/0xc
Fixes: 6a00ef4493 ("riscv: eliminate unreliable __builtin_frame_address(1)")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Tested-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221207025038.1022045-1-liushixin2@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull libata fix from Damien Le Moal:
"Fix device probe issues with some combination of adapters & devices
that do not report a current link speed, leading to device probe
failures if a link speed was not previously reported and saved (me)"
* tag 'ata-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata: Fix sata_down_spd_limit() when no link speed is reported
Commit 7a8b64d17e ("of/address: use range parser for of_dma_get_range")
converted the parsing of dma-range properties to use code shared with the
PCI range parser. The intent was to introduce no functional changes however
in the case where we fail to translate the first resource instead of
returning -EINVAL the new code we return 0. Restore the previous behaviour
by returning an error if we find no valid ranges, the original code only
handled the first range but subsequently support for parsing all supplied
ranges was added.
This avoids confusing code using the parsed ranges which doesn't expect to
successfully parse ranges but have only a list terminator returned, this
fixes breakage with so far as I can tell all DMA for on SoC devices on the
Socionext Synquacer platform which has a firmware supplied DT. A bisect
identified the original conversion as triggering the issues there.
Fixes: 7a8b64d17e ("of/address: use range parser for of_dma_get_range")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Luca Di Stefano <luca.distefano@linaro.org>
Cc: 993612@bugs.debian.org
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230126-synquacer-boot-v2-1-cb80fd23c4e2@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and netfilter.
Current release - regressions:
- phy: fix null-deref in phy_attach_direct
- mac802154: fix possible double free upon parsing error
Previous releases - regressions:
- bpf: preserve reg parent/live fields when copying range info,
prevent mis-verification of programs as safe
- ip6: fix GRE tunnels not generating IPv6 link local addresses
- phy: dp83822: fix null-deref on DP83825/DP83826 devices
- sctp: do not check hb_timer.expires when resetting hb_timer
- eth: mtk_sock: fix SGMII configuration after phylink conversion
Previous releases - always broken:
- eth: xdp: execute xdp_do_flush() before napi_complete_done()
- skb: do not mix page pool and page referenced frags in GRO
- bpf:
- fix a possible task gone issue with bpf_send_signal[_thread]()
- fix an off-by-one bug in bpf_mem_cache_idx() to select the right
cache
- add missing btf_put to register_btf_id_dtor_kfuncs
- sockmap: fon't let sock_map_{close,destroy,unhash} call itself
- gso: fix null-deref in skb_segment_list()
- mctp: purge receive queues on sk destruction
- fix UaF caused by accept on already connected socket in exotic
socket families
- tls: don't treat list head as an entry in tls_is_tx_ready()
- netfilter: br_netfilter: disable sabotage_in hook after first
suppression
- wwan: t7xx: fix runtime PM implementation
Misc:
- MAINTAINERS: spring cleanup of networking maintainers"
* tag 'net-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
mtk_sgmii: enable PCS polling to allow SFP work
net: mediatek: sgmii: fix duplex configuration
net: mediatek: sgmii: ensure the SGMII PHY is powered down on configuration
MAINTAINERS: update SCTP maintainers
MAINTAINERS: ipv6: retire Hideaki Yoshifuji
mailmap: add John Crispin's entry
MAINTAINERS: bonding: move Veaceslav Falico to CREDITS
net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC
virtio-net: Keep stop() to follow mirror sequence of open()
selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
can: isotp: split tx timer into transmission and timeout
can: isotp: handle wait_event_interruptible() return values
can: raw: fix CAN FD frame transmissions over CAN XL devices
can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap()
...
poll() and select() on per_cpu trace_pipe and trace_pipe_raw do not work
since kernel 6.1-rc6. This issue is seen after the commit
42fb0a1e84 ("tracing/ring-buffer: Have
polling block on watermark").
This issue is firstly detected and reported, when testing the CXL error
events in the rasdaemon and also erified using the test application for poll()
and select().
This issue occurs for the per_cpu case, when calling the ring_buffer_poll_wait(),
in kernel/trace/ring_buffer.c, with the buffer_percent > 0 and then wait until the
percentage of pages are available. The default value set for the buffer_percent is 50
in the kernel/trace/trace.c.
As a fix, allow userspace application could set buffer_percent as 0 through
the buffer_percent_fops, so that the task will wake up as soon as data is added
to any of the specific cpu buffer.
Link: https://lore.kernel.org/linux-trace-kernel/20230202182309.742-2-shiju.jose@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mchehab@kernel.org>
Cc: <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: 42fb0a1e84 ("tracing/ring-buffer: Have polling block on watermark")
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull KUnit fixes from Shuah Khan:
"Three fixes to bugs that cause kernel crash, link error during build,
and a third to fix kunit_test_init_section_suites() extra indirection
issue"
* tag 'linux-kselftest-kunit-fixes-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: fix kunit_test_init_section_suites(...)
kunit: fix bug in KUNIT_EXPECT_MEMEQ
kunit: Export kunit_running()
Pull ARM SoC fixes from Arnd Bergmann:
"The majority of bugfixes is once more for the NXP i.MX platform,
addressing issue with i.MX8M (UART, watchdog and ethernet) as well as
imx8dxl power button and the USB modem on an imx7 board.
The reason that i.MX always shows up here is obviously not that they
are more buggy than the others, but they have the most boards and are
good about getting fixes in quickly.
The other DT fixes are for the Nuvoton wpcm450 flash controller and
the i2c mux on an ASpeed board.
Lastly, there are updates to the MAINTAINERS entries for Mediatek,
AMD/Seattle and NXP SoCs, as well as a lone code fix for error
handling in the allwinner 'rsb' bus driver"
* tag 'soc-fixes-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: dts: wpcm450: Add nuvoton,shm = <&shm> to FIU node
MAINTAINERS: Update entry for MediaTek SoC support
MAINTAINERS: amd: drop inactive Brijesh Singh
ARM: dts: imx7d-smegw01: Fix USB host over-current polarity
arm64: dts: imx8mm-verdin: Do not power down eth-phy
MAINTAINERS: match freescale ARM64 DT directory in i.MX entry
arm64: dts: imx8mm: Fix pad control for UART1_DTE_RX
ARM: dts: aspeed: Fix pca9849 compatible
arm64: dts: freescale: imx8dxl: fix sc_pwrkey's property name linux,keycode
arm64: dts: imx8m-venice: Remove incorrect 'uart-has-rtscts'
arm64: dts: imx8mm: Reinstate GPIO watchdog always-running property on eDM SBC
bus: sunxi-rsb: Fix error handling in sunxi_rsb_init()
Pull s390 fixes from Heiko Carstens:
- With CONFIG_VMAP_STACK enabled it is not possible to load the s390
specific diag288_wdt watchdog module. The reason is that a pointer to
a string is passed to an inline assembly; this string however is
located on the stack, while the instruction within the inline
assembly expects a physicial address. Fix this by copying the string
to a kmalloc'ed buffer.
- The diag288_wdt watchdog module does not indicate that it accesses
memory from an inline assembly, which it does. Add "memory" to the
clobber list to prevent the compiler from optimizing code incorrectly
away.
- Pass size of the uncompressed kernel image to __decompress() call.
Otherwise the kernel image decompressor may corrupt/overwrite an
initrd. This was reported to happen on s390 after commit 2aa14b1ab2
("zstd: import usptream v1.5.2").
* tag 's390-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/decompressor: specify __decompress() buf len to avoid overflow
watchdog: diag288_wdt: fix __diag288() inline assembly
watchdog: diag288_wdt: do not use stack buffers for hardware data
Pull x86 platform driver fixes from Hans de Goede:
"A set of AMD PMF fixes + a few other small fixes"
* tag 'platform-drivers-x86-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: touchscreen_dmi: Add Chuwi Vi8 (CWI501) DMI match
platform/x86: thinkpad_acpi: Fix thinklight LED brightness returning 255
platform/x86/amd: pmc: add CONFIG_SERIO dependency
platform/x86/amd/pmf: Ensure mutexes are initialized before use
platform/x86/amd/pmf: Fix to update SPS thermals when power supply change
platform/x86/amd/pmf: Fix to update SPS default pprof thermals
platform/x86/amd/pmf: update to auto-mode limits only after AMT event
platform/x86/amd/pmf: Add helper routine to check pprof is balanced
platform/x86/amd/pmf: Add helper routine to update SPS thermals
Bjørn Mork says:
====================
Fix mtk_eth_soc sgmii configuration.
This has been tested on a MT7986 with a Maxlinear GPY211C phy
permanently attached to the second SoC mac.
====================
Link: https://lore.kernel.org/r/20230201182331.943411-1-bjorn@mork.no
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently there is no IRQ handling (even the SGMII supports it).
Enable polling to support SFP ports.
Fixes: 14a44ab033 ("net: mtk_eth_soc: partially convert to phylink_pcs")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
[ bmork: changed "1" => "true" ]
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The logic of the duplex bit is inverted. Setting it means half
duplex, not full duplex.
Fix and rename macro to avoid confusion.
Fixes: 7e53837269 ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The code expect the PHY to be in power down which is only true after reset.
Allow changes of the SGMII parameters more than once.
Only power down when reconfiguring to avoid bouncing the link when there's
no reason to - based on code from Russell King.
There are cases when the SGMII_PHYA_PWD register contains 0x9 which
prevents SGMII from working. The SGMII still shows link but no traffic
can flow. Writing 0x0 to the PHYA_PWD register fix the issue. 0x0 was
taken from a good working state of the SGMII interface.
Fixes: 42c03844e9 ("net-next: mediatek: add support for MediaTek MT7622 SoC")
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
[ bmork: rebased and squashed into one patch ]
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
can 2023-02-02
The first patch is by Ziyang Xuan and removes a errant WARN_ON_ONCE()
in the CAN J1939 protocol.
The next 3 patches are by Oliver Hartkopp. The first 2 target the CAN
ISO-TP protocol and fix the state machine with respect to signals and
a regression found by the syzbot.
The last patch is by me an missing assignment during the ethtool ring
configuration callback.
* tag 'linux-can-fixes-for-6.2-20230202' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
can: isotp: split tx timer into transmission and timeout
can: isotp: handle wait_event_interruptible() return values
can: raw: fix CAN FD frame transmissions over CAN XL devices
can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
====================
Link: https://lore.kernel.org/r/20230202094135.2293939-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski says:
====================
MAINTAINERS: spring refresh of networking maintainers
Use Jon Corbet's script for generating statistics about maintainer
coverage to identify inactive maintainers of relatively active code.
Move them to CREDITS.
====================
Link: https://lore.kernel.org/r/20230201182014.2362044-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We very rarely hear from Hideaki Yoshifuji and the IPv4/IPv6
entry covers a lot of code. Asking people to CC someone who
rarely responds feels wrong.
Note that Hideaki Yoshifuji already has an entry in CREDITS
for IPv6 so not adding another one.
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cited commit in fixes tag frees rxq xdp info while RQ NAPI is
still enabled and packet processing may be ongoing.
Follow the mirror sequence of open() in the stop() callback.
This ensures that when rxq info is unregistered, no rx
packet processing is ongoing.
Fixes: 754b8a21a9 ("virtio_net: setup xdp_rxq_info")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20230202163516.12559-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pul NVMe fixes from Christoph:
"nvme fixes for Linux 6.2
- fix a missing queue put in nvmet_fc_ls_create_association (Amit Engel)
- clear queue pointers on tag_set initialization failure
(Maurizio Lombardi)
- use workqueue dedicated to authentication (Shin'ichiro Kawasaki)"
* tag 'nvme-6.2-2023-02-02' of git://git.infradead.org/nvme:
nvme-auth: use workqueue dedicated to authentication
nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_set
nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_set
nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association
UEFI v2.10 introduces version 2 of the memory attributes table, which
turns the reserved field into a flags field, but is compatible with
version 1 in all other respects. So let's not complain about version 2
if we encounter it.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
As interrupts are Level-triggered,unless and until we deassert the register
the interrupts are generated which causes spurious interrupts unhandled.
Now we deasserted the interrupt at top half which solved the below
"nobody cared" warning.
warning reported in dmesg:
irq 80: nobody cared (try booting with the "irqpoll" option)
CPU: 5 PID: 2735 Comm: irq/80-AudioDSP
Not tainted 5.15.86-15817-g4c19f3e06d49 #1 1bd3fd932cf58caacc95b0504d6ea1e3eab22289
Hardware name: Google Skyrim/Skyrim, BIOS Google_Skyrim.15303.0.0 01/03/2023
Call Trace:
<IRQ>
dump_stack_lvl+0x69/0x97
__report_bad_irq+0x3a/0xae
note_interrupt+0x1a9/0x1e3
handle_irq_event_percpu+0x4b/0x6e
handle_irq_event+0x36/0x5b
handle_fasteoi_irq+0xae/0x171
__common_interrupt+0x48/0xc4
</IRQ>
handlers:
acp_irq_handler [snd_sof_amd_acp] threaded [<000000007e089f34>] acp_irq_thread [snd_sof_amd_acp]
Disabling IRQ #80
Signed-off-by: V sujith kumar Reddy <Vsujithkumar.Reddy@amd.com>
Link: https://lore.kernel.org/r/20230203123254.1898794-1-Vsujithkumar.Reddy@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When received corrupted snap trace we don't know what exactly has
happened in MDS side. And we shouldn't continue IOs and metadatas
access to MDS, which may corrupt or get incorrect contents.
This patch will just block all the further IO/MDS requests
immediately and then evict the kclient itself.
The reason why we still need to evict the kclient just after
blocking all the further IOs is that the MDS could revoke the caps
faster.
Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
These flags are only used in ceph filesystem in fs/ceph, so just
move it to the place it should be.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The test tool can check that the zerocopy number of completions value is
valid taking into consideration the number of datagram send calls. This can
catch the system into a state where the datagrams are still in the system
(for example in a qdisk, waiting for the network interface to return a
completion notification, etc).
This change adds a retry logic of computing the number of completions up to
a configurable (via CLI) timeout (default: 2 seconds).
Fixes: 79ebc3c260 ("net/udpgso_bench_tx: options to exercise TX CMSG")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-4-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
"udpgro_bench.sh" invokes udpgso_bench_rx/udpgso_bench_tx programs
subsequently and while doing so, there is a chance that the rx one is not
ready to accept socket connections. This racing bug could fail the test
with at least one of the following:
./udpgso_bench_tx: connect: Connection refused
./udpgso_bench_tx: sendmsg: Connection refused
./udpgso_bench_tx: write: Connection refused
This change addresses this by making udpgro_bench.sh wait for the rx
program to be ready before firing off the tx one - up to a 10s timeout.
Fixes: 3a687bef14 ("selftests: udp gso benchmark")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-3-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This change fixes the following compiler warning:
/usr/include/x86_64-linux-gnu/bits/error.h:40:5: warning: ‘gso_size’ may
be used uninitialized [-Wmaybe-uninitialized]
40 | __error_noreturn (__status, __errnum, __format,
__va_arg_pack ());
|
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
udpgso_bench_rx.c: In function ‘main’:
udpgso_bench_rx.c:253:23: note: ‘gso_size’ was declared here
253 | int ret, len, gso_size, budget = 256;
Fixes: 3327a9c463 ("selftests: add functionals test for UDP GRO")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-1-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 2dc0b46b5e ("libata: sata_down_spd_limit should return if
driver has not recorded sstatus speed") changed the behavior of
sata_down_spd_limit() to return doing nothing if a drive does not report
a current link speed, to avoid reducing the link speed to the lowest 1.5
Gbps speed.
However, the change assumed that a speed was recorded before probing
(e.g. before a suspend/resume) and set in link->sata_spd. This causes
problems with adapters/drives combination failing to establish a link
speed during probe autonegotiation. One example reported of this problem
is an mvebu adapter with a 3Gbps port-multiplier box: autonegotiation
fails, leaving no recorded link speed and no reported current link
speed. Probe retries also fail as no action is taken by sata_set_spd()
after each retry.
Fix this by returning early in sata_down_spd_limit() only if we do have
a recorded link speed, that is, if link->sata_spd is not 0. With this
fix, a failed probe not leading to a recorded link speed is retried at
the lower 1.5 Gbps speed, with the link speed potentially increased
later on the second revalidate of the device if the device reports
that it supports higher link speeds.
Reported-by: Marius Dinu <marius@psihoexpert.ro>
Fixes: 2dc0b46b5e ("libata: sata_down_spd_limit should return if driver has not recorded sstatus speed")
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Marius Dinu <marius@psihoexpert.ro>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Commit 41b7a347bf ("powerpc: Book3S 64-bit outline-only KASAN
support") added a select of ARCH_WANTS_NO_INSTR, because it also added
some uses of noinstr. However noinstr is always defined, regardless of
ARCH_WANTS_NO_INSTR, so there's no need to select it just for that.
As PeterZ says [1]:
Note that by selecting ARCH_WANTS_NO_INSTR you effectively state to
abide by its rules.
As of now the powerpc code does not abide by those rules, and trips some
new warnings added by Peter in linux-next.
So until the code can be fixed to avoid those warnings, disable
ARCH_WANTS_NO_INSTR.
Note that ARCH_WANTS_NO_INSTR is also used to gate building KCOV and
parts of KCSAN. However none of the noinstr annotations in powerpc were
added for KCOV or KCSAN, instead instrumentation is blocked at the file
level using KCOV_INSTRUMENT_foo.o := n.
[1]: https://lore.kernel.org/linuxppc-dev/Y9t6yoafrO5YqVgM@hirez.programming.kicks-ass.net
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The timer for the transmission of isotp PDUs formerly had two functions:
1. send two consecutive frames with a given time gap
2. monitor the timeouts for flow control frames and the echo frames
This led to larger txstate checks and potentially to a problem discovered
by syzbot which enabled the panic_on_warn feature while testing.
The former 'txtimer' function is split into 'txfrtimer' and 'txtimer'
to handle the two above functionalities with separate timer callbacks.
The two simplified timers now run in one-shot mode and make the state
transitions (especially with isotp_rcv_echo) better understandable.
Fixes: 866337865f ("can: isotp: fix tx state handling for echo tx processing")
Reported-by: syzbot+5aed6c3aaba661f5b917@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org # >= v6.0
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230104145701.2422-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The conclusion "j1939_session_deactivate() should be called with a
session ref-count of at least 2" is incorrect. In some concurrent
scenarios, j1939_session_deactivate can be called with the session
ref-count less than 2. But there is not any problem because it
will check the session active state before session putting in
j1939_session_deactivate_locked().
Here is the concurrent scenario of the problem reported by syzbot
and my reproduction log.
cpu0 cpu1
j1939_xtp_rx_eoma
j1939_xtp_rx_abort_one
j1939_session_get_by_addr [kref == 2]
j1939_session_get_by_addr [kref == 3]
j1939_session_deactivate [kref == 2]
j1939_session_put [kref == 1]
j1939_session_completed
j1939_session_deactivate
WARN_ON_ONCE(kref < 2)
=====================================================
WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70
CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
RIP: 0010:j1939_session_deactivate+0x5f/0x70
Call Trace:
j1939_session_deactivate_activate_next+0x11/0x28
j1939_xtp_rx_eoma+0x12a/0x180
j1939_tp_recv+0x4a2/0x510
j1939_can_recv+0x226/0x380
can_rcv_filter+0xf8/0x220
can_receive+0x102/0x220
? process_backlog+0xf0/0x2c0
can_rcv+0x53/0xf0
__netif_receive_skb_one_core+0x67/0x90
? process_backlog+0x97/0x2c0
__netif_receive_skb+0x22/0x80
Fixes: 0c71437dd5 ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object")
Reported-by: syzbot+9981a614060dcee6eeca@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20210906094200.95868-1-william.xuanziyang@huawei.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Before the commit fc274c1e99 ("USB: gadget: Add a new bus for gadgets")
gadget driver.bus was unused. For whatever reason, many UDC drivers set
this field explicitly to NULL in udc_start(). With the newly added gadget
bus, doing this will crash the driver during the attach.
The problem was first reported, fixed and tested with OMAP UDC and g_ether.
Other drivers are changed based on code analysis only.
Fixes: fc274c1e99 ("USB: gadget: Add a new bus for gadgets")
Cc: stable <stable@kernel.org>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20230201220125.GD2415@darkstar.musicnaut.iki.fi
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
netvsc_dma_map() and netvsc_dma_unmap() currently check the cp_partial
flag and adjust the page_count so that pagebuf entries for the RNDIS
portion of the message are skipped when it has already been copied into
a send buffer. But this adjustment has already been made by code in
netvsc_send(). The duplicate adjustment causes some pagebuf entries to
not be mapped. In a normal VM, this doesn't break anything because the
mapping doesn’t change the PFN. But in a Confidential VM,
dma_map_single() does bounce buffering and provides a different PFN.
Failing to do the mapping causes the wrong PFN to be passed to Hyper-V,
and various errors ensue.
Fix this by removing the duplicate adjustment in netvsc_dma_map() and
netvsc_dma_unmap().
Fixes: 846da38de0 ("net: netvsc: Add Isolation VM support for netvsc driver")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1675135986-254490-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When THP is enabled, 4K pages are collapsed into a single huge
page using the generic pmdp_collapse_flush() which will further
use flush_tlb_range() to shoot-down stale TLB entries. Unfortunately,
the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
using address specific SFENCEs which results in repetitive (or
unpredictable) page faults on RISC-V implementations which cache
non-leaf PTEs.
Provide a RISC-V specific pmdp_collapse_flush() which ensures both
cached leaf and non-leaf PTEs are invalidated by using non-address
specific SFENCEs as recommended by the RISC-V privileged specification.
Fixes: e88b333142 ("riscv: mm: add THP support on 64-bit")
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Link: https://lore.kernel.org/r/20230130074815.1694055-1-mchitale@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
GCC 13 will enable -fasynchronous-unwind-tables by default on riscv. In
the kernel, we don't have any use for unwind tables yet, so disable them.
More importantly, the .eh_frame section brings relocations
(R_RISC_32_PCREL, R_RISCV_SET{6,8,16}, R_RISCV_SUB{6,8,16}) into modules
that we are not prepared to handle.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Link: https://lore.kernel.org/r/mvmzg9xybqu.fsf@suse.de
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The kernel would panic when probed for an illegal position. eg:
(CONFIG_RISCV_ISA_C=n)
echo 'p:hello kernel_clone+0x16 a0=%a0' >> kprobe_events
echo 1 > events/kprobes/hello/enable
cat trace
Kernel panic - not syncing: stack-protector: Kernel stack
is corrupted in: __do_sys_newfstatat+0xb8/0xb8
CPU: 0 PID: 111 Comm: sh Not tainted
6.2.0-rc1-00027-g2d398fe49a4d #490
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80007268>] dump_backtrace+0x38/0x48
[<ffffffff80c5e83c>] show_stack+0x50/0x68
[<ffffffff80c6da28>] dump_stack_lvl+0x60/0x84
[<ffffffff80c6da6c>] dump_stack+0x20/0x30
[<ffffffff80c5ecf4>] panic+0x160/0x374
[<ffffffff80c6db94>] generic_handle_arch_irq+0x0/0xa8
[<ffffffff802deeb0>] sys_newstat+0x0/0x30
[<ffffffff800158c0>] sys_clone+0x20/0x30
[<ffffffff800039e8>] ret_from_syscall+0x0/0x4
---[ end Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: __do_sys_newfstatat+0xb8/0xb8 ]---
That is because the kprobe's ebreak instruction broke the kernel's
original code. The user should guarantee the correction of the probe
position, but it couldn't make the kernel panic.
This patch adds arch_check_kprobe in arch_prepare_kprobe to prevent an
illegal position (Such as the middle of an instruction).
Fixes: c22b0bcb1d ("riscv: Add kprobes supported")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230201040604.3390509-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Thomas Winter says:
====================
ip/ip6_gre: Fix GRE tunnels not generating IPv6 link local addresses
For our point-to-point GRE tunnels, they have IN6_ADDR_GEN_MODE_NONE
when they are created then we set IN6_ADDR_GEN_MODE_EUI64 when they
come up to generate the IPv6 link local address for the interface.
Recently we found that they were no longer generating IPv6 addresses.
Also, non-point-to-point tunnels were not generating any IPv6 link
local address and instead generating an IPv6 compat address,
breaking IPv6 communication on the tunnel.
These failures were caused by commit e5dd729460 and this patch set
aims to resolve these issues.
====================
Link: https://lore.kernel.org/r/20230131034646.237671-1-Thomas.Winter@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We recently found that our non-point-to-point tunnels were not
generating any IPv6 link local address and instead generating an
IPv6 compat address, breaking IPv6 communication on the tunnel.
Previously, addrconf_gre_config always would call addrconf_addr_gen
and generate a EUI64 link local address for the tunnel.
Then commit e5dd729460 changed the code path so that add_v4_addrs
is called but this only generates a compat IPv6 address for
non-point-to-point tunnels.
I assume the compat address is specifically for SIT tunnels so
have kept that only for SIT - GRE tunnels now always generate link
local addresses.
Fixes: e5dd729460 ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address")
Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For our point-to-point GRE tunnels, they have IN6_ADDR_GEN_MODE_NONE
when they are created then we set IN6_ADDR_GEN_MODE_EUI64 when they
come up to generate the IPv6 link local address for the interface.
Recently we found that they were no longer generating IPv6 addresses.
This issue would also have affected SIT tunnels.
Commit e5dd729460 changed the code path so that GRE tunnels
generate an IPv6 address based on the tunnel source address.
It also changed the code path so GRE tunnels don't call addrconf_addr_gen
in addrconf_dev_config which is called by addrconf_sysctl_addr_gen_mode
when the IN6_ADDR_GEN_MODE is changed.
This patch aims to fix this issue by moving the code in addrconf_notify
which calls the addr gen for GRE and SIT into a separate function
and calling it in the places that expect the IPv6 address to be
generated.
The previous addrconf_dev_config is renamed to addrconf_eth_config
since it only expected eth type interfaces and follows the
addrconf_gre/sit_config format.
A part of this changes means that the loopback address will be
attempted to be configured when changing addr_gen_mode for lo.
This should not be a problem because the address should exist anyway
and if does already exist then no error is produced.
Fixes: e5dd729460 ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address")
Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There could be boards with DCN listed in IP discovery, but no
display hardware actually wired up. In this case the vbios
display table will not be populated. Detect this case and
skip loading DM when we detect it.
v2: Mark DCN as harvested as well so other display checks
elsewhere in the driver are handled properly.
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Otherwise we can be out of sync with what's in the hardware, leading
to us rerunning every command that's presently in the ringbuffer.
[How]
Reset software state for the mailboxes in hw_reset callback.
This is already done as part of the mailbox init in hw_init, but we
do need to remember to reset the last cached wptr value as well here.
Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The hwss function does_plane_fit_in_mall not applicable to dcn3.2 asics.
Using it with dcn3.2 can result in undefined behaviour.
[How]
Assign the function pointer to NULL.
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Lower max_downscale_ratio and ARGB888 downscale factor
to prevent cases where underflow may occur on dcn314
[How]
Set max_downscale_ratio to 400 and ARGB downscale factor
to 250 for dcn314
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Daniel Miess <Daniel.Miess@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We source root cgroup stats from the system-wide stats, see blkcg_print_stat
and blkcg_rstat_flush, so don't update io state for root cgroup.
Fixes blkg leak issue introduced in commit 3b8cc62987 ("blk-cgroup: Optimize blkcg_rstat_flush()")
which starts to grab blkg's reference when adding iostat_cpu into percpu
blkcg list, but this state won't be consumed by blkcg_rstat_flush() where
the blkg reference is dropped.
Tested-by: Bart van Assche <bvanassche@acm.org>
Reported-by: Bart van Assche <bvanassche@acm.org>
Fixes: 3b8cc62987 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Cc: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230202021804.278582-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit baf1ed24b2 ("powerpc/mm: Remove empty hash__ functions")
removed some empty hash MMU flushing routines, but got a bit overeager
and also removed the call to hash__tlb_flush() from tlb_flush().
In regular use this doesn't lead to any noticable breakage, which is a
little concerning. Presumably there are flushes happening via other
paths such as arch_leave_lazy_mmu_mode(), and/or a bit of luck.
Fix it by reinstating the call to hash__tlb_flush().
Fixes: baf1ed24b2 ("powerpc/mm: Remove empty hash__ functions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230131111407.806770-1-mpe@ellerman.id.au
Prefer usage of the PRIV_USER constant over the hard-coded value to set
the lowest 2 bits for the userspace privilege.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 5.16+
Pull virtio fixes from Michael Tsirkin:
"Just small bugfixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa: ifcvf: Do proper cleanup if IFCVF init fails
vhost-scsi: unbreak any layout for response
tools/virtio: fix the vringh test for virtio ring changes
vhost/net: Clear the pending messages when the backend is removed
Pull sound fixes from Takashi Iwai:
"A bit higher volume of changes than wished, but each change is
relatively small and the fix targets are mostly device-specific, so
those should be safe as a late stage merge.
The most significant LoC is about the memalloc helper fix, which is
applied only to Xen PV. The other major parts are ASoC Intel SOF and
AVS fixes that are scattered as various small code changes. The rest
are device-specific fixes and quirks for HD- and USB-audio, FireWire
and ASoC AMD / HDMI"
* tag 'sound-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits)
ALSA: firewire-motu: fix unreleased lock warning in hwdep device
ALSA: memalloc: Workaround for Xen PV
ASoC: cs42l56: fix DT probe
ASoC: codecs: wsa883x: correct playback min/max rates
ALSA: hda/realtek: Add Acer Predator PH315-54
ASoC: amd: yc: Add Xiaomi Redmi Book Pro 15 2022 into DMI table
ALSA: hda: Do not unset preset when cleaning up codec
ASoC: SOF: sof-audio: prepare_widgets: Check swidget for NULL on sink failure
ASoC: hdmi-codec: zero clear HDMI pdata
ASoC: SOF: ipc4-mtrace: prevent underflow in sof_ipc4_priority_mask_dfs_write()
ASoC: Intel: sof_ssp_amp: always set dpcm_capture for amplifiers
ASoC: Intel: sof_nau8825: always set dpcm_capture for amplifiers
ASoC: Intel: sof_cs42l42: always set dpcm_capture for amplifiers
ASoC: Intel: sof_rt5682: always set dpcm_capture for amplifiers
ALSA: hda/via: Avoid potential array out-of-bound in add_secret_dac_path()
ALSA: usb-audio: Add FIXED_RATE quirk for JBL Quantum610 Wireless
ALSA: hda/realtek: fix mute/micmute LEDs, speaker don't work for a HP platform
ASoC: SOF: keep prepare/unprepare widgets in sink path
ASoC: SOF: sof-audio: skip prepare/unprepare if swidget is NULL
ASoC: SOF: sof-audio: unprepare when swidget->use_count > 0
...
NVMe In-Band authentication uses two kinds of works: chap->auth_work and
ctrl->dhchap_auth_work. The latter work flushes or cancels the former
work. However, the both works are queued to the same workqueue nvme-wq.
It results in the lockdep WARNING as follows:
WARNING: possible recursive locking detected
6.2.0-rc4+ #1 Not tainted
--------------------------------------------
kworker/u16:7/69 is trying to acquire lock:
ffff902d52e65548 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: start_flush_work+0x2c5/0x380
but task is already holding lock:
ffff902d52e65548 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x210/0x410
To avoid the WARNING, introduce a new workqueue nvme-auth-wq dedicated
to chap->auth_work.
Reported-by: Daniel Wagner <dwagner@suse.de>
Link: https://lore.kernel.org/linux-nvme/20230130110802.paafkiipmitwtnwr@carbon.lan/
Fixes: f50fff73d6 ("nvme: implement In-Band authentication")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In nvme_alloc_io_tag_set(), the connect_q pointer should be set to NULL
in case of error to avoid potential invalid pointer dereferences.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If nvme_alloc_admin_tag_set() fails, the admin_q and fabrics_q pointers
are left with an invalid, non-NULL value. Other functions may then check
the pointers and dereference them, e.g. in
nvme_probe() -> out_disable: -> nvme_dev_remove_admin().
Fix the bug by setting admin_q and fabrics_q to NULL in case of error.
Also use the set variable to free the tag_set as ctrl->admin_tagset isn't
initialized yet.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
As part of nvmet_fc_ls_create_association there is a case where
nvmet_fc_alloc_target_queue fails right after a new association with an
admin queue is created. In this case, no one releases the get taken in
nvmet_fc_alloc_target_assoc. This fix is adding the missing put.
Signed-off-by: Amit Engel <Amit.Engel@dell.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This loop accidentally reuses the "i" iterator for both the inside and
the outside loop. The value of MAX_STREAM_BUFFER is 5. I believe that
chip->rmh.stat_len is in the 2-12 range. If the value of .stat_len is
4 or more then it will loop exactly one time, but if it's less then it
is a forever loop.
It looks like it was supposed to combined into one loop where
conditions are checked.
Fixes: 8e6320064c ("ALSA: lx_core: Remove useless #if 0 .. #endif")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/Y9jnJTis/mRFJAQp@kili
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The unprepare sequence has started to fail after moving to panel bridge
code in the msm drm driver (commit 007ac0262b ("drm/msm/dsi: switch to
DRM_PANEL_BRIDGE")). You'll see messages like this in the kernel logs:
panel-boe-tv101wum-nl6 ae94000.dsi.0: failed to set panel off: -22
This is because boe_panel_enter_sleep_mode() needs an operating DSI link
to set the panel into sleep mode. Performing those writes in the
unprepare phase of bridge ops is too late, because the link has already
been torn down by the DSI controller in post_disable, i.e. the PHY has
been disabled, etc. See dsi_mgr_bridge_post_disable() for more details
on the DSI .
Split the unprepare function into a disable part and an unprepare part.
For now, just the DSI writes to enter sleep mode are put in the disable
function. This fixes the panel off routine and keeps the panel happy.
My Wormdingler has an integrated touchscreen that stops responding to
touch if the panel is only half disabled too. This patch fixes it. And
finally, this saves power when the screen is off because without this
fix the regulators for the panel are left enabled when nothing is being
displayed on the screen.
Fixes: 007ac0262b ("drm/msm/dsi: switch to DRM_PANEL_BRIDGE")
Fixes: a869b9db7a ("drm/panel: support for boe tv101wum-nl6 wuxga dsi video mode panel")
Cc: yangcong <yangcong5@huaqin.corp-partner.google.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Jitao Shi <jitao.shi@mediatek.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230106030108.2542081-1-swboyd@chromium.org
(cherry picked from commit c913cd5489)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
commit 8eb060e101 ("arch/riscv: add Zihintpause support") broke
building with CONFIG_CC_OPTIMIZE_FOR_SIZE enabled (gcc 11.1.0):
CC arch/riscv/kernel/vdso/vgettimeofday.o
In file included from <command-line>:
./arch/riscv/include/asm/jump_label.h: In function 'cpu_relax':
././include/linux/compiler_types.h:285:33: warning: 'asm' operand 0 probably does not match constraints
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto'
41 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
././include/linux/compiler_types.h:285:33: error: impossible constraint in 'asm'
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto'
41 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
make[1]: *** [scripts/Makefile.build:249: arch/riscv/kernel/vdso/vgettimeofday.o] Error 1
make: *** [arch/riscv/Makefile:128: vdso_prepare] Error 2
Having a static branch in cpu_relax() is problematic because that
function is widely inlined, including in some quite complex functions
like in the VDSO. A quick measurement shows this static branch is
responsible by itself for around 40% of the jump table.
Drop the static branch, which ends up being the same number of
instructions anyway. If Zihintpause is supported, we trade the nop from
the static branch for a div. If Zihintpause is unsupported, we trade the
jump from the static branch for (what gets interpreted as) a nop.
Fixes: 8eb060e101 ("arch/riscv: add Zihintpause support")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Release bridge info once packet escapes the br_netfilter path,
from Florian Westphal.
2) Revert incorrect fix for the SCTP connection tracking chunk
iterator, also from Florian.
First path fixes a long standing issue, the second path addresses
a mistake in the previous pull request for net.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
Revert "netfilter: conntrack: fix bug in for_each_sctp_chunk"
netfilter: br_netfilter: disable sabotage_in hook after first suppression
====================
Link: https://lore.kernel.org/r/20230131133158.4052-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Meson G12A Internal PHY does not support standard IEEE MMD extended
register access, therefore add generic dummy stubs to fail the read and
write MMD calls. This is necessary to prevent the core PHY code from
erroneously believing that EEE is supported by this PHY even though this
PHY does not support EEE, as MMD register access returns all FFFFs.
Fixes: 5c3407abb3 ("net: phy: meson-gxl: add g12a support")
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Chris Healy <healych@amazon.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20230130231402.471493-1-cphealy@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It tries to avoid the frequently hb_timer refresh in commit ba6f5e33bd
("sctp: avoid refreshing heartbeat timer too often"), and it only allows
mod_timer when the new expires is after hb_timer.expires. It means even
a much shorter interval for hb timer gets applied, it will have to wait
until the current hb timer to time out.
In sctp_do_8_2_transport_strike(), when a transport enters PF state, it
expects to update the hb timer to resend a heartbeat every rto after
calling sctp_transport_reset_hb_timer(), which will not work as the
change mentioned above.
The frequently hb_timer refresh was caused by sctp_transport_reset_timers()
called in sctp_outq_flush() and it was already removed in the commit above.
So we don't have to check hb_timer.expires when resetting hb_timer as it is
now not called very often.
Fixes: ba6f5e33bd ("sctp: avoid refreshing heartbeat timer too often")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/d958c06985713ec84049a2d5664879802710179a.1675095933.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On Systems where online memory is lesser compared to max memory, the
kexec_file_load system call may fail to load the kdump kernel with the
below errors:
"Failed to update fdt with linux,drconf-usable-memory property"
"Error setting up usable-memory property for kdump kernel"
This happens because the size estimation for usable memory properties
for the kdump kernel's FDT is based on the online memory whereas the
usable memory properties include max memory. In short, the hot-pluggable
memory is not accounted for while estimating the size of the usable
memory properties.
The issue is addressed by calculating usable memory property size using
max hotplug address instead of the last online memory address.
Fixes: 2377c92e37 ("powerpc/kexec_file: fix FDT size estimation for kdump kernel")
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230131030615.729894-1-sourabhjain@linux.ibm.com
Mirsad report the below error which is caused by stack_depot_init()
failure in kvcalloc. Solve this by having stackdepot use
stack_depot_early_init().
On 1/4/23 17:08, Mirsad Goran Todorovac wrote:
I hate to bring bad news again, but there seems to be a problem with the output of /sys/kernel/debug/kmemleak:
[root@pc-mtodorov ~]# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff951c118568b0 (size 16):
comm "kworker/u12:2", pid 56, jiffies 4294893952 (age 4356.548s)
hex dump (first 16 bytes):
6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
backtrace:
[root@pc-mtodorov ~]#
Apparently, backtrace of called functions on the stack is no longer
printed with the list of memory leaks. This appeared on Lenovo desktop
10TX000VCR, with AlmaLinux 8.7 and BIOS version M22KT49A (11/10/2022) and
6.2-rc1 and 6.2-rc2 builds. This worked on 6.1 with the same
CONFIG_KMEMLEAK=y and MGLRU enabled on a vanilla mainstream kernel from
Mr. Torvalds' tree. I don't know if this is deliberate feature for some
reason or a bug. Please find attached the config, lshw and kmemleak
output.
[vbabka@suse.cz: remove stack_depot_init() call]
Link: https://lore.kernel.org/all/5272a819-ef74-65ff-be61-4d2d567337de@alu.unizg.hr/
Link: https://lkml.kernel.org/r/1674091345-14799-2-git-send-email-zhaoyang.huang@unisoc.com
Fixes: 56a61617dd ("mm: use stack_depot for recording kmemleak's backtrace")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: ke.wang <ke.wang@unisoc.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A Sysbot [1] corrupted filesystem exposes two flaws in the handling and
sanity checking of the xattr_ids count in the filesystem. Both of these
flaws cause computation overflow due to incorrect typing.
In the corrupted filesystem the xattr_ids value is 4294967071, which
stored in a signed variable becomes the negative number -225.
Flaw 1 (64-bit systems only):
The signed integer xattr_ids variable causes sign extension.
This causes variable overflow in the SQUASHFS_XATTR_*(A) macros. The
variable is first multiplied by sizeof(struct squashfs_xattr_id) where the
type of the sizeof operator is "unsigned long".
On a 64-bit system this is 64-bits in size, and causes the negative number
to be sign extended and widened to 64-bits and then become unsigned. This
produces the very large number 18446744073709548016 or 2^64 - 3600. This
number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and
divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0
(stored in len).
Flaw 2 (32-bit systems only):
On a 32-bit system the integer variable is not widened by the unsigned
long type of the sizeof operator (32-bits), and the signedness of the
variable has no effect due it always being treated as unsigned.
The above corrupted xattr_ids value of 4294967071, when multiplied
overflows and produces the number 4294963696 or 2^32 - 3400. This number
when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by
SQUASHFS_METADATA_SIZE overflows again and produces a length of 0.
The effect of the 0 length computation:
In conjunction with the corrupted xattr_ids field, the filesystem also has
a corrupted xattr_table_start value, where it matches the end of
filesystem value of 850.
This causes the following sanity check code to fail because the
incorrectly computed len of 0 matches the incorrect size of the table
reported by the superblock (0 bytes).
len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids);
/*
* The computed size of the index table (len bytes) should exactly
* match the table start and end points
*/
start = table_start + sizeof(*id_table);
end = msblk->bytes_used;
if (len != (end - start))
return ERR_PTR(-EINVAL);
Changing the xattr_ids variable to be "usigned int" fixes the flaw on a
64-bit system. This relies on the fact the computation is widened by the
unsigned long type of the sizeof operator.
Casting the variable to u64 in the above macro fixes this flaw on a 32-bit
system.
It also means 64-bit systems do not implicitly rely on the type of the
sizeof operator to widen the computation.
[1] https://lore.kernel.org/lkml/000000000000cd44f005f1a0f17f@google.com/
Link: https://lkml.kernel.org/r/20230127061842.10965-1-phillip@squashfs.org.uk
Fixes: 506220d2ba ("squashfs: add more sanity checks in xattr id lookup")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: <syzbot+082fa4af80a5bb1a9843@syzkaller.appspotmail.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Fedor Pchelkin <pchelkin@ispras.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
sh vmlinux fails to link with GNU ld < 2.40 (likely < 2.36) since
commit 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv").
This is similar to fixes for powerpc and s390:
commit 4b9880dbf3 ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT").
commit a494398bde ("s390: define RUNTIME_DISCARD_EXIT to fix link error
with GNU ld < 2.36").
$ sh4-linux-gnu-ld --version | head -n1
GNU ld (GNU Binutils for Debian) 2.35.2
$ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- microdev_defconfig
$ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu-
`.exit.text' referenced in section `__bug_table' of crypto/algboss.o:
defined in discarded section `.exit.text' of crypto/algboss.o
`.exit.text' referenced in section `__bug_table' of
drivers/char/hw_random/core.o: defined in discarded section
`.exit.text' of drivers/char/hw_random/core.o
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make[1]: *** [Makefile:1252: vmlinux] Error 2
arch/sh/kernel/vmlinux.lds.S keeps EXIT_TEXT:
/*
* .exit.text is discarded at runtime, not link time, to deal with
* references from __bug_table
*/
.exit.text : AT(ADDR(.exit.text)) { EXIT_TEXT }
However, EXIT_TEXT is thrown away by
DISCARD(include/asm-generic/vmlinux.lds.h) because
sh does not define RUNTIME_DISCARD_EXIT.
GNU ld 2.40 does not have this issue and builds fine.
This corresponds with Masahiro's comments in a494398bde:
"Nathan [Chancellor] also found that binutils
commit 21401fc7bf67 ("Duplicate output sections in scripts") cured this
issue, so we cannot reproduce it with binutils 2.36+, but it is better
to not rely on it."
Link: https://lkml.kernel.org/r/9166a8abdc0f979e50377e61780a4bba1dfa2f52.1674518464.git.tom.saeger@oracle.com
Fixes: 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv")
Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/
Link: https://lore.kernel.org/all/20230123194218.47ssfzhrpnv3xfez@oracle.com/
Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dennis Gilmore <dennis@ausil.us>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs".
This issue of mapcount in hugetlb pages referenced by shared PMDs was
discussed in [1]. The following two patches address user visible behavior
caused by this issue.
[1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/
This patch (of 2):
A hugetlb page will have a mapcount of 1 if mapped by multiple processes
via a shared PMD. This is because only the first process increases the
map count, and subsequent processes just add the shared PMD page to their
page table.
page_mapcount is being used to decide if a hugetlb page is shared or
private in /proc/PID/smaps. Pages referenced via a shared PMD were
incorrectly being counted as private.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found
count the hugetlb page as shared. A new helper to check for a shared PMD
is added.
[akpm@linux-foundation.org: simplification, per David]
[akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()]
Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com
Fixes: 25ee01a2fc ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In commit 34488399fa ("mm/madvise: add file and shmem support to
MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():
- if (!pmd_present(pmde))
- return SCAN_PMD_NULL;
+ if (pmd_none(pmde))
+ return SCAN_PMD_NONE;
This was for-use by MADV_COLLAPSE file/shmem codepaths, where
MADV_COLLAPSE might identify a pte-mapped hugepage, only to have
khugepaged race-in, free the pte table, and clear the pmd. Such codepaths
include:
A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER
already in the pagecache.
B) In retract_page_tables(), if we fail to grab mmap_lock for the target
mm/address.
In these cases, collapse_pte_mapped_thp() really does expect a none (not
just !present) pmd, and we want to suitably identify that case separate
from the case where no pmd is found, or it's a bad-pmd (of course, many
things could happen once we drop mmap_lock, and the pmd could plausibly
undergo multiple transitions due to intervening fault, split, etc).
Regardless, the code is prepared install a huge-pmd only when the existing
pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.
However, the commit introduces a logical hole; namely, that we've allowed
!none- && !huge- && !bad-pmds to be classified as genuine
pte-table-mapping-pmds. One such example that could leak through are swap
entries. The pmd values aren't checked again before use in
pte_offset_map_lock(), which is expecting nothing less than a genuine
pte-table-mapping-pmd.
We want to put back the !pmd_present() check (below the pmd_none() check),
but need to be careful to deal with subtleties in pmd transitions and
treatments by various arch.
The issue is that __split_huge_pmd_locked() temporarily clears the present
bit (or otherwise marks the entry as invalid), but pmd_present() and
pmd_trans_huge() still need to return true while the pmd is in this
transitory state. For example, x86's pmd_present() also checks the
_PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also
checks a PMD_PRESENT_INVALID bit.
Covering all 4 cases for x86 (all checks done on the same pmd value):
1) pmd_present() && pmd_trans_huge()
All we actually know here is that the PSE bit is set. Either:
a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE
is set.
=> huge-pmd
b) We are currently racing with __split_huge_page(). The danger here
is that we proceed as-if we have a huge-pmd, but really we are
looking at a pte-mapping-pmd. So, what is the risk of this
danger?
The only relevant path is:
madvise_collapse() -> collapse_pte_mapped_thp()
Where we might just incorrectly report back "success", when really
the memory isn't pmd-backed. This is fine, since split could
happen immediately after (actually) successful madvise_collapse().
So, it should be safe to just assume huge-pmd here.
2) pmd_present() && !pmd_trans_huge()
Either:
a) PSE not set and either PRESENT or PROTNONE is.
=> pte-table-mapping pmd (or PROT_NONE)
b) devmap. This routine can be called immediately after
unlocking/locking mmap_lock -- or called with no locks held (see
khugepaged_scan_mm_slot()), so previous VMA checks have since been
invalidated.
3) !pmd_present() && pmd_trans_huge()
Not possible.
4) !pmd_present() && !pmd_trans_huge()
Neither PRESENT nor PROTNONE set
=> not present
I've checked all archs that implement pmd_trans_huge() (arm64, riscv,
powerpc, longarch, x86, mips, s390) and this logic roughly translates
(though devmap treatment is unique to x86 and powerpc, and (3) doesn't
necessarily hold in general -- but that doesn't matter since
!pmd_present() always takes failure path).
Also, add a comment above find_pmd_or_thp_or_none() to help future
travelers reason about the validity of the code; namely, the possible
mutations that might happen out from under us, depending on how mmap_lock
is held (if at all).
Link: https://lkml.kernel.org/r/20230125225358.2576151-1-zokeefe@google.com
Fixes: 34488399fa ("mm/madvise: add file and shmem support to MADV_COLLAPSE")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian has reported another regression in 6.1 due to ca3d76b0aa ("mm:
add merging after mremap resize"). The problem is that vma_merge() can
fail when vma has a vm_ops->close() method, causing is_mergeable_vma()
test to be negative. This was happening for vma mapping a file from
fuse-overlayfs, which does have the method. But when we are simply
expanding the vma, we never remove it due to the "merge" with the added
area, so the test should not prevent the expansion.
As a quick fix, check for such vmas and expand them using vma_adjust()
directly as was done before commit ca3d76b0aa. For a more robust long
term solution we should try to limit the check for vma_ops->close only to
cases that actually result in vma removal, so that no merge would be
prevented unnecessarily.
[akpm@linux-foundation.org: fix indenting whitespace, reflow comment]
Link: https://lkml.kernel.org/r/20230117101939.9753-1-vbabka@suse.cz
Fixes: ca3d76b0aa ("mm: add merging after mremap resize")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Fabian Vogt <fvogt@suse.com>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359#c35
Tested-by: Fabian Vogt <fvogt@suse.com>
Cc: Jakub Matěna <matenajakub@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since commit aa06a9bd85 ("ia64: fix clock_getres(CLOCK_MONOTONIC) to
report ITC frequency"), gcc 10.1.0 fails to build ia64 with the gnomic:
| ../arch/ia64/kernel/sys_ia64.c: In function 'ia64_clock_getres':
| ../arch/ia64/kernel/sys_ia64.c:189:3: error: a label can only be part of a statement and a declaration is not a statement
| 189 | s64 tick_ns = DIV_ROUND_UP(NSEC_PER_SEC, local_cpu_data->itc_freq);
This line appears immediately after a case label in a switch.
Move the declarations out of the case, to the top of the function.
Link: https://lkml.kernel.org/r/20230117151632.393836-1-james.morse@arm.com
Fixes: aa06a9bd85 ("ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency")
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Sergei Trofimovich <slyich@gmail.com>
Cc: Émeric Maschino <emeric.maschino@gmail.com>
Cc: matoro <matoro_mailinglist_kernel@matoro.tk>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, there is a race between zs_free() and zs_reclaim_page():
zs_reclaim_page() finds a handle to an allocated object, but before the
eviction happens, an independent zs_free() call to the same handle could
come in and overwrite the object value stored at the handle with the last
deferred handle. When zs_reclaim_page() finally gets to call the eviction
handler, it will see an invalid object value (i.e the previous deferred
handle instead of the original object value).
This race happens quite infrequently. We only managed to produce it with
out-of-tree developmental code that triggers zsmalloc writeback with a
much higher frequency than usual.
This patch fixes this race by storing the deferred handle in the object
header instead. We differentiate the deferred handle from the other two
cases (handle for allocated object, and linkage for free object) with a
new tag. If zspage reclamation succeeds, we will free these deferred
handles by walking through the zspage objects. On the other hand, if
zspage reclamation fails, we reconstruct the zspage freelist (with the
deferred handle tag and allocated tag) before trying again with the
reclamation.
[arnd@arndb.de: avoid unused-function warning]
Link: https://lkml.kernel.org/r/20230117170507.2651972-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20230110231701.326724-1-nphamcs@gmail.com
Fixes: 9997bc0175 ("zsmalloc: implement writeback mechanism for zsmalloc")
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.
Page table traversal is allowed under any one of the mmap lock, the
anon_vma lock (if the VMA is associated with an anon_vma), and the
mapping lock (if the VMA is associated with a mapping); and so to be
able to remove page tables, we must hold all three of them.
retract_page_tables() bails out if an ->anon_vma is attached, but does
this check before holding the mmap lock (as the comment above the check
explains).
If we racily merged an existing ->anon_vma (shared with a child
process) from a neighboring VMA, subsequent rmap traversals on pages
belonging to the child will be able to see the page tables that we are
concurrently removing while assuming that nothing else can access them.
Repeat the ->anon_vma check once we hold the mmap lock to ensure that
there really is no concurrent page table access.
Hitting this bug causes a lockdep warning in collapse_and_free_pmd(),
in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)".
It can also lead to use-after-free access.
Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/
Link: https://lkml.kernel.org/r/20230111133351.807024-1-jannh@google.com
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <jannh@google.com>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@intel.linux.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull cgroup fix from Tejun Heo:
"cpuset has a bug which can cause an oops after some configuration
operations, introduced during the v6.1 cycle.
This single commit fixes the bug"
* tag 'cgroup-for-6.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Fix wrong check in update_parent_subparts_cpumask()
It was found that the check to see if a partition could use up all
the cpus from the parent cpuset in update_parent_subparts_cpumask()
was incorrect. As a result, it is possible to leave parent with no
effective cpu left even if there are tasks in the parent cpuset. This
can lead to system panic as reported in [1].
Fix this probem by updating the check to fail the enabling the partition
if parent's effective_cpus is a subset of the child's cpus_allowed.
Also record the error code when an error happens in update_prstate()
and add a test case where parent partition and child have the same cpu
list and parent has task. Enabling partition in the child will fail in
this case.
[1] https://www.spinics.net/lists/cgroups/msg36254.html
Fixes: f0af1bfc27 ("cgroup/cpuset: Relax constraints to partition & cpus changes")
Cc: stable@vger.kernel.org # v6.1
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull SCSI fixes from James Bottomley:
"Two core fixes.
One simply moves an annotation from put to release to avoid the
warning triggering needlessly in alua, but to keep it in case release
is ever called from that path (which we don't think will happen).
The other reverts a change to the PQ=1 target scanning behaviour
that's under intense discussion at the moment"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Revert "scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT"
scsi: core: Fix the scsi_device_put() might_sleep annotation
Syzkaller triggered a WARN in put_pmu_ctx().
WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278
This is because there is no locking around the access of "if
(!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in
put_pmu_ctx().
The decrement of the reference count in put_pmu_ctx() also happens
outside of the spinlock, leading to the possibility of this order of
events, and the context being cleared in put_pmu_ctx(), after its
refcount is non zero:
CPU0 CPU1
find_get_pmu_context()
if (!epc->ctx) == false
put_pmu_ctx()
atomic_dec_and_test(&epc->refcount) == true
epc->refcount == 0
atomic_inc(&epc->refcount);
epc->refcount == 1
list_del_init(&epc->pmu_ctx_entry);
epc->ctx = NULL;
Another issue is that WARN_ON for no active PMU events in put_pmu_ctx()
is outside of the lock. If the perf_event_pmu_context is an embedded
one, even after clearing it, it won't be deleted and can be re-used. So
the warning can trigger. For this reason it also needs to be moved
inside the lock.
The above warning is very quick to trigger on Arm by running these two
commands at the same time:
while true; do perf record -- ls; done
while true; do perf record -- ls; done
[peterz: atomic_dec_and_raw_lock*()]
Fixes: bd27568117 ("perf: Rewrite core context handling")
Reported-by: syzbot+697196bc0265049822bd@syzkaller.appspotmail.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20230127143141.1782804-2-james.clark@arm.com
Pull media fixes from Mauro Carvalho Chehab:
"A couple of v4l2 core fixes:
- fix a regression on strings control support
- fix a regression for some drivers that depend on an odd streaming
behavior"
* tag 'media/v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videobuf2: set q->streaming later
media: v4l2-ctrls-api.c: move ctrl->is_new = 1 to the correct line
Historically calls to __decompress() didn't specify "out_len" parameter
on many architectures including s390, expecting that no writes beyond
uncompressed kernel image are performed. This has changed since commit
2aa14b1ab2 ("zstd: import usptream v1.5.2") which includes zstd library
commit 6a7ede3dfccb ("Reduce size of dctx by reutilizing dst buffer
(#2751)"). Now zstd decompression code might store literal buffer in
the unwritten portion of the destination buffer. Since "out_len" is
not set, it is considered to be unlimited and hence free to use for
optimization needs. On s390 this might corrupt initrd or ipl report
which are often placed right after the decompressor buffer. Luckily the
size of uncompressed kernel image is already known to the decompressor,
so to avoid the problem simply specify it in the "out_len" parameter.
Link: https://github.com/facebook/zstd/commit/6a7ede3dfccb
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Link: https://lore.kernel.org/r/patch-1.thread-41c676.git-41c676c2d153.your-ad-here.call-01675030179-ext-9637@work.hours
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Looks like kunit_test_init_section_suites(...) was messed up in a merge
conflict. This fixes it.
kunit_test_init_section_suites(...) was not updated to avoid the extra
level of indirection when .kunit_test_suites was flattened. Given no-one
was actively using it, this went unnoticed for a long period of time.
Fixes: e5857d396f ("kunit: flatten kunit_suite*** to kunit_suite** in .kunit_test_suites")
Signed-off-by: Brendan Higgins <brendan.higgins@linux.dev>
Signed-off-by: David Gow <davidgow@google.com>
Tested-by: Martin Fernandez <martin.fernandez@eclypsium.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The mailing list at librecores.org is being shut down due to
infrastructure issues. Update the the newly created list on
vger.kernel.org.
Signed-off-by: Stafford Horne <shorne@gmail.com>
When validating drafted SPDK ublk target, in a case that
assigning large queue depth to multiqueue ublk device,
ublk target would run into a weird incorrect state. During
rounds of review and debug, An overflow bug was found
in ublk driver.
In ublk_cmd.h, UBLK_MAX_QUEUE_DEPTH is 4096 which means
each ublk queue depth can be set as large as 4096. But
when setting qd for a ublk device,
sizeof(struct ublk_queue) + depth * sizeof(struct ublk_io)
will be larger than 65535 if qd is larger than 2728.
Then queue_size is overflowed, and ublk_get_queue()
references a wrong pointer position. The wrong content of
ublk_queue elements will lead to out-of-bounds memory
access.
Extend queue_size in ublk_device as "unsigned int".
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Fixes: 71f28f3136 ("ublk_drv: add io_uring based userspace block driver")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230131070552.115067-1-xiaodong.liu@intel.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no bug. If sch->length == 0, this would result in an infinite
loop, but first caller, do_basic_checks(), errors out in this case.
After this change, packets with bogus zero-length chunks are no longer
detected as invalid, so revert & add comment wrt. 0 length check.
Fixes: 98ee007745 ("netfilter: conntrack: fix bug in for_each_sctp_chunk")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When using a xfrm interface in a bridged setup (the outgoing device is
bridged), the incoming packets in the xfrm interface are only tracked
in the outgoing direction.
$ brctl show
bridge name interfaces
br_eth1 eth1
$ conntrack -L
tcp 115 SYN_SENT src=192... dst=192... [UNREPLIED] ...
If br_netfilter is enabled, the first (encrypted) packet is received onR
eth1, conntrack hooks are called from br_netfilter emulation which
allocates nf_bridge info for this skb.
If the packet is for local machine, skb gets passed up the ip stack.
The skb passes through ip prerouting a second time. br_netfilter
ip_sabotage_in supresses the re-invocation of the hooks.
After this, skb gets decrypted in xfrm layer and appears in
network stack a second time (after decryption).
Then, ip_sabotage_in is called again and suppresses netfilter
hook invocation, even though the bridge layer never called them
for the plaintext incarnation of the packet.
Free the bridge info after the first suppression to avoid this.
I was unable to figure out where the regression comes from, as far as i
can see br_netfilter always had this problem; i did not expect that skb
is looped again with different headers.
Fixes: c4b0e771f9 ("netfilter: avoid using skb->nf_bridge directly")
Reported-and-tested-by: Wolfgang Nothdurft <wolfgang@linogate.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In kernels compiled with CONFIG_PARAVIRT=n, the compiler re-orders the
DR7 read in exc_nmi() to happen before the call to sev_es_ist_enter().
This is problematic when running as an SEV-ES guest because in this
environment the DR7 read might cause a #VC exception, and taking #VC
exceptions is not safe in exc_nmi() before sev_es_ist_enter() has run.
The result is stack recursion if the NMI was caused on the #VC IST
stack, because a subsequent #VC exception in the NMI handler will
overwrite the stack frame of the interrupted #VC handler.
As there are no compiler barriers affecting the ordering of DR7
reads/writes, make the accesses to this register volatile, forbidding
the compiler to re-order them.
[ bp: Massage text, make them volatile too, to make sure some
aggressive compiler optimization pass doesn't discard them. ]
Fixes: 315562c9af ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Reported-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230127035616.508966-1-aik@amd.com
If a relocatable kernel is loaded at a non-zero address and told not to
relocate to zero (kdump or RELOCATABLE_TEST), the mapping of the
interrupt code at zero is left with RWX permissions.
That is a security weakness, and leads to a warning at boot if
CONFIG_DEBUG_WX is enabled:
powerpc/mm: Found insecure W+X mapping at address 00000000056435bc/0xc000000000000000
WARNING: CPU: 1 PID: 1 at arch/powerpc/mm/ptdump/ptdump.c:193 note_page+0x484/0x4c0
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc1-00001-g8ae8e98aea82-dirty #175
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-dd0dca hv:linux,kvm pSeries
NIP: c0000000004a1c34 LR: c0000000004a1c30 CTR: 0000000000000000
REGS: c000000003503770 TRAP: 0700 Not tainted (6.2.0-rc1-00001-g8ae8e98aea82-dirty)
MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24000220 XER: 00000000
CFAR: c000000000545a58 IRQMASK: 0
...
NIP note_page+0x484/0x4c0
LR note_page+0x480/0x4c0
Call Trace:
note_page+0x480/0x4c0 (unreliable)
ptdump_pmd_entry+0xc8/0x100
walk_pgd_range+0x618/0xab0
walk_page_range_novma+0x74/0xc0
ptdump_walk_pgd+0x98/0x170
ptdump_check_wx+0x94/0x100
mark_rodata_ro+0x30/0x70
kernel_init+0x78/0x1a0
ret_from_kernel_thread+0x5c/0x64
The fix has two parts. Firstly the pages from zero up to the end of
interrupts need to be marked read-only, so that they are left with R-X
permissions. Secondly the mapping logic needs to be taught to ensure
there is a page boundary at the end of the interrupt region, so that the
permission change only applies to the interrupt text, and not the region
following it.
Fixes: c55d7b5e64 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-2-mpe@ellerman.id.au
If a relocatable kernel is loaded at an address that is not 2MB aligned
and told not to relocate to zero, the kernel can crash due to
mark_rodata_ro() incorrectly changing some read-write data to read-only.
Scenarios where the misalignment can occur are when the kernel is
loaded by kdump or using the RELOCATABLE_TEST config option.
Example crash with the kernel loaded at 5MB:
Run /sbin/init as init process
BUG: Unable to handle kernel data access on write at 0xc000000000452000
Faulting instruction address: 0xc0000000005b6730
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries
NIP: c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380
REGS: c000000004503250 TRAP: 0300 Not tainted (6.2.0-rc1-00011-g349188be4841)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44288480 XER: 00000000
CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0
...
NIP memset+0x68/0x104
LR zero_user_segments.constprop.0+0xa8/0xf0
Call Trace:
ext4_mpage_readpages+0x7f8/0x830
ext4_readahead+0x48/0x60
read_pages+0xb8/0x380
page_cache_ra_unbounded+0x19c/0x250
filemap_fault+0x58c/0xae0
__do_fault+0x60/0x100
__handle_mm_fault+0x1230/0x1a40
handle_mm_fault+0x120/0x300
___do_page_fault+0x20c/0xa80
do_page_fault+0x30/0xc0
data_access_common_virt+0x210/0x220
This happens because mark_rodata_ro() tries to change permissions on the
range _stext..__end_rodata, but _stext sits in the middle of the 2MB
page from 4MB to 6MB:
radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec)
The logic that changes the permissions assumes the linear mapping was
split correctly at boot, so it marks the entire 2MB page read-only. That
leads to the write fault above.
To fix it, the boot time mapping logic needs to consider that if the
kernel is running at a non-zero address then _stext is a boundary where
it must split the mapping.
That leads to the mapping being split correctly, allowing the rodata
permission change to take happen correctly, with no spillover:
radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)
If the kernel is loaded at a 2MB aligned address, the mapping continues
to use 2MB pages as before:
radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages
Fixes: c55d7b5e64 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
In kexec_extra_fdt_size_ppc64() there's logic to estimate how much
extra space will be needed in the device tree for some memory related
properties.
That logic uses the size of RAM divided by drmem_lmb_size() to do the
estimation. However drmem_lmb_size() can be zero if the machine has no
hotpluggable memory configured, which is the case when booting with qemu
and no maxmem=x parameter is passed (the default).
The division by zero is reported by UBSAN, and can also lead to an
overflow and a warning from kvmalloc, and kdump kernel loading fails:
WARNING: CPU: 0 PID: 133 at mm/util.c:596 kvmalloc_node+0x15c/0x160
Modules linked in:
CPU: 0 PID: 133 Comm: kexec Not tainted 6.2.0-rc5-03455-g07358bd97810 #223
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-dd0dca pSeries
NIP: c00000000041ff4c LR: c00000000041fe58 CTR: 0000000000000000
REGS: c0000000096ef750 TRAP: 0700 Not tainted (6.2.0-rc5-03455-g07358bd97810)
MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24248242 XER: 2004011e
CFAR: c00000000041fed0 IRQMASK: 0
...
NIP kvmalloc_node+0x15c/0x160
LR kvmalloc_node+0x68/0x160
Call Trace:
kvmalloc_node+0x68/0x160 (unreliable)
of_kexec_alloc_and_setup_fdt+0xb8/0x7d0
elf64_load+0x25c/0x4a0
kexec_image_load_default+0x58/0x80
sys_kexec_file_load+0x5c0/0x920
system_call_exception+0x128/0x330
system_call_vectored_common+0x15c/0x2ec
To fix it, skip the calculation if drmem_lmb_size() is zero.
Fixes: 2377c92e37 ("powerpc/kexec_file: fix FDT size estimation for kdump kernel")
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230130014707.541110-1-mpe@ellerman.id.au
As DMA Rx can be completed from two places, it is possible that DMA Rx
completes before DMA completion callback had a chance to complete it.
Once the previous DMA Rx has been completed, a new one can be started
on the next UART interrupt. The following race is possible
(uart_unlock_and_check_sysrq_irqrestore() replaced with
spin_unlock_irqrestore() for simplicity/clarity):
CPU0 CPU1
dma_rx_complete()
serial8250_handle_irq()
spin_lock_irqsave(&port->lock)
handle_rx_dma()
serial8250_rx_dma_flush()
__dma_rx_complete()
dma->rx_running = 0
// Complete DMA Rx
spin_unlock_irqrestore(&port->lock)
serial8250_handle_irq()
spin_lock_irqsave(&port->lock)
handle_rx_dma()
serial8250_rx_dma()
dma->rx_running = 1
// Setup a new DMA Rx
spin_unlock_irqrestore(&port->lock)
spin_lock_irqsave(&port->lock)
// sees dma->rx_running = 1
__dma_rx_complete()
dma->rx_running = 0
// Incorrectly complete
// running DMA Rx
This race seems somewhat theoretical to occur for real but handle it
correctly regardless. Check what is the DMA status before complething
anything in __dma_rx_complete().
Reported-by: Gilles BULOZ <gilles.buloz@kontron.com>
Tested-by: Gilles BULOZ <gilles.buloz@kontron.com>
Fixes: 9ee4b83e51 ("serial: 8250: Add support for dmaengine")
Cc: stable@vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230130114841.25749-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__dma_rx_complete() is called from two places:
- Through the DMA completion callback dma_rx_complete()
- From serial8250_rx_dma_flush() after IIR_RLSI or IIR_RX_TIMEOUT
The former does not hold port's lock during __dma_rx_complete() which
allows these two to race and potentially insert the same data twice.
Extend port's lock coverage in dma_rx_complete() to prevent the race
and check if the DMA Rx is still pending completion before calling
into __dma_rx_complete().
Reported-by: Gilles BULOZ <gilles.buloz@kontron.com>
Tested-by: Gilles BULOZ <gilles.buloz@kontron.com>
Fixes: 9ee4b83e51 ("serial: 8250: Add support for dmaengine")
Cc: stable@vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230130114841.25749-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Requesting an interrupt with IRQF_ONESHOT will run the primary handler
in the hard-IRQ context even in the force-threaded mode. The
force-threaded mode is used by PREEMPT_RT in order to avoid acquiring
sleeping locks (spinlock_t) in hard-IRQ context. This combination
makes it impossible and leads to "sleeping while atomic" warnings.
Use one interrupt handler for both handlers (primary and secondary)
and drop the IRQF_ONESHOT flag which is not needed.
Fixes: e359b4411c ("serial: stm32: fix threaded interrupt handling")
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Valentin Caron <valentin.caron@foss.st.com> # V3
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230120160332.57930-1-marex@denx.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan writes:
"1st set of IIO fixes for the 6.2 cycle.
The usual mixed bag - with a bunch of issues found by Carlos Song
in the fxos8700 IMU driver dominating.
hid-accel,gyro
- Fix wrong returned value when read succeeds.
marvell,berlin-adc
- Missing of_node_put() in an error path.
nxp,fxos8700 (freescale)
- Wrong channel type match.
- Swapped channel read back.
- Incomplete channel read back (not enough bytes).
- Missing shift of acceleration data.
- Range selection didn't work (datasheet bug)
- Wrong ODR mode read back due to wrong field offset.
- Drop unused, but wrong define.
- Fix issue with magnetometer scale an units.
nxp,imx8qxp
- Fix an irq flood due to not reading data early enough.
st,lsm6dsx
- Add CONFIG_IIO_TRIGGERED_BUFFER select.
st,stm32-adc
- Fix missing MODULE_DEVICE_TABLE() needed for module aliases.
ti,twl6030
- Fix missing enable of some channels.
- Fix a typo in previous patch that meant one channel still wasn't enabled.
xilinx,xadc
- Carrying on incorrectly after allocation error."
* tag 'iio-fixes-for-6.2a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: imu: fxos8700: fix MAGN sensor scale and unit
iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN
iio: imu: fxos8700: fix failed initialization ODR mode assignment
iio: imu: fxos8700: fix incorrect ODR mode readback
iio: light: cm32181: Fix PM support on system with 2 I2C resources
iio: hid: fix the retval in gyro_3d_capture_sample
iio: hid: fix the retval in accel_3d_capture_sample
iio: imu: st_lsm6dsx: fix build when CONFIG_IIO_TRIGGERED_BUFFER=m
iio:adc:twl6030: Enable measurement of VAC
iio: imu: fxos8700: fix ACCEL measurement range selection
iio: imu: fxos8700: fix IMU data bits returned to user space
iio: imu: fxos8700: fix incomplete ACCEL and MAGN channels readback
iio: imu: fxos8700: fix swapped ACCEL and MAGN channels readback
iio: imu: fxos8700: fix map label of channel type to MAGN sensor
iio:adc:twl6030: Enable measurements of VUSB, VBAT and others
iio: imx8qxp-adc: fix irq flood when call imx8qxp_adc_read_raw()
iio: adc: xilinx-ams: fix devm_krealloc() return value check
iio: adc: berlin2-adc: Add missing of_node_put() in error path
iio: adc: stm32-dfsdm: fill module aliases
When CONFIG_MODULE_SIG_KEY is PKCS#11 URI (pkcs11:*), signing of modules
fails:
scripts/sign-file sha256 /.../linux/pkcs11:token=foo;object=bar;pin-value=1111 certs/signing_key.x509 /.../kernel/crypto/tcrypt.ko
Usage: scripts/sign-file [-dp] <hash algo> <key> <x509> <module> [<dest>]
scripts/sign-file -s <raw sig> <hash algo> <x509> <module> [<dest>]
First, we need to avoid adding the $(srctree)/ prefix to the URL.
Second, since the kconfig string values no longer include quotes, we need to add
them again when passing a PKCS#11 URI to sign-file. This avoids
splitting by the shell if the URI contains semicolons.
Fixes: 4db9c2e3d0 ("kbuild: stop using config_filename in scripts/Makefile.modsign")
Fixes: 129ab0d2d9 ("kbuild: do not quote string values in include/config/auto.conf")
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When CONFIG_MODULE_SIG_KEY is PKCS#11 URI (pkcs11:*) and contains a
semicolon, signing_key.x509 fails to build:
certs/extract-cert pkcs11:token=foo;object=bar;pin-value=1111 certs/signing_key.x509
Usage: extract-cert <source> <dest>
Add quotes to the extract-cert argument to avoid splitting by the shell.
This approach was suggested by Masahiro Yamada <masahiroy@kernel.org>.
Fixes: 129ab0d2d9 ("kbuild: do not quote string values in include/config/auto.conf")
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Stefan Schmidt says:
====================
ieee802154 for net 2023-01-30
Only one fix this time around.
Miquel Raynal fixed a potential double free spotted by Dan Carpenter.
* tag 'ieee802154-for-net-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
mac802154: Fix possible double free upon parsing error
====================
Link: https://lore.kernel.org/r/20230130095646.301448-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-01-27 (ice)
This series contains updates to ice driver only.
Dave prevents modifying channels when RDMA is active as this will break
RDMA traffic.
Michal fixes a broken URL.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix broken link in ice NAPI doc
ice: Prevent set_channel from changing queues while RDMA active
====================
Link: https://lore.kernel.org/r/20230127225333.1534783-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The recent commit 76d588dddc ("powerpc/imc-pmu: Fix use of mutex in
IRQs disabled section") fixed warnings (and possible deadlocks) in the
IMC PMU driver by converting the locking to use spinlocks.
It also converted the init-time nest_init_lock to a spinlock, even
though it's not used at runtime in IRQ disabled sections or while
holding other spinlocks.
This leads to warnings such as:
BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc2-14719-gf12cd06109f4-dirty #1
Hardware name: Mambo,Simulated-System POWER9 0x4e1203 opal:v6.6.6 PowerNV
Call Trace:
dump_stack_lvl+0x74/0xa8 (unreliable)
__might_resched+0x178/0x1a0
__cpuhp_setup_state+0x64/0x1e0
init_imc_pmu+0xe48/0x1250
opal_imc_counters_probe+0x30c/0x6a0
platform_probe+0x78/0x110
really_probe+0x104/0x420
__driver_probe_device+0xb0/0x170
driver_probe_device+0x58/0x180
__driver_attach+0xd8/0x250
bus_for_each_dev+0xb4/0x140
driver_attach+0x34/0x50
bus_add_driver+0x1e8/0x2d0
driver_register+0xb4/0x1c0
__platform_driver_register+0x38/0x50
opal_imc_driver_init+0x2c/0x40
do_one_initcall+0x80/0x360
kernel_init_freeable+0x310/0x3b8
kernel_init+0x30/0x1a0
ret_from_kernel_thread+0x5c/0x64
Fix it by converting nest_init_lock back to a mutex, so that we can call
sleeping functions while holding it. There is no interaction between
nest_init_lock and the runtime spinlocks used by the actual PMU routines.
Fixes: 76d588dddc ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section")
Tested-by: Kajol Jain<kjain@linux.ibm.com>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230130014401.540543-1-mpe@ellerman.id.au
Starting from Turing, the driver is no longer responsible for initiating
DEVINIT when required as the GPU started loading a FW image from ROM and
executing DEVINIT itself after power-on.
However - we apparently still need to wait for it to complete.
This should correct some issues with runpm on some systems, where we get
control of the HW before it's been fully reinitialised after resume from
suspend.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230130223715.1831509-1-bskeggs@redhat.com
This reverts commit cf517fef60.
The commit cf517fef60 ("pinctrl: aspeed: Force to disable the
function's signal") exposed a problem with fetching the regmap for
reading the GFX register.
The Romulus machine the device tree contains a gpio hog for GPIO S7.
With the patch applied:
Muxing pin 151 for GPIO
Disabling signal VPOB9 for VPO
aspeed-g5-pinctrl 1e6e2080.pinctrl: Failed to acquire regmap for IP block 1
aspeed-g5-pinctrl 1e6e2080.pinctrl: request() failed for pin 151
The code path is aspeed-gpio -> pinmux-g5 -> regmap -> clk, and the
of_clock code returns an error as it doesn't have a valid struct clk_hw
pointer. The regmap call happens because pinmux wants to check the GFX
node (IP block 1) to query bits there.
For reference, before the offending patch:
Muxing pin 151 for GPIO
Disabling signal VPOB9 for VPO
Want SCU8C[0x00000080]=0x1, got 0x0 from 0x00000000
Disabling signal VPOB9 for VPOOFF1
Want SCU8C[0x00000080]=0x1, got 0x0 from 0x00000000
Disabling signal VPOB9 for VPOOFF2
Want SCU8C[0x00000080]=0x1, got 0x0 from 0x00000000
Enabling signal GPIOS7 for GPIOS7
Muxed pin 151 as GPIOS7
gpio-943 (seq_cont): hogged as output/low
We can't skip the clock check to allow pinmux to proceed, because the
write to disable VPOB9 will try to set a bit in the GFX register space
which will not stick when the IP is in reset. However, we do not want to
enable the IP just so pinmux can do a disable-enable dance for the pin.
For now, revert the offending patch while a correct solution is found.
Fixes: cf517fef60 ("pinctrl: aspeed: Force to disable the function's signal")
Link: https://github.com/openbmc/linux/issues/218
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20230130220845.917985-1-joel@jms.id.au
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
In KUNIT_EXPECT_MEMEQ and KUNIT_EXPECT_MEMNEQ, add check if one of the
inputs is NULL and fail if this is the case.
Currently, the kernel crashes if one of the inputs is NULL. Instead,
fail the test and add an appropriate error message.
Fixes: b8a926bea8 ("kunit: Introduce KUNIT_EXPECT_MEMEQ and KUNIT_EXPECT_MEMNEQ macros")
This was found by the kernel test robot:
https://lore.kernel.org/all/202212191448.D6EDPdOh-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When GuC support was added to error capture, the reference counting
around the request object was broken. Fix it up.
The context based search manages the spinlocking around the search
internally. So it needs to grab the reference count internally as
well. The execlist only request based search relies on external
locking, so it needs an external reference count but within the
spinlock not outside it.
The only other caller of the context based search is the code for
dumping engine state to debugfs. That code wasn't previously getting
an explicit reference at all as it does everything while holding the
execlist specific spinlock. So, that needs updaing as well as that
spinlock doesn't help when using GuC submission. Rather than trying to
conditionally get/put depending on submission model, just change it to
always do the get/put.
v2: Explicitly document adding an extra blank line in some dense code
(Andy Shevchenko). Fix multiple potential null pointer derefs in case
of no request found (some spotted by Tvrtko, but there was more!).
Also fix a leaked request in case of !started and another in
__guc_reset_context now that intel_context_find_active_request is
actually reference counting the returned request.
v3: Add a _get suffix to intel_context_find_active_request now that it
grabs a reference (Daniele).
v4: Split the intel_guc_find_hung_context change to a separate patch
and rename intel_context_find_active_request_get to
intel_context_get_active_request (Tvrtko).
v5: s/locking/reference counting/ in commit message (Tvrtko)
Fixes: dc0dad365c ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Fixes: 573ba126ae ("drm/i915/guc: Capture error state on context reset")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Michael Cheng <michael.cheng@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-3-John.C.Harrison@Intel.com
(cherry picked from commit 3700e35378)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Pull fscache fixes from David Howells:
"Fix two problems in fscache volume handling:
- wake_up_bit() is incorrectly paired with wait_var_event(). The
latter selects the waitqueue to use differently.
- Missing barriers ordering between state bit and task state"
* tag 'fscache-fixes-20230130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
fscache: Use clear_and_wake_up_bit() in fscache_create_volume_work()
fscache: Use wait_on_bit() to wait for the freeing of relinquished volume
i.MX fixes for 6.2, round 2:
- Update MAINTAINERS i.MX entry to match arm64 freescale DTS.
- Drop misused 'uart-has-rtscts' from imx8m-venice boards.
- Fix USB host over-current polarity for imx7d-smegw01 board.
- Fix a typo in i.MX8DXL sc_pwrkey property name.
- Fix GPIO watchdog property for i.MX8MM eDM SBC board.
- Keep Ethernet PHY powered on imx8mm-verdin to avoid kernel crash.
- Fix configuration of i.MX8MM pad UART1_DTE_RX.
* tag 'imx-fixes-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx7d-smegw01: Fix USB host over-current polarity
arm64: dts: imx8mm-verdin: Do not power down eth-phy
MAINTAINERS: match freescale ARM64 DT directory in i.MX entry
arm64: dts: imx8mm: Fix pad control for UART1_DTE_RX
arm64: dts: freescale: imx8dxl: fix sc_pwrkey's property name linux,keycode
arm64: dts: imx8m-venice: Remove incorrect 'uart-has-rtscts'
arm64: dts: imx8mm: Reinstate GPIO watchdog always-running property on eDM SBC
Link: https://lore.kernel.org/r/20230130003614.GP20713@T480
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The DIAG 288 statement consumes an EBCDIC string the address of which is
passed in a register. Use a "memory" clobber to tell the compiler that
memory is accessed within the inline assembly.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
With CONFIG_VMAP_STACK=y the stack is allocated from the vmalloc space.
Data passed to a hardware or a hypervisor interface that
requires V=R can no longer be allocated on the stack.
Use kmalloc() to get memory for a diag288 command.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Using the serio subsystem now requires the code to be reachable:
x86_64-linux-ld: drivers/platform/x86/amd/pmc.o: in function `amd_pmc_suspend_handler':
pmc.c:(.text+0x86c): undefined reference to `serio_bus'
Add the usual dependency: as other users of serio use 'select'
rather than 'depends on', use the same here.
Fixes: 8e60615e89 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230127093950.2368575-1-arnd@kernel.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
As soon as the first handler or sysfs file is registered
the mutex may get used.
Move the initialization to before any handler registration /
sysfs file creation.
Likewise move the destruction of the mutex to after all
the de-initialization is done.
Fixes: da5ce22df5 ("platform/x86/amd/pmf: Add support for PMF core layer")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230130132554.696025-1-hdegoede@redhat.com
Every power mode of static power slider has its own AC and DC power
settings.
When the power source changes from AC to DC, corresponding DC thermals
were not updated from PMF config store and this leads the system to always
run on AC power settings.
Fix it by registering with power_supply notifier and apply DC settings
upon getting notified by the power_supply handler.
Fixes: da5ce22df5 ("platform/x86/amd/pmf: Add support for PMF core layer")
Suggested-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230125095936.3292883-6-Shyam-sundar.S-k@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
fscache_create_volume_work() uses wake_up_bit() to wake up the processes
which are waiting for the completion of volume creation. According to
comments in wake_up_bit() and waitqueue_active(), an extra smp_mb() is
needed to guarantee the memory order between FSCACHE_VOLUME_CREATING
flag and waitqueue_active() before invoking wake_up_bit().
Fixing it by using clear_and_wake_up_bit() to add the missing memory
barrier.
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20230113115211.2895845-3-houtao@huaweicloud.com/ # v3
The freeing of relinquished volume will wake up the pending volume
acquisition by using wake_up_bit(), however it is mismatched with
wait_var_event() used in fscache_wait_on_volume_collision() and it will
never wake up the waiter in the wait-queue because these two functions
operate on different wait-queues.
According to the implementation in fscache_wait_on_volume_collision(),
if the wake-up of pending acquisition is delayed longer than 20 seconds
(e.g., due to the delay of on-demand fd closing), the first
wait_var_event_timeout() will timeout and the following wait_var_event()
will hang forever as shown below:
FS-Cache: Potential volume collision new=00000024 old=00000022
......
INFO: task mount:1148 blocked for more than 122 seconds.
Not tainted 6.1.0-rc6+ #1
task:mount state:D stack:0 pid:1148 ppid:1
Call Trace:
<TASK>
__schedule+0x2f6/0xb80
schedule+0x67/0xe0
fscache_wait_on_volume_collision.cold+0x80/0x82
__fscache_acquire_volume+0x40d/0x4e0
erofs_fscache_register_volume+0x51/0xe0 [erofs]
erofs_fscache_register_fs+0x19c/0x240 [erofs]
erofs_fc_fill_super+0x746/0xaf0 [erofs]
vfs_get_super+0x7d/0x100
get_tree_nodev+0x16/0x20
erofs_fc_get_tree+0x20/0x30 [erofs]
vfs_get_tree+0x24/0xb0
path_mount+0x2fa/0xa90
do_mount+0x7c/0xa0
__x64_sys_mount+0x8b/0xe0
do_syscall_64+0x30/0x60
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Considering that wake_up_bit() is more selective, so fix it by using
wait_on_bit() instead of wait_var_event() to wait for the freeing of
relinquished volume. In addition because waitqueue_active() is used in
wake_up_bit() and clear_bit() doesn't imply any memory barrier, use
clear_and_wake_up_bit() to add the missing memory barrier between
cursor->flags and waitqueue_active().
Fixes: 62ab633523 ("fscache: Implement volume registration")
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20230113115211.2895845-2-houtao@huaweicloud.com/ # v3
When copying the DSCP bits for decap-dscp into IPv6 don't assume the
outer encap is always IPv6. Instead, as with the inner IPv4 case, copy
the DSCP bits from the correctly saved "tos" value in the control block.
Fixes: 227620e295 ("[IPSEC]: Separate inner/outer mode processing on input")
Signed-off-by: Christian Hopps <chopps@chopps.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Commit bc66fa87d4 ("net: phy: Add link between phy dev and mac dev")
introduced a link between net devices and phy devices. It fails to check
whether dev is NULL, leading to a NULL dereference error.
Fixes: bc66fa87d4 ("net: phy: Add link between phy dev and mac dev")
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Interrupt entry sets the soft mask to IRQS_ALL_DISABLED to match the
hard irq disabled state. So when should_hard_irq_enable() returns true
because we want PMI interrupts in irq handlers, MSR[EE] is enabled but
PMIs just get soft-masked. Fix this by clearing IRQS_PMI_DISABLED before
enabling MSR[EE].
This also tidies some of the warnings, no need to duplicate them in
both should_hard_irq_enable() and do_hard_irq_enable().
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121100156.2824054-1-npiggin@gmail.com
When PMI interrupts are soft-masked, local_irq_save() will clear the PMI
mask bit, allowing PMIs in and causing a race condition. This causes a
deadlock in native_hpte_insert via hash_preload, which depends on PMIs
being disabled since commit 8b91cee5ea ("powerpc/64s/hash: Make hash
faults work in NMI context"). native_hpte_insert calls local_irq_save().
It's possible the lpar hash code is also affected when tracing is
enabled because __trace_hcall_entry() calls local_irq_save().
Fix this by making arch_local_irq_save() _or_ the IRQS_DISABLED bit into
the mask.
This was found with the stress_hpt option with a kbuild workload running
together with `perf record -g`.
Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Fixes: 8b91cee5ea ("powerpc/64s/hash: Make hash faults work in NMI context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Just take the fix without the new warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121095352.2823517-1-npiggin@gmail.com
Currently in phy_init_eee() the driver unconditionally configures the PHY
to stop RX_CLK after entering Rx LPI state. This causes an LPI interrupt
storm on my qcs404-base board.
Change the PHY initialization so that for "qcom,qcs404-ethqos" compatible
device RX_CLK continues to run even in Rx LPI state.
Signed-off-by: Andrey Konovalov <andrey.konovalov@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
objtool throws the following warning:
arch/powerpc/kvm/booke.o: warning: objtool: kvmppc_fill_pt_regs+0x30:
unannotated intra-function call
Fix the warning by setting the value of 'nip' using the _THIS_IP_ macro,
without using an assembly bl/mflr sequence to save the instruction
pointer.
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230128124158.1066251-1-sv@linux.ibm.com
objtool throws the following warning:
arch/powerpc/kernel/head_85xx.o: warning: objtool: .head.text+0x1a6c:
unannotated intra-function call
Fix the warning by annotating KernelSPE symbol with SYM_FUNC_START_LOCAL
and SYM_FUNC_END macros.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230128124138.1066176-1-sv@linux.ibm.com
Pull irq fix from Borislav Petkov:
- Cleanup the firmware node for the new IRQ MSI domain properly, to
avoid leaking memory
* tag 'irq_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi: Free the fwnode created by msi_create_device_irq_domain()
Pull x86 fixes from Borislav Petkov:
- Start checking for -mindirect-branch-cs-prefix clang support too now
that LLVM 16 will support it
- Fix a NULL ptr deref when suspending with Xen PV
- Have a SEV-SNP guest check explicitly for features enabled by the
hypervisor and fail gracefully if some are unsupported by the guest
instead of failing in a non-obvious and hard-to-debug way
- Fix a MSI descriptor leakage under Xen
- Mark Xen's MSI domain as supporting MSI-X
- Prevent legacy PIC interrupts from being resent in software by
marking them level triggered, as they should be, which lead to a NULL
ptr deref
* tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
acpi: Fix suspend with Xen PV
x86/sev: Add SEV-SNP guest feature negotiation support
x86/pci/xen: Fixup fallout from the PCI/MSI overhaul
x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain
x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
Pull input fixes from Dmitry Torokhov:
- touchpads on HP 15-* laptops switched back to PS/2 emulation mode
- a quirk for Clevo PCX0DX/TUXEDO XP1511 to make sure keyboard is
responding after resume
* tag 'input-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Clevo PCX0DX to i8042 quirk table
Revert "Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode"
The dirty log checks are mistakenly testing the first page in the page
table (PT) memory region instead of the page holding the test data
page PTE. This wasn't an issue before commit 406504c7b0 ("KVM:
arm64: Fix S1PTW handling on RO memslots") as all PT pages (including
the first page) were treated as writes.
Fix the page_fault_test dirty logging tests by checking for the right
page: the one for the PTE of the data test page.
Fixes: a4edf25b3e ("KVM: selftests: aarch64: Add dirty logging tests into page_fault_test")
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230127214353.245671-4-ricarkol@google.com
Only Stage1 Page table walks (S1PTW) trying to write into a PTE should
result in the PTE page being dirty in the log. However, the dirty log
tests in page_fault_test default to treat all S1PTW accesses as writes.
Fix the relevant tests by asserting dirty pages only for S1PTW writes,
which in these tests only applies to when Hardware management of the Access
Flag is enabled.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230127214353.245671-3-ricarkol@google.com
Only Stage1 Page table walks (S1PTW) writing a PTE on an unmapped page
should result in a userfaultfd write. However, the userfaultfd tests in
page_fault_test wrongly assert that any S1PTW is a PTE write.
Fix this by relaxing the read vs. write checks in all userfaultfd
handlers. Note that this is also an attempt to focus less on KVM (and
userfaultfd) behavior, and more on architectural behavior. Also note
that after commit 406504c7b0 ("KVM: arm64: Fix S1PTW handling on RO
memslots"), the userfaultfd fault (S1PTW with AF on an unmaped PTE
page) is actually a read: the translation fault that comes before the
permission fault.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230127214353.245671-2-ricarkol@google.com
Pull cxl fixes from Dan Williams:
"A couple of fixes for bugs introduced during the merge window. One is
a regression, the other was a bug in the CXL AER handler:
- Fix a crash regression due to module load order of cxl_pmem.ko
- Fix wrong register offset read in CXL AER handling path"
* tag 'cxl-fixes-for-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/pmem: Fix nvdimm unregistration when cxl_pmem driver is absent
cxl: fix cxl_report_and_clear() RAS UE addr mis-assignment
We don't have a running VCPU context to save vgic3 pending table due
to KVM_DEV_ARM_VGIC_{GRP_CTRL, SAVE_PENDING_TABLES} command on KVM
device "kvm-arm-vgic-v3". The unknown case is caught by kvm-unit-tests.
# ./kvm-unit-tests/tests/its-pending-migration
WARNING: CPU: 120 PID: 7973 at arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3325 \
mark_page_dirty_in_slot+0x60/0xe0
:
mark_page_dirty_in_slot+0x60/0xe0
__kvm_write_guest_page+0xcc/0x100
kvm_write_guest+0x7c/0xb0
vgic_v3_save_pending_tables+0x148/0x2a0
vgic_set_common_attr+0x158/0x240
vgic_v3_set_attr+0x4c/0x5c
kvm_device_ioctl+0x100/0x160
__arm64_sys_ioctl+0xa8/0xf0
invoke_syscall.constprop.0+0x7c/0xd0
el0_svc_common.constprop.0+0x144/0x160
do_el0_svc+0x34/0x60
el0_svc+0x3c/0x1a0
el0t_64_sync_handler+0xb4/0x130
el0t_64_sync+0x178/0x17c
Use vgic_write_guest_lock() to save vgic3 pending table.
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230126235451.469087-5-gshan@redhat.com
Currently, the unknown no-running-vcpu sites are reported when a
dirty page is tracked by mark_page_dirty_in_slot(). Until now, the
only known no-running-vcpu site is saving vgic/its tables through
KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on KVM device
"kvm-arm-vgic-its". Unfortunately, there are more unknown sites to
be handled and no-running-vcpu context will be allowed in these
sites: (1) KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES} command
on KVM device "kvm-arm-vgic-its" to restore vgic/its tables. The
vgic3 LPI pending status could be restored. (2) Save vgic3 pending
table through KVM_DEV_ARM_{VGIC_GRP_CTRL, VGIC_SAVE_PENDING_TABLES}
command on KVM device "kvm-arm-vgic-v3".
In order to handle those unknown cases, we need a unified helper
vgic_write_guest_lock(). struct vgic_dist::save_its_tables_in_progress
is also renamed to struct vgic_dist::save_tables_in_progress.
No functional change intended.
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230126235451.469087-3-gshan@redhat.com
This reverts commit 7efc3b7261.
We have got openSUSE reports (Link 1) for 6.1 kernel with khugepaged
stalling CPU for long periods of time. Investigation of tracepoint data
shows that compaction is stuck in repeating fast_find_migrateblock()
based migrate page isolation, and then fails to migrate all isolated
pages.
Commit 7efc3b7261 ("mm/compaction: fix set skip in fast_find_migrateblock")
was suspected as it was merged in 6.1 and in theory can indeed remove a
termination condition for fast_find_migrateblock() under certain
conditions, as it removes a place that always marks a scanned pageblock
from being re-scanned. There are other such places, but those can be
skipped under certain conditions, which seems to match the tracepoint
data.
Testing of revert also appears to have resolved the issue, thus revert
the commit until a more robust solution for the original problem is
developed.
It's also likely this will fix qemu stalls with 6.1 kernel reported in
Link 2, but that is not yet confirmed.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
Link: https://lore.kernel.org/kvm/b8017e09-f336-3035-8344-c549086c2340@kernel.org/
Link: https://lore.kernel.org/lkml/20230125134434.18017-1-mgorman@techsingularity.net/
Fixes: 7efc3b7261 ("mm/compaction: fix set skip in fast_find_migrateblock")
Cc: <stable@vger.kernel.org>
Tested-by: Pedro Falcato <pedro.falcato@gmail.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 6e9f05dc66 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
doubles the amount of capacity stolen from user addressable capacity for
everyone, regardless of whether they are using the debug option. Revert
that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
allow for debug scenarios to proceed with creating debug sized page maps
with a compile option to support debug scenarios.
Note that this only applies to cases where the page map is permanent,
i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
create-namespace" terms). For the "--map=mem" case, since the allocation
is ephemeral for the lifespan of the namespace, there are no explicit
restriction. However, the implicit restriction, of having enough
available "System RAM" to store the page map for the typically large
pmem, still applies.
Fixes: 6e9f05dc66 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
Cc: <stable@vger.kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Link: https://lore.kernel.org/r/167467815773.463042.7022545814443036382.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Joe found another DT file that shouldn't be executable, and that
frustrated me enough that I went hunting with this script:
git ls-files -s |
grep '^100755' |
cut -f2 |
xargs grep -L '^#!'
and that found another file that shouldn't have been marked executable
either, despite being in the scripts directory.
Maybe these two are the last ones at least for now. But I'm sure we'll
be back in a few years, fixing things up again.
Fixes: 8c6789f4e2 ("ASoC: dt-bindings: Add Everest ES8326 audio CODEC")
Fixes: 4d8e5cd233 ("locking/atomics: Fix scripts/atomic/ script permissions")
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ksmbd server fixes from Steve French:
"Four smb3 server fixes, all also for stable:
- fix for signing bug
- fix to more strictly check packet length
- add a max connections parm to limit simultaneous connections
- fix error message flood that can occur with newer Samba xattr
format"
* tag '6.2-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: downgrade ndr version error message to debug
ksmbd: limit pdu length size according to connection status
ksmbd: do not sign response to session request for guest login
ksmbd: add max connections parameter
Xy writes:
FPGA Manager changes for 6.2-final
stratix10-soc:
- Zheng's change fixes return value check
Intel m10 bmc secure update:
- Ilpo's change fixes probe rollback
All patches have been reviewed on the mailing list, and have been in
the last linux-next releases (as part of our for-6.2 branch)
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
* tag 'fpga-for-6.2-final' of git://git.kernel.org/pub/scm/linux/kernel/git/fpga/linux-fpga:
fpga: m10bmc-sec: Fix probe rollback
fpga: stratix10-soc: Fix return value check in s10_ops_write_init()
"tcpdump" is used to capture traffic in these tests while using a random,
temporary and not suffixed file for it. This can interfere with apparmor
configuration where the tool is only allowed to read from files with
'known' extensions.
The MINE type application/vnd.tcpdump.pcap was registered with IANA for
pcap files and .pcap is the extension that is both most common but also
aligned with standard apparmor configurations. See TCPDUMP(8) for more
details.
This improves compatibility with standard apparmor configurations by
using ".pcap" as the file extension for the tests' temporary files.
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The i.MX6 CPU frequency driver sometimes fails to register at boot time
due to nvmem_cell_read_u32() sporadically returning -ENOENT.
This happens because there is a window where __nvmem_device_get() in
of_nvmem_cell_get() is able to return the nvmem device, but as cells
have been setup, nvmem_find_cell_entry_by_node() returns NULL.
The occurs because the nvmem core registration code violates one of the
fundamental principles of kernel programming: do not publish data
structures before their setup is complete.
Fix this by making nvmem core code conform with this principle.
Fixes: eace75cfdc ("nvmem: Add a simple NVMEM framework for nvmem providers")
Cc: stable@vger.kernel.org
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-7-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If dev_set_name() fails, we leak nvmem->wp_gpio as the cleanup does not
put this. While a minimal fix for this would be to add the gpiod_put()
call, we can do better if we split device_register(), and use the
tested nvmem_release() cleanup code by initialising the device early,
and putting the device.
This results in a slightly larger fix, but results in clear code.
Note: this patch depends on "nvmem: core: initialise nvmem->id early"
and "nvmem: core: remove nvmem_config wp_gpio".
Fixes: 5544e90c81 ("nvmem: core: add error handling for dev_set_name")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[Srini: Fixed subject line and error code handing with wp_gpio while applying.]
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-6-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SID SRAM on at least some SoCs (A64 and D1) returns different values
when read with bus cycles narrower than 32 bits. This is not immediately
obvious, because memcpy_fromio() uses word-size accesses as long as
enough data is being copied.
The vendor driver always uses 32-bit MMIO reads, so do the same here.
This is faster than the register-based method, which is currently used
as a workaround on A64. And it fixes the values returned on D1, where
the SRAM method was being used.
The special case for the last word is needed to maintain .word_size == 1
for sysfs ABI compatibility, as noted previously in commit de2a3eaea5
("nvmem: sunxi_sid: Optimize register read-out method").
Fixes: 07ae4fde9e ("nvmem: sunxi_sid: Add support for D1 variant")
Cc: stable@vger.kernel.org
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kornel Dulęba says:
====================
net: wwan: t7xx: Fix Runtime PM implementation
d10b3a695b ("net: wwan: t7xx: Runtime PM") introduced support for
Runtime PM for this driver, but due to a bug in the initialization logic
the usage refcount would never reach 0, leaving the feature unused.
This patchset addresses that, together with a bug found after runtime
suspend was enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For PCI devices the Runtime PM refcount is incremented twice:
1. During device enumeration with a call to pm_runtime_forbid.
2. Just before a driver probe logic is called.
Because of that in order to enable Runtime PM on a given device
we have to call both pm_runtime_allow and pm_runtime_put_noidle,
once it's ready to be runtime suspended.
The former was missing causing the pm refcount to never reach 0.
Fixes: d10b3a695b ("net: wwan: t7xx: Runtime PM")
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resume device before calling napi_schedule, instead of doing in the napi
poll routine. Polling is done in softrq context. We can't call the PM
resume logic from there as it's blocking and not irq safe.
In order to make it work modify the interrupt handler to be run from irq
handler thread.
Fixes: 5545b7b9f2 ("net: wwan: t7xx: Add NAPI support")
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The probe() function is only used for the DP83822 PHY, leaving the
private data pointer uninitialized for the smaller DP83825/26 models.
While all uses of the private data structure are hidden in 82822 specific
callbacks, configuring the interrupt is shared across all models.
This causes a NULL pointer dereference on the smaller PHYs as it accesses
the private data unchecked. Verifying the pointer avoids that.
Fixes: 5dc39fd5ef ("net: phy: DP83822: Add ability to advertise Fiber connection")
Signed-off-by: Andre Kalb <andre.kalb@sma.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/Y9FzniUhUtbaGKU7@pc6682
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ASoC: Fixes for v6.2
An unfortunately large batch of fixes here, the numbers amplified
by several repeated fixes for patterns of bugs in multiple
drivers. Most of this is in the x86 drivers which are very
actively developed, the implementation of PCI shutdown is a fix
for issues with spamming warnings into the logs with a leaked
reference to the i915 driver.
If you call listen() and accept() on an already connect()ed
rose socket, accept() can successfully connect.
This is because when the peer socket sends data to sendmsg,
the skb with its own sk stored in the connected socket's
sk->sk_receive_queue is connected, and rose_accept() dequeues
the skb waiting in the sk->sk_receive_queue.
This creates a child socket with the sk of the parent
rose socket, which can cause confusion.
Fix rose_listen() to return -EINVAL if the socket has
already been successfully connected, and add lock_sock
to prevent this issue.
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230125105944.GA133314@ubuntu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recent sfc NICs are TSO capable for some tunnel protocols. However, it
was not working properly because the feature was not advertised in
hw_enc_features, but in hw_features only.
Setting up a GENEVE tunnel and using iperf3 to send IPv4 and IPv6 traffic
to the tunnel show, with tcpdump, that the IPv4 packets still had ~64k
size but the IPv6 ones had only ~1500 bytes (they had been segmented by
software, not offloaded). With this patch segmentation is offloaded as
expected and the traffic is correctly received at the other end.
Fixes: 24b2c3751a ("sfc: advertise encapsulated offloads on EF10")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230125143513.25841-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
bpf 2023-01-27
We've added 10 non-merge commits during the last 9 day(s) which contain
a total of 10 files changed, 170 insertions(+), 59 deletions(-).
The main changes are:
1) Fix preservation of register's parent/live fields when copying
range-info, from Eduard Zingerman.
2) Fix an off-by-one bug in bpf_mem_cache_idx() to select the right
cache, from Hou Tao.
3) Fix stack overflow from infinite recursion in sock_map_close(),
from Jakub Sitnicki.
4) Fix missing btf_put() in register_btf_id_dtor_kfuncs()'s error path,
from Jiri Olsa.
5) Fix a splat from bpf_setsockopt() via lsm_cgroup/socket_sock_rcv_skb,
from Kui-Feng Lee.
6) Fix bpf_send_signal[_thread]() helpers to hold a reference on the task,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix the kernel crash caused by bpf_setsockopt().
selftests/bpf: Cover listener cloning with progs attached to sockmap
selftests/bpf: Pass BPF skeleton to sockmap_listen ops tests
bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener
bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself
bpf: Add missing btf_put to register_btf_id_dtor_kfuncs
selftests/bpf: Verify copy_register_state() preserves parent/live fields
bpf: Fix to preserve reg parent/live fields when copying range info
bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
bpf: Fix off-by-one error in bpf_mem_cache_idx()
====================
Link: https://lore.kernel.org/r/20230127215820.4993-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
GSO should not merge page pool recycled frames with standard reference
counted frames. Traditionally this didn't occur, at least not often.
However as we start looking at adding support for wireless adapters there
becomes the potential to mix the two due to A-MSDU repartitioning frames in
the receive path. There are possibly other places where this may have
occurred however I suspect they must be few and far between as we have not
seen this issue until now.
Fixes: 53e0961da1 ("page_pool: add frag page recycling support in page pool")
Reported-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Magnus Karlsson says:
====================
net: xdp: execute xdp_do_flush() before napi_complete_done()
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found in [1].
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in [2].
The drivers have only been compile-tested since I do not own any of
the HW below. So if you are a maintainer, it would be great if you
could take a quick look to make sure I did not mess something up.
Note that these were the drivers I found that violated the ordering by
running a simple script and manually checking the ones that came up as
potential offenders. But the script was not perfect in any way. There
might still be offenders out there, since the script can generate
false negatives.
[1] https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
[2] https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
====================
Link: https://lore.kernel.org/r/20230125074901.2737-1-magnus.karlsson@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.
Fixes: d678be1dc1 ("dpaa2-eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.
Fixes: a1e031ffb4 ("dpaa_eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.
Fixes: 186b3c998c ("virtio-net: support XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.
Fixes: a825b611c7 ("net: lan966x: Add support for XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.
Fixes: d1b25b79e1 ("qede: add .ndo_xdp_xmit() and XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull cifs fix from Steve French:
"Fix for reconnect oops in smbdirect (RDMA), also is marked for stable"
* tag '6.2-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix oops due to uncleared server->smbd_conn in reconnect
Pull io_uring fixes from Jens Axboe:
"Two small fixes for this release:
- Sanitize how async prep is done for drain requests, so we ensure
that it always gets done (Dylan)
- A ring provided buffer recycling fix for multishot receive (me)"
* tag 'io_uring-6.2-2023-01-27' of git://git.kernel.dk/linux:
io_uring: always prep_async for drain requests
io_uring/net: cache provided buffer group value for multishot receives
Pull tracing fixes from Steven Rostedt:
- Fix filter memory leak by calling ftrace_free_filter()
- Initialize trace_printk() earlier so that ftrace_dump_on_oops shows
data on early crashes.
- Update the outdated instructions in scripts/tracing/ftrace-bisect.sh
- Add lockdep_is_held() to fix lockdep warning
- Add allocation failure check in create_hist_field()
- Don't initialize pointer that gets set right away in enabled_monitors_write()
- Update MAINTAINER entries
- Fix help messages in Kconfigs
- Fix kernel-doc header for update_preds()
* tag 'trace-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
bootconfig: Update MAINTAINERS file to add tree and mailing list
rv: remove redundant initialization of pointer ptr
ftrace: Maintain samples/ftrace
tracing/filter: fix kernel-doc warnings
lib: Kconfig: fix spellos
trace_events_hist: add check for return value of 'create_hist_field'
tracing/osnoise: Use built-in RCU list checking
tracing: Kconfig: Fix spelling/grammar/punctuation
ftrace/scripts: Update the instructions for ftrace-bisect.sh
tracing: Make sure trace_printk() can output as soon as it can be used
ftrace: Export ftrace_free_filter() to modules
Pull i2c fixes from Wolfram Sang:
"A bunch of driver fixes with a tiny bit of new IDs"
* tag 'i2c-for-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: rk3x: fix a bunch of kernel-doc warnings
i2c: axxia: use 'struct' for kernel-doc notation
dt-bindings: i2c: renesas,rzv2m: Fix SoC specific string
i2c: mxs: suppress probe-deferral error message
i2c: designware-pci: Add new PCI IDs for AMD NAVI GPU
i2c: designware: Fix unbalanced suspended flag
i2c: designware: use casting of u64 in clock multiplication to avoid overflow
Pull gpio fixes from Bartosz Golaszewski:
- fix the -c option in the gpio-event-mode user-space example program
- fix the irq number translation in gpio-ep93xx and make its irqchip
immutable
- add a missing spin_unlock in error path in gpio-mxc
- fix a suspend breakage on System76 and Lenovo Gen2a introduced in
GPIO ACPI
* tag 'gpio-fixes-for-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
tools: gpio: fix -c option of gpio-event-mon
gpio: ep93xx: remove unused variable
gpio: ep93xx: Make irqchip immutable
gpio: ep93xx: Fix port F hwirq numbers in handler
gpio: mxc: Unlock on error path in mxc_flip_edge()
gpiolib-acpi: Don't set GPIOs for wakeup in S3 mode
Pull regulator fix from Mark Brown:
"A fix for the DT binding documentation which dropped a property when
being converted to YAML format causing spurious errors validating
device trees for platforms using the device"
* tag 'regulator-fix-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: samsung,s2mps14: add lost samsung,ext-control-gpios
Pull overlayfs fixes from Miklos Szeredi:
"Fix two bugs, a recent one introduced in the last cycle, and an older
one from v5.11"
* tag 'ovl-fixes-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fail on invalid uid/gid mapping at copy up
ovl: fix tmpfile leak
Pull drm fixes from Dave Airlie:
"Fairly small this week as well, i915 has a memory leak fix and some
minor changes, and amdgpu has some MST fixes, and some other minor
ones:
drm:
- DP MST kref fix
- fb_helper: check return value
i915:
- Fix BSC default context for Meteor Lake
- Fix selftest-scheduler's modify_type
- memory leak fix
amdgpu:
- GC11.x fixes
- SMU13.0.0 fix
- Freesync video fix
- DP MST fixes
- build fix"
* tag 'drm-fixes-2023-01-27' of git://anongit.freedesktop.org/drm/drm:
amdgpu: fix build on non-DCN platforms.
drm/amd/display: Fix timing not changning when freesync video is enabled
drm/display/dp_mst: Correct the kref of port.
drm/amdgpu/display/mst: update mst_mgr relevant variable when long HPD
drm/amdgpu/display/mst: limit payload to be updated one by one
drm/amdgpu/display/mst: Fix mst_state->pbn_div and slot count assignments
drm/amdgpu: declare firmware for new MES 11.0.4
drm/amdgpu: enable imu firmware for GC 11.0.4
drm/amd/pm: add missing AllowIHInterrupt message mapping for SMU13.0.0
drm/amdgpu: remove unconditional trap enable on add gfx11 queues
drm/fb-helper: Use a per-driver FB deferred I/O handler
drm/fb-helper: Check fb_deferred_io_init() return value
drm/i915/selftest: fix intel_selftest_modify_policy argument types
drm/i915/mtl: Fix bcs default context
drm/i915: Fix a memory leak with reused mmap_offset
drm/drm_vma_manager: Add drm_vma_node_allow_once()
Pull ACPI fixes from Rafael Wysocki:
"Add ACPI backlight handling quirks for 3 machines (Hans de Goede)"
* tag 'acpi-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Add backlight=native DMI quirk for Asus U46E
ACPI: video: Add backlight=native DMI quirk for HP EliteBook 8460p
ACPI: video: Add backlight=native DMI quirk for HP Pavilion g6-1d80nr
Pull thermal control fixes from Rafael Wysocki:
"Add locking to the Intel int340x thermal control driver to prevent its
thermal zone callbacks from racing with firmware-induced thermal trip
point updates (Srinivas Pandruvada, Rafael Wysocki)"
* tag 'thermal-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type()
thermal: intel: int340x: Protect trip temperature from concurrent updates
Pull arm64 fix from Will Deacon:
- Fix event counting regression in Arm CMN PMU driver due to broken
optimisation
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
Pull RISC-V fixes from Palmer Dabbelt:
- A few DT bindings fixes to more closely align the ISA string
requirements between the bindings and the ISA manual.
- A handful of build error/warning fixes.
- A fix to move init_cpu_topology() later in the boot flow, so it can
allocate memory.
- The IRC channel is now in the MAINTAINERS file, so it's easier to
find.
* tag 'riscv-for-linus-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Move call to init_cpu_topology() to later initialization stage
riscv/kprobe: Fix instruction simulation of JALR
riscv: fix -Wundef warning for CONFIG_RISCV_BOOT_SPINWAIT
MAINTAINERS: add an IRC entry for RISC-V
RISC-V: fix compile error from deduplicated __ALTERNATIVE_CFG_2
dt-bindings: riscv: fix single letter canonical order
dt-bindings: riscv: fix underscore requirement for multi-letter extensions
Pull ARM fixes from Russell King:
- fix nommu assignment build warning
- fix -Wundef preprocessor warning
- reduce __thumb2__ definitions for crypto files that require it
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9287/1: Reduce __thumb2__ definition to crypto files that require it
ARM: 9284/1: include <asm/pgtable.h> from proc-macros.S to fix -Wundef warnings
ARM: 9280/1: mm: fix warning on phys_addr_t to void pointer assignment
Pull Kselftest fixes from Shuah Khan:
"A single fix to a amd-pstate test Makefile bug that deletes source
files during make clean run"
* tag 'linux-kselftest-fixes-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: amd-pstate: Don't delete source files via Makefile
The PF controls the set of queues that the RDMA auxiliary_driver requests
resources from. The set_channel command will alter that pool and trigger a
reconfiguration of the VSI, which breaks RDMA functionality.
Prevent set_channel from executing when RDMA driver bound to auxiliary
device.
Adding a locked variable to pass down the call chain to avoid double
locking the device_lock.
Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When calling spidev_message() from the one of the ioctl() callbacks, the
spi_lock is already taken. When we then end up calling spidev_sync(), we
get the following splat:
[ 214.047619]
[ 214.049198] ============================================
[ 214.054533] WARNING: possible recursive locking detected
[ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
[ 214.065969] --------------------------------------------
[ 214.071290] spidev_test/1454 is trying to acquire lock:
[ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
[ 214.084164]
[ 214.084164] but task is already holding lock:
[ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
[ 214.097537]
[ 214.097537] other info that might help us debug this:
[ 214.104075] Possible unsafe locking scenario:
[ 214.104075]
[ 214.110004] CPU0
[ 214.112461] ----
[ 214.114916] lock(&spidev->spi_lock);
[ 214.118687] lock(&spidev->spi_lock);
[ 214.122457]
[ 214.122457] *** DEADLOCK ***
[ 214.122457]
[ 214.128386] May be due to missing lock nesting notation
[ 214.128386]
[ 214.135183] 2 locks held by spidev_test/1454:
[ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
[ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
[ 214.155493]
[ 214.155493] stack backtrace:
[ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
[ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 214.175555] unwind_backtrace from show_stack+0x10/0x14
[ 214.180819] show_stack from dump_stack_lvl+0x60/0x90
[ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858
[ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378
[ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8
[ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24
[ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
[ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c
[ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c
[ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
[ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0
[ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
[ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
Fix it by introducing an unlocked variant of spidev_sync() and calling it
from spidev_message() while other users who don't check the spidev->spi's
existence keep on using the locking flavor.
Reported-by: Francesco Dolcini <francesco@dolcini.it>
Fixes: 1f4d2dd45b ("spi: spidev: fix a race condition when accessing spidev->spi")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Max Krummenacher <max.krummenacher@toradex.com>
Link: https://lore.kernel.org/r/20230116144149.305560-1-brgl@bgdev.pl
Signed-off-by: Mark Brown <broonie@kernel.org>
Due to using the u16 type in the min_t() macros the SPI transfer length
will be cast to word before participating in the conditional statement
implied by the macro. Thus if the transfer length is greater than 64KB the
Tx/Rx FIFO threshold level value will be determined by the leftover of the
truncated after the type-case length. In the worst case it will cause the
dramatical performance drop due to the "Tx FIFO Empty" or "Rx FIFO Full"
interrupts triggered on each xfer word sent/received to/from the bus.
The problem can be easily fixed by specifying the unsigned int type in the
min_t() macros thus preventing the possible data loss.
Fixes: ea11370fff ("spi: dw: get TX level without an additional variable")
Reported-by: Sergey Nazarov <Sergey.Nazarov@baikalelectronics.ru>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230113185942.2516-1-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
If st_uid/st_gid doesn't have a mapping in the mounter's user_ns, then
copy-up should fail, just like it would fail if the mounter task was doing
the copy using "cp -a".
There's a corner case where the "cp -a" would succeed but copy up fail: if
there's a mapping of the invalid uid/gid (65534 by default) in the user
namespace. This is because stat(2) will return this value if the mapping
doesn't exist in the current user_ns and "cp -a" will in turn be able to
create a file with this uid/gid.
This behavior would be inconsistent with POSIX ACL's, which return -1 for
invalid uid/gid which result in a failed copy.
For consistency and simplicity fail the copy of the st_uid/st_gid are
invalid.
Fixes: 459c7c565a ("ovl: unprivieged mounts")
Cc: <stable@vger.kernel.org> # v5.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Seth Forshee <sforshee@kernel.org>
In the rework of raid56 code, there is very limited concurrency in the
endio context.
Most of the work is done inside the sectors arrays, which different bios
will never touch the same sector.
But there is a concurrency here for error_bitmap. Both read and write
endio functions need to touch them, and we can have multiple write bios
touching the same error bitmap if they all hit some errors.
Here we fix the unprotected bitmap operation by going set_bit() in a
loop.
Since we have a very small ceiling of the sectors (at most 16 sectors),
such set_bit() in a loop should be very acceptable.
Fixes: 2942a50dea ("btrfs: raid56: introduce btrfs_raid_bio::error_bitmap")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The arg->clone_sources_count is u64 and can trigger a warning when a
huge value is passed from user space and a huge array is allocated.
Limit the allocated memory to 8MiB (can be increased if needed), which
in turn limits the number of clone sources to 8M / sizeof(struct
clone_root) = 8M / 40 = 209715. Real world number of clones is from
tens to hundreds, so this is future proof.
Reported-by: syzbot+4376a9a073770c173269@syzkaller.appspotmail.com
Signed-off-by: David Sterba <dsterba@suse.com>
Drain requests all go through io_drain_req, which has a quick exit in case
there is nothing pending (ie the drain is not useful). In that case it can
run the issue the request immediately.
However for safety it queues it through task work.
The problem is that in this case the request is run asynchronously, but
the async work has not been prepared through io_req_prep_async.
This has not been a problem up to now, as the task work always would run
before returning to userspace, and so the user would not have a chance to
race with it.
However - with IORING_SETUP_DEFER_TASKRUN - this is no longer the case and
the work might be defered, giving userspace a chance to change data being
referred to in the request.
Instead _always_ prep_async for drain requests, which is simpler anyway
and removes this issue.
Cc: stable@vger.kernel.org
Fixes: c0e0d6ba25 ("io_uring: add IORING_SETUP_DEFER_TASKRUN")
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20230127105911.2420061-1-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Following line should listen for a rising edge and exit after the first
one since '-c 1' is provided.
# gpio-event-mon -n gpiochip1 -o 0 -r -c 1
It works with kernel 4.19 but it doesn't work with 5.10. In 5.10 the
above command doesn't exit after the first rising edge it keep listening
for an event forever. The '-c 1' is not taken into an account.
The problem is in commit 62757c32d5 ("tools: gpio: add multi-line
monitoring to gpio-event-mon").
Before this commit the iterator 'i' in monitor_device() is used for
counting of the events (loops). In the case of the above command (-c 1)
we should start from 0 and increment 'i' only ones and hit the 'break'
statement and exit the process. But after the above commit counting
doesn't start from 0, it start from 1 when we listen on one line.
It is because 'i' is used from one more purpose, counting of lines
(num_lines) and it isn't restore to 0 after following code
for (i = 0; i < num_lines; i++)
gpiotools_set_bit(&values.mask, i);
Restore the initial value of the iterator to 0 in order to allow counting
of loops to work for any cases.
Fixes: 62757c32d5 ("tools: gpio: add multi-line monitoring to gpio-event-mon")
Signed-off-by: Ivo Borisov Shopov <ivoshopov@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
[Bartosz: tweak the commit message]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
This one was left behind by a previous cleanup patch:
drivers/gpio/gpio-ep93xx.c: In function 'ep93xx_gpio_add_bank':
drivers/gpio/gpio-ep93xx.c:366:34: error: unused variable 'ic' [-Werror=unused-variable]
Fixes: 216f37366e ("gpio: ep93xx: Make irqchip immutable")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Al Viro said:
"""
Since "vhost/scsi: fix reuse of &vq->iov[out] in response"
we have this:
cmd->tvc_resp_iov = vq->iov[vc.out];
cmd->tvc_in_iovs = vc.in;
combined with
iov_iter_init(&iov_iter, ITER_DEST, &cmd->tvc_resp_iov,
cmd->tvc_in_iovs, sizeof(v_rsp));
in vhost_scsi_complete_cmd_work(). We used to have ->tvc_resp_iov
_pointing_ to vq->iov[vc.out]; back then iov_iter_init() asked to
set an iovec-backed iov_iter over the tail of vq->iov[], with
length being the amount of iovecs in the tail.
Now we have a copy of one element of that array. Fortunately, the members
following it in the containing structure are two non-NULL kernel pointers,
so copy_to_iter() will not copy anything beyond the first iovec - kernel
pointer is not (on the majority of architectures) going to be accepted by
access_ok() in copyout() and it won't be skipped since the "length" (in
reality - another non-NULL kernel pointer) won't be zero.
So it's not going to give a guest-to-qemu escalation, but it's definitely
a bug. Frankly, my preference would be to verify that the very first iovec
is long enough to hold rsp_size. Due to the above, any users that try to
give us vq->iov[vc.out].iov_len < sizeof(struct virtio_scsi_cmd_resp)
would currently get a failure in vhost_scsi_complete_cmd_work()
anyway.
"""
However, the spec doesn't say anything about the legacy descriptor
layout for the respone. So this patch tries to not assume the response
to reside in a single separate descriptor which is what commit
79c14141a4 ("vhost/scsi: Convert completion path to use") tries to
achieve towards to ANY_LAYOUT.
This is done by allocating and using dedicate resp iov in the
command. To be safety, start with UIO_MAXIOV to be consistent with the
limitation that we advertise to the vhost_get_vq_desc().
Testing with the hacked virtio-scsi driver that use 1 descriptor for 1
byte in the response.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Fixes: a77ec83a57 ("vhost/scsi: fix reuse of &vq->iov[out] in response")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230119073647.76467-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Fix the build caused by missing kmsan_handle_dma() and is_power_of_2() that
are used in drivers/virtio/virtio_ring.c.
Signed-off-by: Shunsuke Mie <mie@igel.co.jp>
Message-Id: <20230110034310.779744-1-mie@igel.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the vhost iotlb is used along with a guest virtual iommu
and the guest gets rebooted, some MISS messages may have been
recorded just before the reboot and spuriously executed by
the virtual iommu after the reboot.
As vhost does not have any explicit reset user API,
VHOST_NET_SET_BACKEND looks a reasonable point where to clear
the pending messages, in case the backend is removed.
Export vhost_clear_msg() and call it in vhost_net_set_backend()
when fd == -1.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Fixes: 6b1e6cc785 ("vhost: new device IOTLB API")
Message-Id: <20230117151518.44725-3-eric.auger@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We change recently the memalloc helper to use
dma_alloc_noncontiguous() and the fallback to get_pages(). Although
lots of issues with IOMMU (or non-IOMMU) have been addressed, but
there seems still a regression on Xen PV. Interestingly, the only
proper way to work is use dma_alloc_coherent(). The use of
dma_alloc_coherent() for SG buffer was dropped as it's problematic on
IOMMU systems. OTOH, Xen PV has a different way, and it's fine to use
the dma_alloc_coherent().
This patch is a workaround for Xen PV. It consists of the following
changes:
- For Xen PV, use only the fallback allocation without
dma_alloc_noncontiguous()
- In the fallback allocation, use dma_alloc_coherent();
the DMA address from dma_alloc_coherent() is returned in get_addr
ops
- The DMA addresses are stored in an array; the first entry stores the
number of allocated pages in lower bits, which are referred at
releasing pages again
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Fixes: a8d302a0b7 ("ALSA: memalloc: Revive x86-specific WC page allocations again")
Fixes: 9736a32513 ("ALSA: memalloc: Don't fall back for SG-buffer with IOMMU")
Link: https://lore.kernel.org/r/87tu256lqs.wl-tiwai@suse.de
Link: https://lore.kernel.org/r/20230125153104.5527-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The kernel crash was caused by a BPF program attached to the
"lsm_cgroup/socket_sock_rcv_skb" hook, which performed a call to
`bpf_setsockopt()` in order to set the TCP_NODELAY flag as an
example. Flags like TCP_NODELAY can prompt the kernel to flush a
socket's outgoing queue, and this hook
"lsm_cgroup/socket_sock_rcv_skb" is frequently triggered by
softirqs. The issue was that in certain circumstances, when
`tcp_write_xmit()` was called to flush the queue, it would also allow
BH (bottom-half) to run. This could lead to our program attempting to
flush the same socket recursively, which caused a `skbuff` to be
unlinked twice.
`security_sock_rcv_skb()` is triggered by `tcp_filter()`. This occurs
before the sock ownership is checked in `tcp_v4_rcv()`. Consequently,
if a bpf program runs on `security_sock_rcv_skb()` while under softirq
conditions, it may not possess the lock needed for `bpf_setsockopt()`,
thus presenting an issue.
The patch fixes this issue by ensuring that a BPF program attached to
the "lsm_cgroup/socket_sock_rcv_skb" hook is not allowed to call
`bpf_setsockopt()`.
The differences from v1 are
- changing commit log to explain holding the lock of the sock,
- emphasizing that TCP_NODELAY is not the only flag, and
- adding the fixes tag.
v1: https://lore.kernel.org/bpf/20230125000244.1109228-1-kuifeng@meta.com/
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Fixes: 9113d7e48e ("bpf: expose bpf_{g,s}etsockopt to lsm cgroup")
Link: https://lore.kernel.org/r/20230127001732.4162630-1-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This reverts commit 948e922fc4.
Not all targets that return PQ=1 and PDT=0 should be ignored. While
the SCSI spec is vague in this department, there appears to be a
critical mass of devices which rely on devices being accessible with
this combination of reported values.
Fixes: 948e922fc4 ("scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT")
Link: https://lore.kernel.org/r/yq1lelrleqr.fsf@ca-mkp.ca.oracle.com
Acked-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin Wilck <mwilck@suse.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 622113b9f1 ("drm/ssd130x: Replace simple display helpers with the
atomic helpers") changed the driver to just use the atomic helpers instead
of the simple KMS abstraction layer.
But the commit also made a subtle change on the display power sequence and
initialization order, by moving the ssd130x_power_on() call to the encoder
.atomic_enable handler and the ssd130x_init() call to CRTC .reset handler.
Before this change, both ssd130x_power_on() and ssd130x_init() were called
in the simple display pipeline .enable handler, so the display was already
initialized by the time the SSD130X_DISPLAY_ON command was sent.
For some reasons, it only made the ssd130x SPI driver to fail but the I2C
was still working. That is the reason why the bug was not noticed before.
To revert to the old driver behavior, move the ssd130x_init() call to the
encoder .atomic_enable as well. Besides fixing the panel not being turned
on when using SPI, it also gets rid of the custom CRTC .reset callback.
Fixes: 622113b9f1 ("drm/ssd130x: Replace simple display helpers with the atomic helpers")
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125184230.3343206-1-javierm@redhat.com
Pull x86 platform driver fixes from Hans de Goede:
- Fix false positive apple_gmux backlight detection on older iGPU only
MacBook models
- Various other small fixes and hardware-id additions
* tag 'platform-drivers-x86-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Fix profile modes on Intel platforms
ACPI: video: Fix apple gmux detection
platform/x86: apple-gmux: Add apple_gmux_detect() helper
platform/x86: apple-gmux: Move port defines to apple-gmux.h
platform/x86: hp-wmi: Fix cast to smaller integer type warning
platform/x86/amd: pmc: Add a module parameter to disable workarounds
platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN
platform/x86: asus-wmi: Fix kbd_dock_devid tablet-switch reporting
platform/x86: gigabyte-wmi: add support for B450M DS3H WIFI-CF
platform/x86: hp-wmi: Handle Omen Key event
platform/x86: dell-wmi: Add a keymap for KEY_MUTE in type 0x0010 table
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Current release - regressions:
- sched: sch_taprio: do not schedule in taprio_reset()
Previous releases - regressions:
- core: fix UaF in netns ops registration error path
- ipv4: prevent potential spectre v1 gadgets
- ipv6: fix reachability confirmation with proxy_ndp
- netfilter: fix for the set rbtree
- eth: fec: use page_pool_put_full_page when freeing rx buffers
- eth: iavf: fix temporary deadlock and failure to set MAC address
Previous releases - always broken:
- netlink: prevent potential spectre v1 gadgets
- netfilter: fixes for SCTP connection tracking
- mctp: struct sock lifetime fixes
- eth: ravb: fix possible hang if RIS2_QFF1 happen
- eth: tg3: resolve deadlock in tg3_reset_task() during EEH
Misc:
- Mat stepped out as MPTCP co-maintainer"
* tag 'net-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
net: mdio-mux-meson-g12a: force internal PHY off on mux switch
docs: networking: Fix bridge documentation URL
tsnep: Fix TX queue stop/wake for multiple queues
net/tg3: resolve deadlock in tg3_reset_task() during EEH
net: mctp: mark socks as dead on unhash, prevent re-add
net: mctp: hold key reference when looking up a general key
net: mctp: move expiry timer delete to unhash
net: mctp: add an explicit reference from a mctp_sk_key to sock
net: ravb: Fix possible hang if RIS2_QFF1 happen
net: ravb: Fix lack of register setting after system resumed for Gen3
net/x25: Fix to not accept on connected socket
ice: move devlink port creation/deletion
sctp: fail if no bound addresses can be used for a given scope
net/sched: sch_taprio: do not schedule in taprio_reset()
Revert "Merge branch 'ethtool-mac-merge'"
netrom: Fix use-after-free of a listening socket.
netfilter: conntrack: unify established states for SCTP paths
Revert "netfilter: conntrack: add sctp DATA_SENT state"
netfilter: conntrack: fix bug in for_each_sctp_chunk
netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
...
I'm not exactly clear on what strange workflow causes people to do it,
but clearly occasionally some files end up being committed as executable
even though they clearly aren't.
This is a reprise of commit 90fda63fa1 ("treewide: fix up files
incorrectly marked executable"), just with a different set of files (but
with the same trivial shell scripting).
So apparently we need to re-do this every five years or so, and Joe
needs to just keep reminding me to do so ;)
Reported-by: Joe Perches <joe@perches.com>
Fixes: 523375c943 ("drm/vmwgfx: Port vmwgfx to arm64")
Fixes: 5c43993777 ("ASoC: codecs: add support for ES8326")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While looking through legacy platform data users, I noticed that
the DT probing never uses data from the DT properties, as the
platform_data structure gets overwritten directly after it
is initialized.
There have never been any boards defining the platform_data in
the mainline kernel either, so this driver so far only worked
with patched kernels or with the default values.
For the benefit of possible downstream users, fix the DT probe
by no longer overwriting the data.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230126162203.2986339-1-arnd@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
It turns out the optimisation implemented by commit 4f2c3872dd is
totally broken, since all the places that consume hw->dtcs_used for
events other than cycle count are still not expecting it to be sparsely
populated, and fail to read all the relevant DTC counters correctly if
so.
If implemented correctly, the optimisation potentially saves up to 3
register reads per event update, which is reasonably significant for
events targeting a single node, but still not worth a massive amount of
additional code complexity overall. Getting it right within the current
design looks a fair bit more involved than it was ever intended to be,
so let's just make a functional revert which restores the old behaviour
while still backporting easily.
Fixes: 4f2c3872dd ("perf/arm-cmn: Optimise DTC counter accesses")
Reported-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b41bb4ed7283c3d8400ce5cf5e6ec94915e6750f.1674498637.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Currently, when resetting the USB modem via AT commands, the modem is
no longer re-connected.
This problem is caused by the incorrect description of the USB_OTG2_OC
pad. It should have pull-up enabled, hysteresis enabled and the
property 'over-current-active-low' should be passed.
With this change, the USB modem can be successfully re-connected
after a reset.
Cc: stable@vger.kernel.org
Fixes: 9ac0ae97e3 ("ARM: dts: imx7d-smegw01: Add support for i.MX7D SMEGW01 board")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Currently if suspending using either freeze or memory state, the fec
driver tries to power down the phy which leads to crash of the kernel
and non-responsible kernel with the following call trace:
[ 24.839889 ] Call trace:
[ 24.839892 ] phy_error+0x18/0x60
[ 24.839898 ] kszphy_handle_interrupt+0x6c/0x80
[ 24.839903 ] phy_interrupt+0x20/0x2c
[ 24.839909 ] irq_thread_fn+0x30/0xa0
[ 24.839919 ] irq_thread+0x178/0x2c0
[ 24.839925 ] kthread+0x154/0x160
[ 24.839932 ] ret_from_fork+0x10/0x20
Since there is currently no functionality in the phy subsystem to power
down phys let's just disable the feature of powering-down the ethernet
phy.
Fixes: 6a57f224f7 ("arm64: dts: freescale: add initial support for verdin imx8m mini")
Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The majority of device trees in arch/arm64/boot/dts/freescale/ are
built around i.MX SoCs with the rest being for Layerscape. Yet, calling
get_maintainers.pl -f on this directory will not match the MAINTAINERS
entry, because the directory name doesn't contain the substring "imx".
Add an explicit file match for the directory and exclude the Layerscape
specific files. This ensures To/Cc is not only generated from git
history, but takes e.g. the R: entries into account as well.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
netif_stop_queue() and netif_wake_queue() act on TX queue 0. This is ok
as long as only a single TX queue is supported. But support for multiple
TX queues was introduced with 762031375d and I missed to adapt stop
and wake of TX queues.
Use netif_stop_subqueue() and netif_tx_wake_queue() to act on specific
TX queue.
Fixes: 762031375d ("tsnep: Support multiple TX/RX queue pairs")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://lore.kernel.org/r/20230124191440.56887-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During EEH error injection testing, a deadlock was encountered in the tg3
driver when tg3_io_error_detected() was attempting to cancel outstanding
reset tasks:
crash> foreach UN bt
...
PID: 159 TASK: c0000000067c6000 CPU: 8 COMMAND: "eehd"
...
#5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
#6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
#7 [c00000000681faf0] eeh_report_error at c00000000004e25c
...
PID: 290 TASK: c000000036e5f800 CPU: 6 COMMAND: "kworker/6:1"
...
#4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
#5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
#6 [c00000003721fc60] process_one_work at c00000000019e5c4
...
PID: 296 TASK: c000000037a65800 CPU: 21 COMMAND: "kworker/21:1"
...
#4 [c000000037247bc0] rtnl_lock at c000000000c940d8
#5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
#6 [c000000037247c60] process_one_work at c00000000019e5c4
...
PID: 655 TASK: c000000036f49000 CPU: 16 COMMAND: "kworker/16:2"
...:1
#4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
#5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
#6 [c0000000373ebc60] process_one_work at c00000000019e5c4
...
Code inspection shows that both tg3_io_error_detected() and
tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
their code blocks. If tg3_reset_task() should happen to execute between
the times when tg3_io_error_deteced() acquires the RTNL lock and
tg3_reset_task_cancel() is called, a deadlock will occur.
Moving tg3_reset_task_cancel() call earlier within the code block, prior
to acquiring RTNL, prevents this from happening, but also exposes another
deadlock issue where tg3_reset_task() may execute AFTER
tg3_io_error_detected() has executed:
crash> foreach UN bt
PID: 159 TASK: c0000000067d2000 CPU: 9 COMMAND: "eehd"
...
#4 [c000000006867a60] rtnl_lock at c000000000c940d8
#5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
#6 [c000000006867b00] eeh_report_reset at c00000000004de88
...
PID: 363 TASK: c000000037564000 CPU: 6 COMMAND: "kworker/6:1"
...
#3 [c000000036c1bb70] msleep at c000000000259e6c
#4 [c000000036c1bba0] napi_disable at c000000000c6b848
#5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
#6 [c000000036c1bc60] process_one_work at c00000000019e5c4
...
This issue can be avoided by aborting tg3_reset_task() if EEH error
recovery is already in progress.
Fixes: db84bf43ef ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When user switch samba to ksmbd, The following message flood is coming
when accessing files. Samba seems to changs dos attribute version to v5.
This patch downgrade ndr version error message to debug.
$ dmesg
...
[68971.766914] ksmbd: v5 version is not supported
[68971.779808] ksmbd: v5 version is not supported
[68971.871544] ksmbd: v5 version is not supported
[68971.910135] ksmbd: v5 version is not supported
...
Cc: stable@vger.kernel.org
Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Stream protocol length will never be larger than 16KB until session setup.
After session setup, the size of requests will not be larger than
16KB + SMB2 MAX WRITE size. This patch limits these invalidly oversized
requests and closes the connection immediately.
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18259
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The previous algorithm was pretty broken.
- The inner loop had a '(m > m_max)' condition, and the value of 'm'
would increase in each iteration;
- Each iteration would actually multiply 'm' by two, so it is not needed
to re-compute the whole equation at each iteration;
- It would loop until (m & 1) == 0, which means it would loop at most
once.
- The outer loop would divide the 'n' value by two at the end of each
iteration. This meant that for a 12 MHz parent clock and a 1.2 GHz
requested clock, it would first try n=12, then n=6, then n=3, then
n=1, none of which would work; the only valid value is n=2 in this
case.
Simplify this algorithm with a single for loop, which decrements 'n'
after each iteration, addressing all of the above problems.
Fixes: bdbfc02937 ("clk: ingenic: Add support for the JZ4760")
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Link: https://lore.kernel.org/r/20221214123704.7305-1-paul@crapouillou.net
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
The cxl_pmem.ko module houses the driver for both cxl_nvdimm_bridge
objects and cxl_nvdimm objects. When the core creates a cxl_nvdimm it
arranges for it to be autoremoved when the bridge goes down. However, if
the bridge never initialized because the cxl_pmem.ko module never
loaded, it sets up a the following crash scenario:
BUG: kernel NULL pointer dereference, address: 0000000000000478
[..]
RIP: 0010:cxl_nvdimm_probe+0x99/0x140 [cxl_pmem]
[..]
Call Trace:
<TASK>
cxl_bus_probe+0x17/0x50 [cxl_core]
really_probe+0xde/0x380
__driver_probe_device+0x78/0x170
driver_probe_device+0x1f/0x90
__driver_attach+0xd2/0x1c0
bus_for_each_dev+0x79/0xc0
bus_add_driver+0x1b1/0x200
driver_register+0x89/0xe0
cxl_pmem_init+0x50/0xff0 [cxl_pmem]
It turns out the recent rework to simplify nvdimm probing obviated the
need to unregister cxl_nvdimm objects at cxl_nvdimm_bridge ->remove()
time. Leave the cxl_nvdimm device registered until the hosting
cxl_memdev departs. The alternative is that the cxl_memdev needs to be
reattached whenever the cxl_nvdimm_bridge attach state cycles, which is
awkward and unnecessary.
The only requirement is to make sure that when the cxl_nvdimm_bridge
goes away any dependent cxl_nvdimm objects are shutdown. Handle that in
unregister_nvdimm_bus().
With these registration entanglements removed there is no longer a need
to pre-load the cxl_pmem module in cxl_acpi.
Fixes: cb9cfff82f ("cxl/acpi: Simplify cxl_nvdimm_bridge probing")
Reported-by: Gregory Price <gregory.price@memverge.com>
Debugged-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167426077263.3955046.9695309346988027311.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Lockdep reports that acpi_nfit_shutdown() may deadlock against an
opportune acpi_nfit_scrub(). acpi_nfit_scrub () is run from inside a
'work' and therefore has already acquired workqueue-internal locks. It
also acquiires acpi_desc->init_mutex. acpi_nfit_shutdown() first
acquires init_mutex, and was subsequently attempting to cancel any
pending workqueue items. This reversed locking order causes a potential
deadlock:
======================================================
WARNING: possible circular locking dependency detected
6.2.0-rc3 #116 Tainted: G O N
------------------------------------------------------
libndctl/1958 is trying to acquire lock:
ffff888129b461c0 ((work_completion)(&(&acpi_desc->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x43/0x450
but task is already holding lock:
ffff888129b460e8 (&acpi_desc->init_mutex){+.+.}-{3:3}, at: acpi_nfit_shutdown+0x87/0xd0 [nfit]
which lock already depends on the new lock.
...
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&acpi_desc->init_mutex);
lock((work_completion)(&(&acpi_desc->dwork)->work));
lock(&acpi_desc->init_mutex);
lock((work_completion)(&(&acpi_desc->dwork)->work));
*** DEADLOCK ***
Since the workqueue manipulation is protected by its own internal locking,
the cancellation of pending work doesn't need to be done under
acpi_desc->init_mutex. Move cancel_delayed_work_sync() outside the
init_mutex to fix the deadlock. Any work that starts after
acpi_nfit_shutdown() drops the lock will see ARS_CANCEL, and the
cancel_delayed_work_sync() will safely flush it out.
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20230112-acpi_nfit_lockdep-v1-1-660be4dd10be@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
struct bkey has internal padding in a union, but it isn't always named
the same (e.g. key ## _pad, key_p, etc). This makes it extremely hard
for the compiler to reason about the available size of copies done
against such keys. Use unsafe_memcpy() for now, to silence the many
run-time false positive warnings:
memcpy: detected field-spanning write (size 264) of single field "&i->j" at drivers/md/bcache/journal.c:152 (size 240)
memcpy: detected field-spanning write (size 24) of single field "&b->key" at drivers/md/bcache/btree.c:939 (size 16)
memcpy: detected field-spanning write (size 24) of single field "&temp.key" at drivers/md/bcache/extents.c:428 (size 16)
Reported-by: Alexandre Pereira <alexpereira@disroot.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216785
Acked-by: Coly Li <colyli@suse.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230106060229.never.047-kees@kernel.org
[Why&How]
Switching between certain modes that are freesync video modes and those
are not freesync video modes result in timing not changing as seen by
the monitor due to incorrect timing being driven.
The issue is fixed by ensuring that when a non freesync video mode is
set, we reset the freesync status on the crtc.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There was a recent regression in btrfs/177 that started happening with
the size class patches ("btrfs: introduce size class to block group
allocator"). This however isn't a regression introduced by those
patches, but rather the bug was uncovered by a change in behavior in
these patches. The patches triggered more chunk allocations in the
^free-space-tree case, which uncovered a race with device shrink.
The problem is we will set the device total size to the new size, and
use this to find a hole for a device extent. However during shrink we
may have device extents allocated past this range, so we could
potentially find a hole in a range past our new shrink size. We don't
actually limit our found extent to the device size anywhere, we assume
that we will not find a hole past our device size. This isn't true with
shrink as we're relocating block groups and thus creating holes past the
device size.
Fix this by making sure we do not search past the new device size, and
if we wander into any device extents that start after our device size
simply break from the loop and use whatever hole we've already found.
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We take two stripe numbers if vertical errors are found. In case it is
just a pstripe it does not matter but in case of raid 6 it matters as
both stripes need to be fixed.
Fixes: 7a31507230 ("btrfs: raid56: do data csum verification during RMW cycle")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Tanmay Bhushan <007047221b@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[Why]
amdgpu expects to update payload table for one stream one time
by calling dm_helpers_dp_mst_write_payload_allocation_table().
Currently, it get modified to try to update HW payload table
at once by referring mst_state.
[How]
This is just a quick workaround. Should find way to remove the
temporary struct dc_dp_mst_stream_allocation_table later if set
struct link_mst_stream_allocatio directly is possible.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Fixes: 4d07b0bc40 ("drm/display/dp_mst: Move all payload info into the atomic state")
Cc: stable@vger.kernel.org # 6.1
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Tested-by: Didier Raboud <odyx@debian.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Looks like I made a pretty big mistake here without noticing: it seems when
I moved the assignments of mst_state->pbn_div I completely missed the fact
that the reason for us calling drm_dp_mst_update_slots() earlier was to
account for the fact that we need to call this function using info from the
root MST connector, instead of just trying to do this from each MST
encoder's atomic check function. Otherwise, we end up filling out all of
DC's link information with zeroes.
So, let's restore that and hopefully fix this DSC regression.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Fixes: 4d07b0bc40 ("drm/display/dp_mst: Move all payload info into the atomic state")
Cc: stable@vger.kernel.org # 6.1
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Didier Raboud <odyx@debian.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull fuse ACL fix from Christian Brauner:
"The new posix acl API doesn't depend on the xattr handler
infrastructure anymore and instead only relies on the posix acl inode
operations. As a result daemons without FUSE_POSIX_ACL are unable to
use posix acls like they used to.
Fix this by copying what we did for overlayfs during the posix acl api
conversion. Make fuse implement a dedicated ->get_inode_acl() method
as does overlayfs. Fuse can then also uses this to express different
needs for vfs permission checking during lookup and acl based
retrieval via the regular system call path.
This allows fuse to continue to refuse retrieving posix acls for
daemons that don't set FUSE_POSXI_ACL for permission checking while
also allowing a fuse server to retrieve it via the usual system calls"
* tag 'fs.fuse.acl.v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fuse: fixes after adapting to new posix acl api
Use the 'struct' keyword for a struct's kernel-doc notation and
use the correct function parameter name to eliminate kernel-doc
warnings:
kernel/trace/trace_events_filter.c:136: warning: cannot understand function prototype: 'struct prog_entry '
kerne/trace/trace_events_filter.c:155: warning: Excess function parameter 'when_to_branch' description in 'update_preds'
Also correct some trivial punctuation problems.
Link: https://lkml.kernel.org/r/20230108021238.16398-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently connect/disconnect of USB cable calls afunc_bind and
eventually increments the bNumEndpoints. Performing multiple
plugin/plugout will increment bNumEndpoints incorrectly, and on
the next plug-in it leads to invalid configuration of descriptor
and hence enumeration fails.
Fix this by resetting the value of bNumEndpoints to 1 on every
afunc_bind call.
Fixes: 40c73b3054 ("usb: gadget: f_uac2: add adaptive sync support for capture")
Cc: stable <stable@kernel.org>
Signed-off-by: Pratham Pratap <quic_ppratap@quicinc.com>
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
Link: https://lore.kernel.org/r/1674631645-28888-1-git-send-email-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This device has a touchscreen thats report a battery even if it doesn't
have one.
Ask Linux to ignore the battery so it will not always report it as low.
[jkosina@suse.cz: fix whitespace damage]
Signed-off-by: Marco Rodolfi <marco.rodolfi@tuta.io>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
In order to prevent int340x_thermal_get_trip_type() from possibly
racing with int340x_thermal_read_trips() invoked by int3403_notify()
add locking to it in analogy with int340x_thermal_get_trip_temp().
Fixes: 6757a7abe4 ("thermal: intel: int340x: Protect trip temperature from concurrent updates")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jeremy Kerr says:
====================
net: mctp: struct sock lifetime fixes
This series is a set of fixes for the sock lifetime handling in the
AF_MCTP code, fixing a uaf reported by Noam Rathaus
<noamr@ssd-disclosure.com>.
The Fixes: tags indicate the original patches affected, but some
tweaking to backport to those commits may be needed; I have a separate
branch with backports to 5.15 if that helps with stable trees.
Of course, any comments/queries most welcome.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Once a socket has been unhashed, we want to prevent it from being
re-used in a sk_key entry as part of a routing operation.
This change marks the sk as SOCK_DEAD on unhash, which prevents addition
into the net's key list.
We need to do this during the key add path, rather than key lookup, as
we release the net keys_lock between those operations.
Fixes: 4a992bbd36 ("mctp: Implement message fragmentation & reassembly")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we have a race where we look up a sock through a "general"
(ie, not directly associated with the (src,dest,tag) tuple) key, then
drop the key reference while still holding the key's sock.
This change expands the key reference until we've finished using the
sock, and hence the sock reference too.
Commit message changes from Jeremy Kerr <jk@codeconstruct.com.au>.
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Fixes: 73c618456d ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we delete the key expiry timer (in sk->close) before
unhashing the sk. This means that another thread may find the sk through
its presence on the key list, and re-queue the timer.
This change moves the timer deletion to the unhash, after we have made
the key no longer observable, so the timer cannot be re-queued.
Fixes: 7b14e15ae6 ("mctp: Implement a timeout for tags")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we correlate the mctp_sk_key lifetime to the sock lifetime
through the sock hash/unhash operations, but this is pretty tenuous, and
there are cases where we may have a temporary reference to an unhashed
sk.
This change makes the reference more explicit, by adding a hold on the
sock when it's associated with a mctp_sk_key, released on final key
unref.
Fixes: 73c618456d ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since this driver enables the interrupt by RIC2_QFE1, this driver
should clear the interrupt flag if it happens. Otherwise, the interrupt
causes to hang the system.
Note that this also fix a minor coding style (a comment indentation)
around the fixed code.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
After system entered Suspend to RAM, registers setting of this
hardware is reset because the SoC will be turned off. On R-Car Gen3
(info->ccc_gac), ravb_ptp_init() is called in ravb_probe() only. So,
after system resumed, it lacks of the initial settings for ptp. So,
add ravb_ptp_{init,stop}() into ravb_{resume,suspend}().
Fixes: f5d7837f96 ("ravb: ptp: Add CONFIG mode support")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix wrong translation of irq numbers in port F handler, as ep93xx hwirqs
increased by 1, we should simply decrease them by 1 in translation.
Fixes: 482c27273f ("ARM: ep93xx: renumber interrupts")
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
We recently added locking to this function but one error path was
over looked. Drop the lock before returning.
Fixes: e546427762 ("gpio: mxc: Protect GPIO irqchip RMW with bgpio spinlock")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Marek Vasut <marex@denx.de>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
My last commit to fix profile mode displays on AMD platforms caused
an issue on Intel platforms - sorry!
In it I was reading the current functional mode (MMC, PSC, AMT) from
the BIOS but didn't account for the fact that on some of our Intel
platforms I use a different API which returns just the profile and not
the functional mode.
This commit fixes it so that on Intel platforms it knows the functional
mode is always MMC.
I also fixed a potential problem that a platform may try to set the mode
for both MMC and PSC - which was incorrect.
Tested on X1 Carbon 9 (Intel) and Z13 (AMD).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216963
Fixes: fde5f74ccf ("platform/x86: thinkpad_acpi: Fix profile mode display in AMT mode")
Cc: stable@vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20230124153623.145188-1-mpearson-lenovo@squebb.ca
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
When listen() and accept() are called on an x25 socket
that connect() succeeds, accept() succeeds immediately.
This is because x25_connect() queues the skb to
sk->sk_receive_queue, and x25_accept() dequeues it.
This creates a child socket with the sk of the parent
x25 socket, which can cause confusion.
Fix x25_listen() to return -EINVAL if the socket has
already been successfully connect()ed to avoid this issue.
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The namespace head saves the Command Set Indicator enum, so use that
instead of the Command Set Selected. The two values are not the same.
Fixes: 831ed60c2a ("nvme: also return I/O command effects from nvme_command_effects")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
A listening socket linked to a sockmap has its sk_prot overridden. It
points to one of the struct proto variants in tcp_bpf_prots. The variant
depends on the socket's family and which sockmap programs are attached.
A child socket cloned from a TCP listener initially inherits their sk_prot.
But before cloning is finished, we restore the child's proto to the
listener's original non-tcp_bpf_prots one. This happens in
tcp_create_openreq_child -> tcp_bpf_clone.
Today, in tcp_bpf_clone we detect if the child's proto should be restored
by checking only for the TCP_BPF_BASE proto variant. This is not
correct. The sk_prot of listening socket linked to a sockmap can point to
to any variant in tcp_bpf_prots.
If the listeners sk_prot happens to be not the TCP_BPF_BASE variant, then
the child socket unintentionally is left if the inherited sk_prot by
tcp_bpf_clone.
This leads to issues like infinite recursion on close [1], because the
child state is otherwise not set up for use with tcp_bpf_prot operations.
Adjust the check in tcp_bpf_clone to detect all of tcp_bpf_prots variants.
Note that it wouldn't be sufficient to check the socket state when
overriding the sk_prot in tcp_bpf_update_proto in order to always use the
TCP_BPF_BASE variant for listening sockets. Since commit
b8b8315e39 ("bpf, sockmap: Remove unhash handler for BPF sockmap usage")
it is possible for a socket to transition to TCP_LISTEN state while already
linked to a sockmap, e.g. connect() -> insert into map ->
connect(AF_UNSPEC) -> listen().
[1]: https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/
Fixes: e80251555f ("tcp_bpf: Don't let child socket inherit parent protocol ops on copy")
Reported-by: syzbot+04c21ed96d861dccc5cd@syzkaller.appspotmail.com
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-2-1e0ee7ac2f90@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Perform SCTP vtag verification for ABORT/SHUTDOWN_COMPLETE according
to RFC 9260, Sect 8.5.1.
2) Fix infinite loop if SCTP chunk size is zero in for_each_sctp_chunk().
And remove useless check in this macro too.
3) Revert DATA_SENT state in the SCTP tracker, this was applied in the
previous merge window. Next patch in this series provides a more
simple approach to multihoming support.
4) Unify HEARTBEAT_ACKED and ESTABLISHED states for SCTP multihoming
support, use default ESTABLISHED of 210 seconds based on
heartbeat timeout * maximum number of retransmission + round-trip timeout.
Otherwise, SCTP conntrack entry that represents secondary paths
remain stale in the table for up to 5 days.
This is a slightly large batch with fixes for the SCTP connection
tracking helper, all patches from Sriram Yagnaraman.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: unify established states for SCTP paths
Revert "netfilter: conntrack: add sctp DATA_SENT state"
netfilter: conntrack: fix bug in for_each_sctp_chunk
netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
====================
Link: https://lore.kernel.org/r/20230124183933.4752-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit a286ba7387 ("ice: reorder PF/representor devlink
port register/unregister flows") moved the code to create
and destroy the devlink PF port. This was fine, but created
a corner case issue in the case of ice_register_netdev()
failing. In that case, the driver would end up calling
ice_devlink_destroy_pf_port() twice.
Additionally, it makes no sense to tie creation of the devlink
PF port to the creation of the netdev so separate out the
code to create/destroy the devlink PF port from the netdev
code. This makes it a cleaner interface.
Fixes: a286ba7387 ("ice: reorder PF/representor devlink port register/unregister flows")
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230124005714.3996270-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, if you bind the socket to something like:
servaddr.sin6_family = AF_INET6;
servaddr.sin6_port = htons(0);
servaddr.sin6_scope_id = 0;
inet_pton(AF_INET6, "::1", &servaddr.sin6_addr);
And then request a connect to:
connaddr.sin6_family = AF_INET6;
connaddr.sin6_port = htons(20000);
connaddr.sin6_scope_id = if_nametoindex("lo");
inet_pton(AF_INET6, "fe88::1", &connaddr.sin6_addr);
What the stack does is:
- bind the socket
- create a new asoc
- to handle the connect
- copy the addresses that can be used for the given scope
- try to connect
But the copy returns 0 addresses, and the effect is that it ends up
trying to connect as if the socket wasn't bound, which is not the
desired behavior. This unexpected behavior also allows KASLR leaks
through SCTP diag interface.
The fix here then is, if when trying to copy the addresses that can
be used for the scope used in connect() it returns 0 addresses, bail
out. This is what TCP does with a similar reproducer.
Reported-by: Pietro Borrello <borrello@diag.uniroma1.it>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/9fcd182f1099f86c6661f3717f63712ddd1c676c.1674496737.git.marcelo.leitner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull module fix from Luis Chamberlain:
"Theis is a fix we have been delaying for v6.2 due to lack of early
testing on linux-next.
The commit has been sitting in linux-next since December and testing
has also been now a bit extensive by a few developers. Since this is a
fix which definitely will go to v6.3 it should also apply to v6.2 so
if there are any issues we pick them up earlier rather than later. The
fix fixes a regression since v5.3, prior to me helping with module
maintenance, however, the issue is real in that in the worst case now
can prevent boot.
We've discussed all possible corner cases [0] and at last do feel this
is ready for v6.2-rc6"
Link https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/ [0]
* tag 'modules-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: Don't wait for GOING modules
Pull rust fix from Miguel Ojeda:
- Avoid evaluating arguments in 'pr_*' macros in 'unsafe' blocks
* tag 'rust-fixes-6.2' of https://github.com/Rust-for-Linux/linux:
rust: print: avoid evaluating arguments in `pr_*` macros in `unsafe` blocks
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Pass the correct address to mte_clear_page_tags() on initialising a
tagged page
- Plug a race against a GICv4.1 doorbell interrupt while saving the
vgic-v3 pending state.
x86:
- A command line parsing fix and a clang compilation fix for
selftests
- A fix for a longstanding VMX issue, that surprisingly was only
found now to affect real world guests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Make reclaim_period_ms input always be positive
KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
selftests: kvm: move declaration at the beginning of main()
KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
KVM: arm64: Pass the actual page address to mte_clear_page_tags()
Pull SCSI fixes from James Bottomley:
"Six fixes, all in drivers.
The biggest are the UFS devfreq fixes which address a lock inversion
and the two iscsi_tcp fixes which try to prevent a use after free from
userspace still accessing an area which the kernel has released (seen
by KASAN)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: device_handler: alua: Remove a might_sleep() annotation
scsi: iscsi_tcp: Fix UAF during login when accessing the shost ipaddress
scsi: iscsi_tcp: Fix UAF during logout when accessing the shost ipaddress
scsi: ufs: core: Fix devfreq deadlocks
scsi: hpsa: Fix allocation size for scsi_host_alloc()
scsi: target: core: Fix warning on RT kernels
list_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence false lockdep
warning when CONFIG_PROVE_RCU_LIST is enabled.
Execute as follow:
[tracing]# echo osnoise > current_tracer
[tracing]# echo 1 > tracing_on
[tracing]# echo 0 > tracing_on
The trace_types_lock is held when osnoise_tracer_stop() or
timerlat_tracer_stop() are called in the non-RCU read side section.
So, pass lockdep_is_held(&trace_types_lock) to silence false lockdep
warning.
Link: https://lkml.kernel.org/r/20221227023036.784337-1-nashuiliang@gmail.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: dae181349f ("tracing/osnoise: Support a list of trace_array *tr")
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull nfsd fix from Chuck Lever:
- Nail another UAF in NFSD's filecache
* tag 'nfsd-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't free files unconditionally in __nfsd_file_cache_purge
Pull fscrypt MAINTAINERS entry update from Eric Biggers:
"Update the MAINTAINERS file entry for fscrypt"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
MAINTAINERS: update fscrypt git repo
During a system boot, it can happen that the kernel receives a burst of
requests to insert the same module but loading it eventually fails
during its init call. For instance, udev can make a request to insert
a frequency module for each individual CPU when another frequency module
is already loaded which causes the init function of the new module to
return an error.
Since commit 6e6de3dee5 ("kernel/module.c: Only return -EEXIST for
modules that have finished loading"), the kernel waits for modules in
MODULE_STATE_GOING state to finish unloading before making another
attempt to load the same module.
This creates unnecessary work in the described scenario and delays the
boot. In the worst case, it can prevent udev from loading drivers for
other devices and might cause timeouts of services waiting on them and
subsequently a failed boot.
This patch attempts a different solution for the problem 6e6de3dee5
was trying to solve. Rather than waiting for the unloading to complete,
it returns a different error code (-EBUSY) for modules in the GOING
state. This should avoid the error situation that was described in
6e6de3dee5 (user space attempting to load a dependent module because
the -EEXIST error code would suggest to user space that the first module
had been loaded successfully), while avoiding the delay situation too.
This has been tested on linux-next since December 2022 and passes
all kmod selftests except test 0009 with module compression enabled
but it has been confirmed that this issue has existed and has gone
unnoticed since prior to this commit and can also be reproduced without
module compression with a simple usleep(5000000) on tools/modprobe.c [0].
These failures are caused by hitting the kernel mod_concurrent_max and can
happen either due to a self inflicted kernel module auto-loead DoS somehow
or on a system with large CPU count and each CPU count incorrectly triggering
many module auto-loads. Both of those issues need to be fixed in-kernel.
[0] https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/
Fixes: 6e6de3dee5 ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
Co-developed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Petr Mladek <pmladek@suse.com>
[mcgrof: enhance commit log with testing and kmod test result interpretation ]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Pull fsverity MAINTAINERS entry update from Eric Biggers:
"Update the MAINTAINERS file entry for fsverity"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
MAINTAINERS: update fsverity git repo, list, and patchwork
Commit f3bbac3247 ("ext4: deal with legacy signed xattr name hash
values") added a hashing function for the legacy case of having the
xattr hash calculated using a signed 'char' type. It left the unsigned
case alone, since it's all implicitly handled by the '-funsigned-char'
compiler option.
However, there's been some noise about back-porting it all into stable
kernels that lack the '-funsigned-char', so let's just make that at
least possible by making the whole 'this uses unsigned char' very
explicit in the code itself. Whether such a back-port is really
warranted or not, I'll leave to others, but at least together with this
change it is technically sensible.
Also, add a 'pr_warn_once()' for reporting the "hey, signedness for this
hash calculation has changed" issue. Hopefully it never triggers except
for that xfstests generic/454 test-case, but even if it does it's just
good information to have.
If for no other reason than "we can remove the legacy signed hash code
entirely if nobody ever sees the message any more".
Cc: Sasha Levin <sashal@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>,
Cc: Jason Donenfeld <Jason@zx2c4.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Trip temperatures are read using ACPI methods and stored in the memory
during zone initializtion and when the firmware sends a notification for
change. This trip temperature is returned when the thermal core calls via
callback get_trip_temp().
But it is possible that while updating the memory copy of the trips when
the firmware sends a notification for change, thermal core is reading the
trip temperature via the callback get_trip_temp(). This may return invalid
trip temperature.
To address this add a mutex to protect the invalid temperature reads in
the callback get_trip_temp() and int340x_thermal_read_trips().
Fixes: 5fbf7f27fa ("Thermal/int340x: Add common thermal zone handler")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The nvme device may have a namespace with the root partition, so make
sure we've completed scanning before returning from the async probe.
Fixes: eac3ef2629 ("nvme-pci: split the initial probe from the rest path")
Reported-by: Klaus Jensen <its@irrelevant.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The instructions for the ftrace-bisect.sh script, which is used to find
what function is being traced that is causing a kernel crash, and possibly
a triple fault reboot, uses the old method. In 5.1, a new feature was
added that let the user write in the index into available_filter_functions
that maps to the function a user wants to set in set_ftrace_filter (or
set_ftrace_notrace). This takes O(1) to set, as suppose to writing a
function name, which takes O(n) (where n is the number of functions in
available_filter_functions).
The ftrace-bisect.sh requires setting half of the functions in
available_filter_functions, which is O(n^2) using the name method to enable
and can take several minutes to complete. The number method is O(n) which
takes less than a second to complete. Using the number method for any
kernel 5.1 and after is the proper way to do the bisect.
Update the usage to reflect the new change, as well as using the
/sys/kernel/tracing path instead of the obsolete debugfs path.
Link: https://lkml.kernel.org/r/20230123112252.022003dd@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Fixes: f79b3f3385 ("ftrace: Allow enabling of filters via index of available_filter_functions")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
__ffs_ep0_queue_wait executes holding the spinlock of &ffs->ev.waitq.lock
and unlocks it after the assignments to usb_request are done.
However in the code if the request is already NULL we bail out returning
-EINVAL but never unlocked the spinlock.
Fix this by adding spin_unlock_irq &ffs->ev.waitq.lock before returning.
Fixes: 6a19da1110 ("usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait")
Reviewed-by: John Keeping <john@metanate.com>
Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com>
Link: https://lore.kernel.org/r/20230124091149.18647-1-quic_ugoswami@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently trace_printk() can be used as soon as early_trace_init() is
called from start_kernel(). But if a crash happens, and
"ftrace_dump_on_oops" is set on the kernel command line, all you get will
be:
[ 0.456075] <idle>-0 0dN.2. 347519us : Unknown type 6
[ 0.456075] <idle>-0 0dN.2. 353141us : Unknown type 6
[ 0.456075] <idle>-0 0dN.2. 358684us : Unknown type 6
This is because the trace_printk() event (type 6) hasn't been registered
yet. That gets done via an early_initcall(), which may be early, but not
early enough.
Instead of registering the trace_printk() event (and other ftrace events,
which are not trace events) via an early_initcall(), have them registered at
the same time that trace_printk() can be used. This way, if there is a
crash before early_initcall(), then the trace_printk()s will actually be
useful.
Link: https://lkml.kernel.org/r/20230104161412.019f6c55@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: e725c731e3 ("tracing: Split tracing initialization into two for early initialization")
Reported-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Setting filters on an ftrace ops results in some memory being allocated
for the filter hashes, which must be freed before the ops can be freed.
This can be done by removing every individual element of the hash by
calling ftrace_set_filter_ip() or ftrace_set_filter_ips() with `remove`
set, but this is somewhat error prone as it's easy to forget to remove
an element.
Make it easier to clean this up by exporting ftrace_free_filter(), which
can be used to clean up all of the filter hashes after an ftrace_ops has
been unregistered.
Using this, fix the ftrace-direct* samples to free hashes prior to being
unloaded. All other code either removes individual filters explicitly or
is built-in and already calls ftrace_free_filter().
Link: https://lkml.kernel.org/r/20230103124912.2948963-3-mark.rutland@arm.com
Cc: stable@vger.kernel.org
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: e1067a07cf ("ftrace/samples: Add module to test multi direct modify interface")
Fixes: 5fae941b9a ("ftrace/samples: Add multi direct interface test module")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Commit a10b215325 ("media: vb2: add (un)prepare_streaming queue ops") moved
up the q->streaming = 1 assignment to before the call to vb2_start_streaming().
This does make sense since q->streaming indicates that VIDIOC_STREAMON is called,
and the call to start_streaming happens either at that time or later if
q->min_buffers_needed > 0. So q->streaming should be 1 before start_streaming
is called.
However, it turned out that some drivers use vb2_is_streaming() in buf_queue,
and if q->min_buffers_needed == 0, then that will now return true instead of
false.
So for the time being revert to the original behavior.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: a10b215325 ("media: vb2: add (un)prepare_streaming queue ops")
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
This cycle we ported all filesystems to the new posix acl api. While
looking at further simplifications in this area to remove the last
remnants of the generic dummy posix acl handlers we realized that we
regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use
of posix acls.
With the change to a dedicated posix acl api interacting with posix acls
doesn't go through the old xattr codepaths anymore and instead only
relies the get acl and set acl inode operations.
Before this change fuse daemons that don't set FUSE_POSIX_ACL were able
to get and set posix acl albeit with two caveats. First, that posix acls
aren't cached. And second, that they aren't used for permission checking
in the vfs.
We regressed that use-case as we currently refuse to retrieve any posix
acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons
would see a change in behavior.
We can restore the old behavior in multiple ways. We could change the
new posix acl api and look for a dedicated xattr handler and if we find
one prefer that over the dedicated posix acl api. That would break the
consistency of the new posix acl api so we would very much prefer not to
do that.
We could introduce a new ACL_*_CACHE sentinel that would instruct the
vfs permission checking codepath to not call into the filesystem and
ignore acls.
But a more straightforward fix for v6.2 is to do the same thing that
Overlayfs does and give fuse a separate get acl method for permission
checking. Overlayfs uses this to express different needs for vfs
permission lookup and acl based retrieval via the regular system call
path as well. Let fuse do the same for now. This way fuse can continue
to refuse to retrieve posix acls for daemons that don't set
FUSE_POSXI_ACL for permission checking while allowing a fuse server to
retrieve it via the usual system calls.
In the future, we could extend the get acl inode operation to not just
pass a simple boolean to indicate rcu lookup but instead make it a flag
argument. Then in addition to passing the information that this is an
rcu lookup to the filesystem we could also introduce a flag that tells
the filesystem that this is a request from the vfs to use these acls for
permission checking. Then fuse could refuse the get acl request for
permission checking when the daemon doesn't have FUSE_POSIX_ACL set in
the same get acl method. This would also help Overlayfs and allow us to
remove the second method for it as well.
But since that change is more invasive as we need to update the get acl
inode operation for multiple filesystems we should not do this as a fix
for v6.2. Instead we will do this for the v6.3 merge window.
Fwiw, since posix acls are now always correctly translated in the new
posix acl api we could also allow them to be used for daemons without
FUSE_POSIX_ACL that are not mounted on the host. But this is behavioral
change and again if dones should be done for v6.3. For now, let's just
restore the original behavior.
A nice side-effect of this change is that for fuse daemons with and
without FUSE_POSIX_ACL the same code is used for posix acls in a
backwards compatible way. This also means we can remove the legacy xattr
handlers completely. We've also added comments to explain the expected
behavior for daemons without FUSE_POSIX_ACL into the code.
Fixes: 318e66856d ("xattr: use posix acl api")
Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
By default when the system is configured for low power idle in the FADT
the keyboard is set up as a wake source. This matches the behavior that
Windows uses for Modern Standby as well.
It has been reported that a variety of AMD based designs there are
spurious wakeups are happening where two IRQ sources are active.
For example:
```
PM: Triggering wakeup from IRQ 9
PM: Triggering wakeup from IRQ 1
```
In these designs IRQ 9 is the ACPI SCI and IRQ 1 is the keyboard.
One way to trigger this problem is to suspend the laptop and then unplug
the AC adapter. The SOC will be in a hardware sleep state and plugging
in the AC adapter returns control to the kernel's s2idle loop.
Normally if just IRQ 9 was active the s2idle loop would advance any EC
transactions and no other IRQ being active would cause the s2idle loop
to put the SOC back into hardware sleep state.
When this bug occurred IRQ 1 is also active even if no keyboard activity
occurred. This causes the s2idle loop to break and the system to wake.
This is a platform firmware bug triggering IRQ1 without keyboard activity.
This occurs in Windows as well, but Windows will enter "SW DRIPS" and
then with no activity enters back into "HW DRIPS" (hardware sleep state).
This issue affects Renoir, Lucienne, Cezanne, and Barcelo platforms. It
does not happen on newer systems such as Mendocino or Rembrandt.
It's been fixed in newer platform firmware. To avoid triggering the bug
on older systems check the SMU F/W version and adjust the policy at suspend
time for s2idle wakeup from keyboard on these systems. A lot of thought
and experimentation has been given around the timing of disabling IRQ1,
and to make it work the "suspend" PM callback is restored.
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2115
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1951
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230120191519.15926-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Commit 1ea0d3b467 ("platform/x86: asus-wmi: Simplify tablet-mode-switch
handling") unified the asus-wmi tablet-switch handling, but it did not take
into account that the value returned for the kbd_dock_devid WMI method is
inverted where as the other ones are not inverted.
This causes asus-wmi to report an inverted tablet-switch state for devices
which use the kbd_dock_devid, which causes libinput to ignore touchpad
events while the affected T10x model 2-in-1s are docked.
Add inverting of the return value in the kbd_dock_devid case to fix this.
Fixes: 1ea0d3b467 ("platform/x86: asus-wmi: Simplify tablet-mode-switch handling")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230120143441.527334-1-hdegoede@redhat.com
Add support to map the "HP Omen Key" to KEY_PROG2. Laptops in the HP
Omen Series open the HP Omen Command Center application on windows. But,
on linux it fails with the following message from the hp-wmi driver:
[ 5143.415714] hp_wmi: Unknown event_id - 29 - 0x21a5
Also adds support to map Fn+Esc to KEY_FN_ESC. This currently throws the
following message on the hp-wmi driver:
[ 6082.143785] hp_wmi: Unknown key code - 0x21a7
There is also a "Win-Lock" key on HP Omen Laptops which supports
Enabling and Disabling the Windows key, which trigger commands 0x21a4
and 0x121a4 respectively, but I wasn't able to find any KEY in input.h
to map this to.
Signed-off-by: Rishit Bansal <rishitbansal0@gmail.com>
Link: https://lore.kernel.org/r/20230120221214.24426-1-rishitbansal0@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The DRM fbdev emulation layer sets the struct fb_info .fbdefio field to
a struct fb_deferred_io pointer, that is shared across all drivers that
use the generic drm_fbdev_generic_setup() helper function.
It is a problem because the fbdev core deferred I/O logic assumes that
the struct fb_deferred_io data is not shared between devices, and it's
stored there state such as the list of pages touched and a mutex that
is use to synchronize between the fb_deferred_io_track_page() function
that track the dirty pages and fb_deferred_io_work() workqueue handler
doing the actual deferred I/O.
The latter can lead to the following error, since it may happen that two
drivers are probed and then one is removed, which causes the mutex bo be
destroyed and not existing anymore by the time the other driver tries to
grab it for the fbdev deferred I/O logic:
[ 369.756553] ------------[ cut here ]------------
[ 369.756604] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 369.756631] WARNING: CPU: 2 PID: 1023 at kernel/locking/mutex.c:582 __mutex_lock+0x348/0x424
[ 369.756744] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ip
v6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr btsdio bluetooth sunrpc brcmfmac snd_soc_hdmi_codec cpufreq_dt cfg80211 vfat fat vc4 rfkill brcmutil raspberrypi_cpufreq i2c_bcm2835 iproc_rng200 bcm2711_thermal snd_soc_core snd_pcm_dmaen
gine leds_gpio nvmem_rmem joydev hid_cherry uas usb_storage gpio_raspberrypi_exp v3d snd_pcm raspberrypi_hwmon gpu_sched bcm2835_wdt broadcom bcm_phy_lib snd_timer genet snd mdio_bcm_unimac clk_bcm2711_dvp soundcore drm_display_helper pci
e_brcmstb cec ip6_tables ip_tables fuse
[ 369.757400] CPU: 2 PID: 1023 Comm: fbtest Not tainted 5.19.0-rc6+ #94
[ 369.757455] Hardware name: raspberrypi,4-model-b Raspberry Pi 4 Model B Rev 1.4/Raspberry Pi 4 Model B Rev 1.4, BIOS 2022.10 10/01/2022
[ 369.757538] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 369.757596] pc : __mutex_lock+0x348/0x424
[ 369.757635] lr : __mutex_lock+0x348/0x424
[ 369.757672] sp : ffff80000953bb00
[ 369.757703] x29: ffff80000953bb00 x28: ffff17fdc087c000 x27: 0000000000000002
[ 369.757771] x26: ffff17fdc349f9b0 x25: fffffc5ff72e0100 x24: 0000000000000000
[ 369.757838] x23: 0000000000000000 x22: 0000000000000002 x21: ffffa618df636f10
[ 369.757903] x20: ffff80000953bb68 x19: ffffa618e0f18138 x18: 0000000000000001
[ 369.757968] x17: 0000000020000000 x16: 0000000000000002 x15: 0000000000000000
[ 369.758032] x14: 0000000000000000 x13: 284e4f5f4e524157 x12: 5f534b434f4c5f47
[ 369.758097] x11: 00000000ffffdfff x10: ffffa618e0c79f88 x9 : ffffa618de472484
[ 369.758162] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
[ 369.758227] x5 : 0000000000001fff x4 : 0000000000000000 x3 : 0000000000000027
[ 369.758292] x2 : 0000000000000001 x1 : ffff17fdc087c000 x0 : 0000000000000028
[ 369.758357] Call trace:
[ 369.758383] __mutex_lock+0x348/0x424
[ 369.758420] mutex_lock_nested+0x4c/0x5c
[ 369.758459] fb_deferred_io_mkwrite+0x78/0x1d8
[ 369.758507] do_page_mkwrite+0x5c/0x19c
[ 369.758550] wp_page_shared+0x70/0x1a0
[ 369.758590] do_wp_page+0x3d0/0x510
[ 369.758628] handle_pte_fault+0x1c0/0x1e0
[ 369.758670] __handle_mm_fault+0x250/0x380
[ 369.758712] handle_mm_fault+0x17c/0x3a4
[ 369.758753] do_page_fault+0x158/0x530
[ 369.758792] do_mem_abort+0x50/0xa0
[ 369.758831] el0_da+0x78/0x19c
[ 369.758864] el0t_64_sync_handler+0xbc/0x150
[ 369.758904] el0t_64_sync+0x190/0x194
[ 369.758942] irq event stamp: 11395
[ 369.758973] hardirqs last enabled at (11395): [<ffffa618de472554>] __up_console_sem+0x74/0x80
[ 369.759042] hardirqs last disabled at (11394): [<ffffa618de47254c>] __up_console_sem+0x6c/0x80
[ 369.760554] softirqs last enabled at (11392): [<ffffa618de330a74>] __do_softirq+0x4c4/0x6b8
[ 369.762060] softirqs last disabled at (11383): [<ffffa618de3c9124>] __irq_exit_rcu+0x104/0x214
[ 369.763564] ---[ end trace 0000000000000000 ]---
Fixes: d536540f30 ("drm/fb-helper: Add generic fbdev emulation .fb_probe function")
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230121192418.2814955-4-javierm@redhat.com
An SCTP endpoint can start an association through a path and tear it
down over another one. That means the initial path will not see the
shutdown sequence, and the conntrack entry will remain in ESTABLISHED
state for 5 days.
By merging the HEARTBEAT_ACKED and ESTABLISHED states into one
ESTABLISHED state, there remains no difference between a primary or
secondary path. The timeout for the merged ESTABLISHED state is set to
210 seconds (hb_interval * max_path_retrans + rto_max). So, even if a
path doesn't see the shutdown sequence, it will expire in a reasonable
amount of time.
With this change in place, there is now more than one state from which
we can transition to ESTABLISHED, COOKIE_ECHOED and HEARTBEAT_SENT, so
handle the setting of ASSURED bit whenever a state change has happened
and the new state is ESTABLISHED. Removed the check for dir==REPLY since
the transition to ESTABLISHED can happen only in the reply direction.
Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This reverts commit (bff3d05348: "netfilter: conntrack: add sctp
DATA_SENT state")
Using DATA/SACK to detect a new connection on secondary/alternate paths
works only on new connections, while a HEARTBEAT is required on
connection re-use. It is probably consistent to wait for HEARTBEAT to
create a secondary connection in conntrack.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
skb_header_pointer() will return NULL if offset + sizeof(_sch) exceeds
skb->len, so this offset < skb->len test is redundant.
if sch->length == 0, this will end up in an infinite loop, add a check
for sch->length > 0
Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
RFC 9260, Sec 8.5.1 states that for ABORT/SHUTDOWN_COMPLETE, the chunk
MUST be accepted if the vtag of the packet matches its own tag and the
T bit is not set OR if it is set to its peer's vtag and the T bit is set
in chunk flags. Otherwise the packet MUST be silently dropped.
Update vtag verification for ABORT/SHUTDOWN_COMPLETE based on the above
description.
Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-01-20 (iavf)
This series contains updates to iavf driver only.
Michal Schmidt converts single iavf workqueue to per adapter to avoid
deadlock issues.
Marcin moves setting of VLAN related netdev features to watchdog task to
avoid RTNL deadlock.
Stefan Assmann schedules immediate watchdog task execution on changing
primary MAC to avoid excessive delay.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: schedule watchdog immediately when changing primary MAC
iavf: Move netdev_update_features() into watchdog task
iavf: fix temporary deadlock and failure to set MAC address
====================
Link: https://lore.kernel.org/r/20230120211036.430946-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix overlap detection in rbtree set backend: Detect overlap by going
through the ordered list of valid tree nodes. To shorten the number of
visited nodes in the list, this algorithm descends the tree to search
for an existing element greater than the key value to insert that is
greater than the new element.
2) Fix for the rbtree set garbage collector: Skip inactive and busy
elements when checking for expired elements to avoid interference
with an ongoing transaction from control plane.
This is a rather large fix coming at this stage of the 6.2-rc. Since
33c7aba0b4 ("netfilter: nf_tables: do not set up extensions for end
interval"), bogus overlap errors in the rbtree set occur more frequently.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_set_rbtree: skip elements in transaction from garbage collection
netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
====================
Link: https://lore.kernel.org/r/20230123211601.292930-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Starting with commit eee16b1471 ("net: dsa: microchip: perform the
compatibility check for dev probed"), the KSZ switch driver now bails
out if it thinks the DT compatible doesn't match the actual chip ID
read back from the hardware:
ksz9477-switch 1-005f: Device tree specifies chip KSZ9893 but found
KSZ8563, please fix it!
For the KSZ8563, which used ksz_switch_chips[KSZ9893], this was fine
at first, because it indeed shares the same chip id as the KSZ9893.
Commit b449080956 ("net: dsa: microchip: add separate struct
ksz_chip_data for KSZ8563 chip") started differentiating KSZ9893
compatible chips by consulting the 0x1F register. The resulting breakage
was fixed for the SPI driver in the same commit by introducing the
appropriate ksz_switch_chips[KSZ8563], but not for the I2C driver.
Fix this for I2C-connected KSZ8563 now to get it probing again.
Fixes: b449080956 ("net: dsa: microchip: add separate struct ksz_chip_data for KSZ8563 chip").
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230120110933.1151054-1-a.fatoum@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
if (!type)
continue;
if (type > RTAX_MAX)
return false;
...
fi_val = fi->fib_metrics->metrics[type - 1];
@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.
Fixes: 5f9ae3d9e7 ("ipv4: do metrics match when looking up and deleting a route")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230120133140.3624204-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netlink_getsockbyportid() reads sk_state while a concurrent
netlink_connect() can change its value.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netlink_getname(), netlink_sendmsg() and netlink_getsockbyportid()
can read nlk->dst_portid and nlk->dst_group while another
thread is changing them.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On 32-bit architectures with 64-bit resource_size_t, sp_rtc_probe()
causes a compiler warning:
drivers/rtc/rtc-sunplus.c: In function 'sp_rtc_probe':
drivers/rtc/rtc-sunplus.c:243:33: error: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Werror=format=]
243 | dev_dbg(&plat_dev->dev, "res = 0x%x, reg_base = 0x%lx\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The best way to print a resource is the special %pR format string,
and similarly to print a pointer we can use %p and avoid the cast.
Fixes: fad6cbe9b2 ("rtc: Add driver for RTC in Sunplus SP7021")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230117172450.2938962-1-arnd@kernel.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Skip interference with an ongoing transaction, do not perform garbage
collection on inactive elements. Reset annotated previous end interval
if the expired element is marked as busy (control plane removed the
element right before expiration).
Fixes: 8d8540c4f5 ("netfilter: nft_set_rbtree: add timeout support")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
...instead of a tree descent, which became overly complicated in an
attempt to cover cases where expired or inactive elements would affect
comparisons with the new element being inserted.
Further, it turned out that it's probably impossible to cover all those
cases, as inactive nodes might entirely hide subtrees consisting of a
complete interval plus a node that makes the current insertion not
overlap.
To speed up the overlap check, descent the tree to find a greater
element that is closer to the key value to insert. Then walk down the
node list for overlap detection. Starting the overlap check from
rb_first() unconditionally is slow, it takes 10 times longer due to the
full linear traversal of the list.
Moreover, perform garbage collection of expired elements when walking
down the node list to avoid bogus overlap reports.
For the insertion operation itself, this essentially reverts back to the
implementation before commit 7c84d41416 ("netfilter: nft_set_rbtree:
Detect partial overlaps on insertion"), except that cases of complete
overlap are already handled in the overlap detection phase itself, which
slightly simplifies the loop to find the insertion point.
Based on initial patch from Stefano Brivio, including text from the
original patch description too.
Fixes: 7c84d41416 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The Asus U46E backlight tables have a set of interesting problems:
1. Its ACPI tables do make _OSI ("Windows 2012") checks, so
acpi_osi_is_win8() should return true.
But the tables have 2 sets of _OSI calls, one from the usual global
_INI method setting a global OSYS variable and a second set of _OSI
calls from a MSOS method and the MSOS method is the only one calling
_OSI ("Windows 2012").
The MSOS method only gets called in the following cases:
1. From some Asus specific WMI methods
2. From _DOD, which only runs after acpi_video_get_backlight_type()
has already been called by the i915 driver
3. From other ACPI video bus methods which never run (see below)
4. From some EC query callbacks
So when i915 calls acpi_video_get_backlight_type() MSOS has never run
and acpi_osi_is_win8() returns false, so acpi_video_get_backlight_type()
returns acpi_video as the desired backlight type, which causes
the intel_backlight device to not register.
2. _DOD effectively does this:
Return (Package (0x01)
{
0x0400
})
causing acpi_video_device_in_dod() to return false, which causes
the acpi_video backlight device to not register.
Leaving the user with no backlight device at all. Note that before 6.1.y
the i915 driver would register the intel_backlight device unconditionally
and since that then was the only backlight device userspace would use that.
Add a backlight=native DMI quirk for this special laptop to restore
the old (and working) behavior of the intel_backlight device registering.
Fixes: fb1836c913 ("ACPI: video: Prefer native over vendor")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The HP EliteBook 8460p predates Windows 8, so it defaults to using
acpi_video# for backlight control.
Starting with the 6.1.y kernels the native radeon_bl0 backlight is hidden
in this case instead of relying on userspace preferring acpi_video# over
native backlight devices.
It turns out that for the acpi_video# interface to work on
the HP EliteBook 8460p, the brightness needs to be set at least once
through the native interface, which now no longer is done breaking
backlight control.
The native interface however always works without problems, so add
a quirk to use native backlight on the EliteBook 8460p to fix this.
Fixes: fb1836c913 ("ACPI: video: Prefer native over vendor")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2161428
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The HP Pavilion g6-1d80nr predates Windows 8, so it defaults to using
acpi_video# for backlight control, but this is non functional on
this model.
Add a DMI quirk to use the native backlight interface which does
work properly.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull VFIO fixes from Alex Williamson:
- Honor reserved regions when testing for IOMMU find grained super page
support, avoiding a regression on s390 for a firmware device where
the existence of the mapping, even if unused can trigger an error
state. (Niklas Schnelle)
- Fix a deadlock in releasing KVM references by using the alternate
.release() rather than .destroy() callback for the kvm-vfio device.
(Yi Liu)
* tag 'vfio-v6.2-rc6' of https://github.com/awilliam/linux-vfio:
kvm/vfio: Fix potential deadlock on vfio group_lock
vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()
Pull EFI fixes from Ard Biesheuvel:
"Another couple of EFI fixes, of which the first two were already in
-next when I sent out the previous PR, but they caused some issues on
non-EFI boots so I let them simmer for a bit longer.
- ensure the EFI ResetSystem and ACPI PRM calls are recognized as
users of the EFI runtime, and therefore protected against
exceptions
- account for the EFI runtime stack in the stacktrace code
- remove Matthew Garrett's MAINTAINERS entry for efivarfs"
* tag 'efi-fixes-for-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: Remove Matthew Garrett as efivarfs maintainer
arm64: efi: Account for the EFI runtime stack in stack unwinder
arm64: efi: Avoid workqueue to check whether EFI runtime is live
The definition of intel_selftest_modify_policy() does not match the
declaration, as gcc-13 points out:
drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c:29:5: error: conflicting types for 'intel_selftest_modify_policy' due to enum/integer mismatch; have 'int(struct intel_engine_cs *, struct intel_selftest_saved_policy *, u32)' {aka 'int(struct intel_engine_cs *, struct intel_selftest_saved_policy *, unsigned int)'} [-Werror=enum-int-mismatch]
29 | int intel_selftest_modify_policy(struct intel_engine_cs *engine,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c:11:
drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h:28:5: note: previous declaration of 'intel_selftest_modify_policy' with type 'int(struct intel_engine_cs *, struct intel_selftest_saved_policy *, enum selftest_scheduler_modify)'
28 | int intel_selftest_modify_policy(struct intel_engine_cs *engine,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change the type in the definition to match.
Fixes: 617e87c05c ("drm/i915/selftest: Fix hangcheck self test for GuC submission")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230117163743.1003219-1-arnd@kernel.org
(cherry picked from commit 8d7eb8ed3f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Commit 0d0e7d1eea ("drm/i915/mtl: Define engine context layouts")
added the engine context for Meteor Lake. In a second revision of the
patch it was believed the xcs offsets were wrong due to a tagging
issue in the spec. The first version was actually correct, as shown
by the intel_lrc_live_selftests/live_lrc_layout test:
i915: Running gt_lrc
i915: Running intel_lrc_live_selftests/live_lrc_layout
bcs0: LRI command mismatch at dword 1, expected 1108101d found 11081019
[drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:236:DP-1] disconnected
bcs0: HW register image:
[0000] 00000000 1108101d 00022244 ffff0008 00022034 00000088 00022030 00000088
...
bcs0: SW register image:
[0000] 00000000 11081019 00022244 00090009 00022034 00000000 00022030 00000000
The difference in the 2 additional dwords (0x1d vs 0x19) are the offsets
0x120 / 0x124 that are indeed part of the context image.
Bspec: 45585
Fixes: 0d0e7d1eea ("drm/i915/mtl: Define engine context layouts")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230111235531.3353815-2-radhakrishna.sripada@intel.com
(cherry picked from commit ca54a9a32d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
ctrl->ops is used by nvme_alloc_admin_tag_set() but set by
nvme_init_ctrl() so reorder the calls to avoid a NULL pointer
dereference.
Fixes: 6dfba1c09c ("nvme-fc: use the tagset alloc/free helpers")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
nfsd_file_cache_purge is called when the server is shutting down, in
which case, tearing things down is generally fine, but it also gets
called when the exports cache is flushed.
Instead of walking the cache and freeing everything unconditionally,
handle it the same as when we have a notification of conflicting access.
Fixes: ac3a2585f0 ("nfsd: rework refcounting in filecache")
Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
Reported-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit 1d2e9b67b0 ("ARM: 9265/1: pass -march= only to compiler") added
a __thumb2__ define to ASFLAGS to avoid build errors in the crypto code,
which relies on __thumb2__ for preprocessing. Commit 59e2cf8d21 ("ARM:
9275/1: Drop '-mthumb' from AFLAGS_ISA") followed up on this by removing
-mthumb from AFLAGS so that __thumb2__ would not be defined when the
default target was ARMv7 or newer.
Unfortunately, the second commit's fix assumes that the toolchain
defaults to -mno-thumb / -marm, which is not the case for Debian's
arm-linux-gnueabihf target, which defaults to -mthumb:
$ echo | arm-linux-gnueabihf-gcc -dM -E - | grep __thumb
#define __thumb2__ 1
#define __thumb__ 1
This target is used by several CI systems, which will still see
redefined macro warnings, despite '-mthumb' not being present in the
flags:
<command-line>: warning: "__thumb2__" redefined
<built-in>: note: this is the location of the previous definition
Remove the global AFLAGS __thumb2__ define and move it to the crypto
folder where it is required by the imported OpenSSL algorithms; the rest
of the kernel should use the internal CONFIG_THUMB2_KERNEL symbol to
know whether or not Thumb2 is being used or not. Be sure that __thumb2__
is undefined first so that there are no macro redefinition warnings.
Link: https://github.com/ClangBuiltLinux/linux/issues/1772
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Fixes: 59e2cf8d21 ("ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA")
Fixes: 1d2e9b67b0 ("ARM: 9265/1: pass -march= only to compiler")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
If we're using ring provided buffers with multishot receive, and we end
up doing an io-wq based issue at some points that also needs to select
a buffer, we'll lose the initially assigned buffer group as
io_ring_buffer_select() correctly clears the buffer group list as the
issue isn't serialized by the ctx uring_lock. This is fine for normal
receives as the request puts the buffer and finishes, but for multishot,
we will re-arm and do further receives. On the next trigger for this
multishot receive, the receive will try and pick from a buffer group
whose value is the same as the buffer ID of the las receive. That is
obviously incorrect, and will result in a premature -ENOUFS error for
the receive even if we had available buffers in the correct group.
Cache the buffer group value at prep time, so we can restore it for
future receives. This only needs doing for the above mentioned case, but
just do it by default to keep it easier to read.
Cc: stable@vger.kernel.org
Fixes: b3fdea6ecb ("io_uring: multishot recv")
Fixes: 9bb66906f2 ("io_uring: support multishot in recvmsg")
Cc: Dylan Yudaken <dylany@meta.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When proxying IPv6 NDP requests, the adverts to the initial multicast
solicits are correct and working. On the other hand, when later a
reachability confirmation is requested (on unicast), no reply is sent.
This causes the neighbor entry expiring on the sending node, which is
mostly a non-issue, as a new multicast request is sent. There are
routers, where the multicast requests are intentionally delayed, and in
these environments the current implementation causes periodic packet
loss for the proxied endpoints.
The root cause is the erroneous decrease of the hop limit, as this
is checked in ndisc.c and no answer is generated when it's 254 instead
of the correct 255.
Cc: stable@vger.kernel.org
Fixes: 46c7655f0b ("ipv6: decrease hop limit counter in ip6_forward()")
Signed-off-by: Gergely Risko <gergely.risko@gmail.com>
Tested-by: Gergely Risko <gergely.risko@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean say:
====================
ethtool support for IEEE 802.3 MAC Merge layer
Change log
----------
v3->v4:
- add missing opening bracket in ocelot_port_mm_irq()
- moved cfg.verify_time range checking so that it actually takes place
for the updated rather than old value
v3 at:
https://patchwork.kernel.org/project/netdevbpf/cover/20230117085947.2176464-1-vladimir.oltean@nxp.com/
v2->v3:
- made get_mm return int instead of void
- deleted ETHTOOL_A_MM_SUPPORTED
- renamed ETHTOOL_A_MM_ADD_FRAG_SIZE to ETHTOOL_A_MM_TX_MIN_FRAG_SIZE
- introduced ETHTOOL_A_MM_RX_MIN_FRAG_SIZE
- cleaned up documentation
- rebased on top of PLCA changes
- renamed ETHTOOL_STATS_SRC_* to ETHTOOL_MAC_STATS_SRC_*
v2 at:
https://patchwork.kernel.org/project/netdevbpf/cover/20230111161706.1465242-1-vladimir.oltean@nxp.com/
v1->v2:
I've decided to focus just on the MAC Merge layer for now, which is why
I am able to submit this patch set as non-RFC.
v1 (RFC) at:
https://patchwork.kernel.org/project/netdevbpf/cover/20220816222920.1952936-1-vladimir.oltean@nxp.com/
What is being introduced
------------------------
TL;DR: a MAC Merge layer as defined by IEEE 802.3-2018, clause 99
(interspersing of express traffic). This is controlled through ethtool
netlink (ETHTOOL_MSG_MM_GET, ETHTOOL_MSG_MM_SET). The raw ethtool
commands are posted here:
https://patchwork.kernel.org/project/netdevbpf/cover/20230111153638.1454687-1-vladimir.oltean@nxp.com/
The MAC Merge layer has its own statistics counters
(ethtool --include-statistics --show-mm swp0) as well as two member
MACs, the statistics of which can be queried individually, through a new
ethtool netlink attribute, corresponding to:
$ ethtool -I --show-pause eno2 --src aggregate
$ ethtool -S eno2 --groups eth-mac eth-phy eth-ctrl rmon -- --src pmac
The core properties of the MAC Merge layer are described in great detail
in patches 02/12 and 03/12. They can be viewed in "make htmldocs" format.
Devices for which the API is supported
--------------------------------------
I decided to start with the Ethernet switch on NXP LS1028A (Felix)
because of the smaller patch set. I also have support for the ENETC
controller pending.
I would like to get confirmation that the UAPI being proposed here will
not restrict any use cases known by other hardware vendors.
Why is support for preemptible traffic classes not here?
--------------------------------------------------------
There is legitimate concern whether the 802.1Q portion of the standard
(which traffic classes go to the eMAC and which to the pMAC) should be
modeled in Linux using tc or using another UAPI. I think that is
stalling the entire series, but should be discussed separately instead.
Removing FP adminStatus support makes me confident enough to submit this
patch set without an RFC tag (meaning: I wouldn't mind if it was merged
as is).
What is submitted here is sufficient for an LLDP daemon to do its job.
I've patched openlldp to advertise and configure frame preemption:
https://github.com/vladimiroltean/openlldp/tree/frame-preemption-v3
In case someone wants to try it out, here are some commands I've used.
# Configure the interfaces to receive and transmit LLDP Data Units
lldptool -L -i eno0 adminStatus=rxtx
lldptool -L -i swp0 adminStatus=rxtx
# Enable the transmission of certain TLVs on switch's interface
lldptool -T -i eno0 -V addEthCap enableTx=yes
lldptool -T -i swp0 -V addEthCap enableTx=yes
# Query LLDP statistics on switch's interface
lldptool -S -i swp0
# Query the received neighbor TLVs
lldptool -i swp0 -t -n -V addEthCap
Additional Ethernet Capabilities TLV
Preemption capability supported
Preemption capability enabled
Preemption capability active
Additional fragment size: 60 octets
So using this patch set, lldpad will be able to advertise and configure
frame preemption, but still, no data packet will be sent as preemptible
over the link, because there is no UAPI to control which traffic classes
are sent as preemptible and which as express.
Preemptable or preemptible?
---------------------------
IEEE 802.3 uses "preemptable" throughout. IEEE 802.1Q uses "preemptible"
throughout. Because the definition of "preemptible" falls under 802.1Q's
jurisdiction and 802.3 just references it, I went with the 802.1Q naming
even where supporting an 802.3 feature. Also, checkpatch agrees with this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the fact that the kernel-side data structures have been carried
over from the ioctl-based ethtool, we are now in the situation where we
have an ethnl_update_bool32() function, but the plain function that
operates on a boolean value kept in an actual u8 netlink attribute
doesn't exist.
With new ethtool features that are exposed solely over netlink, the
kernel data structures will use the "bool" type, so we will need this
kind of helper. Introduce it now; it's needed for things like
verify-disabled for the MAC merge configuration.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several functions that take part in codec's initialization and removal
are re-used by ASoC codec drivers implementations. Drivers mimic the
behavior of hda_codec_driver_probe/remove() found in
sound/pci/hda/hda_bind.c with their component->probe/remove() instead.
One of the reasons for that is the expectation of
snd_hda_codec_device_new() to receive a valid pointer to an instance of
struct snd_card. This expectation can be met only once sound card
components probing commences.
As ASoC sound card may be unbound without codec device being actually
removed from the system, unsetting ->preset in
snd_hda_codec_cleanup_for_unbind() interferes with module unload -> load
scenario causing null-ptr-deref. Preset is assigned only once, during
device/driver matching whereas ASoC codec driver's module reloading may
occur several times throughout the lifetime of an audio stack.
Suggested-by: Takashi Iwai <tiwai@suse.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://lore.kernel.org/r/20230119143235.1159814-1-cezary.rojewski@intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
int type = nla_type(nla);
if (type > XFRMA_MAX) {
return -EOPNOTSUPP;
}
@type is then used as an array index and can be used
as a Spectre v1 gadget.
if (nla_len(nla) < compat_policy[type].len) {
array_index_nospec() can be used to prevent leaking
content of kernel memory to malicious users.
Fixes: 5106f4a8ac ("xfrm/compat: Add 32=>64-bit messages translator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pull scheduler fixes from Borislav Petkov:
- Make sure the scheduler doesn't use stale frequency scaling values
when latter get disabled due to a value error
- Fix a NULL pointer access on UP configs
- Use the proper locking when updating CPU capacity
* tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs
sched/fair: Fixes for capacity inversion detection
sched/uclamp: Fix a uninitialized variable warnings
Pull EDAC fixes from Borislav Petkov:
- Respect user-supplied polling value in the EDAC device code
- Fix a use-after-free issue in qcom_edac
* tag 'edac_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_info
EDAC/device: Respect any driver-supplied workqueue polling value
Pull perf fix from Borislav Petkov:
- Add Emerald Rapids model support to more perf machinery
* tag 'perf_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/cstate: Add Emerald Rapids
perf/x86/intel: Add Emerald Rapids
Pull gfs2 writepage fix from Andreas Gruenbacher:
- Fix a regression introduced by commit "gfs2: stop using
generic_writepages in gfs2_ail1_start_one".
* tag 'gfs2-v6.2-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"
reclaim_period_ms used to be positive only but the commit 0001725d0f
("KVM: selftests: Add atoi_positive() and atoi_non_negative() for input
validation") incorrectly changed it to non-negative validation.
Change validation to allow only positive input.
Fixes: 0001725d0f ("KVM: selftests: Add atoi_positive() and atoi_non_negative() for input validation")
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reported-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230111183408.104491-1-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When serializing and deserializing kvm_sregs, attributes of the segment
descriptors are stored by user space. For unusable segments,
vmx_segment_access_rights skips all attributes and sets them to 0.
This means we zero out the DPL (Descriptor Privilege Level) for unusable
entries.
Unusable segments are - contrary to their name - usable in 64bit mode and
are used by guests to for example create a linear map through the
NULL selector.
VMENTER checks if SS.DPL is correct depending on the CS segment type.
For types 9 (Execute Only) and 11 (Execute Read), CS.DPL must be equal to
SS.DPL [1].
We have seen real world guests setting CS to a usable segment with DPL=3
and SS to an unusable segment with DPL=3. Once we go through an sregs
get/set cycle, SS.DPL turns to 0. This causes the virtual machine to crash
reproducibly.
This commit changes the attribute logic to always preserve attributes for
unusable segments. According to [2] SS.DPL is always saved on VM exits,
regardless of the unusable bit so user space applications should have saved
the information on serialization correctly.
[3] specifies that besides SS.DPL the rest of the attributes of the
descriptors are undefined after VM entry if unusable bit is set. So, there
should be no harm in setting them all to the previous state.
[1] Intel SDM Vol 3C 26.3.1.2 Checks on Guest Segment Registers
[2] Intel SDM Vol 3C 27.3.2 Saving Segment Registers and Descriptor-Table
Registers
[3] Intel SDM Vol 3C 26.3.2.2 Loading Guest Segment Registers and
Descriptor-Table Registers
Cc: Alexander Graf <graf@amazon.de>
Cc: stable@vger.kernel.org
Signed-off-by: Hendrik Borghorst <hborghor@amazon.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221114164823.69555-1-hborghor@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Placing a declaration of evt_reset is pedantically invalid
according to the C standard. While GCC does not really care
and only warns with -Wpedantic, clang ignores the declaration
altogether with an error:
x86_64/xen_shinfo_test.c:965:2: error: expected expression
struct kvm_xen_hvm_attr evt_reset = {
^
x86_64/xen_shinfo_test.c:969:38: error: use of undeclared identifier evt_reset
vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset);
^
Reported-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reported-by: Sean Christopherson <seanjc@google.com>
Fixes: a79b53aaaa ("KVM: x86: fix deadlock for KVM_XEN_EVTCHN_RESET", 2022-12-28)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit b2b0a5e978 switched from generic_writepages() to
filemap_fdatawrite_wbc() in gfs2_ail1_start_one() on the path to
replacing ->writepage() with ->writepages() and eventually eliminating
the former. Function gfs2_ail1_start_one() is called from
gfs2_log_flush(), our main function for flushing the filesystem log.
Unfortunately, at least as implemented today, ->writepage() and
->writepages() are entirely different operations for journaled data
inodes: while the former creates and submits transactions covering the
data to be written, the latter flushes dirty buffers out to disk.
With gfs2_ail1_start_one() now calling ->writepages(), we end up
creating filesystem transactions while we are in the course of a log
flush, which immediately deadlocks on the sdp->sd_log_flush_lock
semaphore.
Work around that by going back to how things used to work before commit
b2b0a5e978 for now; figuring out a superior solution will take time we
don't have available right now. However ...
Since the removal of generic_writepages() is imminent, open-code it
here. We're already inside a blk_start_plug() ... blk_finish_plug()
section here, so skip that part of the original generic_writepages().
This reverts commit b2b0a5e978.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
KVM/arm64 fixes for 6.2, take #2
- Pass the correct address to mte_clear_page_tags() on initialising
a tagged page
- Plug a race against a GICv4.1 doorbell interrupt while saving
the vgic-v3 pending state.
The patch that fixed string control support somehow got mangled when it was
merged in mainline: the added line ended up in the wrong place.
Fix this.
Fixes: 73278d4833 ("media: v4l2-ctrls-api.c: add back dropped ctrl->is_new = 1")
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Pull another io_uring fix from Jens Axboe:
"Just a single fix for a regression that happened in this release due
to a poll change. Normally I would've just deferred it to next week,
but since the original fix got picked up by stable, I think it's
better to just send this one off separately.
The issue is around the poll race fix, and how it mistakenly also got
applied to multishot polling. Those don't need the race fix, and we
should not be doing any reissues for that case. Exhaustive test cases
were written and committed to the liburing regression suite for the
reported issue, and additions for similar issues"
* tag 'io_uring-6.2-2023-01-21' of git://git.kernel.dk/linux:
io_uring/poll: don't reissue in case of poll race on multishot request
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc and other subsystem driver fixes for
6.2-rc5 to resolve a few reported issues. They include:
- long time pending fastrpc fixes (should have gone into 6.1, my
fault)
- mei driver/bus fixes and new device ids
- interconnect driver fixes for reported problems
- vmci bugfix
- w1 driver bugfixes for reported problems
Almost all of these have been in linux-next with no reported problems,
the rest have all passed 0-day bot testing in my tree and on the
mailing lists where they have sat too long due to me taking a long
time to catch up on my pending patch queue"
* tag 'char-misc-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
VMCI: Use threaded irqs instead of tasklets
misc: fastrpc: Pass bitfield into qcom_scm_assign_mem
gsmi: fix null-deref in gsmi_get_variable
misc: fastrpc: Fix use-after-free race condition for maps
misc: fastrpc: Don't remove map on creater_process and device_release
misc: fastrpc: Fix use-after-free and race in fastrpc_map_find
misc: fastrpc: fix error code in fastrpc_req_mmap()
mei: me: add meteor lake point M DID
mei: bus: fix unlink on bus in error path
w1: fix WARNING after calling w1_process()
w1: fix deadloop in __w1_remove_master_device()
comedi: adv_pci1760: Fix PWM instruction handling
interconnect: qcom: rpm: Use _optional func for provider clocks
interconnect: qcom: msm8996: Fix regmap max_register values
interconnect: qcom: msm8996: Provide UFS clocks to A2NoC
dt-bindings: interconnect: Add UFS clocks to MSM8996 A2NoC
Pull driver core fixes from Greg KH:
"Here are three small driver and kernel core fixes for 6.2-rc5. They
include:
- potential gadget fixup in do_prlimit
- device property refcount leak fix
- test_async_probe bugfix for reported problem"
* tag 'driver-core-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
prlimit: do_prlimit needs to have a speculation check
driver core: Fix test_async_probe_init saves device in wrong array
device property: fix of node refcount leak in fwnode_graph_get_next_endpoint()
Pull staging driver fix from Greg KH:
"Here is a single staging driver fix for 6.2-rc5. It resolves a build
issue reported and Fixed by Arnd in the vc04_services driver. It's
been in linux-next this week with no reported problems"
* tag 'staging-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vchiq_arm: fix enum vchiq_status return types
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.2-rc5 that
resolve a number of tiny reported issues and some new device ids. They
include:
- new device id for the exar serial driver
- speakup tty driver bugfix
- atmel serial driver baudrate fixup
- stm32 serial driver bugfix and then revert as the bugfix broke the
build. That will come back in a later pull request once it is all
worked out properly.
- amba-pl011 serial driver rs486 mode bugfix
- qcom_geni serial driver bugfix
Most of these have been in linux-next with no reported problems (well,
other than the build breakage which generated the revert), the new
device id passed 0-day testing"
* tag 'tty-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: exar: Add support for Sealevel 7xxxC serial cards
Revert "serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler"
tty: serial: qcom_geni: avoid duplicate struct member init
serial: atmel: fix incorrect baudrate setup
tty: fix possible null-ptr-defer in spk_ttyio_release
serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler
serial: amba-pl011: fix high priority character transmission in rs486 mode
serial: pch_uart: Pass correct sg to dma_unmap_sg()
tty: serial: qcom-geni-serial: fix slab-out-of-bounds on RX FIFO buffer
Pull USB / Thunderbolt fixes from Greg KH:
"Here are a number of small USB and Thunderbolt driver fixes and new
device id changes for 6.2-rc5. Included in here are:
- thunderbolt bugfixes for reported problems
- new usb-serial driver ids added
- onboard_hub usb driver fixes for much-reported problems
- xhci bugfixes
- typec bugfixes
- ehci-fsl driver module alias fix
- iowarrior header size fix
- usb gadget driver fixes
All of these, except for the iowarrior fix, have been in linux-next
with no reported issues. The iowarrior fix passed the 0-day testing
and is a one digit change based on a reported problem in the driver
(which was written to a spec, not the real device that is now
available)"
* tag 'usb-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (40 commits)
USB: misc: iowarrior: fix up header size for USB_DEVICE_ID_CODEMERCS_IOW100
usb: host: ehci-fsl: Fix module alias
usb: dwc3: fix extcon dependency
usb: core: hub: disable autosuspend for TI TUSB8041
USB: fix misleading usb_set_intfdata() kernel doc
usb: gadget: f_ncm: fix potential NULL ptr deref in ncm_bitrate()
USB: gadget: Add ID numbers to configfs-gadget driver names
usb: typec: tcpm: Fix altmode re-registration causes sysfs create fail
usb: gadget: g_webcam: Send color matching descriptor per frame
usb: typec: altmodes/displayport: Use proper macro for pin assignment check
usb: typec: altmodes/displayport: Fix pin assignment calculation
usb: typec: altmodes/displayport: Add pin assignment helper
usb: gadget: f_fs: Ensure ep0req is dequeued before free_request
usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait
usb: misc: onboard_hub: Move 'attach' work to the driver
usb: misc: onboard_hub: Invert driver registration order
usb: ucsi: Ensure connector delayed work items are flushed
usb: musb: fix error return code in omap2430_probe()
usb: chipidea: core: fix possible constant 0 if use IS_ERR(ci->role_switch)
xhci: Detect lpm incapable xHC USB3 roothub ports from ACPI tables
...
Pull Kbuild fixes from Masahiro Yamada:
- Hide LDFLAGS_vmlinux from decompressor Makefiles to fix error
messages when GNU Make 4.4 is used.
- Fix 'make modules' build error when CONFIG_DEBUG_INFO_BTF_MODULES=y.
- Fix warnings emitted by GNU Make 4.4 in scripts/kconfig/Makefile.
- Support GNU Make 4.4 for scripts/jobserver-exec.
- Show clearer error message when kernel/gen_kheaders.sh fails due to
missing cpio.
* tag 'kbuild-fixes-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kheaders: explicitly validate existence of cpio command
scripts: support GNU make 4.4 in jobserver-exec
kconfig: Update all declared targets
scripts: rpm: make clear that mkspec script contains 4.13 feature
init/Kconfig: fix LOCALVERSION_AUTO help text
kbuild: fix 'make modules' error when CONFIG_DEBUG_INFO_BTF_MODULES=y
kbuild: export top-level LDFLAGS_vmlinux only to scripts/Makefile.vmlinux
init/version-timestamp.c: remove unneeded #include <linux/version.h>
docs: kbuild: remove mention to dropped $(objtree) feature
+/-1200uT is a MAGN sensor full measurement range. Magnetometer scale
is the magnetic sensitivity parameter. It is referenced as 0.1uT
according to datasheet and magnetometer channel unit is Gauss in
sysfs-bus-iio documentation. Gauss and uTesla unit conversion
relationship as follows: 0.1uT = 0.001Gs.
Set magnetometer scale and available magnetometer scale as fixed 0.001Gs.
Fixes: 84e5ddd5c4 ("iio: imu: Add support for the FXOS8700 IMU")
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Link: https://lore.kernel.org/r/20230118074227.1665098-5-carlos.song@nxp.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
do_prlimit() adds the user-controlled resource value to a pointer that
will subsequently be dereferenced. In order to help prevent this
codepath from being used as a spectre "gadget" a barrier needs to be
added after checking the range.
Reported-by: Jordy Zomer <jordyzomer@google.com>
Tested-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To save the vgic LPI pending state with GICv4.1, the VPEs must all be
unmapped from the ITSs so that the sGIC caches can be flushed.
The opposite is done once the state is saved.
This is all done by using the activate/deactivate irqdomain callbacks
directly from the vgic code. Crutially, this is done without holding
the irqdesc lock for the interrupts that represent the VPE. And these
callbacks are changing the state of the irqdesc. What could possibly
go wrong?
If a doorbell fires while we are messing with the irqdesc state,
it will acquire the lock and change the interrupt state concurrently.
Since we don't hole the lock, curruption occurs in on the interrupt
state. Oh well.
While acquiring the lock would fix this (and this was Shanker's
initial approach), this is still a layering violation we could do
without. A better approach is actually to free the VPE interrupt,
do what we have to do, and re-request it.
It is more work, but this usually happens only once in the lifetime
of the VM and we don't really care about this sort of overhead.
Fixes: f66b7b151e ("KVM: arm64: GICv4.1: Try to save VLPI state in save_pending_tables")
Reported-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230118022348.4137094-1-sdonthineni@nvidia.com
Commit d77e59a8fc ("arm64: mte: Lock a page for MTE tag
initialisation") added a call to mte_clear_page_tags() in case a
prior mte_copy_tags_from_user() failed in order to avoid stale tags in
the guest page (it should have really been a separate commit).
Unfortunately, the argument passed to this function was the address of
the struct page rather than the actual page address. Fix this function
call.
Fixes: d77e59a8fc ("arm64: mte: Lock a page for MTE tag initialisation")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230119170902.1574756-1-catalin.marinas@arm.com
If net_assign_generic() fails, the current error path in ops_init() tries
to clear the gen pointer slot. Anyway, in such error path, the gen pointer
itself has not been modified yet, and the existing and accessed one is
smaller than the accessed index, causing an out-of-bounds error:
BUG: KASAN: slab-out-of-bounds in ops_init+0x2de/0x320
Write of size 8 at addr ffff888109124978 by task modprobe/1018
CPU: 2 PID: 1018 Comm: modprobe Not tainted 6.2.0-rc2.mptcp_ae5ac65fbed5+ #1641
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x6a/0x9f
print_address_description.constprop.0+0x86/0x2b5
print_report+0x11b/0x1fb
kasan_report+0x87/0xc0
ops_init+0x2de/0x320
register_pernet_operations+0x2e4/0x750
register_pernet_subsys+0x24/0x40
tcf_register_action+0x9f/0x560
do_one_initcall+0xf9/0x570
do_init_module+0x190/0x650
load_module+0x1fa5/0x23c0
__do_sys_finit_module+0x10d/0x1b0
do_syscall_64+0x58/0x80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f42518f778d
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 73 01 c3 48 8b 0d cb 56 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff96869688 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 00005568ef7f7c90 RCX: 00007f42518f778d
RDX: 0000000000000000 RSI: 00005568ef41d796 RDI: 0000000000000003
RBP: 00005568ef41d796 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
R13: 00005568ef7f7d30 R14: 0000000000040000 R15: 0000000000000000
</TASK>
This change addresses the issue by skipping the gen pointer
de-reference in the mentioned error-path.
Found by code inspection and verified with explicit error injection
on a kasan-enabled kernel.
Fixes: d266935ac4 ("net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/cec4e0f3bb2c77ac03a6154a8508d3930beb5f0f.1674154348.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Most netlink attributes are parsed and validated from
__nla_validate_parse() or validate_nla()
u16 type = nla_type(nla);
if (type == 0 || type > maxtype) {
/* error or continue */
}
@type is then used as an array index and can be used
as a Spectre v1 gadget.
array_index_nospec() can be used to prevent leaking
content of kernel memory to malicious users.
This should take care of vast majority of netlink uses,
but an audit is needed to take care of others where
validation is not yet centralized in core netlink functions.
Fixes: bfa83a9e03 ("[NETLINK]: Type-safe netlink messages/attributes interface")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230119110150.2678537-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull gpio fixes from Bartosz Golaszewski:
- fix a potential race condition and always set GPIOs used as interrupt
source to input in gpio-mxc
- fix a GPIO ACPI-related issue with system suspend on Clevo NL5xRU
* tag 'gpio-fixes-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: acpi: Add a ignore wakeup quirk for Clevo NL5xRU
gpiolib: acpi: Allow ignoring wake capability on pins that aren't in _AEI
gpio: mxc: Always set GPIOs used as interrupt source to INPUT mode
gpio: mxc: Protect GPIO irqchip RMW with bgpio spinlock
Pull cifs fixes from Steve French:
- important fix for packet signature calculation error
- three fixes to correct DFS deadlock, and DFS refresh problem
- remove an unused DFS function, and duplicate tcon refresh code
- DFS cache lookup fix
- uninitialized rc fix
* tag '6.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: remove unused function
cifs: do not include page data when checking signature
cifs: fix return of uninitialized rc in dfs_cache_update_tgthint()
cifs: handle cache lookup errors different than -ENOENT
cifs: remove duplicate code in __refresh_tcon()
cifs: don't take exclusive lock for updating target hints
cifs: avoid re-lookups in dfs_cache_find()
cifs: fix potential deadlock in cache_refresh_path()
Pull pin control fixes from Linus Walleij:
- Compilation fix for Sunplus sp7021
- Add some missing headers after a cleanup to the Nomadik driver
- Fix pull type and mux routes on Rockchip RK3568
* tag 'pinctrl-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: rockchip: fix mux route data for rk3568
pinctrl: rockchip: fix reading pull type on rk3568
pinctrl: nomadik: Add missing header(s)
pinctrl: sp7021: fix unused function warning
Pull a Microchip clock fix from Claudiu Beznea:
Only one fix for Polarfire SoCs at this time as follows:
- replace devm_kzalloc() with devm_kasprintf(); this has been marked as
fix to avoid having registered 2 clocks with the same or invalid name in
case device tree node addresses will be longer such that clocks
registered with name patern "ccc<node_address>_pll<N>" will exeed the
allocated space.
* tag 'clk-microchip-fixes-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
clk: microchip: mpfs-ccc: Use devm_kasprintf() for allocating formatted strings
Pull rdma fixes from Jason Gunthorpe:
- Several hfi1 patches fixing some long standing driver bugs
- Overflow when working with sg lists with elements greater than 4G
- An rxe regression with object numbering after the mrs reach their
limit
- A theoretical problem with the scatterlist merging code
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
lib/scatterlist: Fix to calculate the last_pg properly
IB/hfi1: Remove user expected buffer invalidate race
IB/hfi1: Immediately remove invalid memory from hardware
IB/hfi1: Fix expected receive setup error exit issues
IB/hfi1: Reserve user expected TIDs
IB/hfi1: Reject a zero-length user expected buffer
RDMA/core: Fix ib block iterator counter overflow
RDMA/rxe: Prevent faulty rkey generation
RDMA/rxe: Fix inaccurate constants in rxe_type_info
A previous commit fixed a poll race that can occur, but it's only
applicable for multishot requests. For a multishot request, we can safely
ignore a spurious wakeup, as we never leave the waitqueue to begin with.
A blunt reissue of a multishot armed request can cause us to leak a
buffer, if they are ring provided. While this seems like a bug in itself,
it's not really defined behavior to reissue a multishot request directly.
It's less efficient to do so as well, and not required to rearm anything
like it is for singleshot poll requests.
Cc: stable@vger.kernel.org
Fixes: 6e5aedb932 ("io_uring/poll: attempt request issue after racy poll wakeup")
Reported-and-tested-by: Olivier Langlois <olivier@trillion01.com>
Link: https://github.com/axboe/liburing/issues/778
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If ksmbd.mountd is configured to assign unknown users to the guest account
("map to guest = bad user" in the config), ksmbd signs the response.
This is wrong according to MS-SMB2 3.3.5.5.3:
12. If the SMB2_SESSION_FLAG_IS_GUEST bit is not set in the SessionFlags
field, and Session.IsAnonymous is FALSE, the server MUST sign the
final session setup response before sending it to the client, as
follows:
[...]
This fixes libsmb2 based applications failing to establish a session
("Wrong signature in received").
Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull block fixes from Jens Axboe:
"Various little tweaks all over the place:
- NVMe pull request via Christoph:
- fix controller shutdown regression in nvme-apple (Janne Grunau)
- fix a polling on timeout regression in nvme-pci (Keith Busch)
- Fix a bug in the read request side request allocation caching
(Pavel)
- pktcdvd was brought back after we configured a NULL return on bio
splits, make it consistent with the others (me)
- BFQ refcount fix (Yu)
- Block cgroup policy activation fix (Yu)
- Fix for an md regression introduced in the 6.2 cycle (Adrian)"
* tag 'block-6.2-2023-01-20' of git://git.kernel.dk/linux:
nvme-pci: fix timeout request state check
nvme-apple: only reset the controller when RTKit is running
nvme-apple: reset controller during shutdown
block: fix hctx checks for batch allocation
block/rnbd-clt: fix wrong max ID in ida_alloc_max
blk-cgroup: fix missing pd_online_fn() while activating policy
pktcdvd: check for NULL returna fter calling bio_split_to_limits()
block, bfq: switch 'bfqg->ref' to use atomic refcount apis
md: fix incorrect declaration about claim_rdev in md_import_device
Pull io_uring fixes from Jens Axboe:
"Fixes for the MSG_RING opcode. Nothing really major:
- Fix an overflow missing serialization around posting CQEs to the
target ring (me)
- Disable MSG_RING on a ring that isn't enabled yet. There's nothing
really wrong with allowing it, but 1) it's somewhat odd as nobody
can receive them yet, and 2) it means that using the right delivery
mechanism might change. As nobody should be sending CQEs to a ring
that isn't enabled yet, let's just disable it (Pavel)
- Tweak to when we decide to post remotely or not for MSG_RING
(Pavel)"
* tag 'io_uring-6.2-2023-01-20' of git://git.kernel.dk/linux:
io_uring/msg_ring: fix remote queue to disabled ring
io_uring/msg_ring: fix flagging remote execution
io_uring/msg_ring: fix missing lock on overflow for IOPOLL
io_uring/msg_ring: move double lock/unlock helpers higher up
Pull btrfs fixes from David Sterba:
- fix potential out-of-bounds access to leaf data when seeking in an
inline file
- fix potential crash in quota when rescan races with disable
- reimplement super block signature scratching by marking page/folio
dirty and syncing block device, allow removing write_one_page
* tag 'for-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix race between quota rescan and disable leading to NULL pointer deref
btrfs: fix invalid leaf access due to inline extent during lseek
btrfs: stop using write_one_page in btrfs_scratch_superblock
btrfs: factor out scratching of one regular super block
Pull Kselftest fix from Shuah Khan:
"Fix an error seen during unconfigured LLVM builds"
* tag 'linux-kselftest-fixes-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest: Fix error message for unconfigured LLVM builds
Pull thermal control fix from Rafael Wysocki:
"Modify __thermal_cooling_device_register() to make it call
put_device() after invoking device_register() and fix up a few error
paths calling thermal_cooling_device_destroy_sysfs() unnecessarily
(Viresh Kumar)"
* tag 'thermal-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: call put_device() only after device_register() fails
Pull ACPI fixes from Rafael Wysocki:
"These update the ACPICA entry in MAINTAINERS, add a backlight handling
quirk and fix the ACPI PRM (platform runtime) mechanism support.
Specifics:
- Update the ACPICA development list address in MAINTAINERS to the
new one that does not bounce (Rafael Wysocki)
- Check whether EFI runtime is available when registering the ACPI
PRM address space handler and when running it (Ard Biesheuvel)
- Add backlight=native DMI quirk for Acer Aspire 4810T to the ACPI
video driver (Hans de Goede)"
* tag 'acpi-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PRM: Check whether EFI runtime is available
ACPI: video: Add backlight=native DMI quirk for Acer Aspire 4810T
MAINTAINERS: Update the ACPICA development list address
Pull MMC fixes from Ulf Hansson:
- sunxi-mmc: Fix clock refcount imbalance during unbind
- sdhci-esdhc-imx: Fix some tuning settings
* tag 'mmc-v6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sunxi-mmc: Fix clock refcount imbalance during unbind
mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting
Pull ARM SoC DT and driver fixes from Arnd Bergmann:
"Lots of dts fixes for Qualcomm Snapdragon and NXP i.MX platforms,
including:
- A regression fix for SDHCI controllers on Inforce 6540, and another
SDHCI fix on SM8350
- Reenable cluster idle on sm8250 after the the code fix is upstream
- multiple fixes for the QMP PHY binding, needing an incompatible dt
change
- The reserved memory map is updated on Xiaomi Mi 4C and Huawei Nexus
6P, to avoid instabilities caused by use of protected memory
regions
- Fix i.MX8MP DT for missing GPC Interrupt, power-domain typo and USB
clock error
- A couple of verdin-imx8mm DT fixes for audio playback support
- Fix pca9547 i2c-mux node name for i.MX and Vybrid device trees
- Fix an imx93-11x11-evk uSDHC pad setting problem that causes Micron
eMMC CMD8 CRC error in HS400ES/HS400 mode
The remaining ARM and RISC-V platforms only have very few smaller dts
bugfixes this time:
- A fix for the SiFive unmatched board's PCI memory space
- A revert to fix a regression with GPIO on Marvell Armada
- A fix for the UART address on Marvell AC5
- Missing chip-select phandles for stm32 boards
- Selecting the correct clock for the sam9x60 memory controller
- Amlogic based Odroid-HC4 needs a revert to restore USB
functionality.
And finally, there are some minor code fixes:
- Build fixes for OMAP1, pxa, riscpc, raspberry pi firmware, and zynq
firmware
- memory controller driver fixes for an OMAP regression and older
bugs on tegra, atmel and mvebu
- reset controller fixes for ti-sci and uniphier platforms
- ARM SCMI firmware fixes for a couple of rare corner cases
- Qualcomm platform driver fixes for incorrect error handling and a
backwards compatibility fix for the apr driver using older dtb
- NXP i.MX SoC driver fixes for HDMI output, error handling in the
imx8 soc-id and missing reference counting on older cpuid code"
* tag 'soc-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (60 commits)
firmware: zynqmp: fix declarations for gcc-13
ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp151a-prtt1l
ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp157c-emstamp-argon
ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp15xx-dhcom-som
ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp15xx-dhcor-som
ARM: dts: at91: sam9x60: fix the ddr clock for sam9x60
ARM: omap1: fix building gpio15xx
ARM: omap1: fix !ARCH_OMAP1_ANY link failures
firmware: raspberrypi: Fix type assignment
arm64: dts: qcom: msm8992-libra: Fix the memory map
arm64: dts: qcom: msm8992: Don't use sfpb mutex
PM: AVS: qcom-cpr: Fix an error handling path in cpr_probe()
arm64: dts: msm8994-angler: fix the memory map
arm64: dts: marvell: AC5/AC5X: Fix address for UART1
ARM: footbridge: drop unnecessary inclusion
Revert "ARM: dts: armada-39x: Fix compatible string for gpios"
Revert "ARM: dts: armada-38x: Fix compatible string for gpios"
ARM: pxa: enable PXA310/PXA320 for DT-only build
riscv: dts: sifive: fu740: fix size of pcie 32bit memory
soc: qcom: apr: Make qcom,protection-domain optional again
...
The memory for llcc_driv_data is allocated by the LLCC driver. But when
it is passed as the private driver info to the EDAC core, it will get freed
during the qcom_edac driver release. So when the qcom_edac driver gets probed
again, it will try to use the freed data leading to the use-after-free bug.
Hence, do not pass llcc_driv_data as pvt_info but rather reference it
using the platform_data pointer in the qcom_edac driver.
Fixes: 27450653f1 ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Reported-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Cc: <stable@vger.kernel.org> # 4.20
Link: https://lore.kernel.org/r/20230118150904.26913-4-manivannan.sadhasivam@linaro.org
Pull dmaengine fixes from Vinod Koul:
- email address Update for Jie Hai
- fix double increment of client_count in dma_chan_get()
- idxd driver fixes: use after free, probe error handling and callback
on wq disable
- fix for qcom gpi driver GO tre
- ptdma locking fix
- tegra & imx-sdma mem leak fix
* tag 'dmaengine-fix-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
ptdma: pt_core_execute_cmd() should use spinlock
dmaengine: tegra: Fix memory leak in terminate_all()
dmaengine: xilinx_dma: call of_node_put() when breaking out of for_each_child_of_node()
dmaengine: imx-sdma: Fix a possible memory leak in sdma_transfer_init
dmaengine: Fix double increment of client_count in dma_chan_get()
dmaengine: tegra210-adma: fix global intr clear
Add exception protection processing for vd in axi_chan_handle_err function
dmaengine: lgm: Move DT parsing after initialization
MAINTAINERS: update Jie Hai's email address
dmaengine: ti: k3-udma: Do conditional decrement of UDMA_CHAN_RT_PEER_BCNT_REG
dmaengine: idxd: Do not call DMX TX callbacks during workqueue disable
dmaengine: idxd: Prevent use after free on completion memory
dmaengine: idxd: Let probe fail when workqueue cannot be enabled
dmaengine: qcom: gpi: Set link_rx bit on GO TRE for rx operation
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, bluetooth, bpf and netfilter.
Current release - regressions:
- Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
addrconf", fix nsna_ping mode of team
- wifi: mt76: fix bugs in Rx queue handling and DMA mapping
- eth: mlx5:
- add missing mutex_unlock in error reporter
- protect global IPsec ASO with a lock
Current release - new code bugs:
- rxrpc: fix wrong error return in rxrpc_connect_call()
Previous releases - regressions:
- bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2
- wifi:
- mac80211: fix crashes on Rx due to incorrect initialization of
rx->link and rx->link_sta
- mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
aggregation handling, crashes
- brcmfmac: fix regression for Broadcom PCIe wifi devices
- rndis_wlan: prevent buffer overflow in rndis_query_oid
- netfilter: conntrack: handle tcp challenge acks during connection
reuse
- sched: avoid grafting on htb_destroy_class_offload when destroying
- virtio-net: correctly enable callback during start_xmit, fix stalls
- tcp: avoid the lookup process failing to get sk in ehash table
- ipa: disable ipa interrupt during suspend
- eth: stmmac: enable all safety features by default
Previous releases - always broken:
- bpf:
- fix pointer-leak due to insufficient speculative store bypass
mitigation (Spectre v4)
- skip task with pid=1 in send_signal_common() to avoid a splat
- fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
PERF_BPF_EVENT_PROG_UNLOAD events
- fix potential deadlock in htab_lock_bucket from same bucket
index but different map_locked index
- bluetooth:
- fix a buffer overflow in mgmt_mesh_add()
- hci_qca: fix driver shutdown on closed serdev
- ISO: fix possible circular locking dependency
- CIS: hci_event: fix invalid wait context
- wifi: brcmfmac: fixes for survey dump handling
- mptcp: explicitly specify sock family at subflow creation time
- netfilter: nft_payload: incorrect arithmetics when fetching VLAN
header bits
- tcp: fix rate_app_limited to default to 1
- l2tp: close all race conditions in l2tp_tunnel_register()
- eth: mlx5: fixes for QoS config and eswitch configuration
- eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()
- eth: stmmac: fix invalid call to mdiobus_get_phy()
Misc:
- ethtool: add netlink attr in rss get reply only if the value is not
empty"
* tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
Revert "Merge branch 'octeontx2-af-CPT'"
tcp: fix rate_app_limited to default to 1
bnxt: Do not read past the end of test names
net: stmmac: enable all safety features by default
octeontx2-af: add mbox to return CPT_AF_FLT_INT info
octeontx2-af: update cpt lf alloc mailbox
octeontx2-af: restore rxc conf after teardown sequence
octeontx2-af: optimize cpt pf identification
octeontx2-af: modify FLR sequence for CPT
octeontx2-af: add mbox for CPT LF reset
octeontx2-af: recover CPT engine when it gets fault
net: dsa: microchip: ksz9477: port map correction in ALU table entry register
selftests/net: toeplitz: fix race on tpacket_v3 block close
net/ulp: use consistent error code when blocking ULP
octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
tcp: avoid the lookup process failing to get sk in ehash table
Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf"
MAINTAINERS: add networking entries for Willem
net: sched: gred: prevent races when adding offloads to stats
l2tp: prevent lockdep issue in l2tp_tunnel_register()
...
Make function buttons on ELECOM M-HT1DRBK trackball mouse work. This model
has two devices with different device IDs (010D and 011C). Both of
them misreports the number of buttons as 5 in the report descriptor, even
though they have 8 buttons. hid-elecom overwrites the report to fix them,
but supports only on 010D and does not work on 011C. This patch fixes
011C in the similar way but with specialized position parameters.
In fact, it is sufficient to rewrite only 17th byte (05 -> 08). However I
followed the existing way.
Signed-off-by: Takahiro Fujii <fujii@xaxxi.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Merge an ACPI PRM (platform runtime) support fix and an ACPI backlight
quirk for 6.2-rc5:
- Check whether EFI runtime is available when registering the ACPI PRM
address space handler and when running it (Ard Biesheuvel).
- Add backlight=native DMI quirk for Acer Aspire 4810T to the ACPI
video driver (Hans de Goede).
* acpi-prm:
ACPI: PRM: Check whether EFI runtime is available
* acpi-video:
ACPI: video: Add backlight=native DMI quirk for Acer Aspire 4810T
Using kunit_fail_current_test() in a loadable module causes a link
error like:
ERROR: modpost: "kunit_running" [drivers/gpu/drm/vc4/vc4.ko] undefined!
Export the symbol to allow using it from modules.
Fixes: da43ff045c ("drm/vc4: tests: Fail the current test if we access a register")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
iavf_replace_primary_mac() utilizes queue_work() to schedule the
watchdog task but that only ensures that the watchdog task is queued
to run. To make sure the watchdog is executed asap use
mod_delayed_work().
Without this patch it may take up to 2s until the watchdog task gets
executed, which may cause long delays when setting the MAC address.
Fixes: a3e839d539 ("iavf: Add usage of new virtchnl format to set default MAC")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove netdev_update_features() from iavf_adminq_task(), as it can cause
deadlocks due to needing rtnl_lock. Instead use the
IAVF_FLAG_SETUP_NETDEV_FEATURES flag to indicate that netdev features need
to be updated in the watchdog task. iavf_set_vlan_offload_features()
and iavf_set_queue_vlan_tag_loc() can be called directly from
iavf_virtchnl_completion().
Suggested-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We are seeing an issue where setting the MAC address on iavf fails with
EAGAIN after the 2.5s timeout expires in iavf_set_mac().
There is the following deadlock scenario:
iavf_set_mac(), holding rtnl_lock, waits on:
iavf_watchdog_task (within iavf_wq) to send a message to the PF,
and
iavf_adminq_task (within iavf_wq) to receive a response from the PF.
In this adapter state (>=__IAVF_DOWN), these tasks do not need to take
rtnl_lock, but iavf_wq is a global single-threaded workqueue, so they
may get stuck waiting for another adapter's iavf_watchdog_task to run
iavf_init_config_adapter(), which does take rtnl_lock.
The deadlock resolves itself by the timeout in iavf_set_mac(),
which results in EAGAIN returned to userspace.
Let's break the deadlock loop by changing iavf_wq into a per-adapter
workqueue, so that one adapter's tasks are not blocked by another's.
Fixes: 35a2443d09 ("iavf: Add waiting for response from PF in set mac")
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
IORING_SETUP_R_DISABLED rings don't have the submitter task set, so
it's not always safe to use ->submitter_task. Disallow posting msg_ring
messaged to disabled rings. Also add task NULL check for loosy sync
around testing for IORING_SETUP_R_DISABLED.
Cc: stable@vger.kernel.org
Fixes: 6d043ee116 ("io_uring: do msg_ring in target task via tw")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a couple of problems with queueing a tw in io_msg_ring_data()
for remote execution. First, once we queue it the target ring can
go away and so setting IORING_SQ_TASKRUN there is not safe. Secondly,
the userspace might not expect IORING_SQ_TASKRUN.
Extract a helper and uniformly use TWA_SIGNAL without TWA_SIGNAL_NO_IPI
tricks for now, just as it was done in the original patch.
Cc: stable@vger.kernel.org
Fixes: 6d043ee116 ("io_uring: do msg_ring in target task via tw")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit b4fbf0b27f, reversing
changes made to 6c977c5c2e.
This seems like net-next material.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We take the BTF reference before we register dtors and we need
to put it back when it's done.
We probably won't se a problem with kernel BTF, but module BTF
would stay loaded (because of the extra ref) even when its module
is removed.
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Fixes: 5ce937d613 ("bpf: Populate pairs of btf_id and destructor kfunc in btf")
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230120122148.1522359-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently it is possible that the final put of a KVM reference comes from
vfio during its device close operation. This occurs while the vfio group
lock is held; however, if the vfio device is still in the kvm device list,
then the following call chain could result in a deadlock:
VFIO holds group->group_lock/group_rwsem
-> kvm_put_kvm
-> kvm_destroy_vm
-> kvm_destroy_devices
-> kvm_vfio_destroy
-> kvm_vfio_file_set_kvm
-> vfio_file_set_kvm
-> try to hold group->group_lock/group_rwsem
The key function is the kvm_destroy_devices() which triggers destroy cb
of kvm_device_ops. It calls back to vfio and try to hold group_lock. So
if this path doesn't call back to vfio, this dead lock would be fixed.
Actually, there is a way for it. KVM provides another point to free the
kvm-vfio device which is the point when the device file descriptor is
closed. This can be achieved by providing the release cb instead of the
destroy cb. Also rename kvm_vfio_destroy() to be kvm_vfio_release().
/*
* Destroy is responsible for freeing dev.
*
* Destroy may be called before or after destructors are called
* on emulated I/O regions, depending on whether a reference is
* held by a vcpu or other kvm component that gets destroyed
* after the emulated I/O.
*/
void (*destroy)(struct kvm_device *dev);
/*
* Release is an alternative method to free the device. It is
* called when the device file descriptor is closed. Once
* release is called, the destroy method will not be called
* anymore as the device is removed from the device list of
* the VM. kvm->lock is held.
*/
void (*release)(struct kvm_device *dev);
Fixes: 421cfe6596 ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20230114000351.115444-1-mjrosato@linux.ibm.com
Link: https://lore.kernel.org/r/20230120150528.471752-1-yi.l.liu@intel.com
[aw: update comment as well, s/destroy/release/]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.2
- fix controller shutdown regression in nvme-apple (Janne Grunau)
- fix a polling on timeout regression in nvme-pci (Keith Busch)"
* tag 'nvme-6.2-2023-01-20' of git://git.infradead.org/nvme:
nvme-pci: fix timeout request state check
nvme-apple: only reset the controller when RTKit is running
nvme-apple: reset controller during shutdown
The initial default value of 0 for tp->rate_app_limited was incorrect,
since a flow is indeed application-limited until it first sends
data. Fixing the default to be 1 is generally correct but also
specifically will help user-space applications avoid using the initial
tcpi_delivery_rate value of 0 that persists until the connection has
some non-zero bandwidth sample.
Fixes: eb8329e0a0 ("tcp: export data delivery rate")
Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Tested-by: David Morley <morleyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The srcvm parameter of qcom_scm_assign_mem is a pointer to a bitfield of
VMIDs. The bitfield is updated with which VMIDs have permissions
after the qcom_scm_assign_mem call. This makes it simpler for clients to
make qcom_scm_assign_mem calls later, they always pass in same srcvm
bitfield and do not need to closely track whether memory was originally
shared.
When restoring permissions to HLOS, fastrpc is incorrectly using the
first VMID directly -- neither the BIT nor the other possible VMIDs the
memory was already assigned to. We already have a field intended for
this purpose: "perms" in the struct fastrpc_channel_ctx, but it was
never used. Start using the perms field.
Cc: Abel Vesa <abel.vesa@linaro.org>
Cc: Vamsi Krishna Gattupalli <quic_vgattupa@quicinc.com>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Fixes: e90d911906 ("misc: fastrpc: Add support to secure memory map")
Fixes: 0871561055 ("misc: fastrpc: Add support for audiopd")
Fixes: 532ad70c6d ("misc: fastrpc: Add mmap request assigning for static PD pool")
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
drivers/misc/fastrpc.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
Link: https://lore.kernel.org/r/20230112182313.521467-1-quic_eberman@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is possible that in between calling fastrpc_map_get() until
map->fl->lock is taken in fastrpc_free_map(), another thread can call
fastrpc_map_lookup() and get a reference to a map that is about to be
deleted.
Rewrite fastrpc_map_get() to only increase the reference count of a map
if it's non-zero. Propagate this to callers so they can know if a map is
about to be deleted.
Fixes this warning:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 5 PID: 10100 at lib/refcount.c:25 refcount_warn_saturate
...
Call trace:
refcount_warn_saturate
[fastrpc_map_get inlined]
[fastrpc_map_lookup inlined]
fastrpc_map_create
fastrpc_internal_invoke
fastrpc_device_ioctl
__arm64_sys_ioctl
invoke_syscall
Fixes: c68cfb718c ("misc: fastrpc: Add support for context Invoke method")
Cc: stable <stable@kernel.org>
Signed-off-by: Ola Jeppsson <ola@snap.com>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221124174941.418450-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, there is a race window between the point when the mutex is
unlocked in fastrpc_map_lookup and the reference count increasing
(fastrpc_map_get) in fastrpc_map_find, which can also lead to
use-after-free.
So lets merge fastrpc_map_find into fastrpc_map_lookup which allows us
to both protect the maps list by also taking the &fl->lock spinlock and
the reference count, since the spinlock will be released only after.
Add take_ref argument to make this suitable for all callers.
Fixes: 8f6c1d8c4f ("misc: fastrpc: Add fdlist implementation")
Cc: stable <stable@kernel.org>
Co-developed-by: Ola Jeppsson <ola@snap.com>
Signed-off-by: Ola Jeppsson <ola@snap.com>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221124174941.418450-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Georgi writes:
interconnect fixes for v6.2-rc
This contains fixes for a rare boot hang issue that has been reported
on the db820c dragonboard.
- dt-bindings: interconnect: Add UFS clocks to MSM8996 A2NoC
- interconnect: qcom: msm8996: Provide UFS clocks to A2NoC
- interconnect: qcom: msm8996: Fix regmap max_register values
- interconnect: qcom: rpm: Use _optional func for provider clocks
Signed-off-by: Georgi Djakov <djakov@kernel.org>
* tag 'icc-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
interconnect: qcom: rpm: Use _optional func for provider clocks
interconnect: qcom: msm8996: Fix regmap max_register values
interconnect: qcom: msm8996: Provide UFS clocks to A2NoC
dt-bindings: interconnect: Add UFS clocks to MSM8996 A2NoC
gcc-13.0.1 reports a type mismatch for two functions:
drivers/firmware/xilinx/zynqmp.c:1228:5: error: conflicting types for 'zynqmp_pm_set_rpu_mode' due to enum/integer mismatch; have 'int(u32, enum rpu_oper_mode)' {aka 'int(unsigned int, enum rpu_oper_mode)'} [-Werror=enum-int-mismatch]
1228 | int zynqmp_pm_set_rpu_mode(u32 node_id, enum rpu_oper_mode rpu_mode)
| ^~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/firmware/xilinx/zynqmp.c:25:
include/linux/firmware/xlnx-zynqmp.h:552:5: note: previous declaration of 'zynqmp_pm_set_rpu_mode' with type 'int(u32, u32)' {aka 'int(unsigned int, unsigned int)'}
552 | int zynqmp_pm_set_rpu_mode(u32 node_id, u32 arg1);
| ^~~~~~~~~~~~~~~~~~~~~~
drivers/firmware/xilinx/zynqmp.c:1246:5: error: conflicting types for 'zynqmp_pm_set_tcm_config' due to enum/integer mismatch; have 'int(u32, enum rpu_tcm_comb)' {aka 'int(unsigned int, enum rpu_tcm_comb)'} [-Werror=enum-int-mismatch]
1246 | int zynqmp_pm_set_tcm_config(u32 node_id, enum rpu_tcm_comb tcm_mode)
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/firmware/xlnx-zynqmp.h:553:5: note: previous declaration of 'zynqmp_pm_set_tcm_config' with type 'int(u32, u32)' {aka 'int(unsigned int, unsigned int)'}
553 | int zynqmp_pm_set_tcm_config(u32 node_id, u32 arg1);
| ^~~~~~~~~~~~~~~~~~~~~~~~
Change the declaration in the header to match the function definition.
Acked-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In the original implementation of dwmac5
commit 8bf993a587 ("net: stmmac: Add support for DWMAC5 and implement Safety Features")
all safety features were enabled by default.
Later it seems some implementations didn't have support for all the
features, so in
commit 5ac712dcdf ("net: stmmac: enable platform specific safety features")
the safety_feat_cfg structure was added to the callback and defined for
some platforms to selectively enable these safety features.
The problem is that only certain platforms were given that software
support. If the automotive safety package bit is set in the hardware
features register the safety feature callback is called for the platform,
and for platforms that didn't get a safety_feat_cfg defined this results
in the following NULL pointer dereference:
[ 7.933303] Call trace:
[ 7.935812] dwmac5_safety_feat_config+0x20/0x170 [stmmac]
[ 7.941455] __stmmac_open+0x16c/0x474 [stmmac]
[ 7.946117] stmmac_open+0x38/0x70 [stmmac]
[ 7.950414] __dev_open+0x100/0x1dc
[ 7.954006] __dev_change_flags+0x18c/0x204
[ 7.958297] dev_change_flags+0x24/0x6c
[ 7.962237] do_setlink+0x2b8/0xfa4
[ 7.965827] __rtnl_newlink+0x4ec/0x840
[ 7.969766] rtnl_newlink+0x50/0x80
[ 7.973353] rtnetlink_rcv_msg+0x12c/0x374
[ 7.977557] netlink_rcv_skb+0x5c/0x130
[ 7.981500] rtnetlink_rcv+0x18/0x2c
[ 7.985172] netlink_unicast+0x2e8/0x340
[ 7.989197] netlink_sendmsg+0x1a8/0x420
[ 7.993222] ____sys_sendmsg+0x218/0x280
[ 7.997249] ___sys_sendmsg+0xac/0x100
[ 8.001103] __sys_sendmsg+0x84/0xe0
[ 8.004776] __arm64_sys_sendmsg+0x24/0x30
[ 8.008983] invoke_syscall+0x48/0x114
[ 8.012840] el0_svc_common.constprop.0+0xcc/0xec
[ 8.017665] do_el0_svc+0x38/0xb0
[ 8.021071] el0_svc+0x2c/0x84
[ 8.024212] el0t_64_sync_handler+0xf4/0x120
[ 8.028598] el0t_64_sync+0x190/0x194
Go back to the original behavior, if the automotive safety package
is found to be supported in hardware enable all the features unless
safety_feat_cfg is passed in saying this particular platform only
supports a subset of the features.
Fixes: 5ac712dcdf ("net: stmmac: enable platform specific safety features")
Reported-by: Ning Cai <ncai@quicinc.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix multiple W=1 kernel-doc warnings in i2c-rk3x.c:
drivers/i2c/busses/i2c-rk3x.c:83: warning: missing initial short description on line:
* struct i2c_spec_values:
drivers/i2c/busses/i2c-rk3x.c:139: warning: missing initial short description on line:
* struct rk3x_i2c_calced_timings:
drivers/i2c/busses/i2c-rk3x.c:162: warning: missing initial short description on line:
* struct rk3x_i2c_soc_data:
drivers/i2c/busses/i2c-rk3x.c:242: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Generate a START condition, which triggers a REG_INT_START interrupt.
drivers/i2c/busses/i2c-rk3x.c:261: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Generate a STOP condition, which triggers a REG_INT_STOP interrupt.
drivers/i2c/busses/i2c-rk3x.c:304: warning: expecting prototype for Setup a read according to i2c(). Prototype was for rk3x_i2c_prepare_read() instead
drivers/i2c/busses/i2c-rk3x.c:335: warning: expecting prototype for Fill the transmit buffer with data from i2c(). Prototype was for rk3x_i2c_fill_transmit_buf() instead
drivers/i2c/busses/i2c-rk3x.c:535: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Get timing values of I2C specification
drivers/i2c/busses/i2c-rk3x.c:552: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Calculate divider values for desired SCL frequency
drivers/i2c/busses/i2c-rk3x.c:713: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Calculate timing values for desired SCL frequency
drivers/i2c/busses/i2c-rk3x.c:963: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Setup I2C registers for an I2C operation specified by msgs, num.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Add "struct" to prevent this kernel-doc warning:
drivers/i2c/busses/i2c-axxia.c:135: warning: cannot understand function prototype: 'struct axxia_i2c_dev '
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Srujana Challa says:
====================
octeontx2-af: Miscellaneous changes for CPT
This patchset consists of miscellaneous changes for CPT.
- Adds a new mailbox to reset the requested CPT LF.
- Modify FLR sequence as per HW team suggested.
- Adds support to recover CPT engines when they gets fault.
- Updates CPT inbound inline IPsec configuration mailbox,
as per new generation of the OcteonTX2 chips.
- Adds a new mailbox to return CPT FLT Interrupt info.
---
v2:
- Addressed a review comment.
v1:
- Dropped patch "octeontx2-af: Fix interrupt name strings completely"
to submit to net.
---
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
CPT HW would trigger the CPT AF FLT interrupt when CPT engines
hits some uncorrectable errors and AF is the one which receives
the interrupt and recovers the engines.
This patch adds a mailbox for CPT VFs to request for CPT faulted
and recovered engines info.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CN10K CPT coprocessor contains a context processor
to accelerate updates to the IPsec security association
contexts. The context processor contains a context cache.
This patch updates CPT LF ALLOC mailbox to config ctx_ilen
requested by VFs. CPT_LF_ALLOC:ctx_ilen is the size of
initial context fetch.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10K CPT coprocessor includes a component named RXC which
is responsible for reassembly of inner IP packets. RXC has
the feature to evict oldest entries based on age/threshold.
The age/threshold is being set to minimum values to evict
all entries at the time of teardown.
This patch adds code to restore timeout and threshold config
after teardown sequence is complete as it is global config.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimize CPT PF identification in mbox handling for faster
mbox response by doing it at AF driver probe instead of
every mbox message.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On OcteonTX2 platform CPT instruction enqueue is only
possible via LMTST operations.
The existing FLR sequence mentioned in HRM requires
a dummy LMTST to CPT but LMTST can't be submitted from
AF driver. So, HW team provided a new sequence to avoid
dummy LMTST. This patch adds code for the same.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On OcteonTX2 SoC, the admin function (AF) is the only one with all
priviliges to configure HW and alloc resources, PFs and it's VFs
have to request AF via mailbox for all their needs.
This patch adds a new mailbox for CPT VFs to request for CPT LF
reset.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CPT engine has uncorrectable errors, it will get halted and
must be disabled and re-enabled. This patch adds code for the same.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The preferred form for Renesas' compatible strings is:
"<vendor>,<family>-<module>"
Somehow the compatible string for the r9a09g011 I2C IP was upstreamed
as renesas,i2c-r9a09g011 instead of renesas,r9a09g011-i2c, which
is really confusing, especially considering the generic fallback
is renesas,rzv2m-i2c.
The first user of renesas,i2c-r9a09g011 in the kernel is not yet in
a kernel release, it will be in v6.1, therefore it can still be
fixed in v6.1.
Even if we don't fix it before v6.2, I don't think there is any
harm in making such a change.
s/renesas,i2c-r9a09g011/renesas,r9a09g011-i2c/g for consistency.
Fixes: ba7a4d15e2 ("dt-bindings: i2c: Document RZ/V2M I2C controller")
Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
When use tools/testing/selftests/kselftest_install.sh to make the
kselftest-list.txt under tools/testing/selftests/kselftest_install.
Then use tools/testing/selftests/kselftest_install/run_kselftest.sh to run
all the kselftests in kselftest-list.txt, it will be blocked by case
"filesystems/fat: run_fat_tests.sh" with "Warning: file run_fat_tests.sh
is not executable", so grant executable permission to run_fat_tests.sh to
fix this issue.
Link: https://lkml.kernel.org/r/dfdbba6df8a1ab34bb1e81cd8bd7ca3f9ed5c369.1673424747.git.pengfei.xu@intel.com
Fixes: dd7c9be330 ("selftests/filesystems: add a vfat RENAME_EXCHANGE test")
Signed-off-by: Pengfei Xu <pengfei.xu@intel.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since commit 80b6093b55 ("kbuild: add -Wundef to KBUILD_CPPFLAGS
for W=1 builds"), building with W=1 detects misuse of #if.
$ make W=1 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- arch/riscv/kernel/
[snip]
AS arch/riscv/kernel/head.o
arch/riscv/kernel/head.S:329:5: warning: "CONFIG_RISCV_BOOT_SPINWAIT" is not defined, evaluates to 0 [-Wundef]
329 | #if CONFIG_RISCV_BOOT_SPINWAIT
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
CONFIG_RISCV_BOOT_SPINWAIT is a bool option. #ifdef should be used.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Fixes: 2ffc48fc70 ("RISC-V: Move spinwait booting method to its own config")
Link: https://lore.kernel.org/r/20230106161213.2374093-1-masahiroy@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
I remember being told "Just ping me on IRC" about patches, but googling
at the time was not helpful. #riscv on libera is not linux specific,
but a bunch of contributors etc do hang out there.
Add a link to the maintainers entry to help others find it in the future!
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230106125344.1685266-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
On the non-assembler-side wrapping alternative-macros inside other macros
to prevent duplication of code works, as the end result will just be a
string that gets fed to the asm instruction.
In real assembler code, wrapping .macro blocks inside other .macro blocks
brings more restrictions on usage it seems and the optimization done by
commit 2ba8c7dc71 ("riscv: Don't duplicate __ALTERNATIVE_CFG in __ALTERNATIVE_CFG_2")
results in a compile error like:
../arch/riscv/lib/strcmp.S: Assembler messages:
../arch/riscv/lib/strcmp.S:15: Error: too many positional arguments
../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "886:"
../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "887:"
../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "886:"
../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "887:"
../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "886:"
../arch/riscv/lib/strcmp.S:15: Error: attempt to move .org backwards
Wrapping the variables containing assembler code in quotes solves this issue,
compilation and the code in question still works and objdump also shows sane
decompiled results of the affected code.
Fixes: 2ba8c7dc71 ("riscv: Don't duplicate __ALTERNATIVE_CFG in __ALTERNATIVE_CFG_2")
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230105192610.1940841-1-heiko@sntech.de
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Prevent reading into undefined memory in the expression lexer,
accounting for a trailer backslash followed by the null byte.
- Fix file mode when copying files to the build id cache, the problem
happens when the cache directory is in a different file system than
the file being cached, otherwise the mode was preserved as only a
hard link would be done to save space.
- Fix a related build-id 'perf test' entry that checked that permission
when caching PE (Portable Executable) files, used when profiling
Windows executables under wine.
- Sync the tools/ copies of kvm headers, build_bug.h, socket.h and
arm64's cputype.h with the kernel sources.
* tag 'perf-tools-fixes-for-v6.2-3-2023-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf test build-id: Fix test check for PE file
perf buildid-cache: Fix the file mode with copyfile() while adding file to build-id cache
perf expr: Prevent normalize() from reading into undefined memory in the expression lexer
tools headers: Syncronize linux/build_bug.h with the kernel sources
perf beauty: Update copy of linux/socket.h with the kernel sources
tools headers arm64: Sync arm64's cputype.h with the kernel sources
tools kvm headers arm64: Update KVM header from the kernel sources
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
Eduard Zingerman says:
====================
Struct bpf_reg_state is copied directly in several places including:
- check_stack_write_fixed_off() (via save_register_state());
- check_stack_read_fixed_off();
- find_equal_scalars().
However, a literal copy of this struct also copies the following fields:
struct bpf_reg_state {
...
struct bpf_reg_state *parent;
...
enum bpf_reg_liveness live;
...
};
This breaks register parentage chain and liveness marking logic.
The commit message for the first patch has a detailed example.
This patch-set replaces direct copies with a call to a function
copy_register_state(dst,src), which preserves 'parent' and 'live'
fields of the 'dst'.
The fix comes with a significant verifier runtime penalty for some
selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg
and cilium BPF binaries (see [1]):
$ ./veristat -e file,prog,states -C -f 'states_diff>10' master-baseline.log current.log
File Program States (A) States (B) States (DIFF)
-------------------------- -------------------------------- ---------- ---------- ---------------
bpf_host.o tail_handle_ipv4_from_host 231 299 +68 (+29.44%)
bpf_host.o tail_handle_nat_fwd_ipv4 1088 1320 +232 (+21.32%)
bpf_host.o tail_handle_nat_fwd_ipv6 716 729 +13 (+1.82%)
bpf_host.o tail_nodeport_nat_ingress_ipv4 281 314 +33 (+11.74%)
bpf_host.o tail_nodeport_nat_ingress_ipv6 245 256 +11 (+4.49%)
bpf_lxc.o tail_handle_nat_fwd_ipv4 1088 1320 +232 (+21.32%)
bpf_lxc.o tail_handle_nat_fwd_ipv6 716 729 +13 (+1.82%)
bpf_lxc.o tail_ipv4_ct_egress 239 262 +23 (+9.62%)
bpf_lxc.o tail_ipv4_ct_ingress 239 262 +23 (+9.62%)
bpf_lxc.o tail_ipv4_ct_ingress_policy_only 239 262 +23 (+9.62%)
bpf_lxc.o tail_ipv6_ct_egress 181 195 +14 (+7.73%)
bpf_lxc.o tail_ipv6_ct_ingress 181 195 +14 (+7.73%)
bpf_lxc.o tail_ipv6_ct_ingress_policy_only 181 195 +14 (+7.73%)
bpf_lxc.o tail_nodeport_nat_ingress_ipv4 281 314 +33 (+11.74%)
bpf_lxc.o tail_nodeport_nat_ingress_ipv6 245 256 +11 (+4.49%)
bpf_overlay.o tail_handle_nat_fwd_ipv4 799 829 +30 (+3.75%)
bpf_overlay.o tail_nodeport_nat_ingress_ipv4 281 314 +33 (+11.74%)
bpf_overlay.o tail_nodeport_nat_ingress_ipv6 245 256 +11 (+4.49%)
bpf_sock.o cil_sock4_connect 47 70 +23 (+48.94%)
bpf_sock.o cil_sock4_sendmsg 45 68 +23 (+51.11%)
bpf_sock.o cil_sock6_post_bind 31 42 +11 (+35.48%)
bpf_xdp.o tail_lb_ipv4 4413 6457 +2044 (+46.32%)
bpf_xdp.o tail_lb_ipv6 6876 7249 +373 (+5.42%)
test_cls_redirect.bpf.o cls_redirect 4704 4799 +95 (+2.02%)
test_tcp_hdr_options.bpf.o estab 180 206 +26 (+14.44%)
xdp_synproxy_kern.bpf.o syncookie_tc 21059 21485 +426 (+2.02%)
xdp_synproxy_kern.bpf.o syncookie_xdp 21857 23122 +1265 (+5.79%)
-------------------------- -------------------------------- ---------- ---------- ---------------
I looked through verification log for bpf_xdp.o tail_lb_ipv4 program in
order to identify the reason for ~50% visited states increase.
The slowdown is triggered by a difference in handling of three stack slots:
fp-56, fp-72 and fp-80, with the main difference coming from fp-72.
In fact the following change removes all the difference:
@@ -3256,7 +3256,10 @@ static void save_register_state(struct bpf_func_state *state,
{
int i;
- copy_register_state(&state->stack[spi].spilled_ptr, reg);
+ if ((spi == 6 /*56*/ || spi == 8 /*72*/ || spi == 9 /*80*/) && size != BPF_REG_SIZE)
+ state->stack[spi].spilled_ptr = *reg;
+ else
+ copy_register_state(&state->stack[spi].spilled_ptr, reg);
For fp-56 I found the following pattern for divergences between
verification logs with and w/o this patch:
- At some point insn 1862 is reached and checkpoint is created;
- At some other point insn 1862 is reached again:
- with this patch:
- the current state is considered *not* equivalent to the old checkpoint;
- the reason for mismatch is the state of fp-56:
- current state: fp-56=????mmmm
- checkpoint: fp-56_rD=mmmmmmmm
- without this patch the current state is considered equivalent to the
checkpoint, the fp-56 is not present in the checkpoint.
Here is a fragment of the verification log for when the checkpoint in
question created at insn 1862:
checkpoint 1862: ... fp-56=mmmmmmmm ...
1862: ...
1863: ...
1864: (61) r1 = *(u32 *)(r0 +0)
1865: ...
1866: (63) *(u32 *)(r10 -56) = r1 ; R1_w=scalar(...) R10=fp0 fp-56=
1867: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
1868: (07) r2 += -56 ; R2_w=fp-56
; return map_lookup_elem(&LB4_BACKEND_MAP_V2, &backend_id);
1869: (18) r1 = 0xffff888100286000 ; R1_w=map_ptr(off=0,ks=4,vs=8,imm=0)
1871: (85) call bpf_map_lookup_elem#1
- Without this patch:
- at insn 1864 r1 liveness is set to REG_LIVE_WRITTEN;
- at insn 1866 fp-56 liveness is set REG_LIVE_WRITTEN mark because
of the direct r1 copy in save_register_state();
- at insn 1871 REG_LIVE_READ is not propagated to fp-56 at
checkpoint 1862 because of the REG_LIVE_WRITTEN mark;
- eventually fp-56 is pruned from checkpoint at 1862 in
clean_func_state().
- With this patch:
- at insn 1864 r1 liveness is set to REG_LIVE_WRITTEN;
- at insn 1866 fp-56 liveness is *not* set to REG_LIVE_WRITTEN mark
because write size is not equal to BPF_REG_SIZE;
- at insn 1871 REG_LIVE_READ is propagated to fp-56 at checkpoint 1862.
Hence more states have to be visited by verifier with this patch compared
to current master.
Similar patterns could be found for both fp-72 and fp-80, although these
are harder to track trough the log because of a big number of insns between
slot write and bpf_map_lookup_elem() call triggering read mark, boils down
to the following C code:
struct ipv4_frag_id frag_id = {
.daddr = ip4->daddr,
.saddr = ip4->saddr,
.id = ip4->id,
.proto = ip4->protocol,
.pad = 0,
};
...
map_lookup_elem(..., &frag_id);
Where:
- .id is mapped to fp-72, write of size u16;
- .saddr is mapped to fp-80, write of size u32.
This patch-set is a continuation of discussion from [2].
Changes v1 -> v2 (no changes in the code itself):
- added analysis for the tail_lb_ipv4 verification slowdown;
- rebase against fresh master branch.
[1] git@github.com:anakryiko/cilium.git
[2] https://lore.kernel.org/bpf/517af2c57ee4b9ce2d96a8cf33f7295f2d2dfe13.camel@gmail.com/
====================
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Register range information is copied in several places. The intent is
to transfer range/id information from one register/stack spill to
another. Currently this is done using direct register assignment, e.g.:
static void find_equal_scalars(..., struct bpf_reg_state *known_reg)
{
...
struct bpf_reg_state *reg;
...
*reg = *known_reg;
...
}
However, such assignments also copy the following bpf_reg_state fields:
struct bpf_reg_state {
...
struct bpf_reg_state *parent;
...
enum bpf_reg_liveness live;
...
};
Copying of these fields is accidental and incorrect, as could be
demonstrated by the following example:
0: call ktime_get_ns()
1: r6 = r0
2: call ktime_get_ns()
3: r7 = r0
4: if r0 > r6 goto +1 ; r0 & r6 are unbound thus generated
; branch states are identical
5: *(u64 *)(r10 - 8) = 0xdeadbeef ; 64-bit write to fp[-8]
--- checkpoint ---
6: r1 = 42 ; r1 marked as written
7: *(u8 *)(r10 - 8) = r1 ; 8-bit write, fp[-8] parent & live
; overwritten
8: r2 = *(u64 *)(r10 - 8)
9: r0 = 0
10: exit
This example is unsafe because 64-bit write to fp[-8] at (5) is
conditional, thus not all bytes of fp[-8] are guaranteed to be set
when it is read at (8). However, currently the example passes
verification.
First, the execution path 1-10 is examined by verifier.
Suppose that a new checkpoint is created by is_state_visited() at (6).
After checkpoint creation:
- r1.parent points to checkpoint.r1,
- fp[-8].parent points to checkpoint.fp[-8].
At (6) the r1.live is set to REG_LIVE_WRITTEN.
At (7) the fp[-8].parent is set to r1.parent and fp[-8].live is set to
REG_LIVE_WRITTEN, because of the following code called in
check_stack_write_fixed_off():
static void save_register_state(struct bpf_func_state *state,
int spi, struct bpf_reg_state *reg,
int size)
{
...
state->stack[spi].spilled_ptr = *reg; // <--- parent & live copied
if (size == BPF_REG_SIZE)
state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
...
}
Note the intent to mark stack spill as written only if 8 bytes are
spilled to a slot, however this intent is spoiled by a 'live' field copy.
At (8) the checkpoint.fp[-8] should be marked as REG_LIVE_READ but
this does not happen:
- fp[-8] in a current state is already marked as REG_LIVE_WRITTEN;
- fp[-8].parent points to checkpoint.r1, parentage chain is used by
mark_reg_read() to mark checkpoint states.
At (10) the verification is finished for path 1-10 and jump 4-6 is
examined. The checkpoint.fp[-8] never gets REG_LIVE_READ mark and this
spill is pruned from the cached states by clean_live_states(). Hence
verifier state obtained via path 1-4,6 is deemed identical to one
obtained via path 1-6 and program marked as safe.
Note: the example should be executed with BPF_F_TEST_STATE_FREQ flag
set to force creation of intermediate verifier states.
This commit revisits the locations where bpf_reg_state instances are
copied and replaces the direct copies with a call to a function
copy_register_state(dst, src) that preserves 'parent' and 'live'
fields of the 'dst'.
Fixes: 679c782de1 ("bpf/verifier: per-register parent pointers")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230106142214.1040390-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull printk fixes from Petr Mladek:
- Prevent a potential deadlock when configuring kgdb console
- Fix a kernel doc warning
* tag 'printk-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
kernel/printk/printk.c: Fix W=1 kernel-doc warning
tty: serial: kgdboc: fix mutex locking order for configure_kgdboc()
Pull s390 build fix from Heiko Carstens:
- Workaround invalid gcc-11 out of bounds read warning caused by s390's
S390_lowcore definition. This happens only with gcc 11.1.0 and
11.2.0.
The code which causes this warning will be gone with the next merge
window. Therefore just replace the memcpy() with a for loop to get
rid of the warning.
* tag 's390-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: workaround invalid gcc-11 out of bounds read warning
Pull slab fix from Vlastimil Babka:
"Just a single fix, since the lkp report originally for a slub-tiny
commit ended up being a gcov/compiler bug:
- periodically resched in SLAB's drain_freelist(), by David Rientjes"
* tag 'slab-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm, slab: periodically resched in drain_freelist()
put_device() shouldn't be called before a prior call to
device_register(). __thermal_cooling_device_register() doesn't follow
that properly and needs fixing. Also
thermal_cooling_device_destroy_sysfs() is getting called unnecessarily
on few error paths.
Fix all this by placing the calls at the right place.
Based on initial work done by Caleb Connolly.
Fixes: 4748f9687c ("thermal: core: fix some possible name leaks in error paths")
Fixes: c408b3d1d9 ("thermal: Validate new state in cur_state_store()")
Reported-by: Caleb Connolly <caleb.connolly@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Frank Rowand <frowand.list@gmail.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Tested-by: Caleb Connolly <caleb.connolly@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull zonefs fix from Damien Le Moal:
- A single patch to fix sync write operations to detect and handle
errors due to external zone corruptions resulting in writes at
invalid location, from me.
* tag 'zonefs-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Detect append writes at invalid locations
If the target ring is configured with IOPOLL, then we always need to hold
the target ring uring_lock before posting CQEs. We could just grab it
unconditionally, but since we don't expect many target rings to be of this
type, make grabbing the uring_lock conditional on the ring type.
Link: https://lore.kernel.org/io-uring/Y8krlYa52%2F0YGqkg@ip-172-31-85-199.ec2.internal/
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
ALU table entry 2 register in KSZ9477 have bit positions reserved for
forwarding port map. This field is referred in ksz9477_fdb_del() for
clearing forward port map and alu table.
But current fdb_del refer ALU table entry 3 register for accessing forward
port map. Update ksz9477_fdb_del() to get forward port map from correct
alu table entry register.
With this bug, issue can be observed while deleting static MAC entries.
Delete any specific MAC entry using "bridge fdb del" command. This should
clear all the specified MAC entries. But it is observed that entries with
self static alone are retained.
Tested on LAN9370 EVB since ksz9477_fdb_del() is used common across
LAN937x and KSZ series.
Fixes: b987e98e50 ("dsa: add DSA switch driver for Microchip KSZ9477")
Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20230118174735.702377-1-rakesh.sankaranarayanan@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Avoid race between process wakeup and tpacket_v3 block timeout.
The test waits for cfg_timeout_msec for packets to arrive. Packets
arrive in tpacket_v3 rings, which pass packets ("frames") to the
process in batches ("blocks"). The sk waits for req3.tp_retire_blk_tov
msec to release a block.
Set the block timeout lower than the process waiting time, else
the process may find that no block has been released by the time it
scans the socket list. Convert to a ring of more than one, smaller,
blocks with shorter timeouts. Blocks must be page aligned, so >= 64KB.
Fixes: 5ebfb4cc30 ("selftests/net: toeplitz test")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230118151847.4124260-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The referenced commit changed the error code returned by the kernel
when preventing a non-established socket from attaching the ktls
ULP. Before to such a commit, the user-space got ENOTCONN instead
of EINVAL.
The existing self-tests depend on such error code, and the change
caused a failure:
RUN global.non_established ...
tls.c:1673:non_established:Expected errno (22) == ENOTCONN (107)
non_established: Test failed at step #3
FAIL global.non_established
In the unlikely event existing applications do the same, address
the issue by restoring the prior error code in the above scenario.
Note that the only other ULP performing similar checks at init
time - smc_ulp_ops - also fails with ENOTCONN when trying to attach
the ULP to a non-established socket.
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Fixes: 2c02d41d71 ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/7bb199e7a93317fb6f8bf8b9b2dc71c18f337cde.1674042685.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The hypervisor can enable various new features (SEV_FEATURES[1:63]) and start a
SNP guest. Some of these features need guest side implementation. If any of
these features are enabled without it, the behavior of the SNP guest will be
undefined. It may fail booting in a non-obvious way making it difficult to
debug.
Instead of allowing the guest to continue and have it fail randomly later,
detect this early and fail gracefully.
The SEV_STATUS MSR indicates features which the hypervisor has enabled. While
booting, SNP guests should ascertain that all the enabled features have guest
side implementation. In case a feature is not implemented in the guest, the
guest terminates booting with GHCB protocol Non-Automatic Exit(NAE) termination
request event, see "SEV-ES Guest-Hypervisor Communication Block Standardization"
document (currently at https://developer.amd.com/wp-content/resources/56421.pdf),
section "Termination Request".
Populate SW_EXITINFO2 with mask of unsupported features that the hypervisor can
easily report to the user.
More details in the AMD64 APM Vol 2, Section "SEV_STATUS MSR".
[ bp:
- Massage.
- Move snp_check_features() call to C code.
Note: the CC:stable@ aspect here is to be able to protect older, stable
kernels when running on newer hypervisors. Or not "running" but fail
reliably and in a well-defined manner instead of randomly. ]
Fixes: cbd3d4f7c4 ("x86/sev: Check SEV-SNP features support")
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230118061943.534309-1-nikunj@amd.com
In test_async_probe_init, second set of asynchronous devices are saved
in sync_dev[sync_id], which should be async_dev[async_id].
This makes these devices not unregistered when exit.
> modprobe test_async_driver_probe && \
> modprobe -r test_async_driver_probe && \
> modprobe test_async_driver_probe
...
> sysfs: cannot create duplicate filename '/devices/platform/test_async_driver.4'
> kobject_add_internal failed for test_async_driver.4 with -EEXIST,
don't try to register things with the same name in the same directory.
Fixes: 57ea974fb8 ("driver core: Rewrite test_async_driver_probe to cover serialization and NUMA affinity")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Link: https://lore.kernel.org/r/20221125063541.241328-1-chenzhongjin@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I got the following WARNING message while removing driver(ds2482):
------------[ cut here ]------------
do not call blocking ops when !TASK_RUNNING; state=1 set at [<000000002d50bfb6>] w1_process+0x9e/0x1d0 [wire]
WARNING: CPU: 0 PID: 262 at kernel/sched/core.c:9817 __might_sleep+0x98/0xa0
CPU: 0 PID: 262 Comm: w1_bus_master1 Tainted: G N 6.1.0-rc3+ #307
RIP: 0010:__might_sleep+0x98/0xa0
Call Trace:
exit_signals+0x6c/0x550
do_exit+0x2b4/0x17e0
kthread_exit+0x52/0x60
kthread+0x16d/0x1e0
ret_from_fork+0x1f/0x30
The state of task is set to TASK_INTERRUPTIBLE in loop in w1_process(),
set it to TASK_RUNNING when it breaks out of the loop to avoid the
warning.
Fixes: 3c52e4e627 ("W1: w1_process, block or sleep")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20221205101558.3599162-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I got a deadloop report while doing device(ds2482) add/remove test:
[ 162.241881] w1_master_driver w1_bus_master1: Waiting for w1_bus_master1 to become free: refcnt=1.
[ 163.272251] w1_master_driver w1_bus_master1: Waiting for w1_bus_master1 to become free: refcnt=1.
[ 164.296157] w1_master_driver w1_bus_master1: Waiting for w1_bus_master1 to become free: refcnt=1.
...
__w1_remove_master_device() can't return, because the dev->refcnt is not zero.
w1_add_master_device() |
w1_alloc_dev() |
atomic_set(&dev->refcnt, 2) |
kthread_run() |
|__w1_remove_master_device()
| kthread_stop()
// KTHREAD_SHOULD_STOP is set, |
// threadfn(w1_process) won't be |
// called. |
kthread() |
| // refcnt will never be 0, it's deadloop.
| while (atomic_read(&dev->refcnt)) {...}
After calling w1_add_master_device(), w1_process() is not really
invoked, before w1_process() starting, if kthread_stop() is called
in __w1_remove_master_device(), w1_process() will never be called,
the refcnt can not be decreased, then it causes deadloop in remove
function because of non-zero refcnt.
We need to make sure w1_process() is really started, so move the
set refcnt into w1_process() to fix this problem.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20221205080434.3149205-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(Actually, this is fixing the "Read the Current Status" command sent to
the device's outgoing mailbox, but it is only currently used for the PWM
instructions.)
The PCI-1760 is operated mostly by sending commands to a set of Outgoing
Mailbox registers, waiting for the command to complete, and reading the
result from the Incoming Mailbox registers. One of these commands is
the "Read the Current Status" command. The number of this command is
0x07 (see the User's Manual for the PCI-1760 at
<https://advdownload.advantech.com/productfile/Downloadfile2/1-11P6653/PCI-1760.pdf>.
The `PCI1760_CMD_GET_STATUS` macro defined in the driver should expand
to this command number 0x07, but unfortunately it currently expands to
0x03. (Command number 0x03 is not defined in the User's Manual.)
Correct the definition of the `PCI1760_CMD_GET_STATUS` macro to fix it.
This is used by all the PWM subdevice related instructions handled by
`pci1760_pwm_insn_config()` which are probably all broken. The effect
of sending the undefined command number 0x03 is not known.
Fixes: 14b93bb6bb ("staging: comedi: adv_pci_dio: separate out PCI-1760 support")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20230103143754.17564-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In preparation for needing them somewhere else, move them and get rid of
the unused 'issue_flags' for the unlock side.
No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When -Woverride-init is enabled in a build, gcc points out that
qcom_geni_serial_pm_ops contains conflicting initializers:
drivers/tty/serial/qcom_geni_serial.c:1586:20: error: initialized field overwritten [-Werror=override-init]
1586 | .restore = qcom_geni_serial_sys_hib_resume,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/tty/serial/qcom_geni_serial.c:1586:20: note: (near initialization for 'qcom_geni_serial_pm_ops.restore')
drivers/tty/serial/qcom_geni_serial.c:1587:17: error: initialized field overwritten [-Werror=override-init]
1587 | .thaw = qcom_geni_serial_sys_hib_resume,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Open-code the initializers with the version that was already used,
and use the pm_sleep_ptr() method to deal with unused ones,
in place of the __maybe_unused annotation.
Fixes: 35781d8356 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20221215165453.1864836-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ba47f97a18 ("serial: core: remove baud_rates when serial console
setup") changed uart_set_options to select the correct baudrate
configuration based on the absolute error between requested baudrate and
available standard baudrate settings.
Prior to that commit the baudrate was selected based on which predefined
standard baudrate did not exceed the requested baudrate.
This change of selection logic was never reflected in the atmel serial
driver. Thus the comment left in the atmel serial driver is no longer
accurate.
Additionally the manual rounding up described in that comment and applied
via (quot - 1) requests an incorrect baudrate. Since uart_set_options uses
tty_termios_encode_baud_rate to determine the appropriate baudrate flags
this can cause baudrate selection to fail entirely because
tty_termios_encode_baud_rate will only select a baudrate if relative error
between requested and selected baudrate does not exceed +/-2%.
Fix that by requesting actual, exact baudrate used by the serial.
Fixes: ba47f97a18 ("serial: core: remove baud_rates when serial console setup")
Cc: stable <stable@kernel.org>
Signed-off-by: Tobias Schramm <t.schramm@manjaro.org>
Acked-by: Richard Genoud <richard.genoud@gmail.com>
Link: https://lore.kernel.org/r/20230109072940.202936-1-t.schramm@manjaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Run the following tests on the qemu platform:
syzkaller:~# modprobe speakup_audptr
input: Speakup as /devices/virtual/input/input4
initialized device: /dev/synth, node (MAJOR 10, MINOR 125)
speakup 3.1.6: initialized
synth name on entry is: (null)
synth probe
spk_ttyio_initialise_ldisc failed because tty_kopen_exclusive returned
failed (errno -16), then remove the module, we will get a null-ptr-defer
problem, as follow:
syzkaller:~# modprobe -r speakup_audptr
releasing synth audptr
BUG: kernel NULL pointer dereference, address: 0000000000000080
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 2 PID: 204 Comm: modprobe Not tainted 6.1.0-rc6-dirty #1
RIP: 0010:mutex_lock+0x14/0x30
Call Trace:
<TASK>
spk_ttyio_release+0x19/0x70 [speakup]
synth_release.part.6+0xac/0xc0 [speakup]
synth_remove+0x56/0x60 [speakup]
__x64_sys_delete_module+0x156/0x250
? fpregs_assert_state_consistent+0x1d/0x50
do_syscall_64+0x37/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Modules linked in: speakup_audptr(-) speakup
Dumping ftrace buffer:
in_synth->dev was not initialized during modprobe, so we add check
for in_synth->dev to fix this bug.
Fixes: 4f2a81f3a8 ("speakup: Reference synth from tty and tty from synth")
Cc: stable <stable@kernel.org>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Link: https://lore.kernel.org/r/20221202060633.217364-1-cuigaosheng1@huawei.com
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Saeed Mahameed says:
====================
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net: mlx5: eliminate anonymous module_init & module_exit
net/mlx5: E-switch, Fix switchdev mode after devlink reload
net/mlx5e: Protect global IPsec ASO
net/mlx5e: Remove optimization which prevented update of ESN state
net/mlx5e: Set decap action based on attr for sample
net/mlx5e: QoS, Fix wrongfully setting parent_element_id on MODIFY_SCHEDULING_ELEMENT
net/mlx5: E-switch, Fix setting of reserved fields on MODIFY_SCHEDULING_ELEMENT
net/mlx5e: Remove redundant xsk pointer check in mlx5e_mpwrq_validate_xsk
net/mlx5e: Avoid false lock dependency warning on tc_ht even more
net/mlx5: fix missing mutex_unlock in mlx5_fw_fatal_reporter_err_work()
====================
Link: https://lore.kernel.org/r/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Requesting an interrupt with IRQF_ONESHOT will run the primary handler
in the hard-IRQ context even in the force-threaded mode. The
force-threaded mode is used by PREEMPT_RT in order to avoid acquiring
sleeping locks (spinlock_t) in hard-IRQ context. This combination
makes it impossible and leads to "sleeping while atomic" warnings.
Use one interrupt handler for both handlers (primary and secondary)
and drop the IRQF_ONESHOT flag which is not needed.
Fixes: e359b4411c ("serial: stm32: fix threaded interrupt handling")
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Valentin Caron <valentin.caron@foss.st.com> # V3
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230112180417.25595-1-marex@denx.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A local variable sg is used to store scatterlist pointer in
pch_dma_tx_complete(). The for loop doing Tx byte accounting before
dma_unmap_sg() alters sg in its increment statement. Therefore, the
pointer passed into dma_unmap_sg() won't match to the one given to
dma_map_sg().
To fix the problem, use priv->sg_tx_p directly in dma_unmap_sg()
instead of the local variable.
Fixes: da3564ee02 ("pch_uart: add multi-scatter processing")
Cc: stable@vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230103093435.4396-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Driver's probe allocates memory for RX FIFO (port->rx_fifo) based on
default RX FIFO depth, e.g. 16. Later during serial startup the
qcom_geni_serial_port_setup() updates the RX FIFO depth
(port->rx_fifo_depth) to match real device capabilities, e.g. to 32.
The RX UART handle code will read "port->rx_fifo_depth" number of words
into "port->rx_fifo" buffer, thus exceeding the bounds. This can be
observed in certain configurations with Qualcomm Bluetooth HCI UART
device and KASAN:
Bluetooth: hci0: QCA Product ID :0x00000010
Bluetooth: hci0: QCA SOC Version :0x400a0200
Bluetooth: hci0: QCA ROM Version :0x00000200
Bluetooth: hci0: QCA Patch Version:0x00000d2b
Bluetooth: hci0: QCA controller version 0x02000200
Bluetooth: hci0: QCA Downloading qca/htbtfw20.tlv
bluetooth hci0: Direct firmware load for qca/htbtfw20.tlv failed with error -2
Bluetooth: hci0: QCA Failed to request file: qca/htbtfw20.tlv (-2)
Bluetooth: hci0: QCA Failed to download patch (-2)
==================================================================
BUG: KASAN: slab-out-of-bounds in handle_rx_uart+0xa8/0x18c
Write of size 4 at addr ffff279347d578c0 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rt5-00350-gb2450b7e00be-dirty #26
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
Call trace:
dump_backtrace.part.0+0xe0/0xf0
show_stack+0x18/0x40
dump_stack_lvl+0x8c/0xb8
print_report+0x188/0x488
kasan_report+0xb4/0x100
__asan_store4+0x80/0xa4
handle_rx_uart+0xa8/0x18c
qcom_geni_serial_handle_rx+0x84/0x9c
qcom_geni_serial_isr+0x24c/0x760
__handle_irq_event_percpu+0x108/0x500
handle_irq_event+0x6c/0x110
handle_fasteoi_irq+0x138/0x2cc
generic_handle_domain_irq+0x48/0x64
If the RX FIFO depth changes after probe, be sure to resize the buffer.
Fixes: f9d690b6ec ("tty: serial: qcom_geni_serial: Allocate port->rx_fifo buffer in probe")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221221164022.1087814-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The interrupt handler (pt_core_irq_handler()) of the ptdma
driver can be called from interrupt context. The code flow
in this function can lead down to pt_core_execute_cmd() which
will attempt to grab a mutex, which is not appropriate in
interrupt context and ultimately leads to a kernel panic.
The fix here changes this mutex to a spinlock, which has
been verified to resolve the issue.
Fixes: fa5d823b16 ("dmaengine: ptdma: Initial driver for the AMD PTDMA")
Signed-off-by: Eric Pilmore <epilmore@gigaio.com>
Link: https://lore.kernel.org/r/20230119033907.35071-1-epilmore@gigaio.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The dwc3 core support now links against the extcon subsystem,
so it cannot be built-in when extcon is a loadable module:
arm-linux-gnueabi-ld: drivers/usb/dwc3/core.o: in function `dwc3_get_extcon':
core.c:(.text+0x572): undefined reference to `extcon_get_edev_by_phandle'
arm-linux-gnueabi-ld: core.c:(.text+0x596): undefined reference to `extcon_get_extcon_dev'
arm-linux-gnueabi-ld: core.c:(.text+0x5ea): undefined reference to `extcon_find_edev_by_node'
There was already a Kconfig dependency in the dual-role support,
but this is now needed for the entire dwc3 driver.
It is still possible to build dwc3 without extcon, but this
prevents it from being set to built-in when extcon is a loadable
module.
Fixes: d182c2e1bc ("usb: dwc3: Don't switch OTG -> peripheral if extcon is present")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/20230118090147.2126563-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit 4af1b64f80 ("octeontx2-pf: Fix lmtst ID used in aura
free") uses the get/put_cpu() to protect the usage of percpu pointer
in ->aura_freeptr() callback, but it also unnecessarily disable the
preemption for the blockable memory allocation. The commit 87b93b678e
("octeontx2-pf: Avoid use of GFP_KERNEL in atomic context") tried to
fix these sleep inside atomic warnings. But it only fix the one for
the non-rt kernel. For the rt kernel, we still get the similar warnings
like below.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by swapper/0/1:
#0: ffff800009fc5fe8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
#1: ffff000100c276c0 (&mbox->lock){+.+.}-{3:3}, at: otx2_init_hw_resources+0x8c/0x3a4
#2: ffffffbfef6537e0 (&cpu_rcache->lock){+.+.}-{2:2}, at: alloc_iova_fast+0x1ac/0x2ac
Preemption disabled at:
[<ffff800008b1908c>] otx2_rq_aura_pool_init+0x14c/0x284
CPU: 20 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc3-rt1-yocto-preempt-rt #1
Hardware name: Marvell OcteonTX CN96XX board (DT)
Call trace:
dump_backtrace.part.0+0xe8/0xf4
show_stack+0x20/0x30
dump_stack_lvl+0x9c/0xd8
dump_stack+0x18/0x34
__might_resched+0x188/0x224
rt_spin_lock+0x64/0x110
alloc_iova_fast+0x1ac/0x2ac
iommu_dma_alloc_iova+0xd4/0x110
__iommu_dma_map+0x80/0x144
iommu_dma_map_page+0xe8/0x260
dma_map_page_attrs+0xb4/0xc0
__otx2_alloc_rbuf+0x90/0x150
otx2_rq_aura_pool_init+0x1c8/0x284
otx2_init_hw_resources+0xe4/0x3a4
otx2_open+0xf0/0x610
__dev_open+0x104/0x224
__dev_change_flags+0x1e4/0x274
dev_change_flags+0x2c/0x7c
ic_open_devs+0x124/0x2f8
ip_auto_config+0x180/0x42c
do_one_initcall+0x90/0x4dc
do_basic_setup+0x10c/0x14c
kernel_init_freeable+0x10c/0x13c
kernel_init+0x2c/0x140
ret_from_fork+0x10/0x20
Of course, we can shuffle the get/put_cpu() to only wrap the invocation
of ->aura_freeptr() as what commit 87b93b678e does. But there are only
two ->aura_freeptr() callbacks, otx2_aura_freeptr() and
cn10k_aura_freeptr(). There is no usage of perpcu variable in the
otx2_aura_freeptr() at all, so the get/put_cpu() seems redundant to it.
We can move the get/put_cpu() into the corresponding callback which
really has the percpu variable usage and avoid the sprinkling of
get/put_cpu() in several places.
Fixes: 4af1b64f80 ("octeontx2-pf: Fix lmtst ID used in aura free")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://lore.kernel.org/r/20230118071300.3271125-1-haokexin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
While one cpu is working on looking up the right socket from ehash
table, another cpu is done deleting the request socket and is about
to add (or is adding) the big socket from the table. It means that
we could miss both of them, even though it has little chance.
Let me draw a call trace map of the server side.
CPU 0 CPU 1
----- -----
tcp_v4_rcv() syn_recv_sock()
inet_ehash_insert()
-> sk_nulls_del_node_init_rcu(osk)
__inet_lookup_established()
-> __sk_nulls_add_node_rcu(sk, list)
Notice that the CPU 0 is receiving the data after the final ack
during 3-way shakehands and CPU 1 is still handling the final ack.
Why could this be a real problem?
This case is happening only when the final ack and the first data
receiving by different CPUs. Then the server receiving data with
ACK flag tries to search one proper established socket from ehash
table, but apparently it fails as my map shows above. After that,
the server fetches a listener socket and then sends a RST because
it finds a ACK flag in the skb (data), which obeys RST definition
in RFC 793.
Besides, Eric pointed out there's one more race condition where it
handles tw socket hashdance. Only by adding to the tail of the list
before deleting the old one can we avoid the race if the reader has
already begun the bucket traversal and it would possibly miss the head.
Many thanks to Eric for great help from beginning to end.
Fixes: 5e0724d027 ("tcp/dccp: fix hashdance race for passive sessions")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/
Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
snd_hda_get_connections() can return a negative error code.
It may lead to accessing 'conn' array at a negative index.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artemii Karasev <karasev@ispras.ru>
Fixes: 30b4503378 ("ALSA: hda - Expose secret DAC-AA connection of some VIA codecs")
Link: https://lore.kernel.org/r/20230119082259.3634-1-karasev@ispras.ru
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In various places, string buffers of a fixed size are allocated, and
filled using snprintf() with the same fixed size, which is error-prone.
Replace this by calling devm_kasprintf() instead, which always uses the
appropriate size.
While at it, remove an unneeded intermediate variable, which allows us
to drop a cast as a bonus.
With the initial behavior it would have been possible to have a device tree
with a node address that would make "ccc<node_address>_pll<N>" exceed
18 characters. If that happened, the <N> would be cut off & both
pll 0 & 1 would be named identically. If that happens, pll1 would fail
to register. Thus, the fixes tag has been added to this commit.
Fixes: d39fb17276 ("clk: microchip: add PolarFire SoC fabric clock support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
[claudiu.beznea: added the rationale behind fixes tag]
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/f904fd28b2087d1463ea65f059924e3b1acc193c.1672764239.git.geert+renesas@glider.be
Polling the completion can progress the request state to IDLE, either
inline with the completion, or through softirq. Either way, the state
may not be COMPLETED, so don't check for that. We only care if the state
isn't IN_FLIGHT.
This is fixing an issue where the driver aborts an IO that we just
completed. Seeing the "aborting" message instead of "polled" is very
misleading as to where the timeout problem resides.
Fixes: bf392a5dc0 ("nvme-pci: Remove tag from process cq")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
NVMe controller register access hangs indefinitely when the co-processor
is not running. A missed reset is preferable over a hanging thread since
it could be recoverable.
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This is a functional revert of c76b8308e4 ("nvme-apple: fix controller
shutdown in apple_nvme_disable").
The commit broke suspend/resume since apple_nvme_reset_work() tries to
disable the controller on resume. This does not work for the apple NVMe
controller since register access only works while the co-processor
firmware is running.
Disabling the NVMe controller in the shutdown path is also required
for shutting the co-processor down. The original code was appropriate
for this hardware. Add a comment to prevent a similar breaking changes
in the future.
Fixes: c76b8308e4 ("nvme-apple: fix controller shutdown in apple_nvme_disable")
Reported-by: Janne Grunau <j@jannau.net>
Link: https://lore.kernel.org/all/20230110174745.GA3576@jannau.net/
Signed-off-by: Janne Grunau <j@jannau.net>
[hch: updated with a more descriptive comment from Hector Martin]
Signed-off-by: Christoph Hellwig <hch@lst.de>
This reverts commit 0aa64df30b.
Currently IFF_NO_ADDRCONF is used to prevent all ipv6 addrconf for the
slave ports of team, bonding and failover devices and it means no ipv6
packets can be sent out through these slave ports. However, for team
device, "nsna_ping" link_watch requires ipv6 addrconf. Otherwise, the
link will be marked failure. This patch removes the IFF_NO_ADDRCONF
flag set for team port, and we will fix the original issue in another
patch, as Jakub suggested.
Fixes: 0aa64df30b ("net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/63e09531fc47963d2e4eff376653d3db21b97058.1673980932.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, we run into a number of WARN()s when attempting to unload the
amdgpu driver (e.g. using "modprobe -r amdgpu"). These all stem from
calling drm_encoder_cleanup() too early. So, to fix this we can stop
calling drm_encoder_cleanup() from amdgpu_dm_fini() and instead have it
be called from amdgpu_dm_encoder_destroy(). Also, we don't need to free
in amdgpu_dm_encoder_destroy() since mst_encoders[] isn't explicitly
allocated by the slab allocator.
Fixes: f74367e492 ("drm/amdgpu/display: create fake mst encoders ahead of time (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The YCC conversion matrix for RGB -> COLOR_SPACE_YCBCR2020_TYPE is
missing the values for the fourth column of the matrix.
The fourth column of the matrix is essentially just a value that is
added given that the color is 3 components in size.
These values are needed to bias the chroma from the [-1, 1] -> [0, 1]
range.
This fixes color being very green when using Gamescope HDR on HDMI
output which prefers YCC 4:4:4.
Fixes: 40df2f809e ("drm/amd/display: color space ycbcr709 support")
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Code in get_output_color_space depends on knowing the pixel encoding to
determine whether to pick between eg. COLOR_SPACE_SRGB or
COLOR_SPACE_YCBCR709 for transparent RGB -> YCbCr 4:4:4 in the driver.
v2: Fixed patch being accidentally based on a personal feature branch, oops!
Fixes: ea117312ea ("drm/amd/display: Reduce HDMI pixel encoding if max clock is exceeded")
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Kalle Valo says:
====================
wireless fixes for v6.2
Third set of fixes for v6.2. This time most of them are for drivers,
only one revert for mac80211. For an important mt76 fix we had to
cherry pick two commits from wireless-next.
* tag 'wireless-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()"
wifi: mt76: dma: fix a regression in adding rx buffers
wifi: mt76: handle possible mt76_rx_token_consume failures
wifi: mt76: dma: do not increment queue head if mt76_dma_add_buf fails
wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid
wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices
wifi: brcmfmac: avoid NULL-deref in survey dump for 2G only device
wifi: brcmfmac: avoid handling disabled channels for survey dump
====================
Link: https://lore.kernel.org/r/20230118073749.AF061C433EF@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[Why]
Setting scaling does not correctly update CRTC state. As a result
dc stream state's src (composition area) && dest (addressable area)
was not calculated as expected. This causes set scaling doesn's work.
[How]
Correctly update CRTC state when setting scaling property.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: hongao <hongao@uniontech.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
__USE_GNU should be an internal macro only used inside glibc. Either
memfd_create() or fallocate() requires _GNU_SOURCE per man page, where
__USE_GNU will further be defined by glibc headers include/features.h:
#ifdef _GNU_SOURCE
# define __USE_GNU 1
#endif
This fixes:
>> hugetlb-madvise.c:20: warning: "__USE_GNU" redefined
20 | #define __USE_GNU
|
In file included from /usr/include/x86_64-linux-gnu/bits/libc-header-start.h:33,
from /usr/include/stdlib.h:26,
from hugetlb-madvise.c:16:
/usr/include/features.h:407: note: this is the location of the previous definition
407 | # define __USE_GNU 1
|
Link: https://lkml.kernel.org/r/Y8V9z+z6Tk7NetI3@x1n
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch should harden commit 15520a3f04 ("mm: use pte markers for
swap errors") on using pte markers for swapin errors on a few corner
cases.
1. Propagate swapin errors across fork()s: if there're swapin errors in
the parent mm, after fork()s the child should sigbus too when an error
page is accessed.
2. Fix a rare condition race in pte_marker_clear() where a uffd-wp pte
marker can be quickly switched to a swapin error.
3. Explicitly ignore swapin error pte markers in change_protection().
I mostly don't worry on (2) or (3) at all, but we should still have them.
Case (1) is special because it can potentially cause silent data corrupt
on child when parent has swapin error triggered with swapoff, but since
swapin error is rare itself already it's probably not easy to trigger
either.
Currently there is a priority difference between the uffd-wp bit and the
swapin error entry, in which the swapin error always has higher priority
(e.g. we don't need to wr-protect a swapin error pte marker).
If there will be a 3rd bit introduced, we'll probably need to consider a
more involved approach so we may need to start operate on the bits. Let's
leave that for later.
This patch is tested with case (1) explicitly where we'll get corrupted
data before in the child if there's existing swapin error pte markers, and
after patch applied the child can be rightfully killed.
We don't need to copy stable for this one since 15520a3f04 just landed
as part of v6.2-rc1, only "Fixes" applied.
Link: https://lkml.kernel.org/r/20221214200453.1772655-3-peterx@redhat.com
Fixes: 15520a3f04 ("mm: use pte markers for swap errors")
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The might_sleep() annotation in alua_rtpg_queue() is not correct since the
command completion code may call this function from atomic context.
Calling alua_rtpg_queue() from atomic context in the command completion
path is fine since request submitters must hold an sdev reference until
command execution has completed. This patch fixes the following kernel
complaint:
BUG: sleeping function called from invalid context at drivers/scsi/device_handler/scsi_dh_alua.c:992
Call Trace:
dump_stack_lvl+0xac/0x100
__might_resched+0x284/0x2c8
alua_rtpg_queue+0x3c/0x98 [scsi_dh_alua]
alua_check+0x122/0x250 [scsi_dh_alua]
alua_check_sense+0x172/0x228 [scsi_dh_alua]
scsi_check_sense+0x8a/0x2e0
scsi_decide_disposition+0x286/0x298
scsi_complete+0x6a/0x108
blk_complete_reqs+0x6e/0x88
__do_softirq+0x13e/0x6b8
__irq_exit_rcu+0x14a/0x170
irq_exit_rcu+0x22/0x50
do_ext_irq+0x10a/0x1d0
Link: https://lore.kernel.org/r/20230118180557.1212577-1-bvanassche@acm.org
Reported-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Tested-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a lock inversion and rwsem read-lock recursion in the devfreq
target callback which can lead to deadlocks.
Specifically, ufshcd_devfreq_scale() already holds a clk_scaling_lock
read lock when toggling the write booster, which involves taking the
dev_cmd mutex before taking another clk_scaling_lock read lock.
This can lead to a deadlock if another thread:
1) tries to acquire the dev_cmd and clk_scaling locks in the correct
order, or
2) takes a clk_scaling write lock before the attempt to take the
clk_scaling read lock a second time.
Fix this by dropping the clk_scaling_lock before toggling the write booster
as was done before commit 0e9d4ca43b ("scsi: ufs: Protect some contexts
from unexpected clock scaling").
While the devfreq callbacks are already serialised, add a second
serialising mutex to handle the unlikely case where a callback triggered
through the devfreq sysfs interface is racing with a request to disable
clock scaling through the UFS controller 'clkscale_enable' sysfs
attribute. This could otherwise lead to the write booster being left
disabled after having disabled clock scaling.
Also take the new mutex in ufshcd_clk_scaling_allow() to make sure that any
pending write booster update has completed on return.
Note that this currently only affects Qualcomm platforms since commit
87bd05016a ("scsi: ufs: core: Allow host driver to disable wb toggling
during clock scaling").
The lock inversion (i.e. 1 above) was reported by lockdep as:
======================================================
WARNING: possible circular locking dependency detected
6.1.0-next-20221216 #211 Not tainted
------------------------------------------------------
kworker/u16:2/71 is trying to acquire lock:
ffff076280ba98a0 (&hba->dev_cmd.lock){+.+.}-{3:3}, at: ufshcd_query_flag+0x50/0x1c0
but task is already holding lock:
ffff076280ba9cf0 (&hba->clk_scaling_lock){++++}-{3:3}, at: ufshcd_devfreq_scale+0x2b8/0x380
which lock already depends on the new lock.
[ +0.011606]
the existing dependency chain (in reverse order) is:
-> #1 (&hba->clk_scaling_lock){++++}-{3:3}:
lock_acquire+0x68/0x90
down_read+0x58/0x80
ufshcd_exec_dev_cmd+0x70/0x2c0
ufshcd_verify_dev_init+0x68/0x170
ufshcd_probe_hba+0x398/0x1180
ufshcd_async_scan+0x30/0x320
async_run_entry_fn+0x34/0x150
process_one_work+0x288/0x6c0
worker_thread+0x74/0x450
kthread+0x118/0x120
ret_from_fork+0x10/0x20
-> #0 (&hba->dev_cmd.lock){+.+.}-{3:3}:
__lock_acquire+0x12a0/0x2240
lock_acquire.part.0+0xcc/0x220
lock_acquire+0x68/0x90
__mutex_lock+0x98/0x430
mutex_lock_nested+0x2c/0x40
ufshcd_query_flag+0x50/0x1c0
ufshcd_query_flag_retry+0x64/0x100
ufshcd_wb_toggle+0x5c/0x120
ufshcd_devfreq_scale+0x2c4/0x380
ufshcd_devfreq_target+0xf4/0x230
devfreq_set_target+0x84/0x2f0
devfreq_update_target+0xc4/0xf0
devfreq_monitor+0x38/0x1f0
process_one_work+0x288/0x6c0
worker_thread+0x74/0x450
kthread+0x118/0x120
ret_from_fork+0x10/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&hba->clk_scaling_lock);
lock(&hba->dev_cmd.lock);
lock(&hba->clk_scaling_lock);
lock(&hba->dev_cmd.lock);
*** DEADLOCK ***
Fixes: 0e9d4ca43b ("scsi: ufs: Protect some contexts from unexpected clock scaling")
Cc: stable@vger.kernel.org # 5.12
Cc: Can Guo <quic_cang@quicinc.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230116161201.16923-1-johan+linaro@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull HID fixes from Jiri Kosina:
- fixes for potential empty list handling in HID core (Pietro Borrello)
- fix for NULL pointer dereference in betop driver that could be
triggered by malicious device (Pietro Borrello)
- fixes for handling calibration data preventing division by zero in
Playstation driver (Roderick Colenbrander)
- fix for memory leak on error path in amd-sfh driver (Basavaraj
Natikar)
- other few assorted small fixes and device ID-specific handling
* tag 'for-linus-2023011801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: betop: check shape of output reports
HID: playstation: sanity check DualSense calibration data.
HID: playstation: sanity check DualShock4 calibration data.
HID: uclogic: Add support for XP-PEN Deco 01 V2
HID: revert CHERRY_MOUSE_000C quirk
HID: check empty report_list in bigben_probe()
HID: check empty report_list in hid_validate_values()
HID: amd_sfh: Fix warning unwind goto
HID: intel_ish-hid: Add check for ishtp_dma_tx_map
Remove dfs_cache_update_tgthint() as it is not used anywhere.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
On async reads, page data is allocated before sending. When the
response is received but it has no data to fill (e.g.
STATUS_END_OF_FILE), __calc_signature() will still include the pages in
its computation, leading to an invalid signature check.
This patch fixes this by not setting the async read smb_rqst page data
(zeroed by default) if its got_bytes is 0.
This can be reproduced/verified with xfstests generic/465.
Cc: <stable@vger.kernel.org>
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
The ACPI PRM address space handler calls efi_call_virt_pointer() to
execute PRM firmware code, but doing so is only permitted when the EFI
runtime environment is available. Otherwise, such calls are guaranteed
to result in a crash, and must therefore be avoided.
Given that the EFI runtime services may become unavailable after a crash
occurring in the firmware, we need to check this each time the PRM
address space handler is invoked. If the EFI runtime services were not
available at registration time to being with, don't install the address
space handler at all.
Fixes: cefc7ca462 ("ACPI: PRM: implement OperationRegion handler for the PlatformRtMechanism subtype")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull affs fix from David Sterba:
"One minor fix for a KCSAN report"
* tag 'affs-for-6.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: initialize fsdata in affs_truncate()
Pull erofs fixes from Gao Xiang:
"Two patches fixes issues reported by syzbot, one fixes a missing
`domain_id` mount option in documentation and a minor cleanup:
- Fix wrong iomap->length calculation post EOF, which could cause a
WARN_ON in iomap_iter_done() (Siddh)
- Fix improper kvcalloc() use with __GFP_NOFAIL (me)
- Add missing `domain_id` mount option in documentation (Jingbo)
- Clean up fscache option parsing (Jingbo)"
* tag 'erofs-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: clean up parsing of fscache related options
erofs: add documentation for 'domain_id' mount option
erofs: fix kvcalloc() misuse with __GFP_NOFAIL
erofs/zmap.c: Fix incorrect offset calculation
Pull LoongArch fixes from Huacai Chen:
"Fix a missing elf_hwcap, fix some stack unwinder bugs and two trivial
cleanups"
* tag 'loongarch-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Add generic ex-handler unwind in prologue unwinder
LoongArch: Strip guess unwinder out from prologue unwinder
LoongArch: Use correct sp value to get graph addr in stack unwinders
LoongArch: Get frame info in unwind_start() when regs is not available
LoongArch: Adjust PC value when unwind next frame in unwinder
LoongArch: Simplify larch_insn_gen_xxx implementation
LoongArch: Use common function sign_extend64()
LoongArch: Add HWCAP_LOONGARCH_CPUCFG to elf_hwcap
Terminate vdesc when terminating an ongoing transfer.
This will ensure that the vdesc is present in the desc_terminated list
The descriptor will be freed later in desc_free_list().
This fixes the memory leaks which can happen when terminating an
ongoing transfer.
Fixes: ee17028009 ("dmaengine: tegra: Add tegra gpcdma driver")
Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Link: https://lore.kernel.org/r/20230118115801.15210-1-akhilrajeev@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
lookup_cache_entry() might return an error different than -ENOENT
(e.g. from ->char2uni), so handle those as well in
cache_refresh_path().
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
The logic for creating or updating a cache entry in __refresh_tcon()
could be simply done with cache_refresh_path(), so use it instead.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Avoid contention while updating dfs target hints. This should be
perfectly fine to update them under shared locks.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Simply downgrade the write lock on cache updates from
cache_refresh_path() and avoid unnecessary re-lookup in
dfs_cache_find().
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Avoid getting DFS referral from an exclusive lock in
cache_refresh_path() because the tcon IPC used for getting the
referral could be disconnected and thus causing a deadlock as shown
below:
task A task B
====== ======
cifs_demultiplex_thread() dfs_cache_find()
cifs_handle_standard() cache_refresh_path()
reconnect_dfs_server() down_write()
dfs_cache_noreq_find() get_dfs_referral()
down_read() <- deadlock smb2_get_dfs_refer()
SMB2_ioctl()
cifs_send_recv()
compound_send_recv()
wait_for_response()
where task A cannot wake up task B because it is blocked on
down_read() due to the exclusive lock held in cache_refresh_path() and
therefore not being able to make progress.
Fixes: c9f7110399 ("cifs: keep referral server sessions alive")
Reviewed-by: Aurélien Aptel <aurelien.aptel@gmail.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
betopff_init() only checks the total sum of the report counts for each
report field to be at least 4, but hid_betopff_play() expects 4 report
fields.
A device advertising an output report with one field and 4 report counts
would pass the check but crash the kernel with a NULL pointer dereference
in hid_betopff_play().
Fixes: 52cd7785f3 ("HID: betop: add drivers/hid/hid-betopff.c")
Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Merge series from Peter Ujfalusi <peter.ujfalusi@linux.intel.com>:
This series contains one fix (first patch) followed by a nice to have safety
belts in case we get a widget from topology which is not handled by SOF and will
not have corresponding swidget associated with.
commit 1796f808e4 ("HID: i2c-hid: acpi: Stop setting wakeup_capable")
changed the policy such that I2C touchpads may be able to wake up the
system by default if the system is configured as such.
However on Clevo NL5xRU there is a mistake in the ACPI tables that the
TP_ATTN# signal connected to GPIO 9 is configured as ActiveLow and level
triggered but connected to a pull up. As soon as the system suspends the
touchpad loses power and then the system wakes up.
To avoid this problem, introduce a quirk for this model that will prevent
the wakeup capability for being set for GPIO 9.
Fixes: 1796f808e4 ("HID: i2c-hid: acpi: Stop setting wakeup_capable")
Reported-by: Werner Sembach <wse@tuxedocomputers.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1722#note_1720627
Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
virtqueue callback via the following statement:
do {
if (use_napi)
virtqueue_disable_cb(sq->vq);
free_old_xmit_skbs(sq, false);
} while (use_napi && kick &&
unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
When NAPI is used and kick is false, the callback won't be enabled
here. And when the virtqueue is about to be full, the tx will be
disabled, but we still don't enable tx interrupt which will cause a TX
hang. This could be observed when using pktgen with burst enabled.
TO be consistent with the logic that tries to disable cb only for
NAPI, fixing this by trying to enable delayed callback only when NAPI
is enabled when the queue is about to be full.
Fixes: a7766ef18b ("virtio_net: disable cb aggressively")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PTP TX timestamp handling was observed to be broken with this driver
when using the raw Layer 2 PTP encapsulation. ptp4l was not receiving
the expected TX timestamp after transmitting a packet, causing it to
enter a failure state.
The problem appears to be due to the way that the driver pads packets
which are smaller than the Ethernet minimum of 60 bytes. If headroom
space was available in the SKB, this caused the driver to move the data
back to utilize it. However, this appears to cause other data references
in the SKB to become inconsistent. In particular, this caused the
ptp_one_step_sync function to later (in the TX completion path) falsely
detect the packet as a one-step SYNC packet, even when it was not, which
caused the TX timestamp to not be processed when it should be.
Using the headroom for this purpose seems like an unnecessary complexity
as this is not a hot path in the driver, and in most cases it appears
that there is sufficient tailroom to not require using the headroom
anyway. Remove this usage of headroom to prevent this inconsistency from
occurring and causing other problems.
Fixes: 653e92a917 ("net: macb: add support for padding and fcs computation")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com> # on SAMA7G5
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Perf test "build id cache operations" fails for PE executable. Logs
below from powerpc system. Same is observed on x86 as well.
<<>>
Adding 5a0fd882b53084224ba47b624c55a469 ./tests/shell/../pe-file.exe: Ok
build id: 5a0fd882b53084224ba47b624c55a469
link: /tmp/perf.debug.w0V/.build-id/5a/0fd882b53084224ba47b624c55a469
file: /tmp/perf.debug.w0V/.build-id/5a/../../root/<user>/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf
failed: file /tmp/perf.debug.w0V/.build-id/5a/../../root/<user>/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf does not exist
test child finished with -1
---- end ----
build id cache operations: FAILED!
<<>>
The test tries to do:
<<>>
mkdir /tmp/perf.debug.TeY1
perf --buildid-dir /tmp/perf.debug.TeY1 buildid-cache -v -a ./tests/shell/../pe-file.exe
<<>>
The option "--buildid-dir" sets the build id cache directory as
/tmp/perf.debug.TeY1. The option given to buildid-cahe, ie "-a
./tests/shell/../pe-file.exe", is to add the pe-file.exe to the cache.
The testcase, sets buildid-dir and adds the file: pe-file.exe to build
id cache. To check if the command is run successfully, "check" function
looks for presence of the file in buildid cache directory. But the check
here expects the added file to be executable. Snippet below:
<<>>
if [ ! -x $file ]; then
echo "failed: file ${file} does not exist"
exit 1
fi
<<>>
The buildid test is done for sha1 binary, md5 binary and also for PE
file. The first two binaries are created at runtime by compiling with
"--build-id" option and hence the check for sha1/md5 test should use [ !
-x ]. But in case of PE file, the permission for this input file is
rw-r--r-- Hence the file added to build id cache has same permissoin
Original file:
ls tests/pe-file.exe | xargs stat --printf "%n %A \n"
tests/pe-file.exe -rw-r--r--
buildid cache file:
ls /tmp/perf.debug.w0V/.build-id/5a/../../root/<user>/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf | xargs stat --printf "%n %A \n"
/tmp/perf.debug.w0V/.build-id/5a/../../root/<user>/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf -rw-r--r--
Fix the test to match with the permission of original file in case of FE
file. ie if the "tests/pe-file.exe" file is not having exec permission,
just check for existence of the buildid file using [ ! -e <file> ]
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230116050131.17221-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pablo Niera Ayuso says:
====================
The following patchset contains Netfilter fixes for net:
1) Fix syn-retransmits until initiator gives up when connection is re-used
due to rst marked as invalid, from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The test "build id cache operations" fails on powerpc as below:
Adding 5a0fd882b53084224ba47b624c55a469 ./tests/shell/../pe-file.exe: Ok
build id: 5a0fd882b53084224ba47b624c55a469
link: /tmp/perf.debug.ZTu/.build-id/5a/0fd882b53084224ba47b624c55a469
file: /tmp/perf.debug.ZTu/.build-id/5a/../../root/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf
failed: file /tmp/perf.debug.ZTu/.build-id/5a/../../root/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf does not exist
test child finished with -1
---- end ----
build id cache operations: FAILED!
The failing test is when trying to add pe-file.exe to build id cache.
'perf buildid-cache' can be used to add/remove/manage files from the
build-id cache. "-a" option is used to add a file to the build-id cache.
Simple command to do so for a PE exe file:
# ls -ltr tests/pe-file.exe
-rw-r--r--. 1 root root 75595 Jan 10 23:35 tests/pe-file.exe
The file is in home directory.
# mkdir /tmp/perf.debug.TeY1
# perf --buildid-dir /tmp/perf.debug.TeY1 buildid-cache -v -a tests/pe-file.exe
The above will create ".build-id" folder in build id directory, which is
/tmp/perf.debug.TeY1. Also adds file to this folder under build id.
Example:
# ls -ltr /tmp/perf.debug.TeY1/.build-id/5a/0fd882b53084224ba47b624c55a469/
total 76
-rw-r--r--. 1 root root 0 Jan 11 00:38 probes
-rwxr-xr-x. 1 root root 75595 Jan 11 00:38 elf
We can see in the results that file mode for original file and file in
build id directory is different. ie, build id file has executable
permission whereas original file doesn’t have.
The code path and function (build_id_cache__add to add a file to the
cache is in "util/build-id.c". In build_id_cache__add() function, it
first attempts to link the original file to destination cache folder.
If linking the file fails (which can happen if the destination and
source is on a different mount points), it will copy the file to
destination. Here copyfile() routine explicitly uses mode as "755" and
hence file in the destination will have executable permission.
Code snippet:
if (link(realname, filename) && errno != EEXIST && copyfile(name, filename))
strace logs:
172285 link("/home/<user_name>/linux/tools/perf/tests/pe-file.exe", "/tmp/perf.debug.TeY1/home/<user_name>/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf") = -1 EXDEV (Invalid cross-device link)
172285 newfstatat(AT_FDCWD, "tests/pe-file.exe", {st_mode=S_IFREG|0644, st_size=75595, ...}, 0) = 0
172285 openat(AT_FDCWD, "/tmp/perf.debug.TeY1/home/<user_name>/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/.elf.KbAnsl", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
172285 fchmod(3, 0755) = 0
172285 openat(AT_FDCWD, "tests/pe-file.exe", O_RDONLY) = 4
172285 mmap(NULL, 75595, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7fffa5cd0000
172285 pwrite64(3, "MZ\220\0\3\0\0\0\4\0\0\0\377\377\0\0\270\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 75595, 0) = 75595
Whereas if the link succeeds, it succeeds in the first attempt itself
and the file in the build-id dir will have same permission as original
file.
Example, above uses /tmp. Instead if we use "--buildid-dir /home/build",
linking will work here since mount points are same. Hence the
destination file will not have executable permission.
Since the testcase "tests/shell/buildid.sh" always looks for executable
file, test fails in powerpc environment when test is run from /root.
The patch adds a change in build_id_cache__add() to use copyfile_mode()
which also passes the file’s original mode as argument. This way the
destination file mode also will be same as original file.
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230116050131.17221-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
b5f0de6df6 ("net: dev: Convert sa_data to flexible array in struct sockaddr")
That don't result in any changes in the tables generated from that
header.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
9cb1096f85 ("KVM: arm64: Enable ring-based dirty memory tracking")
That doesn't result in any changes in tooling (built on a Libre Computer
Firefly ROC-RK3399-PC-V1.1-A running Ubuntu 22.04), only addresses this
perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/Y8fmIT5PIfGaZuwa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If the function sdma_load_context() fails, the sdma_desc will be
freed, but the allocated desc->bd is forgot to be freed.
We already met the sdma_load_context() failure case and the log as
below:
[ 450.699064] imx-sdma 30bd0000.dma-controller: Timeout waiting for CH0 ready
...
In this case, the desc->bd will not be freed without this change.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/r/20221130090800.102035-1-hui.wang@canonical.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
On ARMv5 and earlier, a randconfig build can still run into
WARNING: unmet direct dependencies detected for IOMMU_IO_PGTABLE_LPAE
Depends on [n]: IOMMU_SUPPORT [=y] && (ARM [=y] || ARM64 || COMPILE_TEST [=y]) && !GENERIC_ATOMIC64 [=y]
Selected by [y]:
- DRM_PANFROST [=y] && HAS_IOMEM [=y] && DRM [=y] && (ARM [=y] || ARM64 || COMPILE_TEST [=y] && !GENERIC_ATOMIC64 [=y]) && MMU [=y]
Rework the dependencies to always require a working cmpxchg64.
Fixes: db594ba3fc ("drm/panfrost: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230117164456.1591901-1-arnd@kernel.org
Make sure calibration values are defined to prevent potential kernel
crashes. This fixes a hypothetical issue for virtual or clone devices
inspired by a similar fix for DS4.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Some DualShock4 devices report invalid calibration data resulting
in kernel oopses due to division by zero during report handling.
The devices affected generally appear to be clone devices, which don't
implement all reports properly and don't populate proper calibration
data. The issue may have been seen on an official device with erased
calibration reports.
This patch prevents the crashes by essentially disabling calibration
when invalid values are detected.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Tested-by: Alain Carlucci <alain.carlucci@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Matthew Garrett is still listed as a efivarfs co-maintainer, but the
email address bounces, and Matt is no longer involved in maintaining
this code.
So let's remove Matt as a efivarfs co-maintainer from MAINTAINERS.
Thanks for all the hard work!
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.
Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.
Example 1: (System.map)
ffffffff832fc78c t init
ffffffff832fc79e t init
ffffffff832fc8f8 t init
Example 2: (initcall_debug log)
calling init+0x0/0x12 @ 1
initcall init+0x0/0x12 returned 0 after 15 usecs
calling init+0x0/0x60 @ 1
initcall init+0x0/0x60 returned 0 after 2 usecs
calling init+0x0/0x9a @ 1
initcall init+0x0/0x9a returned 0 after 74 usecs
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Eli Cohen <eli@mellanox.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit removes eswitch mode none. So after devlink reload
in switchdev mode, eswitch mode is not changed. But actually eswitch
is disabled during devlink reload.
Fix it by setting eswitch mode to legacy when disabling eswitch
which is called by reload_down.
Fixes: f019679ea5 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
ASO operations are global to whole IPsec as they share one DMA address
for all operations. As such all WQE operations need to be protected with
lock. In this case, it must be spinlock to allow mlx5e_ipsec_aso_query()
operate in atomic context.
Fixes: 1ed78fc033 ("net/mlx5e: Update IPsec soft and hard limits")
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
aso->use_cache variable introduced in commit 8c582ddfbb ("net/mlx5e: Handle
hardware IPsec limits events") was an optimization to skip recurrent calls
to mlx5e_ipsec_aso_query(). Such calls are possible when lifetime event is
generated:
-> mlx5e_ipsec_handle_event()
-> mlx5e_ipsec_aso_query() - first call
-> xfrm_state_check_expire()
-> mlx5e_xfrm_update_curlft()
-> mlx5e_ipsec_aso_query() - second call
However, such optimization not really effective as mlx5e_ipsec_aso_query()
is needed to be called for update ESN anyway, which was missed due to misplaced
use_cache assignment.
Fixes: cee137a634 ("net/mlx5e: Handle ESN update events")
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently decap action is set based on tunnel_id. That means it is
set unconditionally. But for decap, ct and sample actions, decap is
done before ct. No need to decap again in sample.
And the actions are set correctly when parsing. So set decap action
based on attr instead of tunnel_id.
Fixes: 2741f22309 ("net/mlx5e: TC, Support sample offload action for tunneled traffic")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
According to HW spec parent_element_id field should be reserved (0x0) when calling
MODIFY_SCHEDULING_ELEMENT command.
This patch remove the wrong initialization of reserved field, parent_element_id, on
mlx5_qos_update_node.
Fixes: 214baf2287 ("net/mlx5e: Support HTB offload")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
According to HW spec element_type, element_attributes and parent_element_id fields
should be reserved (0x0) when calling MODIFY_SCHEDULING_ELEMENT command.
This patch remove initialization of these fields when calling the command.
Fixes: bd77bf1cb5 ("net/mlx5: Add SRIOV VF max rate configuration support")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This validation function is relevant only for XSK cases, hence it
assumes to be called only with xsk != NULL.
Thus checking for invalid xsk pointer is redundant and misleads static
code analyzers.
This commit removes redundant xsk pointer check.
This solves the following smatch warning:
drivers/net/ethernet/mellanox/mlx5/core/en/params.c:481
mlx5e_mpwrq_validate_xsk() error: we previously assumed 'xsk' could be
null (see line 478)
Fixes: 6470d2e7e8 ("net/mlx5e: xsk: Use KSM for unaligned XSK")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If the cpio command is not available the error emitted by
gen_kheaders.so is not clear as all output of the call to cpio is
discarded:
GNU make 4.4:
GEN kernel/kheaders_data.tar.xz
find: 'standard output': Broken pipe
find: write error
make[2]: *** [kernel/Makefile:157: kernel/kheaders_data.tar.xz] Error 127
make[1]: *** [scripts/Makefile.build:504: kernel] Error 2
GNU make < 4.4:
GEN kernel/kheaders_data.tar.xz
make[2]: *** [kernel/Makefile:157: kernel/kheaders_data.tar.xz] Error 127
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [scripts/Makefile.build:504: kernel] Error 2
Add an explicit check that will trigger a clear message about the issue:
CHK kernel/kheaders_data.tar.xz
./kernel/gen_kheaders.sh: line 17: type: cpio: not found
The other commands executed by gen_kheaders.sh are part of a standard
installation, so they are not checked.
Reported-by: Amy Parker <apark0006@student.cerritos.edu>
Link: https://lore.kernel.org/lkml/CAPOgqxFva=tOuh1UitCSN38+28q3BNXKq19rEsVNPRzRqKqZ+g@mail.gmail.com/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix a buffer overflow in mgmt_mesh_add
- Fix use HCI_OP_LE_READ_BUFFER_SIZE_V2
- Fix hci_qca shutdown on closed serdev
- Fix possible circular locking dependencies on ISO code
- Fix possible deadlock in rfcomm_sk_state_change
* tag 'for-net-2023-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
Bluetooth: ISO: Fix possible circular locking dependency
Bluetooth: hci_event: Fix Invalid wait context
Bluetooth: ISO: Fix possible circular locking dependency
Bluetooth: hci_sync: fix memory leak in hci_update_adv_data()
Bluetooth: hci_qca: Fix driver shutdown on closed serdev
Bluetooth: hci_conn: Fix memory leaks
Bluetooth: hci_sync: Fix use HCI_OP_LE_READ_BUFFER_SIZE_V2
Bluetooth: Fix a buffer overflow in mgmt_mesh_add()
====================
Link: https://lore.kernel.org/r/20230118002944.1679845-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
bpf 2023-01-16
We've added 6 non-merge commits during the last 8 day(s) which contain
a total of 6 files changed, 22 insertions(+), 24 deletions(-).
The main changes are:
1) Mitigate a Spectre v4 leak in unprivileged BPF from speculative
pointer-as-scalar type confusion, from Luis Gerhorst.
2) Fix a splat when pid 1 attaches a BPF program that attempts to
send killing signal to itself, from Hao Sun.
3) Fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
PERF_BPF_EVENT_PROG_UNLOAD events, from Paul Moore.
4) Fix BPF verifier warning triggered from invalid kfunc call in
backtrack_insn, also from Hao Sun.
5) Fix potential deadlock in htab_lock_bucket from same bucket index
but different map_locked index, from Tonghao Zhang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation
bpf: hash map, avoid deadlock with suitable hash mask
bpf: remove the do_idr_lock parameter from bpf_prog_free_id()
bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD
bpf: Skip task with pid=1 in send_signal_common()
bpf: Skip invalid kfunc call in backtrack_insn
====================
Link: https://lore.kernel.org/r/20230116230745.21742-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The IPA interrupt can fire when pm_runtime is disabled due to it racing
with the PM suspend/resume code. This causes a splat in the interrupt
handler when it tries to call pm_runtime_get().
Explicitly disable the interrupt in our ->suspend callback, and
re-enable it in ->resume to avoid this. If there is an interrupt pending
it will be handled after resuming. The interrupt is a wake_irq, as a
result even when disabled if it fires it will cause the system to wake
from suspend as well as cancel any suspend transition that may be in
progress. If there is an interrupt pending, the ipa_isr_thread handler
will be called after resuming.
Fixes: 1aac309d32 ("net: ipa: use autosuspend")
Signed-off-by: Caleb Connolly <caleb.connolly@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230115175925.465918-1-caleb.connolly@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reports a possible deadlock in rfcomm_sk_state_change [1].
While rfcomm_sock_connect acquires the sk lock and waits for
the rfcomm lock, rfcomm_sock_release could have the rfcomm
lock and hit a deadlock for acquiring the sk lock.
Here's a simplified flow:
rfcomm_sock_connect:
lock_sock(sk)
rfcomm_dlc_open:
rfcomm_lock()
rfcomm_sock_release:
rfcomm_sock_shutdown:
rfcomm_lock()
__rfcomm_dlc_close:
rfcomm_k_state_change:
lock_sock(sk)
This patch drops the sk lock before calling rfcomm_dlc_open to
avoid the possible deadlock and holds sk's reference count to
prevent use-after-free after rfcomm_dlc_open completes.
Reported-by: syzbot+d7ce59...@syzkaller.appspotmail.com
Fixes: 1804fdf6e4 ("Bluetooth: btintel: Combine setting up MSFT extension")
Link: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218 [1]
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This fixes the following trace caused by attempting to lock
cmd_sync_work_lock while holding the rcu_read_lock:
kworker/u3:2/212 is trying to lock:
ffff888002600910 (&hdev->cmd_sync_work_lock){+.+.}-{3:3}, at:
hci_cmd_sync_queue+0xad/0x140
other info that might help us debug this:
context-{4:4}
4 locks held by kworker/u3:2/212:
#0: ffff8880028c6530 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
process_one_work+0x4dc/0x9a0
#1: ffff888001aafde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
at: process_one_work+0x4dc/0x9a0
#2: ffff888002600070 (&hdev->lock){+.+.}-{3:3}, at:
hci_cc_le_set_cig_params+0x64/0x4f0
#3: ffffffffa5994b00 (rcu_read_lock){....}-{1:2}, at:
hci_cc_le_set_cig_params+0x2f9/0x4f0
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When hci_cmd_sync_queue() failed in hci_update_adv_data(), inst_ptr is
not freed, which will cause memory leak, convert to use ERR_PTR/PTR_ERR
to pass the instance to callback so no memory needs to be allocated.
Fixes: 651cd3d65b ("Bluetooth: convert hci_update_adv_data to hci_sync")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The driver shutdown callback (which sends EDL_SOC_RESET to the device
over serdev) should not be invoked when HCI device is not open (e.g. if
hci_dev_open_sync() failed), because the serdev and its TTY are not open
either. Also skip this step if device is powered off
(qca_power_shutdown()).
The shutdown callback causes use-after-free during system reboot with
Qualcomm Atheros Bluetooth:
Unable to handle kernel paging request at virtual address
0072662f67726fd7
...
CPU: 6 PID: 1 Comm: systemd-shutdow Tainted: G W
6.1.0-rt5-00325-g8a5f56bcfcca #8
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
Call trace:
tty_driver_flush_buffer+0x4/0x30
serdev_device_write_flush+0x24/0x34
qca_serdev_shutdown+0x80/0x130 [hci_uart]
device_shutdown+0x15c/0x260
kernel_restart+0x48/0xac
KASAN report:
BUG: KASAN: use-after-free in tty_driver_flush_buffer+0x1c/0x50
Read of size 8 at addr ffff16270c2e0018 by task systemd-shutdow/1
CPU: 7 PID: 1 Comm: systemd-shutdow Not tainted
6.1.0-next-20221220-00014-gb85aaf97fb01-dirty #28
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
Call trace:
dump_backtrace.part.0+0xdc/0xf0
show_stack+0x18/0x30
dump_stack_lvl+0x68/0x84
print_report+0x188/0x488
kasan_report+0xa4/0xf0
__asan_load8+0x80/0xac
tty_driver_flush_buffer+0x1c/0x50
ttyport_write_flush+0x34/0x44
serdev_device_write_flush+0x48/0x60
qca_serdev_shutdown+0x124/0x274
device_shutdown+0x1e8/0x350
kernel_restart+0x48/0xb0
__do_sys_reboot+0x244/0x2d0
__arm64_sys_reboot+0x54/0x70
invoke_syscall+0x60/0x190
el0_svc_common.constprop.0+0x7c/0x160
do_el0_svc+0x44/0xf0
el0_svc+0x2c/0x6c
el0t_64_sync_handler+0xbc/0x140
el0t_64_sync+0x190/0x194
Fixes: 7e7bbddd02 ("Bluetooth: hci_qca: Fix qca6390 enable failure after warm reboot")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When hci_cmd_sync_queue() failed in hci_le_terminate_big() or
hci_le_big_terminate(), the memory pointed by variable d is not freed,
which will cause memory leak. Add release process to error path.
Fixes: eca0ae4aea ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Don't try to use HCI_OP_LE_READ_BUFFER_SIZE_V2 if controller don't
support ISO channels, but in order to check if ISO channels are
supported HCI_OP_LE_READ_LOCAL_FEATURES needs to be done earlier so the
features bits can be checked on hci_le_read_buffer_size_sync.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216817
Fixes: c1631dbc00 ("Bluetooth: hci_sync: Fix hci_read_buffer_size_sync")
Cc: stable@vger.kernel.org # 6.1
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Smatch Warning:
net/bluetooth/mgmt_util.c:375 mgmt_mesh_add() error: __memcpy()
'mesh_tx->param' too small (48 vs 50)
Analysis:
'mesh_tx->param' is array of size 48. This is the destination.
u8 param[sizeof(struct mgmt_cp_mesh_send) + 29]; // 19 + 29 = 48.
But in the caller 'mesh_send' we reject only when len > 50.
len > (MGMT_MESH_SEND_SIZE + 31) // 19 + 31 = 50.
Fixes: b338d91703 ("Bluetooth: Implement support for Mesh")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When a connection is re-used, following can happen:
[ connection starts to close, fin sent in either direction ]
> syn # initator quickly reuses connection
< ack # peer sends a challenge ack
> rst # rst, sequence number == ack_seq of previous challenge ack
> syn # this syn is expected to pass
Problem is that the rst will fail window validation, so it gets
tagged as invalid.
If ruleset drops such packets, we get repeated syn-retransmits until
initator gives up or peer starts responding with syn/ack.
Before the commit indicated in the "Fixes" tag below this used to work:
The challenge-ack made conntrack re-init state based on the challenge
ack itself, so the following rst would pass window validation.
Add challenge-ack support: If we get ack for syn, record the ack_seq,
and then check if the rst sequence number matches the last ack number
seen in reverse direction.
Fixes: c7aab4f170 ("netfilter: nf_conntrack_tcp: re-init for syn packets only")
Reported-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
msi_create_device_irq_domain() creates a firmware node for the new domain,
which is never freed. kmemleak reports:
unreferenced object 0xffff888120ba9a00 (size 96):
comm "systemd-modules", pid 221, jiffies 4294893411 (age 635.732s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff ................
00 00 00 00 00 00 00 00 18 9a ba 20 81 88 ff ff ........... ....
backtrace:
[<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
[<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
[<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
[<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
[<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
[<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
[<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
[<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
[<00000000df8efb43>] local_pci_probe+0xd6/0x170
[<0000000085cb9924>] pci_device_probe+0x231/0x6e0
Use the proper free operation for the firmware wnode so the name is freed
during error unwind of msi_create_device_irq_domain() and also free the
node in msi_remove_device_irq_domain() if it was automatically allocated.
To avoid extra NULL pointer checks make irq_domain_free_fwnode() tolerant
of NULL.
Fixes: 27a6dea3eb ("genirq/msi: Provide msi_create/free_device_irq_domain()")
Reported-by: Omri Barazi <obarazi@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kalle Valo <kvalo@kernel.org>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-24af6665e2da+c9-msi_leak_jgg@nvidia.com
To pick the changes in:
8aff460f21 ("KVM: x86: Add a VALID_MASK for the flags in kvm_msr_filter_range")
c1340fe359 ("KVM: x86: Add a VALID_MASK for the flag in kvm_msr_filter")
be83794210 ("KVM: x86: Disallow the use of KVM_MSR_FILTER_DEFAULT_ALLOW in the kernel")
That just rebuilds kvm-stat.c on x86, no change in functionality.
This silences these perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/Y8VR5wSAkd2A0HxS@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
b0305c1e0e ("KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapi")
That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/Y7Loj5slB908QSXf@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
gcc-13 notices a type mismatch between function declaration
and definition for a few functions that have been converted
from returning vchiq specific status values to regular error
codes:
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:662:5: error: conflicting types for 'vchiq_initialise' due to enum/integer mismatch; have 'int(struct vchiq_instance **)' [-Werror=enum-int-mismatch]
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:1411:1: error: conflicting types for 'vchiq_use_internal' due to enum/integer mismatch; have 'int(struct vchiq_state *, struct vchiq_service *, enum USE_TYPE_E)' [-Werror=enum-int-mismatch]
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:1468:1: error: conflicting types for 'vchiq_release_internal' due to enum/integer mismatch; have 'int(struct vchiq_state *, struct vchiq_service *)' [-Werror=enum-int-mismatch]
Change the declarations to match the actual function definition.
Fixes: a9fbd828be ("staging: vchiq_arm: drop enum vchiq_status from vchiq_*_internal")
Cc: stable <stable@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230117163957.1109872-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GCC 11.1.0 and 11.2.0 generate a wrong warning when compiling the
kernel e.g. with allmodconfig:
arch/s390/kernel/setup.c: In function ‘setup_lowcore_dat_on’:
./include/linux/fortify-string.h:57:33: error: ‘__builtin_memcpy’ reading 128 bytes from a region of size 0 [-Werror=stringop-overread]
...
arch/s390/kernel/setup.c:526:9: note: in expansion of macro ‘memcpy’
526 | memcpy(abs_lc->cregs_save_area, S390_lowcore.cregs_save_area,
| ^~~~~~
This could be addressed by using absolute_pointer() with the
S390_lowcore macro, but this is not a good idea since this generates
worse code for performance critical paths.
Therefore simply use a for loop to copy the array in question and get
rid of the warning.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Pull nfsd fixes from Chuck Lever:
- Fix recently introduced use-after-free bugs
* tag 'nfsd-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: replace delayed_work with work_struct for nfsd_client_shrinker
NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time
NFSD: fix use-after-free in nfsd4_ssc_setup_dul()
Pull tomoyo fixes from Tetsuo Handa:
"Makefile and Kconfig updates for TOMOYO"
* tag 'tomoyo-pr-20230117' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: Update website link
tomoyo: Remove "select SRCU"
tomoyo: Omit use of bin2c
tomoyo: avoid unneeded creation of builtin-policy.h
tomoyo: fix broken dependency on *.conf.default
This patch is fix for Linux kernel v2.6.33 or later.
For request subaction to IEC 61883-1 FCP region, Linux FireWire subsystem
have had an issue of use-after-free. The subsystem allows multiple
user space listeners to the region, while data of the payload was likely
released before the listeners execute read(2) to access to it for copying
to user space.
The issue was fixed by a commit 281e20323a ("firewire: core: fix
use-after-free regression in FCP handler"). The object of payload is
duplicated in kernel space for each listener. When the listener executes
ioctl(2) with FW_CDEV_IOC_SEND_RESPONSE request, the object is going to
be released.
However, it causes memory leak since the commit relies on call of
release_request() in drivers/firewire/core-cdev.c. Against the
expectation, the function is never called due to the design of
release_client_resource(). The function delegates release task
to caller when called with non-NULL fourth argument. The implementation
of ioctl_send_response() is the case. It should release the object
explicitly.
This commit fixes the bug.
Cc: <stable@vger.kernel.org>
Fixes: 281e20323a ("firewire: core: fix use-after-free regression in FCP handler")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20230117090610.93792-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When there are no read queues read requests will be assigned a
default queue on allocation. However, blk_mq_get_cached_request() is not
prepared for that and will fail all attempts to grab read requests from
the cache. Worst case it doubles the number of requests allocated,
roughly half of which will be returned by blk_mq_free_plug_rqs().
It only affects batched allocations and so is io_uring specific.
For reference, QD8 t/io_uring benchmark improves by 20-35%.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/80d4511011d7d4751b4cf6375c4e38f237d935e3.1673955390.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The Texas Instruments TUSB8041 has an autosuspend problem at high
temperature.
If there is not USB traffic, after a couple of ms, the device enters in
autosuspend mode. In this condition the external clock stops working, to
save energy. When the USB activity turns on, ther hub exits the
autosuspend state, the clock starts running again and all works fine.
At ambient temperature all works correctly, but at high temperature,
when the USB activity turns on, the external clock doesn't restart and
the hub disappears from the USB bus.
Disabling the autosuspend mode for this hub solves the issue.
Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
Cc: stable <stable@kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20221219124759.3207032-1-f.suligoi@asem.it
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The struct device driver-data pointer is used for any data that a driver
may need in various callbacks while bound to the device. For
convenience, subsystems typically provide wrappers such as
usb_set_intfdata() of the generic accessor functions for use in bus
callbacks.
There is generally no longer any need for a driver to clear the pointer,
but since commit 0998d06310 ("device-core: Ensure drvdata = NULL when
no driver is bound") the driver-data pointer is set to NULL by driver
core post unbind anyway.
For historical reasons, USB core also clears this pointer when an
explicitly claimed interface is released.
Due to a misunderstanding, a misleading kernel doc comment for
usb_set_intfdata() was recently added which claimed that the driver data
pointer must not be cleared during disconnect before "all actions [are]
completed", which is both imprecise and incorrect.
Specifically, drivers like cdc-acm which claim additional interfaces use
the driver-data pointer as a flag which is cleared when the first
interface is unbound. As long as a driver does not do something odd like
dereference the pointer in, for example, completion callbacks, this can
be done at any time during disconnect. And in any case this is no
different than for any other resource, like the driver data itself,
which may be freed by the disconnect callback.
Note that the comment actually also claimed that the interface itself
was somehow being set to NULL by driver core.
Fix the kernel doc by removing incorrect, overly specific and misleading
details and adding a comment about why some drivers do clear the
driver-data pointer.
Fixes: 27ef178497 ("usb: add usb_set_intfdata() documentation")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20221212152035.31806-1-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In Google internal bug 265639009 we've received an (as yet) unreproducible
crash report from an aarch64 GKI 5.10.149-android13 running device.
AFAICT the source code is at:
https://android.googlesource.com/kernel/common/+/refs/tags/ASB-2022-12-05_13-5.10
The call stack is:
ncm_close() -> ncm_notify() -> ncm_do_notify()
with the crash at:
ncm_do_notify+0x98/0x270
Code: 79000d0b b9000a6c f940012a f9400269 (b9405d4b)
Which I believe disassembles to (I don't know ARM assembly, but it looks sane enough to me...):
// halfword (16-bit) store presumably to event->wLength (at offset 6 of struct usb_cdc_notification)
0B 0D 00 79 strh w11, [x8, #6]
// word (32-bit) store presumably to req->Length (at offset 8 of struct usb_request)
6C 0A 00 B9 str w12, [x19, #8]
// x10 (NULL) was read here from offset 0 of valid pointer x9
// IMHO we're reading 'cdev->gadget' and getting NULL
// gadget is indeed at offset 0 of struct usb_composite_dev
2A 01 40 F9 ldr x10, [x9]
// loading req->buf pointer, which is at offset 0 of struct usb_request
69 02 40 F9 ldr x9, [x19]
// x10 is null, crash, appears to be attempt to read cdev->gadget->max_speed
4B 5D 40 B9 ldr w11, [x10, #0x5c]
which seems to line up with ncm_do_notify() case NCM_NOTIFY_SPEED code fragment:
event->wLength = cpu_to_le16(8);
req->length = NCM_STATUS_BYTECOUNT;
/* SPEED_CHANGE data is up/down speeds in bits/sec */
data = req->buf + sizeof *event;
data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
My analysis of registers and NULL ptr deref crash offset
(Unable to handle kernel NULL pointer dereference at virtual address 000000000000005c)
heavily suggests that the crash is due to 'cdev->gadget' being NULL when executing:
data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
which calls:
ncm_bitrate(NULL)
which then calls:
gadget_is_superspeed(NULL)
which reads
((struct usb_gadget *)NULL)->max_speed
and hits a panic.
AFAICT, if I'm counting right, the offset of max_speed is indeed 0x5C.
(remember there's a GKI KABI reservation of 16 bytes in struct work_struct)
It's not at all clear to me how this is all supposed to work...
but returning 0 seems much better than panic-ing...
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: stable <stable@kernel.org>
Link: https://lore.kernel.org/r/20230117131839.1138208-1-maze@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's the altmode re-registeration issue after data role
swap (DR_SWAP).
Comparing to USBPD 2.0, in USBPD 3.0, it loose the limit that only DFP
can initiate the VDM command to get partner identity information.
For a USBPD 3.0 UFP device, it may already get the identity information
from its port partner before DR_SWAP. If DR_SWAP send or receive at the
mean time, 'send_discover' flag will be raised again. It causes discover
identify action restart while entering ready state. And after all
discover actions are done, the 'tcpm_register_altmodes' will be called.
If old altmode is not unregistered, this sysfs create fail can be found.
In 'DR_SWAP_CHANGE_DR' state case, only DFP will unregister altmodes.
For UFP, the original altmodes keep registered.
This patch fix the logic that after DR_SWAP, 'tcpm_unregister_altmodes'
must be called whatever the current data role is.
Reviewed-by: Macpaul Lin <macpaul.lin@mediatek.com>
Fixes: ae8a2ca8a2 ("usb: typec: Group all TCPCI/TCPM code together")
Reported-by: TommyYl Chen <tommyyl.chen@mediatek.com>
Cc: stable@vger.kernel.org
Signed-off-by: ChiYuan Huang <cy_huang@richtek.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/1673248790-15794-1-git-send-email-cy_huang@richtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the color matching descriptor is only sent across the wire
a single time, following the descriptors for each format and frame.
According to the UVC 1.5 Specification 3.9.2.6 ("Color Matching
Descriptors"):
"Only one instance is allowed for a given format and if present,
the Color Matching descriptor shall be placed following the Video
and Still Image Frame descriptors for that format".
Add another reference to the color matching descriptor after the
yuyv frames so that it's correctly transmitted for that format
too.
Fixes: a9914127e8 ("USB gadget: Webcam device")
Cc: stable <stable@kernel.org>
Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Link: https://lore.kernel.org/r/20221216160528.479094-1-dan.scally@ideasonboard.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As per the documentation, function usb_ep_free_request guarantees
the request will not be queued or no longer be re-queued (or
otherwise used). However, with the current implementation it
doesn't make sure that the request in ep0 isn't reused.
Fix this by dequeuing the ep0req on functionfs_unbind before
freeing the request to align with the definition.
Fixes: ddf8abd259 ("USB: f_fs: the FunctionFS driver")
Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com>
Tested-by: Krishna Kurapati <quic_kriskura@quicinc.com>
Link: https://lore.kernel.org/r/20221215052906.8993-3-quic_ugoswami@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While performing fast composition switch, there is a possibility that the
process of ffs_ep0_write/ffs_ep0_read get into a race condition
due to ep0req being freed up from functionfs_unbind.
Consider the scenario that the ffs_ep0_write calls the ffs_ep0_queue_wait
by taking a lock &ffs->ev.waitq.lock. However, the functionfs_unbind isn't
bounded so it can go ahead and mark the ep0req to NULL, and since there
is no NULL check in ffs_ep0_queue_wait we will end up in use-after-free.
Fix this by making a serialized execution between the two functions using
a mutex_lock(ffs->mutex).
Fixes: ddf8abd259 ("USB: f_fs: the FunctionFS driver")
Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com>
Tested-by: Krishna Kurapati <quic_kriskura@quicinc.com>
Link: https://lore.kernel.org/r/20221215052906.8993-2-quic_ugoswami@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently each onboard_hub platform device owns an 'attach' work,
which is scheduled when the device probes. With this deadlocks
have been reported on a Raspberry Pi 3 B+ [1], which has nested
onboard hubs.
The flow of the deadlock is something like this (with the onboard_hub
driver built as a module) [2]:
- USB root hub is instantiated
- core hub driver calls onboard_hub_create_pdevs(), which creates the
'raw' platform device for the 1st level hub
- 1st level hub is probed by the core hub driver
- core hub driver calls onboard_hub_create_pdevs(), which creates
the 'raw' platform device for the 2nd level hub
- onboard_hub platform driver is registered
- platform device for 1st level hub is probed
- schedules 'attach' work
- platform device for 2nd level hub is probed
- schedules 'attach' work
- onboard_hub USB driver is registered
- device (and parent) lock of hub is held while the device is
re-probed with the onboard_hub driver
- 'attach' work (running in another thread) calls driver_attach(), which
blocks on one of the hub device locks
- onboard_hub_destroy_pdevs() is called by the core hub driver when one
of the hubs is detached
- destroying the pdevs invokes onboard_hub_remove(), which waits for the
'attach' work to complete
- waits forever, since the 'attach' work can't acquire the device lock
Use a single work struct for the driver instead of having a work struct
per onboard hub platform driver instance. With that it isn't necessary
to cancel the work in onboard_hub_remove(), which fixes the deadlock.
The work is only cancelled when the driver is unloaded.
[1] https://lore.kernel.org/r/d04bcc45-3471-4417-b30b-5cf9880d785d@i2se.com/
[2] https://lore.kernel.org/all/Y6OrGbqaMy2iVDWB@google.com/
Cc: stable@vger.kernel.org
Fixes: 8bc063641c ("usb: misc: Add onboard_usb_hub driver")
Link: https://lore.kernel.org/r/d04bcc45-3471-4417-b30b-5cf9880d785d@i2se.com/
Link: https://lore.kernel.org/all/Y6OrGbqaMy2iVDWB@google.com/
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Link: https://lore.kernel.org/r/20230110172954.v2.2.I16b51f32db0c32f8a8532900bfe1c70c8572881a@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The onboard_hub 'driver' consists of two drivers, a platform
driver and a USB driver. Currently when the onboard hub driver
is initialized it first registers the platform driver, then the
USB driver. This results in a race condition when the 'attach'
work is executed, which is scheduled when the platform device
is probed. The purpose of fhe 'attach' work is to bind elegible
USB hub devices to the onboard_hub USB driver. This fails if
the work runs before the USB driver has been registered.
Register the USB driver first, then the platform driver. This
increases the chances that the onboard_hub USB devices are probed
before their corresponding platform device, which the USB driver
tries to locate in _probe(). The driver already handles this
situation and defers probing if the onboard hub platform device
doesn't exist yet.
Cc: stable@vger.kernel.org
Fixes: 8bc063641c ("usb: misc: Add onboard_usb_hub driver")
Link: https://lore.kernel.org/lkml/Y6W00vQm3jfLflUJ@hovoldconsulting.com/T/#m0d64295f017942fd988f7c53425db302d61952b4
Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: https://lore.kernel.org/r/20230110172954.v2.1.I75494ebee7027a50235ce4b1e930fa73a578fbe2@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During ucsi_unregister() when destroying a connector's workqueue, there
may still be pending delayed work items that haven't been scheduled yet.
Because queue_delayed_work() uses a separate timer to schedule a work
item, the destroy_workqueue() call is not aware of any pending items.
Hence when a pending item's timer expires it would then try to queue on
a dangling workqueue pointer.
Fix this by keeping track of all work items in a list, so that prior to
destroying the workqueue any pending items can be flushed. Do this by
calling mod_delayed_work() as that will cause pending items to get
queued immediately, which then allows the ensuing destroy_workqueue() to
implicitly drain all currently queued items to completion and free
themselves.
Fixes: b9aa02ca39 ("usb: typec: ucsi: Add polling mechanism for partner tasks like alt mode checking")
Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Co-developed-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Jack Pham <quic_jackp@quicinc.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230110071218.26261-1-quic_jackp@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
USB3 ports on xHC hosts may have retimers that cause too long
exit latency to work with native USB3 U1/U2 link power management states.
For now only use usb_acpi_port_lpm_incapable() to evaluate if port lpm
should be disabled while setting up the USB3 roothub.
Other ways to identify lpm incapable ports can be added here later if
ACPI _DSM does not exist.
Limit this to Intel hosts for now, this is to my knowledge only
an Intel issue.
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20230116142216.1141605-8-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
One USB3 roothub port may support link power management, while another
root port on the same xHC can't due to different retimers used for
the ports.
This is the case with Intel Alder Lake, and possible future platforms
where retimers used for USB4 ports cause too long exit latecy to
enable native USB3 lpm U1 and U2 states.
Add a flag in the xhci port structure to indicate if the port is
lpm_incapable, and check it while calculating exit latency.
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20230116142216.1141605-6-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Allow PCI hosts to check and tune roothub and port settings
before the hub is up and running.
This override is needed to turn off U1 and U2 LPM for some ports
based on per port ACPI _DSM, _UPC, or possibly vendor specific mmio
values for Intel xHC hosts.
Usb core calls the host update_hub_device once it creates a hub.
Entering U1 or U2 link power save state on ports with this limitation
will cause link to fail, turning the usb device unusable in that setup.
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20230116142216.1141605-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure xhci_free_dev() and xhci_kill_endpoint_urbs() do not race
and cause null pointer dereference when host suddenly dies.
Usb core may call xhci_free_dev() which frees the xhci->devs[slot_id]
virt device at the same time that xhci_kill_endpoint_urbs() tries to
loop through all the device's endpoints, checking if there are any
cancelled urbs left to give back.
hold the xhci spinlock while freeing the virt device
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20230116142216.1141605-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the host controller is not responding, all URBs queued to all
endpoints need to be killed. This can cause a kernel panic if we
dereference an invalid endpoint.
Fix this by using xhci_get_virt_ep() helper to find the endpoint and
checking if the endpoint is valid before dereferencing it.
[233311.853271] xhci-hcd xhci-hcd.1.auto: xHCI host controller not responding, assume dead
[233311.853393] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e8
[233311.853964] pc : xhci_hc_died+0x10c/0x270
[233311.853971] lr : xhci_hc_died+0x1ac/0x270
[233311.854077] Call trace:
[233311.854085] xhci_hc_died+0x10c/0x270
[233311.854093] xhci_stop_endpoint_command_watchdog+0x100/0x1a4
[233311.854105] call_timer_fn+0x50/0x2d4
[233311.854112] expire_timers+0xac/0x2e4
[233311.854118] run_timer_softirq+0x300/0xabc
[233311.854127] __do_softirq+0x148/0x528
[233311.854135] irq_exit+0x194/0x1a8
[233311.854143] __handle_domain_irq+0x164/0x1d0
[233311.854149] gic_handle_irq.22273+0x10c/0x188
[233311.854156] el1_irq+0xfc/0x1a8
[233311.854175] lpm_cpuidle_enter+0x25c/0x418 [msm_pm]
[233311.854185] cpuidle_enter_state+0x1f0/0x764
[233311.854194] do_idle+0x594/0x6ac
[233311.854201] cpu_startup_entry+0x7c/0x80
[233311.854209] secondary_start_kernel+0x170/0x198
Fixes: 50e8725e7c ("xhci: Refactor command watchdog and fix split string.")
Cc: stable@vger.kernel.org
Signed-off-by: Jimmy Hu <hhhuuu@google.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Message-ID: <0fe978ed-8269-9774-1c40-f8a98c17e838@linux.intel.com>
Link: https://lore.kernel.org/r/20230116142216.1141605-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The syzbot fuzzer and Gerald Lee have identified a use-after-free bug
in the gadgetfs driver, involving processes concurrently mounting and
unmounting the gadgetfs filesystem. In particular, gadgetfs_fill_super()
can race with gadgetfs_kill_sb(), causing the latter to deallocate
the_device while the former is using it. The output from KASAN says,
in part:
BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:102 [inline]
BUG: KASAN: use-after-free in atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:176 [inline]
BUG: KASAN: use-after-free in __refcount_sub_and_test include/linux/refcount.h:272 [inline]
BUG: KASAN: use-after-free in __refcount_dec_and_test include/linux/refcount.h:315 [inline]
BUG: KASAN: use-after-free in refcount_dec_and_test include/linux/refcount.h:333 [inline]
BUG: KASAN: use-after-free in put_dev drivers/usb/gadget/legacy/inode.c:159 [inline]
BUG: KASAN: use-after-free in gadgetfs_kill_sb+0x33/0x100 drivers/usb/gadget/legacy/inode.c:2086
Write of size 4 at addr ffff8880276d7840 by task syz-executor126/18689
CPU: 0 PID: 18689 Comm: syz-executor126 Not tainted 6.1.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
...
atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:176 [inline]
__refcount_sub_and_test include/linux/refcount.h:272 [inline]
__refcount_dec_and_test include/linux/refcount.h:315 [inline]
refcount_dec_and_test include/linux/refcount.h:333 [inline]
put_dev drivers/usb/gadget/legacy/inode.c:159 [inline]
gadgetfs_kill_sb+0x33/0x100 drivers/usb/gadget/legacy/inode.c:2086
deactivate_locked_super+0xa7/0xf0 fs/super.c:332
vfs_get_super fs/super.c:1190 [inline]
get_tree_single+0xd0/0x160 fs/super.c:1207
vfs_get_tree+0x88/0x270 fs/super.c:1531
vfs_fsconfig_locked fs/fsopen.c:232 [inline]
The simplest solution is to ensure that gadgetfs_fill_super() and
gadgetfs_kill_sb() are serialized by making them both acquire a new
mutex.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+33d7ad66d65044b93f16@syzkaller.appspotmail.com
Reported-and-tested-by: Gerald Lee <sundaywind2004@gmail.com>
Link: https://lore.kernel.org/linux-usb/CAO3qeMVzXDP-JU6v1u5Ags6Q-bb35kg3=C6d04DjzA9ffa5x1g@mail.gmail.com/
Fixes: e5d82a7360 ("vfs: Convert gadgetfs to use the new mount API")
CC: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/Y6XCPXBpn3tmjdCC@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After doorbell DMA fetches the TRB. If during dequeuing request
driver changes NORMAL TRB to LINK TRB but doesn't delete it from
controller cache then controller will handle cached TRB and packet
can be lost.
The example scenario for this issue looks like:
1. queue request - set doorbell
2. dequeue request
3. send OUT data packet from host
4. Device will accept this packet which is unexpected
5. queue new request - set doorbell
6. Device lost the expected packet.
By setting DFLUSH controller clears DRDY bit and stop DMA transfer.
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
cc: <stable@vger.kernel.org>
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://lore.kernel.org/r/20221115100039.441295-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mika writes:
"thunderbolt: Fixes for v6.2-rc5
This includes fixes for:
- on-board retimer scan return value
- runtime PM during tb_retimer_scan()
- USB3 link rate calculation
- XDomain lane bonding.
All these have been in linux-next with no reported issues."
* tag 'thunderbolt-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Disable XDomain lane 1 only in software connection manager
thunderbolt: Use correct function to calculate maximum USB3 link rate
thunderbolt: Do not call PM runtime functions in tb_retimer_scan()
thunderbolt: Do not report errors if on-board retimers are found
This partially reverts commit f6d910a89a ("HID: usbhid: Add ALWAYS_POLL quirk
for some mice"), as it turns out to break reboot on some platforms for reason
yet to be understood.
Fixes: f6d910a89a ("HID: usbhid: Add ALWAYS_POLL quirk for some mice")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
In a number of cases the driver assigns a default value of -1 to
priv->plat->phy_addr. This may result in calling mdiobus_get_phy()
with addr parameter being -1. Therefore check for this scenario and
bail out before calling mdiobus_get_phy().
Fixes: 42e87024f7 ("net: stmmac: Fix case when PHY handle is not present")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/669f9671-ecd1-a41b-2727-7b73e3003985@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The Acer Aspire 4810T predates Windows 8, so it defaults to using
acpi_video# for backlight control, but this is non functional on
this model.
Add a DMI quirk to use the native backlight interface which does
work properly.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a check for empty report_list in bigben_probe().
The missing check causes a type confusion when issuing a list_entry()
on an empty report_list.
The problem is caused by the assumption that the device must
have valid report_list. While this will be true for all normal HID
devices, a suitably malicious device can violate the assumption.
Fixes: 256a90ed9e ("HID: hid-bigbenff: driver for BigBen Interactive PS3OFMINIPAD gamepad")
Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Add a check for empty report_list in hid_validate_values().
The missing check causes a type confusion when issuing a list_entry()
on an empty report_list.
The problem is caused by the assumption that the device must
have valid report_list. While this will be true for all normal HID
devices, a suitably malicious device can violate the assumption.
Fixes: 1b15d2e5b8 ("HID: core: fix validation of report id 0")
Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Packet len computed as difference of length word extracted from
skb data and four may result in a negative value. In such case
processing of the buffer should be interrupted rather than
setting sr_skb->len to an unexpectedly large value (due to cast
from signed to unsigned integer) and passing sr_skb to
usbnet_skb_return.
Fixes: e9da0b56fe ("sr9700: sanity check for packet length")
Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Link: https://lore.kernel.org/r/20230114182326.30479-1-szymon.heidrich@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When exception is triggered, code flow go handle_\exception in some
cases. One of stackframe in this case as follows,
high -> +-------+
| REGS | <- a pt_regs
| |
| | <- ex trigger
| REGS | <- ex pt_regs <-+
| | |
| | |
low -> +-------+ ->unwind-+
When unwinder unwinds to handler_\exception it cannot go on prologue
analysis. Because it is an asynchronous code flow, we should get the
next frame PC from regs->csr_era rather than regs->regs[1]. At init time
we copy the handlers to eentry and also copy them to NUMA-affine memory
named pcpu_handlers if NUMA is enabled. Thus, unwinder cannot unwind
normally. To solve this, we try to give some hints in handler_\exception
and fixup unwinders in unwind_next_frame().
Reported-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The prolugue unwinder rely on symbol info. When PC is not in kernel text
address, it cannot find relative symbol info and it will be broken. The
guess unwinder will be used in this case. And the guess unwinder code in
prolugue unwinder is redundant. Strip it out and set the unwinder type
in unwind_state. Make guess_unwinder::unwind_next_frame() as default way
when other unwinders cannot unwind in some extreme case.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The stack frame when function_graph enable like follows,
--------- <- function sp_on_entry
|
|
|
FAKE_RA <- sp_on_entry - sizeof(pt_regs) + PT_R1
|
--------- <- sp_on_entry - sizeof(pt_regs)
So if we want to get the &FAKE_RA we should get sp_on_entry first. In
the unwinder_prologue case, we can get the sp_on_entry as state->sp,
because we try to calculate each CFA and the ra saved address. But in
the unwinder_guess case, we cannot get it because we do not try to
calculate the CFA. Although LoongArch have not fixed frame, the $ra is
saved at CFA - 8 in most cases, we can try guess, too. As we store the
pc in state, we not need to dereference state->sp, too.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
At unwind_start(), it is better to get its frame info here rather than
get them outside, even we don't have 'regs'. In this way we can simply
use unwind_{start, next_frame, done} outside.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When state->first is not set, the PC is a return address in the previous
frame. We need to adjust its value in case overflow to the next symbol.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
There exists a common function sign_extend64() to sign extend a 64-bit
value using specified bit as sign-bit in include/linux/bitops.h, it is
more efficient, let us use it and remove the arch-specific sign_extend()
under arch/loongarch.
Suggested-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Pull misc hotfixes from Andrew Morton:
"21 hotfixes. Thirteen of these address pre-6.1 issues and hence have
the cc:stable tag"
* tag 'mm-hotfixes-stable-2023-01-16-15-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
init/Kconfig: fix typo (usafe -> unsafe)
nommu: fix split_vma() map_count error
nommu: fix do_munmap() error path
nommu: fix memory leak in do_mmap() error path
MAINTAINERS: update Robert Foss' email address
proc: fix PIE proc-empty-vm, proc-pid-vm tests
mm: update mmap_sem comments to refer to mmap_lock
include/linux/mm: fix release_pages_arg kernel doc comment
lib/win_minmax: use /* notation for regular comments
kasan: mark kasan_kunit_executing as static
nilfs2: fix general protection fault in nilfs_btree_insert()
Docs/admin-guide/mm/zswap: remove zsmalloc's lack of writeback warning
mm/hugetlb: pre-allocate pgtable pages for uffd wr-protects
hugetlb: unshare some PMDs when splitting VMAs
mm: fix vma->anon_name memory leak for anonymous shmem VMAs
mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()
...
fscrypt.git is being renamed to linux.git, so update MAINTAINERS
accordingly. (The reasons for the rename are to match what I'm doing
for the new fsverity repo, which also involves the branch names changing
to be clearer; and to avoid ambiguity with userspace tools.)
As long as I'm updating the fscrypt MAINTAINERS entry anyway, also:
- Move my name to the top, so that people bother me first if they just
choose the first person. (In practice I'm the primary maintainer, and
Ted and Jaegeuk are backups.)
- Remove an unnecessary wildcard.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230116233424.65657-1-ebiggers@kernel.org
David reported that the recent PCI/MSI rework results in MSI descriptor
leakage under XEN.
This is caused by:
1) The missing MSI_FLAG_FREE_MSI_DESCS flag in the XEN MSI domain info,
which is required now that PCI/MSI delegates descriptor freeing to
the core MSI code.
2) Not disassociating the interrupts on teardown, by setting the
msi_desc::irq to 0. This was not required before because the teardown
was unconditional and did not check whether a MSI descriptor was still
connected to a Linux interrupt.
On further inspection it came to light that the MSI_FLAG_DEV_SYSFS is
missing in the XEN MSI domain info as well to restore the pre 6.2 status
quo.
Add the missing MSI flags and disassociate the MSI descriptor from the
Linux interrupt in the XEN specific teardown function.
Fixes: b2bdda205c ("PCI/MSI: Let the MSI core free descriptors")
Fixes: 2f2940d168 ("genirq/msi: Remove filter from msi_free_descs_free_range()")
Fixes: ffd84485e6 ("PCI/MSI: Let the irq code handle sysfs groups")
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/871qnunycr.ffs@tglx
The Xen MSI → PIRQ magic does support MSI-X, so advertise it.
(In fact it's better off with MSI-X than MSI, because it's actually
broken by design for 32-bit MSI, since it puts the high bits of the
PIRQ# into the high 32 bits of the MSI message address, instead of the
Extended Destination ID field which is in bits 4-11.
Strictly speaking, this really fixes a much older commit 2e4386eba0
("x86/xen: Wrap XEN MSI management into irqdomain") which failed to set
the flag. But that never really mattered until __pci_enable_msix_range()
started to check and bail out early. So in 6.2-rc we see failures e.g.
to bring up networking on an Amazon EC2 m4.16xlarge instance:
[ 41.498694] ena 0000:00:03.0 (unnamed net_device) (uninitialized): Failed to enable MSI-X. irq_cnt -524
[ 41.498705] ena 0000:00:03.0: Can not reserve msix vectors
[ 41.498712] ena 0000:00:03.0: Failed to enable and set the admin interrupts
Side note: This is the first bug found, and first patch tested, by running
Xen guests under QEMU/KVM instead of running under actual Xen.
Fixes: 99f3d27976 ("PCI/MSI: Reject MSI-X early")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/4bffa69a949bfdc92c4a18e5a1c3cbb3b94a0d32.camel@infradead.org
If we have one task trying to start the quota rescan worker while another
one is trying to disable quotas, we can end up hitting a race that results
in the quota rescan worker doing a NULL pointer dereference. The steps for
this are the following:
1) Quotas are enabled;
2) Task A calls the quota rescan ioctl and enters btrfs_qgroup_rescan().
It calls qgroup_rescan_init() which returns 0 (success) and then joins a
transaction and commits it;
3) Task B calls the quota disable ioctl and enters btrfs_quota_disable().
It clears the bit BTRFS_FS_QUOTA_ENABLED from fs_info->flags and calls
btrfs_qgroup_wait_for_completion(), which returns immediately since the
rescan worker is not yet running.
Then it starts a transaction and locks fs_info->qgroup_ioctl_lock;
4) Task A queues the rescan worker, by calling btrfs_queue_work();
5) The rescan worker starts, and calls rescan_should_stop() at the start
of its while loop, which results in 0 iterations of the loop, since
the flag BTRFS_FS_QUOTA_ENABLED was cleared from fs_info->flags by
task B at step 3);
6) Task B sets fs_info->quota_root to NULL;
7) The rescan worker tries to start a transaction and uses
fs_info->quota_root as the root argument for btrfs_start_transaction().
This results in a NULL pointer dereference down the call chain of
btrfs_start_transaction(). The stack trace is something like the one
reported in Link tag below:
general protection fault, probably for non-canonical address 0xdffffc0000000041: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000208-0x000000000000020f]
CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.1.0-syzkaller-13872-gb6bb9676f216 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: btrfs-qgroup-rescan btrfs_work_helper
RIP: 0010:start_transaction+0x48/0x10f0 fs/btrfs/transaction.c:564
Code: 48 89 fb 48 (...)
RSP: 0018:ffffc90000ab7ab0 EFLAGS: 00010206
RAX: 0000000000000041 RBX: 0000000000000208 RCX: ffff88801779ba80
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: dffffc0000000000 R08: 0000000000000001 R09: fffff52000156f5d
R10: fffff52000156f5d R11: 1ffff92000156f5c R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2bea75b718 CR3: 000000001d0cc000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
btrfs_qgroup_rescan_worker+0x3bb/0x6a0 fs/btrfs/qgroup.c:3402
btrfs_work_helper+0x312/0x850 fs/btrfs/async-thread.c:280
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
Modules linked in:
So fix this by having the rescan worker function not attempt to start a
transaction if it didn't do any rescan work.
Reported-by: syzbot+96977faa68092ad382c4@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/000000000000e5454b05f065a803@google.com/
Fixes: e804861bd4 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
During lseek, for SEEK_DATA and SEEK_HOLE modes, we access the disk_bytenr
of an extent without checking its type. However inline extents have their
data starting the offset of the disk_bytenr field, so accessing that field
when we have an inline extent can result in either of the following:
1) Interpret the inline extent's data as a disk_bytenr value;
2) In case the inline data is less than 8 bytes, we access part of some
other item in the leaf, or unused space in the leaf;
3) In case the inline data is less than 8 bytes and the extent item is
the first item in the leaf, we can access beyond the leaf's limit.
So fix this by not accessing the disk_bytenr field if we have an inline
extent.
Fixes: b6e833567e ("btrfs: make hole and data seeking a lot more efficient")
Reported-by: Matthias Schoepfer <matthias.schoepfer@googlemail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216908
Link: https://lore.kernel.org/linux-btrfs/7f25442f-b121-2a3a-5a3d-22bcaae83cd4@leemhuis.info/
CC: stable@vger.kernel.org # 6.1
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
write_one_page is an awkward interface that expects the page locked and
->writepage to be implemented. Replace that by zeroing the signature
bytes and synchronize the block device page using the proper bdev
helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_scratch_superblocks open codes scratching super block of a
non-zoned super block. Split the code to read, zero and write the
superblock for regular devices into a separate helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Pull btrfs fixes from David Sterba:
"Another batch of fixes, dealing with fallouts from 6.1 reported by
users:
- tree-log fixes:
- fix directory logging due to race with concurrent index key
deletion
- fix missing error handling when logging directory items
- handle case of conflicting inodes being added to the log
- remove transaction aborts for not so serious errors
- fix qgroup accounting warning when rescan can be started at time
with temporarily disable accounting
- print more specific errors to system log when device scan ioctl
fails
- disable space overcommit for ZNS devices, causing heavy performance
drop"
* tag 'for-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: do not abort transaction on failure to update log root
btrfs: do not abort transaction on failure to write log tree when syncing log
btrfs: add missing setup of log for full commit at add_conflicting_inode()
btrfs: fix directory logging due to race with concurrent index key deletion
btrfs: fix missing error handling when logging directory items
btrfs: zoned: enable metadata over-commit for non-ZNS setup
btrfs: qgroup: do not warn on record without old_roots populated
btrfs: add extra error messages to cover non-ENOMEM errors from device_add_list()
Baoquan reported that after triggering a crash the subsequent crash-kernel
fails to boot about half of the time. It triggers a NULL pointer
dereference in the periodic tick code.
This happens because the legacy timer interrupt (IRQ0) is resent in
software which happens in soft interrupt (tasklet) context. In this context
get_irq_regs() returns NULL which leads to the NULL pointer dereference.
The reason for the resend is a spurious APIC interrupt on the IRQ0 vector
which is captured and leads to a resend when the legacy timer interrupt is
enabled. This is wrong because the legacy PIC interrupts are level
triggered and therefore should never be resent in software, but nothing
ever sets the IRQ_LEVEL flag on those interrupts, so the core code does not
know about their trigger type.
Ensure that IRQ_LEVEL is set when the legacy PCI interrupts are set up.
Fixes: a4633adcdb ("[PATCH] genirq: add genirq sw IRQ-retrigger")
Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/87mt6rjrra.ffs@tglx
The last_pg is wrong, it is actually the first page of the last
scatterlist element. To get the last page of the last scatterlist element
we have to add prv->length. So it is checking mergability against the
wrong page, Further, a SG element is not guaranteed to end on a page
boundary, so we have to check the sub page location also for merge
eligibility.
Fix the above by checking physical contiguity based on PFNs, compute the
actual last page and then call pages_are_mergable().
Fixes: 1567b49d1a ("lib/scatterlist: add check when merging zone device pages")
Link: https://lore.kernel.org/r/20230111101054.188136-1-yishaih@nvidia.com
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The revert of the removal of this driver happened after we fixed up
the split limits for NOWAIT issue, hence it got missed. Ensure that
we check for a NULL bio after splitting, in case it should be retried.
Marking this as fixing both commits, so that stable backport will do
this correctly.
Cc: stable@vger.kernel.org
Fixes: 9cea62b2cb ("block: don't allow splitting of a REQ_NOWAIT bio")
Fixes: 4b83e99ee7 ("Revert "pktcdvd: remove driver."")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Several mutexes are taken while setting up console serial ports. In
particular, the tty_port->mutex and @console_mutex are taken:
serial_pnp_probe
serial8250_register_8250_port
uart_add_one_port (locks tty_port->mutex)
uart_configure_port
register_console (locks @console_mutex)
In order to synchronize kgdb's tty_find_polling_driver() with
register_console(), commit 6193bc9084 ("tty: serial: kgdboc:
synchronize tty_find_polling_driver() and register_console()") takes
the @console_mutex. However, this leads to the following call chain
(with locking):
platform_probe
kgdboc_probe
configure_kgdboc (locks @console_mutex)
tty_find_polling_driver
uart_poll_init (locks tty_port->mutex)
uart_set_options
This is clearly deadlock potential due to the reverse lock ordering.
Since uart_set_options() requires holding @console_mutex in order to
serialize early initialization of the serial-console lock, take the
@console_mutex in uart_poll_init() instead of configure_kgdboc().
Since configure_kgdboc() was using @console_mutex for safe traversal
of the console list, change it to use the SRCU iterator instead.
Add comments to uart_set_options() kerneldoc mentioning that it
requires holding @console_mutex (aka the console_list_lock).
Fixes: 6193bc9084 ("tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console()")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
[pmladek@suse.com: Export console_srcu_read_lock_is_held() to fix build kgdboc as a module.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230112161213.1434854-1-john.ogness@linutronix.de
The EFI runtime services run from a dedicated stack now, and so the
stack unwinder needs to be informed about this.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Comparing current_work() against efi_rts_work.work is sufficient to
decide whether current is currently running EFI runtime services code at
any level in its call stack.
However, there are other potential users of the EFI runtime stack, such
as the ACPI subsystem, which may invoke efi_call_virt_pointer()
directly, and so any sync exceptions occurring in firmware during those
calls are currently misidentified.
So instead, let's check whether the stashed value of the thread stack
pointer points into current's thread stack. This can only be the case if
current was interrupted while running EFI runtime code. Note that this
implies that we should clear the stashed value after switching back, to
avoid false positives.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cong Wang says:
====================
l2tp: fix race conditions in l2tp_tunnel_register()
This patchset contains two patches, the first one is a preparation for
the second one which is the actual fix. Please find more details in
each patch description.
I have ran the l2tp test (https://github.com/katalix/l2tp-ktest),
all test cases are passed.
v3: preserve EEXIST errno for user-space
v2: move IDR allocation to l2tp_tunnel_register()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp uses l2tp_tunnel_list to track all registered tunnels and
to allocate tunnel ID's. IDR can do the same job.
More importantly, with IDR we can hold the ID before a successful
registration so that we don't need to worry about late error
handling, it is not easy to rollback socket changes.
This is a preparation for the following fix.
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Guillaume Nault <gnault@redhat.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported a nasty crash [1] in net_tx_action() which
made little sense until we got a repro.
This repro installs a taprio qdisc, but providing an
invalid TCA_RATE attribute.
qdisc_create() has to destroy the just initialized
taprio qdisc, and taprio_destroy() is called.
However, the hrtimer used by taprio had already fired,
therefore advance_sched() called __netif_schedule().
Then net_tx_action was trying to use a destroyed qdisc.
We can not undo the __netif_schedule(), so we must wait
until one cpu serviced the qdisc before we can proceed.
Many thanks to Alexander Potapenko for his help.
[1]
BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
__raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
_raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
spin_trylock include/linux/spinlock.h:359 [inline]
qdisc_run_begin include/net/sch_generic.h:187 [inline]
qdisc_run+0xee/0x540 include/net/pkt_sched.h:125
net_tx_action+0x77c/0x9a0 net/core/dev.c:5086
__do_softirq+0x1cc/0x7fb kernel/softirq.c:571
run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934
smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164
kthread+0x31b/0x430 kernel/kthread.c:376
ret_from_fork+0x1f/0x30
Uninit was created at:
slab_post_alloc_hook mm/slab.h:732 [inline]
slab_alloc_node mm/slub.c:3258 [inline]
__kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970
kmalloc_reserve net/core/skbuff.c:358 [inline]
__alloc_skb+0x346/0xcf0 net/core/skbuff.c:430
alloc_skb include/linux/skbuff.h:1257 [inline]
nlmsg_new include/net/netlink.h:953 [inline]
netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436
netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507
rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
____sys_sendmsg+0xabc/0xe90 net/socket.c:2482
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536
__sys_sendmsg net/socket.c:2565 [inline]
__do_sys_sendmsg net/socket.c:2574 [inline]
__se_sys_sendmsg net/socket.c:2572 [inline]
__x64_sys_sendmsg+0x367/0x540 net/socket.c:2572
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pable Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for net:
1) Increase timeout to 120 seconds for netfilter selftests to fix
nftables transaction tests, from Florian Westphal.
2) Fix overflow in bitmap_ip_create() due to integer arithmetics
in a 64-bit bitmask, from Gavrilov Ilia.
3) Fix incorrect arithmetics in nft_payload with double-tagged
vlan matching.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct queue statistics reading. All queue statistics are stored as unsigned
long values. The retrieval for ethtool fetches these values as u64. However, on
some systems the size of the counters are 32 bit. That yields wrong queue
statistic counters e.g., on arm32 systems such as the stm32mp157. Fix it by
using the correct data type.
Tested on Olimex STMP157-OLinuXino-LIME2 by simple running linuxptp for a short
period of time:
Non-patched kernel:
|root@st1:~# ethtool -S eth0 | grep q0
| q0_tx_pkt_n: 3775276254951 # ???
| q0_tx_irq_n: 879
| q0_rx_pkt_n: 1194000908909 # ???
| q0_rx_irq_n: 278
Patched kernel:
|root@st1:~# ethtool -S eth0 | grep q0
| q0_tx_pkt_n: 2434
| q0_tx_irq_n: 1274
| q0_rx_pkt_n: 1604
| q0_rx_irq_n: 846
Fixes: 68e9c5dee1 ("net: stmmac: add ethtool per-queue statistic framework")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Cc: Vijayakannan Ayyathurai <vijayakannan.ayyathurai@intel.com>
Cc: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When reading pinconf-pins from debugfs it fails to get the configured pull
type on RK3568, "unsupported pinctrl type" error messages is also reported.
Fix this by adding support for RK3568 in rockchip_get_pull, including a
reverse of the pull-up value swap applied in rockchip_set_pull so that
pull-up is correctly reported in pinconf-pins.
Also update the workaround comment to reflect affected pins, GPIO0_D3-D6.
Fixes: c0dadc0e47 ("pinctrl: rockchip: add support for rk3568")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Jianqun Xu <jay.xu@rock-chips.com>
Link: https://lore.kernel.org/r/20230110172955.1258840-1-jonas@kwiboo.se
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Since resplen and respoffs are signed integers sufficiently
large values of unsigned int len and offset members of RNDIS
response will result in negative values of prior variables.
This may be utilized to bypass implemented security checks
to either extract memory contents by manipulating offset or
overflow the data buffer via memcpy by manipulating both
offset and len.
Additionally assure that sum of resplen and respoffs does not
overflow so buffer boundaries are kept.
Fixes: 80f8c5b434 ("rndis_wlan: copy only useful data from rndis_command respond")
Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230111175031.7049-1-szymon.heidrich@gmail.com
An issue was reported in which periodically error messages are
printed in the kernel log:
[ 26.303445] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6
[ 26.303554] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43455-sdio.raspberrypi,3-model-b-plus.bin failed with error -2
[ 26.516752] brcmfmac_wcc: brcmf_wcc_attach: executing
[ 26.528264] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6 wl0: Jan 4 2021 19:56:29 version 7.45.229 (617f1f5 CY) FWID 01-2dbd9d2e
[ 27.076829] Bluetooth: hci0: BCM: features 0x2f
[ 27.078592] Bluetooth: hci0: BCM43455 37.4MHz Raspberry Pi 3+
[ 27.078601] Bluetooth: hci0: BCM4345C0 (003.001.025) build 0342
[ 30.142104] Adding 102396k swap on /var/swap. Priority:-2 extents:1 across:102396k SS
[ 30.590017] Bluetooth: MGMT ver 1.22
[ 104.897615] brcmfmac: cfg80211_set_channel: set chanspec 0x100e fail, reason -52
[ 104.897992] brcmfmac: cfg80211_set_channel: set chanspec 0xd022 fail, reason -52
[ 105.007672] brcmfmac: cfg80211_set_channel: set chanspec 0xd026 fail, reason -52
[ 105.117654] brcmfmac: cfg80211_set_channel: set chanspec 0xd02a fail, reason -52
[ 105.227636] brcmfmac: cfg80211_set_channel: set chanspec 0xd02e fail, reason -52
[ 106.987552] brcmfmac: cfg80211_set_channel: set chanspec 0xd090 fail, reason -52
[ 106.987911] brcmfmac: cfg80211_set_channel: set chanspec 0xd095 fail, reason -52
[ 106.988233] brcmfmac: cfg80211_set_channel: set chanspec 0xd099 fail, reason -52
[ 106.988565] brcmfmac: cfg80211_set_channel: set chanspec 0xd09d fail, reason -52
[ 106.988909] brcmfmac: cfg80211_set_channel: set chanspec 0xd0a1 fail, reason -52
This happens in brcmf_cfg80211_dump_survey() because we try a disabled
channel. When channel is marked as disabled we do not need to fill any
other info so bail out.
Fixes: 6c04deae14 ("brcmfmac: Add dump_survey cfg80211 ops for HostApd AutoChannelSelection")
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230103124117.271988-2-arend.vanspriel@broadcom.com
mvebu fixes for 6.2 (part 1)
Fix regression for gpio support on Armada 38x and Armada 38x
Fix address for UART1 on AC5/AC5X
* tag 'mvebu-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
arm64: dts: marvell: AC5/AC5X: Fix address for UART1
Revert "ARM: dts: armada-39x: Fix compatible string for gpios"
Revert "ARM: dts: armada-38x: Fix compatible string for gpios"
Link: https://lore.kernel.org/r/87mt6mg08k.fsf@BL-laptop
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Always configure GPIO pins which are used as interrupt source as INPUTs.
In case the default pin configuration is OUTPUT, or the prior stage does
configure the pins as OUTPUT, then Linux will not reconfigure the pin as
INPUT and no interrupts are received.
Always configure the interrupt source GPIO pin as input to fix the above case.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Fixes: 07bd1a6cc7 ("MXC arch: Add gpio support for the whole platform")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
The driver currently performs register read-modify-write without locking
in its irqchip part, this could lead to a race condition when configuring
interrupt mode setting. Add the missing bgpio spinlock lock/unlock around
the register read-modify-write.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Fixes: 07bd1a6cc7 ("MXC arch: Add gpio support for the whole platform")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Once disable_freq_invariance_work is called the scale_freq_tick function
will not compute or update the arch_freq_scale values.
However the scheduler will still read these values and use them.
The result is that the scheduler might perform unfair decisions based on stale
values.
This patch adds the step of setting the arch_freq_scale values for all
cpus to the default (max) value SCHED_CAPACITY_SCALE, Once all cpus
have the same arch_freq_scale value the scaling is meaningless.
Signed-off-by: Yair Podemsky <ypodemsk@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230110160206.75912-1-ypodemsk@redhat.com
The kernel commit 9a5418bc48 ("sched/core: Use kfree_rcu() in
do_set_cpus_allowed()") introduces a bug for kernels built with non-SMP
configs. Calling sched_setaffinity() on such a uniprocessor kernel will
cause cpumask_copy() to be called with a NULL pointer leading to general
protection fault. This is not really a problem in real use cases as
there aren't that many uniprocessor kernel configs in use and calling
sched_setaffinity() on such a uniprocessor system doesn't make sense.
Fix this problem by making sure cpumask_copy() will not be called in
such a case.
Fixes: 9a5418bc48 ("sched/core: Use kfree_rcu() in do_set_cpus_allowed()")
Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230115193122.563036-1-longman@redhat.com
strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.
In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.
While at it, include the corresponding header file (<linux/kstrtox.h>)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Helge Deller <deller@gmx.de>
The updating of 'bfqg->ref' should be protected by 'bfqd->lock', however,
during code review, we found that bfq_pd_free() update 'bfqg->ref'
without holding the lock, which is problematic:
1) bfq_pd_free() triggered by removing cgroup is called asynchronously;
2) bfqq will grab bfqg reference, and exit bfqq will drop the reference,
which can concurrent with 1).
Unfortunately, 'bfqd->lock' can't be held here because 'bfqd' might already
be freed in bfq_pd_free(). Fix the problem by using atomic refcount apis.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230103084755.1256479-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Using REQ_OP_ZONE_APPEND operations for synchronous writes to sequential
files succeeds regardless of the zone write pointer position, as long as
the target zone is not full. This means that if an external (buggy)
application writes to the zone of a sequential file underneath the file
system, subsequent file write() operation will succeed but the file size
will not be correct and the file will contain invalid data written by
another application.
Modify zonefs_file_dio_append() to check the written sector of an append
write (returned in bio->bi_iter.bi_sector) and return -EIO if there is a
mismatch with the file zone wp offset field. This change triggers a call
to zonefs_io_error() and a zone check. Modify zonefs_io_error_cb() to
not expose the unexpected data after the current inode size when the
errors=remount-ro mode is used. Other error modes are correctly handled
already.
Fixes: 02ef12a663 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Peek at old qdisc and graft only when deleting a leaf class in the htb,
rather than when deleting the htb itself. Do not peek at the qdisc of the
netdev queue when destroying the htb. The caller may already have grafted a
new qdisc that is not part of the htb structure being destroyed.
This fix resolves two use cases.
1. Using tc to destroy the htb.
- Netdev was being prematurely activated before the htb was fully
destroyed.
2. Using tc to replace the htb with another qdisc (which also leads to
the htb being destroyed).
- Premature netdev activation like previous case. Newly grafted qdisc
was also getting accidentally overwritten when destroying the htb.
Fixes: d03b195b5a ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230113005528.302625-1-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: userspace pm: create sockets for the right family
Before these patches, the Userspace Path Manager would allow the
creation of subflows with wrong families: taking the one of the MPTCP
socket instead of the provided ones and resulting in the creation of
subflows with likely not the right source and/or destination IPs. It
would also allow the creation of subflows between different families or
not respecting v4/v6-only socket attributes.
Patch 1 lets the userspace PM select the proper family to avoid creating
subflows with the wrong source and/or destination addresses because the
family is not the expected one.
Patch 2 makes sure the userspace PM doesn't allow the userspace to
create subflows for a family that is not allowed.
Patch 3 validates scenarios with a mix of v4 and v6 subflows for the
same MPTCP connection.
These patches fix issues introduced in v5.19 when the userspace path
manager has been introduced.
====================
Link: https://lore.kernel.org/r/20230112-upstream-net-20230112-netlink-v4-v6-v1-0-6a8363a221d2@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MPTCP protocol supports having subflows in both IPv4 and IPv6. In Linux,
it is possible to have that if the MPTCP socket has been created with
AF_INET6 family without the IPV6_V6ONLY option.
Here, a new IPv4 subflow is being added to the initial IPv6 connection,
then being removed using Netlink commands.
Cc: stable@vger.kernel.org # v5.19+
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If an MPTCP socket has been created with AF_INET6 and the IPV6_V6ONLY
option has been set, the userspace PM would allow creating subflows
using IPv4 addresses, e.g. mapped in v6.
The kernel side of userspace PM will also accept creating subflows with
local and remote addresses having different families. Depending on the
subflow socket's family, different behaviours are expected:
- If AF_INET is forced with a v6 address, the kernel will take the last
byte of the IP and try to connect to that: a new subflow is created
but to a non expected address.
- If AF_INET6 is forced with a v4 address, the kernel will try to
connect to a v4 address (v4-mapped-v6). A -EBADF error from the
connect() part is then expected.
It is then required to check the given families can be accepted. This is
done by using a new helper for addresses family matching, taking care of
IPv4 vs IPv4-mapped-IPv6 addresses. This helper will be re-used later by
the in-kernel path-manager to use mixed IPv4 and IPv6 addresses.
While at it, a clear error message is now reported if there are some
conflicts with the families that have been passed by the userspace.
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Let the caller specify the to-be-created subflow family.
For a given MPTCP socket created with the AF_INET6 family, the current
userspace PM can already ask the kernel to create subflows in v4 and v6.
If "plain" IPv4 addresses are passed to the kernel, they are
automatically mapped in v6 addresses "by accident". This can be
problematic because the userspace will need to pass different addresses,
now the v4-mapped-v6 addresses to destroy this new subflow.
On the other hand, if the MPTCP socket has been created with the AF_INET
family, the command to create a subflow in v6 will be accepted but the
result will not be the one as expected as new subflow will be created in
IPv4 using part of the v6 addresses passed to the kernel: not creating
the expected subflow then.
No functional change intended for the in-kernel PM where an explicit
enforcement is currently in place. This arbitrary enforcement will be
leveraged by other patches in a future version.
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Cc: stable@vger.kernel.org
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This lockdep splat says it better than I could:
================================
WARNING: inconsistent lock state
6.2.0-rc2-07010-ga9b9500ffaac-dirty #967 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/1:3/179 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff3ec4036ce098 (_xmit_ETHER#2){+.?.}-{3:3}, at: netif_freeze_queues+0x5c/0xc0
{IN-SOFTIRQ-W} state was registered at:
_raw_spin_lock+0x5c/0xc0
sch_direct_xmit+0x148/0x37c
__dev_queue_xmit+0x528/0x111c
ip6_finish_output2+0x5ec/0xb7c
ip6_finish_output+0x240/0x3f0
ip6_output+0x78/0x360
ndisc_send_skb+0x33c/0x85c
ndisc_send_rs+0x54/0x12c
addrconf_rs_timer+0x154/0x260
call_timer_fn+0xb8/0x3a0
__run_timers.part.0+0x214/0x26c
run_timer_softirq+0x3c/0x74
__do_softirq+0x14c/0x5d8
____do_softirq+0x10/0x20
call_on_irq_stack+0x2c/0x5c
do_softirq_own_stack+0x1c/0x30
__irq_exit_rcu+0x168/0x1a0
irq_exit_rcu+0x10/0x40
el1_interrupt+0x38/0x64
irq event stamp: 7825
hardirqs last enabled at (7825): [<ffffdf1f7200cae4>] exit_to_kernel_mode+0x34/0x130
hardirqs last disabled at (7823): [<ffffdf1f708105f0>] __do_softirq+0x550/0x5d8
softirqs last enabled at (7824): [<ffffdf1f7081050c>] __do_softirq+0x46c/0x5d8
softirqs last disabled at (7811): [<ffffdf1f708166e0>] ____do_softirq+0x10/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(_xmit_ETHER#2);
<Interrupt>
lock(_xmit_ETHER#2);
*** DEADLOCK ***
3 locks held by kworker/1:3/179:
#0: ffff3ec400004748 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
#1: ffff80000a0bbdc8 ((work_completion)(&priv->tx_onestep_tstamp)){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
#2: ffff3ec4036cd438 (&dev->tx_global_lock){+.+.}-{3:3}, at: netif_tx_lock+0x1c/0x34
Workqueue: events enetc_tx_onestep_tstamp
Call trace:
print_usage_bug.part.0+0x208/0x22c
mark_lock+0x7f0/0x8b0
__lock_acquire+0x7c4/0x1ce0
lock_acquire.part.0+0xe0/0x220
lock_acquire+0x68/0x84
_raw_spin_lock+0x5c/0xc0
netif_freeze_queues+0x5c/0xc0
netif_tx_lock+0x24/0x34
enetc_tx_onestep_tstamp+0x20/0x100
process_one_work+0x28c/0x6c0
worker_thread+0x74/0x450
kthread+0x118/0x11c
but I'll say it anyway: the enetc_tx_onestep_tstamp() work item runs in
process context, therefore with softirqs enabled (i.o.w., it can be
interrupted by a softirq). If we hold the netif_tx_lock() when there is
an interrupt, and the NET_TX softirq then gets scheduled, this will take
the netif_tx_lock() a second time and deadlock the kernel.
To solve this, use netif_tx_lock_bh(), which blocks softirqs from
running.
Fixes: 7294380c52 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230112105440.1786799-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If uhdlc_priv_tsa != 1 then utdm is not initialized.
And if ret != NULL then goto undo_uhdlc_init, where
utdm is dereferenced. Same if dev == NULL.
Found by Astra Linux on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: 8d68100ab4 ("soc/fsl/qe: fix err handling of ucc_of_parse_tdm")
Signed-off-by: Esina Ekaterina <eesina@astralinux.ru>
Link: https://lore.kernel.org/r/20230112074703.13558-1-eesina@astralinux.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a use-after-free that occurs in kfree_skb() called from
local_cleanup(). This could happen when killing nfc daemon (e.g. neard)
after detaching an nfc device.
When detaching an nfc device, local_cleanup() called from
nfc_llcp_unregister_device() frees local->rx_pending and decreases
local->ref by kref_put() in nfc_llcp_local_put().
In the terminating process, nfc daemon releases all sockets and it leads
to decreasing local->ref. After the last release of local->ref,
local_cleanup() called from local_release() frees local->rx_pending
again, which leads to the bug.
Setting local->rx_pending to NULL in local_cleanup() could prevent
use-after-free when local_cleanup() is called twice.
Found by a modified version of syzkaller.
BUG: KASAN: use-after-free in kfree_skb()
Call Trace:
dump_stack_lvl (lib/dump_stack.c:106)
print_address_description.constprop.0.cold (mm/kasan/report.c:306)
kasan_check_range (mm/kasan/generic.c:189)
kfree_skb (net/core/skbuff.c:955)
local_cleanup (net/nfc/llcp_core.c:159)
nfc_llcp_local_put.part.0 (net/nfc/llcp_core.c:172)
nfc_llcp_local_put (net/nfc/llcp_core.c:181)
llcp_sock_destruct (net/nfc/llcp_sock.c:959)
__sk_destruct (net/core/sock.c:2133)
sk_destruct (net/core/sock.c:2181)
__sk_free (net/core/sock.c:2192)
sk_free (net/core/sock.c:2203)
llcp_sock_release (net/nfc/llcp_sock.c:646)
__sock_release (net/socket.c:650)
sock_close (net/socket.c:1365)
__fput (fs/file_table.c:306)
task_work_run (kernel/task_work.c:179)
ptrace_notify (kernel/signal.c:2354)
syscall_exit_to_user_mode_prepare (kernel/entry/common.c:278)
syscall_exit_to_user_mode (kernel/entry/common.c:296)
do_syscall_64 (arch/x86/entry/common.c:86)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:106)
Allocated by task 4719:
kasan_save_stack (mm/kasan/common.c:45)
__kasan_slab_alloc (mm/kasan/common.c:325)
slab_post_alloc_hook (mm/slab.h:766)
kmem_cache_alloc_node (mm/slub.c:3497)
__alloc_skb (net/core/skbuff.c:552)
pn533_recv_response (drivers/nfc/pn533/usb.c:65)
__usb_hcd_giveback_urb (drivers/usb/core/hcd.c:1671)
usb_giveback_urb_bh (drivers/usb/core/hcd.c:1704)
tasklet_action_common.isra.0 (kernel/softirq.c:797)
__do_softirq (kernel/softirq.c:571)
Freed by task 1901:
kasan_save_stack (mm/kasan/common.c:45)
kasan_set_track (mm/kasan/common.c:52)
kasan_save_free_info (mm/kasan/genericdd.c:518)
__kasan_slab_free (mm/kasan/common.c:236)
kmem_cache_free (mm/slub.c:3809)
kfree_skbmem (net/core/skbuff.c:874)
kfree_skb (net/core/skbuff.c:931)
local_cleanup (net/nfc/llcp_core.c:159)
nfc_llcp_unregister_device (net/nfc/llcp_core.c:1617)
nfc_unregister_device (net/nfc/core.c:1179)
pn53x_unregister_nfc (drivers/nfc/pn533/pn533.c:2846)
pn533_usb_disconnect (drivers/nfc/pn533/usb.c:579)
usb_unbind_interface (drivers/usb/core/driver.c:458)
device_release_driver_internal (drivers/base/dd.c:1279)
bus_remove_device (drivers/base/bus.c:529)
device_del (drivers/base/core.c:3665)
usb_disable_device (drivers/usb/core/message.c:1420)
usb_disconnect (drivers/usb/core.c:2261)
hub_event (drivers/usb/core/hub.c:5833)
process_one_work (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:212 include/trace/events/workqueue.h:108 kernel/workqueue.c:2281)
worker_thread (include/linux/list.h:282 kernel/workqueue.c:2423)
kthread (kernel/kthread.c:319)
ret_from_fork (arch/x86/entry/entry_64.S:301)
Fixes: 3536da06db ("NFC: llcp: Clean local timers and works when removing a device")
Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Link: https://lore.kernel.org/r/20230111131914.3338838-1-jisoo.jang@yonsei.ac.kr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To mitigate Spectre v4, 2039f26f3a ("bpf: Fix leakage due to
insufficient speculative store bypass mitigation") inserts lfence
instructions after 1) initializing a stack slot and 2) spilling a
pointer to the stack.
However, this does not cover cases where a stack slot is first
initialized with a pointer (subject to sanitization) but then
overwritten with a scalar (not subject to sanitization because
the slot was already initialized). In this case, the second write
may be subject to speculative store bypass (SSB) creating a
speculative pointer-as-scalar type confusion. This allows the
program to subsequently leak the numerical pointer value using,
for example, a branch-based cache side channel.
To fix this, also sanitize scalars if they write a stack slot
that previously contained a pointer. Assuming that pointer-spills
are only generated by LLVM on register-pressure, the performance
impact on most real-world BPF programs should be small.
The following unprivileged BPF bytecode drafts a minimal exploit
and the mitigation:
[...]
// r6 = 0 or 1 (skalar, unknown user input)
// r7 = accessible ptr for side channel
// r10 = frame pointer (fp), to be leaked
//
r9 = r10 # fp alias to encourage ssb
*(u64 *)(r9 - 8) = r10 // fp[-8] = ptr, to be leaked
// lfence added here because of pointer spill to stack.
//
// Ommitted: Dummy bpf_ringbuf_output() here to train alias predictor
// for no r9-r10 dependency.
//
*(u64 *)(r10 - 8) = r6 // fp[-8] = scalar, overwrites ptr
// 2039f26f3a: no lfence added because stack slot was not STACK_INVALID,
// store may be subject to SSB
//
// fix: also add an lfence when the slot contained a ptr
//
r8 = *(u64 *)(r9 - 8)
// r8 = architecturally a scalar, speculatively a ptr
//
// leak ptr using branch-based cache side channel:
r8 &= 1 // choose bit to leak
if r8 == 0 goto SLOW // no mispredict
// architecturally dead code if input r6 is 0,
// only executes speculatively iff ptr bit is 1
r8 = *(u64 *)(r7 + 0) # encode bit in cache (0: slow, 1: fast)
SLOW:
[...]
After running this, the program can time the access to *(r7 + 0) to
determine whether the chosen pointer bit was 0 or 1. Repeat this 64
times to recover the whole address on amd64.
In summary, sanitization can only be skipped if one scalar is
overwritten with another scalar. Scalar-confusion due to speculative
store bypass can not lead to invalid accesses because the pointer
bounds deducted during verification are enforced using branchless
logic. See 979d63d50c ("bpf: prevent out of bounds speculation on
pointer arithmetic") for details.
Do not make the mitigation depend on !env->allow_{uninit_stack,ptr_leaks}
because speculative leaks are likely unexpected if these were enabled.
For example, leaking the address to a protected log file may be acceptable
while disabling the mitigation might unintentionally leak the address
into the cached-state of a map that is accessible to unprivileged
processes.
Fixes: 2039f26f3a ("bpf: Fix leakage due to insufficient speculative store bypass mitigation")
Signed-off-by: Luis Gerhorst <gerhorst@cs.fau.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Henriette Hofmeier <henriette.hofmeier@rub.de>
Link: https://lore.kernel.org/bpf/edc95bad-aada-9cfc-ffe2-fa9bb206583c@cs.fau.de
Link: https://lore.kernel.org/bpf/20230109150544.41465-1-gerhorst@cs.fau.de
Now that the SRCU Kconfig option is unconditionally selected, there is
no longer any point in selecting it. Therefore, remove the "select SRCU"
Kconfig statements.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
65us is not a reasonable maximum for this property, as some devices
might need a much longer setup time (e.g. those driven by firmware on
the other end). Plus, device tree property values are in 32-bit cells
and smaller widths should not be used without good reason.
Also move the logic to a helper function, since this will later be used
to parse other CS delay properties too.
Fixes: 33a2fde5f7 ("spi: Introduce spi-cs-setup-ns property")
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Hector Martin <marcan@marcan.st>
Link: https://lore.kernel.org/r/20230113102309.18308-2-marcan@marcan.st
Signed-off-by: Mark Brown <broonie@kernel.org>
Currently qconf-cfg.sh is the only script that touches the "-bin"
target, even though all of the conf_cfg rules declare that they do.
Make the recipe unconditionally touch all declared targets to avoid
incompatibilities with upcoming versions of GNU make:
https://lists.gnu.org/archive/html/info-gnu/2022-10/msg00008.html
e.g.
scripts/kconfig/Makefile:215: warning: pattern recipe did not update peer target 'scripts/kconfig/nconf-bin'.
scripts/kconfig/Makefile:215: warning: pattern recipe did not update peer target 'scripts/kconfig/mconf-bin'.
scripts/kconfig/Makefile:215: warning: pattern recipe did not update peer target 'scripts/kconfig/gconf-bin'.
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Current code for RSS_GET ethtool command includes netlink attributes
in reply message to user space even if they are null. Added checks
to include netlink attribute in reply message only if a value is
received from driver. Drivers might return null for RSS indirection
table or hash key. Instead of including attributes with empty value
in the reply message, add netlink attribute only if there is content.
Fixes: 7112a04664 ("ethtool: add netlink based get rss support")
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/20230111235607.85509-1-sudheer.mogilappagari@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Raju Rangoju says:
====================
amd-xgbe: PFC and KR-Training fixes
This patch series fixes the issues in kr-training and pfc
0001 - There is difference in the TX Flow Control registers (TFCR)
between the revisions of the hardware. Update the driver to use the
TFCR based on the reported version of the hardware.
0002 - AN restart triggered during KR training not only aborts the KR
training process but also move the HW to unstable state. Add the
necessary changes to fix kr-taining.
====================
Link: https://lore.kernel.org/r/20230111172852.1875384-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
AN restart triggered during KR training not only aborts the KR training
process but also move the HW to unstable state. Driver has to wait upto
500ms or until the KR training is completed before restarting AN cycle.
Fixes: 7c12aa0877 ("amd-xgbe: Move the PHY support into amd-xgbe")
Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is difference in the TX Flow Control registers (TFCR) between the
revisions of the hardware. The older revisions of hardware used to have
single register per queue. Whereas, the newer revision of hardware (from
ver 30H onwards) have one register per priority.
Update the driver to use the TFCR based on the reported version of the
hardware.
Fixes: c5aa9e3b81 ("amd-xgbe: Initial AMD 10GbE platform driver")
Co-developed-by: Ajith Nayak <Ajith.Nayak@amd.com>
Signed-off-by: Ajith Nayak <Ajith.Nayak@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Some fixes, stack only for now:
* iTXQ conversion fixes, various bugs reported
* properly reset multiple BSSID settings
* fix for a link_sta crash
* fix for AP VLAN checks
* fix for MLO address translation
* tag 'wireless-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: fix MLO + AP_VLAN check
mac80211: Fix MLO address translation for multiple bss case
wifi: mac80211: reset multiple BSSID options in stop_ap()
wifi: mac80211: Fix iTXQ AMPDU fragmentation handling
wifi: mac80211: sdata can be NULL during AMPDU start
wifi: mac80211: Proper mark iTXQs for resumption
wifi: mac80211: fix initialization of rx->link and rx->link_sta
====================
Link: https://lore.kernel.org/r/20230112111941.82408-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We are missing a ) when we attempt to complain about not having enough
configuration for clang, resulting in the rather inscrutable error:
../lib.mk:23: *** unterminated call to function 'error': missing ')'. Stop.
Add the required ) so we print the message we were trying to print.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit fb541ca4c3 ("md: remove lock_bdev / unlock_bdev") removes
wrappers for blkdev_get/blkdev_put. However, the uninitialized local
static variable of pointer type 'claim_rdev' in md_import_device()
is NULL, which leads to the following warning call trace:
WARNING: CPU: 22 PID: 1037 at block/bdev.c:577 bd_prepare_to_claim+0x131/0x150
CPU: 22 PID: 1037 Comm: mdadm Not tainted 6.2.0-rc3+ #69
..
RIP: 0010:bd_prepare_to_claim+0x131/0x150
..
Call Trace:
<TASK>
? _raw_spin_unlock+0x15/0x30
? iput+0x6a/0x220
blkdev_get_by_dev.part.0+0x4b/0x300
md_import_device+0x126/0x1d0
new_dev_store+0x184/0x240
md_attr_store+0x80/0xf0
kernfs_fop_write_iter+0x128/0x1c0
vfs_write+0x2be/0x3c0
ksys_write+0x5f/0xe0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
It turns out the md device cannot be used:
md: could not open device unknown-block(259,0).
md: md127 stopped.
Fix the issue by declaring the local static variable of struct type
and passing the pointer of the variable to blkdev_get_by_dev().
Fixes: fb541ca4c3 ("md: remove lock_bdev / unlock_bdev")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
sp_usb_phy_probe() will call platform_get_resource_byname() that may fail
and return NULL. devm_ioremap() will use usbphy->moon4_res_mem->start as
input, which may causes null-ptr-deref. Check the ret value of
platform_get_resource_byname() to avoid the null-ptr-deref.
Fixes: 99d9ccd973 ("phy: usb: Add USB2.0 phy driver for Sunplus SP7021")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Link: https://lore.kernel.org/r/20221125021222.25687-1-shangxiaojing@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Theoretically the device might gone if its reference count drops to 0.
This might be the case when we try to find the first physical node of
the ACPI device. We need to keep reference to it until we get a result
of the above mentioned call. Refactor the code to drop the reference
count at the correct place.
While at it, move to acpi_dev_put() as symmetrical call to the
acpi_dev_get_first_match_dev().
Fixes: a164137ce9 ("ASoC: Intel: add machine driver for SOF+ES8336")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20230112112852.67714-6-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Theoretically the device might gone if its reference count drops to 0.
This might be the case when we try to find the first physical node of
the ACPI device. We need to keep reference to it until we get a result
of the above mentioned call. Refactor the code to drop the reference
count at the correct place.
While at it, move to acpi_dev_put() as symmetrical call to the
acpi_dev_get_first_match_dev().
Fixes: 9a87fc1e06 ("ASoC: Intel: bytcr_wm5102: Add machine driver for BYT/WM5102")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20230112112852.67714-5-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Theoretically the device might gone if its reference count drops to 0.
This might be the case when we try to find the first physical node of
the ACPI device. We need to keep reference to it until we get a result
of the above mentioned call. Refactor the code to drop the reference
count at the correct place.
While at it, move to acpi_dev_put() as symmetrical call to the
acpi_dev_get_first_match_dev().
Fixes: a232b96dce ("ASoC: Intel: bytcr_rt5640: use HID translation util")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20230112112852.67714-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Theoretically the device might gone if its reference count drops to 0.
This might be the case when we try to find the first physical node of
the ACPI device. We need to keep reference to it until we get a result
of the above mentioned call. Refactor the code to drop the reference
count at the correct place.
While at it, move to acpi_dev_put() as symmetrical call to the
acpi_dev_get_first_match_dev().
Fixes: 02c0a3b304 ("ASoC: Intel: bytcr_rt5651: add MCLK, quirks and cleanups")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20230112112852.67714-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Theoretically the device might gone if its reference count drops to 0.
This might be the case when we try to find the first physical node of
the ACPI device. We need to keep reference to it until we get a result
of the above mentioned call. Refactor the code to drop the reference
count at the correct place.
While at it, move to acpi_dev_put() as symmetrical call to the
acpi_dev_get_first_match_dev().
Fixes: 3c22a73fb8 ("ASoC: Intel: bytcht_es8316: fix HID handling")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20230112112852.67714-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When syncing a log, if we fail to update a log root in the log root tree,
we are aborting the transaction if the failure was not -ENOSPC. This is
excessive because there is a chance that a transaction commit can succeed,
and therefore avoid to turn the filesystem into RO mode. All we need to be
careful about is to mark the log for a full commit, which we already do,
to make sure no one commits a super block pointing to an outdated log root
tree.
So don't abort the transaction if we fail to update a log root in the log
root tree, and log an error if the failure is not -ENOSPC, so that it does
not go completely unnoticed.
CC: stable@vger.kernel.org # 6.0+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When syncing the log, if we fail to write log tree extent buffers, we mark
the log for a full commit and abort the transaction. However we don't need
to abort the transaction, all we really need to do is to make sure no one
can commit a superblock pointing to new log tree roots. Just because we
got a failure writing extent buffers for a log tree, it does not mean we
will also fail to do a transaction commit.
One particular case is if due to a bug somewhere, when writing log tree
extent buffers, the tree checker detects some corruption and the writeout
fails because of that. Aborting the transaction can be very disruptive for
a user, specially if the issue happened on a root filesystem. One example
is the scenario in the Link tag below, where an isolated corruption on log
tree leaves was causing transaction aborts when syncing the log.
Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When logging conflicting inodes, if we reach the maximum limit of inodes,
we return BTRFS_LOG_FORCE_COMMIT to force a transaction commit. However
we don't mark the log for full commit (with btrfs_set_log_full_commit()),
which means that once we leave the log transaction and before we commit
the transaction, some other task may sync the log, which is incomplete
as we have not logged all conflicting inodes, leading to some inconsistent
in case that log ends up being replayed.
So also call btrfs_set_log_full_commit() at add_conflicting_inode().
Fixes: e09d94c9e4 ("btrfs: log conflicting inodes without holding log mutex of the initial inode")
CC: stable@vger.kernel.org # 6.1
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sometimes we log a directory without holding its VFS lock, so while we
logging it, dir index entries may be added or removed. This typically
happens when logging a dentry from a parent directory that points to a
new directory, through log_new_dir_dentries(), or when while logging
some other inode we also need to log its parent directories (through
btrfs_log_all_parents()).
This means that while we are at log_dir_items(), we may not find a dir
index key we found before, because it was deleted in the meanwhile, so
a call to btrfs_search_slot() may return 1 (key not found). In that case
we return from log_dir_items() with a success value (the variable 'err'
has a value of 0). This can lead to a few problems, specially in the case
where the variable 'last_offset' has a value of (u64)-1 (and it's
initialized to that when it was declared):
1) By returning from log_dir_items() with success (0) and a value of
(u64)-1 for '*last_offset_ret', we end up not logging any other dir
index keys that follow the missing, just deleted, index key. The
(u64)-1 value makes log_directory_changes() not call log_dir_items()
again;
2) Before returning with success (0), log_dir_items(), will log a dir
index range item covering a range from the last old dentry index
(stored in the variable 'last_old_dentry_offset') to the value of
'last_offset'. If 'last_offset' has a value of (u64)-1, then it means
if the log is persisted and replayed after a power failure, it will
cause deletion of all the directory entries that have an index number
between last_old_dentry_offset + 1 and (u64)-1;
3) We can end up returning from log_dir_items() with
ctx->last_dir_item_offset having a lower value than
inode->last_dir_index_offset, because the former is set to the current
key we are processing at process_dir_items_leaf(), and at the end of
log_directory_changes() we set inode->last_dir_index_offset to the
current value of ctx->last_dir_item_offset. So if for example a
deletion of a lower dir index key happened, we set
ctx->last_dir_item_offset to that index value, then if we return from
log_dir_items() because btrfs_search_slot() returned 1, we end up
returning from log_dir_items() with success (0) and then
log_directory_changes() sets inode->last_dir_index_offset to a lower
value than it had before.
This can result in unpredictable and unexpected behaviour when we
need to log again the directory in the same transaction, and can result
in ending up with a log tree leaf that has duplicated keys, as we do
batch insertions of dir index keys into a log tree.
So fix this by making log_dir_items() move on to the next dir index key
if it does not find the one it was looking for.
Reported-by: David Arendt <admin@prnet.org>
Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When logging a directory, at log_dir_items(), if we get an error when
attempting to search the subvolume tree for a dir index item, we end up
returning 0 (success) from log_dir_items() because 'err' is left with a
value of 0.
This can lead to a few problems, specially in the case the variable
'last_offset' has a value of (u64)-1 (and it's initialized to that when
it was declared):
1) By returning from log_dir_items() with success (0) and a value of
(u64)-1 for '*last_offset_ret', we end up not logging any other dir
index keys that follow the missing, just deleted, index key. The
(u64)-1 value makes log_directory_changes() not call log_dir_items()
again;
2) Before returning with success (0), log_dir_items(), will log a dir
index range item covering a range from the last old dentry index
(stored in the variable 'last_old_dentry_offset') to the value of
'last_offset'. If 'last_offset' has a value of (u64)-1, then it means
if the log is persisted and replayed after a power failure, it will
cause deletion of all the directory entries that have an index number
between last_old_dentry_offset + 1 and (u64)-1;
3) We can end up returning from log_dir_items() with
ctx->last_dir_item_offset having a lower value than
inode->last_dir_index_offset, because the former is set to the current
key we are processing at process_dir_items_leaf(), and at the end of
log_directory_changes() we set inode->last_dir_index_offset to the
current value of ctx->last_dir_item_offset. So if for example a
deletion of a lower dir index key happened, we set
ctx->last_dir_item_offset to that index value, then if we return from
log_dir_items() because btrfs_search_slot() returned an error, we end up
returning without any error from log_dir_items() and then
log_directory_changes() sets inode->last_dir_index_offset to a lower
value than it had before.
This can result in unpredictable and unexpected behaviour when we
need to log again the directory in the same transaction, and can result
in ending up with a log tree leaf that has duplicated keys, as we do
batch insertions of dir index keys into a log tree.
Fix this by setting 'err' to the value of 'ret' in case
btrfs_search_slot() or btrfs_previous_item() returned an error. That will
result in falling back to a full transaction commit.
Reported-by: David Arendt <admin@prnet.org>
Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
Fixes: e02119d5a7 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since nfsd4_state_shrinker_count always calls mod_delayed_work with
0 delay, we can replace delayed_work with work_struct to save some
space and overhead.
Also add the call to cancel_work after unregister the shrinker
in nfs4_state_shutdown_net.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit 374146cad4 ("drm/vc4: Switch to drmm_mutex_init") converted,
among other functions, vc4_create_object() to use drmm_mutex_init().
However, that function is used to allocate a BO, and therefore the
mutex needs to be freed much sooner than when the DRM device is removed
from the system.
For each buffer allocation we thus end up allocating a small structure
as part of the DRM-managed mechanism that is never freed, eventually
leading us to no longer having any free memory anymore.
Let's switch back to mutex_init/mutex_destroy to deal with it properly.
Fixes: 374146cad4 ("drm/vc4: Switch to drmm_mutex_init")
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230112091243.490799-1-maxime@cerno.tech
Theoretically the device might gone if its reference count drops to 0.
This might be the case when we try to find the first physical node of
the ACPI device. We need to keep reference to it until we get a result
of the above mentioned call. Refactor the code to drop the reference
count at the correct place.
While at it, move to acpi_dev_put() as symmetrical call to the
acpi_dev_get_first_match_dev().
Fixes: 02527c3f23 ("ASoC: amd: add Machine driver for Jadeite platform")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Link: https://lore.kernel.org/r/20230112112356.67643-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Currently the nfsd-client shrinker is registered and unregistered at
the time the nfsd module is loaded and unloaded. The problem with this
is the shrinker is being registered before all of the relevant fields
in nfsd_net are initialized when nfsd is started. This can lead to an
oops when memory is low and the shrinker is called while nfsd is not
running.
This patch moves the register/unregister of nfsd-client shrinker from
module load/unload time to nfsd startup/shutdown time.
Fixes: 44df6f439a ("NFSD: add delegation reaper to react to low memory condition")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If signal_pending() returns true, schedule_timeout() will not be executed,
causing the waiting task to remain in the wait queue.
Fixed by adding a call to finish_wait(), which ensures that the waiting
task will always be removed from the wait queue.
Fixes: f4e44b3933 ("NFSD: delay unmount source's export after inter-server copy completed.")
Signed-off-by: Xingyuan Mo <hdthky0@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
vsyscall detection code uses direct call to the beginning of
the vsyscall page:
asm ("call %P0" :: "i" (0xffffffffff600000))
It generates "call rel32" instruction but it is not relocated if binary
is PIE, so binary segfaults into random userspace address and vsyscall
page status is detected incorrectly.
Do more direct:
asm ("call *%rax")
which doesn't do need any relocaltions.
Mark g_vsyscall as volatile for a good measure, I didn't find instruction
setting it to 0. Now the code is obviously correct:
xor eax, eax
mov rdi, rbp
mov rsi, rbp
mov DWORD PTR [rip+0x2d15], eax # g_vsyscall = 0
mov rax, 0xffffffffff600000
call rax
mov DWORD PTR [rip+0x2d02], 1 # g_vsyscall = 1
mov eax, DWORD PTR ds:0xffffffffff600000
mov DWORD PTR [rip+0x2cf1], 2 # g_vsyscall = 2
mov edi, [rip+0x2ceb] # exit(g_vsyscall)
call exit
Note: fixed proc-empty-vm test oopses 5.19.0-28-generic kernel
but this is separate story.
Link: https://lkml.kernel.org/r/Y7h2xvzKLg36DSq8@p183
Fixes: 5bc73bb345 ("proc: test how it holds up with mapping'less process")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 449c796768 ("mm: teach release_pages() to take an array of
encoded page pointers too") added the kernel doc comment for
release_pages() on top of 'union release_pages_arg', so making 'make
htmldocs' complains as below:
./include/linux/mm.h:1268: warning: cannot understand function prototype: 'typedef union '
The kernel doc comment for the function is already on top of the
function's definition in mm/swap.c, and the new comment is actually not
for the function but indeed release_pages_arg. Fixing the comment to
reflect the intent would be one option. But, kernel doc cannot parse
the union as below due to the attribute.
./include/linux/mm.h:1272: error: Cannot parse struct or union!
Modify the comment to reflect the intent but do not mark it as a kernel
doc comment.
Link: https://lkml.kernel.org/r/20230106203331.127532-1-sj@kernel.org
Fixes: 449c796768 ("mm: teach release_pages() to take an array of encoded page pointers too")
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If nilfs2 reads a corrupted disk image and tries to reads a b-tree node
block by calling __nilfs_btree_get_block() against an invalid virtual
block address, it returns -ENOENT because conversion of the virtual block
address to a disk block address fails. However, this return value is the
same as the internal code that b-tree lookup routines return to indicate
that the block being searched does not exist, so functions that operate on
that b-tree may misbehave.
When nilfs_btree_insert() receives this spurious 'not found' code from
nilfs_btree_do_lookup(), it misunderstands that the 'not found' check was
successful and continues the insert operation using incomplete lookup path
data, causing the following crash:
general protection fault, probably for non-canonical address
0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
...
RIP: 0010:nilfs_btree_get_nonroot_node fs/nilfs2/btree.c:418 [inline]
RIP: 0010:nilfs_btree_prepare_insert fs/nilfs2/btree.c:1077 [inline]
RIP: 0010:nilfs_btree_insert+0x6d3/0x1c10 fs/nilfs2/btree.c:1238
Code: bc 24 80 00 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 28 00 74 08 4c 89
ff e8 4b 02 92 fe 4d 8b 3f 49 83 c7 28 4c 89 f8 48 c1 e8 03 <42> 80 3c
28 00 74 08 4c 89 ff e8 2e 02 92 fe 4d 8b 3f 49 83 c7 02
...
Call Trace:
<TASK>
nilfs_bmap_do_insert fs/nilfs2/bmap.c:121 [inline]
nilfs_bmap_insert+0x20d/0x360 fs/nilfs2/bmap.c:147
nilfs_get_block+0x414/0x8d0 fs/nilfs2/inode.c:101
__block_write_begin_int+0x54c/0x1a80 fs/buffer.c:1991
__block_write_begin fs/buffer.c:2041 [inline]
block_write_begin+0x93/0x1e0 fs/buffer.c:2102
nilfs_write_begin+0x9c/0x110 fs/nilfs2/inode.c:261
generic_perform_write+0x2e4/0x5e0 mm/filemap.c:3772
__generic_file_write_iter+0x176/0x400 mm/filemap.c:3900
generic_file_write_iter+0xab/0x310 mm/filemap.c:3932
call_write_iter include/linux/fs.h:2186 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x7dc/0xc50 fs/read_write.c:584
ksys_write+0x177/0x2a0 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
...
</TASK>
This patch fixes the root cause of this problem by replacing the error
code that __nilfs_btree_get_block() returns on block address conversion
failure from -ENOENT to another internal code -EINVAL which means that the
b-tree metadata is corrupted.
By returning -EINVAL, it propagates without glitches, and for all relevant
b-tree operations, functions in the upper bmap layer output an error
message indicating corrupted b-tree metadata via
nilfs_bmap_convert_error(), and code -EIO will be eventually returned as
it should be.
Link: https://lkml.kernel.org/r/000000000000bd89e205f0e38355@google.com
Link: https://lkml.kernel.org/r/20230105055356.8811-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+ede796cecd5296353515@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SHMEM_HUGE_DENY is for emergency use by the admin, to disable allocation
of shmem huge pages if, for example, a dangerous bug is found in their
usage: see "deny" in Documentation/mm/transhuge.rst. An app using
madvise(,,MADV_COLLAPSE) should not be allowed to override it: restore its
precedence over shmem_huge_force.
Restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE.
Link: https://lkml.kernel.org/r/20221224082035.3197140-2-zokeefe@google.com
Fixes: 7c6c6cc4d3 ("mm/shmem: add flag to enforce shmem THP in hugepage_vma_check()")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained within the bounds supplied
by the user.
At the top of each hugepage iteration we (re)lock mmap_lock and
(re)validate the VMA for eligibility and update variables that might have
changed while mmap_lock was dropped. One thing that might occur is that
the VMA could be resized, and as such, we refetch vma->vm_end to make sure
we don't collapse past the end of the VMA's new end.
However, it's possible that when refetching vma->vm_end that we expand the
region acted on by MADV_COLLAPSE if vma->vm_end is greater than size+len
supplied by the user.
The consequence here is that we may attempt to collapse more memory than
requested, possibly yielding either "too much success" or "false failure"
user-visible results. An example of the former is if we MADV_COLLAPSE the
first 4MiB of a 2TiB mmap()'d file, the incorrect refetch would cause the
operation to block for much longer than anticipated as we attempt to
collapse the entire TiB region. An example of the latter is that applying
MADV_COLLPSE to a 4MiB file mapped to the start of a 6MiB VMA will
successfully collapse the first 4MiB, then incorrectly attempt to collapse
the last hugepage-aligned/sized region -- fail (since readahead/page cache
lookup will fail) -- and report a failure to the user.
I don't believe there is a kernel stability concern here as we always
(re)validate the VMA / region accordingly. Also as Hugh mentions, the
user-visible effects are: we try to collapse more memory than requested
by the user, and/or failing an operation that should have otherwise
succeeded. An example is trying to collapse a 4MiB file contained
within a 12MiB VMA.
Don't expand the acted-on region when refetching vma->vm_end.
Link: https://lkml.kernel.org/r/20221224082035.3197140-1-zokeefe@google.com
Fixes: 4d24de9425 ("mm: MADV_COLLAPSE: refetch vm_end after reacquiring mmap_lock")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, we don't enable writenotify when enabling userfaultfd-wp on a
shared writable mapping (for now only shmem and hugetlb). The consequence
is that vma->vm_page_prot will still include write permissions, to be set
as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
page migration, ...).
So far, vma->vm_page_prot is assumed to be a safe default, meaning that we
only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
wrprotect). For example, when enabling softdirty tracking, we enable
writenotify. With uffd-wp on shared mappings, that changed. More details
on vma->vm_page_prot semantics were summarized in [1].
This is problematic for uffd-wp: we'd have to manually check for a uffd-wp
PTEs/PMDs and manually write-protect PTEs/PMDs, which is error prone.
Prone to such issues is any code that uses vma->vm_page_prot to set PTE
permissions: primarily pte_modify() and mk_pte().
Instead, let's enable writenotify such that PTEs/PMDs/... will be mapped
write-protected as default and we will only allow selected PTEs that are
definitely safe to be mapped without write-protection (see
can_change_pte_writable()) to be writable. In the future, we might want
to enable write-bit recovery -- e.g., can_change_pte_writable() -- at more
locations, for example, also when removing uffd-wp protection.
This fixes two known cases:
(a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
in uffd-wp not triggering on write access.
(b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
writable, resulting in uffd-wp not triggering on write access.
Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
without NUMA hinting (which currently doesn't seem to be applicable to
shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA. On
such a VMA, userfaultfd-wp is currently non-functional.
Note that when enabling userfaultfd-wp, there is no need to walk page
tables to enforce the new default protection for the PTEs: we know that
they cannot be uffd-wp'ed yet, because that can only happen after enabling
uffd-wp for the VMA in general.
Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
accidentally set the write bit -- which would result in uffd-wp not
triggering on later write access. This commit makes uffd-wp on shmem
behave just like uffd-wp on anonymous memory in that regard, even though,
mixing mprotect with uffd-wp is controversial.
[1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com
Link: https://lkml.kernel.org/r/20221209080912.7968-1-david@redhat.com
Fixes: b1f9e87686 ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Ives van Hoorne <ives@codesandbox.io>
Debugged-by: Peter Xu <peterx@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
uprobe_write_opcode() uses collapse_pte_mapped_thp() to restore huge pmd,
when removing a breakpoint from hugepage text: vma->anon_vma is always set
in that case, so undo the prohibition. And MADV_COLLAPSE ought to be able
to collapse some page tables in a vma which happens to have anon_vma set
from CoWing elsewhere.
Is anon_vma lock required? Almost not: if any page other than expected
subpage of the non-anon huge page is found in the page table, collapse is
aborted without making any change. However, it is possible that an anon
page was CoWed from this extent in another mm or vma, in which case a
concurrent lookup might look here: so keep it away while clearing pmd (but
perhaps we shall go back to using pmd_lock() there in future).
Note that collapse_pte_mapped_thp() is exceptional in freeing a page table
without having cleared its ptes: I'm uneasy about that, and had thought
pte_clear()ing appropriate; but exclusive i_mmap lock does fix the
problem, and we would have to move the mmu_notification if clearing those
ptes.
What this fixes is not a dangerous instability. But I suggest Cc stable
because uprobes "healing" has regressed in that way, so this should follow
8d3c106e19 into those stable releases where it was backported (and may
want adjustment there - I'll supply backports as needed).
Link: https://lkml.kernel.org/r/b740c9fb-edba-92ba-59fb-7a5592e5dfc@google.com
Fixes: 8d3c106e19 ("mm/khugepaged: take the right locks for page table retraction")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We have to update the uffd-wp SWP PTE bit independent of the type of
migration entry. Currently, if we're unlucky and we want to install/clear
the uffd-wp bit just while we're migrating a read-only mapped hugetlb
page, we would miss to set/clear the uffd-wp bit.
Further, if we're processing a readable-exclusive migration entry and
neither want to set or clear the uffd-wp bit, we could currently end up
losing the uffd-wp bit. Note that the same would hold for writable
migrating entries, however, having a writable migration entry with the
uffd-wp bit set would already mean that something went wrong.
Note that the change from !is_readable_migration_entry ->
writable_migration_entry is harmless and actually cleaner, as raised by
Miaohe Lin and discussed in [1].
[1] https://lkml.kernel.org/r/90dd6a93-4500-e0de-2bf0-bf522c311b0c@huawei.com
Link: https://lkml.kernel.org/r/20221222205511.675832-3-david@redhat.com
Fixes: 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/hugetlb: uffd-wp fixes for hugetlb_change_protection()".
Playing with virtio-mem and background snapshots (using uffd-wp) on
hugetlb in QEMU, I managed to trigger a VM_BUG_ON(). Looking into the
details, hugetlb_change_protection() seems to not handle uffd-wp correctly
in all cases.
Patch #1 fixes my test case. I don't have reproducers for patch #2, as it
requires running into migration entries.
I did not yet check in detail yet if !hugetlb code requires similar care.
This patch (of 2):
There are two problematic cases when stumbling over a PTE marker in
hugetlb_change_protection():
(1) We protect an uffd-wp PTE marker a second time using uffd-wp: we will
end up in the "!huge_pte_none(pte)" case and mess up the PTE marker.
(2) We unprotect a uffd-wp PTE marker: we will similarly end up in the
"!huge_pte_none(pte)" case even though we cleared the PTE, because
the "pte" variable is stale. We'll mess up the PTE marker.
For example, if we later stumble over such a "wrongly modified" PTE marker,
we'll treat it like a present PTE that maps some garbage page.
This can, for example, be triggered by mapping a memfd backed by huge
pages, registering uffd-wp, uffd-wp'ing an unmapped page and (a)
uffd-wp'ing it a second time; or (b) uffd-unprotecting it; or (c)
unregistering uffd-wp. Then, ff we trigger fallocate(FALLOC_FL_PUNCH_HOLE)
on that file range, we will run into a VM_BUG_ON:
[ 195.039560] page:00000000ba1f2987 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0
[ 195.039565] flags: 0x7ffffc0001000(reserved|node=0|zone=0|lastcpupid=0x1fffff)
[ 195.039568] raw: 0007ffffc0001000 ffffe742c0000008 ffffe742c0000008 0000000000000000
[ 195.039569] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 195.039569] page dumped because: VM_BUG_ON_PAGE(compound && !PageHead(page))
[ 195.039573] ------------[ cut here ]------------
[ 195.039574] kernel BUG at mm/rmap.c:1346!
[ 195.039579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 195.039581] CPU: 7 PID: 4777 Comm: qemu-system-x86 Not tainted 6.0.12-200.fc36.x86_64 #1
[ 195.039583] Hardware name: LENOVO 20WNS1F81N/20WNS1F81N, BIOS N35ET50W (1.50 ) 09/15/2022
[ 195.039584] RIP: 0010:page_remove_rmap+0x45b/0x550
[ 195.039588] Code: [...]
[ 195.039589] RSP: 0018:ffffbc03c3633ba8 EFLAGS: 00010292
[ 195.039591] RAX: 0000000000000040 RBX: ffffe742c0000000 RCX: 0000000000000000
[ 195.039592] RDX: 0000000000000002 RSI: ffffffff8e7aac1a RDI: 00000000ffffffff
[ 195.039592] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffbc03c3633a08
[ 195.039593] R10: 0000000000000003 R11: ffffffff8f146328 R12: ffff9b04c42754b0
[ 195.039594] R13: ffffffff8fcc6328 R14: ffffbc03c3633c80 R15: ffff9b0484ab9100
[ 195.039595] FS: 00007fc7aaf68640(0000) GS:ffff9b0bbf7c0000(0000) knlGS:0000000000000000
[ 195.039596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 195.039597] CR2: 000055d402c49110 CR3: 0000000159392003 CR4: 0000000000772ee0
[ 195.039598] PKRU: 55555554
[ 195.039599] Call Trace:
[ 195.039600] <TASK>
[ 195.039602] __unmap_hugepage_range+0x33b/0x7d0
[ 195.039605] unmap_hugepage_range+0x55/0x70
[ 195.039608] hugetlb_vmdelete_list+0x77/0xa0
[ 195.039611] hugetlbfs_fallocate+0x410/0x550
[ 195.039612] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 195.039616] vfs_fallocate+0x12e/0x360
[ 195.039618] __x64_sys_fallocate+0x40/0x70
[ 195.039620] do_syscall_64+0x58/0x80
[ 195.039623] ? syscall_exit_to_user_mode+0x17/0x40
[ 195.039624] ? do_syscall_64+0x67/0x80
[ 195.039626] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 195.039628] RIP: 0033:0x7fc7b590651f
[ 195.039653] Code: [...]
[ 195.039654] RSP: 002b:00007fc7aaf66e70 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[ 195.039655] RAX: ffffffffffffffda RBX: 0000558ef4b7f370 RCX: 00007fc7b590651f
[ 195.039656] RDX: 0000000018000000 RSI: 0000000000000003 RDI: 000000000000000c
[ 195.039657] RBP: 0000000008000000 R08: 0000000000000000 R09: 0000000000000073
[ 195.039658] R10: 0000000008000000 R11: 0000000000000293 R12: 0000000018000000
[ 195.039658] R13: 00007fb8bbe00000 R14: 000000000000000c R15: 0000000000001000
[ 195.039661] </TASK>
Fix it by not going into the "!huge_pte_none(pte)" case if we stumble over
an exclusive marker. spin_unlock() + continue would get the job done.
However, instead, make it clearer that there are no fall-through
statements: we process each case (hwpoison, migration, marker, !none,
none) and then unlock the page table to continue with the next PTE. Let's
avoid "continue" statements and use a single spin_unlock() at the end.
Link: https://lkml.kernel.org/r/20221222205511.675832-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221222205511.675832-2-david@redhat.com
Fixes: 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The commit 79417d040f ("btrfs: zoned: disable metadata overcommit for
zoned") disabled the metadata over-commit to track active zones properly.
However, it also introduced a heavy overhead by allocating new metadata
block groups and/or flushing dirty buffers to release the space
reservations. Specifically, a workload (write only without any sync
operations) worsen its performance from 343.77 MB/sec (v5.19) to 182.89
MB/sec (v6.0).
The performance is still bad on current misc-next which is 187.95 MB/sec.
And, with this patch applied, it improves back to 326.70 MB/sec (+73.82%).
This patch introduces a new fs_info->flag BTRFS_FS_NO_OVERCOMMIT to
indicate it needs to disable the metadata over-commit. The flag is enabled
when a device with max active zones limit is loaded into a file-system.
Fixes: 79417d040f ("btrfs: zoned: disable metadata overcommit for zoned")
CC: stable@vger.kernel.org # 6.0+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There are some reports from the mailing list that since v6.1 kernel, the
WARN_ON() inside btrfs_qgroup_account_extent() gets triggered during
rescan:
WARNING: CPU: 3 PID: 6424 at fs/btrfs/qgroup.c:2756 btrfs_qgroup_account_extents+0x1ae/0x260 [btrfs]
CPU: 3 PID: 6424 Comm: snapperd Tainted: P OE 6.1.2-1-default #1 openSUSE Tumbleweed 05c7a1b1b61d5627475528f71f50444637b5aad7
RIP: 0010:btrfs_qgroup_account_extents+0x1ae/0x260 [btrfs]
Call Trace:
<TASK>
btrfs_commit_transaction+0x30c/0xb40 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6]
? start_transaction+0xc3/0x5b0 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6]
btrfs_qgroup_rescan+0x42/0xc0 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6]
btrfs_ioctl+0x1ab9/0x25c0 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6]
? __rseq_handle_notify_resume+0xa9/0x4a0
? mntput_no_expire+0x4a/0x240
? __seccomp_filter+0x319/0x4d0
__x64_sys_ioctl+0x90/0xd0
do_syscall_64+0x5b/0x80
? syscall_exit_to_user_mode+0x17/0x40
? do_syscall_64+0x67/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fd9b790d9bf
</TASK>
[CAUSE]
Since commit e15e9f43c7 ("btrfs: introduce
BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting"), if
our qgroup is already in inconsistent state, we will no longer do the
time-consuming backref walk.
This can leave some qgroup records without a valid old_roots ulist.
Normally this is fine, as btrfs_qgroup_account_extents() would also skip
those records if we have NO_ACCOUNTING flag set.
But there is a small window, if we have NO_ACCOUNTING flag set, and
inserted some qgroup_record without a old_roots ulist, but then the user
triggered a qgroup rescan.
During btrfs_qgroup_rescan(), we firstly clear NO_ACCOUNTING flag, then
commit current transaction.
And since we have a qgroup_record with old_roots = NULL, we trigger the
WARN_ON() during btrfs_qgroup_account_extents().
[FIX]
Unfortunately due to the introduction of NO_ACCOUNTING flag, the
assumption that every qgroup_record would have its old_roots populated
is no longer correct.
Fix the false alerts and drop the WARN_ON().
Reported-by: Lukas Straub <lukasstraub2@web.de>
Reported-by: HanatoK <summersnow9403@gmail.com>
Fixes: e15e9f43c7 ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting")
CC: stable@vger.kernel.org # 6.1
Link: https://lore.kernel.org/linux-btrfs/2403c697-ddaf-58ad-3829-0335fc89df09@gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When test case btrfs/219 (aka, mount a registered device but with a lower
generation) failed, there is not any useful information for the end user
to find out what's going wrong.
The mount failure just looks like this:
# mount -o loop /tmp/219.img2 /mnt/btrfs/
mount: /mnt/btrfs: mount(2) system call failed: File exists.
dmesg(1) may have more information after failed mount system call.
While the dmesg contains nothing but the loop device change:
loop1: detected capacity change from 0 to 524288
[CAUSE]
In device_list_add() we have a lot of extra checks to reject invalid
cases.
That function also contains the regular device scan result like the
following prompt:
BTRFS: device fsid 6222333e-f9f1-47e6-b306-55ddd4dcaef4 devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (3027)
But unfortunately not all errors have their own error messages, thus if
we hit something wrong in device_add_list(), there may be no error
messages at all.
[FIX]
Add errors message for all non-ENOMEM errors.
For ENOMEM, I'd say we're in a much worse situation, and there should be
some OOM messages way before our call sites.
CC: stable@vger.kernel.org # 6.0+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the offset + length goes over the ethernet + vlan header, then the
length is adjusted to copy the bytes that are within the boundaries of
the vlan_ethhdr scratchpad area. The remaining bytes beyond ethernet +
vlan header are copied directly from the skbuff data area.
Fix incorrect arithmetic operator: subtract, not add, the size of the
vlan header in case of double-tagged packets to adjust the length
accordingly to address CVE-2023-0179.
Reported-by: Davide Ornaghi <d.ornaghi97@gmail.com>
Fixes: f6ae9f120d ("netfilter: nft_payload: add C-VLAN support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When first_ip is 0, last_ip is 0xFFFFFFFF, and netmask is 31, the value of
an arithmetic expression 2 << (netmask - mask_bits - 1) is subject
to overflow due to a failure casting operands to a larger data type
before performing the arithmetic.
Note that it's harmless since the value will be checked at the next step.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: b9fed74818 ("netfilter: ipset: Check and reject crazy /0 input parameters")
Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The kselftest framework uses a default timeout of 45 seconds for
all test scripts.
Increase the timeout to two minutes for the netfilter tests, this
should hopefully be enough,
Make sure that, should the script be canceled, the net namespace and
the spawned ping instances are removed.
Fixes: 25d8bcedbf ("selftests: add script to stress-test nft packet path vs. control plane")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
During kexec on ARM device, we notice that device_shutdown() only calls
pm_runtime_force_suspend() while shutting down the GPU. This means the GPU
kthread is still running and further, there maybe active submits.
This causes all kinds of issues during a kexec reboot:
Warning from shutdown path:
[ 292.509662] WARNING: CPU: 0 PID: 6304 at [...] adreno_runtime_suspend+0x3c/0x44
[ 292.509863] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
[ 292.509872] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 292.509881] pc : adreno_runtime_suspend+0x3c/0x44
[ 292.509891] lr : pm_generic_runtime_suspend+0x30/0x44
[ 292.509905] sp : ffffffc014473bf0
[...]
[ 292.510043] Call trace:
[ 292.510051] adreno_runtime_suspend+0x3c/0x44
[ 292.510061] pm_generic_runtime_suspend+0x30/0x44
[ 292.510071] pm_runtime_force_suspend+0x54/0xc8
[ 292.510081] adreno_shutdown+0x1c/0x28
[ 292.510090] platform_shutdown+0x2c/0x38
[ 292.510104] device_shutdown+0x158/0x210
[ 292.510119] kernel_restart_prepare+0x40/0x4c
And here from GPU kthread, an SError OOPs:
[ 192.648789] el1h_64_error+0x7c/0x80
[ 192.648812] el1_interrupt+0x20/0x58
[ 192.648833] el1h_64_irq_handler+0x18/0x24
[ 192.648854] el1h_64_irq+0x7c/0x80
[ 192.648873] local_daif_inherit+0x10/0x18
[ 192.648900] el1h_64_sync_handler+0x48/0xb4
[ 192.648921] el1h_64_sync+0x7c/0x80
[ 192.648941] a6xx_gmu_set_oob+0xbc/0x1fc
[ 192.648968] a6xx_hw_init+0x44/0xe38
[ 192.648991] msm_gpu_hw_init+0x48/0x80
[ 192.649013] msm_gpu_submit+0x5c/0x1a8
[ 192.649034] msm_job_run+0xb0/0x11c
[ 192.649058] drm_sched_main+0x170/0x434
[ 192.649086] kthread+0x134/0x300
[ 192.649114] ret_from_fork+0x10/0x20
Fix by calling adreno_system_suspend() in the device_shutdown() path.
[ Applied Rob Clark feedback on fixing adreno_unbind() similarly, also
tested as above. ]
Cc: Rob Clark <robdclark@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ricardo Ribalda <ribalda@chromium.org>
Cc: Ross Zwisler <zwisler@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/517633/
Link: https://lore.kernel.org/r/20230109222547.1368644-1-joel@joelfernandes.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Since commit 80b6093b55 ("kbuild: add -Wundef to KBUILD_CPPFLAGS
for W=1 builds"), building with W=1 detects -Wundef warnings for
assembly code.
$ make W=1 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- arch/arm/mm/
[snip]
AS arch/arm/mm/cache-v7.o
In file included from arch/arm/mm/cache-v7.S:17:
arch/arm/mm/proc-macros.S:109:5: warning: "L_PTE_SHARED" is not defined, evaluates to 0 [-Wundef]
109 | #if L_PTE_SHARED != PTE_EXT_SHARED
| ^~~~~~~~~~~~
arch/arm/mm/proc-macros.S:109:21: warning: "PTE_EXT_SHARED" is not defined, evaluates to 0 [-Wundef]
109 | #if L_PTE_SHARED != PTE_EXT_SHARED
| ^~~~~~~~~~~~~~
arch/arm/mm/proc-macros.S:113:10: warning: "L_PTE_XN" is not defined, evaluates to 0 [-Wundef]
113 | (L_PTE_XN+L_PTE_USER+L_PTE_RDONLY+L_PTE_DIRTY+L_PTE_YOUNG+\
| ^~~~~~~~
arch/arm/mm/proc-macros.S:113:19: warning: "L_PTE_USER" is not defined, evaluates to 0 [-Wundef]
113 | (L_PTE_XN+L_PTE_USER+L_PTE_RDONLY+L_PTE_DIRTY+L_PTE_YOUNG+\
| ^~~~~~~~~~
arch/arm/mm/proc-macros.S:113:30: warning: "L_PTE_RDONLY" is not defined, evaluates to 0 [-Wundef]
113 | (L_PTE_XN+L_PTE_USER+L_PTE_RDONLY+L_PTE_DIRTY+L_PTE_YOUNG+\
| ^~~~~~~~~~~~
arch/arm/mm/proc-macros.S:113:43: warning: "L_PTE_DIRTY" is not defined, evaluates to 0 [-Wundef]
113 | (L_PTE_XN+L_PTE_USER+L_PTE_RDONLY+L_PTE_DIRTY+L_PTE_YOUNG+\
| ^~~~~~~~~~~
arch/arm/mm/proc-macros.S:113:55: warning: "L_PTE_YOUNG" is not defined, evaluates to 0 [-Wundef]
113 | (L_PTE_XN+L_PTE_USER+L_PTE_RDONLY+L_PTE_DIRTY+L_PTE_YOUNG+\
| ^~~~~~~~~~~
arch/arm/mm/proc-macros.S:114:10: warning: "L_PTE_PRESENT" is not defined, evaluates to 0 [-Wundef]
114 | L_PTE_PRESENT) > L_PTE_SHARED
| ^~~~~~~~~~~~~
arch/arm/mm/proc-macros.S:114:27: warning: "L_PTE_SHARED" is not defined, evaluates to 0 [-Wundef]
114 | L_PTE_PRESENT) > L_PTE_SHARED
| ^~~~~~~~~~~~
Include <asm/pgtable.h> from proc-macros.S to fix the warnings.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
zero_page is a void* pointer but memblock_alloc() returns phys_addr_t type
so this generates a warning while using clang and with -Wint-error enabled
that becomes and error. So let's cast the return of memblock_alloc() to
(void *).
Cc: <stable@vger.kernel.org> # 4.14.x +
Fixes: 340a982825 ("ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation")
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
A fix was made in the mkspec script that uses a feature, ie. the
OR expression, which requires RPM 4.13. However, the script indicates
another minimum version. Lower versions may have success by using
the --no-deps option as suggested, but feels like bumping the version
to 4.13 is reasonable as it put me on the wrong track at first with
RPM 4.11 on my Centos7 machine.
Fixes: 02a893bc99 ("kbuild: rpm-pkg: add libelf-devel as alternative for BuildRequires")
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
It was never guaranteed to be exactly eight, but since commit
548b8b5168 ("scripts/setlocalversion: make git describe output more
reliable"), it has been exactly 12.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When the input enable pinconf was introduced, a default drive-strength
value of 2 was set for the pull up/down configs. However, this parameter
is unneeded when configuring the pin as input, and having a single
hardcoded value here is actually harmful: GPIOs on the RK3399 have
various same drive-strength capabilities depending on the bank and port
they belong to.
As an example, trying to configure the GPIO4_PD3 pin as an input with
pull-up enabled fails with the following output:
[ 10.706542] rockchip-pinctrl pinctrl: unsupported driver strength 2
[ 10.713661] rockchip-pinctrl pinctrl: pin_config_set op failed for pin 155
(acceptable drive-strength values for this pin being 3, 6, 9 and 12)
Let's drop the drive-strength property from all input pinconfs in order
to solve this issue.
Fixes: ec48c3e82c ("arm64: dts: rockchip: add an input enable pinconf to rk3399")
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
Reviewed-by: Caleb Connolly <kc@postmarketos.org>
Link: https://lore.kernel.org/r/20221215101947.254896-1-arnaud.ferraris@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
If CONFIG_ELF_CORE is not set:
fs/coredump.c:835:12: error: ‘dump_emit_page’ defined but not used [-Werror=unused-function]
835 | static int dump_emit_page(struct coredump_params *cprm, struct page *page)
| ^~~~~~~~~~~~~~
Fix this by moving dump_emit_page() inside the existing section
protected by #ifdef CONFIG_ELF_CORE.
Fixes: 06bbaa6dc5 ("[coredump] don't use __kernel_write() on kmap_local_page()")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Qualcomm driver fixes for v6.2
Updated error handling in the async packer router driver made an
optional property required, fix this. Also improve error handling in the
probe function of the CPR driver.
* tag 'qcom-driver-fixes-for-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
PM: AVS: qcom-cpr: Fix an error handling path in cpr_probe()
soc: qcom: apr: Make qcom,protection-domain optional again
dt-bindings: soc: qcom: apr: Make qcom,protection-domain optional again
Link: https://lore.kernel.org/r/20230110213946.2183982-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Qualcomm ARM64 DTS fixes for 6.2
The cluster idle issue was resolved on SM8250, so the change disabling
the cluster state is being reverted.
Issues where identified with the QMP PHY binding, that would prevent
enablement of Displayport and it was decided not to support the old
binding for the recently introduced SC8280XP, which broke USB. This
adjusts the USB PHY nodes to the new binding. The reset signal for the
first QMP PHY is corrected as well.
The reserved memory map is updated on Xiaomi Mi 4C and Huawei Nexus 6P,
to avoid instabilities caused by use of protected memory regions.
The compatible for the MSM8992 TCSR mutex is corrected as well.
Lastly SDHCI interconnects on SM8350 are corrected to match the
providers #interconnect-cells.
* tag 'qcom-arm64-fixes-for-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: msm8992-libra: Fix the memory map
arm64: dts: qcom: msm8992: Don't use sfpb mutex
arm64: dts: msm8994-angler: fix the memory map
arm64: dts: qcom: sm8350: correct SDHCI interconnect arguments
Revert "arm64: dts: qcom: sm8250: Disable the not yet supported cluster idle state"
arm64: dts: msm8992-bullhead: add memory hole region
arm64: dts: qcom: sc8280xp: fix USB-DP PHY nodes
arm64: dts: qcom: sc8280xp: fix primary USB-DP PHY reset
Link: https://lore.kernel.org/r/20230110213724.2183668-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
ARM: SoC fixes
These are three fixes for mistakes I discovered during the preparation
of the boardfile removal. Robert noticed the accidental removal
of PXA310 and PXA320 support because of a misplaced Kconfig statement,
and the two OMAP patches were build failures that got introduced
earlier but that I found while testing the removal.
* 'armsoc-build-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: omap1: fix building gpio15xx
ARM: omap1: fix !ARCH_OMAP1_ANY link failures
ARM: pxa: enable PXA310/PXA320 for DT-only build
In some randconfig builds, the asm/irq.h header is not included
in gpio15xx.c, so add an explicit include to avoid a build fialure:
In file included from arch/arm/mach-omap1/gpio15xx.c:15:
arch/arm/mach-omap1/irqs.h:99:34: error: 'NR_IRQS_LEGACY' undeclared here (not in a function)
99 | #define IH2_BASE (NR_IRQS_LEGACY + 32)
| ^~~~~~~~~~~~~~
arch/arm/mach-omap1/irqs.h:105:38: note: in expansion of macro 'IH2_BASE'
105 | #define INT_MPUIO (5 + IH2_BASE)
| ^~~~~~~~
arch/arm/mach-omap1/gpio15xx.c:28:27: note: in expansion of macro 'INT_MPUIO'
28 | .start = INT_MPUIO,
| ^~~~~~~~~
Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
While compile-testing randconfig builds for the upcoming boardfile
removal, I noticed that an earlier patch of mine was completely
broken, and the introduction of CONFIG_ARCH_OMAP1_ANY only replaced
one set of build failures with another one, now resulting in
link failures like
ld: drivers/video/fbdev/omap/omapfb_main.o: in function `omapfb_do_probe':
drivers/video/fbdev/omap/omapfb_main.c:1703: undefined reference to `omap_set_dma_priority'
ld: drivers/dma/ti/omap-dma.o: in function `omap_dma_free_chan_resources':
drivers/dma/ti/omap-dma.c:777: undefined reference to `omap_free_dma'
drivers/dma/ti/omap-dma.c:1685: undefined reference to `omap_get_plat_info'
ld: drivers/usb/gadget/udc/omap_udc.o: in function `next_in_dma':
drivers/usb/gadget/udc/omap_udc.c:820: undefined reference to `omap_get_dma_active_status'
I tried reworking it, but the resulting patch ended up much bigger than
simply avoiding the original problem of unused-function warnings like
arch/arm/mach-omap1/mcbsp.c:76:30: error: unused variable 'omap1_mcbsp_ops' [-Werror,-Wunused-variable]
As a result, revert the previous fix, and rearrange the code that
produces warnings to hide them. For mcbsp, the #ifdef check can
simply be removed as the cpu_is_omapxxx() checks already achieve
the same result, while in the io.c the easiest solution appears to
be to merge the common map bits into each soc specific portion.
This gets cleaned in a nicer way after omap7xx support gets dropped,
as the remaining SoCs all have the exact same I/O map.
Fixes: 615dce5bf7 ("ARM: omap1: fix build with no SoC selected")
Cc: stable@vger.kernel.org
Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
When CONFIG_DEBUG_INFO_BTF_MODULES=y, running 'make modules'
in the clean kernel tree will get the following error.
$ grep CONFIG_DEBUG_INFO_BTF_MODULES .config
CONFIG_DEBUG_INFO_BTF_MODULES=y
$ make -s clean
$ make modules
[snip]
AR vmlinux.a
ar: ./built-in.a: No such file or directory
make: *** [Makefile:1241: vmlinux.a] Error 1
'modules' depends on 'vmlinux', but builtin objects are not built.
Define KBUILD_BUILTIN.
Fixes: f73edc8951 ("kbuild: unify two modpost invocations")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Nathan Chancellor reports that $(NM) emits an error message when
GNU Make 4.4 is used to build the ARM zImage.
$ make-4.4 ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- O=build defconfig zImage
[snip]
LD vmlinux
NM System.map
SORTTAB vmlinux
OBJCOPY arch/arm/boot/Image
Kernel: arch/arm/boot/Image is ready
arm-linux-gnueabi-nm: 'arch/arm/boot/compressed/../../../../vmlinux': No such file
/bin/sh: 1: arithmetic expression: expecting primary: " "
LDS arch/arm/boot/compressed/vmlinux.lds
AS arch/arm/boot/compressed/head.o
GZIP arch/arm/boot/compressed/piggy_data
AS arch/arm/boot/compressed/piggy.o
CC arch/arm/boot/compressed/misc.o
This occurs since GNU Make commit 98da874c4303 ("[SV 10593] Export
variables to $(shell ...) commands"), and the O= option is needed to
reproduce it. The generated zImage is correct despite the error message.
As the commit description of 98da874c4303 [1] says, exported variables
are passed down to $(shell ) functions, which means exported recursive
variables might be expanded earlier than before, in the parse stage.
The following test code demonstrates the change for GNU Make 4.4.
[Test Makefile]
$(shell echo hello > foo)
export foo = $(shell cat bar/../foo)
$(shell mkdir bar)
all:
@echo $(foo)
[GNU Make 4.3]
$ rm -rf bar; make-4.3
hello
[GNU Make 4.4]
$ rm -rf bar; make-4.4
cat: bar/../foo: No such file or directory
hello
The 'foo' is a resursively expanded (i.e. lazily expanded) variable.
GNU Make 4.3 expands 'foo' just before running the recipe '@echo $(foo)',
at this point, the directory 'bar' exists.
GNU Make 4.4 expands 'foo' to evaluate $(shell mkdir bar) because it is
exported. At this point, the directory 'bar' does not exit yet. The cat
command cannot resolve the bar/../foo path, hence the error message.
Let's get back to the kernel Makefile.
In arch/arm/boot/compressed/Makefile, KBSS_SZ is referenced by
LDFLAGS_vmlinux, which is recursive and also exported by the top
Makefile.
GNU Make 4.3 expands KBSS_SZ just before running the recipes, so no
error message.
GNU Make 4.4 expands KBSS_SZ in the parse stage, where the directory
arm/arm/boot/compressed does not exit yet. When compiled with O=,
the output directory is created by $(shell mkdir -p $(obj-dirs))
in scripts/Makefile.build.
There are two ways to fix this particular issue:
- change "$(obj)/../../../../vmlinux" in KBSS_SZ to "vmlinux"
- unexport LDFLAGS_vmlinux
This commit takes the latter course because it is what I originally
intended.
Commit 3ec8a5b33d ("kbuild: do not export LDFLAGS_vmlinux")
unexported LDFLAGS_vmlinux.
Commit 5d4aeffbf7 ("kbuild: rebuild .vmlinux.export.o when its
prerequisite is updated") accidentally exported it again.
We can clean up arch/arm/boot/compressed/Makefile later.
[1]: https://git.savannah.gnu.org/cgit/make.git/commit/?id=98da874c43035a490cdca81331724f233a3d0c9a
Link: https://lore.kernel.org/all/Y7i8+EjwdnhHtlrr@dev-arch.thelio-3990X/
Fixes: 5d4aeffbf7 ("kbuild: rebuild .vmlinux.export.o when its prerequisite is updated")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Since commit cbf7827bc5 ("iommu/s390: Fix potential s390_domain
aperture shrinking") the s390 IOMMU driver uses reserved regions for the
system provided DMA ranges of PCI devices. Previously it reduced the
size of the IOMMU aperture and checked it on each mapping operation.
On current machines the system denies use of DMA addresses below 2^32 for
all PCI devices.
Usually mapping IOVAs in a reserved regions is harmless until a DMA
actually tries to utilize the mapping. However on s390 there is
a virtual PCI device called ISM which is implemented in firmware and
used for cross LPAR communication. Unlike real PCI devices this device
does not use the hardware IOMMU but inspects IOMMU translation tables
directly on IOTLB flush (s390 RPCIT instruction). If it detects IOVA
mappings outside the allowed ranges it goes into an error state. This
error state then causes the device to be unavailable to the KVM guest.
Analysing this we found that vfio_test_domain_fgsp() maps 2 pages at DMA
address 0 irrespective of the IOMMUs reserved regions. Even if usually
harmless this seems wrong in the general case so instead go through the
freshly updated IOVA list and try to find a range that isn't reserved,
and fits 2 pages, is PAGE_SIZE * 2 aligned. If found use that for
testing for fine grained super pages.
Fixes: af029169b8 ("vfio/type1: Check reserved region conflict and update iova list")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230110164427.4051938-2-schnelle@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When aops->write_begin() does not initialize fsdata, KMSAN may report
an error passing the latter to aops->write_end().
Fix this by unconditionally initializing fsdata.
Fixes: f2b6a16eb8 ("fs: affs convert to new aops")
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While this device uses the rk3399 it is also enclosed in a tight package
and cooled through the screen and back case. The default rk3399 thermal
limits can result in a burnt screen.
These lower limits have resulted in the existing burn not expanding and
will hopefully result in future devices not experiencing the issue.
Signed-off-by: Jarrah Gosbell <kernel@undef.tools>
Link: https://lore.kernel.org/r/20221207113212.8216-1-kernel@undef.tools
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
When multiple interfaces are present in the local interface
list, new skb copy is taken before rx processing except for
the first interface. The address translation happens each
time only on the original skb since the hdr pointer is not
updated properly to the newly created skb.
As a result frames start to drop in userspace when address
based checks or search fails.
Signed-off-by: Sriram R <quic_srirrama@quicinc.com>
Link: https://lore.kernel.org/r/20221208040050.25922-1-quic_srirrama@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a running wake_tx_queue() call is aborted due to a hw queue stop
the corresponding iTXQ is not always correctly marked for resumption:
wake_tx_push_queue() can stops the queue run without setting
@IEEE80211_TXQ_STOP_NETIF_TX.
Without the @IEEE80211_TXQ_STOP_NETIF_TX flag __ieee80211_wake_txqs()
will not schedule a new queue run and remaining frames in the queue get
stuck till another frame is queued to it.
Fix the issue for all drivers - also the ones with custom wake_tx_queue
callbacks - by moving the logic into ieee80211_tx_dequeue() and drop the
redundant @txqs_stopped.
@IEEE80211_TXQ_STOP_NETIF_TX is also renamed to @IEEE80211_TXQ_DIRTY to
better describe the flag.
Fixes: c850e31f79 ("wifi: mac80211: add internal handler for wake_tx_queue")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20221230121850.218810-1-alexander@wetzel-home.de
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix three error exit issues in expected receive setup.
Re-arrange error exits to increase readability.
Issues and fixes:
1. Possible missed page unpin if tidlist copyout fails and
not all pinned pages where made part of a TID.
Fix: Unpin the unused pages.
2. Return success with unset return values tidcnt and length
when no pages were pinned.
Fix: Return -ENOSPC if no pages were pinned.
3. Return success with unset return values tidcnt and length when
no rcvarray entries available.
Fix: Return -ENOSPC if no rcvarray entries are available.
Fixes: 7e7a436ecb ("staging/hfi1: Add TID entry program function body")
Fixes: 97736f36db ("IB/hfi1: Validate page aligned for a given virtual addres")
Fixes: f404ca4c7e ("IB/hfi1: Refactor hfi_user_exp_rcv_setup() IOCTL")
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/167328548150.1472310.1492305874804187634.stgit@awfm-02.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
When registering a new DMA MR after selecting the best aligned page size
for it, we iterate over the given sglist to split each entry to smaller,
aligned to the selected page size, DMA blocks.
In given circumstances where the sg entry and page size fit certain
sizes and the sg entry is not aligned to the selected page size, the
total size of the aligned pages we need to cover the sg entry is >= 4GB.
Under this circumstances, while iterating page aligned blocks, the
counter responsible for counting how much we advanced from the start of
the sg entry is overflowed because its type is u32 and we pass 4GB in
size. This can lead to an infinite loop inside the iterator function
because the overflow prevents the counter to be larger
than the size of the sg entry.
Fix the presented problem by changing the advancement condition to
eliminate overflow.
Backtrace:
[ 192.374329] efa_reg_user_mr_dmabuf
[ 192.376783] efa_register_mr
[ 192.382579] pgsz_bitmap 0xfffff000 rounddown 0x80000000
[ 192.386423] pg_sz [0x80000000] umem_length[0xc0000000]
[ 192.392657] start 0x0 length 0xc0000000 params.page_shift 31 params.page_num 3
[ 192.399559] hp_cnt[3], pages_in_hp[524288]
[ 192.403690] umem->sgt_append.sgt.nents[1]
[ 192.407905] number entries: [1], pg_bit: [31]
[ 192.411397] biter->__sg_nents [1] biter->__sg [0000000008b0c5d8]
[ 192.415601] biter->__sg_advance [665837568] sg_dma_len[3221225472]
[ 192.419823] biter->__sg_nents [1] biter->__sg [0000000008b0c5d8]
[ 192.423976] biter->__sg_advance [2813321216] sg_dma_len[3221225472]
[ 192.428243] biter->__sg_nents [1] biter->__sg [0000000008b0c5d8]
[ 192.432397] biter->__sg_advance [665837568] sg_dma_len[3221225472]
Fixes: a808273a49 ("RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks")
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Link: https://lore.kernel.org/r/20230109133711.13678-1-ynachum@amazon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
When changing the ebpf program put() routines to support being called
from within IRQ context the program ID was reset to zero prior to
calling the perf event and audit UNLOAD record generators, which
resulted in problems as the ebpf program ID was bogus (always zero).
This patch addresses this problem by removing an unnecessary call to
bpf_prog_free_id() in __bpf_prog_offload_destroy() and adjusting
__bpf_prog_put() to only call bpf_prog_free_id() after audit and perf
have finished their bpf program unload tasks in
bpf_prog_put_deferred(). For the record, no one can determine, or
remember, why it was necessary to free the program ID, and remove it
from the IDR, prior to executing bpf_prog_put_deferred();
regardless, both Stanislav and Alexei agree that the approach in this
patch should be safe.
It is worth noting that when moving the bpf_prog_free_id() call, the
do_idr_lock parameter was forced to true as the ebpf devs determined
this was the correct as the do_idr_lock should always be true. The
do_idr_lock parameter will be removed in a follow-up patch, but it
was kept here to keep the patch small in an effort to ease any stable
backports.
I also modified the bpf_audit_prog() logic used to associate the
AUDIT_BPF record with other associated records, e.g. @ctx != NULL.
Instead of keying off the operation, it now keys off the execution
context, e.g. '!in_irg && !irqs_disabled()', which is much more
appropriate and should help better connect the UNLOAD operations with
the associated audit state (other audit records).
Cc: stable@vger.kernel.org
Fixes: d809e134be ("bpf: Prepare bpf_prog_put() to be called from irq context.")
Reported-by: Burn Alting <burn.alting@iinet.net.au>
Reported-by: Jiri Olsa <olsajiri@gmail.com>
Suggested-by: Stanislav Fomichev <sdf@google.com>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230106154400.74211-1-paul@paul-moore.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reset controller fixes for v6.2
Avoid a build error by disabling the invalid combination of building
reset-ti-sci built-in but ti-sci as a module.
Fix a possible NULL pointer dereference in reset-uniphier-glue in case
platform_get_resource() fails.
* tag 'reset-fixes-for-v6.2' of git://git.pengutronix.de/pza/linux:
reset: uniphier-glue: Fix possible null-ptr-deref
reset: ti-sci: honor TI_SCI_PROTOCOL setting when not COMPILE_TEST
Link: https://lore.kernel.org/r/20230104095855.3809733-1-p.zabel@pengutronix.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arm SCMI fixes for v6.2
Few fixes addressing:
1. Possible compromise with the shorter message size from a misbheaving
SCMI platform firmware. The shmem accesses are now hardened to handle
the same in fetch_notification and fetch_response.
2. Possible unsafe locking scenario which is solved by calling
virtio_break_device() before getting hold of vioch->lock.
3. Possible stale error status reported from a previous message being
used again as it is not cleared.
* tag 'scmi-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Fix virtio channels cleanup on shutdown
firmware: arm_scmi: Harden shared memory access in fetch_notification
firmware: arm_scmi: Harden shared memory access in fetch_response
firmware: arm_scmi: Clear stale xfer->hdr.status
Link: https://lore.kernel.org/r/20230106093909.652657-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Memory controller drivers - fixes for v6.2
Broken in v6.2:
1. OMAP GPMC: do not fail if "gpmc,wait-pin" optional property
(introduced for v6.2) is missing.
Broken earlier:
1. Tegra MC: Drop SID override programming as it is handled by
bootloader and doing it in the kernel can cause unexpected results.
2. Atmel SDRAMC, MVEBU devbus: Add missing clock unprepare/disable in
exit and error paths.
* tag 'memory-controller-drv-fixes-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
memory: mvebu-devbus: Fix missing clk_disable_unprepare in mvebu_devbus_probe()
memory: atmel-sdramc: Fix missing clk_disable_unprepare in atmel_ramc_probe()
memory: tegra: Remove clients SID override programming
memory: omap-gpmc: fix wait pin validation
Link: https://lore.kernel.org/r/20230109150322.329614-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The current implementation of rtc-efi is expecting all the 4
time services GET{SET}_TIME{WAKEUP} must be supported by UEFI
firmware. As per the EFI_RT_PROPERTIES_TABLE, the platform
specific implementations can choose to enable selective time
services based on the RTC device capabilities.
This patch does the following changes to provide GET/SET RTC
services on platforms that do not support the WAKEUP feature.
1) Relax time services cap check when creating a platform device.
2) Clear RTC_FEATURE_ALARM bit in the absence of WAKEUP services.
3) Conditional alarm entries in '/proc/driver/rtc'.
Cc: <stable@vger.kernel.org> # v6.0+
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230102230630.192911-1-sdonthineni@nvidia.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Instead of checking the state of various backlight_properties fields
against the memorised state in atmel_lcdfb_info.bl_power,
atmel_bl_update_status() should retrieve the desired state using
backlight_get_brightness (which takes into account the power state,
blanking etc.). This means the explicit checks using props.fb_blank
and props.power can be dropped.
The backlight framework ensures that backlight is never negative, so
the test before reading the brightness from the hardware always ends
up false and the whole block can be removed. The framework retrieves
the brightness from the hardware through atmel_bl_get_brightness()
when necessary.
As a result, bl_power in struct atmel_lcdfb_info is no longer
necessary, so remove that while we're at it. Since we only ever care
about reading the current state in backlight_properties, drop the
updates at the end of the function.
Signed-off-by: Stephen Kitt <steve@sk2.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Helge Deller <deller@gmx.de>
This reverts commit d10886a4e6.
If compatible = "marvell,armadaxp-gpio", the reg property requires a
second (address, size) pair, which points to the per-CPU interrupt
registers <0x18800 0x30> / <0x18840 0x30>.
Furthermore:
Commit 5f79c651e8 explains very well, why the gpio-mvebu driver does not
work reliably with per-CPU interrupts.
Commit 988c8c0cd0 deprecates compatible = marvell,armadaxp-gpio for this
reason.
Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
This reverts commit c4de4667f1, which causes
a regression on Turris Omnia (Armada 385): GPIO interrupts cease to work,
ending up in the DSA switch being non-functional.
The blamed commit is incorrect in the first place:
If compatible = "marvell,armadaxp-gpio", the second (address, size) pair
of the reg property must to point to the per-CPU interrupt registers
<0x18800 0x30> / <0x18840 0x30>, and not to the blink enable registers
<0x181c0 0x08> / <0x181c8 0x08>.
But even fixing that leaves the GPIO interrupts broken on the Omnia.
Furthermore:
Commit 5f79c651e8 explains very well, why the gpio-mvebu driver does not
work reliably with per-CPU interrupts.
Commit 988c8c0cd0 deprecates compatible = marvell,armadaxp-gpio for this
reason.
Fixes: c4de4667f1 ("ARM: dts: armada-38x: Fix compatible string for gpios")
Reported-by: Klaus Kudielka <klaus.kudielka@gmail.com>
Link: https://lore.kernel.org/r/f24474e70c1a4e9692bd596ef6d97ceda9511245.camel@gmail.com/
Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
After commit b5aaaa666a ("ARM: pxa: add Kconfig dependencies for
ATAGS based boards"), the default PXA build no longer includes support
for the board files that are considered unused.
As a side-effect of this, the PXA310 and PXA320 support is not built
into the kernel any more, even though it should work in principle as
long as the symbols are enabled. As Robert points out, there are dts
files for zylonite and cm-x300, though those have not made it into the
mainline kernel.
Link: https://lore.kernel.org/linux-arm-kernel/m2sfglh02h.fsf@free.fr/
Reported-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
During boot of I2SE Duckbill the kernel log contains a
confusing error:
Failed to request dma
This is caused by i2c-mxs tries to request a not yet available DMA
channel (-EPROBE_DEFER). So suppress this message by using
dev_err_probe().
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
If you create MRs more than 0x10000 times after loading the module,
responder starts to reply NAKs for RDMA/Atomic operations because of rkey
violation detected in check_rkey(). The root cause is that rkeys are
incremented each time a new MR is created and the value overflows into the
range reserved for MWs.
This commit also increases the value of RXE_MAX_MW that has been limited
unlike other parameters.
Fixes: 0994a1bcd5 ("RDMA/rxe: Bump up default maximum values used via uverbs")
Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
bin2c was, as its name implies, introduced to convert a binary file to
C code.
However, I did not see any good reason ever for using this tool because
using the .incbin directive is much faster, and often results in simpler
code.
Most of the uses of bin2c have been killed, for example:
- 13610aa908 ("kernel/configs: use .incbin directive to embed config_data.gz")
- 4c0f032d49 ("s390/purgatory: Omit use of bin2c")
security/tomoyo/Makefile has even less reason for using bin2c because
the policy files are text data. So, sed is enough for converting them
to C string literals, and what is nicer, generates human-readable
builtin-policy.h.
This is the last user of bin2c. After this commit lands, bin2c will be
removed.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
[penguin-kernel: Update sed script to also escape backslash and quote ]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Ensure that i2c_mark_adapter_suspended() is always balanced by a call to
i2c_mark_adapter_resumed().
dw_i2c_plat_resume() must always be called, so that
i2c_mark_adapter_resumed() is called. This is not compatible with
DPM_FLAG_MAY_SKIP_RESUME, so remove the flag.
Since the controller is always resumed on system resume the
dw_i2c_plat_complete() callback is redundant and has been removed.
The unbalanced suspended flag was introduced by commit c57813b8b2
("i2c: designware: Lock the adapter while setting the suspended flag")
Before that commit, the system and runtime PM used the same functions. The
DPM_FLAG_MAY_SKIP_RESUME was used to skip the system resume if the driver
had been in runtime-suspend. If system resume was skipped, the suspended
flag would be cleared by the next runtime resume. The check of the
suspended flag was _after_ the call to pm_runtime_get_sync() in
i2c_dw_xfer(). So either a system resume or a runtime resume would clear
the flag before it was checked.
Having introduced the unbalanced suspended flag with that commit, a further
commit 80704a84a9
("i2c: designware: Use the i2c_mark_adapter_suspended/resumed() helpers")
changed from using a local suspended flag to using the
i2c_mark_adapter_suspended/resumed() functions. These use a flag that is
checked by I2C core code before issuing the transfer to the bus driver, so
there was no opportunity for the bus driver to runtime resume itself before
the flag check.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: c57813b8b2 ("i2c: designware: Lock the adapter while setting the suspended flag")
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
In functions i2c_dw_scl_lcnt() and i2c_dw_scl_hcnt() may have overflow
by depending on the values of the given parameters including the ic_clk.
For example in our use case where ic_clk is larger than one million,
multiplication of ic_clk * 4700 will result in 32 bit overflow.
Add cast of u64 to the calculation to avoid multiplication overflow, and
use the corresponding define for divide.
Fixes: 2373f6b974 ("i2c-designware: split of i2c-designware.c into core and bus specific parts")
Signed-off-by: Lareine Khawaly <lareine@amazon.com>
Signed-off-by: Hanna Hawa <hhhawa@amazon.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
linux,keycode should be "linux,keycodes" according binding-doc
Documentation/devicetree/bindings/input/fsl,scu-key.yaml
Fixes: f537ee7f1e ("arm64: dts: freescale: add i.MX8DXL SoC support")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL-terminated strings.
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com>
Signed-off-by: Helge Deller <deller@gmx.de>
When firmware connection manager is in use we should not touch the lane
adapter (well or any) configuration space so do this only when we know
that the software connection manager is active.
Fixes: 8e1de70425 ("thunderbolt: Add support for XDomain lane bonding")
Cc: stable@vger.kernel.org
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
We need to take minimum of both sides of the USB3 link into consideration,
not just the downstream port. Fix this by calling tb_usb3_max_link_rate()
instead.
Fixes: 0bd680cd90 ("thunderbolt: Add USB3 bandwidth management")
Cc: stable@vger.kernel.org
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
We cannot call PM runtime functions in tb_retimer_scan() because it will
also be called when retimers are scanned from userspace (happens when
there is no device connected on ChromeOS for instance) and at the same
USB4 port runtime resume hook. This leads to hang because neither can
proceed.
Fix this by runtime resuming USB4 ports in tb_scan_port() instead. This
makes sure the ports are runtime PM active when retimers are added under
it while avoiding the reported hang as well.
Reported-by: Utkarsh Patel <utkarsh.h.patel@intel.com>
Fixes: 1e56c88ade ("thunderbolt: Runtime resume USB4 port when retimers are scanned")
Cc: stable@vger.kernel.org
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
The following build warnings are seen when running:
make dtbs_check DT_SCHEMA_FILES=fsl-imx-uart.yaml
arch/arm64/boot/dts/freescale/imx8mm-venice-gw7903.dtb: serial@30860000: cts-gpios: False schema does not allow [[33, 3, 1]]
From schema: Documentation/devicetree/bindings/serial/fsl-imx-uart.yaml
arch/arm64/boot/dts/freescale/imx8mm-venice-gw7903.dtb: serial@30860000: rts-gpios: False schema does not allow [[33, 5, 1]]
From schema: Documentation/devicetree/bindings/serial/fsl-imx-uart.yaml
...
The imx8m Venice Gateworks boards do not expose the UART RTS and CTS
as native UART pins, so 'uart-has-rtscts' should not be used.
Using 'uart-has-rtscts' with 'rts-gpios' is an invalid combination
detected by serial.yaml.
Fix the problem by removing the incorrect 'uart-has-rtscts' property.
Fixes: 27c8f4ccc1 ("arm64: dts: imx8mm-venice-gw72xx-0x: add dt overlays for serial modes")
Fixes: d9a9a7cf32 ("arm64: dts: imx8m{m,n}-venice-*: add missing uart-has-rtscts property to UARTs")
Fixes: 870f645b39 ("arm64: dts: imx8mp-venice-gw74xx: add WiFi/BT module support")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The GPIO watchdog property name is 'always-running', not 'always-enabled'.
Use the correct property name and reinstate it into the DT.
Fixes: eff6b33c9c ("arm64: dts: imx8mm: Remove watchdog always-enabled property from eDM SBC")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
This change adds support for nested IPsec tunnels by ensuring that
XFRM-I verifies existing policies before decapsulating a subsequent
policies. Addtionally, this clears the secpath entries after policies
are verified, ensuring that previous tunnels with no-longer-valid
do not pollute subsequent policy checks.
This is necessary especially for nested tunnels, as the IP addresses,
protocol and ports may all change, thus not matching the previous
policies. In order to ensure that packets match the relevant inbound
templates, the xfrm_policy_check should be done before handing off to
the inner XFRM protocol to decrypt and decapsulate.
Notably, raw ESP/AH packets did not perform policy checks inherently,
whereas all other encapsulated packets (UDP, TCP encapsulated) do policy
checks after calling xfrm_input handling in the respective encapsulation
layer.
Test: Verified with additional Android Kernel Unit tests
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The kbuild test robot detected this by 'make versioncheck'.
Fixes: 2df8220cc5 ("kbuild: build init/built-in.a just once")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Commit 8d613a1d04 ("kbuild: drop $(objtree)/ prefix support
for clean-files") dropped support for prefixing with $(objtree).
Thus update the documentation to match that change.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The following kernel linkage error:
st_lsm6dsx_core.o: in function `st_lsm6dsx_sw_buffers_setup':
st_lsm6dsx_core.c:2578: undefined reference to `devm_iio_triggered_buffer_setup_ext'
is caused by the fact that the object owning devm_iio_triggered_buffer_setup_ext()
(drivers/iio/buffer/industrialio-triggered-buffer.o) is allowed to be
built as module when its user (drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c)
is built-in.
The st_lsm6dsx driver already has a "select IIO_BUFFER", so add another
select for IIO_TRIGGERED_BUFFER, to make that option follow what is set
for the st_lsm6dsx driver. This is similar to what other iio drivers do.
Fixes: 2cfb2180c3 ("iio: imu: st_lsm6dsx: introduce sw trigger support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230103130348.1733467-1-vladimir.oltean@nxp.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The 32-bit memory resource is needed for non-prefetchable memory
allocations on the PCIe bus, however with some cards (such as the
SM768) the system fails to allocate memory from this.
Checking the allocation against the datasheet, it looks like there
has been a mis-calcualation of the resource for the first memory
region (0x0060090000..0x0070ffffff) which in the data-sheet for
the fu740 (v1p2) is from 0x0060000000..0x007fffffff. Changing
this to allocate from 0x0060090000..0x007fffffff fixes the probing
issues.
Fixes: ae80d51480 ("riscv: dts: Add PCIe support for the SiFive FU740-C000 SoC")
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: stable@vger.kernel.org
Tested-by: Ron Economos <re@w6rz.net> # from IRC
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
If *.conf.default is updated, builtin-policy.h should be rebuilt,
but this does not work when compiled with O= option.
[Without this commit]
$ touch security/tomoyo/policy/exception_policy.conf.default
$ make O=/tmp security/tomoyo/
make[1]: Entering directory '/tmp'
GEN Makefile
CALL /home/masahiro/ref/linux/scripts/checksyscalls.sh
DESCEND objtool
make[1]: Leaving directory '/tmp'
[With this commit]
$ touch security/tomoyo/policy/exception_policy.conf.default
$ make O=/tmp security/tomoyo/
make[1]: Entering directory '/tmp'
GEN Makefile
CALL /home/masahiro/ref/linux/scripts/checksyscalls.sh
DESCEND objtool
POLICY security/tomoyo/builtin-policy.h
CC security/tomoyo/common.o
AR security/tomoyo/built-in.a
make[1]: Leaving directory '/tmp'
$(srctree)/ is essential because $(wildcard ) does not follow VPATH.
Fixes: f02dee2d14 ("tomoyo: Do not generate empty policy files")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
The following kernel panic can be triggered when a task with pid=1 attaches
a prog that attempts to send killing signal to itself, also see [1] for more
details:
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-09652-g59fe41b5255f #148
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106
panic+0x2c4/0x60f kernel/panic.c:275
do_exit.cold+0x63/0xe4 kernel/exit.c:789
do_group_exit+0xd4/0x2a0 kernel/exit.c:950
get_signal+0x2460/0x2600 kernel/signal.c:2858
arch_do_signal_or_restart+0x78/0x5d0 arch/x86/kernel/signal.c:306
exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
So skip task with pid=1 in bpf_send_signal_common() to avoid the panic.
[1] https://lore.kernel.org/bpf/20221222043507.33037-1-sunhao.th@gmail.com
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230106084838.12690-1-sunhao.th@gmail.com
Conor Dooley <conor@kernel.org> says:
From: Conor Dooley <conor.dooley@microchip.com>
I noticed ~today~ while looking at the isa manual that I had not
accounted for another couple of edge cases with my regex. As before, I
think attempting to validate the canonical order for multiletter stuff
makes no sense - but we should totally try to avoid false-positives for
combinations that are known to be valid.
* b4-shazam-merge:
dt-bindings: riscv: fix single letter canonical order
dt-bindings: riscv: fix underscore requirement for multi-letter extensions
Link: https://lore.kernel.org/r/20221205174459.60195-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
APR should not fail if the service device tree node does not have
the qcom,protection-domain property, since this functionality does
not exist on older platforms such as MSM8916 and MSM8996.
Ignore -EINVAL (returned when the property does not exist) to fix
a regression on 6.2-rc1 that prevents audio from working:
qcom,apr remoteproc0:smd-edge.apr_audio_svc.-1.-1:
Failed to read second value of qcom,protection-domain
qcom,apr remoteproc0:smd-edge.apr_audio_svc.-1.-1:
Failed to add apr 3 svc
Fixes: 6d7860f575 ("soc: qcom: apr: Add check for idr_alloc and of_property_read_string_index")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221229151648.19839-3-stephan@gerhold.net
The verifier skips invalid kfunc call in check_kfunc_call(), which
would be captured in fixup_kfunc_call() if such insn is not eliminated
by dead code elimination. However, this can lead to the following
warning in backtrack_insn(), also see [1]:
------------[ cut here ]------------
verifier backtracking bug
WARNING: CPU: 6 PID: 8646 at kernel/bpf/verifier.c:2756 backtrack_insn
kernel/bpf/verifier.c:2756
__mark_chain_precision kernel/bpf/verifier.c:3065
mark_chain_precision kernel/bpf/verifier.c:3165
adjust_reg_min_max_vals kernel/bpf/verifier.c:10715
check_alu_op kernel/bpf/verifier.c:10928
do_check kernel/bpf/verifier.c:13821 [inline]
do_check_common kernel/bpf/verifier.c:16289
[...]
So make backtracking conservative with this by returning ENOTSUPP.
[1] https://lore.kernel.org/bpf/CACkBjsaXNceR8ZjkLG=dT3P=4A8SBsg0Z5h5PWLryF5=ghKq=g@mail.gmail.com/
Reported-by: syzbot+4da3ff23081bafe74fc2@syzkaller.appspotmail.com
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230104014709.9375-1-sunhao.th@gmail.com
When unloading the SCMI core stack module, configured to use the virtio
SCMI transport, LOCKDEP reports the splat down below about unsafe locks
dependencies.
In order to avoid this possible unsafe locking scenario call upfront
virtio_break_device() before getting hold of vioch->lock.
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.1.0-00067-g6b934395ba07-dirty #4 Not tainted
-----------------------------------------------------
rmmod/307 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff000080c510e0 (&dev->vqs_list_lock){+.+.}-{3:3}, at: virtio_break_device+0x28/0x68
and this task is already holding:
ffff00008288ada0 (&channels[i].lock){-.-.}-{3:3}, at: virtio_chan_free+0x60/0x168 [scmi_module]
which would create a new lock dependency:
(&channels[i].lock){-.-.}-{3:3} -> (&dev->vqs_list_lock){+.+.}-{3:3}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&channels[i].lock){-.-.}-{3:3}
... which became HARDIRQ-irq-safe at:
lock_acquire+0x128/0x398
_raw_spin_lock_irqsave+0x78/0x140
scmi_vio_complete_cb+0xb4/0x3b8 [scmi_module]
vring_interrupt+0x84/0x120
vm_interrupt+0x94/0xe8
__handle_irq_event_percpu+0xb4/0x3d8
handle_irq_event_percpu+0x20/0x68
handle_irq_event+0x50/0xb0
handle_fasteoi_irq+0xac/0x138
generic_handle_domain_irq+0x34/0x50
gic_handle_irq+0xa0/0xd8
call_on_irq_stack+0x2c/0x54
do_interrupt_handler+0x8c/0x90
el1_interrupt+0x40/0x78
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x64/0x68
_raw_write_unlock_irq+0x48/0x80
ep_start_scan+0xf0/0x128
do_epoll_wait+0x390/0x858
do_compat_epoll_pwait.part.34+0x1c/0xb8
__arm64_sys_epoll_pwait+0x80/0xd0
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.3+0x98/0x120
do_el0_svc+0x34/0xd0
el0_svc+0x40/0x98
el0t_64_sync_handler+0x98/0xc0
el0t_64_sync+0x170/0x174
to a HARDIRQ-irq-unsafe lock:
(&dev->vqs_list_lock){+.+.}-{3:3}
... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0x128/0x398
_raw_spin_lock+0x58/0x70
__vring_new_virtqueue+0x130/0x1c0
vring_create_virtqueue+0xc4/0x2b8
vm_find_vqs+0x20c/0x430
init_vq+0x308/0x390
virtblk_probe+0x114/0x9b0
virtio_dev_probe+0x1a4/0x248
really_probe+0xc8/0x3a8
__driver_probe_device+0x84/0x190
driver_probe_device+0x44/0x110
__driver_attach+0x104/0x1e8
bus_for_each_dev+0x7c/0xd0
driver_attach+0x2c/0x38
bus_add_driver+0x1e4/0x258
driver_register+0x6c/0x128
register_virtio_driver+0x2c/0x48
virtio_blk_init+0x70/0xac
do_one_initcall+0x84/0x420
kernel_init_freeable+0x2d0/0x340
kernel_init+0x2c/0x138
ret_from_fork+0x10/0x20
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dev->vqs_list_lock);
local_irq_disable();
lock(&channels[i].lock);
lock(&dev->vqs_list_lock);
<Interrupt>
lock(&channels[i].lock);
*** DEADLOCK ***
================
Fixes: 42e90eb53b ("firmware: arm_scmi: Add a virtio channel refcount")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20221222183823.518856-6-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
of_clk_get_by_name() returns error pointers instead of NULL.
Use IS_ERR() checks the return value to catch errors.
Fixes: 836fb30949 ("soc: imx8m: Enable OCOTP clock before reading the register")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The GW7901 has USB2 routed to a USB VBUS supply with over-current
protection via an active-low pin. Define the OC pin polarity properly.
Fixes: 2b1649a83a ("arm64: dts: imx: Add i.mx8mm Gateworks gw7901 dts support")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
As the kcalloc may return NULL pointer,
it should be better to check the ishtp_dma_tx_map
before use in order to avoid NULL pointer dereference.
Fixes: 3703f53b99 ("HID: intel_ish-hid: ISH Transport layer")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
drain_freelist() can be called with a very large number of slabs to free,
such as for kmem_cache_shrink(), or depending on various settings of the
slab cache when doing periodic reaping.
If there is a potentially long list of slabs to drain, periodically
schedule to ensure we aren't saturating the cpu for too long.
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
dt_binding_check detects an issue with the pgc_hsiomix power
domain:
pgc: 'power-domains@17' does not match any of the regexes
This is because 'power-domains' should be 'power-domain'
Fixes: 2ae42e0c0b ("arm64: dts: imx8mp: add HSIO power-domains")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The GPC node references an interrupt parent, but it doesn't
state the interrupt itself. According to the TRM, this IRQ
is 87. This also eliminate an error detected from dt_binding_check
Fixes: fc0f051246 ("arm64: dts: imx8mp: add GPC node with GPU power domains")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Setting the device name after it has been registered confuses the sysfs
cleanup paths. This has already been fixed for the imx8m-blk-ctrl driver in
b64b46fbaa ("Revert "soc: imx: imx8m-blk-ctrl: set power device name""),
but the same problem exists in imx8mp-blk-ctrl.
Fixes: 556f5cf956 ("soc: imx: add i.MX8MP HSIO blk-ctrl")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The clk_xtal32k have clock-cells = <0>, drop the bogus specifier.
Fixes: 9509593f32 ("arm64: dts: imx8mm: Model PMIC to SNVS RTC clock path on Data Modul i.MX8M Mini eDM SBC")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Calling of_find_compatible_node() returns a node pointer with refcount
incremented. Use of_node_put() on it when done.
The patch fixes the same problem on different i.MX platforms.
Fixes: 8b88f7ef31 ("ARM: mx25: Retrieve IIM base from dt")
Fixes: 94b2bec1b0 ("ARM: imx27: Retrieve the SYSCTRL base address from devicetree")
Fixes: 3172225d45 ("ARM: imx31: Retrieve the IIM base address from devicetree")
Fixes: f68ea682d1 ("ARM: imx35: Retrieve the IIM base address from devicetree")
Fixes: ee18a7154e ("ARM: imx5: retrieve iim base from device tree")
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
For clock and strobe pad of usdhc, need to config as pull down.
Current pad config set these pad as both pull up and pull down,
this is wrong, so fix it here.
Find this issue when enable HS400ES mode on one Micron eMMC chip,
CMD8 always meet CRC error in HS400ES/HS400 mode.
Fixes: e37907bd82 ("arm64: dts: freescale: add i.MX93 11x11 EVK basic support")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Set optional `simple-audio-card,mclk-fs` parameter to ensure a proper
clock to the nau8822 audio codec. Without this change with an audio
stream rate of 44.1 kHz the playback is faster.
Set the MCLK at the right frequency, codec can properly use it to
generate 44.1 kHz I2S-FS.
Fixes: 6a57f224f7 ("arm64: dts: freescale: add initial support for verdin imx8m mini")
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
There is no "no-emmc" property, so intention for SD/SDIO only nodes was
to use "no-mmc".
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Early hardware did not support hardware handshaking on the UART, but
final production hardware did. When the hardware was updated the chip
select was changed to facilitate hardware handshaking on UART3. Fix the
ecspi2 pin mux to eliminate a pin conflict with UART3 and allow the
EEPROM to operate again.
Fixes: 4ce01ce36d ("arm64: dts: imx8mm-beacon: Enable RTS-CTS on UART3")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
"make dtbs_check":
arch/arm64/boot/dts/freescale/fsl-ls1012a-qds.dtb: pca9547@77: $nodename:0: 'pca9547@77' does not match '^(i2c-?)?mux'
From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
arch/arm64/boot/dts/freescale/fsl-ls1012a-qds.dtb: pca9547@77: Unevaluated properties are not allowed ('#address-cells', '#size-cells', 'i2c@4' were unexpected)
From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
...
Fix this by renaming PCA954x nodes to "i2c-mux", to match the I2C bus
multiplexer/switch DT bindings and the Generic Names Recommendation in
the Devicetree Specification.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
"make dtbs_check":
arch/arm/boot/dts/vf610-zii-dev-rev-b.dtb: tca9548@70: $nodename:0: 'tca9548@70' does not match '^(i2c-?)?mux'
From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
arch/arm/boot/dts/vf610-zii-dev-rev-b.dtb: tca9548@70: Unevaluated properties are not allowed ('#address-cells', '#size-cells', 'i2c@0', 'i2c@1', 'i2c@2', 'i2c@3', 'i2c@4' were unexpected)
From schema: /scratch/geert/linux/linux-renesas/Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
...
Fix this by renaming PCA9548 nodes to "i2c-mux", to match the I2C bus
multiplexer/switch DT bindings and the Generic Names Recommendation in
the Devicetree Specification.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
"make dtbs_check":
arch/arm/boot/dts/imx53-ppd.dtb: i2c-switch@70: $nodename:0: 'i2c-switch@70' does not match '^(i2c-?)?mux'
From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
arch/arm/boot/dts/imx53-ppd.dtb: i2c-switch@70: Unevaluated properties are not allowed ('#address-cells', '#size-cells', 'i2c@0', 'i2c@1', 'i2c@2', 'i2c@3', 'i2c@4', 'i2c@5', 'i2c@6', 'i2c@7' were unexpected)
From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
Fix this by renaming the PCA9547 node to "i2c-mux", to match the I2C bus
multiplexer/switch DT bindings and the Generic Names Recommendation in
the Devicetree Specification.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Set optional `simple-audio-card,mclk-fs` parameter to ensure a proper
clock to the wm8904 audio codec. Without this change with an audio
stream rate of 44.1 kHz the playback is completely distorted.
Fixes: 6a57f224f7 ("arm64: dts: freescale: add initial support for verdin imx8m mini")
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The following build warning is seen when running:
make dtbs_check DT_SCHEMA_FILES=fsl-imx-uart.yaml
arch/arm/boot/dts/imx6dl-gw560x.dtb: serial@2020000: rts-gpios: False schema does not allow [[20, 1, 0]]
From schema: Documentation/devicetree/bindings/serial/fsl-imx-uart.yaml
The imx6qdl-gw560x board does not expose the UART RTS and CTS
as native UART pins, so 'uart-has-rtscts' should not be used.
Using 'uart-has-rtscts' with 'rts-gpios' is an invalid combination
detected by serial.yaml.
Fix the problem by removing the incorrect 'uart-has-rtscts' property.
Fixes: b8a559feff ("ARM: dts: imx: add Gateworks Ventana GW5600 support")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
'clock_frequency' is not a valid property.
Use the correct 'clock-frequency' instead.
Fixes: 8b646cfb84 ("ARM: dts: imx7d-pico: Add support for the dwarf baseboard")
Fixes: 6418fd9241 ("ARM: dts: imx7d-pico: Add support for the nymph baseboard")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
'clock_frequency' is not a valid property.
Use the correct 'clock-frequency' instead.
Fixes: 47246fafef ("ARM: dts: imx6ul-pico: Add support for the dwarf baseboard")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
'regulator-compatible' is not a valid property according to
nxp,pca9450-regulator.yaml and causes the following warning:
DTC_CHK arch/arm64/boot/dts/freescale/imx8mp-dhcom-pdk2.dtb
...
pmic@25: regulators:LDO1: Unevaluated properties are not allowed ('regulator-compatible' was unexpected)
Remove the invalid 'regulator-compatible' property.
Cc: Teresa Remmet <t.remmet@phytec.de>
Fixes: 88f7f6bcca ("arm64: dts: freescale: Add support for phyBOARD-Pollux-i.MX8MP")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Teresa Remmet <t.remmet@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
NXP internal information shows that the PHY refclk is gated by the
GLOBAL_TX_PIX_CLK_EN bit, so to allow the PHY PLL to lock without the
LCDIF being already active, tie this bit to the HDMI_TX_PHY power
domain.
Fixes: e3442022f5 ("soc: imx: add i.MX8MP HDMI blk-ctrl")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
sppctl_gpio_inv_get is only used from the debugfs code inside
of an #ifdef, so we get a warning without that:
drivers/pinctrl/sunplus/sppctl.c:393:12: error: 'sppctl_gpio_inv_get' defined but not used [-Werror=unused-function]
393 | static int sppctl_gpio_inv_get(struct gpio_chip *chip, unsigned int offset)
| ^~~~~~~~~~~~~~~~~~~
Replace the #ifdef with an IS_ENABLED() check that avoids the warning.
Fixes: aa74c44be1 ("pinctrl: Add driver for Sunplus SP7021")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20221215163822.542622-1-arnd@kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Due to recent improvements of the cluster idle state support for Qcom based
platforms, we are now able to support the deepest cluster idle state. Let's
therefore revert the earlier workaround.
This reverts commit cadaa773bc ("arm64: dts: qcom: sm8250: Disable the
not yet supported cluster idle state"), which is available from v6.1-rc6.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221122123713.65631-1-ulf.hansson@linaro.org
When device is in active mode, it fails to set an ACCEL full-scale
range(2g/4g/8g) in FXOS8700_XYZ_DATA_CFG. This is not align with the
datasheet, but it is a fxos8700 chip behavior.
Keep the device in standby mode before setting ACCEL full-scale range
into FXOS8700_XYZ_DATA_CFG in chip initialization phase and setting
scale phase.
Fixes: 84e5ddd5c4 ("iio: imu: Add support for the FXOS8700 IMU")
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Link: https://lore.kernel.org/r/20221208071911.2405922-6-carlos.song@nxp.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
ACCEL output data registers contain the X-axis, Y-axis, and Z-axis
14-bit left-justified sample data and MAGN output data registers
contain the X-axis, Y-axis, and Z-axis 16-bit sample data. The ACCEL
raw register output data should be divided by 4 before sent to
userspace.
Apply a 2 bits signed right shift to the raw data from ACCEL output
data register but keep that from MAGN sensor as the origin.
Fixes: 84e5ddd5c4 ("iio: imu: Add support for the FXOS8700 IMU")
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Link: https://lore.kernel.org/r/20221208071911.2405922-5-carlos.song@nxp.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The length of ACCEL and MAGN 3-axis channels output data is 6 byte
individually. However block only read 3 bytes data into buffer from
ACCEL or MAGN output data registers every time. It causes an incomplete
ACCEL and MAGN channels readback.
Set correct value count for regmap_bulk_read to get 6 bytes ACCEL and
MAGN channels readback.
Fixes: 84e5ddd5c4 ("iio: imu: Add support for the FXOS8700 IMU")
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Link: https://lore.kernel.org/r/20221208071911.2405922-4-carlos.song@nxp.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
irq flood happen when run
cat /sys/bus/iio/devices/iio:device0/in_voltage1_raw
imx8qxp_adc_read_raw()
{
...
enable irq
/* adc start */
writel(1, adc->regs + IMX8QXP_ADR_ADC_SWTRIG);
^^^^ trigger irq flood.
wait_for_completion_interruptible_timeout();
readl(adc->regs + IMX8QXP_ADR_ADC_RESFIFO);
^^^^ clear irq here.
...
}
There is only FIFO watermark interrupt at this ADC controller.
IRQ line will be assert until software read data from FIFO.
So IRQ flood happen during wait_for_completion_interruptible_timeout().
Move FIFO read into irq handle to avoid irq flood.
Fixes: 1e23dcaa1a ("iio: imx8qxp-adc: Add driver support for NXP IMX8QXP ADC")
Cc: stable@vger.kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Cai Huoqing <cai.huoqing@linux.dev>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Link: https://lore.kernel.org/r/20221201140110.2653501-1-Frank.Li@nxp.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
PSIL_EP_NATIVE endpoints may not have PEER registers for BCNT and thus
udma_decrement_byte_counters() should not try to decrement these counters.
This fixes the issue of crypto IPERF testing where the client side (EVM)
hangs without transfer of packets to the server side, seen since this
function was added.
Fixes: 7c94dcfa8f ("dmaengine: ti: k3-udma: Reset UDMA_CHAN_RT byte counters to prevent overflow")
Signed-off-by: Jayesh Choudhary <j-choudhary@ti.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Link: https://lore.kernel.org/r/20221128085005.489964-1-j-choudhary@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
On driver unload any pending descriptors are flushed and pending
DMA descriptors are explicitly completed:
idxd_dmaengine_drv_remove() ->
drv_disable_wq() ->
idxd_wq_free_irq() ->
idxd_flush_pending_descs() ->
idxd_dma_complete_txd()
With this done during driver unload any remaining descriptor is
likely stuck and can be dropped. Even so, the descriptor may still
have a callback set that could no longer be accessible. An
example of such a problem is when the dmatest fails and the dmatest
module is unloaded. The failure of dmatest leaves descriptors with
dma_async_tx_descriptor::callback pointing to code that no longer
exist. This causes a page fault as below at the time the IDXD driver
is unloaded when it attempts to run the callback:
BUG: unable to handle page fault for address: ffffffffc0665190
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
Fix this by clearing the callback pointers on the transmit
descriptors only when workqueue is disabled.
Fixes: 403a2e2365 ("dmaengine: idxd: change MSIX allocation based on per wq activation")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/37d06b772aa7f8863ca50f90930ea2fd80b38fc3.1670452419.git.reinette.chatre@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
On driver unload any pending descriptors are flushed at the
time the interrupt is freed:
idxd_dmaengine_drv_remove() ->
drv_disable_wq() ->
idxd_wq_free_irq() ->
idxd_flush_pending_descs().
If there are any descriptors present that need to be flushed this
flow triggers a "not present" page fault as below:
BUG: unable to handle page fault for address: ff391c97c70c9040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
The address that triggers the fault is the address of the
descriptor that was freed moments earlier via:
drv_disable_wq()->idxd_wq_free_resources()
Fix the use after free by freeing the descriptors after any possible
usage. This is done after idxd_wq_reset() to ensure that the memory
remains accessible during possible completion writes by the device.
Fixes: 63c14ae6c1 ("dmaengine: idxd: refactor wq driver enable/disable operations")
Suggested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/6c4657d9cff0a0a00501a7b928297ac966e9ec9d.1670452419.git.reinette.chatre@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The workqueue is enabled when the appropriate driver is loaded and
disabled when the driver is removed. When the driver is removed it
assumes that the workqueue was enabled successfully and proceeds to
free allocations made during workqueue enabling.
Failure during workqueue enabling does not prevent the driver from
being loaded. This is because the error path within drv_enable_wq()
returns success unless a second failure is encountered
during the error path. By returning success it is possible to load
the driver even if the workqueue cannot be enabled and
allocations that do not exist are attempted to be freed during
driver remove.
Some examples of problematic flows:
(a)
idxd_dmaengine_drv_probe() -> drv_enable_wq() -> idxd_wq_request_irq():
In above flow, if idxd_wq_request_irq() fails then
idxd_wq_unmap_portal() is called on error exit path, but
drv_enable_wq() returns 0 because idxd_wq_disable() succeeds. The
driver is thus loaded successfully.
idxd_dmaengine_drv_remove()->drv_disable_wq()->idxd_wq_unmap_portal()
Above flow on driver unload triggers the WARN in devm_iounmap() because
the device resource has already been removed during error path of
drv_enable_wq().
(b)
idxd_dmaengine_drv_probe() -> drv_enable_wq() -> idxd_wq_request_irq():
In above flow, if idxd_wq_request_irq() fails then
idxd_wq_init_percpu_ref() is never called to initialize the percpu
counter, yet the driver loads successfully because drv_enable_wq()
returns 0.
idxd_dmaengine_drv_remove()->__idxd_wq_quiesce()->percpu_ref_kill():
Above flow on driver unload triggers a BUG when attempting to drop the
initial ref of the uninitialized percpu ref:
BUG: kernel NULL pointer dereference, address: 0000000000000010
Fix the drv_enable_wq() error path by returning the original error that
indicates failure of workqueue enabling. This ensures that the probe
fails when an error is encountered and the driver remove paths are only
attempted when the workqueue was enabled successfully.
Fixes: 1f2bb40337 ("dmaengine: idxd: move wq_enable() to device.c")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/e8d8116e5efa0fd14fadc5adae6ffd319f0e5ff1.1670452419.git.reinette.chatre@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Rx operation on SPI GSI DMA is currently not working.
As per GSI spec, link_rx bit is to be set on GO TRE on tx
channel whenever there is going to be a DMA TRE on rx
channel. This is currently set for duplex operation only.
Set the bit for rx operation as well.
This is part of changes required to bring up Rx.
Fixes: 94b8f0e58f ("dmaengine: qcom: gpi: set chain and link flag for duplex")
Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/1671212293-14767-1-git-send-email-quic_vnivarth@quicinc.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
On newer Tegra releases, early boot SID override programming and SID
override programming during resume is handled by bootloader.
In the function tegra186_mc_program_sid() which is getting removed, SID
override register of all clients is written without checking if secure
firmware has allowed write on it or not. If write is disabled by secure
firmware then it can lead to errors coming from secure firmware and hang
in kernel boot.
Also, SID override is programmed on-demand during probe_finalize() call
of IOMMU which is done in tegra186_mc_client_sid_override() in this same
file. This function does it correctly by checking if write is permitted
on SID override register. It also checks if SID override register is
already written with correct value and skips re-writing it in that case.
Fixes: 393d66fd2c ("memory: tegra: Implement SID override programming")
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20221125040752.12627-1-amhetre@nvidia.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Currently we return an error even if on-board retimers are found and
that's not expected. Fix this to return an error only if there was one
and 0 otherwise.
Fixes: 1e56c88ade ("thunderbolt: Runtime resume USB4 port when retimers are scanned")
Cc: stable@vger.kernel.org
Signed-off-by: Utkarsh Patel <utkarsh.h.patel@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.