Commit Graph

12349 Commits

Author SHA1 Message Date
Mateusz Guzik
c8afaa1b0f locking: remove spin_lock_prefetch
The only remaining consumer is new_inode, where it showed up in 2001 as
commit c37fa164f793 ("v2.4.9.9 -> v2.4.9.10") in a historical repo [1]
with a changelog which does not mention it.

Since then the line got only touched up to keep compiling.

While it may have been of benefit back in the day, it is guaranteed to
at best not get in the way in the multicore setting -- as the code
performs *a lot* of work between the prefetch and actual lock acquire,
any contention means the cacheline is already invalid by the time the
routine calls spin_lock().  It adds spurious traffic, for short.

On top of it prefetch is notoriously tricky to use for single-threaded
purposes, making it questionable from the get go.

As such, remove it.

I admit upfront I did not see value in benchmarking this change, but I
can do it if that is deemed appropriate.

Removal from new_inode and of the entire thing are in the same patch as
requested by Linus, so whatever weird looks can be directed at that guy.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/fs/inode.c?id=c37fa164f793735b32aa3f53154ff1a7659e6442 [1]
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-12 09:18:47 -07:00
Linus Torvalds
43972cf2de Merge tag 'x86_urgent_for_v6.5_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Do not parse the confidential computing blob on non-AMD hardware as
   it leads to an EFI config table ending up unmapped

 - Use the correct segment selector in the 32-bit version of getcpu() in
   the vDSO

 - Make sure vDSO and VVAR regions are placed in the 47-bit VA range
   even on 5-level paging systems

 - Add models 0x90-0x91 to the range of AMD Zenbleed-affected CPUs

* tag 'x86_urgent_for_v6.5_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405
  x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
  x86/linkage: Fix typo of BUILD_VDSO in asm/linkage.h
  x86/vdso: Choose the right GDT_ENTRY_CPUNODE for 32-bit getcpu() on 64-bit kernel
  x86/sev: Do not try to parse for the CC blob on non-AMD hardware
2023-08-12 08:47:01 -07:00
Linus Torvalds
272b86ba9d Merge tag 'x86_bugs_for_v6.5_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mitigation fixes from Borislav Petkov:
 "The first set of fallout fixes after the embargo madness. There will
  be another set next week too.

   - A first series of cleanups/unifications and documentation
     improvements to the SRSO and GDS mitigations code which got
     postponed to after the embargo date

   - Fix the SRSO aliasing addresses assertion so that the LLVM linker
     can parse it too"

* tag 'x86_bugs_for_v6.5_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  driver core: cpu: Fix the fallback cpu_show_gds() name
  x86: Move gds_ucode_mitigated() declaration to header
  x86/speculation: Add cpu_show_gds() prototype
  driver core: cpu: Make cpu_show_not_affected() static
  x86/srso: Fix build breakage with the LLVM linker
  Documentation/srso: Document IBPB aspect and fix formatting
  driver core: cpu: Unify redundant silly stubs
  Documentation/hw-vuln: Unify filename specification in index
2023-08-12 08:34:20 -07:00
Linus Torvalds
29d99aae13 Merge tag 'acpi-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "Rework the handling of interrupt overrides on AMD Zen-based machines
  to avoid recently introduced regressions (Hans de Goede).

  Note that this is intended as a short-term mitigation for 6.5 and the
  long-term approach will be to attempt to use the configuration left by
  the BIOS, but it requires more investigation"

* tag 'acpi-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: resource: Add IRQ override quirk for PCSpecialist Elimina Pro 16 M
  ACPI: resource: Honor MADT INT_SRC_OVR settings for IRQ1 on AMD Zen
  ACPI: resource: Always use MADT override IRQ settings for all legacy non i8042 IRQs
  ACPI: resource: revert "Remove "Zen" specific match and quirks"
2023-08-11 12:30:00 -07:00
Arnd Bergmann
eb3515dc99 x86: Move gds_ucode_mitigated() declaration to header
The declaration got placed in the .c file of the caller, but that
causes a warning for the definition:

arch/x86/kernel/cpu/bugs.c:682:6: error: no previous prototype for 'gds_ucode_mitigated' [-Werror=missing-prototypes]

Move it to a header where both sides can observe it instead.

Fixes: 81ac7e5d74 ("KVM: Add GDS_NO support to KVM")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/all/20230809130530.1913368-2-arnd%40kernel.org
2023-08-10 09:13:21 -07:00
Hans de Goede
c6a1fd910d ACPI: resource: Honor MADT INT_SRC_OVR settings for IRQ1 on AMD Zen
On AMD Zen acpi_dev_irq_override() by default prefers the DSDT IRQ 1
settings over the MADT settings.

This causes the keyboard to malfunction on some laptop models
(see Links), all models from the Links have an INT_SRC_OVR MADT entry
for IRQ 1.

Fixes: a9c4a912b7 ("ACPI: resource: Remove "Zen" specific match and quirks")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217336
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217394
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217406
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-09 21:18:46 +02:00
Borislav Petkov (AMD)
77245f1c3c x86/CPU/AMD: Do not leak quotient data after a division by 0
Under certain circumstances, an integer division by 0 which faults, can
leave stale quotient data from a previous division operation on Zen1
microarchitectures.

Do a dummy division 0/1 before returning from the #DE exception handler
in order to avoid any leaks of potentially sensitive data.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-09 07:55:00 -07:00
Jinghao Jia
7324f74d39 x86/linkage: Fix typo of BUILD_VDSO in asm/linkage.h
The BUILD_VDSO macro was incorrectly spelled as BULID_VDSO in
asm/linkage.h. This causes the !defined(BULID_VDSO) directive to always
evaluate to true.

Correct the spelling to BUILD_VDSO.

Fixes: bea75b3389 ("x86/Kconfig: Introduce function padding")
Signed-off-by: Jinghao Jia <jinghao@linux.ibm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230808182353.76218-1-jinghao@linux.ibm.com
2023-08-08 23:13:17 +02:00
Xin Li
39163d5479 x86/vdso: Choose the right GDT_ENTRY_CPUNODE for 32-bit getcpu() on 64-bit kernel
The vDSO getcpu() reads CPU ID from the GDT_ENTRY_CPUNODE entry when the RDPID
instruction is not available. And GDT_ENTRY_CPUNODE is defined as 28 on 32-bit
Linux kernel and 15 on 64-bit. But the 32-bit getcpu() on 64-bit Linux kernel
is compiled with 32-bit Linux kernel GDT_ENTRY_CPUNODE, i.e., 28, beyond the
64-bit Linux kernel GDT limit. Thus, it just fails _silently_.

When BUILD_VDSO32_64 is defined, choose the 64-bit Linux kernel GDT definitions
to compile the 32-bit getcpu().

Fixes: 877cff5296 ("x86/vdso: Fake 32bit VDSO build on 64bit compile for vgetcpu")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230322061758.10639-1-xin3.li@intel.com
Link: https://lore.kernel.org/oe-lkp/202303020903.b01fd1de-yujie.liu@intel.com
2023-08-08 09:31:43 +02:00
Linus Torvalds
64094e7e31 Merge tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/gds fixes from Dave Hansen:
 "Mitigate Gather Data Sampling issue:

   - Add Base GDS mitigation

   - Support GDS_NO under KVM

   - Fix a documentation typo"

* tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Fix backwards on/off logic about YMM support
  KVM: Add GDS_NO support to KVM
  x86/speculation: Add Kconfig option for GDS
  x86/speculation: Add force option to GDS mitigation
  x86/speculation: Add Gather Data Sampling mitigation
2023-08-07 17:03:54 -07:00
Linus Torvalds
138bcddb86 Merge tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/srso fixes from Borislav Petkov:
 "Add a mitigation for the speculative RAS (Return Address Stack)
  overflow vulnerability on AMD processors.

  In short, this is yet another issue where userspace poisons a
  microarchitectural structure which can then be used to leak privileged
  information through a side channel"

* tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/srso: Tie SBPB bit setting to microcode patch detection
  x86/srso: Add a forgotten NOENDBR annotation
  x86/srso: Fix return thunks in generated code
  x86/srso: Add IBPB on VMEXIT
  x86/srso: Add IBPB
  x86/srso: Add SRSO_NO support
  x86/srso: Add IBPB_BRTYPE support
  x86/srso: Add a Speculative RAS Overflow mitigation
  x86/bugs: Increase the x86 bugs vector size to two u32s
2023-08-07 16:35:44 -07:00
Linus Torvalds
024ff300db Merge tag 'hyperv-fixes-signed-20230804' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:

 - Fix a bug in a python script for Hyper-V (Ani Sinha)

 - Workaround a bug in Hyper-V when IBT is enabled (Michael Kelley)

 - Fix an issue parsing MP table when Linux runs in VTL2 (Saurabh
   Sengar)

 - Several cleanup patches (Nischala Yelchuri, Kameron Carr, YueHaibing,
   ZhiHu)

* tag 'hyperv-fixes-signed-20230804' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer()
  x86/hyperv: add noop functions to x86_init mpparse functions
  vmbus_testing: fix wrong python syntax for integer value comparison
  x86/hyperv: fix a warning in mshyperv.h
  x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction
  x86/hyperv: Improve code for referencing hyperv_pcpu_input_arg
  Drivers: hv: Change hv_free_hyperv_page() to take void * argument
2023-08-04 17:16:14 -07:00
Linus Torvalds
98a05fe8cd Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "x86:

   - Do not register IRQ bypass consumer if posted interrupts not
     supported

   - Fix missed device interrupt due to non-atomic update of IRR

   - Use GFP_KERNEL_ACCOUNT for pid_table in ipiv

   - Make VMREAD error path play nice with noinstr

   - x86: Acquire SRCU read lock when handling fastpath MSR writes

   - Support linking rseq tests statically against glibc 2.35+

   - Fix reference count for stats file descriptors

   - Detect userspace setting invalid CR0

  Non-KVM:

   - Remove coccinelle script that has caused multiple confusion
     ("debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE()
     usage", acked by Greg)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: selftests: Expand x86's sregs test to cover illegal CR0 values
  KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest
  KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid
  Revert "debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage"
  KVM: selftests: Verify stats fd is usable after VM fd has been closed
  KVM: selftests: Verify stats fd can be dup()'d and read
  KVM: selftests: Verify userspace can create "redundant" binary stats files
  KVM: selftests: Explicitly free vcpus array in binary stats test
  KVM: selftests: Clean up stats fd in common stats_test() helper
  KVM: selftests: Use pread() to read binary stats header
  KVM: Grab a reference to KVM for VM and vCPU stats file descriptors
  selftests/rseq: Play nice with binaries statically linked against glibc 2.35+
  Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"
  KVM: x86: Acquire SRCU read lock when handling fastpath MSR writes
  KVM: VMX: Use vmread_error() to report VM-Fail in "goto" path
  KVM: VMX: Make VMREAD error path play nice with noinstr
  KVM: x86/irq: Conditionally register IRQ bypass consumer again
  KVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipiv
  KVM: x86: check the kvm_cpu_get_interrupt result before using it
  KVM: x86: VMX: set irr_pending in kvm_apic_update_irr
  ...
2023-07-30 11:19:08 -07:00
Sean Christopherson
26a0652cb4 KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid,
e.g. due to setting bits 63:32, illegal combinations, or to a value that
isn't allowed in VMX (non-)root mode.  The VMX checks in particular are
"fun" as failure to disallow Real Mode for an L2 that is configured with
unrestricted guest disabled, when KVM itself has unrestricted guest
enabled, will result in KVM forcing VM86 mode to virtual Real Mode for
L2, but then fail to unwind the related metadata when synthesizing a
nested VM-Exit back to L1 (which has unrestricted guest enabled).

Opportunistically fix a benign typo in the prototype for is_valid_cr4().

Cc: stable@vger.kernel.org
Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29 11:05:31 -04:00
Borislav Petkov (AMD)
d893832d0e x86/srso: Add IBPB on VMEXIT
Add the option to flush IBPB only on VMEXIT in order to protect from
malicious guests but one otherwise trusts the software that runs on the
hypervisor.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:19 +02:00
Borislav Petkov (AMD)
233d6f68b9 x86/srso: Add IBPB
Add the option to mitigate using IBPB on a kernel entry. Pull in the
Retbleed alternative so that the IBPB call from there can be used. Also,
if Retbleed mitigation is done using IBPB, the same mitigation can and
must be used here.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:19 +02:00
Borislav Petkov (AMD)
1b5277c0ea x86/srso: Add SRSO_NO support
Add support for the CPUID flag which denotes that the CPU is not
affected by SRSO.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:19 +02:00
Borislav Petkov (AMD)
79113e4060 x86/srso: Add IBPB_BRTYPE support
Add support for the synthetic CPUID flag which "if this bit is 1,
it indicates that MSR 49h (PRED_CMD) bit 0 (IBPB) flushes all branch
type predictions from the CPU branch predictor."

This flag is there so that this capability in guests can be detected
easily (otherwise one would have to track microcode revisions which is
impossible for guests).

It is also needed only for Zen3 and -4. The other two (Zen1 and -2)
always flush branch type predictions by default.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:19 +02:00
Borislav Petkov (AMD)
fb3bd914b3 x86/srso: Add a Speculative RAS Overflow mitigation
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence.  To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference.  In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:14 +02:00
ZhiHu
060f2b979c x86/hyperv: fix a warning in mshyperv.h
The following checkpatch warning is removed:
  WARNING: Use #include <linux/io.h> instead of <asm/io.h>

Signed-off-by: ZhiHu <huzhi001@208suo.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-07-23 23:12:47 +00:00
Daniel Sneddon
8974eb5882 x86/speculation: Add Gather Data Sampling mitigation
Gather Data Sampling (GDS) is a hardware vulnerability which allows
unprivileged speculative access to data which was previously stored in
vector registers.

Intel processors that support AVX2 and AVX512 have gather instructions
that fetch non-contiguous data elements from memory. On vulnerable
hardware, when a gather instruction is transiently executed and
encounters a fault, stale data from architectural or internal vector
registers may get transiently stored to the destination vector
register allowing an attacker to infer the stale data using typical
side channel techniques like cache timing attacks.

This mitigation is different from many earlier ones for two reasons.
First, it is enabled by default and a bit must be set to *DISABLE* it.
This is the opposite of normal mitigation polarity. This means GDS can
be mitigated simply by updating microcode and leaving the new control
bit alone.

Second, GDS has a "lock" bit. This lock bit is there because the
mitigation affects the hardware security features KeyLocker and SGX.
It needs to be enabled and *STAY* enabled for these features to be
mitigated against GDS.

The mitigation is enabled in the microcode by default. Disable it by
setting gather_data_sampling=off or by disabling all mitigations with
mitigations=off. The mitigation status can be checked by reading:

    /sys/devices/system/cpu/vulnerabilities/gather_data_sampling

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-07-19 16:45:37 -07:00
Borislav Petkov (AMD)
0e52740ffd x86/bugs: Increase the x86 bugs vector size to two u32s
There was never a doubt in my mind that they would not fit into a single
u32 eventually.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-18 09:35:38 +02:00
Borislav Petkov (AMD)
522b1d6921 x86/cpu/amd: Add a Zenbleed fix
Add a fix for the Zen2 VZEROUPPER data corruption bug where under
certain circumstances executing VZEROUPPER can cause register
corruption or leak data.

The optimal fix is through microcode but in the case the proper
microcode revision has not been applied, enable a fallback fix using
a chicken bit.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-17 15:48:10 +02:00
Brian Gerst
3aec4ecb3d x86: Rewrite ret_from_fork() in C
When kCFI is enabled, special handling is needed for the indirect call
to the kernel thread function.  Rewrite the ret_from_fork() function in
C so that the compiler can properly handle the indirect call.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230623225529.34590-3-brgerst@gmail.com
2023-07-10 09:52:25 +02:00
Peter Zijlstra
be0fffa5ca x86/alternative: Rename apply_ibt_endbr()
The current name doesn't reflect what it does very well.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230622144321.427441595%40infradead.org
2023-07-10 09:52:23 +02:00
Peter Zijlstra
0479a42d4c x86/cfi: Extend {JMP,CAKK}_NOSPEC comment
With the introduction of kCFI these helpers are no longer equivalent
to C indirect calls and should be used with care.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230622144321.360957723%40infradead.org
2023-07-10 09:52:23 +02:00
Linus Torvalds
e8069f5a8e Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Eager page splitting optimization for dirty logging, optionally
     allowing for a VM to avoid the cost of hugepage splitting in the
     stage-2 fault path.

   - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact
     with services that live in the Secure world. pKVM intervenes on
     FF-A calls to guarantee the host doesn't misuse memory donated to
     the hyp or a pKVM guest.

   - Support for running the split hypervisor with VHE enabled, known as
     'hVHE' mode. This is extremely useful for testing the split
     hypervisor on VHE-only systems, and paves the way for new use cases
     that depend on having two TTBRs available at EL2.

   - Generalized framework for configurable ID registers from userspace.
     KVM/arm64 currently prevents arbitrary CPU feature set
     configuration from userspace, but the intent is to relax this
     limitation and allow userspace to select a feature set consistent
     with the CPU.

   - Enable the use of Branch Target Identification (FEAT_BTI) in the
     hypervisor.

   - Use a separate set of pointer authentication keys for the
     hypervisor when running in protected mode, as the host is untrusted
     at runtime.

   - Ensure timer IRQs are consistently released in the init failure
     paths.

   - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization
     Traps (FEAT_EVT), as it is a register commonly read from userspace.

   - Erratum workaround for the upcoming AmpereOne part, which has
     broken hardware A/D state management.

  RISC-V:

   - Redirect AMO load/store misaligned traps to KVM guest

   - Trap-n-emulate AIA in-kernel irqchip for KVM guest

   - Svnapot support for KVM Guest

  s390:

   - New uvdevice secret API

   - CMM selftest and fixes

   - fix racy access to target CPU for diag 9c

  x86:

   - Fix missing/incorrect #GP checks on ENCLS

   - Use standard mmu_notifier hooks for handling APIC access page

   - Drop now unnecessary TR/TSS load after VM-Exit on AMD

   - Print more descriptive information about the status of SEV and
     SEV-ES during module load

   - Add a test for splitting and reconstituting hugepages during and
     after dirty logging

   - Add support for CPU pinning in demand paging test

   - Add support for AMD PerfMonV2, with a variety of cleanups and minor
     fixes included along the way

   - Add a "nx_huge_pages=never" option to effectively avoid creating NX
     hugepage recovery threads (because nx_huge_pages=off can be toggled
     at runtime)

   - Move handling of PAT out of MTRR code and dedup SVM+VMX code

   - Fix output of PIC poll command emulation when there's an interrupt

   - Add a maintainer's handbook to document KVM x86 processes,
     preferred coding style, testing expectations, etc.

   - Misc cleanups, fixes and comments

  Generic:

   - Miscellaneous bugfixes and cleanups

  Selftests:

   - Generate dependency files so that partial rebuilds work as
     expected"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
  Documentation/process: Add a maintainer handbook for KVM x86
  Documentation/process: Add a label for the tip tree handbook's coding style
  KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
  RISC-V: KVM: Remove unneeded semicolon
  RISC-V: KVM: Allow Svnapot extension for Guest/VM
  riscv: kvm: define vcpu_sbi_ext_pmu in header
  RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
  RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel emulation of AIA APLIC
  RISC-V: KVM: Implement device interface for AIA irqchip
  RISC-V: KVM: Skeletal in-kernel AIA irqchip support
  RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
  RISC-V: KVM: Add APLIC related defines
  RISC-V: KVM: Add IMSIC related defines
  RISC-V: KVM: Implement guest external interrupt line management
  KVM: x86: Remove PRIx* definitions as they are solely for user space
  s390/uv: Update query for secret-UVCs
  s390/uv: replace scnprintf with sysfs_emit
  s390/uvdevice: Add 'Lock Secret Store' UVC
  ...
2023-07-03 15:32:22 -07:00
Paolo Bonzini
751d77fefa Merge tag 'kvm-x86-pmu-6.5' of https://github.com/kvm-x86/linux into HEAD
KVM x86/pmu changes for 6.5:

 - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
   included along the way
2023-07-01 07:18:51 -04:00
Linus Torvalds
cccf0c2ee5 Merge tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:

 - Add new feature to have function graph tracer record the return
   value. Adds a new option: funcgraph-retval ; when set, will show the
   return value of a function in the function graph tracer.

 - Also add the option: funcgraph-retval-hex where if it is not set, and
   the return value is an error code, then it will return the decimal of
   the error code, otherwise it still reports the hex value.

 - Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd
   That when a application opens it, it becomes the task that the timer
   lat tracer traces. The application can also read this file to find
   out how it's being interrupted.

 - Add the file /sys/kernel/tracing/available_filter_functions_addrs
   that works just the same as available_filter_functions but also shows
   the addresses of the functions like kallsyms, except that it gives
   the address of where the fentry/mcount jump/nop is. This is used by
   BPF to make it easier to attach BPF programs to ftrace hooks.

 - Replace strlcpy with strscpy in the tracing boot code.

* tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix warnings when building htmldocs for function graph retval
  riscv: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  tracing/boot: Replace strlcpy with strscpy
  tracing/timerlat: Add user-space interface
  tracing/osnoise: Skip running osnoise if all instances are off
  tracing/osnoise: Switch from PF_NO_SETAFFINITY to migrate_disable
  ftrace: Show all functions with addresses in available_filter_functions_addrs
  selftests/ftrace: Add funcgraph-retval test case
  LoongArch: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex
  function_graph: Support recording and printing the return value of function
  fgraph: Add declaration of "struct fgraph_ret_regs"
2023-06-30 10:33:17 -07:00
Linus Torvalds
1b722407a1 Merge tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
 "There is one set of patches to misc for a i915 gsc/mei proxy driver.

  Otherwise it's mostly amdgpu/i915/msm, lots of hw enablement and lots
  of refactoring.

  core:
   - replace strlcpy with strscpy
   - EDID changes to support further conversion to struct drm_edid
   - Move i915 DSC parameter code to common DRM helpers
   - Add Colorspace functionality

  aperture:
   - ignore framebuffers with non-primary devices

  fbdev:
   - use fbdev i/o helpers
   - add Kconfig options for fb_ops helpers
   - use new fb io helpers directly in drivers

  sysfs:
   - export DRM connector ID

  scheduler:
   - Avoid an infinite loop

  ttm:
   - store function table in .rodata
   - Add query for TTM mem limit
   - Add NUMA awareness to pools
   - Export ttm_pool_fini()

  bridge:
   - fsl-ldb: support i.MX6SX
   - lt9211, lt9611: remove blanking packets
   - tc358768: implement input bus formats, devm cleanups
   - ti-snd65dsi86: implement wait_hpd_asserted
   - analogix: fix endless probe loop
   - samsung-dsim: support swapped clock, fix enabling, support var
     clock
   - display-connector: Add support for external power supply
   - imx: Fix module linking
   - tc358762: Support reset GPIO

  panel:
   - nt36523: Support Lenovo J606F
   - st7703: Support Anbernic RG353V-V2
   - InnoLux G070ACE-L01 support
   - boe-tv101wum-nl6: Improve initialization
   - sharp-ls043t1le001: Mode fixes
   - simple: BOE EV121WXM-N10-1850, S6D7AA0
   - Ampire AM-800480L1TMQW-T00H
   - Rocktech RK043FN48H
   - Starry himax83102-j02
   - Starry ili9882t

  amdgpu:
   - add new ctx query flag to handle reset better
   - add new query/set shadow buffer for rdna3
   - DCN 3.2/3.1.x/3.0.x updates
   - Enable DC_FP on loongarch
   - PCIe fix for RDNA2
   - improve DC FAMS/SubVP support for better power management
   - partition support for lots of engines
   - Take NUMA into account when allocating memory
   - Add new DRM_AMDGPU_WERROR config parameter to help with CI
   - Initial SMU13 overdrive support
   - Add support for new colorspace KMS API
   - W=1 fixes

  amdkfd:
   - Query TTM mem limit rather than hardcoding it
   - GC 9.4.3 partition support
   - Handle NUMA for partitions
   - Add debugger interface for enabling gdb
   - Add KFD event age tracking

  radeon:
   - Fix possible UAF

  i915:
   - new getparam for PXP support
   - GSC/MEI proxy driver
   - Meteorlake display enablement
   - avoid clearing preallocated framebuffers with TTM
   - implement framebuffer mmap support
   - Disable sampler indirect state in bindless heap
   - Enable fdinfo for GuC backends
   - GuC loading and firmware table handling fixes
   - Various refactors for multi-tile enablement
   - Define MOCS and PAT tables for MTL
   - GSC/MEI support for Meteorlake
   - PMU multi-tile support
   - Large driver kernel doc cleanup
   - Allow VRR toggling and arbitrary refresh rates
   - Support async flips on linear buffers on display ver 12+
   - Expose CRTC CTM property on ILK/SNB/VLV
   - New debugfs for display clock frequencies
   - Hotplug refactoring
   - Display refactoring
   - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
   - Use large rings for compute contexts
   - HuC loading for MTL
   - Allow user to set cache at BO creation
   - MTL powermanagement enhancements
   - Switch to dedicated workqueues to stop using flush_scheduled_work()
   - Move display runtime init under display/
   - Remove 10bit gamma on desktop gen3 parts, they don't support it

  habanalabs:
   - uapi: return 0 for user queries if there was a h/w or f/w error
   - Add pci health check when we lose connection with the firmware.
     This can be used to distinguish between pci link down and firmware
     getting stuck.
   - Add more info to the error print when TPC interrupt occur.
   - Firmware fixes

  msm:
   - Adreno A660 bindings
   - SM8350 MDSS bindings fix
   - Added support for DPU on sm6350 and sm6375 platforms
   - Implemented tearcheck support to support vsync on SM150 and newer
     platforms
   - Enabled missing features (DSPP, DSC, split display) on sc8180x,
     sc8280xp, sm8450
   - Added support for DSI and 28nm DSI PHY on MSM8226 platform
   - Added support for DSI on sm6350 and sm6375 platforms
   - Added support for display controller on MSM8226 platform
   - A690 GPU support
   - Move cmdstream dumping out of fence signaling path
   - a610 support
   - Support for a6xx devices without GMU

  nouveau:
   - NULL ptr before deref fixes

  armada:
   - implement fbdev emulation as client

  sun4i:
   - fix mipi-dsi dotclock
   - release clocks

  vc4:
   - rgb range toggle property
   - BT601 / BT2020 HDMI support

  vkms:
   - convert to drmm helpers
   - add reflection and rotation support
   - fix rgb565 conversion

  gma500:
   - fix iomem access

  shmobile:
   - support renesas soc platform
   - enable fbdev

  mxsfb:
   - Add support for i.MX93 LCDIF

  stm:
   - dsi: Use devm_ helper
   - ltdc: Fix potential invalid pointer deref

  renesas:
   - Group drivers in renesas subdirectory to prepare for new platform
   - Drop deprecated R-Car H3 ES1.x support

  meson:
   - Add support for MIPI DSI displays

  virtio:
   - add sync object support

  mediatek:
   - Add display binding document for MT6795"

* tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm: (1791 commits)
  drm/i915: Fix a NULL vs IS_ERR() bug
  drm/i915: make i915_drm_client_fdinfo() reference conditional again
  drm/i915/huc: Fix missing error code in intel_huc_init()
  drm/i915/gsc: take a wakeref for the proxy-init-completion check
  drm/msm/a6xx: Add A610 speedbin support
  drm/msm/a6xx: Add A619_holi speedbin support
  drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
  drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
  drm/msm/a6xx: Fix some A619 tunables
  drm/msm/a6xx: Add A610 support
  drm/msm/a6xx: Add support for A619_holi
  drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
  drm/msm/a6xx: Introduce GMU wrapper support
  drm/msm/a6xx: Move CX GMU power counter enablement to hw_init
  drm/msm/a6xx: Extend and explain UBWC config
  drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
  drm/msm/a6xx: Add a helper for software-resetting the GPU
  drm/msm/a6xx: Improve a6xx_bus_clear_pending_transactions()
  drm/msm/a6xx: Move a6xx_bus_clear_pending_transactions to a6xx_gpu
  drm/msm/a6xx: Move force keepalive vote removal to a6xx_gmu_force_off()
  ...
2023-06-29 11:00:17 -07:00
Linus Torvalds
77b1a7f7a0 Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-mm updates from Andrew Morton:

 - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level
   directories

 - Douglas Anderson has added a new "buddy" mode to the hardlockup
   detector. It permits the detector to work on architectures which
   cannot provide the required interrupts, by having CPUs periodically
   perform checks on other CPUs

 - Zhen Lei has enhanced kexec's ability to support two crash regions

 - Petr Mladek has done a lot of cleanup on the hard lockup detector's
   Kconfig entries

 - And the usual bunch of singleton patches in various places

* tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
  kernel/time/posix-stubs.c: remove duplicated include
  ocfs2: remove redundant assignment to variable bit_off
  watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
  powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
  devres: show which resource was invalid in __devm_ioremap_resource()
  watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
  watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
  watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
  watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
  watchdog/hardlockup: make the config checks more straightforward
  watchdog/hardlockup: sort hardlockup detector related config values a logical way
  watchdog/hardlockup: move SMP barriers from common code to buddy code
  watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
  watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()
  watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
  watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
  watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
  watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
  watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
  watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
  ...
2023-06-28 10:59:38 -07:00
Linus Torvalds
6f612579be Merge tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molar:
 "Build footprint & performance improvements:

   - Reduce memory usage with CONFIG_DEBUG_INFO=y

     In the worst case of an allyesconfig+CONFIG_DEBUG_INFO=y kernel,
     DWARF creates almost 200 million relocations, ballooning objtool's
     peak heap usage to 53GB. These patches reduce that to 25GB.

     On a distro-type kernel with kernel IBT enabled, they reduce
     objtool's peak heap usage from 4.2GB to 2.8GB.

     These changes also improve the runtime significantly.

  Debuggability improvements:

   - Add the unwind_debug command-line option, for more extend unwinding
     debugging output
   - Limit unreachable warnings to once per function
   - Add verbose option for disassembling affected functions
   - Include backtrace in verbose mode
   - Detect missing __noreturn annotations
   - Ignore exc_double_fault() __noreturn warnings
   - Remove superfluous global_noreturns entries
   - Move noreturn function list to separate file
   - Add __kunit_abort() to noreturns

  Unwinder improvements:

   - Allow stack operations in UNWIND_HINT_UNDEFINED regions
   - drm/vmwgfx: Add unwind hints around RBP clobber

  Cleanups:

   - Move the x86 entry thunk restore code into thunk functions
   - x86/unwind/orc: Use swap() instead of open coding it
   - Remove unnecessary/unused variables

  Fixes for modern stack canary handling"

* tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  x86/orc: Make the is_callthunk() definition depend on CONFIG_BPF_JIT=y
  objtool: Skip reading DWARF section data
  objtool: Free insns when done
  objtool: Get rid of reloc->rel[a]
  objtool: Shrink elf hash nodes
  objtool: Shrink reloc->sym_reloc_entry
  objtool: Get rid of reloc->jump_table_start
  objtool: Get rid of reloc->addend
  objtool: Get rid of reloc->type
  objtool: Get rid of reloc->offset
  objtool: Get rid of reloc->idx
  objtool: Get rid of reloc->list
  objtool: Allocate relocs in advance for new rela sections
  objtool: Add for_each_reloc()
  objtool: Don't free memory in elf_close()
  objtool: Keep GElf_Rel[a] structs synced
  objtool: Add elf_create_section_pair()
  objtool: Add mark_sec_changed()
  objtool: Fix reloc_hash size
  objtool: Consolidate rel/rela handling
  ...
2023-06-27 15:05:41 -07:00
Linus Torvalds
4d6751815b Merge tag 'x86-mm-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:

 - Remove Xen-PV leftovers from init_32.c

 - Fix __swp_entry_to_pte() warning splat for Xen PV guests, triggered
   on CONFIG_DEBUG_VM_PGTABLE=y

* tag 'x86-mm-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Remove Xen-PV leftovers from init_32.c
  x86/mm: Fix __swp_entry_to_pte() for Xen PV guests
2023-06-27 14:47:17 -07:00
Linus Torvalds
a193cc7506 Merge tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:

 - Rework & fix the event forwarding logic by extending the core
   interface.

   This fixes AMD PMU events that have to be forwarded from the
   core PMU to the IBS PMU.

 - Add self-tests to test AMD IBS invocation via core PMU events

 - Clean up Intel FixCntrCtl MSR encoding & handling

* tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Re-instate the linear PMU search
  perf/x86/intel: Define bit macros for FixCntrCtl MSR
  perf test: Add selftest to test IBS invocation via core pmu events
  perf/core: Remove pmu linear searching code
  perf/ibs: Fix interface via core pmu events
  perf/core: Rework forwarding of {task|cpu}-clock events
2023-06-27 14:43:02 -07:00
Linus Torvalds
bc6cb4d5bc Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()

   The cmpxchg128() family of functions is basically & functionally the
   same as cmpxchg_double(), but with a saner interface.

   Instead of a 6-parameter horror that forced u128 - u64/u64-halves
   layout details on the interface and exposed users to complexity,
   fragility & bugs, use a natural 3-parameter interface with u128
   types.

 - Restructure the generated atomic headers, and add kerneldoc comments
   for all of the generic atomic{,64,_long}_t operations.

   The generated definitions are much cleaner now, and come with
   documentation.

 - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
   taking multiple locks of the same type.

   This gets rid of one use of lockdep_set_novalidate_class() in the
   bcache code.

 - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
   shadowing generating garbage code on Clang on certain ARM builds.

* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
  percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
  locking/atomic: treewide: delete arch_atomic_*() kerneldoc
  locking/atomic: docs: Add atomic operations to the driver basic API documentation
  locking/atomic: scripts: generate kerneldoc comments
  docs: scripts: kernel-doc: accept bitwise negation like ~@var
  locking/atomic: scripts: simplify raw_atomic*() definitions
  locking/atomic: scripts: simplify raw_atomic_long*() definitions
  locking/atomic: scripts: split pfx/name/sfx/order
  locking/atomic: scripts: restructure fallback ifdeffery
  locking/atomic: scripts: build raw_atomic_long*() directly
  locking/atomic: treewide: use raw_atomic*_<op>()
  locking/atomic: scripts: add trivial raw_atomic*_<op>()
  locking/atomic: scripts: factor out order template generation
  locking/atomic: scripts: remove leftover "${mult}"
  locking/atomic: scripts: remove bogus order parameter
  locking/atomic: xtensa: add preprocessor symbols
  locking/atomic: x86: add preprocessor symbols
  locking/atomic: sparc: add preprocessor symbols
  locking/atomic: sh: add preprocessor symbols
  ...
2023-06-27 14:14:30 -07:00
Linus Torvalds
ed3b7923a8 Merge tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Scheduler SMP load-balancer improvements:

   - Avoid unnecessary migrations within SMT domains on hybrid systems.

     Problem:

        On hybrid CPU systems, (processors with a mixture of
        higher-frequency SMT cores and lower-frequency non-SMT cores),
        under the old code lower-priority CPUs pulled tasks from the
        higher-priority cores if more than one SMT sibling was busy -
        resulting in many unnecessary task migrations.

     Solution:

        The new code improves the load balancer to recognize SMT cores
        with more than one busy sibling and allows lower-priority CPUs
        to pull tasks, which avoids superfluous migrations and lets
        lower-priority cores inspect all SMT siblings for the busiest
        queue.

   - Implement the 'runnable boosting' feature in the EAS balancer:
     consider CPU contention in frequency, EAS max util & load-balance
     busiest CPU selection.

     This improves CPU utilization for certain workloads, while leaves
     other key workloads unchanged.

  Scheduler infrastructure improvements:

   - Rewrite the scheduler topology setup code by consolidating it into
     the build_sched_topology() helper function and building it
     dynamically on the fly.

   - Resolve the local_clock() vs. noinstr complications by rewriting
     the code: provide separate sched_clock_noinstr() and
     local_clock_noinstr() functions to be used in instrumentation code,
     and make sure it is all instrumentation-safe.

  Fixes:

   - Fix a kthread_park() race with wait_woken()

   - Fix misc wait_task_inactive() bugs unearthed by the -rt merge:
       - Fix UP PREEMPT bug by unifying the SMP and UP implementations
       - Fix task_struct::saved_state handling

   - Fix various rq clock update bugs, unearthed by turning on the rq
     clock debugging code.

   - Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger
     by creating enough cgroups, by removing the warnign and restricting
     window size triggers to PSI file write-permission or
     CAP_SYS_RESOURCE.

   - Propagate SMT flags in the topology when removing degenerate domain

   - Fix grub_reclaim() calculation bug in the deadline scheduler code

   - Avoid resetting the min update period when it is unnecessary, in
     psi_trigger_destroy().

   - Don't balance a task to its current running CPU in load_balance(),
     which was possible on certain NUMA topologies with overlapping
     groups.

   - Fix the sched-debug printing of rq->nr_uninterruptible

  Cleanups:

   - Address various -Wmissing-prototype warnings, as a preparation to
     (maybe) enable this warning in the future.

   - Remove unused code

   - Mark more functions __init

   - Fix shadow-variable warnings"

* tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()
  sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
  sched/core: Fixed missing rq clock update before calling set_rq_offline()
  sched/deadline: Update GRUB description in the documentation
  sched/deadline: Fix bandwidth reclaim equation in GRUB
  sched/wait: Fix a kthread_park race with wait_woken()
  sched/topology: Mark set_sched_topology() __init
  sched/fair: Rename variable cpu_util eff_util
  arm64/arch_timer: Fix MMIO byteswap
  sched/fair, cpufreq: Introduce 'runnable boosting'
  sched/fair: Refactor CPU utilization functions
  cpuidle: Use local_clock_noinstr()
  sched/clock: Provide local_clock_noinstr()
  x86/tsc: Provide sched_clock_noinstr()
  clocksource: hyper-v: Provide noinstr sched_clock()
  clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
  x86/vdso: Fix gettimeofday masking
  math64: Always inline u128 version of mul_u64_u64_shr()
  s390/time: Provide sched_clock_noinstr()
  loongarch: Provide noinstr sched_clock_read()
  ...
2023-06-27 14:03:21 -07:00
Linus Torvalds
12dc010071 Merge tag 'x86_sev_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:

 - Some SEV and CC platform helpers cleanup and simplifications now that
   the usage patterns are becoming apparent

[ I'm sure I'm the only one that has gets confused by all the TLAs, but
  in case there are others: here SEV is AMD's "Secure Encrypted
  Virtualization" and CC is generic "Confidential Computing".

  There's also Intel SGX (Software Guard Extensions) and TDX (Trust
  Domain Extensions), along with all the vendor memory encryption
  extensions (SME, TSME, TME, and WTF).

  And then we have arm64 with RMA and CCA, and I probably forgot another
  dozen or so related acronyms    - Linus ]

* tag 'x86_sev_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/coco: Get rid of accessor functions
  x86/sev: Get rid of special sev_es_enable_key
  x86/coco: Mark cc_platform_has() and descendants noinstr
2023-06-27 13:26:30 -07:00
Linus Torvalds
dc43fc753b Merge tag 'x86_mtrr_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mtrr updates from Borislav Petkov:
 "A serious scrubbing of the MTRR code including adding a new map
  mechanism in order to look up the memory type of a region easily.

  Also address memory range lookup issues like returning an invalid
  memory type. Furthermore, this handles the decoupling of PAT from MTRR
  more naturally.

  All work by Juergen Gross"

* tag 'x86_mtrr_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/xen: Set default memory type for PV guests to WB
  x86/mtrr: Unify debugging printing
  x86/mtrr: Remove unused code
  x86/mm: Only check uniform after calling mtrr_type_lookup()
  x86/mtrr: Don't let mtrr_type_lookup() return MTRR_TYPE_INVALID
  x86/mtrr: Use new cache_map in mtrr_type_lookup()
  x86/mtrr: Add mtrr=debug command line option
  x86/mtrr: Construct a memory map with cache modes
  x86/mtrr: Add get_effective_type() service function
  x86/mtrr: Allocate mtrr_value array dynamically
  x86/mtrr: Move 32-bit code from mtrr.c to legacy.c
  x86/mtrr: Have only one set_mtrr() variant
  x86/mtrr: Replace vendor tests in MTRR code
  x86/xen: Set MTRR state when running as Xen PV initial domain
  x86/hyperv: Set MTRR state when running as SEV-SNP Hyper-V guest
  x86/mtrr: Support setting MTRR state for software defined MTRRs
  x86/mtrr: Replace size_or_mask and size_and_mask with a much easier concept
  x86/mtrr: Remove physical address size calculation
2023-06-27 13:11:32 -07:00
Linus Torvalds
19300488c9 Merge tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Dave Hansen:
 "As usual, these are all over the map. The biggest cluster is work from
  Arnd to eliminate -Wmissing-prototype warnings:

   - Address -Wmissing-prototype warnings

   - Remove repeated 'the' in comments

   - Remove unused current_untag_mask()

   - Document urgent tip branch timing

   - Clean up MSR kernel-doc notation

   - Clean up paravirt_ops doc

   - Update Srivatsa S. Bhat's maintained areas

   - Remove unused extern declaration acpi_copy_wakeup_routine()"

* tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86/acpi: Remove unused extern declaration acpi_copy_wakeup_routine()
  Documentation: virt: Clean up paravirt_ops doc
  x86/mm: Remove unused current_untag_mask()
  x86/mm: Remove repeated word in comments
  x86/lib/msr: Clean up kernel-doc notation
  x86/platform: Avoid missing-prototype warnings for OLPC
  x86/mm: Add early_memremap_pgprot_adjust() prototype
  x86/usercopy: Include arch_wb_cache_pmem() declaration
  x86/vdso: Include vdso/processor.h
  x86/mce: Add copy_mc_fragile_handle_tail() prototype
  x86/fbdev: Include asm/fb.h as needed
  x86/hibernate: Declare global functions in suspend.h
  x86/entry: Add do_SYSENTER_32() prototype
  x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled()
  x86/mm: Include asm/numa.h for set_highmem_pages_init()
  x86: Avoid missing-prototype warnings for doublefault code
  x86/fpu: Include asm/fpu/regset.h
  x86: Add dummy prototype for mk_early_pgtbl_32()
  x86/pci: Mark local functions as 'static'
  x86/ftrace: Move prepare_ftrace_return prototype to header
  ...
2023-06-26 16:43:54 -07:00
Linus Torvalds
5dfe7a7e52 Merge tag 'x86_tdx_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tdx updates from Dave Hansen:

 - Fix a race window where load_unaligned_zeropad() could cause a fatal
   shutdown during TDX private<=>shared conversion

   The race has never been observed in practice but might allow
   load_unaligned_zeropad() to catch a TDX page in the middle of its
   conversion process which would lead to a fatal and unrecoverable
   guest shutdown.

 - Annotate sites where VM "exit reasons" are reused as hypercall
   numbers.

* tag 'x86_tdx_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix enc_status_change_finish_noop()
  x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()
  x86/mm: Allow guest.enc_status_change_prepare() to fail
  x86/tdx: Wrap exit reason with hcall_func()
2023-06-26 16:32:47 -07:00
Linus Torvalds
36db314440 Merge tag 'x86_platform_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Dave Hansen:
 "Allow CPUs in SGX/HPE Ultraviolet to start using Sub-NUMA clustering
  (SNC) mode. SNC has been around outside the UV world for a while but
  evidently never worked on UV systems.

  SNC is rather notorious for breaking bad assumptions of a 1:1
  relationship between physical sockets and NUMA nodes. The UV code was
  rather prolific with these assumptions and took quite a bit of
  refactoring to remove them"

* tag 'x86_platform_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Update UV[23] platform code for SNC
  x86/platform/uv: Remove remaining BUG_ON() and BUG() calls
  x86/platform/uv: UV support for sub-NUMA clustering
  x86/platform/uv: Helper functions for allocating and freeing conversion tables
  x86/platform/uv: When searching for minimums, start at INT_MAX not 99999
  x86/platform/uv: Fix printed information in calc_mmioh_map
  x86/platform/uv: Introduce helper function uv_pnode_to_socket.
  x86/platform/uv: Add platform resolving #defines for misc GAM_MMIOH_REDIRECT*
2023-06-26 16:26:44 -07:00
Linus Torvalds
941d77c773 Merge tag 'x86_cpu_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:

 - Compute the purposeful misalignment of zen_untrain_ret automatically
   and assert __x86_return_thunk's alignment so that future changes to
   the symbol macros do not accidentally break them.

 - Remove CONFIG_X86_FEATURE_NAMES Kconfig option as its existence is
   pointless

* tag 'x86_cpu_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retbleed: Add __x86_return_thunk alignment checks
  x86/cpu: Remove X86_FEATURE_NAMES
  x86/Kconfig: Make X86_FEATURE_NAMES non-configurable in prompt
2023-06-26 15:42:34 -07:00
Linus Torvalds
2c96136a3f Merge tag 'x86_cc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 confidential computing update from Borislav Petkov:

 - Add support for unaccepted memory as specified in the UEFI spec v2.9.

   The gist of it all is that Intel TDX and AMD SEV-SNP confidential
   computing guests define the notion of accepting memory before using
   it and thus preventing a whole set of attacks against such guests
   like memory replay and the like.

   There are a couple of strategies of how memory should be accepted -
   the current implementation does an on-demand way of accepting.

* tag 'x86_cc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  virt: sevguest: Add CONFIG_CRYPTO dependency
  x86/efi: Safely enable unaccepted memory in UEFI
  x86/sev: Add SNP-specific unaccepted memory support
  x86/sev: Use large PSC requests if applicable
  x86/sev: Allow for use of the early boot GHCB for PSC requests
  x86/sev: Put PSC struct on the stack in prep for unaccepted memory support
  x86/sev: Fix calculation of end address based on number of pages
  x86/tdx: Add unaccepted memory support
  x86/tdx: Refactor try_accept_one()
  x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in boot stub
  efi/unaccepted: Avoid load_unaligned_zeropad() stepping into unaccepted memory
  efi: Add unaccepted memory support
  x86/boot/compressed: Handle unaccepted memory
  efi/libstub: Implement support for unaccepted memory
  efi/x86: Get full memory map in allocate_e820()
  mm: Add support for unaccepted memory
2023-06-26 15:32:39 -07:00
Linus Torvalds
8c69e7afe9 Merge tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 instruction alternatives updates from Borislav Petkov:

 - Up until now the Fast Short Rep Mov optimizations implied the
   presence of the ERMS CPUID flag. AMD decoupled them with a BIOS
   setting so decouple that dependency in the kernel code too

 - Teach the alternatives machinery to handle relocations

 - Make debug_alternative accept flags in order to see only that set of
   patching done one is interested in

 - Other fixes, cleanups and optimizations to the patching code

* tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternative: PAUSE is not a NOP
  x86/alternatives: Add cond_resched() to text_poke_bp_batch()
  x86/nospec: Shorten RESET_CALL_DEPTH
  x86/alternatives: Add longer 64-bit NOPs
  x86/alternatives: Fix section mismatch warnings
  x86/alternative: Optimize returns patching
  x86/alternative: Complicate optimize_nops() some more
  x86/alternative: Rewrite optimize_nops() some
  x86/lib/memmove: Decouple ERMS from FSRM
  x86/alternative: Support relocations in alternatives
  x86/alternative: Make debug-alternative selective
2023-06-26 15:14:55 -07:00
Linus Torvalds
88afbb21d4 Merge tag 'x86-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Thomas Gleixner:
 "A set of fixes for kexec(), reboot and shutdown issues:

   - Ensure that the WBINVD in stop_this_cpu() has been completed before
     the control CPU proceedes.

     stop_this_cpu() is used for kexec(), reboot and shutdown to park
     the APs in a HLT loop.

     The control CPU sends an IPI to the APs and waits for their CPU
     online bits to be cleared. Once they all are marked "offline" it
     proceeds.

     But stop_this_cpu() clears the CPU online bit before issuing
     WBINVD, which means there is no guarantee that the AP has reached
     the HLT loop.

     This was reported to cause intermittent reboot/shutdown failures
     due to some dubious interaction with the firmware.

     This is not only a problem of WBINVD. The code to actually "stop"
     the CPU which runs between clearing the online bit and reaching the
     HLT loop can cause large enough delays on its own (think
     virtualization). That's especially dangerous for kexec() as kexec()
     expects that all APs are in a safe state and not executing code
     while the boot CPU jumps to the new kernel. There are more issues
     vs kexec() which are addressed separately.

     Cure this by implementing an explicit synchronization point right
     before the AP reaches HLT. This guarantees that the AP has
     completed the full stop proceedure.

   - Fix the condition for WBINVD in stop_this_cpu().

     The WBINVD in stop_this_cpu() is required for ensuring that when
     switching to or from memory encryption no dirty data is left in the
     cache lines which might cause a write back in the wrong more later.

     This checks CPUID directly because the feature bit might have been
     cleared due to a command line option.

     But that CPUID check accesses leaf 0x8000001f::EAX unconditionally.
     Intel CPUs return the content of the highest supported leaf when a
     non-existing leaf is read, while AMD CPUs return all zeros for
     unsupported leafs.

     So the result of the test on Intel CPUs is lottery and on AMD its
     just correct by chance.

     While harmless it's incorrect and causes the conditional wbinvd()
     to be issued where not required, which caused the above issue to be
     unearthed.

   - Make kexec() robust against AP code execution

     Ashok observed triple faults when doing kexec() on a system which
     had been booted with "nosmt".

     It turned out that the SMT siblings which had been brought up
     partially are parked in mwait_play_dead() to enable power savings.

     mwait_play_dead() is monitoring the thread flags of the AP's idle
     task, which has been chosen as it's unlikely to be written to.

     But kexec() can overwrite the previous kernel text and data
     including page tables etc. When it overwrites the cache lines
     monitored by an AP that AP resumes execution after the MWAIT on
     eventually overwritten text, stack and page tables, which obviously
     might end up in a triple fault easily.

     Make this more robust in several steps:

      1) Use an explicit per CPU cache line for monitoring.

      2) Write a command to these cache lines to kick APs out of MWAIT
         before proceeding with kexec(), shutdown or reboot.

         The APs confirm the wakeup by writing status back and then
         enter a HLT loop.

      3) If the system uses INIT/INIT/STARTUP for AP bringup, park the
         APs in INIT state.

         HLT is not a guarantee that an AP won't wake up and resume
         execution. HLT is woken up by NMI and SMI. SMI puts the CPU
         back into HLT (+/- firmware bugs), but NMI is delivered to the
         CPU which executes the NMI handler. Same issue as the MWAIT
         scenario described above.

         Sending an INIT/INIT sequence to the APs puts them into wait
         for STARTUP state, which is safe against NMI.

     There is still an issue remaining which can't be fixed: #MCE

     If the AP sits in HLT and receives a broadcast #MCE it will try to
     handle it with the obvious consequences.

     INIT/INIT clears CR4.MCE in the AP which will cause a broadcast
     #MCE to shut down the machine.

     So there is a choice between fire (HLT) and frying pan (INIT).
     Frying pan has been chosen as it's at least preventing the NMI
     issue.

     On systems which are not using INIT/INIT/STARTUP there is not much
     which can be done right now, but at least the obvious and easy to
     trigger MWAIT issue has been addressed"

* tag 'x86-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smp: Put CPUs into INIT on shutdown if possible
  x86/smp: Split sending INIT IPI out into a helper function
  x86/smp: Cure kexec() vs. mwait_play_dead() breakage
  x86/smp: Use dedicated cache-line for mwait_play_dead()
  x86/smp: Remove pointless wmb()s from native_stop_other_cpus()
  x86/smp: Dont access non-existing CPUID leaf
  x86/smp: Make stop_other_cpus() more robust
2023-06-26 14:45:53 -07:00
Linus Torvalds
9244724fbf Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP updates from Thomas Gleixner:
 "A large update for SMP management:

   - Parallel CPU bringup

     The reason why people are interested in parallel bringup is to
     shorten the (kexec) reboot time of cloud servers to reduce the
     downtime of the VM tenants.

     The current fully serialized bringup does the following per AP:

       1) Prepare callbacks (allocate, intialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state

     There are two significant delays:

       #3 The time for an AP to report alive state in start_secondary()
          on x86 has been measured in the range between 350us and 3.5ms
          depending on vendor and CPU type, BIOS microcode size etc.

       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads depending
          on the microcode patch size to apply.

     On a two socket SKL server with 56 cores (112 threads) the boot CPU
     spends on current mainline about 800ms busy waiting for the APs to
     come up and apply microcode. That's more than 80% of the actual
     onlining procedure.

     This can be reduced significantly by splitting the bringup
     mechanism into two parts:

       1) Run the prepare callbacks and kick the AP alive for each AP
          which needs to be brought up.

          The APs wake up, do their firmware initialization and run the
          low level kernel startup code including microcode loading in
          parallel up to the first synchronization point. (#1 and #2
          above)

       2) Run the rest of the bringup code strictly serialized per CPU
          (#3 - #5 above) as it's done today.

          Parallelizing that stage of the CPU bringup might be possible
          in theory, but it's questionable whether required surgery
          would be justified for a pretty small gain.

     If the system is large enough the first AP is already waiting at
     the first synchronization point when the boot CPU finished the
     wake-up of the last AP. That reduces the AP bringup time on that
     SKL from ~800ms to ~80ms, i.e. by a factor ~10x.

     The actual gain varies wildly depending on the system, CPU,
     microcode patch size and other factors. There are some
     opportunities to reduce the overhead further, but that needs some
     deep surgery in the x86 CPU bringup code.

     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP capable architectures.

   - Enhancements for SMP function call tracing so it is possible to
     locate the scheduling and the actual execution points. That allows
     to measure IPI delivery time precisely"

* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  trace,smp: Add tracepoints for scheduling remotelly called functions
  trace,smp: Add tracepoints around remotelly called functions
  MAINTAINERS: Add CPU HOTPLUG entry
  x86/smpboot: Fix the parallel bringup decision
  x86/realmode: Make stack lock work in trampoline_compat()
  x86/smp: Initialize cpu_primary_thread_mask late
  cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
  x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
  x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Implement a bit spinlock to protect the realmode stack
  x86/apic: Save the APIC virtual base address
  cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
  x86/apic: Provide cpu_primary_thread mask
  x86/smpboot: Enable split CPU startup
  cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
  cpu/hotplug: Reset task stack state in _cpu_up()
  cpu/hotplug: Remove unused state functions
  riscv: Switch to hotplug core state synchronization
  parisc: Switch to hotplug core state synchronization
  ...
2023-06-26 13:59:56 -07:00
Linus Torvalds
7cffdbe360 Merge tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Thomas Gleixner:
 "Initialize FPU late.

  Right now FPU is initialized very early during boot. There is no real
  requirement to do so. The only requirement is to have it done before
  alternatives are patched.

  That's done in check_bugs() which does way more than what the function
  name suggests.

  So first rename check_bugs() to arch_cpu_finalize_init() which makes
  it clear what this is about.

  Move the invocation of arch_cpu_finalize_init() earlier in
  start_kernel() as it has to be done before fork_init() which needs to
  know the FPU register buffer size.

  With those prerequisites the FPU initialization can be moved into
  arch_cpu_finalize_init(), which removes it from the early and fragile
  part of the x86 bringup"

* tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build
  x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
  x86/fpu: Mark init functions __init
  x86/fpu: Remove cpuinfo argument from init functions
  x86/init: Initialize signal frame size late
  init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
  init: Invoke arch_cpu_finalize_init() earlier
  init: Remove check_bugs() leftovers
  um/cpu: Switch to arch_cpu_finalize_init()
  sparc/cpu: Switch to arch_cpu_finalize_init()
  sh/cpu: Switch to arch_cpu_finalize_init()
  mips/cpu: Switch to arch_cpu_finalize_init()
  m68k/cpu: Switch to arch_cpu_finalize_init()
  loongarch/cpu: Switch to arch_cpu_finalize_init()
  ia64/cpu: Switch to arch_cpu_finalize_init()
  ARM: cpu: Switch to arch_cpu_finalize_init()
  x86/cpu: Switch to arch_cpu_finalize_init()
  init: Provide arch_cpu_finalize_init()
2023-06-26 13:39:10 -07:00
Linus Torvalds
64bf6ae93e Merge tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "Miscellaneous features, cleanups, and fixes for vfs and individual fs

  Features:

   - Use mode 0600 for file created by cachefilesd so it can be run by
     unprivileged users. This aligns them with directories which are
     already created with mode 0700 by cachefilesd

   - Reorder a few members in struct file to prevent some false sharing
     scenarios

   - Indicate that an eventfd is used a semaphore in the eventfd's
     fdinfo procfs file

   - Add a missing uapi header for eventfd exposing relevant uapi
     defines

   - Let the VFS protect transitions of a superblock from read-only to
     read-write in addition to the protection it already provides for
     transitions from read-write to read-only. Protecting read-only to
     read-write transitions allows filesystems such as ext4 to perform
     internal writes, keeping writers away until the transition is
     completed

  Cleanups:

   - Arnd removed the architecture specific arch_report_meminfo()
     prototypes and added a generic one into procfs.h. Note, we got a
     report about a warning in amdpgpu codepaths that suggested this was
     bisectable to this change but we concluded it was a false positive

   - Remove unused parameters from split_fs_names()

   - Rename put_and_unmap_page() to unmap_and_put_page() to let the name
     reflect the order of the cleanup operation that has to unmap before
     the actual put

   - Unexport buffer_check_dirty_writeback() as it is not used outside
     of block device aops

   - Stop allocating aio rings from highmem

   - Protecting read-{only,write} transitions in the VFS used open-coded
     barriers in various places. Replace them with proper little helpers
     and document both the helpers and all barrier interactions involved
     when transitioning between read-{only,write} states

   - Use flexible array members in old readdir codepaths

  Fixes:

   - Use the correct type __poll_t for epoll and eventfd

   - Replace all deprecated strlcpy() invocations, whose return value
     isn't checked with an equivalent strscpy() call

   - Fix some kernel-doc warnings in fs/open.c

   - Reduce the stack usage in jffs2's xattr codepaths finally getting
     rid of this: fs/jffs2/xattr.c:887:1: error: the frame size of 1088
     bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
     royally annoying compilation warning

   - Use __FMODE_NONOTIFY instead of FMODE_NONOTIFY where an int and not
     fmode_t is required to avoid fmode_t to integer degradation
     warnings

   - Create coredumps with O_WRONLY instead of O_RDWR. There's a long
     explanation in that commit how O_RDWR is actually a bug which we
     found out with the help of Linus and git archeology

   - Fix "no previous prototype" warnings in the pipe codepaths

   - Add overflow calculations for remap_verify_area() as a signed
     addition overflow could be triggered in xfstests

   - Fix a null pointer dereference in sysv

   - Use an unsigned variable for length calculations in jfs avoiding
     compilation warnings with gcc 13

   - Fix a dangling pipe pointer in the watch queue codepath

   - The legacy mount option parser provided as a fallback by the VFS
     for filesystems not yet converted to the new mount api did prefix
     the generated mount option string with a leading ',' causing issues
     for some filesystems

   - Fix a repeated word in a comment in fs.h

   - autofs: Update the ctime when mtime is updated as mandated by
     POSIX"

* tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
  readdir: Replace one-element arrays with flexible-array members
  fs: Provide helpers for manipulating sb->s_readonly_remount
  fs: Protect reconfiguration of sb read-write from racing writes
  eventfd: add a uapi header for eventfd userspace APIs
  autofs: set ctime as well when mtime changes on a dir
  eventfd: show the EFD_SEMAPHORE flag in fdinfo
  fs/aio: Stop allocating aio rings from HIGHMEM
  fs: Fix comment typo
  fs: unexport buffer_check_dirty_writeback
  fs: avoid empty option when generating legacy mount string
  watch_queue: prevent dangling pipe pointer
  fs.h: Optimize file struct to prevent false sharing
  highmem: Rename put_and_unmap_page() to unmap_and_put_page()
  cachefiles: Allow the cache to be non-root
  init: remove unused names parameter in split_fs_names()
  jfs: Use unsigned variable for length calculations
  fs/sysv: Null check to prevent null-ptr-deref bug
  fs: use UB-safe check for signed addition overflow in remap_verify_area
  procfs: consolidate arch_report_meminfo declaration
  fs: pipe: reveal missing function protoypes
  ...
2023-06-26 09:50:21 -07:00
Donglin Peng
d938ba1768 x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for the for
the funcgraph-retval, and this modification makes it available on
the x86 platform.

We introduce a new structure called fgraph_ret_regs for the x86
platform to hold return registers and the frame pointer. We then
fill its content in the return_to_handler and pass its address
to the function ftrace_return_to_handler to record the return
value.

Link: https://lkml.kernel.org/r/53a506f0f18ff4b7aeb0feb762f1c9a5e9b83ee9.1680954589.git.pengdonglin@sangfor.com.cn

Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-06-20 18:38:38 -04:00
Thomas Gleixner
45e34c8af5 x86/smp: Put CPUs into INIT on shutdown if possible
Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can
resume execution due to NMI, SMI and MCE, which has the same issue as the
MWAIT loop.

Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.

A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.

So chose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230615193330.608657211@linutronix.de
2023-06-20 14:51:47 +02:00