Commit Graph

1279771 Commits

Author SHA1 Message Date
Paolo Bonzini
58ef24699b KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
The guest memory population logic will need to know what page size or level
(4K, 2M, ...) is mapped.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-ID: <eabc3f3e5eb03b370cadf6e1901ea34d7a020adc.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-12 11:17:36 -04:00
Sean Christopherson
f5e7f00cf1 KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
Move the accounting of the result of kvm_mmu_do_page_fault() to its
callers, as only pf_fixed is common to guest page faults and async #PFs,
and upcoming support KVM_PRE_FAULT_MEMORY won't bump _any_ stats.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-12 11:17:35 -04:00
Sean Christopherson
5186ec223b KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
Account stat.pf_taken in kvm_mmu_page_fault(), i.e. the actual page fault
handler, instead of conditionally bumping it in kvm_mmu_do_page_fault().
The "real" page fault handler is the only path that should ever increment
the number of taken page faults, as all other paths that "do page fault"
are by definition not handling faults that occurred in the guest.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-12 11:17:35 -04:00
Isaku Yamahata
bc1a5cd002 KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
Add a new ioctl KVM_PRE_FAULT_MEMORY in the KVM common code. It iterates on the
memory range and calls the arch-specific function.  The implementation is
optional and enabled by a Kconfig symbol.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Message-ID: <819322b8f25971f2b9933bfa4506e618508ad782.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-12 11:17:35 -04:00
Isaku Yamahata
9aed7a6c0b KVM: Document KVM_PRE_FAULT_MEMORY ioctl
Adds documentation of KVM_PRE_FAULT_MEMORY ioctl. [1]

It populates guest memory.  It doesn't do extra operations on the
underlying technology-specific initialization [2].  For example,
CoCo-related operations won't be performed.  Concretely for TDX, this API
won't invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND().  Vendor-specific APIs
are required for such operations.

The key point is to adapt of vcpu ioctl instead of VM ioctl.  First,
populating guest memory requires vcpu.  If it is VM ioctl, we need to pick
one vcpu somehow.  Secondly, vcpu ioctl allows each vcpu to invoke this
ioctl in parallel.  It helps to scale regarding guest memory size, e.g.,
hundreds of GB.

[1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/
[2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-ID: <9a060293c9ad9a78f1d8994cfe1311e818e99257.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-12 11:17:35 -04:00
Paolo Bonzini
02b0d3b9d4 Merge branch 'kvm-6.10-fixes' into HEAD 2024-06-20 17:31:50 -04:00
Isaku Yamahata
8a4e2742a5 KVM: x86/tdp_mmu: Sprinkle __must_check
The TDP MMU function __tdp_mmu_set_spte_atomic uses a cmpxchg64 to replace
the SPTE value and returns -EBUSY on failure.  The caller must check the
return value and retry.  Add __must_check to it, as well as to two more
functions that forward the return value of __tdp_mmu_set_spte_atomic to
their caller.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-Id: <8f7d5a1b241bf5351eaab828d1a1efe5c17699ca.1705965635.git.isaku.yamahata@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20 17:28:44 -04:00
Paolo Bonzini
d81473840c KVM: interrupt kvm_gmem_populate() on signals
kvm_gmem_populate() is a potentially lengthy operation that can involve
multiple calls to the firmware.  Interrupt it if a signal arrives.

Fixes: 1f6c06b177 ("KVM: guest_memfd: Add interface for populating gmem pages with user data")
Cc: Isaku Yamahata <isaku.yamahata@intel.com>
Cc: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20 17:28:44 -04:00
Bibo Mao
676f819c3e KVM: Discard zero mask with function kvm_dirty_ring_reset
Function kvm_reset_dirty_gfn may be called with parameters cur_slot /
cur_offset / mask are all zero, it does not represent real dirty page.
It is not necessary to clear dirty page in this condition. Also return
value of macro __fls() is undefined if mask is zero which is called in
funciton kvm_reset_dirty_gfn(). Here just return.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20240613122803.1031511-1-maobibo@loongson.cn>
[Move the conditional inside kvm_reset_dirty_gfn; suggested by
 Sean Christopherson. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20 17:20:11 -04:00
Paolo Bonzini
c31745d2c5 virt: guest_memfd: fix reference leak on hwpoisoned page
If kvm_gmem_get_pfn() detects an hwpoisoned page, it returns -EHWPOISON
but it does not put back the reference that kvm_gmem_get_folio() had
grabbed.  Add the forgotten folio_put().

Fixes: a7800aa80e ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Cc: stable@vger.kernel.org
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20 17:12:11 -04:00
Alexey Dobriyan
f474092c6f kvm: do not account temporary allocations to kmem
Some allocations done by KVM are temporary, they are created as result
of program actions, but can't exists for arbitrary long times.

They should have been GFP_TEMPORARY (rip!).

OTOH, kvm-nx-lpage-recovery and kvm-pit kernel threads exist for as long
as VM exists but their task_struct memory is not accounted.
This is story for another day.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Message-ID: <c0122f66-f428-417e-a360-b25fc0f154a0@p183>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20 14:19:12 -04:00
Sean Christopherson
b018589013 MAINTAINERS: Drop Wanpeng Li as a Reviewer for KVM Paravirt support
Drop Wanpeng as a KVM PARAVIRT reviewer as his @tencent.com email is
bouncing, and according to lore[*], the last activity from his @gmail.com
address was almost two years ago.

[*] https://lore.kernel.org/all/CANRm+Cwj29M9HU3=JRUOaKDR+iDKgr0eNMWQi0iLkR5THON-bg@mail.gmail.com

Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240610163427.3359426-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20 14:18:08 -04:00
Sean Christopherson
f3ced000a2 KVM: x86: Always sync PIR to IRR prior to scanning I/O APIC routes
Sync pending posted interrupts to the IRR prior to re-scanning I/O APIC
routes, irrespective of whether the I/O APIC is emulated by userspace or
by KVM.  If a level-triggered interrupt routed through the I/O APIC is
pending or in-service for a vCPU, KVM needs to intercept EOIs on said
vCPU even if the vCPU isn't the destination for the new routing, e.g. if
servicing an interrupt using the old routing races with I/O APIC
reconfiguration.

Commit fceb3a36c2 ("KVM: x86: ioapic: Fix level-triggered EOI and
userspace I/OAPIC reconfigure race") fixed the common cases, but
kvm_apic_pending_eoi() only checks if an interrupt is in the local
APIC's IRR or ISR, i.e. misses the uncommon case where an interrupt is
pending in the PIR.

Failure to intercept EOI can manifest as guest hangs with Windows 11 if
the guest uses the RTC as its timekeeping source, e.g. if the VMM doesn't
expose a more modern form of time to the guest.

Cc: stable@vger.kernel.org
Cc: Adamos Ttofari <attofari@amazon.de>
Cc: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240611014845.82795-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20 14:18:02 -04:00
Ravi Bangoria
f99b052256 KVM: SNP: Fix LBR Virtualization for SNP guest
SEV-ES and thus SNP guest mandates LBR Virtualization to be _always_ ON.
Although commit b7e4be0a22 ("KVM: SEV-ES: Delegate LBR virtualization
to the processor") did the correct change for SEV-ES guests, it missed
the SNP. Fix it.

Reported-by: Srikanth Aithal <sraithal@amd.com>
Fixes: b7e4be0a22 ("KVM: SEV-ES: Delegate LBR virtualization to the processor")
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240605114810.1304-1-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-05 07:50:50 -04:00
Tao Su
db574f2f96 KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attr
Drop the second snapshot of mmu_invalidate_seq in kvm_faultin_pfn().
Before checking the mismatch of private vs. shared, mmu_invalidate_seq is
saved to fault->mmu_seq, which can be used to detect an invalidation
related to the gfn occurred, i.e. KVM will not install a mapping in page
table if fault->mmu_seq != mmu_invalidate_seq.

Currently there is a second snapshot of mmu_invalidate_seq, which may not
be same as the first snapshot in kvm_faultin_pfn(), i.e. the gfn attribute
may be changed between the two snapshots, but the gfn may be mapped in
page table without hindrance. Therefore, drop the second snapshot as it
has no obvious benefits.

Fixes: f6adeae81f ("KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()")
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Message-ID: <20240528102234.2162763-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-05 06:45:06 -04:00
Paolo Bonzini
45ce0314bf Merge tag 'kvmarm-fixes-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.10, take #1

- Large set of FP/SVE fixes for pKVM, addressing the fallout
  from the per-CPU data rework and making sure that the host
  is not involved in the FP/SVE switching any more

- Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH
  is copletely supported

- Fix for the respective priorities of Failed PAC, Illegal
  Execution state and Instruction Abort exceptions

- Fix the handling of AArch32 instruction traps failing their
  condition code, which was broken by the introduction of
  ESR_EL2.ISS2

- Allow vpcus running in AArch32 state to be restored in
  System mode

- Fix AArch32 GPR restore that would lose the 64 bit state
  under some conditions
2024-06-05 06:32:18 -04:00
Fuad Tabba
afb91f5f8a KVM: arm64: Ensure that SME controls are disabled in protected mode
KVM (and pKVM) do not support SME guests. Therefore KVM ensures
that the host's SME state is flushed and that SME controls for
enabling access to ZA storage and for streaming are disabled.

pKVM needs to protect against a buggy/malicious host. Ensure that
it wouldn't run a guest when protected mode is enabled should any
of the SME controls be enabled.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
a69283ae1d KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx format
When setting/clearing CPACR bits for EL0 and EL1, use the ELx
format of the bits, which covers both. This makes the code
clearer, and reduces the chances of accidentally missing a bit.

No functional change intended.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
1696fc2174 KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVM
Now that we have introduced finalize_init_hyp_mode(), lets
consolidate the initializing of the host_data fpsimd_state and
sve state.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
b5b9955617 KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM
When running in protected mode we don't want to leak protected
guest state to the host, including whether a guest has used
fpsimd/sve. Therefore, eagerly restore the host state on guest
exit when running in protected mode, which happens only if the
guest has used fpsimd/sve.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
66d5b53e20 KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM
Protected mode needs to maintain (save/restore) the host's sve
state, rather than relying on the host kernel to do that. This is
to avoid leaking information to the host about guests and the
type of operations they are performing.

As a first step towards that, allocate memory mapped at hyp, per
cpu, for the host sve state. The following patch will use this
memory to save/restore the host state.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
e511e08a9f KVM: arm64: Specialize handling of host fpsimd state on trap
In subsequent patches, n/vhe will diverge on saving the host
fpsimd/sve state when taking a guest fpsimd/sve trap. Add a
specialized helper to handle it.

No functional change intended.

Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
6d8fb3cbf7 KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helper
The same traps controlled by CPTR_EL2 or CPACR_EL1 need to be
toggled in different parts of the code, but the exact bits and
their polarity differ between these two formats and the mode
(vhe/nvhe/hvhe).

To reduce the amount of duplicated code and the chance of getting
the wrong bit/polarity or missing a field, abstract the set/clear
of CPTR_EL2 bits behind a helper.

Since (h)VHE is the way of the future, use the CPACR_EL1 format,
which is a subset of the VHE CPTR_EL2, as a reference.

No functional change intended.

Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
45f4ea9bcf KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_state
Since the prototypes for __sve_save_state/__sve_restore_state at
hyp were added, the underlying macro has acquired a third
parameter for saving/restoring ffr.

Fix the prototypes to account for the third parameter, and
restore the ffr for the guest since it is saved.

Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240603122852.3923848-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:32 +01:00
Fuad Tabba
87bb39ed40 KVM: arm64: Reintroduce __sve_save_state
Now that the hypervisor is handling the host sve state in
protected mode, it needs to be able to save it.

This reverts commit e66425fc9b ("KVM: arm64: Remove unused
__sve_save_state").

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:32 +01:00
Paolo Bonzini
ab978c62e7 Merge branch 'kvm-6.11-sev-snp' into HEAD
Pull base x86 KVM support for running SEV-SNP guests from Michael Roth:

* add some basic infrastructure and introduces a new KVM_X86_SNP_VM
  vm_type to handle differences versus the existing KVM_X86_SEV_VM and
  KVM_X86_SEV_ES_VM types.

* implement the KVM API to handle the creation of a cryptographic
  launch context, encrypt/measure the initial image into guest memory,
  and finalize it before launching it.

* implement handling for various guest-generated events such as page
  state changes, onlining of additional vCPUs, etc.

* implement the gmem/mmu hooks needed to prepare gmem-allocated pages
  before mapping them into guest private memory ranges as well as
  cleaning them up prior to returning them to the host for use as
  normal memory. Because those cleanup hooks supplant certain
  activities like issuing WBINVDs during KVM MMU invalidations, avoid
  duplicating that work to avoid unecessary overhead.

This merge leaves out support support for attestation guest requests
and for loading the signing keys to be used for attestation requests.
2024-06-03 13:19:46 -04:00
Paolo Bonzini
b50788f7cd Merge tag 'kvm-riscv-fixes-6.10-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 6.10, take #1

- No need to use mask when hart-index-bits is 0
- Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
2024-06-03 13:18:18 -04:00
Paolo Bonzini
b3233c737e Merge branch 'kvm-fixes-6.10-1' into HEAD
* Fixes and debugging help for the #VE sanity check.  Also disable
  it by default, even for CONFIG_DEBUG_KERNEL, because it was found
  to trigger spuriously (most likely a processor erratum as the
  exact symptoms vary by generation).

* Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled
  situation (GIF=0 or interrupt shadow) when the processor supports
  virtual NMI.  While generally KVM will not request an NMI window
  when virtual NMIs are supported, in this case it *does* have to
  single-step over the interrupt shadow or enable the STGI intercept,
  in order to deliver the latched second NMI.

* Drop support for hand tuning APIC timer advancement from userspace.
  Since we have adaptive tuning, and it has proved to work well,
  drop the module parameter for manual configuration and with it a
  few stupid bugs that it had.
2024-06-03 13:18:08 -04:00
Paolo Bonzini
f9d1b541d0 Merge branch 'kvm-fixes-6.10-1' into HEAD
* Fixes and debugging help for the #VE sanity check.  Also disable
  it by default, even for CONFIG_DEBUG_KERNEL, because it was found
  to trigger spuriously (most likely a processor erratum as the
  exact symptoms vary by generation).

* Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled
  situation (GIF=0 or interrupt shadow) when the processor supports
  virtual NMI.  While generally KVM will not request an NMI window
  when virtual NMIs are supported, in this case it *does* have to
  single-step over the interrupt shadow or enable the STGI intercept,
  in order to deliver the latched second NMI.

* Drop support for hand tuning APIC timer advancement from userspace.
  Since we have adaptive tuning, and it has proved to work well,
  drop the module parameter for manual configuration and with it a
  few stupid bugs that it had.
2024-06-03 13:10:11 -04:00
Sean Christopherson
89a58812c4 KVM: x86: Drop support for hand tuning APIC timer advancement from userspace
Remove support for specifying a static local APIC timer advancement value,
and instead present a read-only boolean parameter to let userspace enable
or disable KVM's dynamic APIC timer advancement.  Realistically, it's all
but impossible for userspace to specify an advancement that is more
precise than what KVM's adaptive tuning can provide.  E.g. a static value
needs to be tuned for the exact hardware and kernel, and if KVM is using
hrtimers, likely requires additional tuning for the exact configuration of
the entire system.

Dropping support for a userspace provided value also fixes several flaws
in the interface.  E.g. KVM interprets a negative value other than -1 as a
large advancement, toggling between a negative and positive value yields
unpredictable behavior as vCPUs will switch from dynamic to static
advancement, changing the advancement in the middle of VM creation can
result in different values for vCPUs within a VM, etc.  Those flaws are
mostly fixable, but there's almost no justification for taking on yet more
complexity (it's minimal complexity, but still non-zero).

The only arguments against using KVM's adaptive tuning is if a setup needs
a higher maximum, or if the adjustments are too reactive, but those are
arguments for letting userspace control the absolute max advancement and
the granularity of each adjustment, e.g. similar to how KVM provides knobs
for halt polling.

Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com
Cc: Shuling Zhou <zhoushuling@huawei.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240522010304.1650603-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 13:08:05 -04:00
Ravi Bangoria
b7e4be0a22 KVM: SEV-ES: Delegate LBR virtualization to the processor
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
guests. Although KVM currently enforces LBRV for SEV-ES guests, there
are multiple issues with it:

o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR
  interception is used to dynamically toggle LBRV for performance reasons,
  this can be fatal for SEV-ES guests. For ex SEV-ES guest on Zen3:

  [guest ~]# wrmsr 0x1d9 0x4
  KVM: entry failed, hardware error 0xffffffff
  EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000

  Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
  No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR
  is of swap type A.

o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the
  VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA
  encryption.

[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
     2023, Vol 2, 15.35.2 Enabling SEV-ES.
     https://bugzilla.kernel.org/attachment.cgi?id=304653

Fixes: 376c6d2850 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Co-developed-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 13:07:18 -04:00
Ravi Bangoria
d922056215 KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
guests. So, prevent SEV-ES guests when LBRV support is missing.

[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
     2023, Vol 2, 15.35.2 Enabling SEV-ES.
     https://bugzilla.kernel.org/attachment.cgi?id=304653

Fixes: 376c6d2850 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 13:06:48 -04:00
Nikunj A Dadhania
27bd5fdc24 KVM: SEV-ES: Prevent MSR access post VMSA encryption
KVM currently allows userspace to read/write MSRs even after the VMSA is
encrypted. This can cause unintentional issues if MSR access has side-
effects. For ex, while migrating a guest, userspace could attempt to
migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on
the target. Fix this by preventing access to those MSRs which are context
switched via the VMSA, once the VMSA is encrypted.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-2-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 13:06:48 -04:00
Tom Lendacky
b2ec042347 KVM: SVM: Remove the need to trigger an UNBLOCK event on AP creation
All SNP APs are initially started using the APIC INIT/SIPI sequence in
the guest. This sequence moves the AP MP state from
KVM_MP_STATE_UNINITIALIZED to KVM_MP_STATE_RUNNABLE, so there is no need
to attempt the UNBLOCK.

As it is, the UNBLOCK support in SVM is only enabled when AVIC is
enabled. When AVIC is disabled, AP creation is still successful.

Remove the KVM_REQ_UNBLOCK request from the AP creation code and revert
the changes to the vcpu_unblocking() kvm_x86_ops path.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 12:38:17 -04:00
Paolo Bonzini
73137f5924 KVM: SEV: Don't WARN() if RMP lookup fails when invalidating gmem pages
The hook only handles cleanup work specific to SNP, e.g. RMP table
entries and flushing caches for encrypted guest memory. When run on a
non-SNP-enabled host (currently only possible using
KVM_X86_SW_PROTECTED_VM, e.g. via KVM selftests), the callback is a noop
and will WARN due to the RMP table not being present. It's actually
expected in this case that the RMP table wouldn't be present and that
the hook should be a noop, so drop the WARN_ONCE().

Reported-by: Sean Christopherson <seanjc@google.com>
Closes: https://lore.kernel.org/kvm/ZkU3_y0UoPk5yAeK@google.com/
Fixes: 8eb01900b0 ("KVM: SEV: Implement gmem hook for invalidating private pages")
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 12:38:14 -04:00
Michael Roth
febff040b1 KVM: SEV: Automatically switch reclaimed pages to shared
Currently there's a consistent pattern of always calling
host_rmp_make_shared() immediately after snp_page_reclaim(), so go ahead
and handle it automatically as part of snp_page_reclaim(). Also rename
it to kvm_rmp_make_shared() to more easily distinguish it as a
KVM-specific variant of the more generic rmp_make_shared() helper.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 12:36:52 -04:00
Linus Torvalds
c3f38fa61a Linux 6.10-rc2 v6.10-rc2 2024-06-02 15:44:56 -07:00
Linus Torvalds
58d89ee81a Merge tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:

 - Add a quirk for three different devices that have shown issues with
   LPM (link power management). These devices appear to not implement
   LPM properly, since we see command timeouts when enabling LPM. The
   quirk disables LPM for these problematic devices. (Me)

 - Do not apply the Intel PCS quirk on Alder Lake. The quirk is not
   needed and was originally added by mistake when LPM support was
   enabled for this AHCI controller. Enabling the quirk when not needed
   causes the the controller to not be able to detect the connected
   devices on some platforms.

* tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340
  ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD
  ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1
  ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake
2024-06-02 13:30:53 -07:00
Linus Torvalds
a693b9c95a Merge tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Miscellaneous topology parsing fixes:

   - Fix topology parsing regression on older CPUs in the new AMD/Hygon
     parser

   - Fix boot crash on odd Intel Quark and similar CPUs that do not fill
     out cpuinfo_x86::x86_clflush_size and zero out
     cpuinfo_x86::x86_cache_alignment as a result.

     Provide 32 bytes as a general fallback value.

   - Fix topology enumeration on certain rare CPUs where the BIOS locks
     certain CPUID leaves and the kernel unlocked them late, which broke
     with the new topology parsing code. Factor out this unlocking logic
     and move it earlier in the parsing sequence"

* tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology/intel: Unlock CPUID before evaluating anything
  x86/cpu: Provide default cache line size if not enumerated
  x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater
2024-06-02 09:32:34 -07:00
Linus Torvalds
3fca58ffad Merge tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Export a symbol to make life easier for instrumentation/debugging"

* tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/x86: Export 'percpu arch_freq_scale'
2024-06-02 09:23:35 -07:00
Linus Torvalds
efa8f11a7e Merge tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events fix from Ingo Molnar:
 "Add missing MODULE_DESCRIPTION() lines"

* tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add missing MODULE_DESCRIPTION() lines
  perf/x86/rapl: Add missing MODULE_DESCRIPTION() line
2024-06-02 09:20:37 -07:00
Linus Torvalds
00a8c352dd Merge tag 'hardening-v6.10-rc2-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:

 - scsi: mpt3sas: Avoid possible run-time warning with long manufacturer
   strings

 - mailmap: update entry for Kees Cook

 - kunit/fortify: Remove __kmalloc_node() test

* tag 'hardening-v6.10-rc2-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kunit/fortify: Remove __kmalloc_node() test
  mailmap: update entry for Kees Cook
  scsi: mpt3sas: Avoid possible run-time warning with long manufacturer strings
2024-06-02 09:15:28 -07:00
Linus Torvalds
83814698cf Merge tag 'powerpc-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Enforce full ordering for ATOMIC operations with BPF_FETCH

 - Fix uaccess build errors seen with GCC 13/14

 - Fix build errors on ppc32 due to ARCH_HAS_KERNEL_FPU_SUPPORT

 - Drop error message from lparcfg guest name lookup

Thanks to Christophe Leroy, Guenter Roeck, Nathan Lynch, Naveen N Rao,
Puranjay Mohan, and Samuel Holland.

* tag 'powerpc-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64
  powerpc/uaccess: Use YZ asm constraint for ld
  powerpc/uaccess: Fix build errors seen with GCC 13/14
  powerpc/pseries/lparcfg: drop error message from guest name lookup
  powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
2024-06-01 17:34:35 -07:00
Linus Torvalds
54bec8ed57 Merge tag 'firewire-fixes-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
 "After merging a commit 1fffe7a34c ("script: modpost: emit a warning
  when the description is missing"), MODULE_DESCRIPTOR seems to be
  mandatory for kernel modules. In FireWire subsystem, the most of
  practical kernel modules have the field, while KUnit test modules do
  not. A single patch is applied to fix them"

* tag 'firewire-fixes-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: add missing MODULE_DESCRIPTION() to test modules
2024-06-01 17:05:00 -07:00
Linus Torvalds
89be4025b0 Merge tag '6.10-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
 "Two small smb3 fixes:

   - Fix socket creation with sfu mount option (spotted by test generic/423)

   - Minor cleanup: fix missing description in two files"

* tag '6.10-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix creating sockets when using sfu mount options
  fs: smb: common: add missing MODULE_DESCRIPTION() macros
2024-06-01 14:35:57 -07:00
Linus Torvalds
ec9eeb89e6 Merge tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - Fix a Kconfig bug regarding comparisons to 'm' or 'n'

 - Replace missed $(srctree)/$(src)

 - Fix unneeded kallsyms step 3

 - Remove incorrect "compatible" properties from image nodes in
   image.fit

 - Improve gen_kheaders.sh

 - Fix 'make dt_binding_check'

 - Clean up unnecessary code

* tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  dt-bindings: kbuild: Fix dt_binding_check on unconfigured build
  kheaders: use `command -v` to test for existence of `cpio`
  kheaders: explicitly define file modes for archived headers
  scripts/make_fit: Drop fdt image entry compatible string
  kbuild: remove a stale comment about cleaning in link-vmlinux.sh
  kbuild: fix short log for AS in link-vmlinux.sh
  kbuild: change scripts/mksysmap into sed script
  kbuild: avoid unneeded kallsyms step 3
  kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src)
  kconfig: remove redundant check in expr_join_or()
  kconfig: fix comparison to constant symbols, 'm', 'n'
  kconfig: remove unused expr_is_no()
2024-06-01 09:33:55 -07:00
Linus Torvalds
bbeb1219ee Merge tag 'xfs-6.10-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:

 - Fix a livelock by dropping an xfarray sortinfo folio when an error
   is encountered

 - During extended attribute operations, Initialize transaction
   reservation computation based on attribute operation code

 - Relax symbolic link's ondisk verification code to allow symbolic
   links with short remote targets

 - Prevent soft lockups when unmapping file ranges and also during
   remapping blocks during a reflink operation

 - Fix compilation warnings when XFS is built with W=1 option

* tag 'xfs-6.10-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Add cond_resched to block unmap range and reflink remap path
  xfs: don't open-code u64_to_user_ptr
  xfs: allow symlinks with short remote targets
  xfs: fix xfs_init_attr_trans not handling explicit operation codes
  xfs: drop xfarray sortinfo folio on error
  xfs: Stop using __maybe_unused in xfs_alloc.c
  xfs: Clear W=1 warning in xfs_iwalk_run_callbacks()
2024-06-01 08:59:04 -07:00
Linus Torvalds
f26ee67a0f Merge tag 'tty-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fix from Greg KH:
 "Here is a single revert for a much-reported regression in 6.10-rc1
  when it comes to a few older architectures.

  Turns out that the VT ioctls don't work the same across all cpu types
  because of some old compatibility requrements for stuff like alpha and
  powerpc. So revert the change that attempted to have them use the
  _IO() macros and go back to the known-working values instead.

  This has NOT been in linux-next but has had many reports that it fixes
  the issue with 6.10-rc1"

* tag 'tty-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "VT: Use macros to define ioctls"
2024-06-01 08:53:39 -07:00
Linus Torvalds
d9aab0b1c9 Merge tag 'landlock-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fix from Mickaël Salaün:
 "This fixes a wrong path walk triggered by syzkaller"

* tag 'landlock-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/landlock: Add layout1.refer_mount_root
  landlock: Fix d_parent walk
2024-06-01 08:28:24 -07:00
Greg Kroah-Hartman
7bc4244c88 Revert "VT: Use macros to define ioctls"
This reverts commit 8c467f3300.

Turns out this breaks many architectures as the vt ioctls do not all
match up everywhere due to historical reasons, so the original commit is
invalid for many values.

Reported-by: Nick Bowler <nbowler@draconx.ca>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Gladkov <legion@kernel.org>
Link: https://lore.kernel.org/r/ad4e561c-1d49-4f25-882c-7a36c6b1b5c0@draconx.ca
Link: https://lore.kernel.org/r/0da9785e-ba44-4718-9d08-4e96c1ba7ab2@kernel.org
Link: https://lore.kernel.org/all/34d848f4-670b-4493-bf21-130ef862521b@xenosoft.de/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-01 07:28:21 +02:00