Commit Graph

319 Commits

Author SHA1 Message Date
Linus Torvalds
4f712ee0cb Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "S390:

   - Changes to FPU handling came in via the main s390 pull request

   - Only deliver to the guest the SCLP events that userspace has
     requested

   - More virtual vs physical address fixes (only a cleanup since
     virtual and physical address spaces are currently the same)

   - Fix selftests undefined behavior

  x86:

   - Fix a restriction that the guest can't program a PMU event whose
     encoding matches an architectural event that isn't included in the
     guest CPUID. The enumeration of an architectural event only says
     that if a CPU supports an architectural event, then the event can
     be programmed *using the architectural encoding*. The enumeration
     does NOT say anything about the encoding when the CPU doesn't
     report support the event *in general*. It might support it, and it
     might support it using the same encoding that made it into the
     architectural PMU spec

   - Fix a variety of bugs in KVM's emulation of RDPMC (more details on
     individual commits) and add a selftest to verify KVM correctly
     emulates RDMPC, counter availability, and a variety of other
     PMC-related behaviors that depend on guest CPUID and therefore are
     easier to validate with selftests than with custom guests (aka
     kvm-unit-tests)

   - Zero out PMU state on AMD if the virtual PMU is disabled, it does
     not cause any bug but it wastes time in various cases where KVM
     would check if a PMC event needs to be synthesized

   - Optimize triggering of emulated events, with a nice ~10%
     performance improvement in VM-Exit microbenchmarks when a vPMU is
     exposed to the guest

   - Tighten the check for "PMI in guest" to reduce false positives if
     an NMI arrives in the host while KVM is handling an IRQ VM-Exit

   - Fix a bug where KVM would report stale/bogus exit qualification
     information when exiting to userspace with an internal error exit
     code

   - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support

   - Rework TDP MMU root unload, free, and alloc to run with mmu_lock
     held for read, e.g. to avoid serializing vCPUs when userspace
     deletes a memslot

   - Tear down TDP MMU page tables at 4KiB granularity (used to be
     1GiB). KVM doesn't support yielding in the middle of processing a
     zap, and 1GiB granularity resulted in multi-millisecond lags that
     are quite impolite for CONFIG_PREEMPT kernels

   - Allocate write-tracking metadata on-demand to avoid the memory
     overhead when a kernel is built with i915 virtualization support
     but the workloads use neither shadow paging nor i915 virtualization

   - Explicitly initialize a variety of on-stack variables in the
     emulator that triggered KMSAN false positives

   - Fix the debugregs ABI for 32-bit KVM

   - Rework the "force immediate exit" code so that vendor code
     ultimately decides how and when to force the exit, which allowed
     some optimization for both Intel and AMD

   - Fix a long-standing bug where kvm_has_noapic_vcpu could be left
     elevated if vCPU creation ultimately failed, causing extra
     unnecessary work

   - Cleanup the logic for checking if the currently loaded vCPU is
     in-kernel

   - Harden against underflowing the active mmu_notifier invalidation
     count, so that "bad" invalidations (usually due to bugs elsehwere
     in the kernel) are detected earlier and are less likely to hang the
     kernel

  x86 Xen emulation:

   - Overlay pages can now be cached based on host virtual address,
     instead of guest physical addresses. This removes the need to
     reconfigure and invalidate the cache if the guest changes the gpa
     but the underlying host virtual address remains the same

   - When possible, use a single host TSC value when computing the
     deadline for Xen timers in order to improve the accuracy of the
     timer emulation

   - Inject pending upcall events when the vCPU software-enables its
     APIC to fix a bug where an upcall can be lost (and to follow Xen's
     behavior)

   - Fall back to the slow path instead of warning if "fast" IRQ
     delivery of Xen events fails, e.g. if the guest has aliased xAPIC
     IDs

  RISC-V:

   - Support exception and interrupt handling in selftests

   - New self test for RISC-V architectural timer (Sstc extension)

   - New extension support (Ztso, Zacas)

   - Support userspace emulation of random number seed CSRs

  ARM:

   - Infrastructure for building KVM's trap configuration based on the
     architectural features (or lack thereof) advertised in the VM's ID
     registers

   - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
     x86's WC) at stage-2, improving the performance of interacting with
     assigned devices that can tolerate it

   - Conversion of KVM's representation of LPIs to an xarray, utilized
     to address serialization some of the serialization on the LPI
     injection path

   - Support for _architectural_ VHE-only systems, advertised through
     the absence of FEAT_E2H0 in the CPU's ID register

   - Miscellaneous cleanups, fixes, and spelling corrections to KVM and
     selftests

  LoongArch:

   - Set reserved bits as zero in CPUCFG

   - Start SW timer only when vcpu is blocking

   - Do not restart SW timer when it is expired

   - Remove unnecessary CSR register saving during enter guest

   - Misc cleanups and fixes as usual

  Generic:

   - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
     always true on all architectures except MIPS (where Kconfig
     determines the available depending on CPU capabilities). It is
     replaced either by an architecture-dependent symbol for MIPS, and
     IS_ENABLED(CONFIG_KVM) everywhere else

   - Factor common "select" statements in common code instead of
     requiring each architecture to specify it

   - Remove thoroughly obsolete APIs from the uapi headers

   - Move architecture-dependent stuff to uapi/asm/kvm.h

   - Always flush the async page fault workqueue when a work item is
     being removed, especially during vCPU destruction, to ensure that
     there are no workers running in KVM code when all references to
     KVM-the-module are gone, i.e. to prevent a very unlikely
     use-after-free if kvm.ko is unloaded

   - Grab a reference to the VM's mm_struct in the async #PF worker
     itself instead of gifting the worker a reference, so that there's
     no need to remember to *conditionally* clean up after the worker

  Selftests:

   - Reduce boilerplate especially when utilize selftest TAP
     infrastructure

   - Add basic smoke tests for SEV and SEV-ES, along with a pile of
     library support for handling private/encrypted/protected memory

   - Fix benign bugs where tests neglect to close() guest_memfd files"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  selftests: kvm: remove meaningless assignments in Makefiles
  KVM: riscv: selftests: Add Zacas extension to get-reg-list test
  RISC-V: KVM: Allow Zacas extension for Guest/VM
  KVM: riscv: selftests: Add Ztso extension to get-reg-list test
  RISC-V: KVM: Allow Ztso extension for Guest/VM
  RISC-V: KVM: Forward SEED CSR access to user space
  KVM: riscv: selftests: Add sstc timer test
  KVM: riscv: selftests: Change vcpu_has_ext to a common function
  KVM: riscv: selftests: Add guest helper to get vcpu id
  KVM: riscv: selftests: Add exception handling support
  LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
  LoongArch: KVM: Do not restart SW timer when it is expired
  LoongArch: KVM: Start SW timer only when vcpu is blocking
  LoongArch: KVM: Set reserved bits as zero in CPUCFG
  KVM: selftests: Explicitly close guest_memfd files in some gmem tests
  KVM: x86/xen: fix recursive deadlock in timer injection
  KVM: pfncache: simplify locking and make more self-contained
  KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
  KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
  KVM: x86/xen: improve accuracy of Xen timers
  ...
2024-03-15 13:03:13 -07:00
Linus Torvalds
685d982112 Merge tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:

 - The biggest change is the rework of the percpu code, to support the
   'Named Address Spaces' GCC feature, by Uros Bizjak:

      - This allows C code to access GS and FS segment relative memory
        via variables declared with such attributes, which allows the
        compiler to better optimize those accesses than the previous
        inline assembly code.

      - The series also includes a number of micro-optimizations for
        various percpu access methods, plus a number of cleanups of %gs
        accesses in assembly code.

      - These changes have been exposed to linux-next testing for the
        last ~5 months, with no known regressions in this area.

 - Fix/clean up __switch_to()'s broken but accidentally working handling
   of FPU switching - which also generates better code

 - Propagate more RIP-relative addressing in assembly code, to generate
   slightly better code

 - Rework the CPU mitigations Kconfig space to be less idiosyncratic, to
   make it easier for distros to follow & maintain these options

 - Rework the x86 idle code to cure RCU violations and to clean up the
   logic

 - Clean up the vDSO Makefile logic

 - Misc cleanups and fixes

* tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/idle: Select idle routine only once
  x86/idle: Let prefer_mwait_c1_over_halt() return bool
  x86/idle: Cleanup idle_setup()
  x86/idle: Clean up idle selection
  x86/idle: Sanitize X86_BUG_AMD_E400 handling
  sched/idle: Conditionally handle tick broadcast in default_idle_call()
  x86: Increase brk randomness entropy for 64-bit systems
  x86/vdso: Move vDSO to mmap region
  x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together
  x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o
  x86/retpoline: Ensure default return thunk isn't used at runtime
  x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32
  x86/vdso: Use $(addprefix ) instead of $(foreach )
  x86/vdso: Simplify obj-y addition
  x86/vdso: Consolidate targets and clean-files
  x86/bugs: Rename CONFIG_RETHUNK              => CONFIG_MITIGATION_RETHUNK
  x86/bugs: Rename CONFIG_CPU_SRSO             => CONFIG_MITIGATION_SRSO
  x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY       => CONFIG_MITIGATION_IBRS_ENTRY
  x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY      => CONFIG_MITIGATION_UNRET_ENTRY
  x86/bugs: Rename CONFIG_SLS                  => CONFIG_MITIGATION_SLS
  ...
2024-03-11 19:53:15 -07:00
Linus Torvalds
38b334fc76 Merge tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:

 - Add the x86 part of the SEV-SNP host support.

   This will allow the kernel to be used as a KVM hypervisor capable of
   running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP
   is the ultimate goal of the AMD confidential computing side,
   providing the most comprehensive confidential computing environment
   up to date.

   This is the x86 part and there is a KVM part which did not get ready
   in time for the merge window so latter will be forthcoming in the
   next cycle.

 - Rework the early code's position-dependent SEV variable references in
   order to allow building the kernel with clang and -fPIE/-fPIC and
   -mcmodel=kernel

 - The usual set of fixes, cleanups and improvements all over the place

* tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/sev: Disable KMSAN for memory encryption TUs
  x86/sev: Dump SEV_STATUS
  crypto: ccp - Have it depend on AMD_IOMMU
  iommu/amd: Fix failure return from snp_lookup_rmpentry()
  x86/sev: Fix position dependent variable references in startup code
  crypto: ccp: Make snp_range_list static
  x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
  Documentation: virt: Fix up pre-formatted text block for SEV ioctls
  crypto: ccp: Add the SNP_SET_CONFIG command
  crypto: ccp: Add the SNP_COMMIT command
  crypto: ccp: Add the SNP_PLATFORM_STATUS command
  x86/cpufeatures: Enable/unmask SEV-SNP CPU feature
  KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
  iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown
  crypto: ccp: Handle legacy SEV commands when SNP is enabled
  crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
  crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  x86/sev: Introduce an SNP leaked pages list
  crypto: ccp: Provide an API to issue SEV and SNP commands
  ...
2024-03-11 17:44:11 -07:00
Linus Torvalds
720c857907 Merge tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FRED support from Thomas Gleixner:
 "Support for x86 Fast Return and Event Delivery (FRED).

  FRED is a replacement for IDT event delivery on x86 and addresses most
  of the technical nightmares which IDT exposes:

   1) Exception cause registers like CR2 need to be manually preserved
      in nested exception scenarios.

   2) Hardware interrupt stack switching is suboptimal for nested
      exceptions as the interrupt stack mechanism rewinds the stack on
      each entry which requires a massive effort in the low level entry
      of #NMI code to handle this.

   3) No hardware distinction between entry from kernel or from user
      which makes establishing kernel context more complex than it needs
      to be especially for unconditionally nestable exceptions like NMI.

   4) NMI nesting caused by IRET unconditionally reenabling NMIs, which
      is a problem when the perf NMI takes a fault when collecting a
      stack trace.

   5) Partial restore of ESP when returning to a 16-bit segment

   6) Limitation of the vector space which can cause vector exhaustion
      on large systems.

   7) Inability to differentiate NMI sources

  FRED addresses these shortcomings by:

   1) An extended exception stack frame which the CPU uses to save
      exception cause registers. This ensures that the meta information
      for each exception is preserved on stack and avoids the extra
      complexity of preserving it in software.

   2) Hardware interrupt stack switching is non-rewinding if a nested
      exception uses the currently interrupt stack.

   3) The entry points for kernel and user context are separate and GS
      BASE handling which is required to establish kernel context for
      per CPU variable access is done in hardware.

   4) NMIs are now nesting protected. They are only reenabled on the
      return from NMI.

   5) FRED guarantees full restore of ESP

   6) FRED does not put a limitation on the vector space by design
      because it uses a central entry points for kernel and user space
      and the CPUstores the entry type (exception, trap, interrupt,
      syscall) on the entry stack along with the vector number. The
      entry code has to demultiplex this information, but this removes
      the vector space restriction.

      The first hardware implementations will still have the current
      restricted vector space because lifting this limitation requires
      further changes to the local APIC.

   7) FRED stores the vector number and meta information on stack which
      allows having more than one NMI vector in future hardware when the
      required local APIC changes are in place.

  The series implements the initial FRED support by:

   - Reworking the existing entry and IDT handling infrastructure to
     accomodate for the alternative entry mechanism.

   - Expanding the stack frame to accomodate for the extra 16 bytes FRED
     requires to store context and meta information

   - Providing FRED specific C entry points for events which have
     information pushed to the extended stack frame, e.g. #PF and #DB.

   - Providing FRED specific C entry points for #NMI and #MCE

   - Implementing the FRED specific ASM entry points and the C code to
     demultiplex the events

   - Providing detection and initialization mechanisms and the necessary
     tweaks in context switching, GS BASE handling etc.

  The FRED integration aims for maximum code reuse vs the existing IDT
  implementation to the extent possible and the deviation in hot paths
  like context switching are handled with alternatives to minimalize the
  impact. The low level entry and exit paths are seperate due to the
  extended stack frame and the hardware based GS BASE swichting and
  therefore have no impact on IDT based systems.

  It has been extensively tested on existing systems and on the FRED
  simulation and as of now there are no outstanding problems"

* tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/fred: Fix init_task thread stack pointer initialization
  MAINTAINERS: Add a maintainer entry for FRED
  x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly
  x86/fred: Invoke FRED initialization code to enable FRED
  x86/fred: Add FRED initialization functions
  x86/syscall: Split IDT syscall setup code into idt_syscall_init()
  KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling
  x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI
  x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code
  x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
  x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled
  x86/traps: Add sysvec_install() to install a system interrupt handler
  x86/fred: FRED entry/exit and dispatch code
  x86/fred: Add a machine check entry stub for FRED
  x86/fred: Add a NMI entry stub for FRED
  x86/fred: Add a debug fault entry stub for FRED
  x86/idtentry: Incorporate definitions/declarations of the FRED entries
  x86/fred: Make exc_page_fault() work for FRED
  x86/fred: Allow single-step trap and NMI when starting a new task
  x86/fred: No ESPFIX needed when FRED is enabled
  ...
2024-03-11 16:00:17 -07:00
Paolo Bonzini
7d8942d8e7 Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM GUEST_MEMFD fixes for 6.8:

 - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
   avoid creating ABI that KVM can't sanely support.

 - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
   clear that such VMs are purely a development and testing vehicle, and
   come with zero guarantees.

 - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
   is to support confidential VMs with deterministic private memory (SNP
   and TDX) only in the TDP MMU.

 - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
   when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-09 11:48:35 -05:00
Ingo Molnar
4589f199eb Merge branch 'x86/bugs' into x86/core, to pick up pending changes before dependent patches
Merge in pending alternatives patching infrastructure changes, before
applying more patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-02-14 10:49:37 +01:00
Linus Torvalds
4356e9f841 work around gcc bugs with 'asm goto' with outputs
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c323 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").

Then, much later, we ended up removing the workaround in commit
43c249ea0b ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.

Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround.  But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.

It looks like there are at least two separate issues that all hit in
this area:

 (a) some versions of gcc don't mark the asm goto as 'volatile' when it
     has outputs:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

     which is easy to work around by just adding the 'volatile' by hand.

 (b) Internal compiler errors:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

     which are worked around by adding the extra empty 'asm' as a
     barrier, as in the original workaround.

but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.

but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.

Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-09 15:57:48 -08:00
Paolo Bonzini
dcf0926e9b x86: replace CONFIG_HAVE_KVM with IS_ENABLED(CONFIG_KVM)
It is more accurate to check if KVM is enabled, instead of having the
architecture say so.  Architectures always "have" KVM, so for example
checking CONFIG_HAVE_KVM in x86 code is pointless, but if KVM is disabled
in a specific build, there is no need for support code.

Alternatively, many of the #ifdefs could simply be deleted.  However,
this would add completely dead code.  For example, when KVM is disabled,
there should not be any posted interrupts, i.e. NOT wiring up the "dummy"
handlers and treating IRQs on those vectors as spurious is the right
thing to do.

Cc: x86@kernel.org
Cc: kbingham@kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:45:35 -05:00
H. Peter Anvin (Intel)
cd6df3f378 x86/cpu: Add MSR numbers for FRED configuration
Add MSR numbers for the FRED configuration registers per FRED spec 5.0.

Originally-by: Megha Dey <megha.dey@intel.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-13-xin3.li@intel.com
2024-01-31 22:01:05 +01:00
Arnaldo Carvalho de Melo
15d6daad8f tools headers x86 cpufeatures: Sync with the kernel sources to pick TDX, Zen, APIC MSR fence changes
To pick the changes from:

  1e536e1068 ("x86/cpu: Detect TDX partial write machine check erratum")
  765a0542fd ("x86/virt/tdx: Detect TDX during kernel boot")
  30fa92832f ("x86/CPU/AMD: Add ZenX generations flags")
  04c3024560 ("x86/barrier: Do not serialize MSR accesses on AMD")

This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:

      CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
      CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-30 10:09:18 -03:00
Brijesh Singh
b6e0f6666f x86/cpufeatures: Add SEV-SNP CPU feature
Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds a strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.

Since enabling the SNP CPU feature imposes a number of additional
requirements on host initialization and handling legacy firmware APIs
for SEV/SEV-ES guests, only introduce the CPU feature bit so that the
relevant handling can be added, but leave it disabled via a
disabled-features mask.

Once all the necessary changes needed to maintain legacy SEV/SEV-ES
support are introduced in subsequent patches, the SNP feature bit will
be unmasked/enabled.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-2-michael.roth@amd.com
2024-01-29 17:13:16 +01:00
Arnaldo Carvalho de Melo
e30dca91e5 tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in:

  a5d3df8ae1 ("KVM: remove deprecated UAPIs")
  6d72283526 ("KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT")
  89ea60c2c7 ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
  8dd2eee9d5 ("KVM: x86/mmu: Handle page fault for private memory")
  a7800aa80e ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
  5a475554db ("KVM: Introduce per-page memory attributes")
  16f95f3b95 ("KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace")
  bb58b90b1a ("KVM: Introduce KVM_SET_USER_MEMORY_REGION2")
  3f9cd0ca84 ("KVM: arm64: Allow userspace to get the writable masks for feature ID registers")

That automatically adds support for some new ioctls and remove a bunch
of deprecated ones.

This ends up making the new binary to forget about the deprecated one,
so when used in an older system it will not be able to resolve those
codes to strings.

  $ tools/perf/trace/beauty/kvm_ioctl.sh > before
  $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
  $ tools/perf/trace/beauty/kvm_ioctl.sh > after
  $ diff -u before after
  --- before	2024-01-27 14:48:16.523014020 -0300
  +++ after	2024-01-27 14:48:24.183932866 -0300
  @@ -14,6 +14,7 @@
   	[0x46] = "SET_USER_MEMORY_REGION",
   	[0x47] = "SET_TSS_ADDR",
   	[0x48] = "SET_IDENTITY_MAP_ADDR",
  +	[0x49] = "SET_USER_MEMORY_REGION2",
   	[0x60] = "CREATE_IRQCHIP",
   	[0x61] = "IRQ_LINE",
   	[0x62] = "GET_IRQCHIP",
  @@ -22,14 +23,8 @@
   	[0x65] = "GET_PIT",
   	[0x66] = "SET_PIT",
   	[0x67] = "IRQ_LINE_STATUS",
  -	[0x69] = "ASSIGN_PCI_DEVICE",
   	[0x6a] = "SET_GSI_ROUTING",
  -	[0x70] = "ASSIGN_DEV_IRQ",
   	[0x71] = "REINJECT_CONTROL",
  -	[0x72] = "DEASSIGN_PCI_DEVICE",
  -	[0x73] = "ASSIGN_SET_MSIX_NR",
  -	[0x74] = "ASSIGN_SET_MSIX_ENTRY",
  -	[0x75] = "DEASSIGN_DEV_IRQ",
   	[0x76] = "IRQFD",
   	[0x77] = "CREATE_PIT2",
   	[0x78] = "SET_BOOT_CPU_ID",
  @@ -66,7 +61,6 @@
   	[0x9f] = "GET_VCPU_EVENTS",
   	[0xa0] = "SET_VCPU_EVENTS",
   	[0xa3] = "ENABLE_CAP",
  -	[0xa4] = "ASSIGN_SET_INTX_MASK",
   	[0xa5] = "SIGNAL_MSI",
   	[0xa6] = "GET_XCRS",
   	[0xa7] = "SET_XCRS",
  @@ -97,6 +91,8 @@
   	[0xcd] = "SET_SREGS2",
   	[0xce] = "GET_STATS_FD",
   	[0xd0] = "XEN_HVM_EVTCHN_SEND",
  +	[0xd2] = "SET_MEMORY_ATTRIBUTES",
  +	[0xd4] = "CREATE_GUEST_MEMFD",
   	[0xe0] = "CREATE_DEVICE",
   	[0xe1] = "SET_DEVICE_ATTR",
   	[0xe2] = "GET_DEVICE_ATTR",
  $

This silences these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jing Zhang <jingzhangos@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/lkml/ZbVLbkngp4oq13qN@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-27 15:29:47 -03:00
Arnaldo Carvalho de Melo
1743726689 tools arch x86: Sync the msr-index.h copy with the kernel sources to pick IA32_MKTME_KEYID_PARTITIONING
To pick up the changes in:

  765a0542fd ("x86/virt/tdx: Detect TDX during kernel boot")

Addressing this tools/perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

That makes the beautification scripts to pick some new entries:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before	2024-01-25 11:08:12.363223880 -0300
  +++ after	2024-01-25 11:08:24.839307699 -0300
  @@ -21,6 +21,7 @@
   	[0x0000004f] = "PPIN",
   	[0x00000060] = "LBR_CORE_TO",
   	[0x00000079] = "IA32_UCODE_WRITE",
  +	[0x00000087] = "IA32_MKTME_KEYID_PARTITIONING",
   	[0x0000008b] = "AMD64_PATCH_LEVEL",
   	[0x0000008C] = "IA32_SGXLEPUBKEYHASH0",
   	[0x0000008D] = "IA32_SGXLEPUBKEYHASH1",
  $

Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:

  # perf trace -e msr:*_msr/max-stack=32/ --filter="msr==IA32_MKTME_KEYID_PARTITIONING"
  ^C#

If we use -v (verbose mode) we can see what it does behind the scenes:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_MKTME_KEYID_PARTITIONING"
  Using CPUID GenuineIntel-6-8E-A
  0x87
  New filter for msr:read_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
  0x87
  New filter for msr:write_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
  mmap size 528384B
  ^C#

Example with a frequent msr:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
  Using CPUID AuthenticAMD-25-21-0
  0x48
  New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  0x48
  New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  mmap size 528384B
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
   0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
  				   do_trace_write_msr ([kernel.kallsyms])
  				   do_trace_write_msr ([kernel.kallsyms])
  				   __switch_to_xtra ([kernel.kallsyms])
  				   __switch_to ([kernel.kallsyms])
  				   __schedule ([kernel.kallsyms])
  				   schedule ([kernel.kallsyms])
  				   futex_wait_queue_me ([kernel.kallsyms])
  				   futex_wait ([kernel.kallsyms])
  				   do_futex ([kernel.kallsyms])
  				   __x64_sys_futex ([kernel.kallsyms])
  				   do_syscall_64 ([kernel.kallsyms])
  				   entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
  				   __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
   0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
  				   do_trace_write_msr ([kernel.kallsyms])
  				   do_trace_write_msr ([kernel.kallsyms])
  				   __switch_to_xtra ([kernel.kallsyms])
  				   __switch_to ([kernel.kallsyms])
  				   __schedule ([kernel.kallsyms])
  				   schedule_idle ([kernel.kallsyms])
  				   do_idle ([kernel.kallsyms])
  				   cpu_startup_entry ([kernel.kallsyms])
  				   secondary_startup_64_no_verify ([kernel.kallsyms])
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbJt27rjkQVU1YoP@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-01-26 10:51:48 -03:00
H. Peter Anvin (Intel)
e554a8ca49 x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled
Add CONFIG_X86_FRED to <asm/disabled-features.h> to make
cpu_feature_enabled() work correctly with FRED.

Originally-by: Megha Dey <megha.dey@intel.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-8-xin3.li@intel.com
2024-01-25 19:10:30 +01:00
H. Peter Anvin (Intel)
51c158f7aa x86/cpufeatures: Add the CPU feature bit for FRED
Any FRED enabled CPU will always have the following features as its
baseline:

  1) LKGS, load attributes of the GS segment but the base address into
     the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor
     cache.

  2) WRMSRNS, non-serializing WRMSR for faster MSR writes.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-7-xin3.li@intel.com
2024-01-25 19:10:30 +01:00
Xin Li
a4cb5ece14 x86/cpufeatures,opcode,msr: Add the WRMSRNS instruction support
WRMSRNS is an instruction that behaves exactly like WRMSR, with
the only difference being that it is not a serializing instruction
by default. Under certain conditions, WRMSRNS may replace WRMSR to
improve performance.

Add its CPU feature bit, opcode to the x86 opcode map, and an
always inline API __wrmsrns() to embed WRMSRNS into the code.

Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231205105030.8698-2-xin3.li@intel.com
2024-01-25 19:10:29 +01:00
Breno Leitao
0911b8c52c x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK
Step 10/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Added one more case. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-11-leitao@debian.org
2024-01-10 10:52:29 +01:00
Breno Leitao
ac61d43983 x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY
Step 7/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-8-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
aefb2f2e61 x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted a few more uses in comments/messages as well. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ariel Miculas <amiculas@cisco.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-6-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
ea4654e088 x86/bugs: Rename CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
Step 4/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted new uses that got added since the series was posted. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-5-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
5fa31af31e x86/bugs: Rename CONFIG_CALL_DEPTH_TRACKING => CONFIG_MITIGATION_CALL_DEPTH_TRACKING
Step 3/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-4-leitao@debian.org
2024-01-10 10:52:28 +01:00
Linus Torvalds
bef91c28f2 Merge tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu feature updates from Borislav Petkov:

 - Add synthetic X86_FEATURE flags for the different AMD Zen generations
   and use them everywhere instead of ad-hoc family/model checks. Drop
   an ancient AMD errata checking facility as a result

 - Fix a fragile initcall ordering in intel_epb

 - Do not issue the MFENCE+LFENCE barrier for the TSC deadline and
   X2APIC MSRs on AMD as it is not needed there

* tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Add X86_FEATURE_ZEN1
  x86/CPU/AMD: Drop now unused CPU erratum checking function
  x86/CPU/AMD: Get rid of amd_erratum_1485[]
  x86/CPU/AMD: Get rid of amd_erratum_400[]
  x86/CPU/AMD: Get rid of amd_erratum_383[]
  x86/CPU/AMD: Get rid of amd_erratum_1054[]
  x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
  x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
  x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()
  x86/CPU/AMD: Call the spectral chicken in the Zen2 init function
  x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
  x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function
  x86/CPU/AMD: Carve out the erratum 1386 fix
  x86/CPU/AMD: Add ZenX generations flags
  x86/cpu/intel_epb: Don't rely on link order
  x86/barrier: Do not serialize MSR accesses on AMD
2024-01-08 15:45:31 -08:00
Borislav Petkov (AMD)
232afb5578 x86/CPU/AMD: Add X86_FEATURE_ZEN1
Add a synthetic feature flag specifically for first generation Zen
machines. There's need to have a generic flag for all Zen generations so
make X86_FEATURE_ZEN be that flag.

Fixes: 30fa92832f ("x86/CPU/AMD: Add ZenX generations flags")
Suggested-by: Brian Gerst <brgerst@gmail.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/dc3835e3-0731-4230-bbb9-336bbe3d042b@amd.com
2023-12-12 11:17:37 +01:00
Namhyung Kim
c23708f376 tools headers: Update tools's copy of x86/asm headers
tldr; Just FYI, I'm carrying this on the perf tools tree.

Full explanation:

There used to be no copies, with tools/ code using kernel headers
directly. From time to time tools/perf/ broke due to legitimate kernel
hacking. At some point Linus complained about such direct usage. Then we
adopted the current model.

The way these headers are used in perf are not restricted to just
including them to compile something.

There are sometimes used in scripts that convert defines into string
tables, etc, so some change may break one of these scripts, or new MSRs
may use some different #define pattern, etc.

E.g.:

  $ ls -1 tools/perf/trace/beauty/*.sh | head -5
  tools/perf/trace/beauty/arch_errno_names.sh
  tools/perf/trace/beauty/drm_ioctl.sh
  tools/perf/trace/beauty/fadvise.sh
  tools/perf/trace/beauty/fsconfig.sh
  tools/perf/trace/beauty/fsmount.sh
  $
  $ tools/perf/trace/beauty/fadvise.sh
  static const char *fadvise_advices[] = {
        [0] = "NORMAL",
        [1] = "RANDOM",
        [2] = "SEQUENTIAL",
        [3] = "WILLNEED",
        [4] = "DONTNEED",
        [5] = "NOREUSE",
  };
  $

The tools/perf/check-headers.sh script, part of the tools/ build
process, points out changes in the original files.

So its important not to touch the copies in tools/ when doing changes in
the original kernel headers, that will be done later, when
check-headers.sh inform about the change to the perf tools hackers.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231121225650.390246-8-namhyung@kernel.org
2023-11-22 10:57:47 -08:00
Linus Torvalds
189b756271 Merge tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fix from Kees Cook:

 - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)

* tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  perf/benchmark: fix seccomp_unotify benchmark for 32-bit
2023-10-19 10:10:14 -07:00
Jiri Slaby (SUSE)
31c65705a8 perf/benchmark: fix seccomp_unotify benchmark for 32-bit
Commit 7d5cb68af6 (perf/benchmark: add a new benchmark for
seccom_unotify) added a reference to __NR_seccomp into perf. This is
fine as it added also a definition of __NR_seccomp for 64-bit. But it
failed to do so for 32-bit as instead of ifndef, ifdef was used.

Fix this typo (so fix the build of perf on 32-bit).

Fixes: 7d5cb68af6 (perf/benchmark: add a new benchmark for seccom_unotify)
Cc: Andrei Vagin <avagin@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20231017083019.31733-1-jirislaby@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-10-18 17:47:18 -07:00
Arnaldo Carvalho de Melo
15ca35494e tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  1b5277c0ea ("x86/srso: Add SRSO_NO support")
  8974eb5882 ("x86/speculation: Add Gather Data Sampling mitigation")

That cause no changes to tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  $

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZQGismCqcDddjEIQ@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-13 08:53:37 -03:00
Linus Torvalds
0c02183427 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Clean up vCPU targets, always returning generic v8 as the preferred
     target

   - Trap forwarding infrastructure for nested virtualization (used for
     traps that are taken from an L2 guest and are needed by the L1
     hypervisor)

   - FEAT_TLBIRANGE support to only invalidate specific ranges of
     addresses when collapsing a table PTE to a block PTE. This avoids
     that the guest refills the TLBs again for addresses that aren't
     covered by the table PTE.

   - Fix vPMU issues related to handling of PMUver.

   - Don't unnecessary align non-stack allocations in the EL2 VA space

   - Drop HCR_VIRT_EXCP_MASK, which was never used...

   - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
     parameter instead

   - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()

   - Remove prototypes without implementations

  RISC-V:

   - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest

   - Added ONE_REG interface for SATP mode

   - Added ONE_REG interface to enable/disable multiple ISA extensions

   - Improved error codes returned by ONE_REG interfaces

   - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V

   - Added get-reg-list selftest for KVM RISC-V

  s390:

   - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)

     Allows a PV guest to use crypto cards. Card access is governed by
     the firmware and once a crypto queue is "bound" to a PV VM every
     other entity (PV or not) looses access until it is not bound
     anymore. Enablement is done via flags when creating the PV VM.

   - Guest debug fixes (Ilya)

  x86:

   - Clean up KVM's handling of Intel architectural events

   - Intel bugfixes

   - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
     debug registers and generate/handle #DBs

   - Clean up LBR virtualization code

   - Fix a bug where KVM fails to set the target pCPU during an IRTE
     update

   - Fix fatal bugs in SEV-ES intrahost migration

   - Fix a bug where the recent (architecturally correct) change to
     reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
     skip it)

   - Retry APIC map recalculation if a vCPU is added/enabled

   - Overhaul emergency reboot code to bring SVM up to par with VMX, tie
     the "emergency disabling" behavior to KVM actually being loaded,
     and move all of the logic within KVM

   - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
     TSC ratio MSR cannot diverge from the default when TSC scaling is
     disabled up related code

   - Add a framework to allow "caching" feature flags so that KVM can
     check if the guest can use a feature without needing to search
     guest CPUID

   - Rip out the ancient MMU_DEBUG crud and replace the useful bits with
     CONFIG_KVM_PROVE_MMU

   - Fix KVM's handling of !visible guest roots to avoid premature
     triple fault injection

   - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
     API surface that is needed by external users (currently only
     KVMGT), and fix a variety of issues in the process

  Generic:

   - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
     events to pass action specific data without needing to constantly
     update the main handlers.

   - Drop unused function declarations

  Selftests:

   - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs

   - Add support for printf() in guest code and covert all guest asserts
     to use printf-based reporting

   - Clean up the PMU event filter test and add new testcases

   - Include x86 selftests in the KVM x86 MAINTAINERS entry"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
  KVM: x86/mmu: Include mmu.h in spte.h
  KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
  KVM: x86/mmu: Disallow guest from using !visible slots for page tables
  KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
  KVM: x86/mmu: Harden new PGD against roots without shadow pages
  KVM: x86/mmu: Add helper to convert root hpa to shadow page
  drm/i915/gvt: Drop final dependencies on KVM internal details
  KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  KVM: x86/mmu: Assert that correct locks are held for page write-tracking
  KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  KVM: x86/mmu: Use page-track notifiers iff there are external users
  KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  KVM: x86: Remove the unused page-track hook track_flush_slot()
  drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
  KVM: x86: Add a new page-track hook to handle memslot deletion
  drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
  KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  ...
2023-09-07 13:52:20 -07:00
Paolo Bonzini
bd7fe98b35 Merge tag 'kvm-x86-svm-6.6' of https://github.com/kvm-x86/linux into HEAD
KVM: x86: SVM changes for 6.6:

 - Add support for SEV-ES DebugSwap, i.e. allow SEV-ES guests to use debug
   registers and generate/handle #DBs

 - Clean up LBR virtualization code

 - Fix a bug where KVM fails to set the target pCPU during an IRTE update

 - Fix fatal bugs in SEV-ES intrahost migration

 - Fix a bug where the recent (architecturally correct) change to reinject
   #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
2023-08-31 13:32:40 -04:00
Linus Torvalds
1687d8aca5 Merge tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Dave Hansen:
 "This includes a very thorough rework of the 'struct apic' handlers.
  Quite a variety of them popped up over the years, especially in the
  32-bit days when odd apics were much more in vogue.

  The end result speaks for itself, which is a removal of a ton of code
  and static calls to replace indirect calls.

  If there's any breakage here, it's likely to be around the 32-bit
  museum pieces that get light to no testing these days.

  Summary:

   - Rework apic callbacks, getting rid of unnecessary ones and
     coalescing lots of silly duplicates.

   - Use static_calls() instead of indirect calls for apic->foo()

   - Tons of cleanups an crap removal along the way"

* tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  x86/apic: Turn on static calls
  x86/apic: Provide static call infrastructure for APIC callbacks
  x86/apic: Wrap IPI calls into helper functions
  x86/apic: Mark all hotpath APIC callback wrappers __always_inline
  x86/xen/apic: Mark apic __ro_after_init
  x86/apic: Convert other overrides to apic_update_callback()
  x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb()
  x86/apic: Provide apic_update_callback()
  x86/xen/apic: Use standard apic driver mechanism for Xen PV
  x86/apic: Provide common init infrastructure
  x86/apic: Wrap apic->native_eoi() into a helper
  x86/apic: Nuke ack_APIC_irq()
  x86/apic: Remove pointless arguments from [native_]eoi_write()
  x86/apic/noop: Tidy up the code
  x86/apic: Remove pointless NULL initializations
  x86/apic: Sanitize APIC ID range validation
  x86/apic: Prepare x2APIC for using apic::max_apic_id
  x86/apic: Simplify X2APIC ID validation
  x86/apic: Add max_apic_id member
  x86/apic: Wrap APIC ID validation into an inline
  ...
2023-08-30 10:44:46 -07:00
Linus Torvalds
b03a434214 Merge tag 'seccomp-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:

 - Provide USER_NOTIFY flag for synchronous mode (Andrei Vagin, Peter
   Oskolkov). This touches the scheduler and perf but has been Acked by
   Peter Zijlstra.

 - Fix regression in syscall skipping and restart tracing on arm32. This
   touches arch/arm/ but has been Acked by Arnd Bergmann.

* tag 'seccomp-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: Add missing kerndoc notations
  ARM: ptrace: Restore syscall skipping for tracers
  ARM: ptrace: Restore syscall restart tracing
  selftests/seccomp: Handle arm32 corner cases better
  perf/benchmark: add a new benchmark for seccom_unotify
  selftest/seccomp: add a new test for the sync mode of seccomp_user_notify
  seccomp: add the synchronous mode for seccomp_unotify
  sched: add a few helpers to wake up tasks on the current cpu
  sched: add WF_CURRENT_CPU and externise ttwu
  seccomp: don't use semaphore and wait_queue together
2023-08-28 12:38:26 -07:00
Linus Torvalds
b4f63b0f2d Merge tag 'perf-tools-fixes-for-v6.5-3-2023-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Revert a patch that unconditionally resolved addresses to inlines in
   callchains, something that was done before when DWARF mode was asked
   for, but could as well be done when just frame pointers (the default)
   was selected.

   This enriches the callchains with inlines but the way to resolve it
   is gross right now, relying on addr2line, and even if we come up with
   an efficient way of processing all the associated DWARF info for a
   big file as vmlinux is, this has to be something people opt-in, as it
   will still result in overheads, so revert it until we get this done
   in a saner way.

 - Update the x86 msr-index.h header with the kernel original, no change
   in tooling output, just addresses a tools/perf build warning.

  - Resolve a regression where special "tool events", such as
    "duration_time" were being presented for all CPUs, when it only
    makes sense to show it for the workload, that is, just once.

* tag 'perf-tools-fixes-for-v6.5-3-2023-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf stat: Don't display zero tool counts
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  Revert "perf report: Append inlines to non-DWARF callchains"
2023-08-09 21:06:14 -07:00
Arnaldo Carvalho de Melo
8cdd4aeff2 tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  522b1d6921 ("x86/cpu/amd: Add a Zenbleed fix")

That cause no changes to tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  $

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZND17H7BI4ariERn@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-08 10:52:32 -03:00
Xin Li
6e3edb0fb5 tools: Get rid of IRQ_MOVE_CLEANUP_VECTOR from tools
IRQ_MOVE_CLEANUP_VECTOR is not longer in use. Remove the last traces.

Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230621171248.6805-4-xin3.li@intel.com
2023-08-06 14:15:10 +02:00
Alexey Kardashevskiy
d1f85fbe83 KVM: SEV: Enable data breakpoints in SEV-ES
Add support for "DebugSwap for SEV-ES guests", which provides support
for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
allows KVM to expose debug capabilities to SEV-ES guests. Without
DebugSwap support, the CPU doesn't save/load most _guest_ debug
registers (except DR6/7), and KVM cannot manually context switch guest
DRs due the VMSA being encrypted.

Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
vectoring a #DB.  Without NoNestedDataBp, a malicious guest can DoS
the host by putting the CPU into an infinite loop of vectoring #DBs
(see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)

Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialised but also on
intrahost migration when VMSA is encrypted.

Eliminate DR7 intercepts as KVM can't modify guest DR7, and intercepting
DR7 would completely defeat the purpose of enabling DebugSwap.

Make X86_FEATURE_DEBUG_SWAP appear in /proc/cpuinfo (by not adding "") to
let the operator know if the VM can debug.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-7-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:12:56 -07:00
Borislav Petkov (AMD)
0e52740ffd x86/bugs: Increase the x86 bugs vector size to two u32s
There was never a doubt in my mind that they would not fit into a single
u32 eventually.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-18 09:35:38 +02:00
Andrei Vagin
7d5cb68af6 perf/benchmark: add a new benchmark for seccom_unotify
The benchmark is similar to the pipe benchmark. It creates two processes,
one is calling syscalls, and another process is handling them via seccomp
user notifications. It measures the time required to run a specified number
of interations.

 $ ./perf bench sched  seccomp-notify --sync-mode --loop 1000000
 # Running 'sched/seccomp-notify' benchmark:
 # Executed 1000000 system calls

     Total time: 2.769 [sec]

       2.769629 usecs/op
         361059 ops/sec

 $ ./perf bench sched  seccomp-notify
 # Running 'sched/seccomp-notify' benchmark:
 # Executed 1000000 system calls

     Total time: 8.571 [sec]

       8.571119 usecs/op
         116670 ops/sec

Signed-off-by: Andrei Vagin <avagin@google.com>
Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230308073201.3102738-7-avagin@google.com
Link: https://lore.kernel.org/r/20230630051953.454638-1-avagin@gmail.com
[kees: Added PRIu64 format string]
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-07-17 16:08:08 -07:00
Linus Torvalds
8c69e7afe9 Merge tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 instruction alternatives updates from Borislav Petkov:

 - Up until now the Fast Short Rep Mov optimizations implied the
   presence of the ERMS CPUID flag. AMD decoupled them with a BIOS
   setting so decouple that dependency in the kernel code too

 - Teach the alternatives machinery to handle relocations

 - Make debug_alternative accept flags in order to see only that set of
   patching done one is interested in

 - Other fixes, cleanups and optimizations to the patching code

* tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternative: PAUSE is not a NOP
  x86/alternatives: Add cond_resched() to text_poke_bp_batch()
  x86/nospec: Shorten RESET_CALL_DEPTH
  x86/alternatives: Add longer 64-bit NOPs
  x86/alternatives: Fix section mismatch warnings
  x86/alternative: Optimize returns patching
  x86/alternative: Complicate optimize_nops() some more
  x86/alternative: Rewrite optimize_nops() some
  x86/lib/memmove: Decouple ERMS from FSRM
  x86/alternative: Support relocations in alternatives
  x86/alternative: Make debug-alternative selective
2023-06-26 15:14:55 -07:00
Peter Zijlstra
df25edbac3 x86/alternatives: Add longer 64-bit NOPs
By adding support for longer NOPs there are a few more alternatives
that can turn into a single instruction.

Add up to NOP11, the same limit where GNU as .nops also stops
generating longer nops. This is because a number of uarchs have severe
decode penalties for more than 3 prefixes.

  [ bp: Sync up with the version in tools/ while at it. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515093020.661756940@infradead.org
2023-05-31 10:21:21 +02:00
Tiezhu Yang
4e111f0cf0 perf bench syscall: Fix __NR_execve undeclared build error
The __NR_execve definition for i386 was deleted by mistake
in the commit ece7f7c050 ("perf bench syscall: Add fork
syscall benchmark"), add it to fix the build error on i386.

Fixes: ece7f7c050 ("perf bench syscall: Add fork syscall benchmark")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: loongson-kernel@lists.loongnix.cn
Closes: https://lore.kernel.org/all/CA+G9fYvgBR1iB0CorM8OC4AM_w_tFzyQKHc+rF6qPzJL=TbfDQ@mail.gmail.com/
Link: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-19 12:08:00 -03:00
Arnaldo Carvalho de Melo
1b5f159ce8 tools headers disabled-features: Sync with the kernel sources
To pick the changes from:

  e0bddc19ba ("x86/mm: Reduce untagged_addr() overhead for systems without LAM")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZGTpdlzrlRjjnY6K@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-17 11:49:47 -03:00
Arnaldo Carvalho de Melo
29719e3198 tools headers UAPI: Sync arch prctl headers with the kernel sources
To pick the changes in this cset:

  a03c376eba ("x86/arch_prctl: Add AMX feature numbers as ABI constants")
  23e5d9ec2b ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive")
  2f8794bd08 ("x86/mm: Provide arch_prctl() interface for LAM")

This picks these new prctls in a third range, that was also added to the
tools/perf/trace/beauty/arch_prctl.c beautifier.

  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/before
  $ cp arch/x86/include/uapi/asm/prctl.h tools/arch/x86/include/uapi/asm/prctl.h
  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/after
  $ diff -u /tmp/before /tmp/after
  @@ -20,3 +20,11 @@
   	[0x2003 - 0x2001]= "MAP_VDSO_64",
   };

  +#define x86_arch_prctl_codes_3_offset 0x4001
  +static const char *x86_arch_prctl_codes_3[] = {
  +	[0x4001 - 0x4001]= "GET_UNTAG_MASK",
  +	[0x4002 - 0x4001]= "ENABLE_TAGGED_ADDR",
  +	[0x4003 - 0x4001]= "GET_MAX_TAG_BITS",
  +	[0x4004 - 0x4001]= "FORCE_TAGGED_SVA",
  +};
  +
  $

With this 'perf trace' can translate those numbers into strings and use
the strings in filter expressions:

  # perf trace -e prctl
       0.000 ( 0.011 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9c014b7df5)     = 0
       0.032 ( 0.002 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9bb6b51580)     = 0
       5.452 ( 0.003 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfeb70) = 0
       5.468 ( 0.002 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfea70) = 0
      24.494 ( 0.009 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f562a32ae28) = 0
      24.540 ( 0.002 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f563c6d4b30) = 0
     670.281 ( 0.008 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30805c8) = 0
     670.293 ( 0.002 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30800f0) = 0
  ^C#

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/prctl.h' differs from latest version at 'arch/x86/include/uapi/asm/prctl.h'
  diff -u tools/arch/x86/include/uapi/asm/prctl.h arch/x86/include/uapi/asm/prctl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZGTjNPpD3FOWfetM@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-17 11:23:43 -03:00
Arnaldo Carvalho de Melo
9bc83d6e38 tools headers x86 cpufeatures: Sync with the kernel sources
To pick the changes from:

  3d8f61bf8b ("x86: KVM: Add common feature flag for AMD's PSFD")
  3763bf5802 ("x86/cpufeatures: Redefine synthetic virtual NMI bit as AMD's "real" vNMI")
  6449dcb0ca ("x86: CPUID and CR3/CR4 flags for Linear Address Masking")
  be8de49bea ("x86/speculation: Identify processors vulnerable to SMT RSB predictions")
  e7862eda30 ("x86/cpu: Support AMD Automatic IBRS")
  faabfcb194 ("x86/cpu, kvm: Add the SMM_CTL MSR not present feature")
  5b909d4ae5 ("x86/cpu, kvm: Add the Null Selector Clears Base feature")
  84168ae786 ("x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf")
  a9dc9ec5a1 ("x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature")
  f8df91e73a ("x86/cpufeatures: Add macros for Intel's new fast rep string features")
  78335aac61 ("x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag")
  f334f723a6 ("x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag")
  a018d2e3d4 ("x86/cpufeatures: Add Architectural PerfMon Extension bit")

This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/lkml/ZGTTw642q8mWgv2Y@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-17 10:39:40 -03:00
Yanteng Si
34e82891d9 tools arch x86: Sync the msr-index.h copy with the kernel sources
Picking the changes from:

  c68e3d4739 ("x86/include/asm/msr-index.h: Add IFS Array test bits")

Silencing these perf build warnings:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
  diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongson-kernel@lists.loongnix.cn
Link: https://lore.kernel.org/r/05778ab3c168c8030f6b20e60375dc803f0cd300.1683712945.git.siyanteng@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-10 14:19:20 -03:00
Yanteng Si
705049ca4f tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources
Picking the changes from:

  e65733b5c5 ("KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL")
  30ec7997d1 ("KVM: arm64: timers: Allow userspace to set the global counter offset")
  821d935c87 ("KVM: arm64: Introduce support for userspace SMCCC filtering")
  81dc9504a7 ("KVM: arm64: nv: timers: Support hyp timer emulation")
  a8308b3fc9 ("KVM: arm64: Refactor hvc filtering to support different actions")
  0e5c9a9d65 ("KVM: arm64: Expose SMC/HVC width to userspace")

Silencing these perf build warnings:

 Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
 diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
 Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
 diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
 Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
 diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongson-kernel@lists.loongnix.cn
Link: https://lore.kernel.org/r/ac5adb58411d23b3360d436a65038fefe91c32a8.1683712945.git.siyanteng@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-10 14:19:20 -03:00
Linus Torvalds
f085df1be6 Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
 "Third version of perf tool updates, with the build problems with with
  using a 'vmlinux.h' generated from the main build fixed, and the bpf
  skeleton build disabled by default.

  Build:

   - Require libtraceevent to build, one can disable it using
     NO_LIBTRACEEVENT=1.

     It is required for tools like 'perf sched', 'perf kvm', 'perf
     trace', etc.

     libtraceevent is available in most distros so installing
     'libtraceevent-devel' should be a one-time event to continue
     building perf as usual.

     Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
     sufficient for lots of users not interested in those libtraceevent
     dependent features.

   - Allow Python support in 'perf script' when libtraceevent isn't
     linked, as not all features requires it, for instance Intel PT does
     not use tracepoints.

   - Error if the python interpreter needed for jevents to work isn't
     available and NO_JEVENTS=1 isn't set, preventing a build without
     support for JSON vendor events, which is a rare but possible
     condition. The two check error messages:

        $(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
        $(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)

   - Make libbpf 1.0 the minimum required when building with out of
     tree, distro provided libbpf.

   - Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
     demangler, add 'perf test' entry for it.

   - Make binutils libraries opt in, as distros disable building with it
     due to licensing, they were used for C++ demangling, for instance.

   - Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
     equivalent) isn't installed, we'll just have a build warning:

       Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev

   - Add a feature test for scandirat(), that is not implemented so far
     in musl and uclibc, disabling features that need it, such as
     scanning for tracepoints in /sys/kernel/tracing/events.

  perf BPF filters:

   - New feature where BPF can be used to filter samples, for instance:

      $ sudo ./perf record -e cycles --filter 'period > 1000' true
      $ sudo ./perf script
           perf-exec 2273949 546850.708501:       5029 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
           perf-exec 2273949 546850.708508:      32409 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
           perf-exec 2273949 546850.708526:     143369 cycles:  ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
           perf-exec 2273949 546850.708600:     372650 cycles:  ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
           perf-exec 2273949 546850.708791:     482953 cycles:  ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
                true 2273949 546850.709036:     501985 cycles:  ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
                true 2273949 546850.709292:     503065 cycles:      7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

   - In addition to 'period' (PERF_SAMPLE_PERIOD), the other
     PERF_SAMPLE_ can be used for filtering, and also some other sample
     accessible values, from tools/perf/Documentation/perf-record.txt:

        Essentially the BPF filter expression is:

        <term> <operator> <value> (("," | "||") <term> <operator> <value>)*

     The <term> can be one of:
        ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
        code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
        p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
        mem_dtlb, mem_blk, mem_hops

     The <operator> can be one of:
        ==, !=, >, >=, <, <=, &

     The <value> can be one of:
        <number> (for any term)
        na, load, store, pfetch, exec (for mem_op)
        l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
        na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
        remote (for mem_remote)
        na, locked (for mem_locked)
        na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
        na, by_data, by_addr (for mem_blk)
        hops0, hops1, hops2, hops3 (for mem_hops)

  perf lock contention:

   - Show lock type with address.

   - Track and show mmap_lock, siglock and per-cpu rq_lock with address.
     This is done for mmap_lock by following the current->mm pointer:

      $ sudo ./perf lock con -abl -- sleep 10
       contended   total wait     max wait     avg wait            address   symbol
       ...
           16344    312.30 ms      2.22 ms     19.11 us   ffff8cc702595640
           17686    310.08 ms      1.49 ms     17.53 us   ffff8cc7025952c0
               3     84.14 ms     45.79 ms     28.05 ms   ffff8cc78114c478   mmap_lock
            3557     76.80 ms     68.75 us     21.59 us   ffff8cc77ca3af58
               1     68.27 ms     68.27 ms     68.27 ms   ffff8cda745dfd70
               9     54.53 ms      7.96 ms      6.06 ms   ffff8cc7642a48b8   mmap_lock
           14629     44.01 ms     60.00 us      3.01 us   ffff8cc7625f9ca0
            3481     42.63 ms    140.71 us     12.24 us   ffffffff937906ac   vmap_area_lock
           16194     38.73 ms     42.15 us      2.39 us   ffff8cd397cbc560
              11     38.44 ms     10.39 ms      3.49 ms   ffff8ccd6d12fbb8   mmap_lock
               1      5.43 ms      5.43 ms      5.43 ms   ffff8cd70018f0d8
            1674      5.38 ms    422.93 us      3.21 us   ffffffff92e06080   tasklist_lock
             581      4.51 ms    130.68 us      7.75 us   ffff8cc9b1259058
               5      3.52 ms      1.27 ms    703.23 us   ffff8cc754510070
             112      3.47 ms     56.47 us     31.02 us   ffff8ccee38b3120
             381      3.31 ms     73.44 us      8.69 us   ffffffff93790690   purge_vmap_area_lock
             255      3.19 ms     36.35 us     12.49 us   ffff8d053ce30c80

   - Update default map size to 16384.

   - Allocate single letter option -M for --map-nr-entries, as it is
     proving being frequently used.

   - Fix struct rq lock access for older kernels with BPF's CO-RE
     (Compile once, run everywhere).

   - Fix problems found with MSAn.

  perf report/top:

   - Add inline information when using --call-graph=fp or lbr, as was
     already done to the --call-graph=dwarf callchain mode.

   - Improve the 'srcfile' sort key performance by really using an
     optimization introduced in 6.2 for the 'srcline' sort key that
     avoids calling addr2line for comparision with each sample.

  perf sched:

   - Make 'perf sched latency/map/replay' to use "sched:sched_waking"
     instead of "sched:sched_waking", consistent with 'perf record'
     since d566a9c2d4 ("perf sched: Prefer sched_waking event when it
     exists").

  perf ftrace:

   - Make system wide the default target for latency subcommand, run the
     following command then generate some network traffic and press
     control+C:

       # perf ftrace latency -T __kfree_skb
     ^C
         DURATION     |      COUNT | GRAPH                                          |
          0 - 1    us |         27 | #############                                  |
          1 - 2    us |         22 | ###########                                    |
          2 - 4    us |          8 | ####                                           |
          4 - 8    us |          5 | ##                                             |
          8 - 16   us |         24 | ############                                   |
         16 - 32   us |          2 | #                                              |
         32 - 64   us |          1 |                                                |
         64 - 128  us |          0 |                                                |
        128 - 256  us |          0 |                                                |
        256 - 512  us |          0 |                                                |
        512 - 1024 us |          0 |                                                |
          1 - 2    ms |          0 |                                                |
          2 - 4    ms |          0 |                                                |
          4 - 8    ms |          0 |                                                |
          8 - 16   ms |          0 |                                                |
         16 - 32   ms |          0 |                                                |
         32 - 64   ms |          0 |                                                |
         64 - 128  ms |          0 |                                                |
        128 - 256  ms |          0 |                                                |
        256 - 512  ms |          0 |                                                |
        512 - 1024 ms |          0 |                                                |
          1 - ...   s |          0 |                                                |
       #

  perf top:

   - Add --branch-history (LBR: Last Branch Record) option, just like
     already available for 'perf record'.

   - Fix segfault in thread__comm_len() where thread->comm was being
     used outside thread->comm_lock.

  perf annotate:

   - Allow configuring objdump and addr2line in ~/.perfconfig., so that
     you can use alternative binaries, such as llvm's.

  perf kvm:

   - Add TUI mode for 'perf kvm stat report'.

  Reference counting:

   - Add reference count checking infrastructure to check for use after
     free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
     more to come.

     To build with it use -DREFCNT_CHECKING=1 in the make command line
     to build tools/perf. Documented at:

       https://perf.wiki.kernel.org/index.php/Reference_Count_Checking

   - The above caught, for instance, fix, present in this series:

        - Fix maps use after put in 'perf test "Share thread maps"':

          'maps' is copied from leader, but the leader is put on line 79
          and then 'maps' is used to read the reference count below - so
          a use after put, with the put of maps happening within
          thread__put.

     Fixed by reversing the order of puts so that the leader is put
     last.

   - Also several fixes were made to places where reference counts were
     not being held.

   - Make this one of the tests in 'make -C tools/perf build-test' to
     regularly build test it and to make sure no direct access to the
     reference counted structs are made, doing that via accessors to
     check the validity of the struct pointer.

  ARM64:

   - Fix 'perf report' segfault when filtering coresight traces by
     sparse lists of CPUs.

   - Add support for 'simd' as a sort field for 'perf report', to show
     ARM's NEON SIMD's predicate flags: "partial" and "empty".

  arm64 vendor events:

   - Add N1 metrics.

  Intel vendor events:

   - Add graniterapids, grandridge and sierraforrest events.

   - Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
     broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
     jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
     silvermont, skylake, tigerlake and westmereep-dp

   - Refresh metrics for alderlake-n, broadwell, broadwellde,
     broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
     skylakex.

  perf stat:

   - Implement --topdown using JSON metrics.

   - Add TopdownL1 JSON metric as a default if present, but disable it
     for now for some Intel hybrid architectures, a series of patches
     addressing this is being reviewed and will be submitted for v6.5.

   - Use metrics for --smi-cost.

   - Update topdown documentation.

  Vendor events (JSON) infrastructure:

   - Add support for computing and printing metric threshold values. For
     instance, here is one found in thesapphirerapids json file:

       {
           "BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
           "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
           "MetricGroup": "smi",
           "MetricName": "smi_cycles",
           "MetricThreshold": "smi_cycles > 0.1",
           "ScaleUnit": "100%"
       },

   - Test parsing metric thresholds with the fake PMU in 'perf test
     pmu-events'.

   - Support for printing metric thresholds in 'perf list'.

   - Add --metric-no-threshold option to 'perf stat'.

   - Add rand (reverse and) and has_pmem (optane memory) support to
     metrics.

   - Sort list of input files to avoid depending on the order from
     readdir() helping in obtaining reproducible builds.

  S/390:

   - Add common metrics: - CPI (cycles per instruction), prbstate (ratio
     of instructions executed in problem state compared to total number
     of instructions), l1mp (Level one instruction and data cache misses
     per 100 instructions).

   - Add cache metrics for z13, z14, z15 and z16.

   - Add metric for TLB and cache.

  ARM:

   - Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
     (Memory Tagging Extension) and MOPS (Memory Operations) load/store.

  Intel PT hardware tracing:

   - Add event type names UINTR (User interrupt delivered) and UIRET
     (Exiting from user interrupt routine), documented in table 32-50
     "CFE Packet Type and Vector Fields Details" in the Intel Processor
     Trace chapter of The Intel SDM Volume 3 version 078.

   - Add support for new branch instructions ERETS and ERETU.

   - Fix CYC timestamps after standalone CBR

  ARM CoreSight hardware tracing:

   - Allow user to override timestamp and contextid settings.

   - Fix segfault in dso lookup.

   - Fix timeless decode mode detection.

   - Add separate decode paths for timeless and per-thread modes.

  auxtrace:

   - Fix address filter entire kernel size.

  Miscellaneous:

   - Fix use-after-free and unaligned bugs in the PLT handling routines.

   - Use zfree() to reduce chances of use after free.

   - Add missing 0x prefix for addresses printed in hexadecimal in 'perf
     probe'.

   - Suppress massive unsupported target platform errors in the unwind
     code.

   - Fix return incorrect build_id size in elf_read_build_id().

   - Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .

   - Add missing new parameter in kfree_skb tracepoint to the python
     scripts using it.

   - Add 'perf bench syscall fork' benchmark.

   - Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
     'perf mem'.

   - Fix wrong size expectation for perf test 'Setup struct
     perf_event_attr' caused by the patch adding
     perf_event_attr::config3.

   - Fix some spelling mistakes"

* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
  Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
  Revert "perf build: Warn for BPF skeletons if endian mismatches"
  perf metrics: Fix SEGV with --for-each-cgroup
  perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
  perf stat: Separate bperf from bpf_profiler
  perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
  perf test record+probe_libc_inet_pton: Fix call chain match on s390
  perf tracepoint: Fix memory leak in is_valid_tracepoint()
  perf cs-etm: Add fix for coresight trace for any range of CPUs
  perf build: Fix unescaped # in perf build-test
  perf unwind: Suppress massive unsupported target platform errors
  perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
  perf script: Print raw ip instead of binary offset for callchain
  perf symbols: Fix return incorrect build_id size in elf_read_build_id()
  perf list: Modify the warning message about scandirat(3)
  perf list: Fix memory leaks in print_tracepoint_events()
  perf lock contention: Rework offset calculation with BPF CO-RE
  perf lock contention: Fix struct rq lock access
  perf stat: Disable TopdownL1 on hybrid
  perf stat: Avoid SEGV on counter->name
  ...
2023-05-07 11:32:18 -07:00
Linus Torvalds
2aff7c706c Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Tiezhu Yang
ece7f7c050 perf bench syscall: Add fork syscall benchmark
This is a follow up patch for the execve bench which is actually
fork + execve, it makes sense to add the fork syscall benchmark
to compare the execve part precisely.

Some archs have no __NR_fork definition which is used only as a
check condition to call test_fork(), let us just define it as -1
to avoid build error.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongson-kernel@lists.loongnix.cn
Link: https://lore.kernel.org/r/1679381821-22736-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:55 -03:00
Josh Poimboeuf
fb799447ae x86,objtool: Split UNWIND_HINT_EMPTY in two
Mark reported that the ORC unwinder incorrectly marks an unwind as
reliable when the unwind terminates prematurely in the dark corners of
return_to_handler() due to lack of information about the next frame.

The problem is UNWIND_HINT_EMPTY is used in two different situations:

  1) The end of the kernel stack unwind before hitting user entry, boot
     code, or fork entry

  2) A blind spot in ORC coverage where the unwinder has to bail due to
     lack of information about the next frame

The ORC unwinder has no way to tell the difference between the two.
When it encounters an undefined stack state with 'end=1', it blindly
marks the stack reliable, which can break the livepatch consistency
model.

Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and
UNWIND_HINT_END_OF_STACK.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:58 +01:00
Josh Poimboeuf
f902cfdd46 x86,objtool: Introduce ORC_TYPE_*
Unwind hints and ORC entry types are two distinct things.  Separate them
out more explicitly.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/cc879d38fff8a43f8f7beb2fd56e35a5a384d7cd.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:57 +01:00