Pull arm64 fixes from Will Deacon:
"The usual summary below, but the main fix is for the fast GUP lockless
page-table walk when we have a combination of compile-time and
run-time folding of the p4d and the pud respectively.
- Remove some redundant Kconfig conditionals
- Fix string output in ptrace selftest
- Fix fast GUP crashes in some page-table configurations
- Remove obsolete linker option when building the vDSO
- Fix some sysreg field definitions for the GIC"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: Fix lockless walks with static and dynamic page-table folding
arm64/sysreg: Correct the values for GICv4.1
arm64/vdso: Remove --hash-style=sysv
kselftest: missing arg in ptrace.c
arm64/Kconfig: Remove redundant 'if HAVE_FUNCTION_GRAPH_TRACER'
arm64: remove redundant 'if HAVE_ARCH_KASAN' in Kconfig
Change the asm/unistd.h header for arm64 to no longer include
asm-generic/unistd.h itself, but instead generate both the asm/unistd.h
contents and the list of entry points using the syscall.tbl scripts that
we use on most other architectures.
Once his is done for the remaining architectures, the generic unistd.h
header can be removed and the generated tbl file put in its place.
The Makefile changes are more complex than they should be, I need
a little help to improve those. Ideally this should be done in an
architecture-independent way as well.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This is a straight conversion from the old asm/unistd32.h into the
format used by 32-bit arm and most other architectures, calling scripts
to generate the asm/unistd32.h header and a new asm/syscalls32.h headers.
I used a semi-automated text replacement method to do the conversion,
and then used 'vimdiff' to synchronize the whitespace and the (unused)
names of the non-compat syscalls with the arm version.
There are two differences between the generated syscalls names and the
old version:
- the old asm/unistd32.h contained only a __NR_sync_file_range2
entry, while the arm32 version also defines
__NR_arm_sync_file_range with the same number. I added this
duplicate back in asm/unistd32.h.
- __NR__sysctl was removed from the arm64 file a while ago, but
all the tables still contain it. This should probably get removed
everywhere but I added it here for consistency.
On top of that, the arm64 version does not contain any references to
the 32-bit OABI syscalls that are not supported by arm64. If we ever
want to share the file between arm32 and arm64, it would not be
hard to add support for both in one file.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cortex-X4 and Neoverse-V3 suffer from errata whereby an MSR to the SSBS
special-purpose register does not affect subsequent speculative
instructions, permitting speculative store bypassing for a window of
time. This is described in their Software Developer Errata Notice (SDEN)
documents:
* Cortex-X4 SDEN v8.0, erratum 3194386:
https://developer.arm.com/documentation/SDEN-2432808/0800/
* Neoverse-V3 SDEN v6.0, erratum 3312417:
https://developer.arm.com/documentation/SDEN-2891958/0600/
To workaround these errata, it is necessary to place a speculation
barrier (SB) after MSR to the SSBS special-purpose register. This patch
adds the requisite SB after writes to SSBS within the kernel, and hides
the presence of SSBS from EL0 such that userspace software which cares
about SSBS will manipulate this via prctl(PR_GET_SPECULATION_CTRL, ...).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240508081400.235362-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Pull kvm updates from Paolo Bonzini:
"S390:
- Changes to FPU handling came in via the main s390 pull request
- Only deliver to the guest the SCLP events that userspace has
requested
- More virtual vs physical address fixes (only a cleanup since
virtual and physical address spaces are currently the same)
- Fix selftests undefined behavior
x86:
- Fix a restriction that the guest can't program a PMU event whose
encoding matches an architectural event that isn't included in the
guest CPUID. The enumeration of an architectural event only says
that if a CPU supports an architectural event, then the event can
be programmed *using the architectural encoding*. The enumeration
does NOT say anything about the encoding when the CPU doesn't
report support the event *in general*. It might support it, and it
might support it using the same encoding that made it into the
architectural PMU spec
- Fix a variety of bugs in KVM's emulation of RDPMC (more details on
individual commits) and add a selftest to verify KVM correctly
emulates RDMPC, counter availability, and a variety of other
PMC-related behaviors that depend on guest CPUID and therefore are
easier to validate with selftests than with custom guests (aka
kvm-unit-tests)
- Zero out PMU state on AMD if the virtual PMU is disabled, it does
not cause any bug but it wastes time in various cases where KVM
would check if a PMC event needs to be synthesized
- Optimize triggering of emulated events, with a nice ~10%
performance improvement in VM-Exit microbenchmarks when a vPMU is
exposed to the guest
- Tighten the check for "PMI in guest" to reduce false positives if
an NMI arrives in the host while KVM is handling an IRQ VM-Exit
- Fix a bug where KVM would report stale/bogus exit qualification
information when exiting to userspace with an internal error exit
code
- Add a VMX flag in /proc/cpuinfo to report 5-level EPT support
- Rework TDP MMU root unload, free, and alloc to run with mmu_lock
held for read, e.g. to avoid serializing vCPUs when userspace
deletes a memslot
- Tear down TDP MMU page tables at 4KiB granularity (used to be
1GiB). KVM doesn't support yielding in the middle of processing a
zap, and 1GiB granularity resulted in multi-millisecond lags that
are quite impolite for CONFIG_PREEMPT kernels
- Allocate write-tracking metadata on-demand to avoid the memory
overhead when a kernel is built with i915 virtualization support
but the workloads use neither shadow paging nor i915 virtualization
- Explicitly initialize a variety of on-stack variables in the
emulator that triggered KMSAN false positives
- Fix the debugregs ABI for 32-bit KVM
- Rework the "force immediate exit" code so that vendor code
ultimately decides how and when to force the exit, which allowed
some optimization for both Intel and AMD
- Fix a long-standing bug where kvm_has_noapic_vcpu could be left
elevated if vCPU creation ultimately failed, causing extra
unnecessary work
- Cleanup the logic for checking if the currently loaded vCPU is
in-kernel
- Harden against underflowing the active mmu_notifier invalidation
count, so that "bad" invalidations (usually due to bugs elsehwere
in the kernel) are detected earlier and are less likely to hang the
kernel
x86 Xen emulation:
- Overlay pages can now be cached based on host virtual address,
instead of guest physical addresses. This removes the need to
reconfigure and invalidate the cache if the guest changes the gpa
but the underlying host virtual address remains the same
- When possible, use a single host TSC value when computing the
deadline for Xen timers in order to improve the accuracy of the
timer emulation
- Inject pending upcall events when the vCPU software-enables its
APIC to fix a bug where an upcall can be lost (and to follow Xen's
behavior)
- Fall back to the slow path instead of warning if "fast" IRQ
delivery of Xen events fails, e.g. if the guest has aliased xAPIC
IDs
RISC-V:
- Support exception and interrupt handling in selftests
- New self test for RISC-V architectural timer (Sstc extension)
- New extension support (Ztso, Zacas)
- Support userspace emulation of random number seed CSRs
ARM:
- Infrastructure for building KVM's trap configuration based on the
architectural features (or lack thereof) advertised in the VM's ID
registers
- Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
x86's WC) at stage-2, improving the performance of interacting with
assigned devices that can tolerate it
- Conversion of KVM's representation of LPIs to an xarray, utilized
to address serialization some of the serialization on the LPI
injection path
- Support for _architectural_ VHE-only systems, advertised through
the absence of FEAT_E2H0 in the CPU's ID register
- Miscellaneous cleanups, fixes, and spelling corrections to KVM and
selftests
LoongArch:
- Set reserved bits as zero in CPUCFG
- Start SW timer only when vcpu is blocking
- Do not restart SW timer when it is expired
- Remove unnecessary CSR register saving during enter guest
- Misc cleanups and fixes as usual
Generic:
- Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
always true on all architectures except MIPS (where Kconfig
determines the available depending on CPU capabilities). It is
replaced either by an architecture-dependent symbol for MIPS, and
IS_ENABLED(CONFIG_KVM) everywhere else
- Factor common "select" statements in common code instead of
requiring each architecture to specify it
- Remove thoroughly obsolete APIs from the uapi headers
- Move architecture-dependent stuff to uapi/asm/kvm.h
- Always flush the async page fault workqueue when a work item is
being removed, especially during vCPU destruction, to ensure that
there are no workers running in KVM code when all references to
KVM-the-module are gone, i.e. to prevent a very unlikely
use-after-free if kvm.ko is unloaded
- Grab a reference to the VM's mm_struct in the async #PF worker
itself instead of gifting the worker a reference, so that there's
no need to remember to *conditionally* clean up after the worker
Selftests:
- Reduce boilerplate especially when utilize selftest TAP
infrastructure
- Add basic smoke tests for SEV and SEV-ES, along with a pile of
library support for handling private/encrypted/protected memory
- Fix benign bugs where tests neglect to close() guest_memfd files"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
selftests: kvm: remove meaningless assignments in Makefiles
KVM: riscv: selftests: Add Zacas extension to get-reg-list test
RISC-V: KVM: Allow Zacas extension for Guest/VM
KVM: riscv: selftests: Add Ztso extension to get-reg-list test
RISC-V: KVM: Allow Ztso extension for Guest/VM
RISC-V: KVM: Forward SEED CSR access to user space
KVM: riscv: selftests: Add sstc timer test
KVM: riscv: selftests: Change vcpu_has_ext to a common function
KVM: riscv: selftests: Add guest helper to get vcpu id
KVM: riscv: selftests: Add exception handling support
LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
LoongArch: KVM: Do not restart SW timer when it is expired
LoongArch: KVM: Start SW timer only when vcpu is blocking
LoongArch: KVM: Set reserved bits as zero in CPUCFG
KVM: selftests: Explicitly close guest_memfd files in some gmem tests
KVM: x86/xen: fix recursive deadlock in timer injection
KVM: pfncache: simplify locking and make more self-contained
KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
KVM: x86/xen: improve accuracy of Xen timers
...
* for-next/stage1-lpa2: (48 commits)
: Add support for LPA2 and WXN and stage 1
arm64/mm: Avoid ID mapping of kpti flag if it is no longer needed
arm64/mm: Use generic __pud_free() helper in pud_free() implementation
arm64: gitignore: ignore relacheck
arm64: Use Signed/Unsigned enums for TGRAN{4,16,64} and VARange
arm64: mm: Make PUD folding check in set_pud() a runtime check
arm64: mm: add support for WXN memory translation attribute
mm: add arch hook to validate mmap() prot flags
arm64: defconfig: Enable LPA2 support
arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels
arm64: ptdump: Deal with translation levels folded at runtime
arm64: ptdump: Disregard unaddressable VA space
arm64: mm: Add support for folding PUDs at runtime
arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging
arm64: mm: Add 5 level paging support to fixmap and swapper handling
arm64: Enable LPA2 at boot if supported by the system
arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion
arm64: mm: Add definitions to support 5 levels of paging
arm64: mm: Add LPA2 support to phys<->pte conversion routines
arm64: mm: Wire up TCR.DS bit to PTE shareability fields
...
* arm64/for-next/perf: (39 commits)
docs: perf: Fix build warning of hisi-pcie-pmu.rst
perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
MAINTAINERS: Add entry for StarFive StarLink PMU
docs: perf: Add description for StarFive's StarLink PMU
dt-bindings: perf: starfive: Add JH8100 StarLink PMU
perf: starfive: Add StarLink PMU support
docs: perf: Update usage for target filter of hisi-pcie-pmu
drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx()
drivers/perf: hisi_pcie: Relax the check on related events
drivers/perf: hisi_pcie: Check the target filter properly
drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth
drivers/perf: hisi_pcie: Fix incorrect counting under metric mode
drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val()
drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter()
drivers/perf: hisi: Enable HiSilicon Erratum 162700402 quirk for HIP09
perf/arm_cspmu: Add devicetree support
dt-bindings/perf: Add Arm CoreSight PMU
perf/arm_cspmu: Simplify counter reset
perf/arm_cspmu: Simplify attribute groups
perf/arm_cspmu: Simplify initialisation
...
* for-next/reorg-va-space:
: Reorganise the arm64 kernel VA space in preparation for LPA2 support
: (52-bit VA/PA).
arm64: kaslr: Adjust randomization range dynamically
arm64: mm: Reclaim unused vmemmap region for vmalloc use
arm64: vmemmap: Avoid base2 order of struct page size to dimension region
arm64: ptdump: Discover start of vmemmap region at runtime
arm64: ptdump: Allow all region boundaries to be defined at boot time
arm64: mm: Move fixmap region above vmemmap region
arm64: mm: Move PCI I/O emulation region above the vmemmap region
* for-next/rust-for-arm64:
: Enable Rust support for arm64
arm64: rust: Enable Rust support for AArch64
rust: Refactor the build target to allow the use of builtin targets
* for-next/misc:
: Miscellaneous arm64 patches
ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
arm64: Remove enable_daif macro
arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception
arm64: cpufeatures: Clean up temporary variable to simplify code
arm64: Update setup_arch() comment on interrupt masking
arm64: remove unnecessary ifdefs around is_compat_task()
arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with Clang
arm64/sme: Ensure that all fields in SMCR_EL1 are set to known values
arm64/sve: Ensure that all fields in ZCR_EL1 are set to known values
arm64/sve: Document that __SVE_VQ_MAX is much larger than needed
arm64: make member of struct pt_regs and it's offset macro in the same order
arm64: remove unneeded BUILD_BUG_ON assertion
arm64: kretprobes: acquire the regs via a BRK exception
arm64: io: permit offset addressing
arm64: errata: Don't enable workarounds for "rare" errata by default
* for-next/daif-cleanup:
: Clean up DAIF handling for EL0 returns
arm64: Unmask Debug + SError in do_notify_resume()
arm64: Move do_notify_resume() to entry-common.c
arm64: Simplify do_notify_resume() DAIF masking
* for-next/kselftest:
: Miscellaneous arm64 kselftest patches
kselftest/arm64: Test that ptrace takes effect in the target process
* for-next/documentation:
: arm64 documentation patches
arm64/sme: Remove spurious 'is' in SME documentation
arm64/fp: Clarify effect of setting an unsupported system VL
arm64/sme: Fix cut'n'paste in ABI document
arm64/sve: Remove bitrotted comment about syscall behaviour
* for-next/sysreg:
: sysreg updates
arm64/sysreg: Update ID_AA64DFR0_EL1 register
arm64/sysreg: Update ID_DFR0_EL1 register fields
arm64/sysreg: Add register fields for ID_AA64DFR1_EL1
* for-next/dpisa:
: Support for 2023 dpISA extensions
kselftest/arm64: Add 2023 DPISA hwcap test coverage
kselftest/arm64: Add basic FPMR test
kselftest/arm64: Handle FPMR context in generic signal frame parser
arm64/hwcap: Define hwcaps for 2023 DPISA features
arm64/ptrace: Expose FPMR via ptrace
arm64/signal: Add FPMR signal handling
arm64/fpsimd: Support FEAT_FPMR
arm64/fpsimd: Enable host kernel access to FPMR
arm64/cpufeature: Hook new identification registers up to cpufeature
FEAT_FPMR defines a new EL0 accessible register FPMR use to configure the
FP8 related features added to the architecture at the same time. Detect
support for this register and context switch it for EL0 when present.
Due to the sharing of responsibility for saving floating point state
between the host kernel and KVM FP8 support is not yet implemented in KVM
and a stub similar to that used for SVCR is provided for FPMR in order to
avoid bisection issues. To make it easier to share host state with the
hypervisor we store FPMR as a hardened usercopy field in uw (along with
some padding).
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-3-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Open-coding the feature matching parameters for LVA/LVA2 leads to
issues with upcoming changes to the cpufeature code.
By making TGRAN{4,16,64} and VARange signed/unsigned as per the
architecture, we can use the existing macros, making the feature
match robust against those changes.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, we detect CPU support for 52-bit virtual addressing (LVA)
extremely early, before creating the kernel page tables or enabling the
MMU. We cannot override the feature this early, and so large virtual
addressing is always enabled on CPUs that implement support for it if
the software support for it was enabled at build time. It also means we
rely on non-trivial code in asm to deal with this feature.
Given that both the ID map and the TTBR1 mapping of the kernel image are
guaranteed to be 48-bit addressable, it is not actually necessary to
enable support this early, and instead, we can model it as a CPU
feature. That way, we can rely on code patching to get the correct
TCR.T1SZ values programmed on secondary boot and resume from suspend.
On the primary boot path, we simply enable the MMU with 48-bit virtual
addressing initially, and update TCR.T1SZ if LVA is supported from C
code, right before creating the kernel mapping. Given that TTBR1 still
points to reserved_pg_dir at this point, updating TCR.T1SZ should be
safe without the need for explicit TLB maintenance.
Since this gets rid of all accesses to the vabits_actual variable from
asm code that occurred before TCR.T1SZ had been programmed, we no longer
have a need for this variable, and we can replace it with a C expression
that produces the correct value directly, based on the value of TCR.T1SZ.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-70-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* for-next/sysregs:
arm64/sysreg: Add missing system instruction definitions for FGT
arm64/sysreg: Add missing system register definitions for FGT
arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definition for FPMR
arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09
arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09
arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09
arm64/sysreg: Add definition for ID_AA64FPFR0_EL1
arm64/sysreg: Add definition for ID_AA64ISAR3_EL1
arm64/sysreg: Update ID_AA64ISAR2_EL1 defintion for DDI0601 2023-09
arm64/sysreg: Add definition for ID_AA64PFR2_EL1
arm64/sysreg: update CPACR_EL1 register
arm64/sysreg: add system register POR_EL{0,1}
arm64/sysreg: Add definition for HAFGRTR_EL2
arm64/sysreg: Update HFGITR_EL2 definiton to DDI0601 2023-09
* for-next/lpa2-prep:
arm64: mm: get rid of kimage_vaddr global variable
arm64: mm: Take potential load offset into account when KASLR is off
arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
arm64: Add ARM64_HAS_LPA2 CPU capability
arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
arm64/mm: Update tlb invalidation routines for FEAT_LPA2
arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs
arm64/mm: Modify range-based tlbi to decrement scale
Back in 2016, it was argued that implementations lacking a HW
prefetcher could be helped by sprinkling a number of PRFM
instructions in strategic locations.
In 2023, the one platform that presumably needed this hack is no
longer in active use (let alone maintained), and an quick
experiment shows dropping this hack only leads to a 0.4% drop
on a full kernel compilation (tested on a MT30-GS0 48 CPU system).
Given that this is pretty much in the noise department and that
it may give odd ideas to other implementers, drop the hack for
good.
Suggested-by: Will Deacon <will@kernel.org>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20231122133754.1240687-1-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Expose FEAT_LPA2 as a capability so that we can take advantage of
alternatives patching in the hypervisor.
Although FEAT_LPA2 presence is advertised separately for stage1 and
stage2, the expectation is that in practice both stages will either
support or not support it. Therefore, we combine both into a single
capability, allowing us to simplify the implementation. KVM requires
support in both stages in order to use LPA2 since the same library is
used for hyp stage 1 and guest stage 2 pgtables.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-6-ryan.roberts@arm.com
Pull arm64 updates from Catalin Marinas:
"No major architecture features this time around, just some new HWCAP
definitions, support for the Ampere SoC PMUs and a few fixes/cleanups.
The bulk of the changes is reworking of the CPU capability checking
code (cpus_have_cap() etc).
- Major refactoring of the CPU capability detection logic resulting
in the removal of the cpus_have_const_cap() function and migrating
the code to "alternative" branches where possible
- Backtrace/kgdb: use IPIs and pseudo-NMI
- Perf and PMU:
- Add support for Ampere SoC PMUs
- Multi-DTC improvements for larger CMN configurations with
multiple Debug & Trace Controllers
- Rework the Arm CoreSight PMU driver to allow separate
registration of vendor backend modules
- Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
driver; use device_get_match_data() in the xgene driver; fix
NULL pointer dereference in the hisi driver caused by calling
cpuhp_state_remove_instance(); use-after-free in the hisi driver
- HWCAP updates:
- FEAT_SVE_B16B16 (BFloat16)
- FEAT_LRCPC3 (release consistency model)
- FEAT_LSE128 (128-bit atomic instructions)
- SVE: remove a couple of pseudo registers from the cpufeature code.
There is logic in place already to detect mismatched SVE features
- Miscellaneous:
- Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
bouncing is needed. The buffer is still required for small
kmalloc() buffers
- Fix module PLT counting with !RANDOMIZE_BASE
- Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
synchronisation code out of the set_ptes() loop
- More compact cpufeature displaying enabled cores
- Kselftest updates for the new CPU features"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
perf: hisi: Fix use-after-free when register pmu fails
drivers/perf: hisi_pcie: Initialize event->cpu only on success
drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
arm64: cpufeature: Change DBM to display enabled cores
arm64: cpufeature: Display the set of cores with a feature
perf/arm-cmn: Enable per-DTC counter allocation
perf/arm-cmn: Rework DTC counters (again)
perf/arm-cmn: Fix DTC domain detection
drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
arm64: Remove system_uses_lse_atomics()
arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
drivers/perf: xgene: Use device_get_match_data()
perf/amlogic: add missing MODULE_DEVICE_TABLE
arm64/mm: Hoist synchronization out of set_ptes() loop
...
Currently we have a negative cpucap which describes the *absence* of
FP/SIMD rather than *presence* of FP/SIMD. This largely works, but is
somewhat awkward relative to other cpucaps that describe the presence of
a feature, and it would be nicer to have a cpucap which describes the
presence of FP/SIMD:
* This will allow the cpucap to be treated as a standard
ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard
has_cpuid_feature() function and ARM64_CPUID_FIELDS() description.
* This ensures that the cpucap will only transition from not-present to
present, reducing the risk of unintentional and/or unsafe usage of
FP/SIMD before cpucaps are finalized.
* This will allow using arm64_cpu_capabilities::cpu_enable() to enable
the use of FP/SIMD later, with FP/SIMD being disabled at boot time
otherwise. This will ensure that any unintentional and/or unsafe usage
of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is
never unintentionally enabled for userspace in mismatched big.LITTLE
systems.
This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a
positive ARM64_HAS_FPSIMD cpucap, making changes as described above.
Note that as FP/SIMD will now be trapped when not supported system-wide,
do_fpsimd_acc() must handle these traps in the same way as for SVE and
SME. The commentary in fpsimd_restore_current_state() is updated to
describe the new scheme.
No users of system_supports_fpsimd() need to know that FP/SIMD is
available prior to alternatives being patched, so this is updated to
use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD
cpucap, without generating code to test the system_cpucaps bitmap.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For clarity it would be nice to factor cpucap manipulation out of
<asm/cpufeature.h>, and the obvious place would be <asm/cpucap.h>, but
this will clash somewhat with <generated/asm/cpucaps.h>.
Rename <generated/asm/cpucaps.h> to <generated/asm/cpucap-defs.h>,
matching what we do for <generated/asm/sysreg-defs.h>, and introduce a
new <asm/cpucaps.h> which includes the generated header.
Subsequent patches will fill out <asm/cpucaps.h>.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
FEAT_LRCPC3 adds more instructions to support the Release Consistency model.
Add a HWCAP so that userspace can make decisions about instructions it can use.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230919162757.2707023-2-joey.gouly@arm.com
[catalin.marinas@arm.com: change the HWCAP number]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
affected Cortex-A520 core, a speculatively executed unprivileged load
might leak data from a privileged load via a cache side channel. The
issue only exists for loads within a translation regime with the same
translation (e.g. same ASID and VMID). Therefore, the issue only affects
the return to EL0.
The workaround is to execute a TLBI before returning to EL0 after all
loads of privileged data. A non-shareable TLBI to any address is
sufficient.
The workaround isn't necessary if page table isolation (KPTI) is
enabled, but for simplicity it will be. Page table isolation should
normally be disabled for Cortex-A520 as it supports the CSV3 feature
and the E0PD feature (used when KASLR is enabled).
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230921194156.1050055-2-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
ClearBHB support is indicated by the CLRBHB field in ID_AA64ISAR2_EL1.
Following some refactoring the kernel incorrectly checks the BC field
instead. Fix the detection to use the right field.
(Note: The original ClearBHB support had it as FTR_HIGHER_SAFE, but this
patch uses FTR_LOWER_SAFE, which seems more correct.)
Also fix the detection of BC (hinted conditional branches) to use
FTR_LOWER_SAFE, so that it is not reported on mismatched systems.
Fixes: 356137e68a ("arm64/sysreg: Make BHB clear feature defines match the architecture")
Fixes: 8fcc8285c0 ("arm64/sysreg: Convert ID_AA64ISAR2_EL1 to automatic generation")
Cc: stable@vger.kernel.org
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230912133429.2606875-1-kristina.martsenko@arm.com
Signed-off-by: Will Deacon <will@kernel.org>