Pull RISC-V fixes from Palmer Dabbelt:
- prevent users from enabling the alternatives framework (and thus
errata handling) on XIP kernels, where runtime code patching does not
function correctly.
- properly detect offset overflow for AUIPC-based relocations in
modules. This may manifest as modules calling arbitrary invalid
addresses, depending on the address allocated when a module is
loaded.
* tag 'riscv-for-linus-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix auipc+jalr relocation range checks
riscv: alternative only works on !XIP_KERNEL
RISC-V can do PC-relative jumps with a 32bit range using the following
two instructions:
auipc t0, imm20 ; t0 = PC + imm20 * 2^12
jalr ra, t0, imm12 ; ra = PC + 4, PC = t0 + imm12
Crucially both the 20bit immediate imm20 and the 12bit immediate imm12
are treated as two's-complement signed values. For this reason the
immediates are usually calculated like this:
imm20 = (offset + 0x800) >> 12
imm12 = offset & 0xfff
..where offset is the signed offset from the auipc instruction. When
the 11th bit of offset is 0 the addition of 0x800 doesn't change the top
20 bits and imm12 considered positive. When the 11th bit is 1 the carry
of the addition by 0x800 means imm20 is one higher, but since imm12 is
then considered negative the two's complement representation means it
all cancels out nicely.
However, this addition by 0x800 (2^11) means an offset greater than or
equal to 2^31 - 2^11 would overflow so imm20 is considered negative and
result in a backwards jump. Similarly the lower range of offset is also
moved down by 2^11 and hence the true 32bit range is
[-2^31 - 2^11, 2^31 - 2^11)
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The alternative mechanism needs runtime code patching, it can't work
on XIP_KERNEL. And the errata workarounds are implemented via the
alternative mechanism. So add !XIP_KERNEL dependency for alternative
and erratas.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 44c9225729 ("RISC-V: enable XIP")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull RISC-V fixes from Palmer Dabbelt:
- Fixes for a handful of KASAN-related crashes.
- A fix to avoid a crash during boot for SPARSEMEM &&
!SPARSEMEM_VMEMMAP configurations.
- A fix to stop reporting some incorrect errors under DEBUG_VIRTUAL.
- A fix for the K210's device tree to properly populate the interrupt
map, so hart1 will get interrupts again.
* tag 'riscv-for-linus-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: k210: fix broken IRQs on hart1
riscv: Fix kasan pud population
riscv: Move high_memory initialization to setup_bootmem
riscv: Fix config KASAN && DEBUG_VIRTUAL
riscv: Fix DEBUG_VIRTUAL false warnings
riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
riscv: Fix is_linear_mapping with recent move of KASAN region
Commit 67d96729a9 ("riscv: Update Canaan Kendryte K210 device tree")
incorrectly removed two entries from the PLIC interrupt-controller node's
interrupts-extended property.
The PLIC driver cannot know the mapping between hart contexts and hart ids,
so this information has to be provided by device tree, as specified by the
PLIC device tree binding.
The PLIC driver uses the interrupts-extended property, and initializes the
hart context registers in the exact same order as provided by the
interrupts-extended property.
In other words, if we don't specify the S-mode interrupts, the PLIC driver
will simply initialize the hart0 S-mode hart context with the hart1 M-mode
configuration. It is therefore essential to specify the S-mode IRQs even
though the system itself will only ever be running in M-mode.
Re-add the S-mode interrupts, so that we get working IRQs on hart1 again.
Cc: <stable@vger.kernel.org>
Fixes: 67d96729a9 ("riscv: Update Canaan Kendryte K210 device tree")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In sv48, the kasan inner regions are not aligned on PGDIR_SIZE and then
when we populate the kasan linear mapping region, we clear the kasan
vmalloc region which is in the same PGD.
Fix this by copying the content of the kasan early pud after allocating a
new PGD for the first time.
Fixes: e8a62cc26d ("riscv: Implement sv48 support")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
high_memory used to be initialized in mem_init, way after setup_bootmem.
But a call to dma_contiguous_reserve in this function gives rise to the
below warning because high_memory is equal to 0 and is used at the very
beginning at cma_declare_contiguous_nid.
It went unnoticed since the move of the kasan region redefined
KERN_VIRT_SIZE so that it does not encompass -1 anymore.
Fix this by initializing high_memory in setup_bootmem.
------------[ cut here ]------------
virt_to_phys used for non-linear address: ffffffffffffffff (0xffffffffffffffff)
WARNING: CPU: 0 PID: 0 at arch/riscv/mm/physaddr.c:14 __virt_to_phys+0xac/0x1b8
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-00007-ga68b89289e26 #27
Hardware name: riscv-virtio,qemu (DT)
epc : __virt_to_phys+0xac/0x1b8
ra : __virt_to_phys+0xac/0x1b8
epc : ffffffff80014922 ra : ffffffff80014922 sp : ffffffff84a03c30
gp : ffffffff85866c80 tp : ffffffff84a3f180 t0 : ffffffff86bce657
t1 : fffffffef09406e8 t2 : 0000000000000000 s0 : ffffffff84a03c70
s1 : ffffffffffffffff a0 : 000000000000004f a1 : 00000000000f0000
a2 : 0000000000000002 a3 : ffffffff8011f408 a4 : 0000000000000000
a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffff84a03747
s2 : ffffffd800000000 s3 : ffffffff86ef4000 s4 : ffffffff8467f828
s5 : fffffff800000000 s6 : 8000000000006800 s7 : 0000000000000000
s8 : 0000000480000000 s9 : 0000000080038ea0 s10: 0000000000000000
s11: ffffffffffffffff t3 : ffffffff84a035c0 t4 : fffffffef09406e8
t5 : fffffffef09406e9 t6 : ffffffff84a03758
status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff8322ef4c>] cma_declare_contiguous_nid+0xf2/0x64a
[<ffffffff83212a58>] dma_contiguous_reserve_area+0x46/0xb4
[<ffffffff83212c3a>] dma_contiguous_reserve+0x174/0x18e
[<ffffffff83208fc2>] paging_init+0x12c/0x35e
[<ffffffff83206bd2>] setup_arch+0x120/0x74e
[<ffffffff83201416>] start_kernel+0xce/0x68c
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<0000000000000000>] 0x0
softirqs last enabled at (0): [<0000000000000000>] 0x0
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---
Fixes: f7ae02333d ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
__virt_to_phys function is called very early in the boot process (ie
kasan_early_init) so it should not be instrumented by KASAN otherwise it
bugs.
Fix this by declaring phys_addr.c as non-kasan instrumentable.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721 (riscv: Add KASAN support)
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
KERN_VIRT_SIZE used to encompass the kernel mapping before it was
redefined when moving the kasan mapping next to the kernel mapping to only
match the maximum amount of physical memory.
Then, kernel mapping addresses that go through __virt_to_phys are now
declared as wrong which is not true, one can use __virt_to_phys on such
addresses.
Fix this by redefining the condition that matches wrong addresses.
Fixes: f7ae02333d ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In order to get the pfn of a struct page* when sparsemem is enabled
without vmemmap, the mem_section structures need to be initialized which
happens in sparse_init.
But kasan_early_init calls pfn_to_page way before sparse_init is called,
which then tries to dereference a null mem_section pointer.
Fix this by removing the usage of this function in kasan_early_init.
Fixes: 8ad8b72721 ("riscv: Add KASAN support")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The KASAN region was recently moved between the linear mapping and the
kernel mapping, is_linear_mapping used to check the validity of an
address by using the start of the kernel mapping, which is now wrong.
Fix this by using the maximum size of the physical memory.
Fixes: f7ae02333d ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for the K210 sdcard defconfig, to avoid using a
fixed delay for the root FS
- A fix to make sure there's a proper call frame for
trace_hardirqs_{on,off}().
* tag 'riscv-for-linus-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: fix oops caused by irqsoff latency tracer
riscv: fix nommu_k210_sdcard_defconfig
Pull RISC-V fixes from Palmer Dabbelt:
"A set of three fixes, all aimed at fixing some fallout from the recent
sparse hart ID support"
* tag 'riscv-for-linus-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix IPI/RFENCE hmask on non-monotonic hartid ordering
RISC-V: Fix handling of empty cpu masks
RISC-V: Fix hartid mask handling for hartid 31 and up
If the boot CPU does not have the lowest hartid, "hartid - hbase" can
become negative, leading to an incorrect hmask, causing userspace to
crash with SEGV. This is observed on e.g. Starlight Beta, where cpuid 1
maps to hartid 0, and cpuid 0 maps to hartid 1.
Fix this by detecting this case, and shifting the accumulated mask and
updating hbase, if possible.
Fixes: 26fb751ca3 ("RISC-V: Do not use cpumask data structure for hartid bitmap")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The cpumask rework slightly changed the behavior of the code. Fix this
by treating an empty cpumask as meaning all online CPUs.
Extracted from a patch by Atish Patra <atishp@rivosinc.com>.
Reported-by: Jessica Clarke <jrtc27@jrtc27.com>
Fixes: 26fb751ca3 ("RISC-V: Do not use cpumask data structure for hartid bitmap")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jessica reports that using "1 << hartid" causes undefined behavior for
hartid 31 and up.
Fix this by using the BIT() helper instead of an explicit shift.
Reported-by: Jessica Clarke <jrtc27@jrtc27.com>
Fixes: 26fb751ca3 ("RISC-V: Do not use cpumask data structure for hartid bitmap")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid undefined behavior when stack backtracing, which
manifests in GCC as incorrect stack addresses
- A few fixes for the XIP kernels
- A fix to tracking NUMA state on CPU hotplug
- Support for the recently relesaed binutils-2.38, which changed the
default ISA version to one without CSRs or fence.i in 'I' extension
* tag 'riscv-for-linus-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: fix build with binutils 2.38
riscv: cpu-hotplug: clear cpu from numa map when teardown
riscv: extable: fix err reg writing in dedicated uaccess handler
riscv/mm: Add XIP_FIXUP for riscv_pfn_base
riscv/mm: Add XIP_FIXUP for phys_ram_base
riscv: Fix XIP_FIXUP_FLASH_OFFSET
riscv: eliminate unreliable __builtin_frame_address(1)
From version 2.38, binutils default to ISA spec version 20191213. This
means that the csr read/write (csrr*/csrw*) instructions and fence.i
instruction has separated from the `I` extension, become two standalone
extensions: Zicsr and Zifencei. As the kernel uses those instruction,
this causes the following build failure:
CC arch/riscv/kernel/vdso/vgettimeofday.o
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h: Assembler messages:
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01'
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01'
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01'
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01'
The fix is to specify those extensions explicitely in -march. However as
older binutils version do not support this, we first need to detect
that.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
There is numa_add_cpu() when cpus online, accordingly, there should be
numa_remove_cpu() when cpus offline.
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Fixes: 4f0e8eef77 ("riscv: Add numa support for riscv64 platform")
Cc: stable@vger.kernel.org
[Palmer: Add missing NUMA include]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Mayuresh reported commit 20802d8d47 ("riscv: extable: add a dedicated
uaccess handler") breaks the writev02 test case in LTP. This is due to
the err reg isn't correctly set with the errno(-EFAULT in writev02
case). First of all, the err and zero regs are reg numbers rather than
reg offsets in struct pt_regs; Secondly, regs_set_gpr() should write
the regs when offset isn't zero(zero means epc)
Fix it by correcting regs_set_gpr() logic and passing the correct reg
offset to it.
Reported-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Fixes: 20802d8d47 ("riscv: extable: add a dedicated uaccess handler")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
KVM/arm64 fixes for 5.17, take #2
- A couple of fixes when handling an exception while a SError has been
delivered
- Workaround for Cortex-A510's single-step[ erratum
This manifests as a crash early in boot on VexRiscv.
Signed-off-by: Myrtle Shah <gatecat@ds0.me>
[Palmer: split commit]
Fixes: 6d7f91d914 ("riscv: Get rid of CONFIG_PHYS_RAM_BASE in kernel physical address conversion")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
There were several problems with the calculation. Not only was an 'and'
being computed into t1 but thrown away; but the 'and' itself would
cause problems if the granularity of the XIP physical address was less
than XIP_OFFSET - in my case I had the kernel image at 2MB in SPI flash.
Fixes: f9ace4ede4 ("riscv: remove .text section size limitation for XIP")
Cc: stable@vger.kernel.org
Signed-off-by: Myrtle Shah <gatecat@ds0.me>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
I tried different pieces of code which uses __builtin_frame_address(1)
(with both gcc version 7.5.0 and 10.3.0) to verify whether it works as
expected on riscv64. The result is negative.
What the compiler had generated is as below:
31 fp = (unsigned long)__builtin_frame_address(1);
0xffffffff80006024 <+200>: ld s1,0(s0)
It takes '0(s0)' as the address of frame 1 (caller), but the actual address
should be '-16(s0)'.
| ... | <-+
+-----------------+ |
| return address | |
| previous fp | |
| saved registers | |
| local variables | |
$fp --> | ... | |
+-----------------+ |
| return address | |
| previous fp --------+
| saved registers |
$sp --> | local variables |
+-----------------+
This leads the kernel can not dump the full stack trace on riscv.
[ 7.222126][ T1] Call Trace:
[ 7.222804][ T1] [<ffffffff80006058>] dump_backtrace+0x2c/0x3a
This problem is not exposed on most riscv builds just because the '0(s0)'
occasionally is the address frame 2 (caller's caller), if only ra and fp
are stored in frame 1 (caller).
| ... | <-+
+-----------------+ |
| return address | |
$fp --> | previous fp | |
+-----------------+ |
| return address | |
| previous fp --------+
| saved registers |
$sp --> | local variables |
+-----------------+
This could be a *bug* of gcc that should be fixed. But as noted in gcc
manual "Calling this function with a nonzero argument can have
unpredictable effects, including crashing the calling program.", let's
remove the '__builtin_frame_address(1)' in backtrace code.
With this fix now it can show full stack trace:
[ 10.444838][ T1] Call Trace:
[ 10.446199][ T1] [<ffffffff8000606c>] dump_backtrace+0x2c/0x3a
[ 10.447711][ T1] [<ffffffff800060ac>] show_stack+0x32/0x3e
[ 10.448710][ T1] [<ffffffff80a005c0>] dump_stack_lvl+0x58/0x7a
[ 10.449941][ T1] [<ffffffff80a005f6>] dump_stack+0x14/0x1c
[ 10.450929][ T1] [<ffffffff804c04ee>] ubsan_epilogue+0x10/0x5a
[ 10.451869][ T1] [<ffffffff804c092e>] __ubsan_handle_load_invalid_value+0x6c/0x78
[ 10.453049][ T1] [<ffffffff8018f834>] __pagevec_release+0x62/0x64
[ 10.455476][ T1] [<ffffffff80190830>] truncate_inode_pages_range+0x132/0x5be
[ 10.456798][ T1] [<ffffffff80190ce0>] truncate_inode_pages+0x24/0x30
[ 10.457853][ T1] [<ffffffff8045bb04>] kill_bdev+0x32/0x3c
...
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Fixes: eac2f3059e ("riscv: stacktrace: fix the riscv stacktrace when CONFIG_FRAME_POINTER enabled")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The SBI implementation version returned by KVM RISC-V should be the
Host Linux version code.
Fixes: c62a768597 ("RISC-V: KVM: Add SBI v0.2 base extension")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Those applications that run in VU mode and access the time CSR cause
a virtual instruction trap as Guest kernel currently does not
initialize the scounteren CSR.
To fix this, we should make CY, TM, and IR counters accessibile
by default in VU mode (similar to OpenSBI).
Fixes: a33c72faf2 ("RISC-V: KVM: Implement VCPU create, init and
destroy functions")
Cc: stable@vger.kernel.org
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state
(EQS) by calling guest_enter_irqoff(), and unmask IRQs prior to exiting
the EQS by calling guest_exit(). As the IRQ entry code will not wake RCU
in this case, we may run the core IRQ code and IRQ handler without RCU
watching, leading to various potential problems.
Additionally, we do not inform lockdep or tracing that interrupts will
be enabled during guest execution, which caan lead to misleading traces
and warnings that interrupts have been enabled for overly-long periods.
This patch fixes these issues by using the new timing and context
entry/exit helpers to ensure that interrupts are handled during guest
vtime but with RCU watching, with a sequence:
guest_timing_enter_irqoff();
guest_state_enter_irqoff();
< run the vcpu >
guest_state_exit_irqoff();
< take any pending IRQs >
guest_timing_exit_irqoff();
Since instrumentation may make use of RCU, we must also ensure that no
instrumented code is run during the EQS. I've split out the critical
section into a new kvm_riscv_enter_exit_vcpu() helper which is marked
noinstr.
Fixes: 99cdc6c18c ("RISC-V: Add initial skeletal KVM support")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anup Patel <anup@brainfault.org>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Tested-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Anup Patel <anup@brainfault.org>
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
vsprintf: rework bitmap_list_string
lib: bitmap: add performance test for bitmap_print_to_pagebuf
bitmap: unify find_bit operations
mm/percpu: micro-optimize pcpu_is_populated()
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
find: micro-optimize for_each_{set,clear}_bit()
include/linux: move for_each_bit() macros from bitops.h to find.h
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
tools: sync tools/bitmap with mother linux
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
cpumask: use find_first_and_bit()
lib: add find_first_and_bit()
arch: remove GENERIC_FIND_FIRST_BIT entirely
include: move find.h from asm_generic to linux
bitops: move find_bit_*_le functions from le.h to find.h
bitops: protect find_first_{,zero}_bit properly
Pull more RISC-V updates from Palmer Dabbelt:
- Support for sv48 paging
- Hart ID mappings are now sparse, which enables more CPUs to come up
on systems with sparse hart IDs
- A handful of cleanups and fixes
* tag 'riscv-for-linus-5.17-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (27 commits)
RISC-V: nommu_virt: Drop unused SLAB_MERGE_DEFAULT
RISC-V: Remove redundant err variable
riscv: dts: sifive unmatched: Add gpio poweroff
riscv: canaan: remove useless select of non-existing config SYSCON
RISC-V: Do not use cpumask data structure for hartid bitmap
RISC-V: Move spinwait booting method to its own config
RISC-V: Move the entire hart selection via lottery to SMP
RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method
RISC-V: Do not print the SBI version during HSM extension boot print
RISC-V: Avoid using per cpu array for ordered booting
riscv: default to CONFIG_RISCV_SBI_V01=n
riscv: fix boolconv.cocci warnings
riscv: Explicit comment about user virtual address space size
riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
riscv: Implement sv48 support
asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
riscv: Allow to dynamically define VA_BITS
riscv: Introduce functions to switch pt_ops
riscv: Split early kasan mapping to prepare sv48 introduction
riscv: Move KASAN mapping next to the kernel mapping
...
Our nommu_virt_defconfig set SLOB=y and SLAB_MERGE_DEFAULT=n. As of
eb52c0fc23 ("mm: Make SLAB_MERGE_DEFAULT depend on SL[AU]B") it's no
longer necessary to set the second, which appears to never have had any
effect for SLOB=y anyway.
This was suggested by savedefconfig.
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Some of the GPIO pins on the Unmatched are wire up to control the power
of the board, indicate that in the device tree.
Signed-off-by: Ron Economos <w6rz@comcast.net>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, SBI APIs accept a hartmask that is generated from struct
cpumask. Cpumask data structure can hold upto NR_CPUs value. Thus, it
is not the correct data structure for hartids as it can be higher
than NR_CPUs for platforms with sparse or discontguous hartids.
Remove all association between hartid mask and struct cpumask.
Reviewed-by: Anup Patel <anup@brainfault.org> (For Linux RISC-V changes)
Acked-by: Anup Patel <anup@brainfault.org> (For KVM RISC-V changes)
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The spinwait booting method should only be used for platforms with older
firmware without SBI HSM extension or M-mode firmware because spinwait
method can't support cpu hotplug, kexec or sparse hartid. It is better
to move the entire spinwait implementation to its own config which can
be disabled if required. It is enabled by default to maintain backward
compatibility and M-mode Linux.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The booting hart selection via lottery is only useful for SMP systems.
Moreover, the lottery selection is only necessary for systems using
spinwait booting method. It is better to keep the entire lottery
selection together so that it can be disabled in future.
Move the lottery selection code to under CONFIG_SMP.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The __cpu_up_stack/task_pointer array is only used for spinwait method
now. The per cpu array based lookup is also fragile for platforms with
discontiguous/sparse hartids. The spinwait method is only used for
M-mode Linux or older firmwares without SBI HSM extension. For general
Linux systems, ordered booting method is preferred anyways to support
cpu hotplug and kexec.
Make sure that __cpu_up_stack/task_pointer is only used for spinwait
method. Take this opportunity to rename it to
__cpu_spinwait_stack/task_pointer to emphasize the purpose as well.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The HSM extension information log also prints the SBI version v0.2. This
is misleading as the underlying firmware SBI version may be different
from v0.2.
Remove the unncessary printing of SBI version.
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently both order booting and spinwait approach uses a per cpu
array to update stack & task pointer. This approach will not work for the
following cases.
1. If NR_CPUs are configured to be less than highest hart id.
2. A platform has sparse hartid.
This issue can be fixed for ordered booting as the booting cpu brings up
one cpu at a time using SBI HSM extension which has opaque parameter
that is unused until now.
Introduce a common secondary boot data structure that can store the stack
and task pointer. Secondary harts will use this data while booting up
to setup the sp & tp.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The SBI 0.1 specification is obsolete. The current version is 0.3.
Hence we should not rely by default on SBI 0.1 being implemented.
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Merge more updates from Andrew Morton:
"55 patches.
Subsystems affected by this patch series: percpu, procfs, sysctl,
misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
lib: remove redundant assignment to variable ret
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
btrfs: use generic Kconfig option for 256kB page size limit
arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
configs: introduce debug.config for CI-like setup
delayacct: track delays from memory compact
Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
delayacct: cleanup flags in struct task_delay_info and functions use it
delayacct: fix incomplete disable operation when switch enable to disable
delayacct: support swapin delay accounting for swapping without blkio
panic: remove oops_id
panic: use error_report_end tracepoint on warnings
fs/adfs: remove unneeded variable make code cleaner
FAT: use io_schedule_timeout() instead of congestion_wait()
hfsplus: use struct_group_attr() for memcpy() region
nilfs2: remove redundant pointer sbufs
fs/binfmt_elf: use PT_LOAD p_align values for static PIE
const_structs.checkpatch: add frequently used ops structs
...
Patch series "mm: percpu: Cleanup percpu first chunk function".
When supporting page mapping percpu first chunk allocator on arm64, we
found there are lots of duplicated codes in percpu embed/page first chunk
allocator. This patchset is aimed to cleanup them and should no function
change.
The currently supported status about 'embed' and 'page' in Archs shows
below,
embed: NEED_PER_CPU_PAGE_FIRST_CHUNK
page: NEED_PER_CPU_EMBED_FIRST_CHUNK
embed page
------------------------
arm64 Y Y
mips Y N
powerpc Y Y
riscv Y N
sparc Y Y
x86 Y Y
------------------------
There are two interfaces about percpu first chunk allocator,
extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
size_t atom_size,
pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
extern int __init pcpu_page_first_chunk(size_t reserved_size,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn,
- pcpu_fc_populate_pte_fn_t populate_pte_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t is killed, we provide generic
pcpu_fc_alloc() and pcpu_fc_free() function, which are called in the
pcpu_embed/page_first_chunk().
1) For pcpu_embed_first_chunk(), pcpu_fc_cpu_to_node_fn_t is needed to be
provided when archs supported NUMA.
2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t is killed too,
a generic pcpu_populate_pte() which marked '__weak' is provided, if you
need a different function to populate pte on the arch(like x86), please
provide its own implementation.
[1] https://github.com/kevin78/linux.git percpu-cleanup
This patch (of 4):
The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs, which have
duplicate definitions on platforms that subscribe it.
Move them into mm, drop these redundant definitions and instead just
select it on applicable platforms.
Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/riscv/mm/init.c:48:11-16: WARNING: conversion to bool not needed here
Remove unneeded conversion to bool
Semantic patch information:
Relational and logical operators evaluate to bool,
explicit conversion is overly verbose and unneeded.
Generated by: scripts/coccinelle/misc/boolconv.cocci
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This patchset allows to have a single kernel for sv39 and sv48 without
being relocatable.
The idea comes from Arnd Bergmann who suggested to do the same as x86,
that is mapping the kernel to the end of the address space, which allows
the kernel to be linked at the same address for both sv39 and sv48 and
then does not require to be relocated at runtime.
This implements sv48 support at runtime. The kernel will try to boot
with 4-level page table and will fallback to 3-level if the HW does not
support it. Folding the 4th level into a 3-level page table has almost
no cost at runtime.
Note that kasan region had to be moved to the end of the address space
since its location must be known at compile-time and then be valid for
both sv39 and sv48 (and sv57 that is coming).
* riscv-sv48-v3:
riscv: Explicit comment about user virtual address space size
riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
riscv: Implement sv48 support
asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
riscv: Allow to dynamically define VA_BITS
riscv: Introduce functions to switch pt_ops
riscv: Split early kasan mapping to prepare sv48 introduction
riscv: Move KASAN mapping next to the kernel mapping
riscv: Get rid of MAXPHYSMEM configs
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Define precisely the size of the user accessible virtual space size
for sv32/39/48 mmu types and explain why the whole virtual address
space is split into 2 equal chunks between kernel and user space.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Now that the mmu type is determined at runtime using SATP
characteristic, use the global variable pgtable_l4_enabled to output
mmu type of the processor through /proc/cpuinfo instead of relying on
device tree infos.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
By adding a new 4th level of page table, give the possibility to 64bit
kernel to address 2^48 bytes of virtual address: in practice, that offers
128TB of virtual address space to userspace and allows up to 64TB of
physical memory.
If the underlying hardware does not support sv48, we will automatically
fallback to a standard 3-level page table by folding the new PUD level into
PGDIR level. In order to detect HW capabilities at runtime, we
use SATP feature that ignores writes with an unsupported mode.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
With 4-level page table folding at runtime, we don't know at compile time
the size of the virtual address space so we must set VA_BITS dynamically
so that sparsemem reserves the right amount of memory for struct pages.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>