of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
When kcalloc fails, it missing of_node_put() and results in refcount
leak. Fix this by goto out_put_node label.
Fixes: 52085d3f20 ("irqchip/gic-v3: Dynamically allocate PPI partition descriptors")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220601080930.31005-5-linmq006@gmail.com
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: a5e8801202 ("irqchip/apple-aic: Parse FIQ affinities from device-tree")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220601080930.31005-4-linmq006@gmail.com
of_find_node_by_phandle() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: a5e8801202 ("irqchip/apple-aic: Parse FIQ affinities from device-tree")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220601080930.31005-3-linmq006@gmail.com
of_find_matching_node_and_match() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: 82b0a434b4 ("irqchip/gic/realview: Support more RealView DCC variants")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220601080930.31005-2-linmq006@gmail.com
The Xilinx IRQ controller doesn't really have any architecture
dependencies - it's a generic AXI component that can be used for any
FPGA core from Zynq hard processor systems to microblaze+riscv soft
cores and more.
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220606213952.298686-1-jamie@jamieiles.com
The hardware timestamp engine documentation is driver API material, and
really belongs in the driver-API book; move it there.
Cc: Thierry Reding <treding@nvidia.com>
Acked-by: Dipen Patel <dipenp@nvidia.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
After reinitialization of iavf, ice driver gets VIRTCHNL_OP_ADD_ETH_ADDR
message with incorrectly set type of MAC address. Hardware address should
have is_primary flag set as true. This way ice driver knows what it has
to set as a MAC address.
Check if the address is primary in iavf_add_filter function and set flag
accordingly.
To test set all-zero MAC on a VF. This triggers iavf re-initialization
and VIRTCHNL_OP_ADD_ETH_ADDR message gets sent to PF.
For example:
ip link set dev ens785 vf 0 mac 00:00:00:00:00:00
This triggers re-initialization of iavf. New MAC should be assigned.
Now check if MAC is non-zero:
ip link show dev ens785
Fixes: a3e839d539 ("iavf: Add usage of new virtchnl format to set default MAC")
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After PF reset and ethtool -t there was call trace in dmesg
sometimes leading to panic. When there was some time, around 5
seconds, between reset and test there were no errors.
Problem was that pf reset calls i40e_vsi_close in prep_for_reset
and ethtool -t calls i40e_vsi_close in diag_test. If there was not
enough time between those commands the second i40e_vsi_close starts
before previous i40e_vsi_close was done which leads to crash.
Add check to diag_test if pf is in reset and don't start offline
tests if it is true.
Add netif_info("testing failed") into unhappy path of i40e_diag_test()
Fixes: e17bc411ae ("i40e: Disable offline diagnostics if VFs are enabled")
Fixes: 510efb2682 ("i40e: Fix ethtool offline diagnostic with netqueues")
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If ADQ is enabled for a VF, then actual number of queue pair
is a number of currently available traffic classes for this VF.
Without this change the configuration of the Rx/Tx queues
fails with error.
Fixes: d29e0d233e ("i40e: missing input validation on VF message handling by the PF")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Procedure of configure tc flower filters erroneously allows to create
filters on TC0 where unfiltered packets are also directed by default.
Issue was caused by insufficient checks of hw_tc parameter specifying
the hardware traffic class to pass matching packets to.
Fix checking hw_tc parameter which blocks creation of filters on TC0.
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The arch support status files don't match reality as of v5.19-rc1,
use the features-refresh.sh to refresh all the arch-support.txt files
in place. The main effect is to add entries for the new loong
architecture.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Link: https://lore.kernel.org/r/20220609025656.143460-1-zhengzengkai@huawei.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Add dt-binding documentation of dsi for MediaTek MT8186 SoC.
Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20220504091923.2219-3-rex-bc.chen@mediatek.com/
Signed-off-by: Xinlei Lee <xinlei.lee@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
When requesting an interrupt, we correctly call into the runtime
PM framework to guarantee that the underlying interrupt controller
is up and running.
However, we fail to do so for chained interrupt controllers, as
the mux interrupt is not requested along the same path.
Augment __irq_do_set_handler() to call into the runtime PM code
in this case, making sure the PM flow is the same for all interrupts.
Reported-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Liu Ying <victor.liu@nxp.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/26973cddee5f527ea17184c0f3fccb70bc8969a0.camel@pengutronix.de
The selftests nested code only supports 4-level paging at the moment.
This means it cannot map nested guest physical addresses with more than
48 bits. Allow perf_test_util nested mode to work on hosts with more
than 48 physical addresses by restricting the guest test region to
48-bits.
While here, opportunistically fix an off-by-one error when dealing with
vm_get_max_gfn(). perf_test_util.c was treating this as the maximum
number of GFNs, rather than the maximum allowed GFN. This didn't result
in any correctness issues, but it did end up shifting the test region
down slightly when using huge pages.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-12-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add an option to dirty_log_perf_test that configures the vCPUs to run in
L2 instead of L1. This makes it possible to benchmark the dirty logging
performance of nested virtualization, which is particularly interesting
because KVM must shadow L1's EPT/NPT tables.
For now this support only works on x86_64 CPUs with VMX. Otherwise
passing -n results in the test being skipped.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Break up the long lines for LIBKVM and alphabetize each architecture.
This makes reading the Makefile easier, and will make reading diffs to
LIBKVM easier.
No functional change intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The linker does obey strong/weak symbols when linking static libraries,
it simply resolves an undefined symbol to the first-encountered symbol.
This means that defining __weak arch-generic functions and then defining
arch-specific strong functions to override them in libkvm will not
always work.
More specifically, if we have:
lib/generic.c:
void __weak foo(void)
{
pr_info("weak\n");
}
void bar(void)
{
foo();
}
lib/x86_64/arch.c:
void foo(void)
{
pr_info("strong\n");
}
And a selftest that calls bar(), it will print "weak". Now if you make
generic.o explicitly depend on arch.o (e.g. add function to arch.c that
is called directly from generic.c) it will print "strong". In other
words, it seems that the linker is free to throw out arch.o when linking
because generic.o does not explicitly depend on it, which causes the
linker to lose the strong symbol.
One solution is to link libkvm.a with --whole-archive so that the linker
doesn't throw away object files it thinks are unnecessary. However that
is a bit difficult to plumb since we are using the common selftests
makefile rules. An easier solution is to drop libkvm.a just link
selftests with all the .o files that were originally in libkvm.a.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-9-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop the "all: $(STATIC_LIBS)" rule. The KVM selftests already depend
on $(STATIC_LIBS), so there is no reason to have an extra "all" rule.
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-8-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Create a small helper function to check if a given EPT/VPID capability
is supported. This will be re-used in a follow-up commit to check for 1G
page support.
No functional change intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is a VMX-related macro so move it to vmx.h. While here, open code
the mask like the rest of the VMX bitmask macros.
No functional change intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Refactor nested_map() to specify that it explicityl wants 4K mappings
(the existing behavior) and push the implementation down into
__nested_map(), which can be used in subsequent commits to create huge
page mappings.
No function change intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nested_map() does not take a parameter named eptp_memslot. Drop the
comment referring to it.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current EPT mapping code in the selftests only supports mapping 4K
pages. This commit extends that support with an option to map at 2M or
1G. This will be used in a future commit to create large page mappings
to test eager page splitting.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
x86_page_size is an enum used to communicate the desired page size with
which to map a range of memory. Under the hood they just encode the
desired level at which to map the page. This ends up being clunky in a
few ways:
- The name suggests it encodes the size of the page rather than the
level.
- In other places in x86_64/processor.c we just use a raw int to encode
the level.
Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass
around raw ints when referring to the level. This makes the code easier
to understand since these macros are very common in KVM MMU code.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 74fd41ed16 ("KVM: x86: nSVM: support PAUSE filtering when L0
doesn't intercept PAUSE") introduced passthrough support for nested pause
filtering, (when the host doesn't intercept PAUSE) (either disabled with
kvm module param, or disabled with '-overcommit cpu-pm=on')
Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards,
the feature was exposed as supported by KVM cpuid unconditionally, thus
if L1 could try to use it even when the L0 KVM can't really support it.
In this case the fallback caused KVM to intercept each PAUSE instruction;
in some cases, such intercept can slow down the nested guest so much
that it can fail to boot. Instead, before the problematic commit KVM
was already setting both thresholds to 0 in vmcb02, but after the first
userspace VM exit shrink_ple_window was called and would reset the
pause_filter_count to the default value.
To fix this, change the fallback strategy - ignore the guest threshold
values, but use/update the host threshold values unless the guest
specifically requests disabling PAUSE filtering (either simple or
advanced).
Also fix a minor bug: on nested VM exit, when PAUSE filter counter
were copied back to vmcb01, a dirty bit was not set.
Thanks a lot to Suravee Suthikulpanit for debugging this!
Fixes: 74fd41ed16 ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE")
Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that these functions are always called with preemption disabled,
remove the preempt_disable()/preempt_enable() pair inside them.
No functional change intended.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On SVM, if preemption happens right after the call to finish_rcuwait
but before call to kvm_arch_vcpu_unblocking on SVM/AVIC, it itself
will re-enable AVIC, and then we will try to re-enable it again
in kvm_arch_vcpu_unblocking which will lead to a warning
in __avic_vcpu_load.
The same problem can happen if the vCPU is preempted right after the call
to kvm_arch_vcpu_blocking but before the call to prepare_to_rcuwait
and in this case, we will end up with AVIC enabled during sleep -
Ooops.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently nothing prevents preemption in kvm_vcpu_update_apicv.
On SVM, If the preemption happens after we update the
vcpu->arch.apicv_active, the preemption itself will
'update' the inhibition since the AVIC will be first disabled
on vCPU unload and then enabled, when the current task
is loaded again.
Then we will try to update it again, which will lead to a warning
in __avic_vcpu_load, that the AVIC is already enabled.
Fix this by disabling preemption in this code.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are two issues in avic_kick_target_vcpus_fast
1. It is legal to issue an IPI request with APIC_DEST_NOSHORT
and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic),
which must be treated as a broadcast destination.
Fix this by explicitly checking for it.
Also don’t use ‘index’ in this case as it gives no new information.
2. It is legal to issue a logical IPI request to more than one target.
Index field only provides index in physical id table of first
such target and therefore can't be used before we are sure
that only a single target was addressed.
Instead, parse the ICRL/ICRH, double check that a unicast interrupt
was requested, and use that info to figure out the physical id
of the target vCPU.
At that point there is no need to use the index field as well.
In addition to fixing the above issues, also skip the call to
kvm_apic_match_dest.
It is possible to do this now, because now as long as AVIC is not
inhibited, it is guaranteed that none of the vCPUs changed their
apic id from its default value.
This fixes boot of windows guest with AVIC enabled because it uses
IPI with 0xFF destination and no destination shorthand.
Fixes: 7223fd2d53 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible")
Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
AVIC is now inhibited if the guest changes the apic id,
and therefore this code is no longer needed.
There are several ways this code was broken, including:
1. a vCPU was only allowed to change its apic id to an apic id
of an existing vCPU.
2. After such change, the vCPU whose apic id entry was overwritten,
could not correctly change its own apic id, because its own
entry is already overwritten.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Neither of these settings should be changed by the guest and it is
a burden to support it in the acceleration code, so just inhibit
this code instead.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These days there are too many AVIC/APICv inhibit
reasons, and it doesn't hurt to have some documentation
for them.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Assign shadow_me_value, not shadow_me_mask, to PAE root entries,
a.k.a. shadow PDPTRs, when host memory encryption is supported. The
"mask" is the set of all possible memory encryption bits, e.g. MKTME
KeyIDs, whereas "value" holds the actual value that needs to be
stuffed into host page tables.
Using shadow_me_mask results in a failed VM-Entry due to setting
reserved PA bits in the PDPTRs, and ultimately causes an OOPS due to
physical addresses with non-zero MKTME bits sending to_shadow_page()
into the weeds:
set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
BUG: unable to handle page fault for address: ffd43f00063049e8
PGD 86dfd8067 P4D 0
Oops: 0000 [#1] PREEMPT SMP
RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
kvm_mmu_free_roots+0xd1/0x200 [kvm]
__kvm_mmu_unload+0x29/0x70 [kvm]
kvm_mmu_unload+0x13/0x20 [kvm]
kvm_arch_destroy_vm+0x8a/0x190 [kvm]
kvm_put_kvm+0x197/0x2d0 [kvm]
kvm_vm_release+0x21/0x30 [kvm]
__fput+0x8e/0x260
____fput+0xe/0x10
task_work_run+0x6f/0xb0
do_exit+0x327/0xa90
do_group_exit+0x35/0xa0
get_signal+0x911/0x930
arch_do_signal_or_restart+0x37/0x720
exit_to_user_mode_prepare+0xb2/0x140
syscall_exit_to_user_mode+0x16/0x30
do_syscall_64+0x4e/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: e54f1ff244 ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask")
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220608012015.19566-1-yuan.yao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduces a driver for the LogiCVC display controller, a programmable
logic controller optimized for use in Xilinx Zynq-7000 SoCs and other
Xilinx FPGAs. The controller is mostly configured at logic synthesis
time so only a subset of configuration is left for the driver to
handle.
The following features are implemented and tested:
- LVDS 4-bit interface;
- RGB565 pixel formats;
- Multiple layers and hardware composition;
- Layer-wide alpha mode;
The following features are implemented but untested:
- Other RGB pixel formats;
- Layer framebuffer configuration for version 4;
- Lowest-layer used as background color;
- Per-pixel alpha mode.
The following features are not implemented:
- YUV pixel formats;
- DVI, LVDS 3-bit, ITU656 and camera link interfaces;
- External parallel input for layer;
- Color-keying;
- LUT-based alpha modes.
Additional implementation-specific notes:
- Panels are only enabled after the first page flip to avoid flashing a
white screen.
- Depth used in context of the LogiCVC driver only counts color components
to match the definition of the synthesis parameters.
Support is implemented for both version 3 and 4 of the controller.
With version 3, framebuffers are stored in a dedicated contiguous
memory area, with a base address hardcoded for each layer. This requires
using a dedicated CMA pool registered at the base address and tweaking a
few offset-related registers to try to use any buffer allocated from
the pool. This is done on a best-effort basis to have the hardware cope
with the DRM framebuffer allocation model and there is no guarantee
that each buffer allocated by GEM CMA can be used for any layer.
In particular, buffers allocated below the base address for a layer are
guaranteed not to be configurable for that layer. See the implementation of
logicvc_layer_buffer_find_setup for specifics.
Version 4 allows configuring each buffer address directly, which
guarantees that any buffer can be configured.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220520141555.1429041-2-paul.kocialkowski@bootlin.com
Ponte Vecchio no longer has MSLICE or LNCF steering, but the bspec does
document several new types of multicast register ranges. Fortunately,
most of the different MCR types all provide valid values at instance
(0,0) so there's no need to read fuse registers and calculate a
non-terminated instance. We'll lump all of those range types (BSLICE,
HALFBSLICE, TILEPSMI, CC, and L3BANK) into a single category called
"INSTANCE0" to keep things simple. We'll also perform explicit steering
for each of these multicast register types, even if the implicit
steering setup for COMPUTE/DSS ranges would have worked too; this is
based on guidance from our hardware architects who suggested that we
move away from implicit steering and start explicitly steer all MCR
register accesses on modern platforms (we'll work on transitioning
COMPUTE/DSS to explicit steering in the future).
Note that there's one additional MCR range type defined in the bspec
(SQIDI) that we don't handle here. Those ranges use a different
steering control register that we never touch; since instance 0 is also
always a valid setting there, we can just ignore those ranges.
Finally, we'll rename the HAS_MSLICES() macro to HAS_MSLICE_STEERING().
PVC hardware still has units referred to as mslices, but there's no
register steering based on mslice for this platform.
v2:
- Rebase on other recent changes
- Swap two table rows to keep table sorted & easy to read. (Harish)
Bspec: 67609
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220608170700.4026648-1-matthew.d.roper@intel.com
- Properly reset the SVE/SME flags on vcpu load
- Fix a vgic-v2 regression regarding accessing the pending
state of a HW interrupt from userspace (and make the code
common with vgic-v3)
- Fix access to the idreg range for protected guests
- Ignore 'kvm-arm.mode=protected' when using VHE
- Return an error from kvm_arch_init_vm() on allocation failure
- A bunch of small cleanups (comments, annotations, indentation)
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmKh/2cPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDNKAQALlEZytemZk/R+uqkt85+vn4NPPlQdxBIYPZ
9ftqHwd0k3MvXtT9+DvTyBgVaR24TNp0ofO0mIYYQZbT4Kr+1LjKu9Jpxj0F8pjV
HllS8bWsFPHeUlN7UWE7SPiheQkH00xrtVYZH8aY2F/X05H1hO+dxcKbzSdmKeX4
JgNhFKYPrYgut5x0nQcs/SU04KfHYPwUC0MeMq+kI9dqiqX0OeJxWl2QxFIazxBg
kmPBAY7570DUi7u4wzmA9IHyk2a/WORil3Hp6oujCQqz8/KhwdNZEOMvjlwqrdLn
q1/RN9Z2RgjtmOPVvzURwKpOahXS5F+x62CTHADtcLF9SBBKi+GE7ZDYNkMq5YRt
xyu2PRHwcQPhfi1njzw4OlPreScSOpPkGV6MQZwEykA06poq6d0Qwt/5qr/UGbF2
hZUKh6cseZmvi2+ZM/vp2S7WF5nXUBS8yZBFBTEAPv77lsJBUFM4ruy1JDhlyMit
c8aGUmXFGYcEk85jIePNyWwaEQRcvt976GSwhMqOnFXAc+XoyGmwDKaPoS8HQ7PV
FuUTuDfYtuhkN1OoUsIasg1/hDqkvC4edURLEScYWvotf0HXzS4kb8EWzP9Wd83c
O105bl+uySzovJuFPMLpky7KYBvvN6FgQI+a/tQXs7dq2NI/+Kn8TWUyOQADJgTD
rI3eXFcM
=qYdB
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.19, take #1
- Properly reset the SVE/SME flags on vcpu load
- Fix a vgic-v2 regression regarding accessing the pending
state of a HW interrupt from userspace (and make the code
common with vgic-v3)
- Fix access to the idreg range for protected guests
- Ignore 'kvm-arm.mode=protected' when using VHE
- Return an error from kvm_arch_init_vm() on allocation failure
- A bunch of small cleanups (comments, annotations, indentation)
This reverts commit fb561bf9ab.
With
commit 27599aacba
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date: Tue Jan 25 10:12:18 2022 +0100
fbdev: Hot-unplug firmware fb devices on forced removal
this should be fixed properly and we can remove this somewhat hackish
check here (e.g. this won't catch drm drivers if fbdev emulation isn't
enabled).
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Zack Rusin <zackr@vmware.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Zack Rusin <zackr@vmware.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Ilya Trukhanov <lahvuun@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-5-javierm@redhat.com
The platform devices registered by sysfb match with firmware-based DRM or
fbdev drivers, that are used to have early graphics using a framebuffer
provided by the system firmware.
DRM or fbdev drivers later are probed and remove conflicting framebuffers,
leading to these platform devices for generic drivers to be unregistered.
But the current solution has a race, since the sysfb_init() function could
be called after a DRM or fbdev driver is probed and request to unregister
the devices for drivers with conflicting framebuffes.
To prevent this, disable any future sysfb platform device registration by
calling sysfb_disable(), if a driver requests to remove the conflicting
framebuffers.
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-4-javierm@redhat.com
This can be used by subsystems to unregister a platform device registered
by sysfb and also to disable future platform device registration in sysfb.
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-3-javierm@redhat.com
This function just returned 0 on success or an errno code on error, but it
could be useful for sysfb_init() callers to have a pointer to the device.
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-2-javierm@redhat.com
The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process
to read/write registers of another process.
To get/set a register, the API takes an index into an imaginary address
space called the "USER area", where the registers of the process are
laid out in some fashion.
The kernel then maps that index to a particular register in its own data
structures and gets/sets the value.
The API only allows a single machine-word to be read/written at a time.
So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels.
The way floating point registers (FPRs) are addressed is somewhat
complicated, because double precision float values are 64-bit even on
32-bit CPUs. That means on 32-bit kernels each FPR occupies two
word-sized locations in the USER area. On 64-bit kernels each FPR
occupies one word-sized location in the USER area.
Internally the kernel stores the FPRs in an array of u64s, or if VSX is
enabled, an array of pairs of u64s where one half of each pair stores
the FPR. Which half of the pair stores the FPR depends on the kernel's
endianness.
To handle the different layouts of the FPRs depending on VSX/no-VSX and
big/little endian, the TS_FPR() macro was introduced.
Unfortunately the TS_FPR() macro does not take into account the fact
that the addressing of each FPR differs between 32-bit and 64-bit
kernels. It just takes the index into the "USER area" passed from
userspace and indexes into the fp_state.fpr array.
On 32-bit there are 64 indexes that address FPRs, but only 32 entries in
the fp_state.fpr array, meaning the user can read/write 256 bytes past
the end of the array. Because the fp_state sits in the middle of the
thread_struct there are various fields than can be overwritten,
including some pointers. As such it may be exploitable.
It has also been observed to cause systems to hang or otherwise
misbehave when using gdbserver, and is probably the root cause of this
report which could not be easily reproduced:
https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@keymile.com/
Rather than trying to make the TS_FPR() macro even more complicated to
fix the bug, or add more macros, instead add a special-case for 32-bit
kernels. This is more obvious and hopefully avoids a similar bug
happening again in future.
Note that because 32-bit kernels never have VSX enabled the code doesn't
need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to
ensure that 32-bit && VSX is never enabled.
Fixes: 87fec0514f ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds")
Cc: stable@vger.kernel.org # v3.13+
Reported-by: Ariel Miculas <ariel.miculas@belden.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au
This reverts commit 3380557fc7.
It turned out this "4-byte" ID might have been an honest mistake.
Regrettably, the chip Andreas has might be a counterfeit or is
damaged in some other way and shouldn't have ended up in a router.
Andreas reported his chip is returning just four bytes:
"98 f1 80 15 00 00 00 00".
However, according to Kioxia/Toshiba's datasheet, there should
have been at least another byte that would have contained the
correct OOB size that Andreas needed.
Miquel and Andreas are both favoring reverting the patch over
further, possibly hacky modifications:
"[Reverting] is the safest option here. Apart from this device, we
do not know how many devices have these damaged/counterfeit chips.
If it is just a couple and only on Fritzboxes, as suggested in the
Github issue the patch could be carried through OpenWrt[...]"
Thanks to several users on the openwrt forum and github issue,
who stayed along for the ride:
- Peter-vdL for reporting the issue and testing patches.
- neg2led and Hannu Nyman who did all the
datasheet digging and debugging.
Cc: Andreas Boehler <dev@aboehler.at>
Suggested-by: Andreas Boehler <dev@aboehler.at>
Suggested-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://github.com/openwrt/openwrt/issues/9962
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220607185918.1048204-1-chunkeey@gmail.com
In order for a file to access its own directory entry set,
exfat_inode_info(ei) has two copied values. One is ei->dir, which is
a snapshot of exfat_chain of the parent directory, and the other is
ei->entry, which is the offset of the start of the directory entry set
in the parent directory.
Since the parent directory can be updated after the snapshot point,
it should be used only for accessing one's own directory entry set.
However, as of now, during renaming, it could try to traverse or to
allocate clusters via snapshot values, it does not make sense.
This potential problem has been revealed when exfat_update_parent_info()
was removed by commit d8dad2588a ("exfat: fix referencing wrong parent
directory information after renaming"). However, I don't think it's good
idea to bring exfat_update_parent_info() back.
Instead, let's use the updated exfat_chain of parent directory diectly.
Fixes: d8dad2588a ("exfat: fix referencing wrong parent directory information after renaming")
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Tested-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>