Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_* knobs to enumerate them and remove the
per-arch tlb_{start,end}_vma() implementations.
 - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
   but does *NOT* want to call it for each VMA.

 - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
   invalidate across multiple VMAs if possible.

With these it is possible to capture the three forms:

  1) empty stubs;
     select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS

  2) start: flush_cache_range(), end: empty;
     select MMU_GATHER_MERGE_VMAS

  3) start: flush_cache_range(), end: flush_tlb_range();
     default
Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.
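As a sketch of how the generic code can honour both knobs (modelled on
include/asm-generic/tlb.h; illustrative, not the exact merged hunk):

  static inline void tlb_start_vma(struct mmu_gather *tlb,
                                   struct vm_area_struct *vma)
  {
          if (tlb->fullmm)
                  return;

  #ifndef CONFIG_MMU_GATHER_NO_FLUSH_CACHE
          flush_cache_range(vma, vma->vm_start, vma->vm_end);
  #endif
  }

  static inline void tlb_end_vma(struct mmu_gather *tlb,
                                 struct vm_area_struct *vma)
  {
          if (tlb->fullmm)
                  return;

          /*
           * With MMU_GATHER_MERGE_VMAS the invalidate is deferred so it
           * can span multiple VMAs; otherwise flush at each VMA boundary.
           */
          if (!IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS))
                  tlb_flush_mmu_tlbonly(tlb);
  }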
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Improve the check whether the kernel supports WP mappings so that it
can accommodate a XenPV guest due to how the latter is setting up the
PAT machinery
- Now that the retbleed nightmare is public, here's the first round of
fallout fixes:
* Fix a build failure on 32-bit due to missing include
* Remove an untraining point in espfix64 return path
* other small cleanups
* tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Remove apostrophe typo
um: Add missing apply_returns()
x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
x86/bugs: Mark retbleed_strings static
x86/pat: Fix x86_has_pat_wp()
x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit
Merge tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix more fallout from recent changes of the ACPI CPPC handling on AMD
platforms (Mario Limonciello)"
* tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory
Remove a superfluous ' in the mitigation string.
Fixes: e8ec1b6e08 ("x86/bugs: Enable STIBP for JMP2RET")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"RISC-V:
- Fix missing PAGE_PFN_MASK
- Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
x86:
- Fix for nested virtualization when TSC scaling is active
- Estimate the size of fastcc subroutines conservatively, avoiding
disastrous underestimation when return thunks are enabled
- Avoid possible use of uninitialized fields of 'struct
kvm_lapic_irq'
Generic:
- Mark as such the boolean values available from the statistics file
descriptors
- Clarify statistics documentation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: emulate: do not adjust size of fastop and setcc subroutines
KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
Documentation: kvm: clarify histogram units
kvm: stats: tell userspace which values are boolean
x86/kvm: fix FASTOP_SIZE when return thunks are enabled
KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
riscv: Fix missing PAGE_PFN_MASK
Instead of doing complicated calculations to find the size of the subroutines
(which are even more complicated because they need to be stringified into
an asm statement), just hardcode it to 16.
It is less dense for a few combinations of IBT/SLS/retbleed, but it has
the advantage of being really simple.
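A sketch of what the hardcoding looks like (names follow
arch/x86/kvm/emulate.c; padding details illustrative):

  /*
   * Pad every fastop/setcc stub to a fixed 16-byte slot instead of a
   * size computed from the IBT/SLS/retbleed config combination.
   */
  #define FASTOP_SIZE     16

  #define __FOP_FUNC(name) \
          ".align " __stringify(FASTOP_SIZE) " \n\t" \
          ".type " name ", @function \n\t" \
          name ":\n\t"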
Cc: stable@vger.kernel.org # 5.15.x: 84e7051c0bc1: x86/kvm: fix FASTOP_SIZE when return thunks are enabled
Cc: stable@vger.kernel.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left
uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally
not needed for APIC_DM_REMRD, they're still referenced by
__apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize
the structure to avoid consuming random stack memory.
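The shape of the fix, as a sketch (a designated initializer zeroes all
unnamed members; the exact field set in the patch may differ):

  static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
  {
          /*
           * Zero-initialize the rest of the struct so 'vector' and
           * 'trig_mode' no longer leak stack garbage into the
           * trace_kvm_apic_accept_irq() tracepoint.
           */
          struct kvm_lapic_irq lapic_irq = {
                  .delivery_mode = APIC_DM_REMRD,
                  .dest_mode = APIC_DEST_PHYSICAL,
                  .shorthand = APIC_DEST_NOSHORT,
                  .dest_id = apicid,
          };

          kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
  }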
Fixes: a183b638b6 ("KVM: x86: make apic_accept_irq tracepoint more generic")
Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220708125147.593975-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some of the statistics values exported by KVM are always only 0 or 1.
It can be useful to export this fact to userspace so that it can track
them specially (for example by polling the value every now and then to
compute a % of time spent in a specific state).
Therefore, add "boolean value" as a new "unit". While it is not exactly
a unit, it walks and quacks like one. In particular, using the type
field instead would be wrong, because boolean values could be instantaneous or peak
values (e.g. "is the rmap allocated?") or even two-bucket histograms
(e.g. "number of posted vs. non-posted interrupt injections").
Suggested-by: Amneesh Singh <natto@weirdnatto.in>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to
hang upon boot or shortly after when a non-default TSC frequency was
set for L1. The issue is observed on a host where TSC scaling is
supported. The problem appears to be that Windows doesn't use TSC
scaling for its guests even when the feature is advertised, and KVM
filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from
L1's. This leads to L2 running with the default frequency (matching
host's) while L1 is running with an altered one.
Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when
it was set for L1. TSC_MULTIPLIER is already correctly computed and
written by prepare_vmcs02().
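A sketch of the intent, not the literal hunk (helper names from
arch/x86/kvm/vmx/vmx.h):

  /* sketch: building L2's secondary exec controls from vmcs01's */
  exec_control = secondary_exec_controls_get(vmx);

  /* before the fix, TSC scaling was always stripped for L2:
   *      exec_control &= ~SECONDARY_EXEC_TSC_SCALING;
   * after: the bit is left as L1 configured it; TSC_MULTIPLIER is
   * already computed and written by prepare_vmcs02(). */
  secondary_exec_controls_set(vmx, exec_control);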
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220712135009.952805-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
UNTRAIN_RET is not needed in native_irq_return_ldt because RET
untraining has already been done at this point.
In addition, when the RETBleed mitigation is IBPB, UNTRAIN_RET clobbers
several registers (AX, CX, DX), so here it trashes the user values held
in those registers.
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/35b0d50f-12d1-10c3-f5e8-d6c140486d4a@oracle.com
When commit 72f2ecb7ec ("ACPI: bus: Set CPPC _OSC bits for all
and when CPPC_LIB is supported") was introduced, it caused collateral
damage: a number of AMD systems that supported CPPC but didn't
advertise that support in _OSC stopped having a functional amd-pstate
driver. At the time, the _OSC was only enforced on Intel systems.
This was fixed for the MSR based designs by commit 8b356e536e
("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported")
but some shared-memory-based designs that also support CPPC likewise
haven't advertised support in the _OSC. Add support for those designs
as well by hardcoding the list of systems.
Fixes: 72f2ecb7ec ("ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported")
Fixes: 8b356e536e ("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported")
Link: https://lore.kernel.org/all/3559249.JlDtxWtqDm@natalenko.name/
Cc: 5.18+ <stable@vger.kernel.org> # 5.18+
Reported-and-tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
x86_has_pat_wp() is using a wrong test, as it relies on the normal
PAT configuration used by the kernel. In case the PAT MSR has been
set up by another entity (e.g. the Xen hypervisor) it might return
false even if the PAT configuration allows WP mappings. This is due to
the fact that, when running as a Xen PV guest, the PAT MSR is set up by
the hypervisor and cannot be changed by the guest, so the WP-related
entry can be at a different position than in the bare-metal or fully
virtualized case.
The correct way to test for WP support is:

 1. Get the PTE protection bits needed to select WP mode by reading
    __cachemode2pte_tbl[_PAGE_CACHE_MODE_WP] (depending on the PAT MSR
    setting this might return protection bits for a stronger mode, e.g.
    UC-)

 2. Translate those bits back into the real cache mode selected by those
    PTE bits by reading __pte2cachemode_tbl[__pte2cm_idx(prot)]

 3. Test for the cache mode to be _PAGE_CACHE_MODE_WP
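Putting the three steps together, the test becomes (a sketch using the
table names above, matching the shape of the fix):

  bool x86_has_pat_wp(void)
  {
          /* Step 1: PTE protection bits that select WP (or stronger) */
          uint16_t prot = __cachemode2pte_tbl[_PAGE_CACHE_MODE_WP];

          /* Steps 2 and 3: map the PTE bits back to a cache mode and
           * check that it really is WP. */
          return __pte2cachemode_tbl[__pte2cm_idx(prot)] == _PAGE_CACHE_MODE_WP;
  }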
Fixes: f88a68facd ("x86/mm: Extend early_memremap() support with additional attrs")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14
Link: https://lore.kernel.org/r/20220503132207.17234-1-jgross@suse.com
The build on x86_32 currently fails after commit

  9bb2ec608a ("objtool: Update Retpoline validation")

with:

  arch/x86/kernel/../../x86/xen/xen-head.S:35: Error: no such instruction: `annotate_unret_safe'

ANNOTATE_UNRET_SAFE is defined in nospec-branch.h, but head_32.S is
missing this include. Add it.
Fixes: 9bb2ec608a ("objtool: Update Retpoline validation")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/63e23f80-033f-f64e-7522-2816debbc367@kernel.org
Merge tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull lockdep fix for x86 retbleed from Borislav Petkov:
- Fix lockdep complaint for __static_call_fixup()
* tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/static_call: Serialize __static_call_fixup() properly
__static_call_fixup() invokes __static_call_transform() without holding
text_mutex, which causes lockdep to complain in text_poke_bp().
Adding the proper locking cures that, but as this is either used during
early boot or during module finalizing, it's not required to use
text_poke_bp(). Add an argument to __static_call_transform() which tells
it to use text_poke_early() for it.
Fixes: ee88d363d1 ("x86,static_call: Use alternative RET encoding")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Merge tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 retbleed fixes from Borislav Petkov:
"Just when you thought that all the speculation bugs were addressed and
solved and the nightmare is complete, here's the next one: speculating
after RET instructions and leaking privileged information using the
now pretty much classical covert channels.
It is called RETBleed and the mitigation effort and controlling
functionality has been modelled similar to what already existing
mitigations provide"
* tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
x86/speculation: Disable RRSBA behavior
x86/kexec: Disable RET on kexec
x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported
x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry
x86/bugs: Add Cannon lake to RETBleed affected CPU list
x86/retbleed: Add fine grained Kconfig knobs
x86/cpu/amd: Enumerate BTC_NO
x86/common: Stamp out the stepping madness
KVM: VMX: Prevent RSB underflow before vmenter
x86/speculation: Fill RSB on vmexit for IBRS
KVM: VMX: Fix IBRS handling after vmexit
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
KVM: VMX: Convert launched argument to flags
KVM: VMX: Flatten __vmx_vcpu_run()
objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}
x86/speculation: Remove x86_spec_ctrl_mask
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
x86/speculation: Fix SPEC_CTRL write on SMT state change
x86/speculation: Fix firmware entry SPEC_CTRL handling
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
...
Merge tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Prepare for and clear .brk early in order to address XenPV guests
failures where the hypervisor verifies page tables and uninitialized
data in that range leads to bogus failures in those checks
- Add any potential setup_data entries supplied at boot to the identity
pagetable mappings to prevent kexec kernel boot failures. Usually,
this is not a problem for the normal kernel as those mappings are
part of the initially mapped 2M pages but if kexec gets to allocate
the second kernel somewhere else, those setup_data entries need to be
mapped there too.
- Fix objtool not to discard text references from the __tracepoints
section so that ENDBR validation still works
- Correct the setup_data types limit as it is user-visible, before 5.19
releases
* tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Fix the setup data types max limit
x86/ibt, objtool: Don't discard text references from tracepoint section
x86/compressed/64: Add identity mappings for setup_data entries
x86: Fix .brk attribute in linker script
x86: Clear .brk area at early boot
x86/xen: Use clear_bss() for Xen PV guests
Some Intel processors may use alternate predictors for RETs on
RSB-underflow. This condition may be vulnerable to Branch History
Injection (BHI) and intramode-BTI.
The kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines,
eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against
such attacks. However, on RSB-underflow, RET target prediction may
fall back to alternate predictors. As a result, RET's predicted target
may get influenced by branch history.
A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback
behavior when in kernel mode. When set, RETs will not take predictions
from alternate predictors, hence mitigating RETs as well. Support for
this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit 2).
For spectre v2 mitigation, when a user selects a mitigation that
protects indirect CALLs and JMPs against BHI and intramode-BTI, also
set RRSBA_DIS_S to protect RETs in the RSB-underflow case.
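A sketch of the enable path (the SPEC_CTRL bit position follows the text
above; the mode check is a hypothetical placeholder for the real
spectre_v2 mode switch in bugs.c):

  #define SPEC_CTRL_RRSBA_DIS_S   BIT(6)  /* IA32_SPEC_CTRL[6] */

  /* RRSBA_CTRL is enumerated by CPUID.(EAX=7,ECX=2):EDX bit 2 */
  if (boot_cpu_has(X86_FEATURE_RRSBA_CTRL) &&
      spectre_v2_protects_indirect_branches())      /* hypothetical helper */
          x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S;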
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
All the invocations unroll to __x86_return_thunk and this file
must be position independent.
This fixes kexec on 64-bit AMD boxes.
[ bp: Fix 32-bit build. ]
Reported-by: Edward Tran <edward.tran@oracle.com>
Reported-by: Awais Tanveer <awais.tanveer@oracle.com>
Suggested-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
There are some VM configurations which have Skylake model but do not
support IBPB. In those cases, when using retbleed=ibpb, userspace is
going to be killed and the kernel is going to panic.
If the CPU does not support IBPB, warn and proceed with the auto option.
Also, do not fall back to IBPB on AMD/Hygon systems if it is not
supported.
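A sketch of the guard in the mitigation selection switch (structure per
arch/x86/kernel/cpu/bugs.c; labels illustrative):

  case RETBLEED_CMD_IBPB:
          if (!boot_cpu_has(X86_FEATURE_IBPB)) {
                  pr_err("WARNING: CPU does not support IBPB.\n");
                  goto do_cmd_auto;       /* fall back to retbleed=auto */
          }
          retbleed_mitigation = RETBLEED_MITIGATION_IBPB;
          break;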
Fixes: 3ebc170068 ("x86/bugs: Add retbleed=ibpb")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Commit
ee774dac0d ("x86/entry: Move PUSH_AND_CLEAR_REGS out of error_entry()")
moved PUSH_AND_CLEAR_REGS out of error_entry, into its own function, in
part to avoid calling error_entry() for XenPV.
However, commit
7c81c0c921 ("x86/entry: Avoid very early RET")
had to change that because the 'ret' was too early and moved it into
idtentry, bloating the text size, since idtentry is expanded for every
exception vector.
However, with the advent of xen_error_entry() in commit
d147553b64 ("x86/xen: Add UNTRAIN_RET")
it became possible to remove PUSH_AND_CLEAR_REGS from idtentry, back
into *error_entry().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cannon Lake is also affected by RETBleed, add it to the list.
Fixes: 6ad0ad2bf8 ("x86/bugs: Report Intel retbleed vulnerability")
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
The decompressed kernel initially relies on the identity map set up by
the boot/compressed kernel for accessing things like boot_params. With
the recent introduction of SEV-SNP support, the decompressed kernel
also needs to access the setup_data entries pointed to by
boot_params->hdr.setup_data.
This can lead to a crash in the kexec kernel during early boot due to
these entries not currently being included in the initial identity map,
see thread at Link below.
Include mappings for the setup_data entries in the initial identity map.
[ bp: Massage commit message and use a helper var for better readability. ]
Fixes: b190a043c4 ("x86/sev: Add SEV-SNP feature detection/setup")
Reported-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/TYCPR01MB694815CD815E98945F63C99183B49@TYCPR01MB6948.jpnprd01.prod.outlook.com
commit 72f2ecb7ec ("ACPI: bus: Set CPPC _OSC bits for all and
when CPPC_LIB is supported") added support for claiming to
support CPPC in _OSC on non-Intel platforms.
This unfortunately caused a regression on a variety of AMD
platforms in the field because a number of AMD platforms don't set
the `_OSC` bit 5 or 6 to indicate CPPC or CPPC v2 support.
As these AMD platforms already claim CPPC support via a dedicated
MSR from `X86_FEATURE_CPPC`, use this to enable the feature rather
than requiring the `_OSC` on platforms with a dedicated MSR.
If there is additional breakage on the shared memory designs also
missing this _OSC, additional follow up changes may be needed.
Fixes: 72f2ecb7ec ("Set CPPC _OSC bits for all and when CPPC_LIB is supported")
Reported-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit in Fixes added the "NOLOAD" attribute to the .brk section as a
"failsafe" measure.
Unfortunately, this leads to the linker no longer covering the .brk
section in a program header, resulting in the kernel loader not knowing
that the memory for the .brk section must be reserved.
This has led to crashes when loading the kernel as PV dom0 under Xen,
but other scenarios could be hit by the same problem (e.g. in case an
uncompressed kernel is used and the initrd is placed directly behind
it).
So drop the "NOLOAD" attribute. This has been verified to correctly
cover the .brk section by a program header of the resulting ELF file.
Fixes: e32683c6f7 ("x86/mm: Fix RESERVE_BRK() for older binutils")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20220630071441.28576-4-jgross@suse.com
The .brk section has the same properties as .bss: it is an alloc-only
section and should be cleared before being used.
Not doing so is especially a problem for Xen PV guests, as the
hypervisor will validate page tables (check for writable page tables
and hypervisor private bits) before accepting them to be used.
Make sure .brk is initially zero by letting clear_bss() clear the brk
area, too.
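A sketch of the resulting clear_bss() (symbol names from the kernel's
linker script):

  void __init clear_bss(void)
  {
          memset(__bss_start, 0,
                 (unsigned long) __bss_stop - (unsigned long) __bss_start);
          /* .brk has the same alloc-only properties as .bss: zero it too */
          memset(__brk_base, 0,
                 (unsigned long) __brk_limit - (unsigned long) __brk_base);
  }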
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220630071441.28576-3-jgross@suse.com
Instead of clearing the bss area in assembly code, use the clear_bss()
function.
This requires passing the start_info address as a parameter to
xen_start_kernel() in order to avoid xen_start_info being zeroed
again.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20220630071441.28576-2-jgross@suse.com
Do fine-grained Kconfig for all the various retbleed parts.
NOTE: if your compiler doesn't support return thunks this will
silently 'upgrade' your mitigation to IBPB; you might not like this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.
Zen3 CPUs don't suffer BTC.
Hypervisors are expected to synthesise BTC_NO when it is appropriate
given the migration pool, to prevent kernels from using heuristics.
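A sketch of the consumer side (the shape of the cpu_set_bug_bits()-style
check; simplified):

  /* Without BTC_NO, assume AMD/Hygon parts are Branch Type Confused. */
  if ((c->x86_vendor == X86_VENDOR_AMD ||
       c->x86_vendor == X86_VENDOR_HYGON) &&
      !cpu_has(c, X86_FEATURE_BTC_NO))
          setup_force_cpu_bug(X86_BUG_RETBLEED);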
[ bp: Massage. ]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
The whole MMIO/RETBLEED enumeration went overboard on steppings. Get
rid of all that and simply use ANY.
If a future stepping of these models would not be affected, it had
better set the relevant ARCH_CAP_$FOO_NO bit in
IA32_ARCH_CAPABILITIES.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
On VMX, there are some balanced returns between the time the guest's
SPEC_CTRL value is written, and the vmenter.
Balanced returns (matched by a preceding call) are usually ok, but it's
at least theoretically possible an NMI with a deep call stack could
empty the RSB before one of the returns.
For maximum paranoia, don't allow *any* returns (balanced or otherwise)
between the SPEC_CTRL write and the vmenter.
[ bp: Fix 32-bit build. ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Prevent RSB underflow/poisoning attacks with RSB filling. While at it, add a
bunch of comments to attempt to document the current state of tribal
knowledge about RSB attacks and what exactly is being mitigated.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
For legacy IBRS to work, the IBRS bit needs to be always re-written
after vmexit, even if it's already on.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
On eIBRS systems, the returns in the vmexit return path from
__vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks.
Fix that by moving the post-vmexit spec_ctrl handling to immediately
after the vmexit.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Convert __vmx_vcpu_run()'s 'launched' argument to 'flags', in
preparation for doing SPEC_CTRL handling immediately after vmexit, which
will need another flag.
This is much easier than adding a fourth argument, because this code
supports both 32-bit and 64-bit, and the fourth argument on 32-bit would
have to be pushed on the stack.
Note that __vmx_vcpu_run_flags() is called outside of the noinstr
critical section because it will soon start calling potentially
traceable functions.
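A sketch of the flags plumbing (flag names as later merged; the second
flag is only consumed by the SPEC_CTRL patch that follows):

  #define VMX_RUN_VMRESUME        BIT(0)
  #define VMX_RUN_SAVE_SPEC_CTRL  BIT(1)  /* used by a follow-up patch */

  static unsigned int __vmx_vcpu_run_flags(struct vcpu_vmx *vmx)
  {
          unsigned int flags = 0;

          /* 'launched' becomes a bit instead of a dedicated argument */
          if (vmx->loaded_vmcs->launched)
                  flags |= VMX_RUN_VMRESUME;

          return flags;
  }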
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Move the vmx_vm{enter,exit}() functionality into __vmx_vcpu_run(). This
will make it easier to do the spec_ctrl handling before the first RET.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Commit
c536ed2fff ("objtool: Remove SAVE/RESTORE hints")
removed the save/restore unwind hints because they were no longer
needed. Now they're going to be needed again so re-add them.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
This mask has been made redundant by kvm_spec_ctrl_test_value(). And it
doesn't even work when MSR interception is disabled, as the guest can
just write to SPEC_CTRL directly.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
There's no need to recalculate the host value for every entry/exit.
Just use the cached value in spec_ctrl_current().
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
If the SMT state changes, SSBD might get accidentally disabled. Fix
that.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
If a kernel is built with CONFIG_RETPOLINE=n, but the user still wants
to mitigate Spectre v2 using IBRS or eIBRS, the RSB filling will be
silently disabled.
There's nothing retpoline-specific about RSB buffer filling. Remove the
CONFIG_RETPOLINE guards around it.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Zen2 uarchs have an undocumented, unnamed, MSR that contains a chicken
bit for some speculation behaviour. It needs setting.
Note: very belatedly AMD released naming; it's now officially called
MSR_AMD64_DE_CFG2 and MSR_AMD64_DE_CFG2_SUPPRESS_NOBR_PRED_BIT
but shall remain the SPECTRAL CHICKEN.
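A sketch of the chicken-bit poke (MSR index/bit as the kernel defines
them; the surrounding Zen2 detection is elided):

  #define MSR_ZEN2_SPECTRAL_CHICKEN       0xc00110e3
  #define ZEN2_SPECTRAL_CHICKEN_BIT       BIT_ULL(1)

  /* in Zen2 CPU init: set the suppress-branch-prediction chicken bit */
  msr_set_bit(MSR_ZEN2_SPECTRAL_CHICKEN, 1);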
Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Since entry asm is tricky, add a validation pass that ensures the
retbleed mitigation has been done before the first actual RET
instruction.
Entry points are those that either have UNWIND_HINT_ENTRY, which acts
as UNWIND_HINT_EMPTY but marks the instruction as an entry point, or
those that have UNWIND_HINT_IRET_REGS at +0.
This is basically a variant of validate_branch() that is
intra-function: it simply follows all branches from marked
entry points and ensures that all paths lead to ANNOTATE_UNRET_END.
If a path hits RET or an indirection, the path is a fail and will be
reported.
There are 3 ANNOTATE_UNRET_END instances:
- UNTRAIN_RET itself
- exception from-kernel; this path doesn't need UNTRAIN_RET
- all early exceptions; these also don't need UNTRAIN_RET
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>