Pull ARM SoC fixes from Arnd Bergmann:
"One more set of simple ARM platform fixes:
- A boot regression on qualcomm msm8998
- Gemini display controllers got turned off by accident
- incorrect reference counting in optee"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
tee: optee: add missing of_node_put after of_device_is_available
arm64: dts: qcom: msm8998: Extend TZ reserved memory area
ARM: dts: gemini: Re-enable display controller
Pull x86 fixes from Thomas Gleixner:
"Two last minute fixes:
- Prevent value evaluation via functions happening in the user access
enabled region of __put_user() (put another way: make sure to
evaluate the value to be stored in user space _before_ enabling
user space accesses)
- Correct the definition of a Hyper-V hypercall constant"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyper-v: Fix definition of HV_MAX_FLUSH_REP_COUNT
x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
Pull SCSI fixes from James Bottomley:
"Nine small fixes.
The resume fix is a cosmetic removal of a warning with an incorrect
condition causing it to alarm people wrongly.
The other eight patches correct a thinko in Christoph Hellwig's DMA
conversion series. Without it all these drivers end up with 32 bit DMA
masks meaning they bounce any page over 4GB before sending it to the
controller.
Nowadays, even laptops mostly have memory above 4GB, so this can lead
to significant performance degradation with all the bouncing"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: Avoid that system resume triggers a kernel warning
scsi: hptiop: fix calls to dma_set_mask()
scsi: hisi_sas: fix calls to dma_set_mask_and_coherent()
scsi: csiostor: fix calls to dma_set_mask_and_coherent()
scsi: bfa: fix calls to dma_set_mask_and_coherent()
scsi: aic94xx: fix calls to dma_set_mask_and_coherent()
scsi: 3w-sas: fix calls to dma_set_mask_and_coherent()
scsi: 3w-9xxx: fix calls to dma_set_mask_and_coherent()
scsi: lpfc: fix calls to dma_set_mask_and_coherent()
Pull networking fixes from David Miller:
1) Fix refcount leak in act_ipt during replace, from Davide Caratti.
2) Set task state properly in tun during blocking reads, from Timur
Celik.
3) Leaked reference in DSA, from Wen Yang.
4) NULL deref in act_tunnel_key, from Vlad Buslov.
5) cipso_v4_erro can reference the skb IPCB in inappropriate contexts
thus referencing garbage, from Nazarov Sergey.
6) Don't accept RTA_VIA and RTA_GATEWAY in contexts where those
attributes make no sense.
7) Fix hung sendto in tipc, from Tung Nguyen.
8) Out-of-bounds access in netlabel, from Paul Moore.
9) Grant reference leak in xen-netback, from Igor Druzhinin.
10) Fix tx stalls with lan743x, from Bryan Whitehead.
11) Fix interrupt storm with mv88e6xxx, from Hein Kallweit.
12) Memory leak in sit on device registry failure, from Mao Wenan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
net: sit: fix memory leak in sit_init_net()
net: dsa: mv88e6xxx: Fix statistics on mv88e6161
geneve: correctly handle ipv6.disable module parameter
net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode
bpf: fix sanitation rewrite in case of non-pointers
ipv4: Add ICMPv6 support when parse route ipproto
MIPS: eBPF: Fix icache flush end address
lan743x: Fix TX Stall Issue
net: phy: phylink: fix uninitialized variable in phylink_get_mac_state
net: aquantia: regression on cpus with high cores: set mode with 8 queues
selftests: fixes for UDP GRO
bpf: drop refcount if bpf_map_new_fd() fails in map_create()
net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X
net: dsa: mv88e6xxx: Fix u64 statistics
xen-netback: don't populate the hash cache on XenBus disconnect
xen-netback: fix occasional leak of grant ref mappings under memory pressure
sctp: chunk.c: correct format string for size_t in printk
net: netem: fix skb length BUG_ON in __skb_to_sgvec
netlabel: fix out-of-bounds memory accesses
ipv4: Pass original device to ip_rcv_finish_core
...
Pull more crypto fixes from Herbert Xu:
"This fixes a couple of issues in arm64/chacha that was introduced in
5.0"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/chacha - fix hchacha_block_neon() for big endian
crypto: arm64/chacha - fix chacha_4block_xor_neon() for big endian
Despite what the datesheet says, the silicon implements the older way
of snapshoting the statistics. Change the op.
Reported-by: Chris.Healy@zii.aero
Tested-by: Chris.Healy@zii.aero
Fixes: 0ac64c3949 ("net: dsa: mv88e6xxx: mv88e6161 uses mv88e6320 stats snapshot")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When IPv6 is compiled but disabled at runtime, geneve_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.
Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.
This is the same fix as what commit d074bf9600 ("vxlan: correctly handle
ipv6.disable module parameter") is doing for vxlan.
Note there's also commit c0a47e44c0 ("geneve: should not call rt6_lookup()
when ipv6 was disabled") which fixes a similar issue but for regular
tunnels, while this patch is needed for metadata based tunnels.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-03-01
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix sanitation rewrite, from Daniel.
2) fix error path on map_new_fd, from Peng.
3) fix icache flush address, from Paul.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When debugging another issue I faced an interrupt storm in this
driver (88E6390, port 9 in SGMII mode), consisting of alternating
link-up / link-down interrupts. Analysis showed that the driver
wanted to set a cmode that was set already. But so far
mv88e6390x_port_set_cmode() doesn't check this and powers down
SERDES, what causes the link to break, and eventually results in
the described interrupt storm.
Fix this by checking whether the cmode actually changes. We want
that the very first call to mv88e6390x_port_set_cmode() always
configures the registers, therefore initialize port.cmode with
a value that is different from any supported cmode value.
We have to take care that we only init the ports cmode once
chip->info->num_ports is set.
v2:
- add small helper and init the number of actual ports only
Fixes: 364e9d7776 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek reported that he saw an issue with the below snippet in that
timing measurements where off when loaded as unpriv while results
were reasonable when loaded as privileged:
[...]
uint64_t a = bpf_ktime_get_ns();
uint64_t b = bpf_ktime_get_ns();
uint64_t delta = b - a;
if ((int64_t)delta > 0) {
[...]
Turns out there is a bug where a corner case is missing in the fix
d3bd7413e0 ("bpf: fix sanitation of alu op with pointer / scalar
type from different paths"), namely fixup_bpf_calls() only checks
whether aux has a non-zero alu_state, but it also needs to test for
the case of BPF_ALU_NON_POINTER since in both occasions we need to
skip the masking rewrite (as there is nothing to mask).
Fixes: d3bd7413e0 ("bpf: fix sanitation of alu op with pointer / scalar type from different paths")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Reported-by: Arthur Fabre <afabre@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
But for ip -6 route, currently we only support tcp, udp and icmp.
Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.
v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to
rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: eacb9384a3 ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MIPS eBPF JIT calls flush_icache_range() in order to ensure the
icache observes the code that we just wrote. Unfortunately it gets the
end address calculation wrong due to some bad pointer arithmetic.
The struct jit_ctx target field is of type pointer to u32, and as such
adding one to it will increment the address being pointed to by 4 bytes.
Therefore in order to find the address of the end of the code we simply
need to add the number of 4 byte instructions emitted, but we mistakenly
add the number of instructions multiplied by 4. This results in the call
to flush_icache_range() operating on a memory region 4x larger than
intended, which is always wasteful and can cause crashes if we overrun
into an unmapped page.
Fix this by correcting the pointer arithmetic to remove the bogus
multiplication, and use braces to remove the need for a set of brackets
whilst also making it obvious that the target field is a pointer.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: b6bd53f9c4 ("MIPS: Add missing file for eBPF JIT.")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
It has been observed that tx queue stalls while downloading
from certain web sites (example www.speedtest.net)
The cause has been tracked down to a corner case where
dma descriptors where not setup properly. And there for a tx
completion interrupt was not signaled.
This fix corrects the problem by properly marking the end of
a multi descriptor transmission.
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When debugging an issue I found implausible values in state->pause.
Reason in that state->pause isn't initialized and later only single
bits are changed. Also the struct itself isn't initialized in
phylink_resolve(). So better initialize state->pause and other
not yet initialized fields.
v2:
- use right function name in subject
v3:
- initialize additional fields
Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation for UDP GRO tests is racy: the receiver
may flush the RX queue while the sending is still transmitting and
incorrectly report RX errors, with a wrong number of packet received.
Add explicit timeouts to the receiver for both connection activation
(first packet received for UDP) and reception completion, so that
in the above critical scenario the receiver will wait for the
transfer completion.
Fixes: 3327a9c463 ("selftests: add functionals test for UDP GRO")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull IOMMU fix from Joerg Roedel:
"One important fix for a memory corruption issue in the Intel VT-d
driver that triggers on hardware with deep PCI hierarchies"
* tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/dmar: Fix buffer overflow during PCI bus notification
Merge misc fixes from Andrew Morton:
"2 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
hugetlbfs: fix races and page leaks during migration
kasan: turn off asan-stack for clang-8 and earlier
hugetlb pages should only be migrated if they are 'active'. The
routines set/clear_page_huge_active() modify the active state of hugetlb
pages.
When a new hugetlb page is allocated at fault time, set_page_huge_active
is called before the page is locked. Therefore, another thread could
race and migrate the page while it is being added to page table by the
fault code. This race is somewhat hard to trigger, but can be seen by
strategically adding udelay to simulate worst case scheduling behavior.
Depending on 'how' the code races, various BUG()s could be triggered.
To address this issue, simply delay the set_page_huge_active call until
after the page is successfully added to the page table.
Hugetlb pages can also be leaked at migration time if the pages are
associated with a file in an explicitly mounted hugetlbfs filesystem.
For example, consider a two node system with 4GB worth of huge pages
available. A program mmaps a 2G file in a hugetlbfs filesystem. It
then migrates the pages associated with the file from one node to
another. When the program exits, huge page counts are as follows:
node0
1024 free_hugepages
1024 nr_hugepages
node1
0 free_hugepages
1024 nr_hugepages
Filesystem Size Used Avail Use% Mounted on
nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool
That is as expected. 2G of huge pages are taken from the free_hugepages
counts, and 2G is the size of the file in the explicitly mounted
filesystem. If the file is then removed, the counts become:
node0
1024 free_hugepages
1024 nr_hugepages
node1
1024 free_hugepages
1024 nr_hugepages
Filesystem Size Used Avail Use% Mounted on
nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool
Note that the filesystem still shows 2G of pages used, while there
actually are no huge pages in use. The only way to 'fix' the filesystem
accounting is to unmount the filesystem
If a hugetlb page is associated with an explicitly mounted filesystem,
this information in contained in the page_private field. At migration
time, this information is not preserved. To fix, simply transfer
page_private from old to new page at migration time if necessary.
There is a related race with removing a huge page from a file and
migration. When a huge page is removed from the pagecache, the
page_mapping() field is cleared, yet page_private remains set until the
page is actually freed by free_huge_page(). A page could be migrated
while in this state. However, since page_mapping() is not set the
hugetlbfs specific routine to transfer page_private is not called and we
leak the page count in the filesystem.
To fix that, check for this condition before migrating a huge page. If
the condition is detected, return EBUSY for the page.
Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
Fixes: bcc5422230 ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
[mike.kravetz@oracle.com: v2]
Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
[mike.kravetz@oracle.com: update comment and changelog]
Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Building an arm64 allmodconfig kernel with clang results in over 140
warnings about overly large stack frames, the worst ones being:
drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'
None of these happen with gcc today, and almost all of these are the
result of a single known issue in llvm. Hopefully it will eventually
get fixed with the clang-9 release.
In the meantime, the best idea I have is to turn off asan-stack for
clang-8 and earlier, so we can produce a kernel that is safe to run.
I have posted three patches that address the frame overflow warnings
that are not addressed by turning off asan-stack, so in combination with
this change, we get much closer to a clean allmodconfig build, which in
turn is necessary to do meaningful build regression testing.
It is still possible to turn on the CONFIG_ASAN_STACK option on all
versions of clang, and it's always enabled for gcc, but when
CONFIG_COMPILE_TEST is set, the option remains invisible, so
allmodconfig and randconfig builds (which are normally done with a
forced CONFIG_COMPILE_TEST) will still result in a mostly clean build.
Link: http://lkml.kernel.org/r/20190222222950.3997333-1-arnd@arndb.de
Link: https://bugs.llvm.org/show_bug.cgi?id=38809
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm fixes from Dave Airlie:
"Three final fixes, one for a feature that is new in this kernel, one
bochs fix for qemu riscv and one atomic modesetting fix.
I've left a few of the other late fixes until next as I didn't want to
throw in anything that wasn't really necessary"
* tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm:
drm/bochs: Fix the ID mismatch error
drm: Block fb changes for async plane updates
drm/amd/display: Use vrr friendly pageflip throttling in DC.
In bpf/syscall.c, map_create() first set map->usercnt to 1, a file
descriptor is supposed to return to userspace. When bpf_map_new_fd()
fails, drop the refcount.
Fixes: bd5f5f4ecb ("bpf: Add BPF_MAP_GET_FD_BY_ID")
Signed-off-by: Peng Sun <sironhide0null@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Qualcomm ARM64 Fixes for 5.0-rc8
* Fix TZ memory area size to avoid crashes during boot
* tag 'qcom-fixes-for-5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux:
arm64: dts: qcom: msm8998: Extend TZ reserved memory area
OP-TEE driver
- add missing of_node_put after of_device_is_available
* tag 'tee-fix-for-v5.0' of https://git.linaro.org/people/jens.wiklander/linux-tee:
tee: optee: add missing of_node_put after of_device_is_available
Pull MIPS fixes from Paul Burton:
"A few more MIPS fixes:
- Fix 16b cmpxchg() operations which could erroneously fail if bits
15:8 of the old value are non-zero. In practice I'm not aware of
any actual users of 16b cmpxchg() on MIPS, but this fixes the
support for it was was introduced in v4.13.
- Provide a struct device to dma_alloc_coherent for Lantiq XWAY
systems with a "Voice MIPS Macro Core" (VMMC) device.
- Provide DMA masks for BCM63xx ethernet devices, fixing a regression
introduced in v4.19.
- Fix memblock reservation for the kernel when the system has a
non-zero PHYS_OFFSET, correcting the memblock conversion performed
in v4.20"
* tag 'mips_fixes_5.0_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: fix memory setup for platforms with PHYS_OFFSET != 0
MIPS: BCM63XX: provide DMA masks for ethernet devices
MIPS: lantiq: pass struct device to DMA API functions
MIPS: fix truncation in __cmpxchg_small for short values
Pull orangefs fixlet from Mike Marshall:
"Remove two un-needed BUG_ONs"
* tag 'for-linus-5.0-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: remove two un-needed BUG_ONs...
Upon setting the cmode on 6390 and 6390X, the associated serdes
interfaces must be powered off/on.
Both 6390X and 6390 share code to do so, but it currently uses the 6390
specific helper mv88e6390_serdes_power() to disable and enable the
serdes interface.
This call will fail silently on 6390X when trying so set a 10G interface
such as XAUI or RXAUI, since mv88e6390_serdes_power() internally grabs
the lane number based on modes supported by the 6390, and returns 0 when
getting -ENODEV as a lane number.
Using mv88e6390x_serdes_power() should be safe here, since we explicitly
rule-out all ports but the 9 and 10, and because modes supported by 6390
ports 9 and 10 are a subset of those supported on 6390X.
This was tested on 6390X using RXAUI mode.
Fixes: 364e9d7776 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch maintains u64 counters for the number of octets sent and
received. These are kept as two u32's which need to be combined. Fix
the combing, which wrongly worked on u16's.
Fixes: 80c4627b27 ("dsa: mv88x6xxx: Refactor getting a single statistic")
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Occasionally, during the disconnection procedure on XenBus which
includes hash cache deinitialization there might be some packets
still in-flight on other processors. Handling of these packets includes
hashing and hash cache population that finally results in hash cache
data structure corruption.
In order to avoid this we prevent hashing of those packets if there
are no queues initialized. In that case RCU protection of queues guards
the hash cache as well.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zero-copy callback flag is not yet set on frag list skb at the moment
xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
leaking grant ref mappings since xenvif_zerocopy_callback() is never
called for these fragments. Those eventually build up and cause Xen
to kill Dom0 as the slots get reused for new mappings:
"d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005"
That behavior is observed under certain workloads where sudden spikes
of page cache writes coexist with active atomic skb allocations from
network traffic. Additionally, rework the logic to deal with frag_list
deallocation in a single place.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to Documentation/core-api/printk-formats.rst, size_t should be
printed with %zu, rather than %Zu.
In addition, using %Zu triggers a warning on clang (-Wformat-extra-args):
net/sctp/chunk.c:196:25: warning: data argument not used by format string [-Wformat-extra-args]
__func__, asoc, max_data);
~~~~~~~~~~~~~~~~^~~~~~~~~
./include/linux/printk.h:440:49: note: expanded from macro 'pr_warn_ratelimited'
printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
./include/linux/printk.h:424:17: note: expanded from macro 'printk_ratelimited'
printk(fmt, ##__VA_ARGS__); \
~~~ ^
Fixes: 5b5e0928f7 ("lib/vsprintf.c: remove %Z support")
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Matthias Maennich <maennich@google.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It can be reproduced by following steps:
1. virtio_net NIC is configured with gso/tso on
2. configure nginx as http server with an index file bigger than 1M bytes
3. use tc netem to produce duplicate packets and delay:
tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
4. continually curl the nginx http server to get index file on client
5. BUG_ON is seen quickly
[10258690.371129] kernel BUG at net/core/skbuff.c:4028!
[10258690.371748] invalid opcode: 0000 [#1] SMP PTI
[10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2
[10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
[10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
[10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
[10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
[10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
[10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
[10258690.372094] Call Trace:
[10258690.372094] <IRQ>
[10258690.372094] skb_to_sgvec+0x11/0x40
[10258690.372094] start_xmit+0x38c/0x520 [virtio_net]
[10258690.372094] dev_hard_start_xmit+0x9b/0x200
[10258690.372094] sch_direct_xmit+0xff/0x260
[10258690.372094] __qdisc_run+0x15e/0x4e0
[10258690.372094] net_tx_action+0x137/0x210
[10258690.372094] __do_softirq+0xd6/0x2a9
[10258690.372094] irq_exit+0xde/0xf0
[10258690.372094] smp_apic_timer_interrupt+0x74/0x140
[10258690.372094] apic_timer_interrupt+0xf/0x20
[10258690.372094] </IRQ>
In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's
linear data size and nonlinear data size, thus BUG_ON triggered.
Because the skb is cloned and a part of nonlinear data is split off.
Duplicate packet is cloned in netem_enqueue() and may be delayed
some time in qdisc. When qdisc len reached the limit and returns
NET_XMIT_DROP, the skb will be retransmit later in write queue.
the skb will be fragmented by tso_fragment(), the limit size
that depends on cwnd and mss decrease, the skb's nonlinear
data will be split off. The length of the skb cloned by netem
will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(),
the BUG_ON trigger.
To fix it, netem returns NET_XMIT_SUCCESS to upper stack
when it clones a duplicate packet.
Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()")
Signed-off-by: Sheng Lan <lansheng@huawei.com>
Reported-by: Qin Ji <jiqin.ji@huawei.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix NULL ptr crash for a special test case
- Align max segment size with logical block size to prevent bugs in
v5.1-rc1.
MMC host:
- cqhci: Minor fixes
- tmio: Prevent interrupt storm
- tmio: Fixup SD/MMC card initialization
- spi: Allow card to be detected during probe
- sdhci-esdhc-imx: Fixup fix for ERR004536"
* tag 'mmc-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-esdhc-imx: correct the fix of ERR004536
mmc: core: align max segment size with logical block size
mmc: cqhci: Fix a tiny potential memory leak on error condition
mmc: cqhci: fix space allocated for transfer descriptor
mmc: core: Fix NULL ptr crash from mmc_should_fail_request
mmc: tmio: fix access width of Block Count Register
mmc: tmio_mmc_core: don't claim spurious interrupts
mmc: spi: Fix card detection during probe
Pull crypto fixes from Herbert Xu:
"This fixes a compiler warning introduced by a previous fix, as well as
two crash bugs on ARM"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sha512/arm - fix crash bug in Thumb2 build
crypto: sha256/arm - fix crash bug in Thumb2 build
crypto: ccree - add missing inline qualifier
debugfs can now report an error code if something went wrong instead of
just NULL. So if the return value is to be used as a "real" dentry, it
needs to be checked if it is an error before dereferencing it.
This is now happening because of ff9fb72bc0 ("debugfs: return error
values, not NULL"). syzbot has found a way to trigger multiple debugfs
files attempting to be created, which fails, and then the error code
gets passed to dentry_path_raw() which obviously does not like it.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reported-and-tested-by: syzbot+7857962b4d45e602b8ad@syzkaller.appspotmail.com
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an of_node_put when a tested device node is not available.
The semantic patch that fixes this problem is as follows
(http://coccinelle.lip6.fr):
// <smpl>
@@
identifier f;
local idexpression e;
expression x;
@@
e = f(...);
... when != of_node_put(e)
when != x = e
when != e = x
when any
if (<+...of_device_is_available(e)...+>) {
... when != of_node_put(e)
(
return e;
|
+ of_node_put(e);
return ...;
)
}
// </smpl>
Fixes: db878f76b9 ("tee: optee: take DT status property into account")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
On big endian arm64 kernels, the xchacha20-neon and xchacha12-neon
self-tests fail because hchacha_block_neon() outputs little endian words
but the C code expects native endianness. Fix it to output the words in
native endianness (which also makes it match the arm32 version).
Fixes: cc7cf991e9 ("crypto: arm64/chacha20 - add XChaCha20 support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The change to encrypt a fifth ChaCha block using scalar instructions
caused the chacha20-neon, xchacha20-neon, and xchacha12-neon self-tests
to start failing on big endian arm64 kernels. The bug is that the
keystream block produced in 32-bit scalar registers is directly XOR'd
with the data words, which are loaded and stored in native endianness.
Thus in big endian mode the data bytes end up XOR'd with the wrong
bytes. Fix it by byte-swapping the keystream words in big endian mode.
Fixes: 2fe55987b2 ("crypto: arm64/chacha - use combined SIMD/ALU routine for more speed")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are two array out-of-bounds memory accesses, one in
cipso_v4_map_lvl_valid(), the other in netlbl_bitmap_walk(). Both
errors are embarassingly simple, and the fixes are straightforward.
As a FYI for anyone backporting this patch to kernels prior to v4.8,
you'll want to apply the netlbl_bitmap_walk() patch to
cipso_v4_bitmap_walk() as netlbl_bitmap_walk() doesn't exist before
Linux v4.8.
Reported-by: Jann Horn <jannh@google.com>
Fixes: 446fda4f26 ("[NetLabel]: CIPSOv4 engine")
Fixes: 3faa8f982f ("netlabel: Move bitmap manipulation functions to the NetLabel core.")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_input_rcu expects the original ingress device (e.g., for
proper multicast handling). The skb->dev can be changed by l3mdev_ip_rcv,
so dev needs to be saved prior to calling it. This was the behavior prior
to the listify changes.
Fixes: 5fa12739a5 ("net: ipv4: listify ip_rcv_finish")
Cc: Edward Cree <ecree@solarflare.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
selftests: pmtu: fix and increase coverage
This series includes a fixup for the pmtu.sh test script, related to IPv6
address management, and adds coverage for the recently reported and fixed
PMTU exception issue
v2 -> v3:
- more cleanups
v1 -> v2:
- several script cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a couple of new tests, explicitly checking that the kernel
timely releases PMTU exceptions on related device removal.
This is mostly a regression test vs the issue fixed by
commit f5b51fe804 ("ipv6: route: purge exception on removal")
Only 2 new test cases have been added, instead of extending all
the existing ones, because the reproducer requires executing
several commands and would slow down too much the tests otherwise.
v2 -> v3:
- more cleanup, still from Stefano
v1 -> v2:
- several script cleanups, as suggested by Stefano
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise, the configured IPv6 address could be still "tentative"
at test time, possibly causing tests failures.
We can also drop some sleep along the code and decrease the
timeout for most commands so that the test runtime decreases.
v1 -> v2:
- fix comment (Stefano)
Fixes: d1f1b9cbf3 ("selftests: net: Introduce first PMTU test")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to dp83640 delay after soft reset
is needed to set up registers correctly.
Signed-off-by: Max Uvarov <muvarov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The prepare_fb call always happens on new_plane_state.
The drm_atomic_helper_cleanup_planes checks to see if
plane state pointer has changed when deciding to call cleanup_fb on
either the new_plane_state or the old_plane_state.
For a non-async atomic commit the state pointer is swapped, so this
helper calls prepare_fb on the new_plane_state and cleanup_fb on the
old_plane_state. This makes sense, since we want to prepare the
framebuffer we are going to use and cleanup the the framebuffer we are
no longer using.
For the async atomic update helpers this differs. The async atomic
update helpers perform in-place updates on the existing state. They call
drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
This means that prepare_fb is called on the new_plane_state and
cleanup_fb is called on the new_plane_state (not the old).
In the case where old_plane_state->fb == new_plane_state->fb then
there should be no behavioral difference between an async update
and a non-async commit. But there are issues that arise when
old_plane_state->fb != new_plane_state->fb.
The first is that the new_plane_state->fb is immediately cleaned up
after it has been prepared, so we're using a fb that we shouldn't
be.
The second occurs during a sequence of async atomic updates and
non-async regular atomic commits. Suppose there are two framebuffers
being interleaved in a double-buffering scenario, fb1 and fb2:
- Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
- Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
- Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
We call cleanup_fb on fb2 twice in this example scenario, and any
further use will result in use-after-free.
The simple fix to this problem is to block framebuffer changes
in the drm_atomic_helper_async_check function for now.
v2: Move check by itself, add a FIXME (Daniel)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: <stable@vger.kernel.org> # v4.14+
Fixes: fef9df8b59 ("drm/atomic: initial support for asynchronous plane update")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/275364/
Signed-off-by: Dave Airlie <airlied@redhat.com>
security_mmap_addr() does a capability check with current_cred(), but
we can reach this code from contexts like a VFS write handler where
current_cred() must not be used.
This can be abused on systems without SMAP to make NULL pointer
dereferences exploitable again.
Fixes: 8869477a49 ("security: protect from stack expansion into low vm addresses")
Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In VRR mode, keep track of the vblank count of the last
completed pageflip in amdgpu_crtc->last_flip_vblank, as
recorded in the pageflip completion handler after each
completed flip.
Use that count to prevent mmio programming a new pageflip
within the same vblank in which the last pageflip completed,
iow. to throttle pageflips to at most one flip per video
frame, while at the same time allowing to request a flip
not only before start of vblank, but also anywhere within
vblank.
The old logic did the same, and made sense for regular fixed
refresh rate flipping, but in vrr mode it prevents requesting
a flip anywhere inside the possibly huge vblank, thereby
reducing framerate in vrr mode instead of improving it, by
delaying a slightly delayed flip requests up to a maximum
vblank duration + 1 scanout duration. This would limit VRR
usefulness to only help applications with a very high GPU
demand, which can submit the flip request before start of
vblank, but then have to wait long for fences to complete.
With this method a flip can be both requested and - after
fences have completed - executed, ie. it doesn't matter if
the request (amdgpu_dm_do_flip()) gets delayed until deep
into the extended vblank due to cpu execution delays. This
also allows clients which want to regulate framerate within
the vrr range a much more fine-grained control of flip timing,
a feature that might be useful for video playback, and is
very useful for neuroscience/vision research applications.
In regular non-VRR mode, retain the old flip submission
behavior. This to keep flip scheduling for fullscreen X11/GLX
OpenGL clients intact, if they use the GLX_OML_sync_control
extensions glXSwapBufferMscOML(, ..., target_msc,...) function
with a specific target_msc target vblank count.
glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
not flip at the proper target_msc for a non-zero target_msc
if VRR mode is active with this patch. They'd often flip one
frame too early. However, this limitation should not matter
much in VRR mode, as scheduling based on vblank counts is
pretty futile/unusable under variable refresh duration
anyway, so no real extra harm is done.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With Micrel KSZ8061 PHY, the link may occasionally not come up after
Ethernet cable connect. The vendor's (Microchip, former Micrel) errata
sheet 80000688A.pdf descripes the problem and possible workarounds in
detail, see below.
The batch implements workaround 1, which permanently fixes the issue.
DESCRIPTION
Link-up may not occur properly when the Ethernet cable is initially
connected. This issue occurs more commonly when the cable is connected
slowly, but it may occur any time a cable is connected. This issue occurs
in the auto-negotiation circuit, and will not occur if auto-negotiation
is disabled (which requires that the two link partners be set to the
same speed and duplex).
END USER IMPLICATIONS
When this issue occurs, link is not established. Subsequent cable
plug/unplaug cycle will not correct the issue.
WORk AROUND
There are four approaches to work around this issue:
1. This issue can be prevented by setting bit 15 in MMD device address 1,
register 2, prior to connecting the cable or prior to setting the
Restart Auto-negotiation bit in register 0h. The MMD registers are
accessed via the indirect access registers Dh and Eh, or via the Micrel
EthUtil utility as shown here:
. if using the EthUtil utility (usually with a Micrel KSZ8061
Evaluation Board), type the following commands:
> address 1
> mmd 1
> iw 2 b61a
. Alternatively, write the following registers to write to the
indirect MMD register:
Write register Dh, data 0001h
Write register Eh, data 0002h
Write register Dh, data 4001h
Write register Eh, data B61Ah
2. The issue can be avoided by disabling auto-negotiation in the KSZ8061,
either by the strapping option, or by clearing bit 12 in register 0h.
Care must be taken to ensure that the KSZ8061 and the link partner
will link with the same speed and duplex. Note that the KSZ8061
defaults to full-duplex when auto-negotiation is off, but other
devices may default to half-duplex in the event of failed
auto-negotiation.
3. The issue can be avoided by connecting the cable prior to powering-up
or resetting the KSZ8061, and leaving it plugged in thereafter.
4. If the above measures are not taken and the problem occurs, link can
be recovered by setting the Restart Auto-Negotiation bit in
register 0h, or by resetting or power cycling the device. Reset may
be either hardware reset or software reset (register 0h, bit 15).
PLAN
This errata will not be corrected in the future revision.
Fixes: 7ab59dc15e ("drivers/net/phy/micrel_phy: Add support for new PHYs")
Signed-off-by: Alexander Onnasch <alexander.onnasch@landisgyr.com>
Signed-off-by: Rajasingh Thavamani <T.Rajasingh@landisgyr.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netif_msg_init() API takes on input the amount of bits to be set. The
description of debug parameter in the enc28j60 module is misleading in this
sense and passing 0xffff does not give an expected behaviour.
Fix the description of debug module parameter to show what exactly is expected.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1 << 31 is Undefined Behaviour according to the C standard.
Use U type modifier to avoid theoretical overflow.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There have been reports of oversize UDP packets being sent to the
driver to be transmitted, causing error conditions. The issue is
likely caused by the dst of the SKB switching between 'lo' with
64K MTU and the hardware device with a smaller MTU. Patches are
being proposed by Mahesh Bandewar <maheshb@google.com> to fix the
issue.
In the meantime, add a quick length check in the driver to prevent
the error. The driver uses the TX packet size as index to look up an
array to setup the TX BD. The array is large enough to support all MTU
sizes supported by the driver. The oversize TX packet causes the
driver to index beyond the array and put garbage values into the
TX BD. Add a simple check to prevent this.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Logical block size is the lowest possible block size that the storage
device can address. Max segment size is often related with controller's
DMA capability. And it is reasonable to align max segment size with
logical block size.
SDHCI sets un-aligned max segment size, and causes ADMA error, so
fix it by aligning max segment size with logical block size.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Faiz Abbas <faiz_abbas@ti.com>
Cc: linux-block@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Free up the allocated memory in the case of error return
The value of mmc_host->cqe_enabled stays 'false'. Thus, cqhci_disable
(mmc_cqe_ops->cqe_disable) won't be called to free the memory. Also,
cqhci_disable() seems to be designed to disable and free all resources, not
suitable to handle this corner case.
Fixes: a4080225f5 ("mmc: cqhci: support for command queue enabled host")
Signed-off-by: Alamy Liu <alamy.liu@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
There is not enough space being allocated when DCMD is disabled.
CQE_DCMD is not necessary to be enabled when CQE is enabled.
(Software could halt CQE to send command)
In the case that CQE_DCMD is not enabled, it still needs to allocate
space for data transfer. For instance:
CQE_DCMD is enabled: 31 slots space (one slot used by DCMD)
CQE_DCMD is disabled: 32 slots space
Fixes: a4080225f5 ("mmc: cqhci: support for command queue enabled host")
Signed-off-by: Alamy Liu <alamy.liu@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
scsi_device_quiesce() and scsi_device_resume() are called during
system-wide suspend and resume. scsi_device_quiesce() only succeeds for
SCSI devices that are in one of the RUNNING, OFFLINE or TRANSPORT_OFFLINE
states (see also scsi_set_device_state()). This patch avoids that the
following warning is triggered when resuming a system for which quiescing a
SCSI device failed:
WARNING: CPU: 2 PID: 11303 at drivers/scsi/scsi_lib.c:2600 scsi_device_resume+0x4f/0x58
CPU: 2 PID: 11303 Comm: kworker/u8:70 Not tainted 5.0.0-rc1+ #50
Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 08/04/2016
Workqueue: events_unbound async_run_entry_fn
Call Trace:
scsi_dev_type_resume+0x2e/0x60
async_run_entry_fn+0x32/0xd8
process_one_work+0x1f4/0x420
worker_thread+0x28/0x3c0
kthread+0x118/0x130
ret_from_fork+0x22/0x40
Cc: Przemek Socha <soprwa@gmail.com>
Reported-by: Przemek Socha <soprwa@gmail.com>
Fixes: 3a0a529971 ("block, scsi: Make SCSI quiesce and resume work reliably") # v4.15
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In case of CQHCI, mrq->cmd may be NULL for data requests (non DCMD).
In such case mmc_should_fail_request is directly dereferencing
mrq->cmd while cmd is NULL.
Fix this by checking for mrq->cmd pointer.
Fixes: 72a5af554d ("mmc: core: Add support for handling CQE requests")
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
My console locks up as soon as Linux writes to [88800000,88f00000[
AFAIU, that memory area is reserved for trustzone.
Extend TZ reserved memory range, to prevent Linux from stepping on
trustzone's toes.
Cc: stable@vger.kernel.org # 4.20+
Reviewed-by: Sibi Sankar <sibis@codeaurora.org>
Fixes: c783394956 ("arm64: dts: qcom: msm8998: Add smem related nodes")
Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
When sending multicast messages via blocking socket,
if sending link is congested (tsk->cong_link_cnt is set to 1),
the sending thread will be put into sleeping state. However,
tipc_sk_filter_rcv() is called under socket spin lock but
tipc_wait_for_cond() is not. So, there is no guarantee that
the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in
CPU-1 will be perceived by CPU-0. If that is the case, the sending
thread in CPU-0 after being waken up, will continue to see
tsk->cong_link_cnt as 1 and put the sending thread into sleeping
state again. The sending thread will sleep forever.
CPU-0 | CPU-1
tipc_wait_for_cond() |
{ |
// condition_ = !tsk->cong_link_cnt |
while ((rc_ = !(condition_))) { |
... |
release_sock(sk_); |
wait_woken(); |
| if (!sock_owned_by_user(sk))
| tipc_sk_filter_rcv()
| {
| ...
| tipc_sk_proto_rcv()
| {
| ...
| tsk->cong_link_cnt--;
| ...
| sk->sk_write_space(sk);
| ...
| }
| ...
| }
sched_annotate_sleep(); |
lock_sock(sk_); |
remove_wait_queue(); |
} |
} |
This commit fixes it by adding memory barrier to tipc_sk_proto_rcv()
and tipc_wait_for_cond().
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Incoming packets may have IP header checksum verified by the host.
They may not have IP header checksum computed after coalescing.
This patch re-compute the checksum when necessary, otherwise the
packets may be dropped, because Linux network stack always checks it.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern says:
====================
net: Fail route add with unsupported nexthop attribute
RTA_VIA was added for MPLS as a way of specifying a gateway from a
different address family. IPv4 and IPv6 do not currently support RTA_VIA
so using it leads to routes that are not what the user intended. Catch
and fail - returning a proper error message.
MPLS on the other hand does not support RTA_GATEWAY since it does not
make sense to have a nexthop from the MPLS address family. Similarly,
catch and fail - returning a proper error message.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
MPLS does not support nexthops with an MPLS address family.
Specifically, it does not handle RTA_GATEWAY attribute. Make it
clear by returning an error.
Fixes: 03c0566542 ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 currently does not support nexthops outside of the AF_INET6 family.
Specifically, it does not handle RTA_VIA attribute. If it is passed
in a route add request, the actual route added only uses the device
which is clearly not what the user intended:
$ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
$ ip ro ls
...
2001:db8:2::/64 dev eth0 metric 1024 pref medium
Catch this and fail the route add:
$ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
Error: IPv6 does not support RTA_VIA attribute.
Fixes: 03c0566542 ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 currently does not support nexthops outside of the AF_INET family.
Specifically, it does not handle RTA_VIA attribute. If it is passed
in a route add request, the actual route added only uses the device
which is clearly not what the user intended:
$ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0
$ ip ro ls
...
172.16.1.0/24 dev eth0
Catch this and fail the route add:
$ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0
Error: IPv4 does not support RTA_VIA attribute.
Fixes: 03c0566542 ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bpf/syscall.c, bpf_map_get_fd_by_id() use bpf_map_inc_not_zero()
to increase the refcount, both map->refcnt and map->usercnt. Then, if
bpf_map_new_fd() fails, should handle map->usercnt too.
Fixes: bd5f5f4ecb ("bpf: Add BPF_MAP_GET_FD_BY_ID")
Signed-off-by: Peng Sun <sironhide0null@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit 57384592c4 ("iommu/vt-d: Store bus information in RMRR PCI
device path") changed the type of the path data, however, the change in
path type was not reflected in size calculations. Update to use the
correct type and prevent a buffer overflow.
This bug manifests in systems with deep PCI hierarchies, and can lead to
an overflow of the static allocated buffer (dmar_pci_notify_info_buf),
or can lead to overflow of slab-allocated data.
BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0
Write of size 1 at addr ffffffff90445d80 by task swapper/0/1
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.14.87-rt49-02406-gd0a0e96 #1
Call Trace:
? dump_stack+0x46/0x59
? print_address_description+0x1df/0x290
? dmar_alloc_pci_notify_info+0x1d5/0x2e0
? kasan_report+0x256/0x340
? dmar_alloc_pci_notify_info+0x1d5/0x2e0
? e820__memblock_setup+0xb0/0xb0
? dmar_dev_scope_init+0x424/0x48f
? __down_write_common+0x1ec/0x230
? dmar_dev_scope_init+0x48f/0x48f
? dmar_free_unused_resources+0x109/0x109
? cpumask_next+0x16/0x20
? __kmem_cache_create+0x392/0x430
? kmem_cache_create+0x135/0x2f0
? e820__memblock_setup+0xb0/0xb0
? intel_iommu_init+0x170/0x1848
? _raw_spin_unlock_irqrestore+0x32/0x60
? migrate_enable+0x27a/0x5b0
? sched_setattr+0x20/0x20
? migrate_disable+0x1fc/0x380
? task_rq_lock+0x170/0x170
? try_to_run_init_process+0x40/0x40
? locks_remove_file+0x85/0x2f0
? dev_prepare_static_identity_mapping+0x78/0x78
? rt_spin_unlock+0x39/0x50
? lockref_put_or_lock+0x2a/0x40
? dput+0x128/0x2f0
? __rcu_read_unlock+0x66/0x80
? __fput+0x250/0x300
? __rcu_read_lock+0x1b/0x30
? mntput_no_expire+0x38/0x290
? e820__memblock_setup+0xb0/0xb0
? pci_iommu_init+0x25/0x63
? pci_iommu_init+0x25/0x63
? do_one_initcall+0x7e/0x1c0
? initcall_blacklisted+0x120/0x120
? kernel_init_freeable+0x27b/0x307
? rest_init+0xd0/0xd0
? kernel_init+0xf/0x120
? rest_init+0xd0/0xd0
? ret_from_fork+0x1f/0x40
The buggy address belongs to the variable:
dmar_pci_notify_info_buf+0x40/0x60
Fixes: 57384592c4 ("iommu/vt-d: Store bus information in RMRR PCI device path")
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In R-Car Gen2 or later, the maximum number of transfer blocks are
changed from 0xFFFF to 0xFFFFFFFF. Therefore, Block Count Register
should use iowrite32().
If another system (U-boot, Hypervisor OS, etc) uses bit[31:16], this
value will not be cleared. So, SD/MMC card initialization fails.
So, check for the bigger register and use apropriate write. Also, mark
the register as extended on Gen2.
Signed-off-by: Takeshi Saito <takeshi.saito.xv@renesas.com>
[wsa: use max_blk_count in if(), add Gen2, update commit message]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: stable@kernel.org
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
[Ulf: Fixed build error]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The change to use dma_set_mask() incorrectly made a second call with the 32
bit DMA mask value when the call with the 64 bit DMA mask value succeeded.
Fixes: 453cd3700c ("scsi: hptiop: use dma_set_mask")
Cc: <stable@vger.kernel.org>
Suggested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA
mask value succeeded.
[mkp: fixed commit message]
Fixes: e4db40e7a1 ("scsi: hisi_sas: use dma_set_mask_and_coherent")
Cc: <stable@vger.kernel.org>
Suggested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA mask
value succeeded.
Fixes: c22b332d81 ("scsi: csiostor: switch to generic DMA API")
Cc: <stable@vger.kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA mask
value succeeded.
[mkp: fixed commit message]
Fixes: a69b080025 ("scsi: bfa: use dma_set_mask_and_coherent")
Cc: <stable@vger.kernel.org>
Suggested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA mask
value succeeded.
[mkp: fixed subject]
Fixes: 3a21986f1a ("scsi: aic94xx: fully convert to the generic DMA API")
Cc: <stable@vger.kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA mask
value succeeded.
Fixes: b1fa122930 ("scsi: 3w-sas: fully convert to the generic DMA API")
Cc: <stable@vger.kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA mask
value succeeded.
Fixes: b000bced57 ("scsi: 3w-9xxx: fully convert to the generic DMA API")
Cc: <stable@vger.kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA mask
value succeeded. This resulted in NVMe/FC connections failing due to
corrupted data buffers, and various other SCSI/FCP I/O errors.
Fixes: f30e1bfd61 ("scsi: lpfc: use dma_set_mask_and_coherent")
Cc: <stable@vger.kernel.org>
Suggested-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace set_current_state with __set_current_state since no memory
barrier is needed at this point.
Signed-off-by: Timur Celik <mail@timurcelik.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a cell with a volume location server list is added manually by
echoing the details into /proc/net/afs/cells, a record is added but the
flag saying it has been looked up isn't set.
This causes the VL server rotation code to wait forever, with the top of
/proc/pid/stack looking like:
afs_select_vlserver+0x3a6/0x6f3
afs_vl_lookup_vldb+0x4b/0x92
afs_create_volume+0x25/0x1b9
...
with the thread stuck in afs_start_vl_iteration() waiting for
AFS_CELL_FL_NO_LOOKUP_YET to be cleared.
Fix this by clearing AFS_CELL_FL_NO_LOOKUP_YET when setting up a record
if that record's details were supplied manually.
Fixes: 0a5143f2f8 ("afs: Implement VL server rotation")
Reported-by: Dave Botsch <dwb7@cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When calling __put_user(foo(), ptr), the __put_user() macro would call
foo() in between __uaccess_begin() and __uaccess_end(). If that code
were buggy, then those bugs would be run without SMAP protection.
Fortunately, there seem to be few instances of the problem in the
kernel. Nevertheless, __put_user() should be fixed to avoid doing this.
Therefore, evaluate __put_user()'s argument before setting AC.
This issue was noticed when an objtool hack by Peter Zijlstra complained
about genregs_get() and I compared the assembly output to the C source.
[ bp: Massage commit message and fixed up whitespace. ]
Fixes: 11f1a4b975 ("x86: reorganize SMAP handling in user space accesses")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190225125231.845656645@infradead.org
Commit 9060cb719e ("net: crypto set sk to NULL when af_alg_release.")
fixed a use-after-free in sockfs_setattr() when an AF_ALG socket is
closed concurrently with fchownat(). However, it ignored that many
other proto_ops::release() methods don't set sock->sk to NULL and
therefore allow the same use-after-free:
- base_sock_release
- bnep_sock_release
- cmtp_sock_release
- data_sock_release
- dn_release
- hci_sock_release
- hidp_sock_release
- iucv_sock_release
- l2cap_sock_release
- llcp_sock_release
- llc_ui_release
- rawsock_release
- rfcomm_sock_release
- sco_sock_release
- svc_release
- vcc_release
- x25_release
Rather than fixing all these and relying on every socket type to get
this right forever, just make __sock_release() set sock->sk to NULL
itself after calling proto_ops::release().
Reproducer that produces the KASAN splat when any of these socket types
are configured into the kernel:
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>
pthread_t t;
volatile int fd;
void *close_thread(void *arg)
{
for (;;) {
usleep(rand() % 100);
close(fd);
}
}
int main()
{
pthread_create(&t, NULL, close_thread, NULL);
for (;;) {
fd = socket(rand() % 50, rand() % 11, 0);
fchownat(fd, "", 1000, 1000, 0x1000);
close(fd);
}
}
Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Metadata pointer is only initialized for action TCA_TUNNEL_KEY_ACT_SET, but
it is unconditionally dereferenced in tunnel_key_init() error handler.
Verify that metadata pointer is not NULL before dereferencing it in
tunnel_key_init error handling code.
Fixes: ee28bb56ac ("net/sched: fix memory leak in act_tunnel_key_init()")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to of_parse_phandle returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./net/dsa/port.c:294:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 284, but without a corresponding object release within this function.
./net/dsa/dsa2.c:627:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 618, but without a corresponding object release within this function.
./net/dsa/dsa2.c:630:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 618, but without a corresponding object release within this function.
./net/dsa/dsa2.c:636:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 618, but without a corresponding object release within this function.
./net/dsa/dsa2.c:639:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 618, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 9da3f2b740.
It was well-intentioned, but wrong. Overriding the exception tables for
instructions for random reasons is just wrong, and that is what the new
code did.
It caused problems for tracing, and it caused problems for strncpy_from_user(),
because the new checks made perfectly valid use cases break, rather than
catch things that did bad things.
Unchecked user space accesses are a problem, but that's not a reason to
add invalid checks that then people have to work around with silly flags
(in this case, that 'kernel_uaccess_faults_ok' flag, which is just an
odd way to say "this commit was wrong" and was sprinked into random
places to hide the wrongness).
The real fix to unchecked user space accesses is to get rid of the
special "let's not check __get_user() and __put_user() at all" logic.
Make __{get|put}_user() be just aliases to the regular {get|put}_user()
functions, and make it impossible to access user space without having
the proper checks in places.
The raison d'être of the special double-underscore versions used to be
that the range check was expensive, and if you did multiple user
accesses, you'd do the range check up front (like the signal frame
handling code, for example). But SMAP (on x86) and PAN (on ARM) have
made that optimization pointless, because the _real_ expense is the "set
CPU flag to allow user space access".
Do let's not break the valid cases to catch invalid cases that shouldn't
even exist.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tobin C. Harding <tobin@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have encountered an interrupt storm during the eMMC chip probing (and
the chip finally didn't get detected). It turned out that U-Boot left
the DMAC interrupts enabled while the Linux driver didn't use those.
The SDHI driver's interrupt handler somehow assumes that, even if an
SDIO interrupt didn't happen, it should return IRQ_HANDLED. I think
that if none of the enabled interrupts happened and got handled, we
should return IRQ_NONE -- that way the kernel IRQ code recoginizes
a spurious interrupt and masks it off pretty quickly...
Fixes: 7729c7a232 ("mmc: tmio: Provide separate interrupt handlers")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
When using the mmc_spi driver with a card-detect pin, I noticed that the
card was not detected immediately after probe, but only after it was
unplugged and plugged back in (and the CD IRQ fired).
The call tree looks something like this:
mmc_spi_probe
mmc_add_host
mmc_start_host
_mmc_detect_change
mmc_schedule_delayed_work(&host->detect, 0)
mmc_rescan
host->bus_ops->detect(host)
mmc_detect
_mmc_detect_card_removed
host->ops->get_cd(host)
mmc_gpio_get_cd -> -ENOSYS (ctx->cd_gpio not set)
mmc_gpiod_request_cd
ctx->cd_gpio = desc
To fix this issue, call mmc_detect_change after the card-detect GPIO/IRQ
is registered.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
commit 137cd7100e
"ARM: dts: Enable Gemini flash access" contained a bug
by disabling the display controller, while the whole
idea with the patch was to enable flash access AND
the display controller, simultaneously. Fix it up.
Fixes: 137cd7100e ("ARM: dts: Enable Gemini flash access")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This patch moves setting of the current state into the loop. Otherwise
the task may end up in a busy wait loop if none of the break conditions
are met.
Signed-off-by: Timur Celik <mail@timurcelik.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the file names of the FW files which this driver handles into
the module description.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
when act_skbedit was converted to use RCU in the data plane, we added an
error path, but we forgot to drop the action refcount in case of failure
during a 'replace' operation:
# tc actions add action skbedit ptype otherhost pass index 100
# tc action show action skbedit
total acts 1
action order 0: skbedit ptype otherhost pass
index 100 ref 1 bind 0
# tc actions replace action skbedit ptype otherhost drop index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action show action skbedit
total acts 1
action order 0: skbedit ptype otherhost pass
index 100 ref 2 bind 0
Ensure we call tcf_idr_release(), in case 'params_new' allocation failed,
also when the action is being replaced.
Fixes: c749cdda90 ("net/sched: act_skbedit: don't use spinlock in the data path")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 4e8ddd7f17 ("net: sched: don't release reference on action
overwrite"), the error path of all actions was converted to drop refcount
also when the action was being overwritten. But we forgot act_ipt_init(),
in case allocation of 'tname' was not successful:
# tc action add action xt -j LOG --log-prefix hello index 100
tablename: mangle hook: NF_IP_POST_ROUTING
target: LOG level warning prefix "hello" index 100
# tc action show action xt
total acts 1
action order 0: tablename: mangle hook: NF_IP_POST_ROUTING
target LOG level warning prefix "hello"
index 100 ref 1 bind 0
# tc action replace action xt -j LOG --log-prefix world index 100
tablename: mangle hook: NF_IP_POST_ROUTING
target: LOG level warning prefix "world" index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action show action xt
total acts 1
action order 0: tablename: mangle hook: NF_IP_POST_ROUTING
target LOG level warning prefix "hello"
index 100 ref 2 bind 0
Ensure we call tcf_idr_release(), in case 'tname' allocation failed, also
when the action is being replaced.
Fixes: 4e8ddd7f17 ("net: sched: don't release reference on action overwrite")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull KVM fixes from Paolo Bonzini:
"Bug fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: record maximum physical address width in kvm_mmu_extended_role
kvm: x86: Return LA57 feature based on hardware capability
x86/kvm/mmu: fix switch between root and guest MMUs
s390: vsie: Use effective CRYCBD.31 to check CRYCBD validity
Pull networking fixes from David Miller:
"Hopefully the last pull request for this release. Fingers crossed:
1) Only refcount ESP stats on full sockets, from Martin Willi.
2) Missing barriers in AF_UNIX, from Al Viro.
3) RCU protection fixes in ipv6 route code, from Paolo Abeni.
4) Avoid false positives in untrusted GSO validation, from Willem de
Bruijn.
5) Forwarded mesh packets in mac80211 need more tailroom allocated,
from Felix Fietkau.
6) Use operstate consistently for linkup in team driver, from George
Wilkie.
7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF
and PF code paths.
8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni.
9) nfp eBPF code gen fixes from Jiong Wang.
10) bnxt_en firmware timeout fix from Michael Chan.
11) Use after free in udp/udpv6 error handlers, from Paolo Abeni.
12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
net: phy: realtek: Dummy IRQ calls for RTL8366RB
tcp: repaired skbs must init their tso_segs
net/x25: fix a race in x25_bind()
net: dsa: Remove documentation for port_fdb_prepare
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
selftests: fib_tests: sleep after changing carrier. again.
net: set static variable an initial value in atl2_probe()
net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G
bpf, doc: add bpf list as secondary entry to maintainers file
udp: fix possible user after free in error handler
udpv6: fix possible user after free in error handler
fou6: fix proto error handler argument type
udpv6: add the required annotation to mib type
mdio_bus: Fix use-after-free on device_register fails
net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
bnxt_en: Wait longer for the firmware message response to complete.
bnxt_en: Fix typo in firmware message timeout logic.
nfp: bpf: fix ALU32 high bits clearance bug
nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
Documentation: networking: switchdev: Update port parent ID section
...
This fixes a regression introduced by
commit 0d2e778e38
"net: phy: replace PHY_HAS_INTERRUPT with a check for
config_intr and ack_interrupt".
This assumes that a PHY cannot trigger interrupt unless
it has .config_intr() or .ack_interrupt() implemented.
A later patch makes the code assume both need to be
implemented for interrupts to be present.
But this PHY (which is inside a DSA) will happily
fire interrupts without either callback.
Implement dummy callbacks for .config_intr() and
.ack_interrupt() in the phy header to fix this.
Tested on the RTL8366RB on D-Link DIR-685.
Fixes: 0d2e778e38 ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt")
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This callback was removed some time ago, also remove the documentation.
Fixes: 1b6dd556c3 ("net: dsa: Remove prepare phase for FDB")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 5a2de63fd1 ("bridge: do not add port to router list
when receives query with source 0.0.0.0") and commit 0fe5119e26 ("net:
bridge: remove ipv6 zero address check in mcast queries")
The reason is RFC 4541 is not a standard but suggestive. Currently we
will elect 0.0.0.0 as Querier if there is no ip address configured on
bridge. If we do not add the port which recives query with source
0.0.0.0 to router list, the IGMP reports will not be about to forward
to Querier, IGMP data will also not be able to forward to dest.
As Nikolay suggested, revert this change first and add a boolopt api
to disable none-zero election in future if needed.
Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
Reported-by: Sebastian Gottschall <s.gottschall@newmedia-net.de>
Fixes: 5a2de63fd1 ("bridge: do not add port to router list when receives query with source 0.0.0.0")
Fixes: 0fe5119e26 ("net: bridge: remove ipv6 zero address check in mcast queries")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just like commit e2ba732a16 ("selftests: fib_tests: sleep after
changing carrier"), wait one second to allow linkwatch to propagate the
carrier change to the stack.
There are two sets of carrier tests. The first slept after the carrier
was set to off, and when the second set ran, it was likely that the
linkwatch would be able to run again without much delay, reducing the
likelihood of a race. However, if you run 'fib_tests.sh -t carrier' on a
loop, you will quickly notice the failures.
Sleeping on the second set of tests make the failures go away.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cards_found is a static variable, but when it enters atl2_probe(),
cards_found is set to zero, the value is not consistent with last probe,
so next behavior is not our expect.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some Marvell Alaska PHYs support 2.5G, 5G and 10G BaseT links. Their
default behaviour is to advertise all of these modes, but at the moment,
only 10GBaseT is supported. To prevent link partners from establishing
link at that speed, clear these modes upon configuring aneg parameters.
Fixes: 20b2af32ff ("net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull powerpc fix from Michael Ellerman:
"One fix for an oops when using SRIOV, introduced by the recent changes
to support compound IOMMU groups.
Thanks to Alexey Kardashevskiy"
* tag 'powerpc-5.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv/sriov: Register IOMMU groups for VFs
Pull SCSI fixes from James Bottomley:
"Four small fixes: three in drivers and one in the core.
The core fix is also minor in scope since the bug it fixes is only
known to affect systems using SCSI reservations. Of the driver bugs,
the libsas one is the most major because it can lead to multiple disks
on the same expander not being exposed"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: reset host byte in DID_NEXUS_FAILURE case
scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation
scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
Daniel Borkmann says:
====================
pull-request: bpf 2019-02-23
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a bug in BPF's LPM deletion logic to match correct prefix
length, from Alban.
2) Fix AF_XDP teardown by not destroying umem prematurely as it
is still needed till all outstanding skbs are freed, from Björn.
3) Fix unkillable BPF_PROG_TEST_RUN under preempt kernel by checking
signal_pending() outside need_resched() condition which is never
triggered there, from Stanislav.
4) Fix two nfp JIT bugs, one in code emission for K-based xor, and
another one to explicitly clear upper bits in alu32, from Jiong.
5) Add bpf list address to maintainers file, from Daniel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull keys fixes from James Morris:
"Two fixes from Eric Biggers"
* 'fixes-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: always initialize keyring_index_key::desc_len
KEYS: user: Align the payload buffer
Pull power management fixes from Rafael Wysocki:
"These fix a regression in the PM-runtime framework introduced by the
recent switch-over of it to using hrtimers and a use-after-free
introduced by one of the recent changes in the scmi-cpufreq driver.
Specifics:
- Use hrtimer_try_to_cancel() instead of hrtimer_cancel() in the
PM-runtime framework to avoid a possible timer-related deadlock
introduced recently (Vincent Guittot).
- Reorder the scmi-cpufreq driver code to avoid accessing memory that
has just been freed (Yangtao Li)"
* tag 'pm-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM-runtime: Fix deadlock when canceling hrtimer
cpufreq: scmi: Fix use-after-free in scmi_cpufreq_exit()
Pull ARM SoC fixes from Arnd Bergmann:
"Only a handful of device tree fixes, all simple enough:
NVIDIA Tegra:
- Fix a regression for booting on chromebooks
TI OMAP:
- Two fixes PHY mode on am335x reference boards
Marvell mvebu:
- A regression fix for Armada XP NAND flash controllers
- An incorrect reset signal on the clearfog board"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: tegra: Restore DT ABI on Tegra124 Chromebooks
ARM: dts: am335x-evm: Fix PHY mode for ethernet
ARM: dts: am335x-evmsk: Fix PHY mode for ethernet
arm64: dts: clearfog-gt-8k: fix SGMII PHY reset signal
ARM: dts: armada-xp: fix Armada XP boards NAND description
Pull ARC fixes from Vineet Gupta:
"Fixes for ARC for 5.0, bunch of those are stable fodder anyways so
sooner the better.
- Fix memcpy to prevent prefetchw beyond end of buffer [Eugeniy]
- Enable unaligned access early to prevent exceptions given newer gcc
code gen [Eugeniy]
- Tighten up uboot arg checking to prevent false negatives and also
allow both jtag and bootloading to coexist w/o config option as
needed by kernelCi folks [Eugeniy]
- Set slab alignment to 8 for ARC to avoid the atomic64_t unalign
[Alexey]
- Disable regfile auto save on interrupts on HSDK platform due to a
silicon issue [Vineet]
- Avoid HS38x boot printing crash by not reading HS48x only reg
[Vineet]"
* tag 'arc-5.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARCv2: don't assume core 0x54 has dual issue
ARC: define ARCH_SLAB_MINALIGN = 8
ARC: enable uboot support unconditionally
ARC: U-boot: check arguments paranoidly
ARCv2: support manual regfile save on interrupts
ARC: uacces: remove lp_start, lp_end from clobber list
ARC: fix actionpoints configuration detection
ARCv2: lib: memcpy: fix doing prefetchw outside of buffer
ARCv2: Enable unaligned access in early ASM code
We recently created a bpf@vger.kernel.org list (https://lore.kernel.org/bpf/)
for BPF related discussions, originally in context of BPF track at LSF/MM
for topic discussions. It's *optional* but *desirable* to keep it in Cc for
BPF related kernel/loader/llvm/tooling threads, meaning also infrastructure
like llvm that sits on top of kernel but is crucial to BPF. In any case,
netdev with it's bpf delegate is *as-is* today primary list for patches, so
nothing changes in the workflow. Main purpose is to have some more awareness
for the bpf@vger.kernel.org list that folks can Cc for BPF specific topics.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull parisc fixes from Helge Deller:
"Fix ptrace syscall number modification which has been broken since
kernel v4.5 and provide alternative email addresses for the remaining
users of the retired parisc-linux.org email domain"
* 'parisc-5.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
CREDITS/MAINTAINERS: Retire parisc-linux.org email domain
parisc: Fix ptrace syscall number modification
Pull more Kbuild fixes from Masahiro Yamada:
- fix scripts/kallsyms.c to correctly check too long symbol names
- fix sh build error for the combination of CONFIG_OF_EARLY_FLATTREE=y
and CONFIG_USE_BUILTIN_DTB=n
* tag 'kbuild-fixes-v5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
sh: fix build error for invisible CONFIG_BUILTIN_DTB_SOURCE
kallsyms: Handle too long symbols in kallsyms.c
Paolo Abeni says:
====================
udp: a few fixes
This series includes some UDP-related fixlet. All this stuff has been
pointed out by the sparse tool. The first two patches are just annotation
related, while the last 2 cover some very unlikely races.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the previous commit, this addresses the same issue for
ipv4: use a single fetch operation and use the correct rcu
annotation.
Fixes: e7cc082455 ("udp: Support for error handlers of tunnels with arbitrary destination port")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before derefencing the encap pointer, commit e7cc082455 ("udp: Support
for error handlers of tunnels with arbitrary destination port") checks
for a NULL value, but the two fetch operation can race with removal.
Fix the above using a single access.
Also fix a couple of type annotations, to make sparse happy.
Fixes: e7cc082455 ("udp: Support for error handlers of tunnels with arbitrary destination port")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Last argument of gue6_err_proto_handler() has a wrong type annotation,
fix it and make sparse happy again.
Fixes: b8a51b38e4 ("fou, fou6: ICMP error handlers for FoU and GUE")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 029a374348 ("udp6: cleanup stats accounting in recvmsg()")
I forgot to add the percpu annotation for the mib pointer. Add it, and
make sparse happy.
Fixes: 029a374348 ("udp6: cleanup stats accounting in recvmsg()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to
keep legacy software happy. This is similar to what was done for
ipv4 in commit 709772e6e0 ("net: Fix routing tables with
id > 255 for legacy software").
Signed-off-by: Kalash Nainwal <kalash@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: firmware message delay fixes.
We were seeing some intermittent firmware message timeouts in our lab and
these 2 small patches fix them. Please apply to stable as well. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The code waits up to 20 usec for the firmware response to complete
once we've seen the valid response header in the buffer. It turns
out that in some scenarios, this wait time is not long enough.
Extend it to 150 usec and use usleep_range() instead of udelay().
Fixes: 9751e8e714 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic that polls for the firmware message response uses a shorter
sleep interval for the first few passes. But there was a typo so it
was using the wrong counter (larger counter) for these short sleep
passes. The result is a slightly shorter timeout period for these
firmware messages than intended. Fix it by using the proper counter.
Fixes: 9751e8e714 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiong Wang says:
====================
Code-gen for BPF_ALU | BPF_XOR | BPF_K is wrong when imm is -1,
also high 32-bit of 64-bit register should always be cleared.
This set fixed both bugs.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
NFP BPF JIT compiler is doing a couple of small optimizations when jitting
ALU imm instructions, some of these optimizations could save code-gen, for
example:
A & -1 = A
A | 0 = A
A ^ 0 = A
However, for ALU32, high 32-bit of the 64-bit register should still be
cleared according to ISA semantics.
Fixes: cd7df56ed3 ("nfp: add BPF to NFP code translator")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Johannes Berg says:
====================
Three more fixes:
* mac80211 mesh code wasn't allocating SKB tailroom properly
in some cases
* tx_sk_pacing_shift should be 7 for better performance
* mac80211_hwsim wasn't propagating genlmsg_reply() errors
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the section about switchdev drivers having to implement a
switchdev_port_attr_get() function to return
SWITCHDEV_ATTR_ID_PORT_PARENT_ID since that is no longer valid after
commit bccb30254a ("net: Get rid of
SWITCHDEV_ATTR_ID_PORT_PARENT_ID").
Fixes: bccb30254a ("net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID")
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__sys_setsockopt() already checks for `optlen < 0`. Add an equivalent check
to the compat path for robustness. This has to be `> INT_MAX` instead of
`< 0` because the signedness of `optlen` is different here.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a netdevice is unregistered, we flush the relevant exception
via rt6_sync_down_dev() -> fib6_ifdown() -> fib6_del() -> fib6_del_route().
Finally, we end-up calling rt6_remove_exception(), where we release
the relevant dst, while we keep the references to the related fib6_info and
dev. Such references should be released later when the dst will be
destroyed.
There are a number of caches that can keep the exception around for an
unlimited amount of time - namely dst_cache, possibly even socket cache.
As a result device registration may hang, as demonstrated by this script:
ip netns add cl
ip netns add rt
ip netns add srv
ip netns exec rt sysctl -w net.ipv6.conf.all.forwarding=1
ip link add name cl_veth type veth peer name cl_rt_veth
ip link set dev cl_veth netns cl
ip -n cl link set dev cl_veth up
ip -n cl addr add dev cl_veth 2001::2/64
ip -n cl route add default via 2001::1
ip -n cl link add tunv6 type ip6tnl mode ip6ip6 local 2001::2 remote 2002::1 hoplimit 64 dev cl_veth
ip -n cl link set tunv6 up
ip -n cl addr add 2013::2/64 dev tunv6
ip link set dev cl_rt_veth netns rt
ip -n rt link set dev cl_rt_veth up
ip -n rt addr add dev cl_rt_veth 2001::1/64
ip link add name rt_srv_veth type veth peer name srv_veth
ip link set dev srv_veth netns srv
ip -n srv link set dev srv_veth up
ip -n srv addr add dev srv_veth 2002::1/64
ip -n srv route add default via 2002::2
ip -n srv link add tunv6 type ip6tnl mode ip6ip6 local 2002::1 remote 2001::2 hoplimit 64 dev srv_veth
ip -n srv link set tunv6 up
ip -n srv addr add 2013::1/64 dev tunv6
ip link set dev rt_srv_veth netns rt
ip -n rt link set dev rt_srv_veth up
ip -n rt addr add dev rt_srv_veth 2002::2/64
ip netns exec srv netserver & sleep 0.1
ip netns exec cl ping6 -c 4 2013::1
ip netns exec cl netperf -H 2013::1 -t TCP_STREAM -l 3 & sleep 1
ip -n rt link set dev rt_srv_veth mtu 1400
wait %2
ip -n cl link del cl_veth
This commit addresses the issue purging all the references held by the
exception at time, as we currently do for e.g. ipv6 pcpu dst entries.
v1 -> v2:
- re-order the code to avoid accessing dst and net after dst_dev_put()
Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Lomovtsev says:
====================
nic: thunderx: fix communication races between VF & PF
The ThunderX CN88XX NIC Virtual Function driver uses mailbox interface
to communicate to physical function driver. Each of VF has it's own pair
of mailbox registers to read from and write to. The mailbox registers
has no protection from possible races, so it has to be implemented
at software side.
After long term testing by loop of 'ip link set <ifname> up/down'
command it was found that there are two possible scenarios when
race condition appears:
1. VF receives link change message from PF and VF send RX mode
configuration message to PF in the same time from separate thread.
2. PF receives RX mode configuration from VF and in the same time,
in separate thread PF detects link status change and sends appropriate
message to particular VF.
Both cases leads to mailbox data to be rewritten, NIC VF messaging control
data to be updated incorrectly and communication sequence gets broken.
This patch series is to address race condition with VF & PF communication.
Changes:
v1 -> v2
- 0000: correct typo in cover letter subject: 'betwen' -> 'between';
- move link state polling request task from pf to vf
instead of cheking status of mailbox irq;
v2 -> v3
- 0003: change return type of nicvf_send_cfg_done() function
from int to void;
- 0007: update subject and remove unused variable 'netdev'
from nicvf_link_status_check_task() function;
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since link change polling routine was moved to nicvf side,
we don't need anymore polling function at nicpf side along
with link status info for all enabled Vfs as at VF side
this info is already tracked.
This commit is to remove unnecessary code & fields from
nicpf structure.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the link change polling task to VF side in order to
prevent races between VF and PF while sending link change
message(s). This commit is to implement link change request
to be initiated by VF.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases it could happen that nicvf_send_msg_to_pf() could be called
concurrently for the same NIC VF, and thus re-writing mailbox contents and
breaking messaging sequence with PF by re-writing NICVF data.
This commit is to implement mutex for NICVF to protect mailbox registers
and NICVF messaging control data from concurrent access.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To communicate to PF each of ThunderX NIC VF uses mailbox which is
pair of 64 bit registers available to both VFn and PF.
This commit is to change the xcast message structure in order to
fit it into 64 bit.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rx_set_mode invokes number of messages to be send to PF for receive
mode configuration. In case if there any issues we need to stop sending
messages and release allocated memory.
This commit is to implement check of nicvf_msg_send_to_pf() result.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the end of NIC VF initialization VF sends CFG_DONE message to PF without
using nicvf_msg_send_to_pf routine. This potentially could re-write data in
mailbox. This commit is to implement common way of sending CFG_DONE message
by the same way with other configuration messages by using
nicvf_send_msg_to_pf() routine.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having one work queue for receive mode configuration ndo_set_rx_mode()
call for all VFs results in making each of them wait till the
set_rx_mode() call completes for another VF if any of close, set
receive mode and change flags calls being already invoked. Potentially
this could cause device state change before appropriate call of receive
mode configuration completes, so the call itself became meaningless,
corrupt data or break configuration sequence.
We don't need any delays in NIC VF configuration sequence so having delayed
work call with 0 delay has no sense.
This commit is to implement one work queue for each NIC VF for set_rx_mode
task and to let them work independently and replacing delayed_work
with work_struct.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct STREERING to STEERING at macro name for BGX steering register.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a port is added to a team, its initial state is derived
from netif_carrier_ok rather than netif_oper_up.
If it is carrier up but operationally down at the time of being
added, the port state.linkup will be set prematurely.
port state.linkup should be set consistently using
netif_oper_up rather than netif_carrier_ok.
Fixes: f1d22a1e05 ("team: account for oper state")
Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTL8153-BD is used in Dell DA300 type-C dongle.
Added RTL8153-BD support to activate MAC address pass through on DA300.
Apply correction on previously submitted patch in net.git tree.
Signed-off-by: David Chen <david.chen7@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running Docker with userns isolation e.g. --userns-remap="default"
and spawning up some containers with CAP_NET_ADMIN under this realm, I
noticed that link changes on ipvlan slave device inside that container
can affect all devices from this ipvlan group which are in other net
namespaces where the container should have no permission to make changes
to, such as the init netns, for example.
This effectively allows to undo ipvlan private mode and switch globally to
bridge mode where slaves can communicate directly without going through
hostns, or it allows to switch between global operation mode (l2/l3/l3s)
for everyone bound to the given ipvlan master device. libnetwork plugin
here is creating an ipvlan master and ipvlan slave in hostns and a slave
each that is moved into the container's netns upon creation event.
* In hostns:
# ip -d a
[...]
8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
inet 10.41.0.1/32 scope link cilium_host
valid_lft forever preferred_lft forever
[...]
* Spawn container & change ipvlan mode setting inside of it:
# docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c
# docker exec -ti client ip -d a
[...]
10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
valid_lft forever preferred_lft forever
# docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2
# docker exec -ti client ip -d a
[...]
10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
valid_lft forever preferred_lft forever
* In hostns (mode switched to l2):
# ip -d a
[...]
8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
inet 10.41.0.1/32 scope link cilium_host
valid_lft forever preferred_lft forever
[...]
Same l3 -> l2 switch would also happen by creating another slave inside
the container's network namespace when specifying the existing cilium0
link to derive the actual (bond0) master:
# docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2
# docker exec -ti client ip -d a
[...]
2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
valid_lft forever preferred_lft forever
* In hostns:
# ip -d a
[...]
8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
inet 10.41.0.1/32 scope link cilium_host
valid_lft forever preferred_lft forever
[...]
One way to mitigate it is to check CAP_NET_ADMIN permissions of
the ipvlan master device's ns, and only then allow to change
mode or flags for all devices bound to it. Above two cases are
then disallowed after the patch.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hb_timer might not start at all for a particular transport because its
start is conditional. In a result a node is not sending heartbeats.
Function sctp_transport_reset_hb_timer has two roles:
- initial start of hb_timer for a given transport,
- update expire date of hb_timer for a given transport.
The function is optimized to update timer's expire only if it is before
a new calculated one but this comparison is invalid for a timer which
has not yet started. Such a timer has expire == 0 and if a new expire
value is bigger than (MAX_JIFFIES / 2 + 2) then "time_before" macro will
fail and timer will not start resulting in no heartbeat packets send by
the node.
This was found when association was initialized within first 5 mins
after system boot due to jiffies init value which is near to MAX_JIFFIES.
Test kernel version: 4.9.154 (ARCH=arm)
hb_timer.expire = 0; //initialized, not started timer
new_expire = MAX_JIFFIES / 2 + 2; //or more
time_before(hb_timer.expire, new_expire) == false
Fixes: ba6f5e33bd ("sctp: avoid refreshing heartbeat timer too often")
Reported-by: Marcin Stojek <marcin.stojek@nokia.com>
Tested-by: Marcin Stojek <marcin.stojek@nokia.com>
Signed-off-by: Maciej Kwiecien <maciej.kwiecien@nokia.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull drm fixes from Dave Airlie:
"This contains a single i915 tiled display fix, and a set of
amdgpu/radeon fixes.
i915:
- tiled display fix
amdgpu/radeon:
- runtime PM fix
- bulk moves disable (fix is too large for 5.0)
- a set of display fixes that are all cc'ed stable so we didn't want
to leave them until -next"
* tag 'drm-fixes-2019-02-22' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: disable bulk moves for now
drm/amd/display: set clocks to 0 on suspend on dce80
drm/amd/display: fix optimize_bandwidth func pointer for dce80
drm/amd/display: Fix negative cursor pos programming
drm/i915/fbdev: Actually configure untiled displays
drm/amd/display: Raise dispclk value for dce11
drm/amd/display: Fix MST reboot/poweroff sequence
drm/amdgpu: Update sdma golden setting for vega20
drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
Pull rdma fixes from Jason Gunthorpe:
"Small set of three regression fixing patches, things are looking
pretty good here.
- Fix cxgb4 to work again with non-4k page sizes
- NULL pointer oops in SRP during sg_reset"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
iw_cxgb4: cq/qp mask depends on bar2 pages in a host page
cxgb4: Export sge_host_page_size to ulds
RDMA/srp: Rework SCSI device reset handling
Previously, commit 7dcd575520 ("x86/kvm/mmu: check if tdp/shadow
MMU reconfiguration is needed") offered some optimization to avoid
the unnecessary reconfiguration. Yet one scenario is broken - when
cpuid changes VM's maximum physical address width, reconfiguration
is needed to reset the reserved bits. Also, the TDP may need to
reset its shadow_root_level when this value is changed.
To fix this, a new field, maxphyaddr, is introduced in the extended
role structure to keep track of the configured guest physical address
width.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Previously, 'commit 372fddf709 ("x86/mm: Introduce the 'no5lvl' kernel
parameter")' cleared X86_FEATURE_LA57 in boot_cpu_data, if Linux chooses
to not run in 5-level paging mode. Yet boot_cpu_data is queried by
do_cpuid_ent() as the host capability later when creating vcpus, and Qemu
will not be able to detect this feature and create VMs with LA57 feature.
As discussed earlier, VMs can still benefit from extended linear address
width, e.g. to enhance features like ASLR. So we would like to fix this,
by return the true hardware capability when Qemu queries.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 14c07ad89f ("x86/kvm/mmu: introduce guest_mmu") brought one subtle
change: previously, when switching back from L2 to L1, we were resetting
MMU hooks (like mmu->get_cr3()) in kvm_init_mmu() called from
nested_vmx_load_cr3() and now we do that in nested_ept_uninit_mmu_context()
when we re-target vcpu->arch.mmu pointer.
The change itself looks logical: if nested_ept_init_mmu_context() changes
something than nested_ept_uninit_mmu_context() restores it back. There is,
however, one thing: the following call chain:
nested_vmx_load_cr3()
kvm_mmu_new_cr3()
__kvm_mmu_new_cr3()
fast_cr3_switch()
cached_root_available()
now happens with MMU hooks pointing to the new MMU (root MMU in our case)
while previously it was happening with the old one. cached_root_available()
tries to stash current root but it is incorrect to read current CR3 with
mmu->get_cr3(), we need to use old_mmu->get_cr3() which in case we're
switching from L2 to L1 is guest_mmu. (BTW, in shadow page tables case this
is a non-issue because we don't switch MMU).
While we could've tried to guess that we're switching between MMUs and call
the right ->get_cr3() from cached_root_available() this seems to be overly
complicated. Instead, just stash the corresponding CR3 when setting
root_hpa and make cached_root_available() use the stashed value.
Fixes: 14c07ad89f ("x86/kvm/mmu: introduce guest_mmu")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
syzbot hit the 'BUG_ON(index_key->desc_len == 0);' in __key_link_begin()
called from construct_alloc_key() during sys_request_key(), because the
length of the key description was never calculated.
The problem is that we rely on ->desc_len being initialized by
search_process_keyrings(), specifically by search_nested_keyrings().
But, if the process isn't subscribed to any keyrings that never happens.
Fix it by always initializing keyring_index_key::desc_len as soon as the
description is set, like we already do in some places.
The following program reproduces the BUG_ON() when it's run as root and
no session keyring has been installed. If it doesn't work, try removing
pam_keyinit.so from /etc/pam.d/login and rebooting.
#include <stdlib.h>
#include <unistd.h>
#include <keyutils.h>
int main(void)
{
int id = add_key("keyring", "syz", NULL, 0, KEY_SPEC_USER_KEYRING);
keyctl_setperm(id, KEY_OTH_WRITE);
setreuid(5000, 5000);
request_key("user", "desc", "", id);
}
Reported-by: syzbot+ec24e95ea483de0a24da@syzkaller.appspotmail.com
Fixes: b2a4df200d ("KEYS: Expand the capacity of a keyring")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.morris@microsoft.com>
Align the payload of "user" and "logon" keys so that users of the
keyrings service can access it as a struct that requires more than
2-byte alignment. fscrypt currently does this which results in the read
of fscrypt_key::size being misaligned as it needs 4-byte alignment.
Align to __alignof__(u64) rather than __alignof__(long) since in the
future it's conceivable that people would use structs beginning with
u64, which on some platforms would require more than 'long' alignment.
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Fixes: 2aa349f6e3 ("[PATCH] Keys: Export user-defined keyring operations")
Fixes: 88bd6ccdcd ("ext4 crypto: add encryption key management facilities")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Pull clk fixes from Stephen Boyd:
"A few more fixes for clk drivers causing regressions this release.
Two Allwinner index fixes for A31 and V3 and two Microchip AT91 fixes
for an incorrect clk parent linkage and a miscalculated number of
clks"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: at91: fix masterck name
clk: at91: fix at91sam9x5 peripheral clock number
clk: sunxi: A31: Fix wrong AHB gate number
clk: sunxi-ng: v3s: Fix TCON reset de-assert bit
trie_delete_elem() was deleting an entry even though it was not matching
if the prefixlen was correct. This patch adds a check on matchlen.
Reproducer:
$ sudo bpftool map create /sys/fs/bpf/mylpm type lpm_trie key 8 value 1 entries 128 name mylpm flags 1
$ sudo bpftool map update pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 aa bb cc dd value hex 01
$ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
key: 10 00 00 00 aa bb cc dd value: 01
Found 1 element
$ sudo bpftool map delete pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 ff ff ff ff
$ echo $?
0
$ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
Found 0 elements
A similar reproducer is added in the selftests.
Without the patch:
$ sudo ./tools/testing/selftests/bpf/test_lpm_map
test_lpm_map: test_lpm_map.c:485: test_lpm_delete: Assertion `bpf_map_delete_elem(map_fd, key) == -1 && errno == ENOENT' failed.
Aborted
With the patch: test_lpm_map runs without errors.
Fixes: e454cf5958 ("bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE")
Cc: Craig Gallek <kraig@google.com>
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
mvebu fixes for 5.0 (part 2)
Fix PHY reset signal on clearfog gt 8K (Armada 8040 based)
Fix NAND description on Armada XP boards which was broken since a few
release
* tag 'mvebu-fixes-5.0-2' of git://git.infradead.org/linux-mvebu:
arm64: dts: clearfog-gt-8k: fix SGMII PHY reset signal
ARM: dts: armada-xp: fix Armada XP boards NAND description
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Two am335x ethernet phy mode fixes for v5.0-rc cycle
Recent changes with commit cd28d1d6e5: ("net: phy: at803x: Disable phy
delay for RGMII mode") broke Ethernet on am335x-evmsk, and turns out some
device driver fixes are needed.
Even without the driver fixes, am335x needs to run in rgmii-id mode instead
rgmii-txid mode. Things have been working based on luck as the broken driver
has been configuring rgmii-id mode. Let's fix that as that way things work
as they're supposed to work from hardware wiring point of view.
* tag 'omap-for-v5.0/fixes-rc7-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am335x-evm: Fix PHY mode for ethernet
ARM: dts: am335x-evmsk: Fix PHY mode for ethernet
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Forwarded packets enter the tx path through ieee80211_add_pending_skb,
which skips the ieee80211_skb_resize call.
Fixes WARN_ON in ccmp_encrypt_skb and resulting packet loss.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we did the original tests for the optimal value of sk_pacing_shift, we
came up with 6 ms of buffering as the default. Sadly, 6 is not a power of
two, so when picking the shift value I erred on the size of less buffering
and picked 4 ms instead of 8. This was probably wrong; those 2 ms of extra
buffering makes a larger difference than I thought.
So, change the default pacing shift to 7, which corresponds to 8 ms of
buffering. The point of diminishing returns really kicks in after 8 ms, and
so having this as a default should cut down on the need for extensive
per-device testing and overrides needed in the drivers.
Cc: stable@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The SHA512 code we adopted from the OpenSSL project uses a rather
peculiar way to take the address of the round constant table: it
takes the address of the sha256_block_data_order() routine, and
substracts a constant known quantity to arrive at the base of the
table, which is emitted by the same assembler code right before
the routine's entry point.
However, recent versions of binutils have helpfully changed the
behavior of references emitted via an ADR instruction when running
in Thumb2 mode: it now takes the Thumb execution mode bit into
account, which is bit 0 af the address. This means the produced
table address also has bit 0 set, and so we end up with an address
value pointing 1 byte past the start of the table, which results
in crashes such as
Unable to handle kernel paging request at virtual address bf825000
pgd = 42f44b11
[bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000
Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2
Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ...
CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm]
LR is at __this_module+0x17fd/0xffffe800 [sha256_arm]
pc : [<bf820bca>] lr : [<bf824ffd>] psr: 800b0033
sp : ebc8bbe8 ip : faaabe1c fp : 2fdd3433
r10: 4c5f1692 r9 : e43037df r8 : b04b0a5a
r7 : c369d722 r6 : 39c3693e r5 : 7a013189 r4 : 1580d26b
r3 : 8762a9b0 r2 : eea9c2cd r1 : 3e9ab536 r0 : 1dea4ae7
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user
Control: 70c5383d Table: 6b8467c0 DAC: dbadc0de
Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23)
Stack: (0xebc8bbe8 to 0xebc8c000)
...
unwind: Unknown symbol address bf820bca
unwind: Index not found bf820bca
Code: 441a ea80 40f9 440a (f85e) 3b04
---[ end trace e560cce92700ef8a ]---
Given that this affects older kernels as well, in case they are built
with a recent toolchain, apply a minimal backportable fix, which is
to emit another non-code label at the start of the routine, and
reference that instead. (This is similar to the current upstream state
of this file in OpenSSL)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The SHA256 code we adopted from the OpenSSL project uses a rather
peculiar way to take the address of the round constant table: it
takes the address of the sha256_block_data_order() routine, and
substracts a constant known quantity to arrive at the base of the
table, which is emitted by the same assembler code right before
the routine's entry point.
However, recent versions of binutils have helpfully changed the
behavior of references emitted via an ADR instruction when running
in Thumb2 mode: it now takes the Thumb execution mode bit into
account, which is bit 0 af the address. This means the produced
table address also has bit 0 set, and so we end up with an address
value pointing 1 byte past the start of the table, which results
in crashes such as
Unable to handle kernel paging request at virtual address bf825000
pgd = 42f44b11
[bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000
Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2
Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ...
CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm]
LR is at __this_module+0x17fd/0xffffe800 [sha256_arm]
pc : [<bf820bca>] lr : [<bf824ffd>] psr: 800b0033
sp : ebc8bbe8 ip : faaabe1c fp : 2fdd3433
r10: 4c5f1692 r9 : e43037df r8 : b04b0a5a
r7 : c369d722 r6 : 39c3693e r5 : 7a013189 r4 : 1580d26b
r3 : 8762a9b0 r2 : eea9c2cd r1 : 3e9ab536 r0 : 1dea4ae7
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user
Control: 70c5383d Table: 6b8467c0 DAC: dbadc0de
Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23)
Stack: (0xebc8bbe8 to 0xebc8c000)
...
unwind: Unknown symbol address bf820bca
unwind: Index not found bf820bca
Code: 441a ea80 40f9 440a (f85e) 3b04
---[ end trace e560cce92700ef8a ]---
Given that this affects older kernels as well, in case they are built
with a recent toolchain, apply a minimal backportable fix, which is
to emit another non-code label at the start of the routine, and
reference that instead. (This is similar to the current upstream state
of this file in OpenSSL)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 1358c13a48 ("crypto: ccree - fix resume race condition on init")
was missing a "inline" qualifier for stub function used when CONFIG_PM
is not set causing a build warning.
Fixes: 1358c13a48 ("crypto: ccree - fix resume race condition on init")
Cc: stable@kernel.org # v4.20
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
clang warns about overflowing the data[] member in the struct pnpipehdr:
net/phonet/pep.c:295:8: warning: array index 4 is past the end of the array (which contains 1 element) [-Warray-bounds]
if (hdr->data[4] == PEP_IND_READY)
^ ~
include/net/phonet/pep.h:66:3: note: array 'data' declared here
u8 data[1];
Using a flexible array member at the end of the struct avoids the
warning, but since we cannot have a flexible array member inside
of the union, each index now has to be moved back by one, which
makes it a little uglier.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2019-02-21
1) Don't do TX bytes accounting for the esp trailer when sending
from a request socket as this will result in an out of bounds
memory write. From Martin Willi.
2) Destroy xfrm_state synchronously on net exit path to
avoid nested gc flush callbacks that may trigger a
warning in xfrm6_tunnel_net_exit(). From Cong Wang.
3) Do an unconditionally clone in pfkey_broadcast_one()
to avoid a race when freeing the skb.
From Sean Tranchetti.
4) Fix inbound traffic via XFRM interfaces across network
namespaces. We did the lookup for interfaces and policies
in the wrong namespace. From Tobias Brunner.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi says:
====================
report erspan version field just for erspan tunnels
Do not report erspan_version to userpsace for non erspan tunnels.
Report IFLA_GRE_ERSPAN_INDEX only for erspan version 1 in
ip6gre_fill_info
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Report erspan version field to userspace in ip6gre_fill_info just for
erspan_v6 tunnels. Moreover report IFLA_GRE_ERSPAN_INDEX only for
erspan version 1.
The issue can be triggered with the following reproducer:
$ip link add name gre6 type ip6gre local 2001::1 remote 2002::2
$ip link set gre6 up
$ip -d link sh gre6
14: grep6@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1448 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/gre6 2001::1 peer 2002::2 promiscuity 0 minmtu 0 maxmtu 0
ip6gre remote 2002::2 local 2001::1 hoplimit 64 encaplimit 4 tclass 0x00 flowlabel 0x00000 erspan_index 0 erspan_ver 0 addrgenmode eui64
Fixes: 94d7d8f292 ("ip6_gre: add erspan v2 support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report erspan version field to userspace in ipgre_fill_info just for
erspan tunnels. The issue can be triggered with the following reproducer:
$ip link add name gre1 type gre local 192.168.0.1 remote 192.168.1.1
$ip link set dev gre1 up
$ip -d link sh gre1
13: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/gre 192.168.0.1 peer 192.168.1.1 promiscuity 0 minmtu 0 maxmtu 0
gre remote 192.168.1.1 local 192.168.0.1 ttl inherit erspan_ver 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1
Fixes: f551c91de2 ("net: erspan: introduce erspan v2 for ip_gre")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A bit bigger than normal for this week due to fixes for some long
standing display issues that are bound for stable. These changes would
be going to stable anyway, so I figured it was better via 5.0 than 5.1.
- Several display fixes
- Fix PX systems due to core changes in runtime pm
- Disable bulk moves. They are fixed in 5.1, but fix is too invasive for 5.0
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190220225715.3240-1-alexander.deucher@amd.com
The first release of core4 (0x54) was dual issue only (HS4x).
Newer releases allow hardware to be configured as single issue (HS3x)
or dual issue.
Prevent accessing a HS4x only aux register in HS3x, which otherwise
leads to illegal instruction exceptions
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
GSO packets with vnet_hdr must conform to a small set of gso_types.
The below commit uses flow dissection to drop packets that do not.
But it has false positives when the skb is not fully initialized.
Dissection needs skb->protocol and skb->network_header.
Infer skb->protocol from gso_type as the two must agree.
SKB_GSO_UDP can use both ipv4 and ipv6, so try both.
Exclude callers for which network header offset is not known.
Fixes: d5be7f632b ("net: validate untrusted gso packets without csum offload")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tung Nguyen says:
====================
tipc: improvement for wait and wakeup
Some improvements for tipc_wait_for_xzy().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit replaces schedule_timeout() with wait_woken()
in function tipc_wait_for_rcvmsg(). wait_woken() uses
memory barriers in its implementation to avoid potential
race condition when putting a process into sleeping state
and then waking it up.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 844cf763fb ("tipc: make macro tipc_wait_for_cond() smp safe")
replaced finish_wait() with remove_wait_queue() but still used
prepare_to_wait(). This causes unnecessary conditional
checking before adding to wait queue in prepare_to_wait().
This commit replaces prepare_to_wait() with add_wait_queue()
as the pair function with remove_wait_queue().
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a subtle PACKET_ORIGDEV regression which was a side
effect of fixes introduced by:
6a9e461f6f bonding: pass link-local packets to bonding master also.
... to:
b89f04c61e bonding: deliver link-local packets with skb->dev set to link that packets arrived on
While 6a9e461f6f restored pre-b89f04c61efe presence of link-local
packets on bonding masters (which is required e.g. by linux bridges
participating in spanning tree or needed for lab-like setups created
with group_fwd_mask) it also caused the originating device
information to be lost due to cloning.
Maciej Żenczykowski proposed another solution that doesn't require
packet cloning and retains original device information - instead of
returning RX_HANDLER_PASS for all link-local packets it's now limited
only to packets from inactive slaves.
At the same time, packets passed to bonding masters retain correct
information about the originating device and PACKET_ORIGDEV can be used
to determine it.
This elegantly solves all issues so far:
- link-local packets that were removed from bonding masters
- LLDP daemons being forced to explicitly bind to slave interfaces
- PACKET_ORIGDEV having no effect on bond interfaces
Fixes: 6a9e461f6f (bonding: pass link-local packets to bonding master also.)
Reported-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similiar to commit e94cd8113c ("net: remove MTU limits for dummy and
ifb device"), MTU is irrelevant for VRF device. We init it as 64K while
limit it to [68, 1500] may make users feel confused.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The listed address for the CAIF maintainer bounces with
"553 5.3.0 <dmitry.tarnyagin@lockless.no>... No such user here", and the
only existing email address of the maintainer in git history hasn't
responded in a week.
Therefore, remove the listed maintainer and mark CAIF as orphan.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2019-02-21
This series contains fixes to ixgbe and i40e.
Majority of the fixes are to resolve XDP issues found in both drivers,
there is only one fix which is not XDP related. That one fix resolves
an issue seen on older 10GbE devices, where UDP traffic was either being
dropped or being transmitted out of order when the bit to enable L3/L4
filtering for transmit switched packets is enabled on older devices that
did not support this option.
Magnus fixes an XDP issue for both ixgbe and i40e, where receive rings
are created but no buffers are allocated for AF_XDP in zero-copy mode,
so no packets can be received and no interrupts will be generated so
that NAPI poll function that allocates buffers to the rings will never
get executed.
Björn fixes a race in XDP xmit ring cleanup for i40e, where
ndo_xdp_xmit() must be taken into consideration. Added a
synchronize_rcu() to wait for napi(s) before clearing the queue.
Jan fixes a ixgbe AF_XDP zero-copy transmit issue which can cause a
reset to be triggered, so add a check to ensure that netif carrier is
'ok' before trying to transmit packets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Retire the parisc-linux.org email domain and provide alternative email
addresses for the remaining users, as agreed upon with them.
Signed-off-by: Helge Deller <deller@gmx.de>
An issue has been found while testing zero-copy XDP that
causes a reset to be triggered. As it takes some time to
turn the carrier on after setting zc, and we already
start trying to transmit some packets, watchdog considers
this as an erroneous state and triggers a reset.
Don't do any work if netif carrier is not OK.
Fixes: 8221c5eba8 (ixgbe: add AF_XDP zero-copy Tx support)
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Commit 910cd32e55 ("parisc: Fix and enable seccomp filter support")
introduced a regression in ptrace-based syscall tampering: when tracer
changes syscall number to -1, the kernel fails to initialize %r28 with
-ENOSYS and subsequently fails to return the error code of the failed
syscall to userspace.
This erroneous behaviour could be observed with a simple strace syscall
fault injection command which is expected to print something like this:
$ strace -a0 -ewrite -einject=write:error=enospc echo hello
write(1, "hello\n", 6) = -1 ENOSPC (No space left on device) (INJECTED)
write(2, "echo: ", 6) = -1 ENOSPC (No space left on device) (INJECTED)
write(2, "write error", 11) = -1 ENOSPC (No space left on device) (INJECTED)
write(2, "\n", 1) = -1 ENOSPC (No space left on device) (INJECTED)
+++ exited with 1 +++
After commit 910cd32e55 it loops printing
something like this instead:
write(1, "hello\n", 6../strace: Failed to tamper with process 12345: unexpectedly got no error (return value 0, error 0)
) = 0 (INJECTED)
This bug was found by strace test suite.
Fixes: 910cd32e55 ("parisc: Fix and enable seccomp filter support")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
When the driver clears the XDP xmit ring due to re-configuration or
teardown, in-progress ndo_xdp_xmit must be taken into consideration.
The ndo_xdp_xmit function is typically called from a NAPI context that
the driver does not control. Therefore, we must be careful not to
clear the XDP ring, while the call is on-going. This patch adds a
synchronize_rcu() to wait for napi(s) (preempt-disable regions and
softirqs), prior clearing the queue. Further, the __I40E_CONFIG_BUSY
flag is checked in the ndo_xdp_xmit implementation to avoid touching
the XDP xmit queue during re-configuration.
Fixes: d9314c474d ("i40e: add support for XDP_REDIRECT")
Fixes: 123cecd427 ("i40e: added queue pair disable/enable functions")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The default value of ARCH_SLAB_MINALIGN in "include/linux/slab.h" is
"__alignof__(unsigned long long)" which for ARC unexpectedly turns out
to be 4. This is not a compiler bug, but as defined by ARC ABI [1]
Thus slab allocator would allocate a struct which is 32-bit aligned,
which is generally OK even if struct has long long members.
There was however potetial problem when it had any atomic64_t which
use LLOCKD/SCONDD instructions which are required by ISA to take
64-bit addresses. This is the problem we ran into
[ 4.015732] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[ 4.167881] Misaligned Access
[ 4.172356] Path: /bin/busybox.nosuid
[ 4.176004] CPU: 2 PID: 171 Comm: rm Not tainted 4.19.14-yocto-standard #1
[ 4.182851]
[ 4.182851] [ECR ]: 0x000d0000 => Check Programmer's Manual
[ 4.190061] [EFA ]: 0xbeaec3fc
[ 4.190061] [BLINK ]: ext4_delete_entry+0x210/0x234
[ 4.190061] [ERET ]: ext4_delete_entry+0x13e/0x234
[ 4.202985] [STAT32]: 0x80080002 : IE K
[ 4.207236] BTA: 0x9009329c SP: 0xbe5b1ec4 FP: 0x00000000
[ 4.212790] LPS: 0x9074b118 LPE: 0x9074b120 LPC: 0x00000000
[ 4.218348] r00: 0x00000040 r01: 0x00000021 r02: 0x00000001
...
...
[ 4.270510] Stack Trace:
[ 4.274510] ext4_delete_entry+0x13e/0x234
[ 4.278695] ext4_rmdir+0xe0/0x238
[ 4.282187] vfs_rmdir+0x50/0xf0
[ 4.285492] do_rmdir+0x9e/0x154
[ 4.288802] EV_Trap+0x110/0x114
The fix is to make sure slab allocations are 64-bit aligned.
Do note that atomic64_t is __attribute__((aligned(8)) which means gcc
does generate 64-bit aligned references, relative to beginning of
container struct. However the issue is if the container itself is not
64-bit aligned, atomic64_t ends up unaligned which is what this patch
ensures.
[1] https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/wiki/files/ARCv2_ABI.pdf
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: <stable@vger.kernel.org> # 4.8+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: reworked changelog, added dependency on LL64+LLSC]
After reworking U-boot args handling code and adding paranoid
arguments check we can eliminate CONFIG_ARC_UBOOT_SUPPORT and
enable uboot support unconditionally.
For JTAG case we can assume that core registers will come up
reset value of 0 or in worst case we rely on user passing
'-on=clear_regs' to Metaware debugger.
Cc: stable@vger.kernel.org
Tested-by: Corentin LABBE <clabbe@baylibre.com>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Handle U-boot arguments paranoidly:
* don't allow to pass unknown tag.
* try to use external device tree blob only if corresponding tag
(TAG_DTB) is set.
* don't check uboot_tag if kernel build with no ARC_UBOOT_SUPPORT.
NOTE:
If U-boot args are invalid we skip them and try to use embedded device
tree blob. We can't panic on invalid U-boot args as we really pass
invalid args due to bug in U-boot code.
This happens if we don't provide external DTB to U-boot and
don't set 'bootargs' U-boot environment variable (which is default
case at least for HSDK board) In that case we will pass
{r0 = 1 (bootargs in r2); r1 = 0; r2 = 0;} to linux which is invalid.
While I'm at it refactor U-boot arguments handling code.
Cc: stable@vger.kernel.org
Tested-by: Corentin LABBE <clabbe@baylibre.com>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
There's a hardware bug which affects the HSDK platform, triggered by
micro-ops for auto-saving regfile on taken interrupt. The workaround is
to inhibit autosave.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARCv2 optimized memcpy uses PREFETCHW instruction for prefetching the
next cache line but doesn't ensure that the line is not past the end of
the buffer. PRETECHW changes the line ownership and marks it dirty,
which can cause data corruption if this area is used for DMA IO.
Fix the issue by avoiding the PREFETCHW. This leads to performance
degradation but it is OK as we'll introduce new memcpy implementation
optimized for unaligned memory access using.
We also cut off all PREFETCH instructions at they are quite useless
here:
* we call PREFETCH right before LOAD instruction call.
* we copy 16 or 32 bytes of data (depending on CONFIG_ARC_HAS_LL64)
in a main logical loop. so we call PREFETCH 4 times (or 2 times)
for each L1 cache line (in case of 64B L1 cache Line which is
default case). Obviously this is not optimal.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
It is currently done in arc_init_IRQ() which might be too late
considering gcc 7.3.1 onwards (GNU 2018.03) generates unaligned
memory accesses by default
Cc: stable@vger.kernel.org #4.4+
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: rewrote changelog]
When the RX rings are created they are also populated with buffers so
that packets can be received. Usually these are kernel buffers, but
for AF_XDP in zero-copy mode, these are user-space buffers and in this
case the application might not have sent down any buffers to the
driver at this point. And if no buffers are allocated at ring creation
time, no packets can be received and no interrupts will be generated so
the NAPI poll function that allocates buffers to the rings will never
get executed.
To rectify this, we kick the NAPI context of any queue with an
attached AF_XDP zero-copy socket in two places in the code. Once after
an XDP program has loaded and once after the umem is registered. This
take care of both cases: XDP program gets loaded first then AF_XDP
socket is created, and the reverse, AF_XDP socket is created first,
then XDP program is loaded.
Fixes: d0bcacd0a1 ("ixgbe: add AF_XDP zero-copy Rx support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the RX rings are created they are also populated with buffers
so that packets can be received. Usually these are kernel buffers,
but for AF_XDP in zero-copy mode, these are user-space buffers and
in this case the application might not have sent down any buffers
to the driver at this point. And if no buffers are allocated at ring
creation time, no packets can be received and no interrupts will be
generated so the NAPI poll function that allocates buffers to the
rings will never get executed.
To rectify this, we kick the NAPI context of any queue with an
attached AF_XDP zero-copy socket in two places in the code. Once
after an XDP program has loaded and once after the umem is registered.
This take care of both cases: XDP program gets loaded first then AF_XDP
socket is created, and the reverse, AF_XDP socket is created first,
then XDP program is loaded.
Fixes: 0a714186d3 ("i40e: add AF_XDP zero-copy Rx support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The enabling L3/L4 filtering for transmit switched packets for all
devices caused unforeseen issue on older devices when trying to send UDP
traffic in an ordered sequence. This bit was originally intended for X550
devices, which supported this feature, so limit the scope of this bit to
only X550 devices.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
smc_poll() returns with mask bit EPOLLPRI if the connection urg_state
is SMC_URG_VALID. Since SMC_URG_VALID is zero, smc_poll signals
EPOLLPRI errorneously if called in state SMC_INIT before the connection
is created, for instance in a non-blocking connect scenario.
This patch switches to non-zero values for the urg states.
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Fixes: de8474eb9d ("net/smc: urgent data support")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
ipv6: route: enforce RCU protection for fib6_info->from
This series addresses a couple of RCU left-over dating back to rt6_info->from
conversion to RCU
v1 -> v2:
- fix a possible race in patch 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We need a RCU critical section around rt6_info->from deference, and
proper annotation.
Fixes: 4ed591c8ab ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We must access rt6_info->from under RCU read lock: move the
dereference under such lock, with proper annotation.
v1 -> v2:
- avoid using multiple, racy, fetch operations for rt->from
Fixes: a68886a691 ("net/ipv6: Make from in rt6_info rcu protected")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ceph fixes from Ilya Dryomov:
"Two bug fixes for old issues, both marked for stable"
* tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-client:
ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
libceph: handle an empty authorize reply
Pull late arm64 fixes from Will Deacon:
"Three small arm64 fixes for 5.0.
They fix a build breakage with clang introduced in 4.20, an oversight
in our sigframe restoration relating to the SSBS bit and a boot fix
for systems with newer revisions of our interrupt controller.
Summary:
- Fix handling of PSTATE.SSBS bit in sigreturn()
- Fix version checking of the GIC during early boot
- Fix clang builds failing due to use of NEON in the crypto code"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Relax GIC version check during early boot
arm64/neon: Disable -Wincompatible-pointer-types when building with Clang
arm64: fix SSBS sanitization
Merge misc fixes from Andrew Morton:
"23 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (23 commits)
mm, memory_hotplug: fix off-by-one in is_pageblock_removable
mm: don't let userspace spam allocations warnings
slub: fix a crash with SLUB_DEBUG + KASAN_SW_TAGS
kasan, slab: remove redundant kasan_slab_alloc hooks
kasan, slab: make freelist stored without tags
kasan, slab: fix conflicts with CONFIG_HARDENED_USERCOPY
kasan: prevent tracing of tags.c
kasan: fix random seed generation for tag-based mode
tmpfs: fix link accounting when a tmpfile is linked in
psi: avoid divide-by-zero crash inside virtual machines
mm: handle lru_add_drain_all for UP properly
mm, page_alloc: fix a division by zero error when boosting watermarks v2
mm/debug.c: fix __dump_page() for poisoned pages
proc, oom: do not report alien mms when setting oom_score_adj
slub: fix SLAB_CONSISTENCY_CHECKS + KASAN_SW_TAGS
kasan, slub: fix more conflicts with CONFIG_SLAB_FREELIST_HARDENED
kasan, slub: fix conflicts with CONFIG_SLAB_FREELIST_HARDENED
kasan, slub: move kasan_poison_slab hook before page_address
kmemleak: account for tagged pointers when calculating pointer range
kasan, kmemleak: pass tagged pointers to kmemleak
...
Rong Chen has reported the following boot crash:
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 239 Comm: udevd Not tainted 5.0.0-rc4-00149-gefad4e4 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
RIP: 0010:page_mapping+0x12/0x80
Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 74 2f 48
RSP: 0018:ffff88801fa87cd8 EFLAGS: 00010202
RAX: ffffffffffffffff RBX: fffffffffffffffe RCX: 000000000000000a
RDX: fffffffffffffffe RSI: ffffffff820b9a20 RDI: ffff88801e5c0000
RBP: 6db6db6db6db6db7 R08: ffff88801e8bb000 R09: 0000000001b64d13
R10: ffff88801fa87cf8 R11: 0000000000000001 R12: ffff88801e640000
R13: ffffffff820b9a20 R14: ffff88801f145258 R15: 0000000000000001
FS: 00007fb2079817c0(0000) GS:ffff88801dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000006 CR3: 000000001fa82000 CR4: 00000000000006a0
Call Trace:
__dump_page+0x14/0x2c0
is_mem_section_removable+0x24c/0x2c0
removable_show+0x87/0xa0
dev_attr_show+0x25/0x60
sysfs_kf_seq_show+0xba/0x110
seq_read+0x196/0x3f0
__vfs_read+0x34/0x180
vfs_read+0xa0/0x150
ksys_read+0x44/0xb0
do_syscall_64+0x5e/0x4a0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
and bisected it down to commit efad4e475c ("mm, memory_hotplug:
is_mem_section_removable do not pass the end of a zone").
The reason for the crash is that the mapping is garbage for poisoned
(uninitialized) page. This shouldn't happen as all pages in the zone's
boundary should be initialized.
Later debugging revealed that the actual problem is an off-by-one when
evaluating the end_page. 'start_pfn + nr_pages' resp 'zone_end_pfn'
refers to a pfn after the range and as such it might belong to a
differen memory section.
This along with CONFIG_SPARSEMEM then makes the loop condition
completely bogus because a pointer arithmetic doesn't work for pages
from two different sections in that memory model.
Fix the issue by reworking is_pageblock_removable to be pfn based and
only use struct page where necessary. This makes the code slightly
easier to follow and we will remove the problematic pointer arithmetic
completely.
Link: http://lkml.kernel.org/r/20190218181544.14616-1-mhocko@kernel.org
Fixes: efad4e475c ("mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: <rong.a.chen@intel.com>
Tested-by: <rong.a.chen@intel.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two issues with assigning random percpu seeds right now:
1. We use for_each_possible_cpu() to iterate over cpus, but cpumask is
not set up yet at the moment of kasan_init(), and thus we only set
the seed for cpu #0.
2. A call to get_random_u32() always returns the same number and produces
a message in dmesg, since the random subsystem is not yet initialized.
Fix 1 by calling kasan_init_tags() after cpumask is set up.
Fix 2 by using get_cycles() instead of get_random_u32(). This gives us
lower quality random numbers, but it's good enough, as KASAN is meant to
be used as a debugging tool and not a mitigation.
Link: http://lkml.kernel.org/r/1f815cc914b61f3516ed4cc9bfd9eeca9bd5d9de.1550677973.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tmpfs has a peculiarity of accounting hard links as if they were
separate inodes: so that when the number of inodes is limited, as it is
by default, a user cannot soak up an unlimited amount of unreclaimable
dcache memory just by repeatedly linking a file.
But when v3.11 added O_TMPFILE, and the ability to use linkat() on the
fd, we missed accommodating this new case in tmpfs: "df -i" shows that
an extra "inode" remains accounted after the file is unlinked and the fd
closed and the actual inode evicted. If a user repeatedly links
tmpfiles into a tmpfs, the limit will be hit (ENOSPC) even after they
are deleted.
Just skip the extra reservation from shmem_link() in this case: there's
a sense in which this first link of a tmpfile is then cheaper than a
hard link of another file, but the accounting works out, and there's
still good limiting, so no need to do anything more complicated.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182134370.7035@eggly.anvils
Fixes: f4e0c30c19 ("allow the temp files created by open() to be linked to")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Matej Kupljen <matej.kupljen@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've been seeing hard-to-trigger psi crashes when running inside VM
instances:
divide error: 0000 [#1] SMP PTI
Modules linked in: [...]
CPU: 0 PID: 212 Comm: kworker/0:2 Not tainted 4.16.18-119_fbk9_3817_gfe944c98d695 #119
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: events psi_clock
RIP: 0010:psi_update_stats+0x270/0x490
RSP: 0018:ffffc90001117e10 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800a35a13f8
RDX: 0000000000000000 RSI: ffff8800a35a1340 RDI: 0000000000000000
RBP: 0000000000000658 R08: ffff8800a35a1470 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000f8502
FS: 0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbe370fa000 CR3: 00000000b1e3a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
psi_clock+0x12/0x50
process_one_work+0x1e0/0x390
worker_thread+0x2b/0x3c0
? rescuer_thread+0x330/0x330
kthread+0x113/0x130
? kthread_create_worker_on_cpu+0x40/0x40
? SyS_exit_group+0x10/0x10
ret_from_fork+0x35/0x40
Code: 48 0f 47 c7 48 01 c2 45 85 e4 48 89 16 0f 85 e6 00 00 00 4c 8b 49 10 4c 8b 51 08 49 69 d9 f2 07 00 00 48 6b c0 64 4c 8b 29 31 d2 <48> f7 f7 49 69 d5 8d 06 00 00 48 89 c5 4c 69 f0 00 98 0b 00 48
The Code-line points to `period` being 0 inside update_stats(), and we
divide by that when calculating that period's pressure percentage.
The elapsed period should never be 0. The reason this can happen is due
to an off-by-one in the idle time / missing period calculation combined
with a coarse sched_clock() in the virtual machine.
The target time for aggregation is advanced into the future on a fixed
grid to prevent clock drift. So when an aggregation runs after some idle
period, we can not just set it to "now + psi_period", but have to
calculate the downtime and advance the target time relative to itself.
However, if the aggregator was disabled exactly one psi_period (ns), we
drop one idle period in the calculation due to a > when we should do >=.
In that case, next_update will be advanced from 'now - psi_period' to
'now' when it should be moved to 'now + psi_period'. The run finishes
with last_update == next_update == sched_clock().
With hardware clocks, this exact nanosecond match isn't likely in the
first place; but if it does happen, the clock will still have moved on and
the period non-zero by the time the worker runs. A pointlessly short
period, but besides the extra work, no harm no foul. However, a slow
sched_clock() like we have on VMs might not have advanced either by the
time the worker runs again. And when we calculate the elapsed period, the
result, our pressure divisor, will be 0. Ouch.
Fix this by correctly handling the situation when the elapsed time between
aggregation runs is precisely two periods, and advance the expiration
timestamp correctly to period into the future.
Link: http://lkml.kernel.org/r/20190214193157.15788-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Łukasz Siudut <lsiudut@fb.com
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since for_each_cpu(cpu, mask) added by commit 2d3854a37e
("cpumask: introduce new API, without changing anything") did not
evaluate the mask argument if NR_CPUS == 1 due to CONFIG_SMP=n,
lru_add_drain_all() is hitting WARN_ON() at __flush_work() added by
commit 4d43d395fe ("workqueue: Try to catch flush_work() without
INIT_WORK().") by unconditionally calling flush_work() [1].
Workaround this issue by using CONFIG_SMP=n specific lru_add_drain_all
implementation. There is no real need to defer the implementation to
the workqueue as the draining is going to happen on the local cpu. So
alias lru_add_drain_all to lru_add_drain which does all the necessary
work.
[akpm@linux-foundation.org: fix various build warnings]
[1] https://lkml.kernel.org/r/18a30387-6aa5-6123-e67c-57579ecc3f38@roeck-us.net
Link: http://lkml.kernel.org/r/20190213124334.GH4525@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Debugged-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Evaluating page_mapping() on a poisoned page ends up dereferencing junk
and making PF_POISONED_CHECK() considerably crashier than intended:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
Mem abort info:
ESR = 0x96000005
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000005
CM = 0, WnR = 0
user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c2f6ac38
[0000000000000006] pgd=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 491 Comm: bash Not tainted 5.0.0-rc1+ #1
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Dec 17 2018
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : page_mapping+0x18/0x118
lr : __dump_page+0x1c/0x398
Process bash (pid: 491, stack limit = 0x000000004ebd4ecd)
Call trace:
page_mapping+0x18/0x118
__dump_page+0x1c/0x398
dump_page+0xc/0x18
remove_store+0xbc/0x120
dev_attr_store+0x18/0x28
sysfs_kf_write+0x40/0x50
kernfs_fop_write+0x130/0x1d8
__vfs_write+0x30/0x180
vfs_write+0xb4/0x1a0
ksys_write+0x60/0xd0
__arm64_sys_write+0x18/0x20
el0_svc_common+0x94/0xf8
el0_svc_handler+0x68/0x70
el0_svc+0x8/0xc
Code: f9400401 d1000422 f240003f 9a801040 (f9400402)
---[ end trace cdb5eb5bf435cecb ]---
Fix that by not inspecting the mapping until we've determined that it's
likely to be valid. Now the above condition still ends up stopping the
kernel, but in the correct manner:
page:ffffffbf20000000 is uninitialized and poisoned
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
------------[ cut here ]------------
kernel BUG at ./include/linux/mm.h:1006!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 483 Comm: bash Not tainted 5.0.0-rc1+ #3
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Dec 17 2018
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : remove_store+0xbc/0x120
lr : remove_store+0xbc/0x120
...
Link: http://lkml.kernel.org/r/03b53ee9d7e76cda4b9b5e1e31eea080db033396.1550071778.git.robin.murphy@arm.com
Fixes: 1c6fb1d89e ("mm: print more information about mapping in __dump_page")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CONFIG_KASAN_SW_TAGS is enabled, ptr_addr might be tagged. Normally,
this doesn't cause any issues, as both set_freepointer() and
get_freepointer() are called with a pointer with the same tag. However,
there are some issues with CONFIG_SLUB_DEBUG code. For example, when
__free_slub() iterates over objects in a cache, it passes untagged
pointers to check_object(). check_object() in turns calls
get_freepointer() with an untagged pointer, which causes the freepointer
to be restored incorrectly.
Add kasan_reset_tag to freelist_ptr(). Also add a detailed comment.
Link: http://lkml.kernel.org/r/bf858f26ef32eb7bd24c665755b3aee4bc58d0e4.1550103861.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Qian Cai <cai@lca.pw>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The system call, get_mempolicy() [1], passes an unsigned long *nodemask
pointer and an unsigned long maxnode argument which specifies the length
of the user's nodemask array in bits (which is rounded up). The manual
page says that if the maxnode value is too small, get_mempolicy will
return EINVAL but there is no system call to return this minimum value.
To determine this value, some programs search /proc/<pid>/status for a
line starting with "Mems_allowed:" and use the number of digits in the
mask to determine the minimum value. A recent change to the way this line
is formatted [2] causes these programs to compute a value less than
MAX_NUMNODES so get_mempolicy() returns EINVAL.
Change get_mempolicy(), the older compat version of get_mempolicy(), and
the copy_nodes_to_user() function to use nr_node_ids instead of
MAX_NUMNODES, thus preserving the defacto method of computing the minimum
size for the nodemask array and the maxnode argument.
[1] http://man7.org/linux/man-pages/man2/get_mempolicy.2.html
[2] https://lore.kernel.org/lkml/1545405631-6808-1-git-send-email-longman@redhat.com
Link: http://lkml.kernel.org/r/20190211180245.22295-1-rcampbell@nvidia.com
Fixes: 4fb8e5b89bcbbbb ("include/linux/nodemask.h: use nr_node_ids (not MAX_NUMNODES) in __nodemask_pr_numnodes()")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit e2ce367488.
It turns out that the sock destructor xsk_destruct was needed after
all. The cleanup simplification broke the skb transmit cleanup path,
due to that the umem was prematurely destroyed.
The umem cannot be destroyed until all outstanding skbs are freed,
which means that we cannot remove the umem until the sk_destruct has
been called.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When rpm_resume() desactivates the autosuspend timer, it should only
try to cancel hrtimer but not wait for the handler to finish, because
both rpm_resume() and pm_suspend_timer_fn() take the power.lock.
A deadlock is possible as follows:
CPU0 CPU1
rpm_resume()
spin_lock_irqsave
pm_suspend_timer_fn()
spin_lock_irqsave
pm_runtime_deactivate_timer()
hrtimer_cancel()
It is sufficient to call hrtimer_try_to_cancel() from
pm_runtime_deactivate_timer(), because dev->power.timer_expires
reset to 0 by it, so use that function instead of hrtimer_cancel().
Fixes: 8234f6734c ("PM-runtime: Switch autosuspend over to using hrtimers")
Reported-by: Sunzhaosheng Sun(Zhaosheng) <sunzhaosheng@hisilicon.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Several u->addr and u->path users are not holding any locks in
common with unix_bind(). unix_state_lock() is useless for those
purposes.
u->addr is assign-once and *(u->addr) is fully set up by the time
we set u->addr (all under unix_table_lock). u->path is also
set in the same critical area, also before setting u->addr, and
any unix_sock with ->path filled will have non-NULL ->addr.
So setting ->addr with smp_store_release() is all we need for those
"lockless" users - just have them fetch ->addr with smp_load_acquire()
and don't even bother looking at ->path if they see NULL ->addr.
Users of ->addr and ->path fall into several classes now:
1) ones that do smp_load_acquire(u->addr) and access *(u->addr)
and u->path only if smp_load_acquire() has returned non-NULL.
2) places holding unix_table_lock. These are guaranteed that
*(u->addr) is seen fully initialized. If unix_sock is in one of the
"bound" chains, so's ->path.
3) unix_sock_destructor() using ->addr is safe. All places
that set u->addr are guaranteed to have seen all stores *(u->addr)
while holding a reference to u and unix_sock_destructor() is called
when (atomic) refcount hits zero.
4) unix_release_sock() using ->path is safe. unix_bind()
is serialized wrt unix_release() (normally - by struct file
refcount), and for the instances that had ->path set by unix_bind()
unix_release_sock() comes from unix_release(), so they are fine.
Instances that had it set in unix_stream_connect() either end up
attached to a socket (in unix_accept()), in which case the call
chain to unix_release_sock() and serialization are the same as in
the previous case, or they never get accept'ed and unix_release_sock()
is called when the listener is shut down and its queue gets purged.
In that case the listener's queue lock provides the barriers needed -
unix_stream_connect() shoves our unix_sock into listener's queue
under that lock right after having set ->path and eventual
unix_release_sock() caller picks them from that queue under the
same lock right before calling unix_release_sock().
5) unix_find_other() use of ->path is pointless, but safe -
it happens with successful lookup by (abstract) name, so ->path.dentry
is guaranteed to be NULL there.
earlier-variant-reviewed-by: "Paul E. McKenney" <paulmck@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Booting 4.20 on SolidRun Clearfog issues this warning with DMA API
debug enabled:
WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc
mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes]
Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables
CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291
Hardware name: Marvell Armada 380/385 (Device Tree)
[<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
[<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4)
[<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
[<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
[<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc)
[<c00b0370>] (check_sync) from [<c00b04f8>] (debug_dma_sync_single_range_for_cpu+0x6c/0x74)
[<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] (mvneta_poll+0x298/0xf58)
[<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424)
[<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540)
[<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144)
[<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0)
[<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] (gic_handle_irq+0x48/0x98)
[<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98)
...
This appears to be caused by mvneta_rx_hwbm() calling
dma_sync_single_range_for_cpu() with the wrong struct device pointer,
as the buffer manager device pointer is used to map and unmap the
buffer. Fix this.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull documentation fix from Jonathan Corbet:
"A single patch from Arnd bringing some top-level docs into the 5.0
era"
* tag 'docs-5.0-fix' of git://git.lwn.net/linux:
Documentation: change linux-4.x references to 5.x
[Why]
When a dce80 asic was suspended, the clocks were not set to 0.
Upon resume, the new clock was compared to the existing clock,
they were found to be the same, and so the clock was not set.
This resulted in a blackscreen.
[How]
In atomic commit, check to see if there are any active pipes.
If no, set clocks to 0
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why]
If the cursor pos passed from DM is less than the plane_state->dst_rect
top left corner then the unsigned cursor pos wraps around to a large
positive number since cursor pos is a u32.
There was an attempt to guard against this in hubp1_cursor_set_position
by checking the src_x_offset and src_y_offset and offseting the
cursor hotspot within hubp1_cursor_set_position.
However, the cursor position itself is still being programmed
incorrectly as a large value.
This manifests itself visually as the cursor disappearing or containing
strange artifacts near the middle of the screen on raven.
[How]
Don't subtract the destination rect top left corner from the pos but
add it to the hotspot instead. This happens before the pos gets
passed into hubp1_cursor_set_position.
This achieves the same result but avoids the subtraction wrap around.
With this fix the original cursor programming logic can be used again.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Murton Liu <Murton.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The master clock is actually named masterck earlier in the driver. Having
"mck" in the parent list means that it can never be selected.
Fixes: 1eabdc2f9d ("clk: at91: add at91sam9x5 PMCs driver")
Fixes: a2038077de ("clk: at91: add sama5d2 PMC driver")
Fixes: 084b696bb5 ("clk: at91: add sama5d4 pmc driver")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: <stable@vger.kernel.org> # v4.20+
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
nck() looks at the last id in an array and unfortunately,
at91sam9x35_periphck has a sentinel, hence the id is 0 and the calculated
number of peripheral clocks is 1 instead of a maximum of 31.
Fixes: 1eabdc2f9d ("clk: at91: add at91sam9x5 PMCs driver")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: <stable@vger.kernel.org> # v4.20+
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
When a DSA port is added to a bridge and brought up, the resulting STP
state programmed into the hardware depends on the order that these
operations are performed. However, the Linux bridge code believes that
the port is in disabled mode.
If the DSA port is first added to a bridge and then brought up, it will
be in blocking mode. If it is brought up and then added to the bridge,
it will be in disabled mode.
This difference is caused by DSA always setting the STP mode in
dsa_port_enable() whether or not this port is part of a bridge. Since
bridge always sets the STP state when the port is added, brought up or
taken down, it is unnecessary for us to manipulate the STP state.
Apparently, this code was copied from Rocker, and the very next day a
similar fix for Rocker was merged but was not propagated to DSA. See
e47172ab7e ("rocker: put port in FORWADING state after leaving bridge")
Fixes: b73adef677 ("net: dsa: integrate with SWITCHDEV for HW bridging")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sound fixes from Takashi Iwai:
"Here are a few last-minute fixes for 5.0.
The most significant one is the OF-node refcount fix for ASoC
simple-card, which could be triggered on many boards. Another fix for
ASoC core is for the error handling in topology, while others are
device-specific fixes for Samsung and HD-audio"
* tag 'sound-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: simple-card: fixup refcount_t underflow
ASoC: topology: free created components in tplg load error
ALSA: hda/realtek: Disable PC beep in passthrough on alc285
ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5
ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI
Pull pin control fixes from Linus Walleij:
"Some final pin control fixes (I hope) to round off the v5.0 pin
control development cycle.
Only driver fixes, one for stable:
- Meson8B fixup for the sdc pins
- Fix SDC tile position for Qualcomm QCS404"
* tag 'pinctrl-v5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
pinctrl: qcom: qcs404: Correct SDC tile
Pull GPIO fixes from Linus Walleij:
"Two GPIO fixes for the v5.0 series:
- Per-instance irqchip on the MT7621
- Avoid direction setting using pin control on MMP2"
* tag 'gpio-v5.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pxa: avoid attempting to set pin direction via pinctrl on MMP2
gpio: MT7621: use a per instance irq_chip structure
Pull MTD fixes from Boris Brezillon:
- Don't add a digit to MTD-backed nvmem device names
- Make sure powernv flash names are unique
* tag 'mtd/fixes-for-5.0-rc8' of git://git.infradead.org/linux-mtd:
mtd: powernv_flash: Fix device registration error
mtd: Use mtd->name when registering nvmem device
Pull keys fixes from James Morris:
- Handle quotas better, allowing full quota to be reached.
- Fix the creation of shortcuts in the assoc_array internal
representation when the index key needs to be an exact multiple of
the machine word size.
- Fix a dependency loop between the request_key contruction record and
the request_key authentication key. The construction record isn't
really necessary and can be dispensed with.
- Set the timestamp on a new key rather than leaving it as 0. This
would ordinarily be fine - provided the system clock is never set to
a time before 1970
* 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
keys: Timestamp new keys
keys: Fix dependency loop between construction record and auth key
assoc_array: Fix shortcut creation
KEYS: allow reaching the keys quotas exactly
Commit 482997699e ("ARM: tegra: Fix unit_address_vs_reg DTC warnings
for /memory") inadventently broke device tree ABI by adding a unit-
address to the "/memory" node because the device tree compiler flagged
the missing unit-address as a warning.
Tegra124 Chromebooks (a.k.a. Nyan) use a bootloader that relies on the
full name of the memory node in device tree being exactly "/memory". It
can be argued whether this was a good decision or not, and some other
bootloaders (such as U-Boot) do accept a unit-address in the name of the
node, but the device tree is an ABI and we can't break existing setups
just because the device tree compiler considers it bad practice to omit
the unit-address nowadays.
This partially reverts the offending commit and restores device tree ABI
compatibility.
Fixes: 482997699e ("ARM: tegra: Fix unit_address_vs_reg DTC warnings for /memory")
Reported-by: Tristan Bastian <tristan-c.bastian@gmx.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Tristan Bastian <tristan-c.bastian@gmx.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Updates to the GIC architecture allow ID_AA64PFR0_EL1.GIC to have
values other than 0 or 1. At the moment, Linux is quite strict in the
way it handles this field at early boot stage (cpufeature is fine) and
will refuse to use the system register CPU interface if it doesn't
find the value 1.
Fixes: 021f653791 ("irqchip: gic-v3: Initial support for GICv3")
Reported-by: Chase Conklin <Chase.Conklin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull networking fixes from David Miller:
1) Fix suspend and resume in mt76x0u USB driver, from Stanislaw
Gruszka.
2) Missing memory barriers in xsk, from Magnus Karlsson.
3) rhashtable fixes in mac80211 from Herbert Xu.
4) 32-bit MIPS eBPF JIT fixes from Paul Burton.
5) Fix for_each_netdev_feature() on big endian, from Hauke Mehrtens.
6) GSO validation fixes from Willem de Bruijn.
7) Endianness fix for dwmac4 timestamp handling, from Alexandre Torgue.
8) More strict checks in tcp_v4_err(), from Eric Dumazet.
9) af_alg_release should NULL out the sk after the sock_put(), from Mao
Wenan.
10) Missing unlock in mac80211 mesh error path, from Wei Yongjun.
11) Missing device put in hns driver, from Salil Mehta.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
sky2: Increase D3 delay again
vhost: correctly check the return value of translate_desc() in log_used()
net: netcp: Fix ethss driver probe issue
net: hns: Fixes the missing put_device in positive leg for roce reset
net: stmmac: Fix a race in EEE enable callback
qed: Fix iWARP syn packet mac address validation.
qed: Fix iWARP buffer size provided for syn packet processing.
r8152: Add support for MAC address pass through on RTL8153-BD
mac80211: mesh: fix missing unlock on error in table_path_del()
net/mlx4_en: fix spelling mistake: "quiting" -> "quitting"
net: crypto set sk to NULL when af_alg_release.
net: Do not allocate page fragments that are not skb aligned
mm: Use fixed constant in page_frag_alloc instead of size + 1
tcp: tcp_v4_err() should be more careful
tcp: clear icsk_backoff in tcp_write_queue_purge()
net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
qmi_wwan: apply SET_DTR quirk to Sierra WP7607
net: stmmac: handle endianness in dwmac4_get_timestamp
doc: Mention MSG_ZEROCOPY implementation for UDP
mlxsw: __mlxsw_sp_port_headroom_set(): Fix a use of local variable
...
When fail, translate_desc() returns negative value, otherwise the
number of iovs. So we should fail when the return value is negative
instead of a blindly check against zero.
Detected by CoverityScan, CID# 1442593: Control flow issues (DEADCODE)
Fixes: cc5e710759 ("vhost: log dirty page correctly")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Why]
The visual corruption due to low display clock value.
Observed on Carrizo 4K@60Hz.
[How]
There was earlier patch for dce_update_clocks:
Adding +15% workaround also to to dce11_update_clocks
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
drm_dp_mst_topology_mgr_suspend() is added into the new reboot
sequence, which disables the UP request at the beginning.
Therefore sideband messages are blocked.
[How]
Finish MST sideband message transaction before UP request is
suppressed.
Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
According to hardware engineer, WRITE_BURST_LENGTH [9:8] in register
SDMA0_CHICKEN_BITS need to change to 3 for better performance
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Based on a similar patch from Rafael for radeon.
When using ATPX to control dGPU power, the state is not retained
across suspend and resume cycles by default. This can probably
be loosened for Hybrid Graphics (_PR3) laptops where I think the
state is properly retained.
Fixes: c62ec4610c ("PM / core: Fix direct_complete handling for devices with no callbacks")
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
On HP ProBook 4540s, if PM-runtime is enabled in the radeon driver
and the direct-complete optimization is used for the radeon device
during system-wide suspend, the system doesn't resume.
Preventing direct-complete from being used with the radeon device by
setting the DPM_FLAG_NEVER_SKIP driver flag for it makes the problem
go away, which indicates that direct-complete is not safe for the
radeon driver in general and should not be used with it (at least
for now).
This fixes a regression introduced by commit c62ec4610c
("PM / core: Fix direct_complete handling for devices with no
callbacks") which allowed direct-complete to be applied to
devices without PM callbacks (again) which in turn unlocked
direct-complete for radeon on HP ProBook 4540s.
Fixes: c62ec4610c ("PM / core: Fix direct_complete handling for devices with no callbacks")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201519
Reported-by: Ярослав Семченко <ukrkyi@gmail.com>
Tested-by: Ярослав Семченко <ukrkyi@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The PHY must add both tx and rx delay and not only on the tx clock.
The board uses AR8031_AL1A PHY where the rx delay is enabled by default,
the tx dealy is disabled.
The reason why rgmii-txid worked because the rx delay was not disabled by
the driver so essentially we ended up with rgmii-id PHY mode.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The PHY must add both tx and rx delay and not only on the tx clock.
The board uses AR8031_AL1A PHY where the rx delay is enabled by default,
the tx dealy is disabled.
The reason why rgmii-txid worked because the rx delay was not disabled by
the driver so essentially we ended up with rgmii-id PHY mode.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 3b79919946 ("ARM: dts:
armada-370-xp: update NAND node with new bindings") updated some
Marvell Armada DT description to use the new NAND controller bindings,
but did it incorrectly for a number of boards: armada-xp-gp,
armada-xp-db and armada-xp-lenovo-ix4-300d. Due to this, the NAND is
no longer detected on those platforms.
This commit fixes that by properly using the new NAND DT binding. This
commit was runtime-tested on Armada XP GP, the two other platforms are
only compile-tested.
Fixes: 3b79919946 ("ARM: dts: armada-370-xp: update NAND node with new bindings")
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
ASoC: Fixes for v5.0
A few small fixes, a driver fix for Samsung, a fix for refcounting of
of_nodes in the simple-card driver that triggered on a lot of systems
and a fix for topology error handling.
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Follow up patch to fix a compilation warning in a recent IPVS fix:
098e13f5b2 ("ipvs: fix dependency on nf_defrag_ipv6").
2) Bogus ENOENT error on flush after rule deletion in the same batch,
reported by Phil Sutter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent commit below has introduced a bug in netcp driver that causes
the ethss driver probe failure and thus break the networking function
on K2 SoCs such as K2HK, K2L, K2E etc. This patch fixes the issue to
restore networking on the above SoCs.
Fixes: 21c328dcec ("net: ethernet: Convert to using %pOFn instead of device_node.name")
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the missing device reference release-after-use in
the positive leg of the roce reset API of the HNS DSAF.
Fixes: c969c6e7ab ("net: hns: Fix object reference leaks in hns_dsaf_roce_reset()")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo says:
====================
wireless-drivers fixes for 5.0
Hopefully the last set of fixes for 5.0, only fix this time.
mt76
* fix regression with resume on mt76x0u USB devices
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We are saving the status of EEE even before we try to enable it. This
leads to a race with XMIT function that tries to arm EEE timer before we
set it up.
Fix this by only saving the EEE parameters after all operations are
performed with success.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: d765955d2a ("stmmac: add the Energy Efficient Ethernet support")
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kalderon says:
====================
qed: iWARP - fix some syn related issues.
This series fixes two bugs related to iWARP syn processing flow.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ll2 forwards all syn packets to the driver without validating the mac
address. Add validation check in the driver's iWARP listener flow and drop
the packet if it isn't intended for the device.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The assumption that the maximum size of a syn packet is 128 bytes
is wrong. Tunneling headers were not accounted for.
Allocate buffers large enough for mtu.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The compound IOMMU group rework moved iommu_register_group() together
in pnv_pci_ioda_setup_iommu_api() (which is a part of
ppc_md.pcibios_fixup). As the result, pnv_ioda_setup_bus_iommu_group()
does not create groups any more, it only adds devices to groups.
This works fine for boot time devices. However IOMMU groups for
SRIOV's VFs were added by pnv_ioda_setup_bus_iommu_group() so this got
broken: pnv_tce_iommu_bus_notifier() expects a group to be registered
for VF and it is not.
This adds missing group registration and adds a NULL pointer check
into the bus notifier so we won't crash if there is no group, although
it is not expected to happen now because of the change above.
Example oops seen prior to this patch:
$ echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs
Unable to handle kernel paging request for data at address 0x00000030
Faulting instruction address: 0xc0000000004a6018
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
CPU: 46 PID: 7006 Comm: bash Not tainted 4.15-ish
NIP: c0000000004a6018 LR: c0000000004a6014 CTR: 0000000000000000
REGS: c000008fc876b400 TRAP: 0300 Not tainted (4.15-ish)
MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
CFAR: c000000000d0be20 DAR: 0000000000000030 DSISR: 40000000 SOFTE: 1
...
NIP sysfs_do_create_link_sd.isra.0+0x68/0x150
LR sysfs_do_create_link_sd.isra.0+0x64/0x150
Call Trace:
pci_dev_type+0x0/0x30 (unreliable)
iommu_group_add_device+0x8c/0x600
iommu_add_device+0xe8/0x180
pnv_tce_iommu_bus_notifier+0xb0/0xf0
notifier_call_chain+0x9c/0x110
blocking_notifier_call_chain+0x64/0xa0
device_add+0x524/0x7d0
pci_device_add+0x248/0x450
pci_iov_add_virtfn+0x294/0x3e0
pci_enable_sriov+0x43c/0x580
mlx5_core_sriov_configure+0x15c/0x2f0 [mlx5_core]
sriov_numvfs_store+0x180/0x240
dev_attr_store+0x3c/0x60
sysfs_kf_write+0x64/0x90
kernfs_fop_write+0x1ac/0x240
__vfs_write+0x3c/0x70
vfs_write+0xd8/0x220
SyS_write+0x6c/0x110
system_call+0x58/0x6c
Fixes: 0bd971676e ("powerpc/powernv/npu: Add compound IOMMU groups")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reported-by: Santwana Samantray <santwana.samantray@in.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 8099b047ec ("exec: load_script: don't blindly truncate
shebang string") was trying to protect against a confused exec of a
truncated interpreter path. However, it was overeager and also refused
to truncate arguments as well, which broke userspace, and it was
reverted. This attempts the protection again, but allows arguments to
remain truncated. In an effort to improve readability, helper functions
and comments have been added.
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Samuel Dionne-Riel <samuel@dionne-riel.com>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Graham Christensen <graham@grahamc.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RTL8153-BD is used in Dell DA300 type-C dongle.
It should be added to the whitelist of devices to activate MAC address
pass through.
Per confirming with Realtek all devices containing RTL8153-BD should
activate MAC pass through and there won't use pass through bit on efuse
like in RTL8153-AD.
Signed-off-by: David Chen <david.chen7@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
spin_lock_bh() is used in table_path_del() but rcu_read_unlock()
is used for unlocking. Fix it by using spin_unlock_bh() instead
of rcu_read_unlock() in the error handling case.
Fixes: b4c3fbe636 ("mac80211: Use linked list instead of rhashtable walk for mesh tables")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff
makes process unkillable. The problem is that when CONFIG_PREEMPT is
enabled, we never see need_resched() return true. This is due to the
fact that preempt_enable() (which we do in bpf_test_run_one on each
iteration) now handles resched if it's needed.
Let's disable preemption for the whole run, not per test. In this case
we can properly see whether resched is needed.
Let's also properly return -EINTR to the userspace in case of a signal
interrupt.
See recent discussion:
http://lore.kernel.org/netdev/CAH3MdRWHr4N8jei8jxDppXjmw-Nw=puNDLbu1dQOFQHxfU2onA@mail.gmail.com
I'll follow up with the same fix bpf_prog_test_run_flow_dissector in
bpf-next.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
commit da215354eb ("ASoC: simple-card: merge simple-scu-card")
merged simple-card and simple-scu-card. Then it had refcount
underflow bug. This patch fixup it.
We will get below error without this patch.
OF: ERROR: Bad of_node_put() on /sound
CPU: 3 PID: 237 Comm: kworker/3:1 Not tainted 5.0.0-rc6+ #1514
Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
Workqueue: events deferred_probe_work_func
Call trace:
dump_backtrace+0x0/0x150
show_stack+0x24/0x30
dump_stack+0xb0/0xec
of_node_release+0xd0/0xd8
kobject_put+0x74/0xe8
of_node_put+0x24/0x30
__of_get_next_child+0x50/0x70
of_get_next_child+0x40/0x68
asoc_simple_card_probe+0x604/0x730
platform_drv_probe+0x58/0xa8
...
Reported-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull mailbox fixes from Jassi Brar:
- API: Fix build breakge by exporting the function mbox_flush
- BRCM: Fix FlexRM ring flush timeout issue
* tag 'mailbox-fixes-v5.0-rc7' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush timeout issue
mailbox: Export mbox_flush()
Pull ARM fixes from Russell King:
"A few ARM fixes:
- Dietmar Eggemann noticed an issue with IRQ migration during CPU
hotplug stress testing.
- Mathieu Desnoyers noticed that a previous fix broke optimised
kprobes.
- Robin Murphy noticed a case where we were not clearing the dma_ops"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8835/1: dma-mapping: Clear DMA ops on teardown
ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction
ARM: 8824/1: fix a migrating irq bug when hotplug cpu
Pull tracing fixes from Steven Rostedt:
"Two more tracing fixes
- Have kprobes not use copy_from_user() to access kernel addresses,
because kprobes can legitimately poke at bad kernel memory, which
will fault. Copy from user code should never fault in kernel space.
Using probe_mem_read() can handle kernel address space faulting.
- Put back the entries counter in the tracing output that was
accidentally removed"
* tag 'trace-v5.0-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix number of entries in trace header
kprobe: Do not use uaccess functions to access kernel memory that can fault
The authorize reply can be empty, for example when the ticket used to
build the authorizer is too old and TAG_BADAUTHORIZER is returned from
the service. Calling ->verify_authorizer_reply() results in an attempt
to decrypt and validate (somewhat) random data in au->buf (most likely
the signature block from calc_signature()), which fails and ends up in
con_fault_finish() with !con->auth_retry. The ticket isn't invalidated
and the connection is retried again and again until a new ticket is
obtained from the monitor:
libceph: osd2 192.168.122.1:6809 bad authorize reply
libceph: osd2 192.168.122.1:6809 bad authorize reply
libceph: osd2 192.168.122.1:6809 bad authorize reply
libceph: osd2 192.168.122.1:6809 bad authorize reply
Let TAG_BADAUTHORIZER handler kick in and increment con->auth_retry.
Cc: stable@vger.kernel.org
Fixes: 5c056fdc5b ("libceph: verify authorize reply on connect")
Link: https://tracker.ceph.com/issues/20164
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
The mbox_flush() function can be used by drivers that are built as
modules, so the function needs to be exported.
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
After commit cc9f8349cb ("arm64: crypto: add NEON accelerated XOR
implementation"), Clang builds for arm64 started failing with the
following error message.
arch/arm64/lib/xor-neon.c:58:28: error: incompatible pointer types
assigning to 'const unsigned long *' from 'uint64_t *' (aka 'unsigned
long long *') [-Werror,-Wincompatible-pointer-types]
v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6));
^~~~~~~~
/usr/lib/llvm-9/lib/clang/9.0.0/include/arm_neon.h:7538:47: note:
expanded from macro 'vld1q_u64'
__ret = (uint64x2_t) __builtin_neon_vld1q_v(__p0, 51); \
^~~~
There has been quite a bit of debate and triage that has gone into
figuring out what the proper fix is, viewable at the link below, which
is still ongoing. Ard suggested disabling this warning with Clang with a
pragma so no neon code will have this type of error. While this is not
at all an ideal solution, this build error is the only thing preventing
KernelCI from having successful arm64 defconfig and allmodconfig builds
on linux-next. Getting continuous integration running is more important
so new warnings/errors or boot failures can be caught and fixed quickly.
Link: https://github.com/ClangBuiltLinux/linux/issues/283
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In valid_user_regs() we treat SSBS as a RES0 bit, and consequently it is
unexpectedly cleared when we restore a sigframe or fiddle with GPRs via
ptrace.
This patch fixes valid_user_regs() to account for this, updating the
function to refer to the latest ARM ARM (ARM DDI 0487D.a). For AArch32
tasks, SSBS appears in bit 23 of SPSR_EL1, matching its position in the
AArch32-native PSR format, and we don't need to translate it as we have
to for DIT.
There are no other bit assignments that we need to account for today.
As the recent documentation describes the DIT bit, we can drop our
comment regarding DIT.
While removing SSBS from the RES0 masks, existing inconsistent
whitespace is corrected.
Fixes: d71be2b6c0 ("arm64: cpufeature: Detect SSBS and advertise to userspace")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
After moving an XFRM interface to another namespace it stays associated
with the original namespace (net in `struct xfrm_if` and the list keyed
with `xfrmi_net_id`), allowing processes in the new namespace to use
SAs/policies that were created in the original namespace. For instance,
this allows a keying daemon in one namespace to establish IPsec SAs for
other namespaces without processes there having access to the keys or IKE
credentials.
This worked fine for outbound traffic, however, for inbound traffic the
lookup for the interfaces and the policies used the incorrect namespace
(the one the XFRM interface was moved to).
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Alexander Duyck says:
====================
Address recent issues found in netdev page_frag_alloc usage
This patch set addresses a couple of issues that I had pointed out to Jann
Horn in response to a recent patch submission.
The first issue is that I wanted to avoid the need to read/modify/write the
size value in order to generate the value for pagecnt_bias. Instead we can
just use a fixed constant which reduces the need for memory read operations
and the overall number of instructions to update the pagecnt bias values.
The other, and more important issue is, that apparently we were letting tun
access the napi_alloc_cache indirectly through netdev_alloc_frag and as a
result letting it create unaligned accesses via unaligned allocations. In
order to prevent this I have added a call to SKB_DATA_ALIGN for the fragsz
field so that we will keep the offset in the napi_alloc_cache
SMP_CACHE_BYTES aligned.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses the fact that there are drivers, specifically tun,
that will call into the network page fragment allocators with buffer sizes
that are not cache aligned. Doing this could result in data alignment
and DMA performance issues as these fragment pools are also shared with the
skb allocator and any other devices that will use napi_alloc_frags or
netdev_alloc_frags.
Fixes: ffde7328a3 ("net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces the size + 1 value introduced with the recent fix for 1
byte allocs with a constant value.
The idea here is to reduce code overhead as the previous logic would have
to read size into a register, then increment it, and write it back to
whatever field was being used. By using a constant we can avoid those
memory reads and arithmetic operations in favor of just encoding the
maximum value into the operation itself.
Fixes: 2c2ade8174 ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs")
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
tcp: fix possible crash in tcp_v4_err()
soukjin bae reported a crash in tcp_v4_err() that we
root caused to a missing initialization.
Second patch adds a sanity check in tcp_v4_err() to avoid
future potential problems. Ignoring an ICMP message
is probably better than crashing a machine.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ICMP handlers are not very often stressed, we should
make them more resilient to bugs that might surface in
the future.
If there is no packet in retransmit queue, we should
avoid a NULL deref.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: soukjin bae <soukjin.bae@samsung.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
soukjin bae reported a crash in tcp_v4_err() handling
ICMP_DEST_UNREACH after tcp_write_queue_head(sk)
returned a NULL pointer.
Current logic should have prevented this :
if (seq != tp->snd_una || !icsk->icsk_retransmits ||
!icsk->icsk_backoff || fastopen)
break;
Problem is the write queue might have been purged
and icsk_backoff has not been cleared.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: soukjin bae <soukjin.bae@samsung.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If mv643xx_eth_shared_of_probe() fails, mv643xx_eth_shared_probe()
leaves clk enabled.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 1199:68C0 USB ID is reused by Sierra WP7607 which requires the DTR
quirk to be detected. Apply QMI_QUIRK_SET_DTR unconditionally as
already done for other IDs shared between different devices.
Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
GMAC IP is little-endian and used on several kind of CPU (big or little
endian). Main callbacks functions of the stmmac drivers take care about
it. It was not the case for dwmac4_get_timestamp function.
Fixes: ba1ffd74df ("stmmac: fix PTP support for GMAC4")
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MSG_ZEROCOPY implementation for UDP was merged in v5.0,
6e360f7331 ("Merge branch 'udp-msg_zerocopy'").
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
As linux-5.0.x is coming up soon, the documentation should match,
in particular the README.rst file, so change all 4.x references
accordingly. There was a mix of lowercase and uppercase X here,
which I changed to using lowercase consistently.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Fix the mismatch between the "sdxc_d13_1_a" pin group definition from
meson8b_cbus_groups and the entry in sdxc_a_groups ("sdxc_d0_13_1_a").
This makes it possible to use "sdxc_d13_1_a" in device-tree files to
route the MMC data 1..3 pins to GPIOX_1..3.
Fixes: 0fefcb6876 ("pinctrl: Add support for Meson8b")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The function-local variable "delay" enters the loop interpreted as delay
in bits. However, inside the loop it gets overwritten by the result of
mlxsw_sp_pg_buf_delay_get(), and thus leaves the loop as quantity in
cells. Thus on second and further loop iterations, the headroom for a
given priority is configured with a wrong size.
Fix by introducing a loop-local variable, delay_cells. Rename thres to
thres_cells for consistency.
Fixes: f417f04da5 ("mlxsw: spectrum: Refactor port buffer configuration")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull EFI fixes from Ingo Molnar:
"This tree reverts a GICv3 commit (which was broken) and fixes it in
another way, by adding a memblock build-time entries quirk for ARM64"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/arm: Revert "Defer persistent reservations until after paging_init()"
arm64, mm, efi: Account for GICv3 LPI tables in static memblock reserve table
Pull x86 fixes from Ingo Molnar:
"Three changes:
- An UV fix/quirk to pull UV BIOS calls into the efi_runtime_lock
locking regime. (This done by aliasing __efi_uv_runtime_lock to
efi_runtime_lock, which should make the quirk nature obvious and
maintain the general policy that the EFI lock (name...) isn't
exposed to drivers.)
- Our version of MAGA: Make a.out Great Again.
- Add a new Intel model name enumerator to an upstream header to help
reduce dependencies going forward"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls
x86/CPU: Add Icelake model number
x86/a.out: Clear the dump structure initially
Pull perf fixes from Ingo Molnar:
"Two fixes on the kernel side: fix an over-eager condition that failed
larger perf ring-buffer sizes, plus fix crashes in the Intel BTS code
for a corner case, found by fuzzing"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix impossible ring-buffer sizes warning
perf/x86: Add check_period PMU callback
Pull powerpc fix from Michael Ellerman:
"Just one fix, for pgd/pud_present() which were broken on big endian
since v4.20, leading to possible data corruption.
Thanks to: Aneesh Kumar K.V., Erhard F., Jan Kara"
* tag 'powerpc-5.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix possible corruption on big endian due to pgd/pud_present()
Pull arch/csky fixes from Guo Ren:
"Here are some fixup patches for 5.0-rc6"
* tag 'csky-for-linus-5.0-rc6' of git://github.com/c-sky/csky-linux:
csky: Fixup dead loop in show_stack
csky: Fixup io-range page attribute for mmap("/dev/mem")
csky: coding convention: Use task_stack_page
csky: Fixup wrong pt_regs size
csky: Fixup _PAGE_GLOBAL bit for 610 tlb entry
Pull i2c fixes from Wolfram Sang:
"Two more driver bugfixes"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: bcm2835: Clear current buffer pointers and counts after a transfer
i2c: cadence: Fix the hold bit setting
Pull input fixes from Dmitry Torokhov:
- tweaks to Elan drivers (both PS/2 and I2C) to support new devices.
Also revert of one of IDs as that device should really be driven by
i2c-hid + hid-multitouch
- a few drivers have been switched to set_brightness_blocking() call
because they either were sleeping the their set_brightness()
implementation or used workqueue but were not canceling it on unbind.
- ps2-gpio and matrix_keypad needed to [properly] flush their works to
avoid potential use-after-free on unbind.
- other miscellaneous fixes.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c - add ACPI ID for touchpad in Lenovo V330-15ISK
Input: st-keyscan - fix potential zalloc NULL dereference
Input: apanel - switch to using brightness_set_blocking()
Revert "Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G"
Input: qt2160 - switch to using brightness_set_blocking()
Input: matrix_keypad - use flush_delayed_work()
Input: ps2-gpio - flush TX work when closing port
Input: cap11xx - switch to using set_brightness_blocking()
Input: elantech - enable 3rd button support on Fujitsu CELSIUS H780
Input: bma150 - register input device after setting private data
Input: pwm-vibra - stop regulator after disabling pwm, not before
Input: pwm-vibra - prevent unbalanced regulator
Input: snvs_pwrkey - allow selecting driver for i.MX 7D
Pull KVM fixes from Paolo Bonzini:
"A somewhat bigger ARM update, and the usual smattering of x86 bug
fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: vmx: Fix entry number check for add_atomic_switch_msr()
KVM: x86: Recompute PID.ON when clearing PID.SN
KVM: nVMX: Restore a preemption timer consistency check
x86/kvm/nVMX: read from MSR_IA32_VMX_PROCBASED_CTLS2 only when it is available
KVM: arm64: Forbid kprobing of the VHE world-switch code
KVM: arm64: Relax the restriction on using stage2 PUD huge mapping
arm: KVM: Add missing kvm_stage2_has_pmd() helper
KVM: arm/arm64: vgic: Always initialize the group of private IRQs
arm/arm64: KVM: Don't panic on failure to properly reset system registers
arm/arm64: KVM: Allow a VCPU to fully reset itself
KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded
arm64: KVM: Don't generate UNDEF when LORegion feature is present
KVM: arm/arm64: vgic: Make vgic_cpu->ap_list_lock a raw_spinlock
KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlock
KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlock
Alexei Starovoitov says:
====================
pull-request: bpf 2019-02-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix lockdep false positive in bpf_get_stackid(), from Alexei.
2) several AF_XDP fixes, from Bjorn, Magnus, Davidlohr.
3) fix narrow load from struct bpf_sock, from Martin.
4) mips JIT fixes, from Paul.
5) gso handling fix in bpf helpers, from Willem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that LEDs core allows "blocking" flavor of "set brightness" method we
can use it and get rid of private work item. As a bonus, we are no longer
forgetting to cancel it when we unbind the driver.
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
In v4.20 we changed our pgd/pud_present() to check for _PAGE_PRESENT
rather than just checking that the value is non-zero, e.g.:
static inline int pgd_present(pgd_t pgd)
{
- return !pgd_none(pgd);
+ return (pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
}
Unfortunately this is broken on big endian, as the result of the
bitwise & is truncated to int, which is always zero because
_PAGE_PRESENT is 0x8000000000000000ul. This means pgd_present() and
pud_present() are always false at compile time, and the compiler
elides the subsequent code.
Remarkably with that bug present we are still able to boot and run
with few noticeable effects. However under some work loads we are able
to trigger a warning in the ext4 code:
WARNING: CPU: 11 PID: 29593 at fs/ext4/inode.c:3927 .ext4_set_page_dirty+0x70/0xb0
CPU: 11 PID: 29593 Comm: debugedit Not tainted 4.20.0-rc1 #1
...
NIP .ext4_set_page_dirty+0x70/0xb0
LR .set_page_dirty+0xa0/0x150
Call Trace:
.set_page_dirty+0xa0/0x150
.unmap_page_range+0xbf0/0xe10
.unmap_vmas+0x84/0x130
.unmap_region+0xe8/0x190
.__do_munmap+0x2f0/0x510
.__vm_munmap+0x80/0x110
.__se_sys_munmap+0x14/0x30
system_call+0x5c/0x70
The fix is simple, we need to convert the result of the bitwise & to
an int before returning it.
Thanks to Erhard, Jan Kara and Aneesh for help with debugging.
Fixes: da7ad366b4 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Erhard F. <erhard_f@mailbox.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull ARM SoC fixes from Arnd Bergmann:
"This week is a much smaller update, containing fixes only for TI OMAP,
NXP i.MX and Rockchips platforms:
omap:
- omap4 had problems with lost timer interrupts
- another IRQ handling issue with OMAP5
- A workaround for a regression in the pwm-omap-dmtimer driver
NXP i.MX:
- eMMC was broken on the new imx8mq-evk board
Rockchip:
- a fix for new dtc graph warnings and a regulator fix for rock64
- USB support broke on rk3328-rock64"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: OMAP2+: fix lack of timer interrupts on CPU1 after hotplug
arm64: dts: imx8mq: Fix boot from eMMC
ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
ARM: dts: Configure clock parent for pwm vibra
bus: ti-sysc: Fix timer handling with drop pm_runtime_irq_safe()
arm64: dts: rockchip: enable usb-host regulators at boot on rk3328-rock64
arm64: dts: rockchip: fix graph_port warning on rk3399 bob kevin and excavator
ARM: OMAP5+: Fix inverted nirq pin interrupts with irq_set_type
clocksource: timer-ti-dm: Fix pwm dmtimer usage of fck reparenting
ARM: dts: rockchip: remove qos_cif1 from rk3188 power-domain
Pull more nfsd fixes from Bruce Fields:
"Two small fixes, one for crashes using nfs/krb5 with older enctypes,
one that could prevent clients from reclaiming state after a kernel
upgrade"
* tag 'nfsd-5.0-2' of git://linux-nfs.org/~bfields/linux:
sunrpc: fix 4 more call sites that were using stack memory with a scatterlist
Revert "nfsd4: return default lease period"
Pull more NFS client fixes from Anna Schumaker:
"Three fixes this time.
Nicolas's is for xprtrdma completion vector allocation on single-core
systems. Greg's adds an error check when allocating a debugfs dentry.
And Ben's is an additional fix for nfs_page_async_flush() to prevent
pages from accidentally getting truncated.
Summary:
- Make sure Send CQ is allocated on an existing compvec
- Properly check debugfs dentry before using it
- Don't use page_file_mapping() after removing a page"
* tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Don't use page_file_mapping after removing the page
rpc: properly check debugfs dentry before using it
xprtrdma: Make sure Send CQ is allocated on an existing compvec
Pull auxdisplay fix from Miguel Ojeda:
"Fix potential user-after-free on ht16k33 module unload. Reported by
Sven Van Asbroeck"
* tag 'auxdisplay-for-linus-v5.0-rc7' of git://github.com/ojeda/linux:
auxdisplay: ht16k33: fix potential user-after-free on module unload
Pull compiler attributes fixes from Miguel Ojeda:
"Clean the new GCC 9 -Wmissing-attributes warnings
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target, e.g.:
void __cold f(void) {}
void __alias("f") g(void);
diagnoses:
warning: 'g' specifies less restrictive attribute than
its target 'f': 'cold' [-Wmissing-attributes]
These patch series clean these new warnings. Most of them are caused
by the module_init/exit macros"
Link: https://lore.kernel.org/lkml/20190125104353.2791-1-labbott@redhat.com/
* tag 'compiler-attributes-for-linus-v5.0-rc7' of git://github.com/ojeda/linux:
include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
Compiler Attributes: add support for __copy (gcc >= 9)
lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure
In the irqchip and EFI code, we have what basically amounts to a quirk
to work around a peculiarity in the GICv3 architecture, which permits
the system memory address of LPI tables to be programmable only once
after a CPU reset. This means kexec kernels must use the same memory
as the first kernel, and thus ensure that this memory has not been
given out for other purposes by the time the ITS init code runs, which
is not very early for secondary CPUs.
On systems with many CPUs, these reservations could overflow the
memblock reservation table, and this was addressed in commit:
eff8962888 ("efi/arm: Defer persistent reservations until after paging_init()")
However, this turns out to have made things worse, since the allocation
of page tables and heap space for the resized memblock reservation table
itself may overwrite the regions we are attempting to reserve, which may
cause all kinds of corruption, also considering that the ITS will still
be poking bits into that memory in response to incoming MSIs.
So instead, let's grow the static memblock reservation table on such
systems so it can accommodate these reservations at an earlier time.
This will permit us to revert the above commit in a subsequent patch.
[ mingo: Minor cleanups. ]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190215123333.21209-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When CONFIG_IP_VS_IPV6 is not defined, build produced this warning:
net/netfilter/ipvs/ip_vs_ctl.c:899:6: warning: unused variable ‘ret’ [-Wunused-variable]
int ret = 0;
^~~
Fix this by moving the declaration of 'ret' in the CONFIG_IP_VS_IPV6
section in the same function.
While at it, drop its unneeded initialisation.
Fixes: 098e13f5b2 ("ipvs: fix dependency on nf_defrag_ipv6")
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Fainelli says:
====================
net: dsa: b53: VLAN and L2 fixes
This patch series contains a collection of fixes to the b53 driver in
order to:
- consistently program the same default VLAN ID when a port is bridged
or not
- properly account for VLAN filtering being turned on/off and turning
on ingress VID checking accordingly
- have SYSTEMPORT properly forward BPDU frames to the network stack
(which it did not)
- do not assume that WoL is supported by the DSA master network device
we are connected to
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The CPU port is special and does not need to obey VLAN restrictions as
far as untagged traffic goes, also, having the CPU port be part of a
particular PVID is against the idea of keeping it tagged in all VLANs.
Fixes: ca89319483 ("net: dsa: b53: Keep CPU port as tagged in all VLANs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We assume in the bcm_sf2 driver that the DSA master network device
supports ethtool_ops::{get,set}_wol operations, which is not a given.
Avoid de-referencing potentially non-existent function pointers and
check them as we should.
Fixes: 96e65d7f3f ("net: dsa: bcm_sf2: add support for Wake-on-LAN")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYSTEMPORT has its RXCHK parser block that attempts to validate the
packet structures, unfortunately setting the L2 header check bit will
cause Bridge PDUs (BPDUs) to be incorrectly rejected because they look
like LLC/SNAP packets with a non-IPv4 or non-IPv6 Ethernet Type.
Fixes: 4e8aedfe78c7 ("net: systemport: Turn on offloads by default")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VLAN filtering can be built into the kernel, and also dynamically turned
on/off through the bridge master device. Allow re-configuring the switch
appropriately to account for that by deciding whether VLAN table
(v_table) misses should lead to a drop or forward.
Fixes: a2482d2ce3 ("net: dsa: b53: Plug in VLAN support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were not consistent in how the default VID of a given port was
defined, b53_br_leave() would make sure the VLAN ID would be either 0/1
depending on the switch generation, but b53_configure_vlan(), which is
the default configuration would unconditionally set it to 1. The correct
value is 1 for 5325/5365 series and 0 otherwise. To avoid repeating that
mistake ever again, introduce a helper function: b53_default_pvid() to
factor that out.
Fixes: 967dd82ffc ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzkaller again found a path to a kernel crash through bad gso input.
By building an excessively large packet to cause an skb field to wrap.
If VIRTIO_NET_HDR_F_NEEDS_CSUM was set this would have been dropped in
skb_partial_csum_set.
GSO packets that do not set checksum offload are suspicious and rare.
Most callers of virtio_net_hdr_to_skb already pass them to
skb_probe_transport_header.
Move that test forward, change it to detect parse failure and drop
packets on failure as those cleary are not one of the legitimate
VIRTIO_NET_HDR_GSO types.
Fixes: bfd5f4a3d6 ("packet: Add GSO/csum offload support.")
Fixes: f43798c276 ("tun: Allow GSO using virtio_net_hdr")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The features attribute is of type u64 and stored in the native endianes on
the system. The for_each_set_bit() macro takes a pointer to a 32 bit array
and goes over the bits in this area. On little Endian systems this also
works with an u64 as the most significant bit is on the highest address,
but on big endian the words are swapped. When we expect bit 15 here we get
bit 47 (15 + 32).
This patch converts it more or less to its own for_each_set_bit()
implementation which works on 64 bit integers directly. This is then
completely in host endianness and should work like expected.
Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Hauke Mehrtens <hauke.mehrtens@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some PHY drivers like the generic one do not provide a read_status
callback on their own but rely on genphy_read_status being called
directly.
With the current code, this results in a NULL function pointer call.
Call genphy_read_status instead when there is no specific callback.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit c706863bc8 ("net: ip6_gre: always reports o_key to
userspace"), ip6gre and ip6gretap tunnels started reporting TUNNEL_KEY
output flag even if it is not configured.
ip6gre_fill_info checks erspan_ver value to add TUNNEL_KEY for
erspan tunnels, however in commit 84581bdae9 ("erspan: set
erspan_ver to 1 by default when adding an erspan dev")
erspan_ver is initialized to 1 even for ip6gre or ip6gretap
Fix the issue moving erspan_ver initialization in a dedicated routine
Fixes: c706863bc8 ("net: ip6_gre: always reports o_key to userspace")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
Just a few fixes this time:
* mesh rhashtable fixes from Herbert
* a small error path fix when starting AP interfaces
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Up to 4.12, __scsi_error_from_host_byte() would reset the host byte to
DID_OK for various cases including DID_NEXUS_FAILURE. Commit
2a842acab1 ("block: introduce new block status code type") replaced this
function with scsi_result_to_blk_status() and removed the host-byte
resetting code for the DID_NEXUS_FAILURE case. As the line
set_host_byte(cmd, DID_OK) was preserved for the other cases, I suppose
this was an editing mistake.
The fact that the host byte remains set after 4.13 is causing problems with
the sg_persist tool, which now returns success rather then exit status 24
when a RESERVATION CONFLICT error is encountered.
Fixes: 2a842acab1 "block: introduce new block status code type"
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The function sd_zbc_do_report_zones() issues a REPORT ZONES command with a
buffer size calculated based on the number of zones requested by the
caller. This value should however not exceed the capabilities of the
hardware maximum command size, that is, should not exceed the
max_hw_sectors limit of the device. This problem leads to failures of
report zones commands when re-validating disks with some SAS HBAs.
Fix this by limiting a report zone command buffer size to the minimum of
the device max_hw_sectors and calculated value based on the requested
number of zones. This does not change the semantic of the report_zones file
operation as report zones can always return less zone reports than
requested. Short reports are handled using a loop execution of the
report_zones file operation in the function blk_report_zones().
[Damien]
Before patch 'e76239a3748c ("block: add a report_zones method")', report
zones buffer allocation was limited to max_sectors when allocated in
blk_report_zones(). This however does not consider the actual format of the
device reply which is interface dependent. Limiting the allocation based
on the size of the expected reply format rather than the size of the array
of generic sturct blkzone passed by blk_report_zones() makes more sense.
Fixes: e76239a374 ("block: add a report_zones method")
Cc: stable@vger.kernel.org
Signed-off-by: Masato Suzuki <masato.suzuki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
REG_32BIT_ZERO_EX and REG_64BIT are always handled in exactly the same
way, and reg_val_propagate_range() never actually sets any register to
type REG_32BIT_ZERO_EX.
Remove the redundant & unused REG_32BIT_ZERO_EX.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The function prototype used to call JITed eBPF code (ie. the type of the
struct bpf_prog bpf_func field) returns an unsigned int. The MIPS n64
ABI that MIPS64 kernels target defines that 32 bit integers should
always be sign extended when passed in registers as either arguments or
return values.
This means that when returning any value which may not already be sign
extended (ie. of type REG_64BIT or REG_32BIT_ZERO_EX) we need to perform
that sign extension in order to comply with the n64 ABI. Without this we
see strange looking test failures from test_bpf.ko, such as:
test_bpf: #65 ALU64_MOV_X:
dst = 4294967295 jited:1 ret -1 != -1 FAIL (1 times)
Although the return value printed matches the expected value, this is
only because printf is only examining the least significant 32 bits of
the 64 bit register value we returned. The register holding the expected
value is sign extended whilst the v0 register was set to a zero extended
value by our JITed code, so when compared by a conditional branch
instruction the values are not equal.
We already handle this when the return value register is of type
REG_32BIT_ZERO_EX, so simply extend this to also cover REG_64BIT.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: b6bd53f9c4 ("MIPS: Add missing file for eBPF JIT.")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Set the timestamp on new keys rather than leaving it unset.
Fixes: 31d5a79d7f ("KEYS: Do LRU discard in full keyrings")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
In the request_key() upcall mechanism there's a dependency loop by which if
a key type driver overrides the ->request_key hook and the userspace side
manages to lose the authorisation key, the auth key and the internal
construction record (struct key_construction) can keep each other pinned.
Fix this by the following changes:
(1) Killing off the construction record and using the auth key instead.
(2) Including the operation name in the auth key payload and making the
payload available outside of security/keys/.
(3) The ->request_key hook is given the authkey instead of the cons
record and operation name.
Changes (2) and (3) allow the auth key to naturally be cleaned up if the
keyring it is in is destroyed or cleared or the auth key is unlinked.
Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Fix the creation of shortcuts for which the length of the index key value
is an exact multiple of the machine word size. The problem is that the
code that blanks off the unused bits of the shortcut value malfunctions if
the number of bits in the last word equals machine word size. This is due
to the "<<" operator being given a shift of zero in this case, and so the
mask that should be all zeros is all ones instead. This causes the
subsequent masking operation to clear everything rather than clearing
nothing.
Ordinarily, the presence of the hash at the beginning of the tree index key
makes the issue very hard to test for, but in this case, it was encountered
due to a development mistake that caused the hash output to be either 0
(keyring) or 1 (non-keyring) only. This made it susceptible to the
keyctl/unlink/valid test in the keyutils package.
The fix is simply to skip the blanking if the shift would be 0. For
example, an index key that is 64 bits long would produce a 0 shift and thus
a 'blank' of all 1s. This would then be inverted and AND'd onto the
index_key, incorrectly clearing the entire last word.
Fixes: 3cb989501c ("Add a generic associative array implementation.")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
If the sysctl 'kernel.keys.maxkeys' is set to some number n, then
actually users can only add up to 'n - 1' keys. Likewise for
'kernel.keys.maxbytes' and the root_* versions of these sysctls. But
these sysctls are apparently supposed to be *maximums*, as per their
names and all documentation I could find -- the keyrings(7) man page,
Documentation/security/keys/core.rst, and all the mentions of EDQUOT
meaning that the key quota was *exceeded* (as opposed to reached).
Thus, fix the code to allow reaching the quotas exactly.
Fixes: 0b77f5bfb4 ("keys: make the keyring quotas controllable through /proc/sys")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Pull SCSI fixes from James Bottomley:
"Two fairly small fixes: the qla one is a panic inducing use after free
and the entropy fix may seem minor but it has had huge userspace
impact thanks to an unrelated change in openssl that causes sshd to
refuse logins until it has enough entropy for the session keys, which
causes tens of minutes delay before the affected systems allow logins
after reboot"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix panic from use after free in qla2x00_async_tm_cmd
scsi: sd: fix entropy gathering for most rotational disks
While trying to reproduce a reported kernel panic on arm64, I discovered
that AUTH_GSS basically doesn't work at all with older enctypes on arm64
systems with CONFIG_VMAP_STACK enabled. It turns out there still a few
places using stack memory with scatterlists, causing krb5_encrypt() and
krb5_decrypt() to produce incorrect results (or a BUG if CONFIG_DEBUG_SG
is enabled).
Tested with cthon on v4.0/v4.1/v4.2 with krb5/krb5i/krb5p using
des3-cbc-sha1 and arcfour-hmac-md5.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Fix omap4 and later lost cpu1 interrupts for periodic timer
A fix from Russell that took a while to get applied into fixes as
I thought Russell is merging this one.
* tag 'omap-for-v5.0/fixes-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: fix lack of timer interrupts on CPU1 after hotplug
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.
In particular, it triggers for all the init/cleanup_module
aliases in the kernel (defined by the module_init/exit macros),
ending up being very noisy.
These aliases point to the __init/__exit functions of a module,
which are defined as __cold (among other attributes). However,
the aliases themselves do not have the __cold attribute.
Since the compiler behaves differently when compiling a __cold
function as well as when compiling paths leading to calls
to __cold functions, the warning is trying to point out
the possibly-forgotten attribute in the alias.
In order to keep the warning enabled, we decided to silence
this case. Ideally, we would mark the aliases directly
as __init/__exit. However, there are currently around 132 modules
in the kernel which are missing __init/__exit in their init/cleanup
functions (either because they are missing, or for other reasons,
e.g. the functions being called from somewhere else); and
a section mismatch is a hard error.
A conservative alternative was to mark the aliases as __cold only.
However, since we would like to eventually enforce __init/__exit
to be always marked, we chose to use the new __copy function
attribute (introduced by GCC 9 as well to deal with this).
With it, we copy the attributes used by the target functions
into the aliases. This way, functions that were not marked
as __init/__exit won't have their aliases marked either,
and therefore there won't be a section mismatch.
Note that the warning would go away marking either the extern
declaration, the definition, or both. However, we only mark
the definition of the alias, since we do not want callers
(which only see the declaration) to be compiled as if the function
was __cold (and therefore the paths leading to those calls
would be assumed to be unlikely).
Link: https://lore.kernel.org/lkml/20190123173707.GA16603@gmail.com/
Link: https://lore.kernel.org/lkml/20190206175627.GA20399@gmail.com/
Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
From the GCC manual:
copy
copy(function)
The copy attribute applies the set of attributes with which function
has been declared to the declaration of the function to which
the attribute is applied. The attribute is designed for libraries
that define aliases or function resolvers that are expected
to specify the same set of attributes as their targets. The copy
attribute can be used with functions, variables, or types. However,
the kind of symbol to which the attribute is applied (either
function or variable) must match the kind of symbol to which
the argument refers. The copy attribute copies only syntactic and
semantic attributes but not attributes that affect a symbol’s
linkage or visibility such as alias, visibility, or weak.
The deprecated attribute is also not copied.
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target, e.g.:
void __cold f(void) {}
void __alias("f") g(void);
diagnoses:
warning: 'g' specifies less restrictive attribute than
its target 'f': 'cold' [-Wmissing-attributes]
Using __copy(f) we can copy the __cold attribute from f to g:
void __cold f(void) {}
void __copy(f) __alias("f") g(void);
This attribute is most useful to deal with situations where an alias
is declared but we don't know the exact attributes the target has.
For instance, in the kernel, the widely used module_init/exit macros
define the init/cleanup_module aliases, but those cannot be marked
always as __init/__exit since some modules do not have their
functions marked as such.
Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.
In particular, it triggers here because crc32_le_base/__crc32c_le_base
aren't __pure while their target crc32_le/__crc32c_le are.
These aliases are used by architectures as a fallback in accelerated
versions of CRC32. See commit 9784d82db3 ("lib/crc32: make core crc32()
routines weak so they can be overridden").
Therefore, being fallbacks, it is likely that even if the aliases
were called from C, there wouldn't be any optimizations possible.
Currently, the only user is arm64, which calls this from asm.
Still, marking the aliases as __pure makes sense and is a good idea
for documentation purposes and possible future optimizations,
which also silences the warning.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
On module unload/remove, we need to ensure that work does not run
after we have freed resources. Concretely, cancel_delayed_work()
may return while the callback function is still running.
From kernel/workqueue.c:
The work callback function may still be running on return,
unless it returns true and the work doesn't re-arm itself.
Explicitly flush or use cancel_delayed_work_sync() to wait on it.
Link: https://lore.kernel.org/lkml/20190204220952.30761-1-TheSven73@googlemail.com/
Reported-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Acked-by: Robin van der Gracht <robin@protonic.nl>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
The following commit
441dae8f2f ("tracing: Add support for display of tgid in trace output")
removed the call to print_event_info() from print_func_help_header_irq()
which results in the ftrace header not reporting the number of entries
written in the buffer. As this wasn't the original intent of the patch,
re-introduce the call to print_event_info() to restore the orginal
behaviour.
Link: http://lkml.kernel.org/r/20190214152950.4179-1-quentin.perret@arm.com
Acked-by: Joel Fernandes <joelaf@google.com>
Cc: stable@vger.kernel.org
Fixes: 441dae8f2f ("tracing: Add support for display of tgid in trace output")
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull block fixes from Jens Axboe:
- Ensure we insert into the hctx dispatch list, if a request is marked
as DONTPREP (Jianchao)
- NVMe pull request, single missing unlock on error fix (Keith)
- MD pull request, single fix for a potentially data corrupting issue
(Nate)
- Floppy check_events regression fix (Yufen)
* tag 'for-linus-20190215' of git://git.kernel.dk/linux-block:
md/raid1: don't clear bitmap bits on interrupted recovery.
floppy: check_events callback should not return a negative number
nvme-pci: add missing unlock for reset error
blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
Pull device mapper fixes from Mike Snitzer:
- Fix bug in DM crypt's sizing of its block integrity tag space,
resulting in less memory use when DM crypt layers on DM integrity.
- Fix a long-standing DM thinp crash consistency bug that was due to
improper handling of FUA. This issue is specific to writes that fill
an entire thinp block which needs to be allocated.
* tag 'for-5.0/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: fix bug where bio that overwrites thin block ignores FUA
dm crypt: don't overallocate the integrity tag space
Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes intended for v5.0-rc7.
MMC core:
- Fix deadlock bug for block I/O requests
MMC host:
- sunxi: Disable broken HS-DDR mode for H5 by default
- sunxi: Avoid unsupported speed modes declared via DT
- meson-gx: Restore interrupt name"
* tag 'mmc-v5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: meson-gx: fix interrupt name
mmc: block: handle complete_work on separate workqueue
mmc: sunxi: Filter out unsupported modes declared in the device tree
mmc: sunxi: Disable HS-DDR mode for H5 eMMC controller by default
Adjust the cq/qp mask based on the number of bar2 pages in a host page.
For user-mode rdma, the granularity of the BAR2 memory mapped to a user
rdma process during queue allocation must be based on the host page
size. The lld attributes udb_density and ucq_density are used to figure
out how many sge contexts are in a bar2 page. So the rdev->qpmask and
rdev->cqmask in iw_cxgb4 need to now be adjusted based on how many sge
bar2 pages are in a host page.
Otherwise the device fails to work on non 4k page size systems.
Fixes: 2391b0030e ("cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Export the sge_host_page_size field to ULDs via cxgb4_lld_info, so that
iw_cxgb4 can make use of this in calculating the correct qp/cq mask.
Fixes: 2391b0030e ("cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull drm fixes from Dave Airlie:
"Usual pull request, little larger than I'd like but nothing too
strange in it. Willy found an bug in the lease ioctl calculations, but
it's a drm master only ioctl which makes it harder to mess with.
i915:
- combo phy programming fix
- opregion version check fix for VBT RVDA lookup
- gem mmap ioctl race fix
- fbdev hpd during suspend fix
- array size bounds check fix in pmu
amdgpu:
- Vega20 psp fix
- Add vrr range to debugfs for freesync debugging
sched:
- Scheduler race fix
vkms:
- license header fixups
imx:
- Fix CSI register offsets for i.MX51 and i.MX53.
- Fix delayed page flip completion events on i.MX6QP due to
unexpected behaviour of the PRE when issuing NOP buffer updates to
the same buffer address.
- Stop throwing errors for plane updates on disabled CRTCs when a
userspace process is killed while a plane update is pending.
- Add missing of_node_put cleanup in imx_ldb_bind"
* tag 'drm-fixes-2019-02-15-1' of git://anongit.freedesktop.org/drm/drm:
drm: Use array_size() when creating lease
drm/amdgpu/psp11: TA firmware is optional (v3)
drm/i915/opregion: rvda is relative from opregion base in opregion 2.1+
drm/i915/opregion: fix version check
drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set
drm/i915: Block fbdev HPD processing during suspend
drm/i915/pmu: Fix enable count array size and bounds checking
drm/i915/cnl: Fix CNL macros for Voltage Swing programming
drm/i915/icl: combo port vswing programming changes per BSPEC
drm/vkms: Fix license inconsistent
drm/amd/display: Expose connector VRR range via debugfs
drm/sched: Always trace the dependencies we wait on, to fix a race.
gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
gpu: ipu-v3: Fix CSI offsets for imx53
drm/imx: imx-ldb: add missing of_node_puts
gpu: ipu-v3: Fix i.MX51 CSI control registers offset
drm/imx: ignore plane updates on disabled crtcs
Pull crypto fix from Herbert Xu:
"This fixes a crash on resume in the ccree driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccree - fix resume race condition on init
Pull networking fixes from David Miller:
1) Fix MAC address setting in mac80211 pmsr code, from Johannes Berg.
2) Probe SFP modules after being attached, from Russell King.
3) Byte ordering bug in SMC rx_curs_confirmed code, from Ursula Braun.
4) Revert some r8169 changes that are causing regressions, from Heiner
Kallweit.
5) Fix spurious connection timeouts in netfilter nat code, from Florian
Westphal.
6) SKB leak in tipc, from Hoang Le.
7) Short packet checkum issue in mlx4, similar to a previous mlx5
change, from Saeed Mahameed. The issue is that whilst padding bytes
are usually zero, it is not guarateed and the hardware doesn't take
the padding bytes into consideration when generating the checksum.
8) Fix various races in cls_tcindex, from Cong Wang.
9) Need to set stream ext to NULL before freeing in SCTP code, from Xin
Long.
10) Fix locking in phy_is_started, from Heiner Kallweit.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
net: ethernet: freescale: set FEC ethtool regs version
net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
net: phy: fix potential race in the phylib state machine
net: phy: don't use locking in phy_is_started
selftests: fix timestamping Makefile
net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
net: fix possible overflow in __sk_mem_raise_allocated()
dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
net: phy: fix interrupt handling in non-started states
sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
net/mlx5e: XDP, fix redirect resources availability check
net/mlx5: Fix a compilation warning in events.c
net/mlx5: No command allowed when command interface is not ready
net/mlx5e: Fix NULL pointer derefernce in set channels error flow
netfilter: nft_compat: use-after-free when deleting targets
team: avoid complex list operations in team_nl_cmd_options_set()
net_sched: fix two more memory leaks in cls_tcindex
net_sched: fix a memory leak in cls_tcindex
...
Pull signal fix from Eric Biederman:
"Just a single patch that restores PTRACE_EVENT_EXIT functionality that
was accidentally broken by last weeks fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Restore the stop PTRACE_EVENT_EXIT
Flush after rule deletion bogusly hits -ENOENT. Skip rules that have
been already from nft_delrule_by_chain() which is always called from the
flush path.
Fixes: cf9dc09d09 ("netfilter: nf_tables: fix missing rules flushing per table")
Reported-by: Phil Sutter <phil@nwl.cc>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
i.MX fixes for 5.0, 3rd round:
It contains a fix for i.MX8MQ EVK board device tree, which makes the
broken eMMC support work as expected.
* tag 'imx-fixes-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx8mq: Fix boot from eMMC
Fix for new dtc graph warnings and a regulator fix for rock64.
* tag 'v5.0-rockchip-dts64fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: enable usb-host regulators at boot on rk3328-rock64
arm64: dts: rockchip: fix graph_port warning on rk3399 bob kevin and excavator
Drop one non-existent component from powerdomain list.
* tag 'v5.0-rockchip-dts32fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: dts: rockchip: remove qos_cif1 from rk3188 power-domain
SoC fixes for omaps for v5.0-rc cycle
This series contains two SoC regression fixes and one uninitialized
variable fix:
- Fix inverted nirq pin handling for omap5 that started producing
warnings with earlier GIC direction checks and took a while to
understand and confirm. Basically there are two sys_nirq pins
that are bypassing peripheral modules and inverted automatically
by the SoC and need to be handled with a custom irq_set_type()
- Recent ti-sysc changes caused a regression to the pwm-omap-dmtimer
code where the device tree handling code for timer source clock
gets confused. It looks like we can remove that code eventually,
but for now we just drop a bogus pm_runtime_irq_safe() for the
timers with the related quirks caused by pm_runtime_irq_safe(),
and have the standard assigned-clocks and assigned-clock-parents
deal with setting the source clock
- Fix potentially uninitialized value for display init code if
regmap_read() fails
* tag 'omap-for-v5.0/fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
ARM: dts: Configure clock parent for pwm vibra
bus: ti-sysc: Fix timer handling with drop pm_runtime_irq_safe()
ARM: OMAP5+: Fix inverted nirq pin interrupts with irq_set_type
clocksource: timer-ti-dm: Fix pwm dmtimer usage of fck reparenting
The starting of AP interface can fail due to invalid
beacon interval, which does not match the minimum gcd
requirement set by the wifi driver. In such case, the
beacon interval of that interface gets updated with
that invalid beacon interval.
The next time that interface is brought up in AP mode,
an interface combination check is performed and the
beacon interval is taken from the previously set value.
In a case where an invalid beacon interval, i.e. a beacon
interval value which does not satisfy the minimum gcd criteria
set by the driver, is set, all the subsequent trials to
bring that interface in AP mode will fail, even if the
subsequent trials have a valid beacon interval.
To avoid this, in case of a failure in bringing up an
interface in AP mode due to interface combination error,
the interface beacon interval which is stored in bss
conf, needs to be restored with the last working value
of beacon interval.
Tested on ath10k using WCN3990.
Cc: stable@vger.kernel.org
Fixes: 0c317a02ca ("cfg80211: support virtual interfaces with different beacon intervals")
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The mesh table code walks over hash tables for two purposes. First of
all it's used as part of a netlink dump process, but it is also used
for looking up entries to delete using criteria other than the hash
key.
The second purpose is directly contrary to the design specification
of rhashtable walks. It is only meant for use by netlink dumps.
This is because rhashtable is resizable and you cannot obtain a
stable walk over it during a resize process.
In fact mesh's use of rhashtable for dumping is bogus too. Rather
than using rhashtable walk's iterator to keep track of the current
position, it always converts the current position to an integer
which defeats the purpose of the iterator.
Therefore this patch converts all uses of rhashtable walk into a
simple linked list.
This patch also adds a new spin lock to protect the hash table
insertion/removal as well as the walk list modifications. In fact
the previous code was buggy as the removals can race with each
other, potentially resulting in a double-free.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The driver's interrupt handler checks whether a message is currently
being handled with the curr_msg pointer. When it is NULL, the interrupt
is considered to be unexpected. Similarly, the i2c_start_transfer
routine checks for the remaining number of messages to handle in
num_msgs.
However, these values are never cleared and always keep the message and
number relevant to the latest transfer (which might be done already and
the underlying message memory might have been freed).
When an unexpected interrupt hits with the DONE bit set, the isr will
then try to access the flags field of the curr_msg structure, leading
to a fatal page fault.
The msg_buf and msg_buf_remaining fields are also never cleared at the
end of the transfer, which can lead to similar pitfalls.
Fix these issues by introducing a cleanup function and always calling
it after a transfer is finished.
Fixes: e247454103 ("i2c: bcm2835: Fix hang for writing messages larger than 16 bytes")
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
In case the hold bit is not needed we are carrying the old values.
Fix the same by resetting the bit when not needed.
Fixes the sporadic i2c bus lockups on National Instruments
Zynq-based devices.
Fixes: df8eb5691c ("i2c: Add driver for Cadence I2C controller")
Reported-by: Kyle Roeschley <kyle.roeschley@ni.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Tested-by: Kyle Roeschley <kyle.roeschley@ni.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Passing an object_count of sufficient size will make
object_count * 4 wrap around to be very small, then a later function
will happily iterate off the end of the object_ids array. Using
array_size() will saturate at SIZE_MAX, the kmalloc() will fail and
we'll return an -ENOMEM to the norty userspace.
Fixes: 62884cd386 ("drm: Add four ioctls for managing drm mode object leases [v7]")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Dave Airlie <airlied@redhat.com>
When provisioning a new data block for a virtual block, either because
the block was previously unallocated or because we are breaking sharing,
if the whole block of data is being overwritten the bio that triggered
the provisioning is issued immediately, skipping copying or zeroing of
the data block.
When this bio completes the new mapping is inserted in to the pool's
metadata by process_prepared_mapping(), where the bio completion is
signaled to the upper layers.
This completion is signaled without first committing the metadata. If
the bio in question has the REQ_FUA flag set and the system crashes
right after its completion and before the next metadata commit, then the
write is lost despite the REQ_FUA flag requiring that I/O completion for
this request must only be signaled after the data has been committed to
non-volatile storage.
Fix this by deferring the completion of overwrite bios, with the REQ_FUA
flag set, until after the metadata has been committed.
Cc: stable@vger.kernel.org
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This reverts commit 8099b047ec.
It turns out that people do actually depend on the shebang string being
truncated, and on the fact that an interpreter (like perl) will often
just re-interpret it entirely to get the full argument list.
Reported-by: Samuel Dionne-Riel <samuel@dionne-riel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the ethtool_regs version is set to 0 for FEC devices.
Use this field to store the register dump version exposed by the
kernel. The choosen version 2 corresponds to the kernel compile test:
#if defined(CONFIG_M523x) || defined(CONFIG_M527x)
|| defined(CONFIG_M528x) || defined(CONFIG_M520x)
|| defined(CONFIG_M532x) || defined(CONFIG_ARM)
|| defined(CONFIG_ARM64) || defined(CONFIG_COMPILE_TEST)
and version 1 corresponds to the opposite. Binaries of ethtool unaware
of this version will dump the whole set as usual.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit d6ebf5088f.
I forgot that the kernel's default lease period should never be
decreased!
After a kernel upgrade, the kernel has no way of knowing on its own what
the previous lease time was. Unless userspace tells it otherwise, it
will assume the previous lease period was the same.
So if we decrease this value in a kernel upgrade, we end up enforcing a
grace period that's too short, and clients will fail to reclaim state in
time. Symptoms may include EIO and log messages like "NFS:
nfs4_reclaim_open_state: Lock reclaim failed!"
There was no real justification for the lease period decrease anyway.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Fixes: d6ebf5088f "nfsd4: return default lease period"
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.
Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
number of references that we might need to create in the fastpath later,
the bump-allocation fastpath only has to modify the non-atomic bias value
that tracks the number of extra references we hold instead of the atomic
refcount. The maximum number of allocations we can serve (under the
assumption that no allocation is made with size 0) is nc->size, so that's
the bias used.
However, even when all memory in the allocation has been given away, a
reference to the page is still held; and in the `offset < 0` slowpath, the
page may be reused if everyone else has dropped their references.
This means that the necessary number of references is actually
`nc->size+1`.
Luckily, from a quick grep, it looks like the only path that can call
page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
requires CAP_NET_ADMIN in the init namespace and is only intended to be
used for kernel testing and fuzzing.
To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
`offset < 0` path, below the virt_to_page() call, and then repeatedly call
writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
with a vector consisting of 15 elements containing 1 byte each.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
net: phy: fix locking issue
Russell pointed out that the locking used in phy_is_started() isn't
needed and misleading. This locking also contributes to a race fixed
with patch 2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell reported the following race in the phylib state machine
(quoting from his mail):
if (phy_polling_mode(phydev) && phy_is_started(phydev))
phy_queue_state_machine(phydev, PHY_STATE_TIME);
state = PHY_UP
thread 0 thread 1
phy_disconnect()
+-phy_is_started()
phy_is_started() |
`-phy_stop()
+-phydev->state = PHY_HALTED
`-phy_stop_machine()
`-cancel_delayed_work_sync()
phy_queue_state_machine()
`-mod_delayed_work()
At this point, the phydev->state_queue() has been added back onto the
system workqueue despite phy_stop_machine() having been called and
cancel_delayed_work_sync() called on it.
Fix this by protecting the complete operation in thread 0.
Fixes: 2b3e88ea65 ("net: phy: improve phy state checking")
Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell suggested to remove the locking from phy_is_started() because
the read is atomic anyway and actually the locking may be more
misleading.
Fixes: 2b3e88ea65 ("net: phy: improve phy state checking")
Suggested-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The clean target in the makefile conflicts with the generic
kselftests lib.mk, and fails to properly remove the compiled
test programs.
Remove the redundant rule, the TEST_GEN_FILES will be already
removed by the CLEAN macro in lib.mk.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ca83b4a7f2 ("x86/KVM/VMX: Add find_msr() helper function")
introduces the helper function find_msr(), which returns -ENOENT when
not find the msr in vmx->msr_autoload.guest/host. Correct checking contion
of no more available entry in vmx->msr_autoload.
Fixes: ca83b4a7f2 ("x86/KVM/VMX: Add find_msr() helper function")
Cc: stable@vger.kernel.org
Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some Posted-Interrupts from passthrough devices may be lost or
overwritten when the vCPU is in runnable state.
The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state but
not running on physical CPU). If a posted interrupt comes at this time,
the irq remapping facility will set the bit of PIR (Posted Interrupt
Requests) but not ON (Outstanding Notification). Then, the interrupt
will not be seen by KVM, which always expects PID.ON=1 if PID.PIR=1
as documented in the Intel processor SDM but not in the VT-d specification.
To fix this, restore the invariant after PID.SN is cleared.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similarly to PXA3xx, pinctrl-single can't set pin direction on MMP2 either.
See also: commit 9dabfdd84b ("gpio: pxa: disable pinctrl calls for
PXA3xx")
Cc: stable@vger.kernel.org
Fixes: a770d94637 ("gpio: pxa: add pin control gpio direction and request")
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
It is reported that there's a constant background "hum/whitenoise"
in the headset on the Lenovo X1 machines with the codec alc285, and it
is confirmed that if we run the command below, the noise will stop.
sudo hda-verb /dev/snd/hwC0D0 0x1d SET_PIN_WIDGET_CONTROL 0x0
Then I consulted this issue with Kailang, he told me the pin 0x1d on
this codec is used for PC beep in, the noise probably comes from this
pin and we can also disable the PC beep in passthrough, then the PC
beep in will not affect other sound playback.
Fixes: c4cfcf6f42 ("ALSA: hda/realtek - fix the pop noise on headphone for lenovo laptops")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1660581
Cc: <stable@vger.kernel.org>
Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
On the System76 Oryx Pro (oryp5), there is a headset microphone input
attached to 0x19 that does not have a jack detect. In order to get it
working, the pin configuration needs to be set correctly, and the
ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC fixup needs to be applied. This is
similar to the MIC_NO_PRESENCE fixups for some Dell laptops, except we
have a separate microphone jack that is already configured correctly.
Since the ALC1220 does not have a fixup similar to
ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC, I have exposed the fixup from the
ALC269 in a way that it can be accessed from the
alc1220_fixup_system76_oryp5 function. In addition, the
alc1220_fixup_clevo_p950 needs to be applied to gain speaker output.
Signed-off-by: Jeremy Soller <jeremy@system76.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less
than or equal to DSA_MAX_PORTS. The ds->ports[] array is used inside
the dsa_is_user_port() and dsa_is_cpu_port() functions. The ds->ports[]
array is allocated in dsa_switch_alloc() and it has ds->num_ports
elements so this leads to a static checker warning about a potential out
of bounds read.
Fixes: 8cfa94984c ("net: dsa: bcm_sf2: add suspend/resume callbacks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.
They would increase their share of the memory, instead
of decreasing it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GPIO interrupt controller on the espressobin board only supports edge interrupts.
If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
possible to miss an edge. When this happens, the INTn pin on the Marvell switch is
stuck low and no further interrupts occur.
I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
a race in handling device interrupts (e.g. PHY link interrupts). Some interrupts are
directly cleared by reading the Global 1 status register. However, the device interrupt
flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
This is done by reading the relevant SERDES and PHY status register.
The code only services interrupts whose status bit is set at the time of reading its status
register. If an interrupt event occurs after its status is read and before all interrupts
are serviced, then this event will not be serviced and the INTn output pin will remain low.
This is not a problem with polling or level interrupts since the handler will be called
again to process the event. However, it's a big problem when using level interrupts.
The fix presented here is to add a loop around the code servicing switch interrupts. If
any pending interrupts remain after the current set has been handled, we loop and process
the new set. If there are no pending interrupts after servicing, we are sure that INTn has
gone high and we will get an edge when a new event occurs.
Tested on espressobin board.
Fixes: dc30c35be7 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
phylib enables interrupts before phy_start() has been called, and if
we receive an interrupt in a non-started state, the interrupt handler
returns IRQ_NONE. This causes problems with at least one Marvell chip
as reported by Andrew.
Fix this by handling interrupts the same as in phy_mac_interrupt(),
basically always running the phylib state machine. It knows when it
has to do something and when not.
This change allows to handle interrupts gracefully even if they
occur in a non-started state.
Fixes: 2b3e88ea65 ("net: phy: improve phy state checking")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In sctp_stream_init(), after sctp_stream_outq_migrate() freed the
surplus streams' ext, but sctp_stream_alloc_out() returns -ENOMEM,
stream->outcnt will not be set to 'outcnt'.
With the bigger value on stream->outcnt, when closing the assoc and
freeing its streams, the ext of those surplus streams will be freed
again since those stream exts were not set to NULL after freeing in
sctp_stream_outq_migrate(). Then the invalid-free issue reported by
syzbot would be triggered.
We fix it by simply setting them to NULL after freeing.
Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Reported-by: syzbot+58e480e7b28f2d890bfd@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianlin reported a panic when running sctp gso over gre over vlan device:
[ 84.772930] RIP: 0010:do_csum+0x6d/0x170
[ 84.790605] Call Trace:
[ 84.791054] csum_partial+0xd/0x20
[ 84.791657] gre_gso_segment+0x2c3/0x390
[ 84.792364] inet_gso_segment+0x161/0x3e0
[ 84.793071] skb_mac_gso_segment+0xb8/0x120
[ 84.793846] __skb_gso_segment+0x7e/0x180
[ 84.794581] validate_xmit_skb+0x141/0x2e0
[ 84.795297] __dev_queue_xmit+0x258/0x8f0
[ 84.795949] ? eth_header+0x26/0xc0
[ 84.796581] ip_finish_output2+0x196/0x430
[ 84.797295] ? skb_gso_validate_network_len+0x11/0x80
[ 84.798183] ? ip_finish_output+0x169/0x270
[ 84.798875] ip_output+0x6c/0xe0
[ 84.799413] ? ip_append_data.part.50+0xc0/0xc0
[ 84.800145] iptunnel_xmit+0x144/0x1c0
[ 84.800814] ip_tunnel_xmit+0x62d/0x930 [ip_tunnel]
[ 84.801699] gre_tap_xmit+0xac/0xf0 [ip_gre]
[ 84.802395] dev_hard_start_xmit+0xa5/0x210
[ 84.803086] sch_direct_xmit+0x14f/0x340
[ 84.803733] __dev_queue_xmit+0x799/0x8f0
[ 84.804472] ip_finish_output2+0x2e0/0x430
[ 84.805255] ? skb_gso_validate_network_len+0x11/0x80
[ 84.806154] ip_output+0x6c/0xe0
[ 84.806721] ? ip_append_data.part.50+0xc0/0xc0
[ 84.807516] sctp_packet_transmit+0x716/0xa10 [sctp]
[ 84.808337] sctp_outq_flush+0xd7/0x880 [sctp]
It was caused by SKB_GSO_CB(skb)->csum_start not set in sctp_gso_segment.
sctp_gso_segment() calls skb_segment() with 'feature | NETIF_F_HW_CSUM',
which causes SKB_GSO_CB(skb)->csum_start not to be set in skb_segment().
For TCP/UDP, when feature supports HW_CSUM, CHECKSUM_PARTIAL will be set
and gso_reset_checksum will be called to set SKB_GSO_CB(skb)->csum_start.
So SCTP should do the same as TCP/UDP, to call gso_reset_checksum() when
computing checksum in sctp_gso_segment.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Missing structure initialization in ebtables causes splat with
32-bit user level on a 64-bit kernel, from Francesco Ruggeri.
2) Missing dependency on nf_defrag in IPVS IPv6 codebase, from
Andrea Claudi.
3) Fix possible use-after-free from release path of target extensions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-02-13
This series introduces some fixes to mlx5 driver.
For more information please see tag log below.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mlx5 driver creates xdp redirect hw queues unconditionally on
netdevice open, This is great until someone starts redirecting XDP traffic
via ndo_xdp_xmit on mlx5 device and changes the device configuration at
the same time, this might cause crashes, since the other device's napi
is not aware of the mlx5 state change (resources un-availability).
To fix this we must synchronize with other devices napi's on the system.
Added a new flag under mlx5e_priv to determine XDP TX resources are
available, set/clear it up when necessary and use synchronize_rcu()
when the flag is turned off, so other napi's are in-sync with it, before
we actually cleanup the hw resources.
The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and
it is sufficient to determine if it safe to transmit or not. The other
two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become
unnecessary. Thus, they are removed from data path.
Fixes: 58b99ee3e3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eliminate the following compilation warning:
drivers/net/ethernet/mellanox/mlx5/core/events.c: warning: 'error_str'
may be used uninitialized in this function [-Wuninitialized]: => 238:3
Fixes: c2fb3db22d ("net/mlx5: Rework handling of port module events")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Mikhael Goikhman <migo@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When EEH is injected and PCI bus stalls, mlx5's pci error detect
function is called to deactivate the command interface and tear down
the device. The issue is that there can be a thread that already
passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
and stuck in the wait_func.
Solution:
Add function mlx5_cmd_flush to disable command interface and clear all
the pending commands. When device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
pending threads waiting for firmware commands completion are terminated.
Fixes: c1d4d2e92a ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
New channels are applied to the priv channels only after they
are successfully opened. Then, the indirection table should be built
according to the new number of channels.
Currently, such build is preformed independently of whether the
channels opening is successful, and is not reverted on failure.
The bug is caused due to removal of rss params from channels struct
and moving it to priv struct. That change cause to independency between
channels and rss params.
This causes a crash on a later point, when accessing rqn of a non
existing channel.
This patch fixes it by moving the indirection table build right before
switching the priv channels to new channels struct, after the new set of
channels was successfully opened.
Fixes: bbeb53b8b2 ("net/mlx5e: Move RSS params to a dedicated struct")
Signed-off-by: Maria Pasechnik <mariap@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
KVM/ARM fixes for 5.0:
- Fix the way we reset vcpus, plugging the race that could happen on VHE
- Fix potentially inconsistent group setting for private interrupts
- Don't generate UNDEF when LORegion feature is present
- Relax the restriction on using stage2 PUD huge mapping
- Turn some spinlocks into raw_spinlocks to help RT compliance
A recently added preemption timer consistency check was unintentionally
dropped when the consistency checks were being reorganized to match the
SDM's ordering.
Fixes: 461b4ba4c7 ("KVM: nVMX: Move the checks for VM-Execution Control Fields to a separate helper function")
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull tracing fix from Steven Rostedt:
"This fixes kprobes/uprobes dynamic processing of strings, where it
processes the args but does not update the remaining length of the
buffer that the string arguments will be placed in. It constantly
passes in the total size of buffer used instead of passing in the
remaining size of the buffer used.
This could cause issues if the strings are larger than the max size of
an event which could cause the strings to be written beyond what was
reserved on the buffer"
* tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: probeevent: Correctly update remaining space in dynamic area
Pull Allwinner clock fixes from Maxime Ripard:
Two fixes for clock indices, one for the A31 and one for the V3s.
* tag 'sunxi-clk-fixes-for-5.0' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi: A31: Fix wrong AHB gate number
clk: sunxi-ng: v3s: Fix TCON reset de-assert bit
Don't warn or fail if it's missing.
v2: handle xgmi case more gracefully.
v3: handle older kernels properly
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the middle of do_exit() there is there is a call
"ptrace_event(PTRACE_EVENT_EXIT, code);" That call places the process
in TACKED_TRACED aka "(TASK_WAKEKILL | __TASK_TRACED)" and waits for
for the debugger to release the task or SIGKILL to be delivered.
Skipping past dequeue_signal when we know a fatal signal has already
been delivered resulted in SIGKILL remaining pending and
TIF_SIGPENDING remaining set. This in turn caused the
scheduler to not sleep in PTACE_EVENT_EXIT as it figured
a fatal signal was pending. This also caused ptrace_freeze_traced
in ptrace_check_attach to fail because it left a per thread
SIGKILL pending which is what fatal_signal_pending tests for.
This difference in signal state caused strace to report
strace: Exit of unknown pid NNNNN ignored
Therefore update the signal handling state like dequeue_signal
would when removing a per thread SIGKILL, by removing SIGKILL
from the per thread signal mask and clearing TIF_SIGPENDING.
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Ivan Delalande <colona@arista.com>
Cc: stable@vger.kernel.org
Fixes: 35634ffa17 ("signal: Always notice exiting tasks")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
I fixed a similar build error in commit 1b1e4ee86e ("sh: fix build
error for empty CONFIG_BUILTIN_DTB_SOURCE"), but it came back again.
Since commit 37c8a5fafa ("kbuild: consolidate Devicetree dtb
build rules"), the combination of CONFIG_OF_EARLY_FLATTREE=y and
CONFIG_USE_BUILTIN_DTB=n results in the following build error:
make[1]: *** No rule to make target 'arch/sh/boot/dts/.dtb.o',
needed by 'arch/sh/boot/dts/built-in.a'. Stop.
Prior to that commit, there was only one path to descend into
arch/sh/boot/dts/, and arch/sh/Makefile correctly guards it with
CONFIG_USE_BUILTIN_DTB:
core-$(CONFIG_USE_BUILTIN_DTB) += arch/sh/boot/dts/
Now, there is another path to descend there from the top Makefile
when CONFIG_OF_EARLY_FLATTREE=y. If CONFIG_USE_BUILTIN_DTB is disabled,
CONFIG_BUILTIN_DTB_SOURCE is invisible instead of defined as "".
Add obj-$(CONFIG_USE_BUILTIN_DTB) guard to avoid the attempt to build
the non-existing file.
Fixes: 37c8a5fafa ("kbuild: consolidate Devicetree dtb build rules")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
With this patch, we use the mtd->name instead of concatenating the name
with '0'.
Fixes: c4dfa25ab3 ("mtd: add support for reading MTD devices via the nvmem API")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Commit bb36489032 ("mmc: meson-gx: Free irq in release() callback")
changed the _probe code to use request_threaded_irq() instead of
devm_request_threaded_irq().
Unfortunately this removes a fallback for the interrupt name:
devm_request_threaded_irq() uses the device name as fallback if the
given IRQ name is NULL. request_threaded_irq() has no such fallback,
thus /proc/interrupts shows "(null)" instead.
Explicitly pass the dev_name() so we get the IRQ name shown in
/proc/interrupts again.
While here, also fix the indentation of the request_threaded_irq()
parameter list.
Fixes: bb36489032 ("mmc: meson-gx: Free irq in release() callback")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In qla2x00_async_tm_cmd, we reference off sp after it has been freed. This
caused a panic on a system running a slub debug kernel. Since fcport is
passed in anyways, just use that instead.
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Acked-by: Giridhar Malavali <gmalavali@marvell.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The problem is that the default for MQ is not to gather entropy, whereas
the default for the legacy queue was always to gather it. The original
attempt to fix entropy gathering for rotational disks under MQ added an
else branch in sd_read_block_characteristics(). Unfortunately, the entire
check isn't reached if the device has no characteristics VPD page. Since
this page was only introduced in SBC-3 and its optional anyway, most less
expensive rotational disks don't have one, meaning they all stopped
gathering entropy when we made MQ the default. In a wholly unrelated
change, openssl and openssh won't function until the random number
generator is initialised, meaning lots of people have been seeing large
delays before they could log into systems with default MQ kernels due to
this lack of entropy, because it now can take tens of minutes to initialise
the kernel random number generator.
The fix is to set the non-rotational and add-randomness flags
unconditionally early on in the disk initialization path, so they can be
reset only if the device actually reports being non-rotational via the VPD
page.
Reported-by: Mikael Pettersson <mikpelinux@gmail.com>
Fixes: 83e32a5910 ("scsi: sd: Contribute to randomness when running rotational device")
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Xuewei Zhang <xueweiz@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drm/imx: plane, ldb, and ipu-v3 fixes
- Fix CSI register offsets for i.MX51 and i.MX53.
- Fix delayed page flip completion events on i.MX6QP due to unexpected
behaviour of the PRE when issuing NOP buffer updates to the same
buffer address.
- Stop throwing errors for plane updates on disabled CRTCs when a
userspace process is killed while a plane update is pending.
- Add missing of_node_put cleanup in imx_ldb_bind.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1549990602.4800.11.camel@pengutronix.de
When STACKTRACE is enabled, we must pass fp as stack for unwind,
otherwise random value in stack will casue a dead loop.
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Reported-by: Lu Baoquan <lu.baoquan@intellif.com>
Some user space drivers need accessing IO address and IO remap need
SO(strong order) page-attribute to make IO operation correct. So we
need add SO-page-attr for all non-memory address.
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Reported-by: Fan Xiaodong <xiaodong.fan@boyahualu.com>
Use task_stack_page instead of p->stack to get stack. Follow the coding
convention style. Also for init_stack, the same with other archs.
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
The bug is from commit 2054f4af19 ("csky: bugfix gdb coredump error.")
We change the ELF_NGREG to ELF_NGREG - 2 to fit gdb&gcc define, but forgot
modify ptrace regset.
Now coredump use ELF_NRGEG to parse GPRs and ptrace use pt_regs_regset, so
there are two different reg_sets for userspace.
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
C-SKY CPU 8xx's _PAGE_GLOBAL is BIT(0), but 610's _PAGE_GLOBAL is
BIT(6). Use _PAGE_GLOBAL macro instead of bad magic number.
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Merge fixes from Andrew Morton:
"6 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: proc: smaps_rollup: fix pss_locked calculation
Rename include/{uapi => }/asm-generic/shmparam.h really
Revert "mm: use early_pfn_to_nid in page_ext_init"
mm/gup: fix gup_pmd_range() for dax
Revert "mm: slowly shrink slabs with a relatively small number of objects"
Revert "mm: don't reclaim inodes with many attached pages"
This reverts commit fe53ca5427 ("mm: use early_pfn_to_nid in
page_ext_init").
When booting a system with "page_owner=on",
start_kernel
page_ext_init
invoke_init_callbacks
init_section_page_ext
init_page_owner
init_early_allocated_pages
init_zones_in_node
init_pages_in_zone
lookup_page_ext
page_to_nid
The issue here is that page_to_nid() will not work since some page flags
have no node information until later in page_alloc_init_late() due to
DEFERRED_STRUCT_PAGE_INIT. Hence, it could trigger an out-of-bounds
access with an invalid nid.
UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
index 7 is out of range for type 'zone [5]'
Also, kernel will panic since flags were poisoned earlier with,
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_NODE_NOT_IN_PAGE_FLAGS=n
start_kernel
setup_arch
pagetable_init
paging_init
sparse_init
sparse_init_nid
memblock_alloc_try_nid_raw
It did not handle it well in init_pages_in_zone() which ends up calling
page_to_nid().
page:ffffea0004200000 is uninitialized and poisoned
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
page_owner info is not active (free page?)
kernel BUG at include/linux/mm.h:990!
RIP: 0010:init_page_owner+0x486/0x520
This means that assumptions behind commit fe53ca5427 ("mm: use
early_pfn_to_nid in page_ext_init") are incomplete. Therefore, revert
the commit for now. A proper way to move the page_owner initialization
to sooner is to hook into memmap initialization.
Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 172b06c32b ("mm: slowly shrink slabs with a
relatively small number of objects").
This change changes the agressiveness of shrinker reclaim, causing small
cache and low priority reclaim to greatly increase scanning pressure on
small caches. As a result, light memory pressure has a disproportionate
affect on small caches, and causes large caches to be reclaimed much
faster than previously.
As a result, it greatly perturbs the delicate balance of the VFS caches
(dentry/inode vs file page cache) such that the inode/dentry caches are
reclaimed much, much faster than the page cache and this drives us into
several other caching imbalance related problems.
As such, this is a bad change and needs to be reverted.
[ Needs some massaging to retain the later seekless shrinker
modifications.]
Link: http://lkml.kernel.org/r/20190130041707.27750-3-david@fromorbit.com
Fixes: 172b06c32b ("mm: slowly shrink slabs with a relatively small number of objects")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull hwmon fix from Guenter Roeck:
"Fix fan detection for NCT6793D"
* tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6775) Fix fan6 detection for NCT6793D
sync_request_write no longer submits writes to a Faulty device. This has
the unfortunate side effect that bitmap bits can be incorrectly cleared
if a recovery is interrupted (previously, end_sync_write would have
prevented this). This means the next recovery may not copy everything
it should, potentially corrupting data.
Add a function for doing the proper md_bitmap_end_sync, called from
end_sync_write and the Faulty case in sync_request_write.
backport note to 4.14: s/md_bitmap_end_sync/bitmap_end_sync
Cc: stable@vger.kernel.org 4.14+
Fixes: 0c9d5b127f ("md/raid1: avoid reusing a resync bio after error handling.")
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Tested-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Pull RISC-V fixes from Palmer Dabbelt:
"This contains a pair of bug fixes that I'd like to include in 5.0:
- A fix to disambiguate swap from invalid PTEs, which fixes an error
when trying to unmap PROT_NONE pages.
- A revert to an optimization of the size of flat binaries. This is
really a workaround to prevent breaking existing boot flows, but
since the change was introduced as part of the 5.0 merge window I'd
like to have the fix in before 5.0 so we can avoid a regression for
any proper releases.
With these I hope we're out of patches for 5.0 in RISC-V land"
* tag 'riscv-for-linus-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
Revert "RISC-V: Make BSS section as the last section in vmlinux.lds.S"
riscv: Add pte bit to distinguish swap from invalid
If nfs_page_async_flush() removes the page from the mapping, then we can't
use page_file_mapping() on it as nfs_updatepate() is wont to do when
receiving an error. Instead, push the mapping to the stack before the page
is possibly truncated.
Fixes: 8fc75bed96 ("NFS: Fix up return value on fatal errors in nfs_page_async_flush()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The current opt_inst_list operations inside team_nl_cmd_options_set()
is too complex to track:
LIST_HEAD(opt_inst_list);
nla_for_each_nested(...) {
list_for_each_entry(opt_inst, &team->option_inst_list, list) {
if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst))
continue;
list_add(&opt_inst->tmp_list, &opt_inst_list);
}
}
team_nl_send_event_options_get(team, &opt_inst_list);
as while we retrieve 'opt_inst' from team->option_inst_list, it could
be added to the local 'opt_inst_list' for multiple times. The
__team_option_inst_tmp_find() doesn't work, as the setter
team_mode_option_set() still calls team->ops.exit() which uses
->tmp_list too in __team_options_change_check().
Simplify the list operations by moving the 'opt_inst_list' and
team_nl_send_event_options_get() into the nla_for_each_nested() loop so
that it can be guranteed that we won't insert a same list entry for
multiple times. Therefore, __team_option_inst_tmp_find() can be removed
too.
Fixes: 4fb0534fb7 ("team: avoid adding twice the same option to the event list")
Fixes: 2fcdb2c9e6 ("team: allow to send multiple set events in one message")
Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang says:
====================
net_sched: some fixes for cls_tcindex
This patchset contains 3 bug fixes for tcindex filter. Please check
each patch for details.
v2: fix a compile error in patch 2
drop netns refcnt in patch 1
====================
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.
For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.
For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.
This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.
As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.
Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.
Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.
Fixes: 27ce4f05e2 ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski says:
====================
net: ena: race condition bug fix and version update
This patchset includes a fix to a race condition that can cause
kernel panic, as well as a driver version update because of this
fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix race condition between ena_update_on_link_change() and
ena_restore_device().
This race can occur if link notification arrives while the driver
is performing a reset sequence. In this case link can be set up,
enabling the device, before it is fully restored. If packets are
sent at this time, the driver might access uninitialized data
structures, causing kernel crash.
Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
after ena_up() to ensure the device is ready when link is set up.
Fixes: d18e4f6834 ("net: ena: fix race condition between device reset and link up setup")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow. Check it for overflow without limiting the total buffer
size to UINT_MAX.
This change fixes support for packet ring buffers >= UINT_MAX.
Fixes: 8f8d28e4d6 ("net/packet: fix overflow in check for tp_frame_nr")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
extensions has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them included into response
unconditionally or hook into some of lower 8 bits.
Extension INET_DIAG_CLASS_ID has not way to request from the beginning.
This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.
Also this patch adds fallback to reporting socket priority. This filed
is more widely used for traffic classification because ipv4 sockets
automatically maps TOS to priority and default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY so
INET_DIAG_TOS isn't enough for predicting class.
Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).
So, after this patch INET_DIAG_CLASS_ID will report socket priority
for most common setup when net_cls isn't set and/or cgroup2 in use.
Fixes: 0888e372c3 ("net: inet: diag: expose sockets cgroup classid")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sound fixes from Takashi Iwai:
"It's a bit of surprising that we've got more changes than hoped at
this late stage, but they all don't look too scary but small fixes.
One change in ALSA core side is again the PCM regression fix that was
partially addressed for OSS, but now the all relevant change is
reverted instead. Also, a few ASoC core fixes for UAF and OOB are
included, while the rest are usual random device-specific fixes"
* tag 'sound-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Revert capture stream behavior change in blocking mode
ALSA: usb-audio: Fix implicit fb endpoint setup by quirk
ALSA: hda - Add quirk for HP EliteBook 840 G5
ASoC: samsung: Prevent clk_get_rate() calls in atomic context
ASoC: rsnd: ssiu: correct shift bit for ssiu9
ASoC: rsnd: fixup rsnd_ssi_master_clk_start() user count check
ASoC: dapm: fix out-of-bounds accesses to DAPM lookup tables
ASoC: topology: fix oops/use-after-free case with dai driver
ASoC: rsnd: fixup MIX kctrl registration
ASoC: core: Allow soc_find_component lookups to match parent of_node
ASoC: rt5682: Correct the setting while select ASRC clk for AD/DA filter
ASoC: MAINTAINERS: fsl: Change Fabio's email address
ASoC: hdmi-codec: fix oops on re-probe
Pull ext4 fix from Ted Ts'o:
"Revert a commit which landed in v5.0-rc1 since it makes fsync in ext4
nojournal mode unsafe"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
genlmsg_reply can fail, so propagate its return code
Fixes: 915d7e5e59 ("ipv6: sr: add code base for control plane support of SR-IPv6")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.
Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.
Future work: when reporting checksum complete is not an option for
IP non-TCP/UDP packets, we can actually fallback to report checksum
unnecessary, by looking at cqe IPOK bit.
Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During testing on Armada 388 platforms, it was found with a certain
module configuration that it was possible to trigger a kernel oops
during the module load process, caused by the phylink resolver being
triggered for a currently disabled interface.
This problem was introduced by changing the way the SFP registration
works, which now can result in the sfp link down notification being
called during phylink_create().
Fixes: b5bfc21af5 ("net: sfp: do not probe SFP module before we're attached")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the depends on NET_UDP_TUNNEL, at the moment it is impossible to
compile GENEVE if no other protocol depending on NET_UDP_TUNNEL is
selected.
Fix this changing the depends to a select, and drop NET_IP_TUNNEL from the
select list, as it already depends on NET_UDP_TUNNEL.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bitmap of found partitions in efx_ef10_mtd_probe was not
initialised, causing partitions to be suppressed based off whatever
value was in the bitmap at the start.
Fixes: 3366463513 ("sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
Just a few fixes:
* aggregation session teardown with internal TXQs was
continuing to send some frames marked as aggregation,
fix from Ilan
* IBSS join was missed during firmware restart, should
such a thing happen
* speculative execution based on the return value of
cfg80211_classify8021d() - which is controlled by the
sender of the packet - could be problematic in some
code using it, prevent it
* a few peer measurement fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure i2s->rclk_srcrate is properly initialized also during
playback through the secondary DAI.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
floppy_check_events() is supposed to return bit flags to say which
events occured. We should return zero to say that no event flags are
set. Only BIT(0) and BIT(1) are used in the caller. And .check_events
interface also expect to return an unsigned int value.
However, after commit a0c80efe59, it may return -EINTR (-4u).
Here, both BIT(0) and BIT(1) are cleared. So this patch shouldn't
affect runtime, but it obviously is still worth fixing.
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: a0c80efe59 ("floppy: fix lock_fdc() signal handling")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit c9b47cc1fa ("xsk: fix bug when trying to use both copy and
zero-copy on one queue id") stores the umem into the netdev._rx
struct. However, the patch incorrectly removed the umem from the
netdev._rx struct when user-space passed "best-effort" mode
(i.e. select the fastest possible option available), and zero-copy
mode was not available. This commit fixes that.
Fixes: c9b47cc1fa ("xsk: fix bug when trying to use both copy and zero-copy on one queue id")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Installing the appropriate non-IOMMU DMA ops in arm_iommu_detch_device()
serves the case where IOMMU-aware drivers choose to control their own
mapping but still make DMA API calls, however it also affects the case
when the arch code itself tears down the mapping upon driver unbinding,
where the ops now get left in place and can inhibit arch_setup_dma_ops()
on subsequent re-probe attempts.
Fix the latter case by making sure that arch_teardown_dma_ops() cleans
up whenever the ops were automatically installed by its counterpart.
Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: 1874619a7d "ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()"
Tested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Tested-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
SDM says MSR_IA32_VMX_PROCBASED_CTLS2 is only available "If
(CPUID.01H:ECX.[5] && IA32_VMX_PROCBASED_CTLS[63])". It was found that
some old cpus (namely "Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz (family: 0x6,
model: 0xf, stepping: 0x6") don't have it. Add the missing check.
Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
Tested-by: Zdenek Kaspar <zkaspar82@gmail.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When resuming, we check whether or not any previously connected
MST topologies are still present and if so, attempt to resume them. If
this fails, we disable said MST topologies and fire off a hotplug event
so that userspace knows to reprobe.
However, sending a hotplug event involves calling
drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
connector reprobe in the caller's thread - something we can't do at the
point in which i915 calls drm_dp_mst_topology_mgr_resume() since
hotplugging hasn't been fully initialized yet.
This currently causes some rather subtle but fatal issues. For example,
on my T480s the laptop dock connected to it usually disappears during a
suspend cycle, and comes back up a short while after the system has been
resumed. This guarantees pretty much every suspend and resume cycle,
drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn,
a connector hotplug will occur. Now it's Rute Goldberg time: when the
connector hotplug occurs, i915 reprobes /all/ of the connectors,
including eDP. However, eDP probing requires that we power on the panel
VDD which in turn, grabs a wakeref to the appropriate power domain on
the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where
things start breaking, since this all happens before
intel_power_domains_enable() is called we end up leaking the wakeref
that was acquired and never releasing it later. Come next suspend/resume
cycle, this causes us to fail to shut down the GPU properly, which
causes it not to resume properly and die a horrible complicated death.
(as a note: this only happens when there's both an eDP panel and MST
topology connected which is removed mid-suspend. One or the other seems
to always be OK).
We could try to fix the VDD wakeref leak, but this doesn't seem like
it's worth it at all since we aren't able to handle hotplug detection
while resuming anyway. So, let's go with a more robust solution inspired
by nouveau: block fbdev from handling hotplug events until we resume
fbdev. This allows us to still send sysfs hotplug events to be handled
later by user space while we're resuming, while also preventing us from
actually processing any hotplug events we receive until it's safe.
This fixes the wakeref leak observed on the T480s and as such, also
fixes suspend/resume with MST topologies connected on this machine.
Changes since v2:
* Don't call drm_fb_helper_hotplug_event() under lock, do it after lock
(Chris Wilson)
* Don't call drm_fb_helper_hotplug_event() in
intel_fbdev_output_poll_changed() under lock (Chris Wilson)
* Always set ifbdev->hpd_waiting (Chris Wilson)
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 0e32b39cee ("drm/i915: add DP 1.2 MST support (v0.7)")
Cc: Todd Previte <tprevite@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v3.17+
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129191001.442-2-lyude@redhat.com
(cherry picked from commit fe5ec65668)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Enable count array is supposed to have one counter for each possible
engine sampler. As such, array sizing and bounds checking is not correct
and would blow up the asserts if more samplers were added.
No ill-effect in the current code base but lets fix it for correctness.
At the same time tidy the assert for readability and robustness.
v2:
* One check per assert. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: b46a33e271 ("drm/i915/pmu: Expose a PMU interface for perf queries")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130353.21105-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 26a11deea6)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
ipvs relies on nf_defrag_ipv6 module to manage IPv6 fragmentation,
but lacks proper Kconfig dependencies and does not explicitly
request defrag features.
As a result, if netfilter hooks are not loaded, when IPv6 fragmented
packet are handled by ipvs only the first fragment makes through.
Fix it properly declaring the dependency on Kconfig and registering
netfilter hooks on ip_vs_add_service() and ip_vs_new_dest().
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Attempting to avoid cloning the skb when broadcasting by inflating
the refcount with sock_hold/sock_put while under RCU lock is dangerous
and violates RCU principles. It leads to subtle race conditions when
attempting to free the SKB, as we may reference sockets that have
already been freed by the stack.
Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c4b
[006b6b6b6b6b6c4b] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
task: fffffff78f65b380 task.stack: ffffff8049a88000
pc : sock_rfree+0x38/0x6c
lr : skb_release_head_state+0x6c/0xcc
Process repro (pid: 7117, stack limit = 0xffffff8049a88000)
Call trace:
sock_rfree+0x38/0x6c
skb_release_head_state+0x6c/0xcc
skb_release_all+0x1c/0x38
__kfree_skb+0x1c/0x30
kfree_skb+0xd0/0xf4
pfkey_broadcast+0x14c/0x18c
pfkey_sendmsg+0x1d8/0x408
sock_sendmsg+0x44/0x60
___sys_sendmsg+0x1d0/0x2a8
__sys_sendmsg+0x64/0xb4
SyS_sendmsg+0x34/0x4c
el0_svc_naked+0x34/0x38
Kernel panic - not syncing: Fatal exception
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The reset work holds a mutex to prevent races with removal modifying the
same resources, but was unlocking only on success. Unlock on failure
too.
Fixes: 5c959d73db ("nvme-pci: fix rapid add remove sequence")
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When a link endpoint is re-created (e.g. after a node reboot or
interface reset), the link session number is varied by random, the peer
endpoint will be synced with this new session number before the link is
re-established.
However, there is a shortcoming in this mechanism that can lead to the
link never re-established or faced with a failure then. It happens when
the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as
well as the 'in_session' flag have been set, but suddenly this link
endpoint leaves. When it comes back with a random session number, there
are two situations possible:
1/ If the random session number is larger than (or equal to) the
previous one, the peer endpoint will be updated with this new session
upon receipt of a RESET_MSG from this endpoint, and the link can be re-
established as normal. Otherwise, all the RESET_MSGs from this endpoint
will be rejected by the peer. In turn, when this link endpoint receives
one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
to send STATE_MSGs, but again these messages will be dropped by the
peer due to wrong session.
The peer link endpoint can still become ESTABLISHED after receiving a
traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
will be forced down sooner or later!
Even in case the random session number is larger than the previous one,
it can be that the ACTIVATE_MSG from the peer arrives first, and this
link endpoint moves quickly to ESTABLISHED without sending out any
RESET_MSG yet. Consequently, the peer link will not be updated with the
new session number, and the same link failure scenario as above will
happen.
2/ Another situation can be that, the peer link endpoint was reset due
to any reasons in the meantime, its link state was set to RESET from
ESTABLISHING but still in session, i.e. the 'in_session' flag is not
reset...
Now, if the random session number from this endpoint is less than the
previous one, all the RESET_MSGs from this endpoint will be rejected by
the peer. In the other direction, when this link endpoint receives a
RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
As a result, the link cannot be re-established but gets stuck with this
link endpoint in state ESTABLISHING and the peer in RESET!
Solution:
===========
This link endpoint should not go directly to ESTABLISHED when getting
ACTIVATE_MSG from the peer which may belong to the old session if the
link was re-created. To ensure the session to be correct before the
link is re-established, the peer endpoint in ESTABLISHING state will
send back the last session number in ACTIVATE_MSG for a verification at
this endpoint. Then, if needed, a new and more appropriate session
number will be regenerated to force a re-synch first.
In addition, when a link in ESTABLISHING state is reset, its state will
move to RESET according to the link FSM, along with resetting the
'in_session' flag (and the other data) as a normal link reset, it will
also be deleted if requested.
The solution is backward compatible.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow those steps:
# ip addr add 2001:123::1/32 dev eth0
# ip addr add 2001:123:456::2/64 dev eth0
# ip addr del 2001:123::1/32 dev eth0
# ip addr del 2001:123:456::2/64 dev eth0
and then prefix route of 2001:123::1/32 will still exist.
This is because ipv6_prefix_equal in check_cleanup_prefix_route
func does not check whether two IPv6 addresses have the same
prefix length. If the prefix of one address starts with another
shorter address prefix, even though their prefix lengths are
different, the return value of ipv6_prefix_equal is true.
Here I add a check of whether two addresses have the same prefix
to decide whether their prefixes are equal.
Fixes: 5b84efecb7 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE")
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When requeue, if RQF_DONTPREP, rq has contained some driver
specific data, so insert it to hctx dispatch list to avoid any
merge. Take scsi as example, here is the trace event log (no
io scheduler, because RQF_STARTED would prevent merging),
kworker/0:1H-339 [000] ...1 2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
scsi_inert_test-1987 [000] .... 2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
scsi_inert_test-1987 [000] ...2 2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
kworker/0:1H-339 [000] .... 2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
scsi_inert_test-1996 [000] ..s1 2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
scsi_inert_test-1996 [000] .Ns1 2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
kworker/0:1H-339 [000] ...1 2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
kworker/0:1H-339 [000] ...1 2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
scsi_inert_test-1986 [000] ..s1 2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
(32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
the sdb only contained the part of (32768 + 8), then only that part
was completed. The lucky thing was that scsi_io_completion detected
it and requeued the remaining part. So we didn't get corrupted data.
However, the requeue of (32776 + 8) is not expected.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When we free skb at tipc_data_input, we return a 'false' boolean.
Then, skb passed to subcalling tipc_link_input in tipc_link_rcv,
<snip>
1303 int tipc_link_rcv:
...
1354 if (!tipc_data_input(l, skb, l->inputq))
1355 rc |= tipc_link_input(l, skb, l->inputq);
</snip>
Fix it by simple changing to a 'true' boolean when skb is being free-ed.
Then, tipc_link_rcv will bypassed to subcalling tipc_link_input as above
condition.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <maloy@donjonn.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
At least BBL relies on the flat binaries containing all the bytes in the
actual image to exist in the file. Before this revert the flat images
dropped the trailing zeros, which caused BBL to put its copy of the
device tree where Linux thought the BSS was, which wreaks all sorts of
havoc. Manifesting the bug is a bit subtle because BBL aligns
everything to 2MiB page boundaries, but with large enough kernels you're
almost certain to get bitten by the bug.
While moving the sections around isn't a great long-term fix, it will at
least avoid producing broken images.
This reverts commit 22e6a2e14c.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Previously, invalid PTEs and swap PTEs had the same binary
representation, causing errors when attempting to unmap PROT_NONE
mappings, including implicit unmap on exit.
Typical error:
swap_info_get: Bad swap file entry 40000000007a9879
BUG: Bad page map in process a.out pte:3d4c3cc0 pmd:3e521401
Cc: stable@vger.kernel.org
Signed-off-by: Stefan O'Rear <sorear2@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Now that LEDs core allows "blocking" flavor of "set brightness" method we
can use it and get rid of private work items.
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull thermal SoC management fixes from Eduardo Valentin:
"Minor fixes on of-thermal and cpu cooling"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: cpu_cooling: Clarify error message
thermal: of-thermal: Print name of device node with error
Commit 9178412ddf ("tracing: probeevent: Return consumed
bytes of dynamic area") improved the string fetching
mechanism by returning the number of required bytes after
copying the argument to the dynamic area. However, this
return value is now only used to increment the pointer
inside the dynamic area but misses updating the 'maxlen'
variable which indicates the remaining space in the dynamic
area.
This means that fetch_store_string() always reads the *total*
size of the dynamic area from the data_loc pointer instead of
the *remaining* size (and passes it along to
strncpy_from_{user,unsafe}) even if we're already about to
copy data into the middle of the dynamic area.
Link: http://lkml.kernel.org/r/20190206190013.16405-1-andreas.ziegler@fau.de
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 9178412ddf ("tracing: probeevent: Return consumed bytes of dynamic area")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
netif_rx() must be called under a strict contract.
At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.
Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP
Virtual drivers do not have this guarantee, and must
therefore make the check themselves.
Otherwise we risk use-after-free and/or crashes.
Note this patch also fixes a small issue that came
with commit ce6502a8f9 ("vxlan: fix a use after free
in vxlan_encap_bypass"), since the dev->stats.rx_dropped
change was done on the wrong device.
Fixes: d342894c5d ("vxlan: virtual extensible lan")
Fixes: ce6502a8f9 ("vxlan: fix a use after free in vxlan_encap_bypass")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Machata <petrm@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netlink has moved from bitmasks to group numbers long ago.
Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__cmpxchg_small erroneously uses u8 for load comparison which can
be either char or short. This patch changes the local variable to
u32 which is sufficiently sized, as the loaded value is already
masked and shifted appropriately. Using an integer size avoids
any unnecessary canonicalization from use of non native widths.
This patch is part of a series that adapts the MIPS small word
atomics code for xchg and cmpxchg on short and char to RISC-V.
Cc: RISC-V Patches <patches@groups.riscv.org>
Cc: Linux RISC-V <linux-riscv@lists.infradead.org>
Cc: Linux MIPS <linux-mips@linux-mips.org>
Signed-off-by: Michael Clark <michaeljclark@mac.com>
[paul.burton@mips.com:
- Fix varialble typo per Jonas Gorski.
- Consolidate load variable with other declarations.]
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 3ba7f44d2b ("MIPS: cmpxchg: Implement 1 byte & 2 byte cmpxchg()")
Cc: stable@vger.kernel.org # v4.13+
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Out-of-bound access to packet data from the snmp nat helper,
from Jann Horn.
2) ICMP(v6) error packets are set as related traffic by conntrack,
update protocol number before calling nf_nat_ipv4_manip_pkt()
to use ICMP(v6) rather than the original protocol number,
from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull s390 bug fixes from Martin Schwidefsky:
- Fix specification exception on z196 during ap probe
- A fix for suspend-to-disk, the VMAP stack patch broke the
swsusp_arch_suspend function
- The EMC CKD ioctl of the dasd driver needs an additional size check
for user space data
- Revert an incorrect patch for the PCI base code that removed a bit
lock that turned out to be required after all
* tag 's390-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
Revert "s390/pci: remove bit_lock usage in interrupt handler"
s390/zcrypt: fix specification exception on z196 during ap probe
s390/dasd: fix using offset into zero size array error
s390/suspend: fix stack setup in swsusp_arch_suspend
Pull alpha fixes from Matt Turner:
"A few changes for alpha, including a build fix, a fix for the Eiger
platform, and a fix for a tricky bug uncovered by the strace test
suite that has existed since at least 1997 (v2.1.32)!"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha: fix page fault handling for r16-r18 targets
alpha: Fix Eiger NR_IRQS to 128
tools uapi: fix Alpha support
Fix a grammatical error in the dentry-state text and clarify the usage
of negative dentries.
Fixes: af0c9af1b3 ("fs/dcache: Track & report number of negative dentries")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bio_sectors() returns the value in the units of 512-byte sectors (no
matter what the real sector size of the device). dm-crypt multiplies
bio_sectors() by on_disk_tag_size to calculate the space allocated for
integrity tags. If dm-crypt is running with sector size larger than
512b, it allocates more data than is needed.
Device Mapper trims the extra space when passing the bio to
dm-integrity, so this bug didn't result in any visible misbehavior.
But it must be fixed to avoid wasteful memory allocation for the block
integrity payload.
Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Sander Eikelenboom bisected a NAT related regression down
to the l4proto->manip_pkt indirection removal.
I forgot that ICMP(v6) errors (e.g. PKTTOOBIG) can be set as related
to the existing conntrack entry.
Therefore, when passing the skb to nf_nat_ipv4/6_manip_pkt(), that
ended up calling the wrong l4 manip function, as tuple->dst.protonum
is the original flows l4 protocol (TCP, UDP, etc).
Set the dst protocol field to ICMP(v6), we already have a private copy
of the tuple due to the inversion of src/dst.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Fixes: faec18dbb0 ("netfilter: nat: remove l4proto->manip_pkt")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The generic ASN.1 decoder infrastructure doesn't guarantee that callbacks
will get as much data as they expect; callbacks have to check the `datalen`
parameter before looking at `data`. Make sure that snmp_version() and
snmp_helper() don't read/write beyond the end of the packet data.
(Also move the assignment to `pdata` down below the check to make it clear
that it isn't necessarily a pointer we can use before the `datalen` check.)
Fixes: cc2d58634e ("netfilter: nf_nat_snmp_basic: use asn1 decoder library")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When mac80211 requests the low level driver to stop an ongoing
Tx aggregation, the low level driver is expected to call
ieee80211_stop_tx_ba_cb_irqsafe() to indicate that it is ready
to stop the session. The callback in turn schedules a worker
to complete the session tear down, which in turn also handles
the relevant state for the intermediate Tx queue.
However, as this flow in asynchronous, the intermediate queue
should be stopped and not continue servicing frames, as in
such a case frames that are dequeued would be marked as part
of an aggregation, although the aggregation is already been
stopped.
Fix this by stopping the intermediate Tx queue, before
calling the low level driver to stop the Tx aggregation.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It's possible that the caller of cfg80211_classify8021d() uses the
value to index an array, like mac80211 in ieee80211_downgrade_queue().
Prevent speculation on the return value.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Without recording the netlink port ID, we cannot return the
results or complete messages to userspace, nor will we be
able to abort if the socket is closed, so clearly we need
to fill the value.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix FTM per burst maximum value from 15 to 31
(The maximal bits that represents that number in the frame
is 5 hence a maximal value of 31)
Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a driver does any significant activity in its ibss_join method,
then it will very well expect that to be called during restart,
before any stations are added. Do that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When facility.76 MSAX3 is present for the guest we must issue a validity
interception if the CRYCBD is not valid.
The bit CRYCBD.31 is an effective field and tested at each guest level
and has for effect to mask the facility.76
It follows that if CRYCBD.31 is clear and AP is not in use we do not
have to test the CRYCBD validatity even if facility.76 is present in the
host.
Fixes: 6ee7409820 ("KVM: s390: vsie: allow CRYCB FORMAT-0")
Cc: stable@vger.kernel.org
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <1549876849-32680-1-git-send-email-pmorel@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Vince (and later on Ravi) reported crashes in the BTS code during
fuzzing with the following backtrace:
general protection fault: 0000 [#1] SMP PTI
...
RIP: 0010:perf_prepare_sample+0x8f/0x510
...
Call Trace:
<IRQ>
? intel_pmu_drain_bts_buffer+0x194/0x230
intel_pmu_drain_bts_buffer+0x160/0x230
? tick_nohz_irq_exit+0x31/0x40
? smp_call_function_single_interrupt+0x48/0xe0
? call_function_single_interrupt+0xf/0x20
? call_function_single_interrupt+0xa/0x20
? x86_schedule_events+0x1a0/0x2f0
? x86_pmu_commit_txn+0xb4/0x100
? find_busiest_group+0x47/0x5d0
? perf_event_set_state.part.42+0x12/0x50
? perf_mux_hrtimer_restart+0x40/0xb0
intel_pmu_disable_event+0xae/0x100
? intel_pmu_disable_event+0xae/0x100
x86_pmu_stop+0x7a/0xb0
x86_pmu_del+0x57/0x120
event_sched_out.isra.101+0x83/0x180
group_sched_out.part.103+0x57/0xe0
ctx_sched_out+0x188/0x240
ctx_resched+0xa8/0xd0
__perf_event_enable+0x193/0x1e0
event_function+0x8e/0xc0
remote_function+0x41/0x50
flush_smp_call_function_queue+0x68/0x100
generic_smp_call_function_single_interrupt+0x13/0x30
smp_call_function_single_interrupt+0x3e/0xe0
call_function_single_interrupt+0xf/0x20
</IRQ>
The reason is that while event init code does several checks
for BTS events and prevents several unwanted config bits for
BTS event (like precise_ip), the PERF_EVENT_IOC_PERIOD allows
to create BTS event without those checks being done.
Following sequence will cause the crash:
If we create an 'almost' BTS event with precise_ip and callchains,
and it into a BTS event it will crash the perf_prepare_sample()
function because precise_ip events are expected to come
in with callchain data initialized, but that's not the
case for intel_pmu_drain_bts_buffer() caller.
Adding a check_period callback to be called before the period
is changed via PERF_EVENT_IOC_PERIOD. It will deny the change
if the event would become BTS. Plus adding also the limit_period
check as well.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix page fault handling code to fixup r16-r18 registers.
Before the patch code had off-by-two registers bug.
This bug caused overwriting of ps,pc,gp registers instead
of fixing intended r16,r17,r18 (see `struct pt_regs`).
More details:
Initially Dmitry noticed a kernel bug as a failure
on strace test suite. Test passes unmapped userspace
pointer to io_submit:
```c
#include <err.h>
#include <unistd.h>
#include <sys/mman.h>
#include <asm/unistd.h>
int main(void)
{
unsigned long ctx = 0;
if (syscall(__NR_io_setup, 1, &ctx))
err(1, "io_setup");
const size_t page_size = sysconf(_SC_PAGESIZE);
const size_t size = page_size * 2;
void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (MAP_FAILED == ptr)
err(1, "mmap(%zu)", size);
if (munmap(ptr, size))
err(1, "munmap");
syscall(__NR_io_submit, ctx, 1, ptr + page_size);
syscall(__NR_io_destroy, ctx);
return 0;
}
```
Running this test causes kernel to crash when handling page fault:
```
Unable to handle kernel paging request at virtual address ffffffffffff9468
CPU 3
aio(26027): Oops 0
pc = [<fffffc00004eddf8>] ra = [<fffffc00004edd5c>] ps = 0000 Not tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fffffc00c58e6300 t0 = fffffffffffffff2 t1 = 000002000025e000
t2 = fffffc01f159fef8 t3 = fffffc0001009640 t4 = fffffc0000e0f6e0
t5 = 0000020001002e9e t6 = 4c41564e49452031 t7 = fffffc01f159c000
s0 = 0000000000000002 s1 = 000002000025e000 s2 = 0000000000000000
s3 = 0000000000000000 s4 = 0000000000000000 s5 = fffffffffffffff2
s6 = fffffc00c58e6300
a0 = fffffc00c58e6300 a1 = 0000000000000000 a2 = 000002000025e000
a3 = 00000200001ac260 a4 = 00000200001ac1e8 a5 = 0000000000000001
t8 = 0000000000000008 t9 = 000000011f8bce30 t10= 00000200001ac440
t11= 0000000000000000 pv = fffffc00006fd320 at = 0000000000000000
gp = 0000000000000000 sp = 00000000265fd174
Disabling lock debugging due to kernel taint
Trace:
[<fffffc0000311404>] entSys+0xa4/0xc0
```
Here `gp` has invalid value. `gp is s overwritten by a fixup for the
following page fault handler in `io_submit` syscall handler:
```
__se_sys_io_submit
...
ldq a1,0(t1)
bne t0,4280 <__se_sys_io_submit+0x180>
```
After a page fault `t0` should contain -EFALUT and `a1` is 0.
Instead `gp` was overwritten in place of `a1`.
This happens due to a off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).
I think the bug went unnoticed for a long time as `gp` is one
of scratch registers. Any kernel function call would re-calculate `gp`.
Dmitry tracked down the bug origin back to 2.1.32 kernel version
where trap_a{0,1,2} fields were inserted into struct pt_regs.
And even before that `dpf_reg()` contained off-by-one error.
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reported-and-reviewed-by: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: stable@vger.kernel.org # v2.1.32+
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Eiger machine vector definition has nr_irqs 128, and working 2.6.26
boot shows SCSI getting IRQ-s 64 and 65. Current kernel boot fails
because Symbios SCSI fails to request IRQ-s and does not find the disks.
It has been broken at least since 3.18 - the earliest I could test with
my gcc-5.
The headers have moved around and possibly another order of defines has
worked in the past - but since 128 seems to be correct and used, fix
arch/alpha/include/asm/irq.h to have NR_IRQS=128 for Eiger.
This fixes 4.19-rc7 boot on my Force Flexor A264 (Eiger subarch).
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Matt Turner <mattst88@gmail.com>
All the setup code in AF_XDP is protected by a mutex with the
exception of the mmap code that cannot use it. To make sure that a
process banging on the mmap call at the same time as another process
is setting up the socket, smp_wmb() calls were added in the umem
registration code and the queue creation code, so that the published
structures that xsk_mmap needs would be consistent. However, the
corresponding smp_rmb() calls were not added to the xsk_mmap
code. This patch adds these calls.
Fixes: 37b076933a ("xsk: add missing write- and data-dependency barrier")
Fixes: c0c77d8fb7 ("xsk: add user memory registration support sockopt")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
For GSO packets they adjust gso_size to maintain the same MTU.
The gso size can only be safely adjusted on bytestream protocols.
Commit d02f51cbcf ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
gso_size unit per datagram. Also exclude these.
Move from a blacklist to a whitelist check to future proof against
additional such new GSO types, e.g., for fraglist based GRO.
Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Heiner Kallweit says:
====================
r8169: revert two commits due to a regression
Sander reported a regression (kernel panic, see[1]), therefore let's
revert these commits. Removal of the barriers doesn't seem to
contribute to the issue, the patch just overlaps with the problematic
one and only reverting both patches was tested.
[1] https://marc.info/?t=154965066400001&r=1&w=2
v2:
- improve commit message
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit bd7153bd83.
There doesn't seem to be anything wrong with this patch,
it's just reverted to get a stable baseline again.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should be using flush_delayed_work() instead of flush_work() in
matrix_keypad_stop() to ensure that we are not missing work that is
scheduled but not yet put in the workqueue (i.e. its delay timer has not
expired yet).
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Updating LED state requires access to regmap and therefore we may sleep,
so we could not do that directly form set_brightness() method.
Historically we used private work to adjust the brightness, but with the
introduction of set_brightness_blocking() we no longer need it.
As a bonus, not having our own work item means we do not have
use-after-free issue as we neglected to cancel outstanding work on
driver unbind.
Reported-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: Sven Van Asbroeck <TheSven73@googlemail.com>
Acked-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The recent change in the rx_curs_confirmed assignment disregards
byte order, which causes problems on little endian architectures.
This patch fixes it.
Fixes: b8649efad8 ("net/smc: fix sender_free computation") (net-tree)
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the unlikely event that the kmalloc call in vmci_transport_socket_init()
fails, we end-up calling vmci_transport_destruct() with a NULL vmci_trans()
and oopsing.
This change addresses the above explicitly checking for zero vmci_trans()
at destruction time.
Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the algorithm described in the comment block at the
beginning of ip_rt_send_redirect, the host should try to send
'ip_rt_redirect_number' ICMP redirect packets with an exponential
backoff and then stop sending them at all assuming that the destination
ignores redirects.
If the device has previously sent some ICMP error packets that are
rate-limited (e.g TTL expired) and continues to receive traffic,
the redirect packets will never be transmitted. This happens since
peer->rate_tokens will be typically greater than 'ip_rt_redirect_number'
and so it will never be reset even if the redirect silence timeout
(ip_rt_redirect_silence) has elapsed without receiving any packet
requiring redirects.
Fix it by using a dedicated counter for the number of ICMP redirect
packets that has been sent by the host
I have not been able to identify a given commit that introduced the
issue since ip_rt_send_redirect implements the same rate-limiting
algorithm from commit 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we probe a SFP module, we expect to be able to call the upstream
device's module_insert() function so that the upstream link can be
configured. However, when the upstream device is delayed, we currently
may end up probing the module before the upstream device is available,
and lose the module_insert() call.
Avoid this by holding off probing the module until the SFP bus is
properly connected to both the SFP socket driver and the upstream
driver.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Why]
It's useful to know the min and max vrr range for IGT testing.
[How]
Expose the min and max vfreq for the connector via a debugfs file
on the connector, "vrr_range".
Example usage: cat /sys/kernel/debug/dri/0/DP-1/vrr_range
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The entity->dependency can go away completely once we've called
drm_sched_entity_add_dependency_cb() (if the cb is called before we
get around to tracing). The tracepoint is more useful if we trace
every dependency instead of just ones that get callbacks installed,
anyway, so just do that.
Fixes any easy-to-produce OOPS when tracing the scheduler on V3D with
"perf record -a -e gpu_scheduler:.\* glxgears" and DEBUG_SLAB enabled.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the commit 62ba568f7a ("ALSA: pcm: Return 0 when size <
start_threshold in capture"), we changed the behavior of
__snd_pcm_lib_xfer() to return immediately with 0 when a capture
stream has a high start_threshold. This was intended to be a
correction of the behavior consistency and looked harmless, but this
was the culprit of the recent breakage reported by syzkaller, which
was fixed by the commit e190161f96 ("ALSA: pcm: Fix tight loop of
OSS capture stream").
At the time for the OSS fix, I didn't touch the behavior for ALSA
native API, as assuming that this behavior actually is good. But this
turned out to be also broken actually for a similar deployment,
e.g. one thread goes to a write loop in blocking mode while another
thread controls the start/stop of the stream manually.
Overall, the original commit is harmful, and it brings less merit to
keep that behavior. Let's revert it.
Fixes: 62ba568f7a ("ALSA: pcm: Return 0 when size < start_threshold in capture")
Fixes: e190161f96 ("ALSA: pcm: Fix tight loop of OSS capture stream")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
ASoC: Fixes for v5.0
A selection of driver specific fixes here, along with a few core fixes:
- A fixup for some MFD devices that were broken by the previous fixes
for deferred probe.
- A fix for potential out of bounds array accesses when ordering DAPM
power/up down sequences.
- Avoid use after free issue when unloading and reloading drivers using
topologies.
The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
This generates a rescuer thread for that queue that will trigger when
the CPU is under heavy load and collect the uncompleted work.
In the case of mmc, this creates the possibility of a deadlock when
there are multiple partitions on the device as other blk-mq work is
also run on the same queue. For example:
- worker 0 claims the mmc host to work on partition 1
- worker 1 attempts to claim the host for partition 2 but has to wait
for worker 0 to finish
- worker 0 schedules complete_work to release the host
- rescuer thread is triggered after time-out and collects the dangling
work
- rescuer thread attempts to complete the work in order starting with
claim host
- the task to release host is now blocked by a task to claim it and
will never be called
The above results in multiple hung tasks that lead to failures to
mount partitions.
Handling complete_work on a separate workqueue avoids this by keeping
the work completion tasks separate from the other blk-mq work. This
allows the host to be released without getting blocked by other tasks
attempting to claim the host.
Signed-off-by: Zachary Hays <zhays@lexmark.com>
Fixes: 81196976ed ("mmc: block: Add blk-mq support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Like Fujitsu CELSIUS H760, the H780 also has a three-button Elantech
touchpad, but the driver needs to be told so to enable the middle touchpad
button.
The elantech_dmi_force_crc_enabled quirk was not necessary with the H780.
Also document the fw_version and caps values detected for both H760 and
H780 models.
Signed-off-by: Matti Kurkela <Matti.Kurkela@iki.fi>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
We were enabling autosuspend, which is using data set by the
hash module, prior to the hash module being inited, casuing
a crash on resume as part of the startup sequence if the race
was lost.
This was never a real problem because the PM infra was using low
res timers so we were always winning the race, until commit 8234f6734c
("PM-runtime: Switch autosuspend over to using hrtimers") changed that :-)
Fix this by seperating the PM setup and enablement and doing the
latter only at the end of the init sequence.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: stable@kernel.org # v4.20
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The commit a60945fd08 ("ALSA: usb-audio: move implicit fb quirks to
separate function") introduced an error in the handling of quirks for
implicit feedback endpoints. This commit fixes this.
If a quirk successfully sets up an implicit feedback endpoint, usb-audio
no longer tries to find the implicit fb endpoint itself.
Fixes: a60945fd08 ("ALSA: usb-audio: move implicit fb quirks to separate function")
Signed-off-by: Manuel Reinhardt <manuel.rhdt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We need to reset MCU and do other initializations on resume otherwise
MT7610U device will fail to initialize, what cause system hung due to
USB requests timeouts.
Patch fixes 4.19 -> 4.20 regression.
Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
If we have a kernel configured for periodic timer interrupts, and we
have cpuidle enabled, then we end up with CPU1 losing timer interupts
after a hotplug.
This can manifest itself in RCU stall warnings, or userspace becoming
unresponsive.
The problem is that the kernel initially wants to use the TWD timer
for interrupts, but the TWD loses context when we enter the C3 cpuidle
state. Nothing reprograms the TWD after idle.
We have solved this in the past by switching to broadcast timer ticks,
and cpuidle44xx switches to that mode at boot time. However, there is
nothing to switch from periodic mode local timers after a hotplug
operation.
We call tick_broadcast_enter() in omap_enter_idle_coupled(), which one
would expect would take care of the issue, but internally this only
deals with one-shot local timers - tick_broadcast_enable() on the other
hand only deals with periodic local timers. So, we need to call both.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
[tony@atomide.com: just standardized the subject line]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Currently "0xf << 36" is used to
clear SSIU-9 internal buffer state, which overflows 32-bit value
according to user reference manual, it is always bit4 ~ bit7
of SSI_SYS_STATUS[1,3,5,7] registers indicate
SSIU-9's buffer state, so "0xf << 4" should be used.
This patch fix incorrect shifting issue in SSIU-9 case
Fixes: commit b7169ddea2 ("ASoC: rsnd: remove RSND_REG_ from rsnd_reg")
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
On systems with VHE the kernel and KVM's world-switch code run at the
same exception level. Code that is only used on a VHE system does not
need to be annotated as __hyp_text as it can reside anywhere in the
kernel text.
__hyp_text was also used to prevent kprobes from patching breakpoint
instructions into this region, as this code runs at a different
exception level. While this is no longer true with VHE, KVM still
switches VBAR_EL1, meaning a kprobe's breakpoint executed in the
world-switch code will cause a hyp-panic.
echo "p:weasel sysreg_save_guest_state_vhe" > /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/weasel/enable
lkvm run -k /boot/Image --console serial -p "console=ttyS0 earlycon=uart,mmio,0x3f8"
# lkvm run -k /boot/Image -m 384 -c 3 --name guest-1474
Info: Placing fdt at 0x8fe00000 - 0x8fffffff
Info: virtio-mmio.devices=0x200@0x10000:36
Info: virtio-mmio.devices=0x200@0x10200:37
Info: virtio-mmio.devices=0x200@0x10400:38
[ 614.178186] Kernel panic - not syncing: HYP panic:
[ 614.178186] PS:404003c9 PC:ffff0000100d70e0 ESR:f2000004
[ 614.178186] FAR:0000000080080000 HPFAR:0000000000800800 PAR:1d00007edbadc0de
[ 614.178186] VCPU:00000000f8de32f1
[ 614.178383] CPU: 2 PID: 1482 Comm: kvm-vcpu-0 Not tainted 5.0.0-rc2 #10799
[ 614.178446] Call trace:
[ 614.178480] dump_backtrace+0x0/0x148
[ 614.178567] show_stack+0x24/0x30
[ 614.178658] dump_stack+0x90/0xb4
[ 614.178710] panic+0x13c/0x2d8
[ 614.178793] hyp_panic+0xac/0xd8
[ 614.178880] kvm_vcpu_run_vhe+0x9c/0xe0
[ 614.178958] kvm_arch_vcpu_ioctl_run+0x454/0x798
[ 614.179038] kvm_vcpu_ioctl+0x360/0x898
[ 614.179087] do_vfs_ioctl+0xc4/0x858
[ 614.179174] ksys_ioctl+0x84/0xb8
[ 614.179261] __arm64_sys_ioctl+0x28/0x38
[ 614.179348] el0_svc_common+0x94/0x108
[ 614.179401] el0_svc_handler+0x38/0x78
[ 614.179487] el0_svc+0x8/0xc
[ 614.179558] SMP: stopping secondary CPUs
[ 614.179661] Kernel Offset: disabled
[ 614.179695] CPU features: 0x003,2a80aa38
[ 614.179758] Memory Limit: none
[ 614.179858] ---[ end Kernel panic - not syncing: HYP panic:
[ 614.179858] PS:404003c9 PC:ffff0000100d70e0 ESR:f2000004
[ 614.179858] FAR:0000000080080000 HPFAR:0000000000800800 PAR:1d00007edbadc0de
[ 614.179858] VCPU:00000000f8de32f1 ]---
Annotate the VHE world-switch functions that aren't marked
__hyp_text using NOKPROBE_SYMBOL().
Signed-off-by: James Morse <james.morse@arm.com>
Fixes: 3f5c90b890 ("KVM: arm64: Introduce VHE-specific kvm_vcpu_run")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We restrict mapping the PUD huge pages in stage2 to only when the
stage2 has 4 level page table, leaving the feature unused with
the default IPA size. But we could use it even with a 3
level page table, i.e, when the PUD level is folded into PGD,
just like the stage1. Relax the condition to allow using the
PUD huge page mappings at stage2 when it is possible.
Cc: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We currently initialize the group of private IRQs during
kvm_vgic_vcpu_init, and the value of the group depends on the GIC model
we are emulating. However, CPUs created before creating (and
initializing) the VGIC might end up with the wrong group if the VGIC
is created as GICv3 later.
Since we have no enforced ordering of creating the VGIC and creating
VCPUs, we can end up with part the VCPUs being properly intialized and
the remaining incorrectly initialized. That also means that we have no
single place to do the per-cpu data structure initialization which
depends on knowing the emulated GIC model (which is only the group
field).
This patch removes the incorrect comment from kvm_vgic_vcpu_init and
initializes the group of all previously created VCPUs's private
interrupts in vgic_init in addition to the existing initialization in
kvm_vgic_vcpu_init.
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Failing to properly reset system registers is pretty bad. But not
quite as bad as bringing the whole machine down... So warn loudly,
but slightly more gracefully.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
The current kvm_psci_vcpu_on implementation will directly try to
manipulate the state of the VCPU to reset it. However, since this is
not done on the thread that runs the VCPU, we can end up in a strangely
corrupted state when the source and target VCPUs are running at the same
time.
Fix this by factoring out all reset logic from the PSCI implementation
and forwarding the required information along with a request to the
target VCPU.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
We have two ways to reset a vcpu:
- either through VCPU_INIT
- or through a PSCI_ON call
The first one is easy to reason about. The second one is implemented
in a more bizarre way, as it is the vcpu that handles PSCI_ON that
resets the vcpu that is being powered-on. As we need to turn the logic
around and have the target vcpu to reset itself, we must take some
preliminary steps.
Resetting the VCPU state modifies the system register state in memory,
but this may interact with vcpu_load/vcpu_put if running with preemption
disabled, which in turn may lead to corrupted system register state.
Address this by disabling preemption and doing put/load if required
around the reset logic.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This reverts commit 9594ca6b87.
With the handle_simple_irq irq_flow_handler it must be ensured to
not call generic_handle_irq with the same IRQ number on 2 CPUs at
the same time (interrupts are floating on s390).
Contrary to my initial investigation the irq_desc's lock usage in
handle_simple_irq does not ensure this. Thus re-introduce the bit-
lock usage in s390's pci handler.
Reported-by: Ursula Braun <ubraun@linux.ibm.com>
Reported-by: Alexander Schmidt <alexs@linux.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
commit 4d230d1271 ("ASoC: rsnd: fixup not to call clk_get/set
under non-atomic") added new rsnd_ssi_prepare() and moved
rsnd_ssi_master_clk_start() to .prepare.
But, ssi user count (= ssi->usrcnt) is incremented at .init
(= rsnd_ssi_init()).
Because of these timing exchange, ssi->usrcnt check at
rsnd_ssi_master_clk_start() should be adjusted.
Otherwise, 2nd master clock setup will be no check.
This patch fixup this issue.
Fixes: commit 4d230d1271 ("ASoC: rsnd: fixup not to call clk_get/set under non-atomic")
Reported-by: Yusuke Goda <yusuke.goda.sx@renesas.com>
Reported-by: Valentine Barshak <valentine.barshak@cogentembedded.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Tested-by: Yusuke Goda <yusuke.goda.sx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
KASAN reports and additional traces point to out-of-bounds accesses to
the dapm_up_seq and dapm_down_seq lookup tables. The indices used are
larger than the array definition.
Fix by adding missing entries for the new widget types in these two
lookup tables, and align them with PGA values.
Also the sequences for the following widgets were not defined. Since
their values defaulted to zero, assign them explicitly
snd_soc_dapm_input
snd_soc_dapm_output
snd_soc_dapm_vmid
snd_soc_dapm_siggen
snd_soc_dapm_sink
Fixes: 8a70b4544e ('ASoC: dapm: Add new widget type for constructing DAPM graphs on DSPs.').
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The MMC device tree bindings include properties used to signal various
signalling speed modes. Until now the sunxi driver was accepting them
without any further filtering, while the sunxi device trees were not
actually using them.
Since some of the H5 boards can not run at higher speed modes stably,
we are resorting to declaring the higher speed modes per-board.
Regardless, having boards declare modes and blindly following them,
even without proper support in the driver, is generally a bad thing.
Filter out all unsupported modes from the capabilities mask after
the device tree properties have been parsed.
Cc: <stable@vger.kernel.org>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Some H5 boards seem to not have proper trace lengths for eMMC to be able
to use the default setting for the delay chains under HS-DDR mode. These
include the Bananapi M2+ H5 and NanoPi NEO Core2. However the Libre
Computer ALL-H3-CC-H5 works just fine.
For the H5 (at least for now), default to not enabling HS-DDR modes in
the driver, and expect the device tree to signal HS-DDR capability on
boards that work.
Reported-by: Chris Blake <chrisrblake93@gmail.com>
Fixes: 07bafc1e35 ("mmc: sunxi: Use new timing mode for A64 eMMC controller")
Cc: <stable@vger.kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
When we destroy the interface we already hold the wdev->mtx
while calling cfg80211_pmsr_wdev_down(), which assumes this
isn't true and flushes the worker that takes the lock, thus
leading to a deadlock.
Fix this by refactoring the worker and calling its code in
cfg80211_pmsr_wdev_down() directly.
We still need to flush the work later to make sure it's not
still running and will crash, but it will not do anything.
Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we *don't* have a MAC address attribute, we shouldn't
try to use this - this was intended to copy the local MAC
address instead, so fix it.
Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make it clear that it is a failure if the cpufreq driver was unable to
register as a cooling device. Makes it easier to find in logs and
grepping for words like fail, err, warn.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Make it easier to debug devicetree definition in case of errors.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
xfrm_state_put() moves struct xfrm_state to the GC list
and schedules the GC work to clean it up. On net exit call
path, xfrm_state_flush() is called to clean up and
xfrm_flush_gc() is called to wait for the GC work to complete
before exit.
However, this doesn't work because one of the ->destructor(),
ipcomp_destroy(), schedules the same GC work again inside
the GC work. It is hard to wait for such a nested async
callback. This is also why syzbot still reports the following
warning:
WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
...
ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
kthread+0x357/0x430 kernel/kthread.c:246
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
In fact, it is perfectly fine to bypass GC and destroy xfrm_state
synchronously on net exit call path, because it is in process context
and doesn't need a work struct to do any blocking work.
This patch introduces xfrm_state_put_sync() which simply bypasses
GC, and lets its callers to decide whether to use this synchronous
version. On net exit path, xfrm_state_fini() and
xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
blocking, it can use xfrm_state_put_sync() directly too.
Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
reflect this change.
Fixes: b48c05ab5d ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Since .scsi_done() must only be called after scsi_queue_rq() has
finished, make sure that the SRP initiator driver does not call
.scsi_done() while scsi_queue_rq() is in progress. Although
invoking sg_reset -d while I/O is in progress works fine with kernel
v4.20 and before, that is not the case with kernel v5.0-rc1. This
patch avoids that the following crash is triggered with kernel
v5.0-rc1:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
CPU: 0 PID: 360 Comm: kworker/0:1H Tainted: G B 5.0.0-rc1-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:blk_mq_dispatch_rq_list+0x116/0xb10
Call Trace:
blk_mq_sched_dispatch_requests+0x2f7/0x300
__blk_mq_run_hw_queue+0xd6/0x180
blk_mq_run_work_fn+0x27/0x30
process_one_work+0x4f1/0xa20
worker_thread+0x67/0x5b0
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30
Cc: <stable@vger.kernel.org>
Fixes: 94a9174c63 ("IB/srp: reduce lock coverage of command completion")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rmmod/modprobe tests expose a kernel oops when accessing the dai
driver pointer. This comes from the topology design which operates in
multiple passes. Each object removal happens at a specific iteration,
and the code checks for the iteration (order) number after the memory
containing the order was freed.
Fix this be clearing a reference to the dai driver and check its
validity to avoid dereferences.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Renesas sound device has many IPs and many situations.
If platform/board uses MIXer, situation will be more complex.
To avoid duplicate DVC kctrl registration when MIXer was used,
it had original flags.
But it was issue when sound card was re-binded, because
no one can't cleanup this flags then.
To solve this issue, commit 9c698e8481 ("ASoC: rsnd: tidyup
registering method for rsnd_kctrl_new()") checks registered
card->controls, because if card was re-binded, these were cleanuped
automatically. This patch could solve re-binding issue.
But, it start to avoid MIX kctrl.
To solve these issues, we need below.
To avoid card re-binding issue: check registered card->controls
To avoid duplicate DVC registration: check registered rsnd_kctrl_cfg
To allow multiple MIX registration: check registered rsnd_kctrl_cfg
This patch do it.
Fixes: 9c698e8481 ("ASoC: rsnd: tidyup registering method for rsnd_kctrl_new()")
Reported-by: Jiada Wang <jiada_wang@mentor.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Tested-By: Jiada Wang <jiada_wang@mentor.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Arm TC2 fails cpu hotplug stress test.
This issue was tracked down to a missing copy of the new affinity
cpumask for the vexpress-spc interrupt into struct
irq_common_data.affinity when the interrupt is migrated in
migrate_one_irq().
Fix it by replacing the arm specific hotplug cpu migration with the
generic irq code.
This is the counterpart implementation to commit 217d453d47 ("arm64:
fix a migrating irq bug when hotplug cpu").
Tested with cpu hotplug stress test on Arm TC2 (multi_v7_defconfig plus
CONFIG_ARM_BIG_LITTLE_CPUFREQ=y and CONFIG_ARM_VEXPRESS_SPC_CPUFREQ=y).
The vexpress-spc interrupt (irq=22) on this board is affine to CPU0.
Its affinity cpumask now changes correctly e.g. from 0 to 1-4 when
CPU0 is hotplugged out.
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
This reverts commit ad211f3e94.
As Jan Kara pointed out, this change was unsafe since it means we lose
the call to sync_mapping_buffers() in the nojournal case. The
original point of the commit was avoid taking the inode mutex (since
it causes a lockdep warning in generic/113); but we need the mutex in
order to call sync_mapping_buffers().
The real fix to this problem was discussed here:
https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org
The proposed patch was to fix a syzbot complaint, but the problem can
also demonstrated via "kvm-xfstests -c nojournal generic/113".
Multiple solutions were discused in the e-mail thread, but none have
landed in the kernel as of this writing. Anyway, commit
ad211f3e94 is absolutely the wrong way to suppress the lockdep, so
revert it.
Fixes: ad211f3e94 ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported: Jan Kara <jack@suse.cz>
The boot from eMMC is currently broken on the NXP i.MX8MQ EVK board.
When trying to boot from eMMC it fails with:
...
[ 1.271938] mmc1: Tuning failed, falling back to fixed sampling clock
[ 1.287429] print_req_error: I/O error, dev mmcblk1, sector 1 flags 0
[ 1.306833] mmc1: Tuning failed, falling back to fixed sampling clock
[ 1.322325] print_req_error: I/O error, dev mmcblk1, sector 2 flags 0
[ 1.329559] Buffer I/O error on dev mmcblk1, logical block 0, async page read
[ 1.336714] mmcblk1: unable to read partition table
...
The problem is the result of a partial misconfiguration of the pins and
the missing assigned clock rate.
Fixes: 9079aca4aa ("arm64: add support for i.MX8M EVK board")
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
Tested-by: Chris Spencer <christopher.spencer@sea.co.uk>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
In function omap4_dsi_mux_pads(), local variable "reg" could
be uninitialized if function regmap_read() returns -EINVAL.
However, it will be used directly in the later context, which
is potentially unsafe.
Signed-off-by: Yizhuo <yzhai003@ucr.edu>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 84badc5ec5 ("ARM: dts: omap4: Move l4 child devices to probe
them with ti-sysc") started producing a warning for pwm-omap-dmtimer:
WARNING: CPU: 0 PID: 77 at drivers/bus/omap_l3_noc.c:147
l3_interrupt_handler+0x2f8/0x388
44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4PER2 (Idle):
Data Access in Supervisor mode during Functional access
...
__pm_runtime_idle
omap_dm_timer_disable
pwm_omap_dmtimer_start
pwm_omap_dmtimer_enable
pwm_apply_state
pwm_vibrator_start
pwm_vibrator_play_work
This is because the timer that pwm-omap-dmtimer is using is now being
probed with ti-sysc interconnect target module instead of omap_device
and the ti-sysc quirk for SYSC_QUIRK_LEGACY_IDLE is not fully
compatible with what omap_device has been doing.
We could fix this by reverting the timer changes and have the timer
probe again with omap_device. Or we could add more quirk handling to
ti-sysc driver. But as these options don't work nicely as longer term
solutions, let's just make timers probe with ti-sysc without any
quirks.
To do this, all we need to do is remove quirks for timers for ti-sysc,
and drop the bogus pm_runtime_irq_safe() flag for timer-ti-dm.
We should not use pm_runtime_irq_safe() anyways for drivers as it will
take a permanent use count on the parent device blocking the parent
devices from idling and has been forcing ti-sysc driver to use a
quirk flag.
Note that we will move the timer data to DEBUG section later on in
clean-up patches.
Fixes: 84badc5ec5 ("ARM: dts: omap4: Move l4 child devices to probe them with ti-sysc")
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Ladislav Michl <ladis@linux-mips.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Tested-By: Andreas Kemnade <andreas@kemnade.info>
Tested-By: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This patch fixes order of disable calls in pwm_vibrator_stop.
Currently when starting device, we first enable vcc regulator and then
setup and enable pwm. When stopping, we should do this in oposite order,
so first disable pwm and then disable regulator.
Previously order was the same as in start.
Signed-off-by: Paweł Chmiel <pawel.mikolaj.chmiel@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
pwm_vibrator_stop disables the regulator, but it can be called from
multiple places, even when the regulator is already disabled. Fix this
by using regulator_is_enabled check when starting and stopping device.
Signed-off-by: Jonathan Bakker <xc-racer2@live.ca>
Signed-off-by: Paweł Chmiel <pawel.mikolaj.chmiel@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The i.MX SNVS Power Key driver supports the i.MX 7D SoC family too.
Allow to enable the i.MX SNVS Power Key driver even if only i.MX 7D
SoC is selected.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The older machines don't have the QCI instruction available.
With support for up to 256 crypto cards the probing of each
card has been extended to check card ids from 0 up to 255.
For machines with QCI support there is a filter limiting the
range of probed cards. The older machines (z196 and older)
don't have this filter and so since support for 256 cards is
in the driver all cards are probed. However, these machines
also require to have the card id fit into 6 bits. Exceeding
this limit results in a specification exception which happens
on every kernel startup even when there is no crypto configured
and used at all.
This fix limits the range of probed crypto cards to 64 if
there is no QCI instruction available to obey to the older
ap architecture and so fixes the specification exceptions
on z196 machines.
Cc: stable@vger.kernel.org # v4.17+
Fixes: af4a72276d ("s390/zcrypt: Support up to 256 crypto adapters.")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Dan Carpenter reported the following:
The patch 52898025cf: "[S390] dasd: security and PSF update patch
for EMC CKD ioctl" from Mar 8, 2010, leads to the following static
checker warning:
drivers/s390/block/dasd_eckd.c:4486 dasd_symm_io()
error: using offset into zero size array 'psf_data[]'
drivers/s390/block/dasd_eckd.c
4458 /* Copy parms from caller */
4459 rc = -EFAULT;
4460 if (copy_from_user(&usrparm, argp, sizeof(usrparm)))
^^^^^^^
The user can specify any "usrparm.psf_data_len". They choose zero by
mistake.
4461 goto out;
4462 if (is_compat_task()) {
4463 /* Make sure pointers are sane even on 31 bit. */
4464 rc = -EINVAL;
4465 if ((usrparm.psf_data >> 32) != 0)
4466 goto out;
4467 if ((usrparm.rssd_result >> 32) != 0)
4468 goto out;
4469 usrparm.psf_data &= 0x7fffffffULL;
4470 usrparm.rssd_result &= 0x7fffffffULL;
4471 }
4472 /* alloc I/O data area */
4473 psf_data = kzalloc(usrparm.psf_data_len, GFP_KERNEL
| GFP_DMA);
4474 rssd_result = kzalloc(usrparm.rssd_result_len, GFP_KERNEL
| GFP_DMA);
4475 if (!psf_data || !rssd_result) {
kzalloc() returns a ZERO_SIZE_PTR (0x16).
4476 rc = -ENOMEM;
4477 goto out_free;
4478 }
4479
4480 /* get syscall header from user space */
4481 rc = -EFAULT;
4482 if (copy_from_user(psf_data,
4483 (void __user *)(unsigned long)
usrparm.psf_data,
4484 usrparm.psf_data_len))
That all works great.
4485 goto out_free;
4486 psf0 = psf_data[0];
4487 psf1 = psf_data[1];
But now we're assuming that "->psf_data_len" was at least 2 bytes.
Fix this by checking the user specified length psf_data_len.
Fixes: 52898025cf ("[S390] dasd: security and PSF update patch for EMC CKD ioctl")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The patch that added support for the virtually mapped kernel stacks changed
swsusp_arch_suspend to switch to the nodat-stack as the vmap stack is not
available while going in and out of suspend.
Unfortunately the switch to the nodat-stack is incorrect which breaks
suspend to disk.
Cc: stable@vger.kernel.org # v4.20
Fixes: ce3dc44749 ("s390: add support for virtually mapped kernel stacks")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On ESP output, sk_wmem_alloc is incremented for the added padding if a
socket is associated to the skb. When replying with TCP SYNACKs over
IPsec, the associated sk is a casted request socket, only. Increasing
sk_wmem_alloc on a request socket results in a write at an arbitrary
struct offset. In the best case, this produces the following WARNING:
WARNING: CPU: 1 PID: 0 at lib/refcount.c:102 esp_output_head+0x2e4/0x308 [esp4]
refcount_t: addition on 0; use-after-free.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc3 #2
Hardware name: Marvell Armada 380/385 (Device Tree)
[...]
[<bf0ff354>] (esp_output_head [esp4]) from [<bf1006a4>] (esp_output+0xb8/0x180 [esp4])
[<bf1006a4>] (esp_output [esp4]) from [<c05dee64>] (xfrm_output_resume+0x558/0x664)
[<c05dee64>] (xfrm_output_resume) from [<c05d07b0>] (xfrm4_output+0x44/0xc4)
[<c05d07b0>] (xfrm4_output) from [<c05956bc>] (tcp_v4_send_synack+0xa8/0xe8)
[<c05956bc>] (tcp_v4_send_synack) from [<c0586ad8>] (tcp_conn_request+0x7f4/0x948)
[<c0586ad8>] (tcp_conn_request) from [<c058c404>] (tcp_rcv_state_process+0x2a0/0xe64)
[<c058c404>] (tcp_rcv_state_process) from [<c05958ac>] (tcp_v4_do_rcv+0xf0/0x1f4)
[<c05958ac>] (tcp_v4_do_rcv) from [<c0598a4c>] (tcp_v4_rcv+0xdb8/0xe20)
[<c0598a4c>] (tcp_v4_rcv) from [<c056eb74>] (ip_protocol_deliver_rcu+0x2c/0x2dc)
[<c056eb74>] (ip_protocol_deliver_rcu) from [<c056ee6c>] (ip_local_deliver_finish+0x48/0x54)
[<c056ee6c>] (ip_local_deliver_finish) from [<c056eecc>] (ip_local_deliver+0x54/0xec)
[<c056eecc>] (ip_local_deliver) from [<c056efac>] (ip_rcv+0x48/0xb8)
[<c056efac>] (ip_rcv) from [<c0519c2c>] (__netif_receive_skb_one_core+0x50/0x6c)
[...]
The issue triggers only when not using TCP syncookies, as for syncookies
no socket is associated.
Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
According to the manual the gate clock for MMC3 is at bit 11, and NAND1
is controlled by bit 12.
Fix the gate bit definitions in the clock driver.
Fixes: c6e6c96d8f ("clk: sunxi-ng: Add A31/A31s clocks")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Commit 2d99925a15 ("hwmon: (nct6775) Separate fan/pwm configuration
detection for NCT6793D") accidentally removed part of the code detecting
if fan6 is enabled or not. As result, fan6 is no longer detected on Asus
PRIME Z370-A. Restore the missing detection code.
Fixes: 2d99925a15 ("hwmon: (nct6775) Separate fan/pwm configuration detection for NCT6793D")
Reported-by: Chris Siebenmann <cks@cs.toronto.edu>
Cc: Chris Siebenmann <cks@cs.toronto.edu>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
After commit ef05bcb60c, boot from USB drives is broken.
Fix this problem by enabling usb-host regulators during boot time.
Fixes: ef05bcb60c ("arm64: dts: rockchip: fix vcc_host1_5v pin assign on rk3328-rock64")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Voytik <voytikd@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Ports are described by child 'port' nodes contained in the device node.
'ports' is optional and is used to group all 'port' nodes which is not
the case here.
This patch fixes the following warnings:
arch/arm64/boot/dts/rockchip/rk3399-gru-bob.dts:25.9-29.5: Warning (graph_port): /edp-panel/ports: graph port node name should be 'port'
arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts:46.9-50.5: Warningi (graph_port): /edp-panel/ports: graph port node name should be 'port'
arch/arm64/boot/dts/rockchip/rk3399-sapphire-excavator.dts:94.9-98.5: Warning (graph_port): /edp-panel/ports: graph port node name should be 'port'
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
For devices implemented as a MFD it is common to only have a single node
in devicetree representing the whole device. As such when looking up
components in soc_find_components we should match against both the devices
of_node and the devices parent's of_node, as is already done in the rest
of the ASoC core.
This causes regressions for some DAI links at the moment as
soc_find_component was recently added as a check in soc_init_dai_link.
Fixes: 8780cf1142 ("ASoC: soc-core: defer card probe until all component is added to list")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
We currently hide the LORegion feature, and generate an UNDEF
if the guest dares using the corresponding registers. This is
a bit extreme, as ARMv8.1 guarantees the feature to be present.
The guest should check the feature register before doing anything,
but we could also give the guest some slack (read "allow the
guest to be a bit stupid").
So instead of unconditionnaly deliver an exception, let's
only do it when the host doesn't support LORegion at all (or
when the feature has been sanitized out), and treat the registers
as RAZ/WI otherwise (with the exception of LORID_EL1 being RO).
Fixes: cc33c4e201 ("arm64/kvm: Prohibit guest LOR accesses")
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
vgic_cpu->ap_list_lock must always be taken with interrupts disabled as
it is used in interrupt context.
For configurations such as PREEMPT_RT_FULL, this means that it should
be a raw_spinlock since RT spinlocks are interruptible.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
vgic_dist->lpi_list_lock must always be taken with interrupts disabled as
it is used in interrupt context.
For configurations such as PREEMPT_RT_FULL, this means that it should
be a raw_spinlock since RT spinlocks are interruptible.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
vgic_irq->irq_lock must always be taken with interrupts disabled as
it is used in interrupt context.
For configurations such as PREEMPT_RT_FULL, this means that it should
be a raw_spinlock since RT spinlocks are interruptible.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Commit 83a86fbb5b ("irqchip/gic: Loudly complain about the use of
IRQ_TYPE_NONE") started warning about incorrect dts usage for irqs.
ARM GIC only supports active-high interrupts for SPI (Shared Peripheral
Interrupts), and the Palmas PMIC by default is active-low.
Palmas PMIC allows changing the interrupt polarity using register
PALMAS_POLARITY_CTRL_INT_POLARITY, but configuring sys_nirq1 with
a pull-down and setting PALMAS_POLARITY_CTRL_INT_POLARITY made the
Palmas RTC interrupts stop working. This can be easily tested with
kernel tools rtctest.c.
Turns out the SoC inverts the sys_nirq pins for GIC as they do not go
through a peripheral device but go directly to the MPUSS wakeupgen.
I've verified this by muxing the interrupt line temporarily to gpio_wk16
instead of sys_nirq1. with a gpio, the interrupt works fine both
active-low and active-high with the SoC internal pull configured and
palmas polarity configured. But as sys_nirq1, the interrupt only works
when configured ACTIVE_LOW for palmas, and ACTIVE_HIGH for GIC.
Note that there was a similar issue earlier with tegra114 and palmas
interrupt polarity that got fixed by commit df545d1cd0 ("mfd: palmas:
Provide irq flags through DT/platform data"). However, the difference
between omap5 and tegra114 is that tegra inverts the palmas interrupt
twice, once when entering tegra PMC, and again when exiting tegra PMC
to GIC.
Let's fix the issue by adding a custom wakeupgen_irq_set_type() for
wakeupgen and invert any interrupts with wrong polarity. Let's also
warn about any non-sysnirq pins using wrong polarity. Note that we
also need to update the dts for the level as IRQ_TYPE_NONE never
has irq_set_type() called, and let's add some comments and use proper
pin nameing to avoid more confusion later on.
Cc: Belisko Marek <marek.belisko@gmail.com>
Cc: Dmitry Lifshitz <lifshitz@compulab.co.il>
Cc: "Dr. H. Nikolaus Schaller" <hns@goldelico.com>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Laxman Dewangan <ldewangan@nvidia.com>
Cc: Nishanth Menon <nm@ti.com>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: Richard Woodruff <r-woodruff2@ti.com>
Cc: Santosh Shilimkar <ssantosh@kernel.org>
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: stable@vger.kernel.org # v4.17+
Reported-by: Belisko Marek <marek.belisko@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
On a NOP double buffer update where current buffer address is the same
as the next buffer address, the SDW_UPDATE bit clears too late. As we
are now using this bit to determine when it is safe to signal flip
completion to userspace this will delay completion of atomic commits
where one plane doesn't change the buffer by a whole frame period.
Fix this by remembering the last buffer address and just skip the
double buffer update if it would not change the buffer address.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
[p.zabel@pengutronix.de: initialize last_bufaddr in ipu_pre_configure]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Commit 84badc5ec5 ("ARM: dts: omap4: Move l4 child devices to probe
them with ti-sysc") moved some omap4 timers to probe with ti-sysc
interconnect target module. Turns out this broke pwm-omap-dmtimer
where we now try to reparent the clock to itself with the following:
omap_dm_timer_of_set_source: failed to set parent
With ti-sysc, we can now configure the clock sources in the dts
with assigned-clocks and assigned-clock-parents. So we should be able
to remove omap_dm_timer_of_set_source with clean-up patches later on.
But for now, let's just fix it first by checking if parent and fck
are the same and bail out of so.
Fixes: 84badc5ec5 ("ARM: dts: omap4: Move l4 child devices to probe them with ti-sysc")
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Ladislav Michl <ladis@linux-mips.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Tested-By: Andreas Kemnade <andreas@kemnade.info>
Tested-By: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
AD/DA ASRC function control two ASRC clock sources separately.
Whether AD/DA filter select which clock source, we enable AD/DA ASRC
function for all cases.
Signed-off-by: Shuming Fan <shumingf@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
According to the datasheet and the reference code from Allwinner, the
bit used to de-assert the TCON reset is bit 4, not bit 3.
Fix it in the V3s CCU driver.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
I prefer to use my personal email address for kernel related work.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
hdmi-codec oopses the kernel when it is unbound from a successfully
bound audio subsystem, and is then rebound:
Unable to handle kernel NULL pointer dereference at virtual address 0000001c
pgd = ee3f0000
[0000001c] *pgd=3cc59831
Internal error: Oops: 817 [#1] PREEMPT ARM
Modules linked in: ext2 snd_soc_spdif_tx vmeta dove_thermal snd_soc_kirkwood ofpart marvell_cesa m25p80 orion_wdt mtd spi_nor des_generic gpio_ir_recv snd_soc_kirkwood_spdif bmm_dmabuf auth_rpcgss nfsd autofs4 etnaviv thermal_sys hwmon gpu_sched tda9950
CPU: 0 PID: 1005 Comm: bash Not tainted 4.20.0+ #1762
Hardware name: Marvell Dove (Cubox)
PC is at hdmi_dai_probe+0x68/0x80
LR is at find_held_lock+0x20/0x94
pc : [<c04c7de0>] lr : [<c0063bf4>] psr: 600f0013
sp : ee15bd28 ip : eebd8b1c fp : c093b488
r10: ee048000 r9 : eebdab18 r8 : ee048600
r7 : 00000001 r6 : 00000000 r5 : 00000000 r4 : ee82c100
r3 : 00000006 r2 : 00000001 r1 : c067e38c r0 : ee82c100
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none[ 297.318599] Control: 10c5387d Table: 2e3f0019 DAC: 00000051
Process bash (pid: 1005, stack limit = 0xee15a248)
...
[<c04c7de0>] (hdmi_dai_probe) from [<c04b7060>] (soc_probe_dai.part.9+0x34/0x70)
[<c04b7060>] (soc_probe_dai.part.9) from [<c04b81a8>] (snd_soc_instantiate_card+0x734/0xc9c)
[<c04b81a8>] (snd_soc_instantiate_card) from [<c04b8b6c>] (snd_soc_add_component+0x29c/0x378)
[<c04b8b6c>] (snd_soc_add_component) from [<c04b8c8c>] (snd_soc_register_component+0x44/0x54)
[<c04b8c8c>] (snd_soc_register_component) from [<c04c64b4>] (devm_snd_soc_register_component+0x48/0x84)
[<c04c64b4>] (devm_snd_soc_register_component) from [<c04c7be8>] (hdmi_codec_probe+0x150/0x260)
[<c04c7be8>] (hdmi_codec_probe) from [<c0373124>] (platform_drv_probe+0x48/0x98)
This happens because hdmi_dai_probe() attempts to access the HDMI
codec private data, but this has not been assigned by hdmi_dai_probe()
before it calls devm_snd_soc_register_component(). Move the call to
dev_set_drvdata() before devm_snd_soc_register_component() to avoid
this oops.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
The CSI offsets are wrong for both CSI0 and CSI1. They are at
physical address 0x1e030000 and 0x1e038000 respectively.
Fixes: 2ffd48f2e7 ("gpu: ipu-v3: Add Camera Sensor Interface unit")
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The device node iterators perform an of_node_get on each
iteration, so a jump out of the loop requires an of_node_put.
Move the initialization channel->child = child; down to just
before the call to imx_ldb_register so that intervening failures
don't need to clear it. Add a label at the end of the function to
do all the of_node_puts.
The semantic patch that finds part of this problem is as follows
(http://coccinelle.lip6.fr):
// <smpl>
@@
expression root,e;
local idexpression child;
iterator name for_each_child_of_node;
@@
for_each_child_of_node(root, child) {
... when != of_node_put(child)
when != e = child
(
return child;
|
* return ...;
)
...
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The CSI0/CSI1 registers offset is at +0xe030000/+0xe038000 relative
to the control module registers on IPUv3EX.
This patch fixes wrong values for i.MX51 CSI0/CSI1.
Fixes: 2ffd48f2e7 ("gpu: ipu-v3: Add Camera Sensor Interface unit")
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
This patch fixes backtraces like the following when sending SIGKILL to a
process with a currently pending plane update:
[drm:ipu_plane_atomic_check] CRTC should be enabled
[drm:drm_framebuffer_remove] *ERROR* failed to commit
------------[ cut here ]------------
WARNING: CPU: 3 PID: 63 at drivers/gpu/drm/drm_framebuffer.c:926 drm_framebuffer_remove+0x47c/0x498
atomic remove_fb failed with -22
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
While the rk3066 does have 2 camera interfaces, the rk3188 does not, so
there also isn't a QoS block for that non-existing interface, so remove it.
Fixes: e6e1869f0b ("ARM: dts: rockchip: add rk3066/rk3188 power-domains")
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2019-01-07 09:14:42 +01:00
539 changed files with 4409 additions and 2734 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.