omap_sr_register_pmic() was introduced in 2010 in commit
984aa6dbf4 ("OMAP3: PM: Adding smartreflex driver support.")
. There was never any caller of this function in mainline resulting in a
warning
sr_init: No PMIC hook to init smartreflex
for each machine where this driver is enabled. So remove the unused
function and the pr_warn.
Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
I wrote:
"Char/Misc fixes for 4.19-rc7
Here are 8 small fixes for some char/misc driver issues
Included here are:
- fpga driver fixes
- thunderbolt bugfixes
- firmware core revert/fix
- hv core fix
- hv tool fix
All of these have been in linux-next with no reported issues."
* tag 'char-misc-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
thunderbolt: Initialize after IOMMUs
thunderbolt: Do not handle ICM events after domain is stopped
firmware: Always initialize the fw_priv list object
docs: fpga: document fpga manager flags
fpga: bridge: fix obvious function documentation error
tools: hv: fcopy: set 'error' in case an unknown operation was requested
fpga: do not access region struct after fpga_region_unregister
Drivers: hv: vmbus: Use get/put_cpu() in vmbus_connect()
I wrote:
"Serial driver fixes for 4.19-rc7
Here are 3 small serial driver fixes for 4.19-rc7
- 2 sh-sci bugfixes for reported issues
- a revert of the PM handling for the 8250_dw code
All of these have been in linux-next with no reported issues."
* tag 'tty-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "serial: sh-sci: Allow for compressed SCIF address"
Revert "serial: sh-sci: Remove SCIx_RZ_SCIFA_REGTYPE"
Revert "serial: 8250_dw: Fix runtime PM handling"
I wrote:
"USB fixes for 4.19-rc7
Here are some small USB fixes for 4.19-rc7
These include:
- the usual xhci bugfixes for reported issues
- some new serial driver device ids
- bugfix for the option serial driver for some devices
- bugfix for the cdc_acm driver that has been there for a long time.
All of these have been in linux-next for a while with no reported
issues."
* tag 'usb-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: xhci-mtk: resume USB3 roothub first
xhci: Add missing CAS workaround for Intel Sunrise Point xHCI
usb: cdc_acm: Do not leak URB buffers
USB: serial: simple: add Motorola Tetra MTP6550 id
USB: serial: option: add two-endpoints device-id flag
USB: serial: option: improve Quectel EP06 detection
Wolfram writes:
"i2c for 4.19
I2C has three driver bugfixes and a fix for a typo for you."
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: Call i2c_dw_clk_rate() only when calculating timings
i2c: i2c-scmi: fix for i2c_smbus_write_block_data
i2c: i2c-isch: fix spelling mistake "unitialized" -> "uninitialized"
i2c: i2c-qcom-geni: Properly handle DMA safe buffers
James writes:
"SCSI fixes on 20181006
Small fix for an unititialized mutex in the qedi driver."
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qedi: Initialize the stats mutex lock
Michael writes:
"powerpc fixes for 4.19 #4
Four regression fixes.
A fix for a change to lib/xz which broke our zImage loader when
building with XZ compression. OK'ed by Herbert who merged the
original patch.
The recent fix we did to avoid patching __init text broke some 32-bit
machines, fix that.
Our show_user_instructions() could be tricked into printing kernel
memory, add a check to avoid that.
And a fix for a change to our NUMA initialisation logic, which causes
crashes in some kdump configurations.
Thanks to:
Christophe Leroy, Hari Bathini, Jann Horn, Joel Stanley, Meelis
Roos, Murilo Opsfelder Araujo, Srikar Dronamraju."
* tag 'powerpc-4.19-4' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/numa: Skip onlining a offline node in kdump path
powerpc: Don't print kernel instructions in show_user_instructions()
powerpc/lib: fix book3s/32 boot failure due to code patching
lib/xz: Put CRC32_POLY_LE in xz_private.h
Dave writes:
"Networking fixes:
1) Fix truncation of 32-bit right shift in bpf, from Jann Horn.
2) Fix memory leak in wireless wext compat, from Stefan Seyfried.
3) Use after free in cfg80211's reg_process_hint(), from Yu Zhao.
4) Need to cancel pending work when unbinding in smsc75xx otherwise
we oops, also from Yu Zhao.
5) Don't allow enslaving a team device to itself, from Ido Schimmel.
6) Fix backwards compat with older userspace for rtnetlink FDB dumps.
From Mauricio Faria.
7) Add validation of tc policy netlink attributes, from David Ahern.
8) Fix RCU locking in rawv6_send_hdrinc(), from Wei Wang."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
net: mvpp2: Extract the correct ethtype from the skb for tx csum offload
ipv6: take rcu lock in rawv6_send_hdrinc()
net: sched: Add policy validation for tc attributes
rtnetlink: fix rtnl_fdb_dump() for ndmsg header
yam: fix a missing-check bug
net: bpfilter: Fix type cast and pointer warnings
net: cxgb3_main: fix a missing-check bug
bpf: 32-bit RSH verification must truncate input before the ALU op
net: phy: phylink: fix SFP interface autodetection
be2net: don't flip hw_features when VXLANs are added/deleted
net/packet: fix packet drop as of virtio gso
net: dsa: b53: Keep CPU port as tagged in all VLANs
openvswitch: load NAT helper
bnxt_en: get the reduced max_irqs by the ones used by RDMA
bnxt_en: free hwrm resources, if driver probe fails.
bnxt_en: Fix enables field in HWRM_QUEUE_COS2BW_CFG request
bnxt_en: Fix VNIC reservations on the PF.
team: Forbid enslaving team device to itself
net/usb: cancel pending work when unbinding smsc75xx
mlxsw: spectrum: Delete RIF when VLAN device is removed
...
* akpm:
mm: madvise(MADV_DODUMP): allow hugetlbfs pages
ocfs2: fix locking for res->tracking and dlm->tracking_list
mm/vmscan.c: fix int overflow in callers of do_shrink_slab()
mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
mm/vmstat.c: fix outdated vmstat_text
proc: restrict kernel stack dumps to root
mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes
mm/migrate.c: split only transparent huge pages when allocation fails
ipc/shm.c: use ERR_CAST() for shm_lock() error return
mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl
mm, thp: fix mlocking THP page with migration enabled
ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
hugetlb: take PMD sharing into account when flushing tlb/caches
mm: migration: fix migration of huge PMD shared pages
Reproducer, assuming 2M of hugetlbfs available:
Hugetlbfs mounted, size=2M and option user=testuser
# mount | grep ^hugetlbfs
hugetlbfs on /dev/hugepages type hugetlbfs (rw,pagesize=2M,user=dan)
# sysctl vm.nr_hugepages=1
vm.nr_hugepages = 1
# grep Huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 1
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 2048 kB
Code:
#include <sys/mman.h>
#include <stddef.h>
#define SIZE 2*1024*1024
int main()
{
void *ptr;
ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0);
madvise(ptr, SIZE, MADV_DONTDUMP);
madvise(ptr, SIZE, MADV_DODUMP);
}
Compile and strace:
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ff7c9200000
madvise(0x7ff7c9200000, 2097152, MADV_DONTDUMP) = 0
madvise(0x7ff7c9200000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument)
hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on
author testing with analysis from Florian Weimer[1].
The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination was a
consequence of the large useage of VM_DONTEXPAND in device drivers.
A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be
marked DODUMP.
A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs
memory for a while and later request that madvise(MADV_DODUMP) on the same
memory. We correct this omission by allowing madvice(MADV_DODUMP) on
hugetlbfs pages.
[1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
[2] commit 0103bd16fb ("mm: prepare VM_DONTDUMP for using in drivers")
Link: http://lkml.kernel.org/r/20180930054629.29150-1-daniel@linux.ibm.com
Link: https://lists.launchpad.net/maria-discuss/msg05245.html
Fixes: 0103bd16fb ("mm: prepare VM_DONTDUMP for using in drivers")
Reported-by: Kenneth Penza <kpenza@gmail.com>
Signed-off-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU. That means that
the stack can change under the stack walker. The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers. This can cause exposure of
kernel stack contents to userspace.
Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents. See the added comment for a longer
rationale.
There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails. Therefore, I believe
that this change is unlikely to break things. In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.
Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
Fixes: 2ec220e27f ("proc: add /proc/*/stack")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
split_huge_page_to_list() fails on HugeTLB pages. I was experimenting
with moving 32MB contig HugeTLB pages on arm64 (with a debug patch
applied) and hit the following stack trace when the kernel crashed.
[ 3732.462797] Call trace:
[ 3732.462835] split_huge_page_to_list+0x3b0/0x858
[ 3732.462913] migrate_pages+0x728/0xc20
[ 3732.462999] soft_offline_page+0x448/0x8b0
[ 3732.463097] __arm64_sys_madvise+0x724/0x850
[ 3732.463197] el0_svc_handler+0x74/0x110
[ 3732.463297] el0_svc+0x8/0xc
[ 3732.463347] Code: d1000400 f90b0e60 f2fbd5a2 a94982a1 (f9000420)
When unmap_and_move[_huge_page]() fails due to lack of memory, the
splitting should happen only for transparent huge pages not for HugeTLB
pages. PageTransHuge() returns true for both THP and HugeTLB pages.
Hence the conditonal check should test PagesHuge() flag to make sure that
given pages is not a HugeTLB one.
Link: http://lkml.kernel.org/r/1537798495-4996-1-git-send-email-anshuman.khandual@arm.com
Fixes: 94723aafb9 ("mm: unclutter THP migration")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A transparent huge page is represented by a single entry on an LRU list.
Therefore, we can only make unevictable an entire compound page, not
individual subpages.
If a user tries to mlock() part of a huge page, we want the rest of the
page to be reclaimable.
We handle this by keeping PTE-mapped huge pages on normal LRU lists: the
PMD on border of VM_LOCKED VMA will be split into PTE table.
Introduction of THP migration breaks[1] the rules around mlocking THP
pages. If we had a single PMD mapping of the page in mlocked VMA, the
page will get mlocked, regardless of PTE mappings of the page.
For tmpfs/shmem it's easy to fix by checking PageDoubleMap() in
remove_migration_pmd().
Anon THP pages can only be shared between processes via fork(). Mlocked
page can only be shared if parent mlocked it before forking, otherwise CoW
will be triggered on mlock().
For Anon-THP, we can fix the issue by munlocking the page on removing PTE
migration entry for the page. PTEs for the page will always come after
mlocked PMD: rmap walks VMAs from oldest to newest.
Test-case:
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <linux/mempolicy.h>
#include <numaif.h>
int main(void)
{
unsigned long nodemask = 4;
void *addr;
addr = mmap((void *)0x20000000UL, 2UL << 20, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
if (fork()) {
wait(NULL);
return 0;
}
mlock(addr, 4UL << 10);
mbind(addr, 2UL << 20, MPOL_PREFERRED | MPOL_F_RELATIVE_NODES,
&nodemask, 4, MPOL_MF_MOVE);
return 0;
}
[1] https://lkml.kernel.org/r/CAOMGZ=G52R-30rZvhGxEbkTw7rLLwBGadVYeo--iizcD3upL3A@mail.gmail.com
Link: http://lkml.kernel.org/r/20180917133816.43995-1-kirill.shutemov@linux.intel.com
Fixes: 616b837153 ("mm: thp: enable thp migration in generic path")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages
is dirty. When a page has not been written back, it is still in dirty
state. If ocfs2_duplicate_clusters_by_page() is called against the dirty
page, the crash happens.
To fix this bug, we can just unlock the page and wait until the page until
its not dirty.
The following is the backtrace:
kernel BUG at /root/code/ocfs2/refcounttree.c:2961!
[exception RIP: ocfs2_duplicate_clusters_by_page+822]
__ocfs2_move_extent+0x80/0x450 [ocfs2]
? __ocfs2_claim_clusters+0x130/0x250 [ocfs2]
ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2]
__ocfs2_move_extents_range+0x2a4/0x470 [ocfs2]
ocfs2_move_extents+0x180/0x3b0 [ocfs2]
? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2]
ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2]
ocfs2_ioctl+0x253/0x640 [ocfs2]
do_vfs_ioctl+0x90/0x5f0
SyS_ioctl+0x74/0x80
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Once we find the page is dirty, we do not wait until it's clean, rather we
use write_one_page() to write it back
Link: http://lkml.kernel.org/r/20180829074740.9438-1-lchen@suse.com
[lchen@suse.com: update comments]
Link: http://lkml.kernel.org/r/20180830075041.14879-1-lchen@suse.com
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Larry Chen <lchen@suse.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The page migration code employs try_to_unmap() to try and unmap the source
page. This is accomplished by using rmap_walk to find all vmas where the
page is mapped. This search stops when page mapcount is zero. For shared
PMD huge pages, the page map count is always 1 no matter the number of
mappings. Shared mappings are tracked via the reference count of the PMD
page. Therefore, try_to_unmap stops prematurely and does not completely
unmap all mappings of the source page.
This problem can result is data corruption as writes to the original
source page can happen after contents of the page are copied to the target
page. Hence, data is lost.
This problem was originally seen as DB corruption of shared global areas
after a huge page was soft offlined due to ECC memory errors. DB
developers noticed they could reproduce the issue by (hotplug) offlining
memory used to back huge pages. A simple testcase can reproduce the
problem by creating a shared PMD mapping (note that this must be at least
PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
migrate_pages() to migrate process pages between nodes while continually
writing to the huge pages being migrated.
To fix, have the try_to_unmap_one routine check for huge PMD sharing by
calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared
mapping it will be 'unshared' which removes the page table entry and drops
the reference on the PMD page. After this, flush caches and TLB.
mmu notifiers are called before locking page tables, but we can not be
sure of PMD sharing until page tables are locked. Therefore, check for
the possibility of PMD sharing before locking so that notifiers can
prepare for the worst possible case.
Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com
[mike.kravetz@oracle.com: make _range_in_vma() a static inline]
Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com
Fixes: 39dde65c99 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mike writes:
"device mapper fixes
- Fix a DM thinp __udivdi3 undefined on 32-bit bug introduced during
4.19 merge window.
- Fix leak and dangling pointer in DM multipath's scsi_dh related code.
- A couple stable@ fixes for DM cache's resize support.
- A DM raid fix to remove "const" from decipher_sync_action()'s return
type."
* tag 'for-4.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix resize crash if user doesn't reload cache table
dm cache metadata: ignore hints array being too small during resize
dm raid: remove bogus const from decipher_sync_action() return type
dm mpath: fix attached_handler_name leak and dangling hw_handler_name pointer
dm thin metadata: fix __udivdi3 undefined on 32-bit
Linus writes:
"A single GPIO fix:
Free the last used descriptor, an off by one error.
This is tagged for stable as well."
* tag 'gpio-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib: Free the last requested descriptor
Rafael writes:
"Power management fix for 4.19-rc7
Fix a bug that may cause runtime PM to misbehave for some devices
after a failing or aborted system suspend which is nasty enough for
an -rc7 time frame fix."
* tag 'pm-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / core: Clear the direct_complete flag on errors
Ingo writes:
"perf fixes:
- fix a CPU#0 hot unplug bug and a PCI enumeration bug in the x86 Intel uncore PMU driver
- fix a CPU event enumeration bug in the x86 AMD PMU driver
- fix a perf ring-buffer corruption bug when using tracepoints
- fix a PMU unregister locking bug"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
perf/ring_buffer: Prevent concurent ring buffer access
perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0
perf/core: Fix perf_pmu_unregister() locking
Ingo writes:
"scheduler fixes:
These fixes address a rather involved performance regression between
v4.17->v4.19 in the sched/numa auto-balancing code. Since distros
really need this fix we accelerated it to sched/urgent for a faster
upstream merge.
NUMA scheduling and balancing performance is now largely back to
v4.17 levels, without reintroducing the NUMA placement bugs that
v4.18 and v4.19 fixed.
Many thanks to Srikar Dronamraju, Mel Gorman and Jirka Hladky, for
reporting, testing, re-testing and solving this rather complex set of
bugs."
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/numa: Migrate pages to local nodes quicker early in the lifetime of a task
mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration
sched/numa: Avoid task migration for small NUMA improvement
mm/migrate: Use spin_trylock() while resetting rate limit
sched/numa: Limit the conditions where scan period is reset
sched/numa: Reset scan rate whenever task moves across nodes
sched/numa: Pass destination CPU as a parameter to migrate_task_rq
sched/numa: Stop multiple tasks from moving to the CPU at the same time
Ingo writes:
"locking fixes:
A fix in the ww_mutex self-test that produces a scary splat, plus an
updates to the maintained-filed patters in MAINTAINER."
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/ww_mutex: Fix runtime warning in the WW mutex selftest
MAINTAINERS: Remove dead path from LOCKING PRIMITIVES entry
Takashi writes:
"sound fixes for 4.19-rc7
Just two small fixes for HD-audio: one is for a typo in completion
timeout, and another a fixup for Dell machines as usual"
* tag 'sound-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Cannot adjust speaker's volume on Dell XPS 27 7760
ALSA: hda: Fix the audio-component completion timeout
When offloading the L3 and L4 csum computation on TX, we need to extract
the l3_proto from the ethtype, independently of the presence of a vlan
tag.
The actual driver uses skb->protocol as-is, resulting in packets with
the wrong L4 checksum being sent when there's a vlan tag in the packet
header and checksum offloading is enabled.
This commit makes use of vlan_protocol_get() to get the correct ethtype
regardless the presence of a vlan tag.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of TC attributes are processed without proper validation
(e.g., length checks). Add a tca policy for all input attributes and use
when invoking nlmsg_parse.
The 2 Fixes tags below cover the latest additions. The other attributes
are a string (KIND), nested attribute (OPTIONS which does seem to have
validation in most cases), for dumps only or a flag.
Fixes: 5bc1701881 ("net: sched: introduce multichain support for filters")
Fixes: d47a6b0e7c ("net: sched: introduce ingress/egress block index attributes for qdisc")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').
The problem is, the function bails out early if nlmsg_parse() fails, which
does occur for iproute2 usage of 'struct ndmsg' because the payload length
is shorter than the family header alone (as 'struct ifinfomsg' is assumed).
This breaks backward compatibility with userspace -- nothing is sent back.
Some examples with iproute2 and netlink library for go [1]:
1) $ bridge fdb show
33:33:00:00:00:01 dev ens3 self permanent
01:00:5e:00:00:01 dev ens3 self permanent
33:33:ff:15:98:30 dev ens3 self permanent
This one works, as it uses 'struct ifinfomsg'.
fdb_show() @ iproute2/bridge/fdb.c
"""
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
...
if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...]
"""
2) $ ip --family bridge neigh
RTNETLINK answers: Invalid argument
Dump terminated
This one fails, as it uses 'struct ndmsg'.
do_show_or_flush() @ iproute2/ip/ipneigh.c
"""
.n.nlmsg_type = RTM_GETNEIGH,
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
"""
3) $ ./neighlist
< no output >
This one fails, as it uses 'struct ndmsg'-based.
neighList() @ netlink/neigh_linux.go
"""
req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...]
msg := Ndmsg{
"""
The actual breakage was introduced by commit 0ff50e83b5 ("net: rtnetlink:
bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
if the payload length (with the _actual_ family header) is less than the
family header length alone (which is assumed, in parameter 'hdrlen').
This is true in the examples above with struct ndmsg, with size and payload
length shorter than struct ifinfomsg.
However, that commit just intends to fix something under the assumption the
family header is indeed an 'struct ifinfomsg' - by preventing access to the
payload as such (via 'ifm' pointer) if the payload length is not sufficient
to actually contain it.
The assumption was introduced by commit 5e6d243587 ("bridge: netlink dump
interface at par with brctl"), to support iproute2's 'bridge fdb' command
(not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.
So, in order to unbreak the 'struct ndmsg' family headers and still allow
'struct ifinfomsg' to continue to work, check for the known message sizes
used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
not used in this function anyway) then do not parse the data as ifinfomsg.
Same examples with this patch applied (or revert/before the original fix):
$ bridge fdb show
33:33:00:00:00:01 dev ens3 self permanent
01:00:5e:00:00:01 dev ens3 self permanent
33:33:ff:15:98:30 dev ens3 self permanent
$ ip --family bridge neigh
dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT
$ ./neighlist
netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
Tested on mainline (v4.19-rc6) and net-next (3bd09b05b0).
References:
[1] netlink library for go (test-case)
https://github.com/vishvananda/netlink
$ cat ~/go/src/neighlist/main.go
package main
import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
func main() {
neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
}
$ export GOPATH=~/go
$ go get github.com/vishvananda/netlink
$ go build neighlist
$ ~/go/src/neighlist/neighlist
Thanks to David Ahern for suggestions to improve this patch.
Fixes: 0ff50e83b5 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error")
Fixes: 5e6d243587 ("bridge: netlink dump interface at par with brctl")
Reported-by: Aidan Obley <aobley@pivotal.io>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In yam_ioctl(), the concrete ioctl command is firstly copied from the
user-space buffer 'ifr->ifr_data' to 'ioctl_cmd' and checked through the
following switch statement. If the command is not as expected, an error
code EINVAL is returned. In the following execution the buffer
'ifr->ifr_data' is copied again in the cases of the switch statement to
specific data structures according to what kind of ioctl command is
requested. However, after the second copy, no re-check is enforced on the
newly-copied command. Given that the buffer 'ifr->ifr_data' is in the user
space, a malicious user can race to change the command between the two
copies. This way, the attacker can inject inconsistent data and cause
undefined behavior.
This patch adds a re-check in each case of the switch statement if there is
a second copy in that case, to re-check whether the command obtained in the
second copy is the same as the one in the first copy. If not, an error code
EINVAL will be returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following Sparse warnings:
net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space
of expression
net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as
NULL pointer
Signed-off-by: Shanthosh RK <shanthosh.rk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In cxgb_extension_ioctl(), the command of the ioctl is firstly copied from
the user-space buffer 'useraddr' to 'cmd' and checked through the
switch statement. If the command is not as expected, an error code
EOPNOTSUPP is returned. In the following execution, i.e., the cases of the
switch statement, the whole buffer of 'useraddr' is copied again to a
specific data structure, according to what kind of command is requested.
However, after the second copy, there is no re-check on the newly-copied
command. Given that the buffer 'useraddr' is in the user space, a malicious
user can race to change the command between the two copies. By doing so,
the attacker can supply malicious data to the kernel and cause undefined
behavior.
This patch adds a re-check in each case of the switch statement if there is
a second copy in that case, to re-check whether the command obtained in the
second copy is the same as the one in the first copy. If not, an error code
EINVAL is returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-10-05
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix to truncate input on ALU operations in 32 bit mode, from Jann.
2) Fixes for cgroup local storage to reject reserved flags on element
update and rejection of map allocation with zero-sized value, from Roman.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When I wrote commit 468f6eafa6 ("bpf: fix 32-bit ALU op verification"), I
assumed that, in order to emulate 64-bit arithmetic with 32-bit logic, it
is sufficient to just truncate the output to 32 bits; and so I just moved
the register size coercion that used to be at the start of the function to
the end of the function.
That assumption is true for almost every op, but not for 32-bit right
shifts, because those can propagate information towards the least
significant bit. Fix it by always truncating inputs for 32-bit ops to 32
bits.
Also get rid of the coerce_reg_to_size() after the ALU op, since that has
no effect.
Fixes: 468f6eafa6 ("bpf: fix 32-bit ALU op verification")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Joerg writes:
"IOMMU Fix for Linux v4.19-rc6
One important fix:
- Fix a memory leak with AMD IOMMU when SME is active and a VM
has assigned devices. In that case the complete guest memory
will be leaked without this fix."
* tag 'iommu-fixes-v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Clear memory encryption mask from physical address
Paolo writes:
"KVM changes for 4.19-rc7
x86 and PPC bugfixes, mostly introduced in 4.19-rc1."
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: nVMX: fix entry with pending interrupt if APICv is enabled
KVM: VMX: hide flexpriority from guest when disabled at the module level
KVM: VMX: check for existence of secondary exec controls before accessing
KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
KVM: x86: fix L1TF's MMIO GFN calculation
tools/kvm_stat: cut down decimal places in update interval dialog
KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS
KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly
KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
KVM: x86: never trap MSR_KERNEL_GS_BASE
Steve writes:
"SMB3 fixes
four small SMB3 fixes: one for stable, the others to address a more
recent regression"
* tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix lease break problem introduced by compounding
cifs: only wake the thread for the very last PDU in a compound
cifs: add a warning if we try to to dequeue a deleted mid
smb2: fix missing files in root share directory listing
Recently we implemented show_user_instructions() which dumps the code
around the NIP when a user space process dies with an unhandled
signal. This was modelled on the x86 code, and we even went so far as
to implement the exact same bug, namely that if the user process
crashed with its NIP pointing into the kernel we will dump kernel text
to dmesg. eg:
bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1
bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6
bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048
This was discovered on x86 by Jann Horn and fixed in commit
342db04ae7 ("x86/dumpstack: Don't dump kernel memory based on usermode RIP").
Fix it by checking the adjusted NIP value (pc) and number of
instructions against USER_DS, and bail if we fail the check, eg:
bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1
bad-bctr[2969]: Bad NIP, not dumping instructions.
Fixes: 88b0fe1757 ("powerpc: Add show_user_instructions()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There are platforms which don't provide input clock rate but provide
I2C timing parameters. Commit 3bd4f27727 ("i2c: designware: Call
i2c_dw_clk_rate() only once in i2c_dw_init_master()") causes needless
warning during probe on those platforms since i2c_dw_clk_rate(), which
causes the warning when input clock is unknown, is called even when
there is no need to calculate timing parameters.
Fixes: 3bd4f27727 ("i2c: designware: Call i2c_dw_clk_rate() only once in i2c_dw_init_master()")
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Boris Ostrovsky reported a memory leak with device passthrough when SME
is active.
The VFIO driver uses iommu_iova_to_phys() to get the physical address for
an iova. This physical address is later passed into vfio_unmap_unpin() to
unpin the memory. The vfio_unmap_unpin() uses pfn_valid() before unpinning
the memory. The pfn_valid() check was failing because encryption mask was
part of the physical address returned. This resulted in the memory not
being unpinned and therefore leaked after the guest terminates.
The memory encryption mask must be cleared from the physical address in
iommu_iova_to_phys().
Fixes: 2543a786aa ("iommu/amd: Allow the AMD IOMMU to work with memory encryption")
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <iommu@lists.linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>