Compare commits

49 Commits

Author SHA1 Message Date
Linus Torvalds
29594404d7 Linux 3.7 2012-12-10 19:30:57 -08:00
Florian Fainelli
55220bb3e5 Input: matrix-keymap - provide proper module license
The matrix-keymap module is currently lacking a proper module license,
add one so we don't have this module tainting the entire kernel.  This
issue has been present since commit 1932811f42 ("Input: matrix-keymap
- uninline and prepare for device tree support")

Signed-off-by: Florian Fainelli <florian@openwrt.org>
CC: stable@vger.kernel.org # v3.5+
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-10 16:10:05 -08:00
Linus Torvalds
2c68bc72dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Netlink socket dumping had several missing verifications and checks.

    In particular, address comparisons in the request byte code
    interpreter could access past the end of the address in the
    inet_request_sock.

    Also, address family and address prefix lengths were not validated
    properly at all.

    This means arbitrary applications can read past the end of certain
    kernel data structures.

    Fixes from Neal Cardwell.

 2) ip_check_defrag() operates in contexts where we're in the process
    of, or about to, input the packet into the real protocols
    (specifically macvlan and AF_PACKET snooping).

    Unfortunately, it does a pskb_may_pull() which can modify the
    backing packet data which is not legal if the SKB is shared.  It
    very much can be shared in this context.

    Deal with the possibility that the SKB is segmented by using
    skb_copy_bits().

    Fix from Johannes Berg based upon a report by Eric Leblond.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  ipv4: ip_check_defrag must not modify skb before unsharing
  inet_diag: validate port comparison byte code to prevent unsafe reads
  inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
  inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
  inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
2012-12-10 16:07:11 -08:00
Linus Torvalds
caf491916b Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage
This reverts commits a50915394f and
d7c3b937bd.

This is a revert of a revert of a revert.  In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.

It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all.  We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.

When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation.  That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.

So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake.  Let's hope we never revisit
this mistake again - and certainly not this many times ;)

Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-10 11:03:05 -08:00
Johannes Berg
1bf3751ec9 ipv4: ip_check_defrag must not modify skb before unsharing
ip_check_defrag() might be called from af_packet within the
RX path where shared SKBs are used, so it must not modify
the input SKB before it has unshared it for defragmentation.
Use skb_copy_bits() to get the IP header and only pull in
everything later.
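
The approach of reading the header by copying, rather than by modifying a
buffer that others may share, can be illustrated in user-space C. This is
only a sketch: struct frag and copy_bits() are hypothetical stand-ins and are
not the kernel's sk_buff or skb_copy_bits().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for packet data split across several fragments. */
struct frag {
    const unsigned char *data;
    size_t len;
};

/*
 * Copy `len` bytes starting at `offset` out of a fragment list into `to`
 * without modifying the fragments. Returns 0 on success, -1 if the
 * requested range runs past the end of the data.
 */
static int copy_bits(const struct frag *frags, size_t nfrags,
                     size_t offset, void *to, size_t len)
{
    unsigned char *out = to;
    size_t i;

    for (i = 0; i < nfrags && len > 0; i++) {
        size_t avail, chunk;

        if (offset >= frags[i].len) {
            offset -= frags[i].len;  /* range starts in a later fragment */
            continue;
        }
        avail = frags[i].len - offset;
        chunk = len < avail ? len : avail;
        memcpy(out, frags[i].data + offset, chunk);
        out += chunk;
        len -= chunk;
        offset = 0;
    }
    return len == 0 ? 0 : -1;
}
```

The caller gets a private copy of the header bytes, and the source fragments
are only ever read, which is the property a shared buffer requires.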

The same is true for the other caller in macvlan as it is
called from dev->rx_handler which can also get a shared SKB.

Reported-by: Eric Leblond <eric@regit.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 13:51:44 -05:00
Linus Torvalds
31f8d42d44 Revert "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended"
This reverts commit 782fd30406.

We are going to reinstate the __GFP_NO_KSWAPD flag that has been
removed, the removal reverted, and then removed again, making this
commit a pointless fixup for a problem that was caused by the removal of
the __GFP_NO_KSWAPD flag.

The thing is, we really don't want to wake up kswapd for THP allocations
(because they fail quite commonly under any kind of memory pressure,
including when there is tons of memory free), and these patches were
just trying to fix up the underlying bug: the original removal of
__GFP_NO_KSWAPD in commit c654345924 ("mm: remove __GFP_NO_KSWAPD")
was simply bogus.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-10 10:47:45 -08:00
Neal Cardwell
5e1f54201c inet_diag: validate port comparison byte code to prevent unsafe reads
Add logic to verify that a port comparison byte code operation
actually has the second inet_diag_bc_op from which we read the port
for such operations.

Previously the code blindly referenced op[1] without first checking
whether a second inet_diag_bc_op struct could fit there. So a
malicious user could make the kernel read 4 bytes beyond the end of
the bytecode array by claiming to have a whole port comparison byte
code (2 inet_diag_bc_op structs) when in fact the bytecode was not
long enough to hold both.
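
A minimal sketch of the bounds check described above, assuming a 4-byte
operation record. struct bc_op and valid_port_comparison() are hypothetical
names used for illustration and are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte operation record, modeled loosely on the description. */
struct bc_op {
    uint8_t  code;
    uint8_t  yes;
    uint16_t no;
};

/*
 * A port comparison occupies two consecutive records: the operation itself
 * and a second record carrying the port. Return 1 only if both fit in the
 * `remaining` bytes of bytecode, so that reading op[1] stays in bounds.
 */
static int valid_port_comparison(size_t remaining)
{
    return remaining >= 2 * sizeof(struct bc_op);
}
```

With only one record's worth of bytes left, the check fails and op[1] is
never dereferenced.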

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 19:00:48 -05:00
Neal Cardwell
f67caec906 inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
Add logic to check the address family of the user-supplied conditional
and the address family of the connection entry. We now do not do
prefix matching of addresses from different address families (AF_INET
vs AF_INET6), except for the previously existing support for having an
IPv4 prefix match an IPv4-mapped IPv6 address (which this commit
maintains as-is).

This change is needed for two reasons:

(1) The addresses are different lengths, so comparing a 128-bit IPv6
prefix match condition to a 32-bit IPv4 connection address can cause
us to unwittingly walk off the end of the IPv4 address and read
garbage or oops.

(2) The IPv4 and IPv6 address spaces are semantically distinct, so a
simple bit-wise comparison of the prefixes is not meaningful, and
would lead to bogus results (except for the IPv4-mapped IPv6 case,
which this commit maintains).
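
The two reasons above can be sketched as a prefix matcher that refuses to
compare across families and bounds the prefix by the address length. The
family tags and function names are hypothetical, and the IPv4-mapped IPv6
special case mentioned above is deliberately left out of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

enum family { FAM_V4 = 4, FAM_V6 = 6 };   /* hypothetical tags, not AF_* values */

/* Compare the first `bits` bits of two byte strings. Returns 1 on match. */
static int prefix_equal(const uint8_t *a, const uint8_t *b, unsigned bits)
{
    unsigned full = bits / 8, rest = bits % 8;
    unsigned i;

    for (i = 0; i < full; i++)
        if (a[i] != b[i])
            return 0;
    if (rest) {
        uint8_t mask = (uint8_t)(0xff << (8 - rest));

        if ((a[full] ^ b[full]) & mask)
            return 0;
    }
    return 1;
}

/* Refuse to compare across families, and never read past the address. */
static int prefix_match(enum family cond_fam, const uint8_t *cond, unsigned bits,
                        enum family addr_fam, const uint8_t *addr)
{
    unsigned max_bits = (addr_fam == FAM_V4) ? 32 : 128;

    if (cond_fam != addr_fam || bits > max_bits)
        return 0;
    return prefix_equal(cond, addr, bits);
}
```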

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 18:59:37 -05:00
Neal Cardwell
405c005949 inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND
operations.

Previously we did not validate the inet_diag_hostcond, address family,
address length, and prefix length. So a malicious user could make the
kernel read beyond the end of the bytecode array by claiming to have a
whole inet_diag_hostcond when the bytecode was not long enough to
contain a whole inet_diag_hostcond of the given address family. Or
they could make the kernel read up to about 27 bytes beyond the end of
a connection address by passing a prefix length that exceeded the
length of addresses of the given family.
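
A rough sketch of the kind of validation described above, under an assumed
wire layout of one family byte, one prefix-length byte, then the address.
That layout, the family values 4 and 6, and valid_host_cond() are assumptions
made for illustration; the kernel's inet_diag_hostcond differs, and this
sketch also rejects an unspecified family for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Validate a host condition stored in `buf` (assumed layout: family byte,
 * prefix-length byte, address bytes). Returns 1 if it is safe to use.
 */
static int valid_host_cond(const uint8_t *buf, size_t len)
{
    size_t addr_len;

    if (len < 2)                 /* the header itself must fit */
        return 0;
    switch (buf[0]) {            /* family */
    case 4: addr_len = 4;  break;
    case 6: addr_len = 16; break;
    default: return 0;           /* unknown family */
    }
    if (len < 2 + addr_len)      /* the address must fit in the bytecode */
        return 0;
    return buf[1] <= addr_len * 8;   /* prefix cannot exceed the address */
}
```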

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 18:59:37 -05:00
Neal Cardwell
1c95df85ca inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
instantiated for IPv4 traffic and in the SYN-RECV state were actually
created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
means that for such connections inet6_rsk(req) returns a pointer to a
random spot in memory up to roughly 64KB beyond the end of the
request_sock.

With this bug, for a server using AF_INET6 TCP sockets and serving
IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
inet_diag_fill_req() causing an oops or the export to user space of 16
bytes of kernel memory as a garbage IPv6 address, depending on where
the garbage inet6_rsk(req) pointed.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 18:59:37 -05:00
Johannes Weiner
ed23ec4f0a mm: vmscan: fix inappropriate zone congestion clearing
commit c702418f8a ("mm: vmscan: do not keep kswapd looping forever due
to individual uncompactable zones") removed zone watermark checks from
the compaction code in kswapd but left in the zone congestion clearing,
which now happens unconditionally on higher order reclaim.

This messes up the reclaim throttling logic for zones with
dirty/writeback pages, where zones should only lose their congestion
status when their watermarks have been restored.

Remove the clearing from the zone compaction section entirely.  The
preliminary zone check and the reclaim loop in kswapd will clear it if
the zone is considered balanced.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-08 08:41:18 -08:00
Linus Torvalds
684c9aaebb vfs: fix O_DIRECT read past end of block device
The direct-IO write path already had the i_size checks in mm/filemap.c,
but it turns out the read path did not, and removing the block size
checks in fs/block_dev.c (commit bbec0270bd: "blkdev_max_block: make
private to fs/buffer.c") removed the magic "shrink IO to past the end of
the device" code there.

Fix it by truncating the IO to the size of the block device, like the
write path already does.

NOTE! I suspect the write path would be *much* better off doing it this
way in fs/block_dev.c, rather than hidden deep in mm/filemap.c.  The
mm/filemap.c code is extremely hard to follow, and has various
conditionals on the target being a block device (ie the flag passed in
to 'generic_write_checks()', along with a conditional update of the
inode timestamp etc).

It is also quite possible that we should treat this whole block device
size as a "s_maxbytes" issue, and try to make the logic even more
generic.  However, in the meantime this is the fairly minimal targeted
fix.

Noted by Milan Broz thanks to a regression test for the cryptsetup
reencrypt tool.

Reported-and-tested-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-08 08:28:26 -08:00
Linus Torvalds
1b3c393cd4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Two stragglers:

   1) The new code that adds new flushing semantics to GRO can cause SKB
      pointer list corruption, manage the lists differently to avoid the
      OOPS.  Fix from Eric Dumazet.

   2) When TCP fast open does a retransmit of data in a SYN-ACK or
      similar, we update retransmit state that we shouldn't, triggering a
      WARN_ON later.  Fix from Yuchung Cheng."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: gro: fix possible panic in skb_gro_receive()
  tcp: bug fix Fast Open client retransmission
2012-12-07 17:00:57 -08:00
Eric Dumazet
c3c7c254b2 net: gro: fix possible panic in skb_gro_receive()
commit 2e71a6f808 (net: gro: selective flush of packets) added
a bug for skbs using frag_list. This part of the GRO stack is rarely
used, as it needs skb not using a page fragment for their skb->head.

Most drivers do use a page fragment, but some of them use GFP_KERNEL
allocations for the initial fill of their RX ring buffer.

napi_gro_flush() overwrites skb->prev that was used for these skb to
point to the last skb in frag_list.

Fix this using a separate field in struct napi_gro_cb to point to the
last fragment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:39:29 -05:00
Yuchung Cheng
93b174ad71 tcp: bug fix Fast Open client retransmission
If SYN-ACK partially acks SYN-data, the client retransmits the
remaining data by tcp_retransmit_skb(). This increments lost recovery
state variables like tp->retrans_out in Open state. If loss recovery
happens before the retransmission is acked, it triggers the WARN_ON
check in tcp_fastretrans_alert(). For example: the client sends
SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
another 4 data packets and gets 3 dupacks.

Since the retransmission is not caused by network drop it should not
update the recovery state variables. Further the server may return a
smaller MSS than the cached MSS used for SYN-data, so the retransmission
needs a loop. Otherwise some data will not be retransmitted until timeout
or other loss recovery events.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:39:28 -05:00
Linus Torvalds
1afa471706 Merge tag 'mmc-fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
Pull MMC fixes from Chris Ball:
 "Two small regression fixes:

   - sdhci-s3c: Fix runtime PM regression against 3.7-rc1
   - sh-mmcif: Fix oops against 3.6"

* tag 'mmc-fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: sh-mmcif: avoid oops on spurious interrupts (second try)
  Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
  mmc: sdhci-s3c: fix missing clock for gpio card-detect
2012-12-07 09:15:20 -08:00
Mel Gorman
18a2f371f5 tmpfs: fix shared mempolicy leak
This fixes a regression in 3.7-rc, which has since gone into stable.

Commit 00442ad04a ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().

Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a,
alloc_pages_vma() has kept refcount in balance, so now no problem.

Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-06 11:56:43 -08:00
Johannes Weiner
c702418f8a mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones
When a zone meets its high watermark and is compactable in case of
higher order allocations, it contributes to the percentage of the node's
memory that is considered balanced.

This requirement, that a node be only partially balanced, came about
when kswapd was desperately trying to balance tiny zones when all bigger
zones in the node had plenty of free memory.  Arguably, the same should
apply to compaction: if a significant part of the node is balanced
enough to run compaction, do not get hung up on that tiny zone that
might never get in shape.

When the compaction logic in kswapd is reached, we know that at least
25% of the node's memory is balanced properly for compaction (see
zone_balanced and pgdat_balanced).  Remove the individual zone checks
that restart the kswapd cycle.
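
The 25% rule mentioned above can be sketched as follows. struct zone_info and
node_balanced() are hypothetical simplifications; the kernel's zone_balanced
and pgdat_balanced operate on its own zone and node structures.

```c
#include <stddef.h>

/* Hypothetical per-zone summary; the kernel's real structures differ. */
struct zone_info {
    unsigned long present_pages;
    int balanced;   /* nonzero if the zone meets its watermark */
};

/* A node counts as balanced when at least 25% of its pages sit in balanced zones. */
static int node_balanced(const struct zone_info *zones, size_t n)
{
    unsigned long total = 0, balanced = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += zones[i].present_pages;
        if (zones[i].balanced)
            balanced += zones[i].present_pages;
    }
    return balanced >= (total >> 2);
}
```

In this model a tiny unbalanced zone next to a large balanced one does not
stop the node from being considered balanced, which is the behavior the
message argues for.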

Otherwise, we may observe more endless looping in kswapd where the
compaction code loops back to reclaim because of a single zone and
reclaim does nothing because the node is considered balanced overall.

See for example

  https://bugzilla.redhat.com/show_bug.cgi?id=866988

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-and-tested-by: Thorsten Leemhuis <fedora@leemhuis.info>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Tested-by: John Ellson <john.ellson@comcast.net>
Tested-by: Zdenek Kabelac <zkabelac@redhat.com>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-06 11:29:57 -08:00
Mel Gorman
60177d31d2 mm: compaction: validate pfn range passed to isolate_freepages_block
Commit 0bf380bc70 ("mm: compaction: check pfn_valid when entering a
new MAX_ORDER_NR_PAGES block during isolation for migration") added a
check for pfn_valid() when isolating pages for migration as the scanner
does not necessarily start pageblock-aligned.

Since commit c89511ab2f ("mm: compaction: Restart compaction from near
where it left off"), the free scanner has the same problem.  This patch
makes sure that the pfn range passed to isolate_freepages_block() is
within the same block so that pfn_valid() checks are unnecessary.
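
One way to keep a scan range inside a single block is to round the start up
to the next block boundary and clamp to an upper limit. The sketch below
assumes a block size of 512 pages and uses hypothetical helper names; it is
not claimed to be the code in the patch.

```c
#define PAGEBLOCK_NR_PAGES 512UL   /* assumed value for illustration */

/* Round x up to the next multiple of a (a must be a power of two). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    return (x + a - 1) & ~(a - 1);
}

/*
 * Given a start pfn that may not be block-aligned, return an end pfn that
 * stays inside the same pageblock and never exceeds `limit`.
 */
static unsigned long block_end_pfn(unsigned long pfn, unsigned long limit)
{
    unsigned long end = align_up(pfn + 1, PAGEBLOCK_NR_PAGES);

    return end < limit ? end : limit;
}
```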

In answer to Henrik's wondering why others have not reported this:
reproducing this requires a large enough hole with the right alignment to
have compaction walk into a PFN range with no memmap.  Size and
alignment depend on the memory model - 4M for FLATMEM and 128M for
SPARSEMEM on x86.  It needs a "lucky" machine.

Reported-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-06 11:17:33 -08:00
Guennadi Liakhovetski
91ab252ac5 mmc: sh-mmcif: avoid oops on spurious interrupts (second try)
On some systems, e.g., kzm9g, MMCIF interfaces can produce spurious
interrupts without any active request. To prevent the Oops that results
in such cases, don't dereference the mmc request pointer until we make
sure that we are indeed processing such a request.

Reported-by: Tetsuyuki Kobayashi <koba@kmckk.co.jp>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Tested-by: Tetsuyuki Kobayashi <koba@kmckk.co.jp>
Cc: stable@vger.kernel.org
Signed-off-by: Chris Ball <cjb@laptop.org>
2012-12-06 13:54:35 -05:00
Chris Ball
6984f3c31b Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
This reverts commit 8464dd52d3, which was a misapplied debugging
version of the patch, not the final patch itself.

Signed-off-by: Chris Ball <cjb@laptop.org>
Cc: stable@vger.kernel.org
2012-12-06 13:54:34 -05:00
Heiko Stübner
fe007c02f9 mmc: sdhci-s3c: fix missing clock for gpio card-detect
2abeb5c5de ("Add clk_(enable/disable) in runtime suspend/resume")
added the capability to stop the clocks when the device is runtime
suspended, but forgot to handle the case of the card-detect using
an external gpio.

Therefore in the case that runtime-pm is enabled, start the io-clock
when a card is inserted and stop it again once it is removed.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Chris Ball <cjb@laptop.org>
2012-12-06 13:54:33 -05:00
Linus Torvalds
04c5decdc0 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
 "These are the fixes for the N32 syscall bugs found by Al, an
  extraneous break that broke detection for R3000 and R3081 processors,
  an endless loop processing signals for kernel task (x86 received the
  same fix a while ago) and a fix for transparent huge page which took
  ages to track down because it was so hard to come up with a workable
  test case."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Fix endless loop when processing signals for kernel tasks
  MIPS: R3000/R3081: Fix CPU detection.
  MIPS: N32: Fix signalfd4 syscall entry point
  MIPS: N32: Fix preadv(2) and pwritev(2) entry points.
  MIPS: Avoid mcheck by flushing page range in huge_ptep_set_access_flags()
2012-12-06 08:42:13 -08:00
Linus Torvalds
d91fa97128 Merge branch 'more-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull build fix from Rusty Russell:
 "Tim Gardner <tim.gardner@canonical.com> writes:
  > It is $(obj)/oid_registry.o that is dependent on $(obj)/oid_registry_data.c.
  > The object file cannot be built until $(obj)/oid_registry_data.c has been
  > generated.
  >
  > A periodic and hard to reproduce parallel build failure is due to
  > this incorrect lib/Makefile dependency. The compile error is completely
  > disingenuous.
  >
  >   GEN     lib/oid_registry_data.c
  > Compiling 49 OIDs
  >   CC      lib/oid_registry.o
  > gcc: error: lib/oid_registry.c: No such file or directory
  > gcc: fatal error: no input files
  > compilation terminated.
  > make[3]: *** [lib/oid_registry.o] Error 4

  I can't reproduce it either.  It's completely weird; nothing ever
  removes lib/oid_registry.c, so either gcc is giving the wrong message
  or it's a weird fs with a very odd race.

  But your version is definitely more correct than the previous one,
  so..."

* 'more-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  lib/Makefile: Fix oid_registry build dependency
2012-12-06 08:39:57 -08:00
Linus Torvalds
54d1ae492f Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module signing fixes from Rusty Russell:
 "David gave me these a month ago, during my git workflow churn :("

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  ASN.1: Fix an indefinite length skip error
  MODSIGN: Don't use enum-type bitfields in module signature info block
2012-12-06 08:29:08 -08:00
Linus Torvalds
cfd1f032f9 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchdog fix from Thomas Gleixner:
 "Trivial CPU hotplug regression fix for the watchdog code"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  watchdog: Fix CPU hotplug regression
2012-12-06 08:27:11 -08:00
Tim Gardner
527897ccd9 lib/Makefile: Fix oid_registry build dependency
It is $(obj)/oid_registry.o that is dependent on $(obj)/oid_registry_data.c.
The object file cannot be built until $(obj)/oid_registry_data.c has been
generated.

A periodic and hard to reproduce parallel build failure is due to
this incorrect lib/Makefile dependency. The compile error is completely
disingenuous.

  GEN     lib/oid_registry_data.c
Compiling 49 OIDs
  CC      lib/oid_registry.o
gcc: error: lib/oid_registry.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
make[3]: *** [lib/oid_registry.o] Error 4

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-06 17:25:01 +10:30
Dmitry Adamushko
c90e6fbb22 MIPS: Fix endless loop when processing signals for kernel tasks
The problem occurs [1] when a kernel-mode task returns from a system
call with a pending signal.

A real-life scenario is a child of 'khelper' returning from a failed
kernel_execve() in ____call_usermodehelper() [ kernel/kmod.c ].
kernel_execve() fails due to a pending SIGKILL, which is the result of
"kill -9 -1" (at least, busybox's init does it upon reboot).

The loop is as follows:

* syscall_exit_work:
 - work_pending:            // start_of_the_loop
 - work_notifysig:
   - do_notify_resume()
     - do_signal()
       - if (!user_mode(regs)) return;
 - resume_userspace         // TIF_SIGPENDING is still set
 - work_pending             // so we call work_pending => goto
                            // start_of_the_loop

More information can be found in another LKML thread:
http://www.serverphorums.com/read.php?12,457826

[1] The problem was also reproduced on !CONFIG_VM86 x86, and the
following fix was accepted.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=29a2e2836ff9ea65a603c89df217f4198973a74f

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3571/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-05 19:59:00 +01:00
Ralf Baechle
2d33976fb3 MIPS: R3000/R3081: Fix CPU detection.
Broken since e05ea74fc56f347f872ef9946d27c53e8bf20864 (lmo) rsp.
cea7e2dfde (kernel.org) [MIPS: Sort out CPU
type to name translation.]  These CPUs are no longer very popular to say
the least ...

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Murphy McCauley <murphy.mccauley@gmail.com>
2012-12-05 19:58:54 +01:00
Ralf Baechle
97daa76801 MIPS: N32: Fix signalfd4 syscall entry point
This needs to use the compat entry point or it's going to fail on big
endian systems.

Noticed by Al Viro.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-05 19:58:48 +01:00
Dan Carpenter
27d7c2a006 vfs: clear to the end of the buffer on partial buffer reads
READ is zero so the "rw & READ" test is always false.  The intended test
was "((rw & RW_MASK) == READ)".
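
The difference between the two tests is easy to demonstrate. In this sketch
READ is 0 as stated above; the values of WRITE and RW_MASK (both 1) and the
extra flag bit are assumptions made for illustration.

```c
/* READ is 0 per the description; WRITE and RW_MASK are assumed to be 1. */
#define READ       0
#define WRITE      1
#define RW_MASK    1
#define OTHER_FLAG 16   /* hypothetical extra flag bit OR-ed into rw */

/* The broken test: masking with 0 always yields 0, so this is never true. */
static int broken_is_read(int rw) { return (rw & READ) != 0; }

/* The intended test: isolate the direction bit and compare it with READ. */
static int fixed_is_read(int rw)  { return (rw & RW_MASK) == READ; }
```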

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-05 10:32:59 -08:00
David Howells
f3537f91f9 ASN.1: Fix an indefinite length skip error
Fix an error in asn1_find_indefinite_length() whereby small definite length
elements of size 0x7f are incorrectly classified as non-small.  Without this
fix, an error will be given as the length of the length will be perceived as
being very much greater than the maximum supported size.
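
For reference, the first length octet in BER/DER encodes three cases, and
0x7f is the largest value that still belongs to the short form. The sketch
below shows that classification; classify_length() is a hypothetical helper
and not the kernel's asn1_find_indefinite_length(). An off-by-one comparison
such as "< 0x7f" in place of "<= 0x7f" would produce the kind of
misclassification described above.

```c
enum len_form { LEN_SHORT, LEN_INDEFINITE, LEN_LONG };

/* Classify the first length octet of a BER/DER element. */
static enum len_form classify_length(unsigned char first)
{
    if (first <= 0x7f)      /* 0x7f itself is still the short form */
        return LEN_SHORT;
    if (first == 0x80)      /* indefinite length marker */
        return LEN_INDEFINITE;
    return LEN_LONG;        /* low 7 bits give the count of length octets */
}
```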

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-05 11:27:39 +10:30
David Howells
12e130b045 MODSIGN: Don't use enum-type bitfields in module signature info block
Don't use enum-type bitfields in the module signature info block as we can't be
certain how the compiler will handle them.  As I understand it, it is arch
dependent, and it is possible for the compiler to rearrange them based on
endianness and to insert a byte of padding to pad the three enums out to four
bytes.

Instead use u8 fields for these, which the compiler should emit in the right
order without padding.
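
A sketch of the fixed-width approach, with compile-time checks that pin the
layout down. The struct name and field names are illustrative assumptions
and should not be read as the kernel's actual definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical info block built from byte-sized fields only. */
struct sig_info {
    uint8_t  algo;
    uint8_t  hash;
    uint8_t  id_type;
    uint8_t  signer_len;
    uint8_t  key_id_len;
    uint8_t  pad[3];
    uint32_t sig_len;
};

/* Compile-time checks that the layout matches what a fixed format expects. */
_Static_assert(offsetof(struct sig_info, algo) == 0, "algo at byte 0");
_Static_assert(offsetof(struct sig_info, hash) == 1, "hash at byte 1");
_Static_assert(offsetof(struct sig_info, id_type) == 2, "id_type at byte 2");
_Static_assert(sizeof(struct sig_info) == 12, "no hidden padding");
```

Byte-sized members have no alignment requirement beyond one byte, so their
order and offsets do not depend on how a compiler chooses to pack bitfields.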

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-05 11:27:24 +10:30
Thomas Gleixner
8d4516904b watchdog: Fix CPU hotplug regression
Norbert reported:
"3.7-rc6 booted with nmi_watchdog=0 fails to suspend to RAM or
 offline CPUs. It's reproducible with a KVM guest and physical
 system."

The reason is that commit bcd951cf ("watchdog: Use hotplug thread
infrastructure") failed to take this into account. So the cpu offline
code gets stuck in the teardown function because it accesses
uninitialized data structures.

Add a check for watchdog_enabled into that path to cure the issue.

Reported-and-tested-by: Norbert Warmuth <nwarmuth@t-online.de>
Tested-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1211231033230.2701@ionos
Link: http://bugs.launchpad.net/bugs/1079534
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-12-04 19:56:59 +01:00
Linus Torvalds
df2fc246c8 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module fixes from Rusty Russell:
 "Module signing build fixes for blackfin and metag"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modsign: add symbol prefix to certificate list
  linux/kernel.h: define SYMBOL_PREFIX
2012-12-04 09:32:12 -08:00
Linus Torvalds
70dcc535bd Merge tag 'upstream-3.7-rc9' of git://git.infradead.org/linux-ubi
Pull UBI changes from Artem Bityutskiy:
 "Fixes for 2 brown-paperbag bugs introduced this merge window by the
  fastmap code:

   1.  The UBI background thread got stuck when a bit-flip happened
       because free LEBs was not removed from the "free" tree when we
       started using it.
   2.  I/O debugging checks did not work because we called a sleeping
       function in atomic context."

* tag 'upstream-3.7-rc9' of git://git.infradead.org/linux-ubi:
  UBI: dont call ubi_self_check_all_ff() in __wl_get_peb()
  UBI: remove PEB from free tree in get_peb_for_wl()
2012-12-04 09:15:51 -08:00
Linus Torvalds
ca50496eb4 Merge branch 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
 "So, safe fixes my ass.

  Commit 8852aac25e ("workqueue: mod_delayed_work_on() shouldn't queue
  timer on 0 delay") had the side-effect of performing delayed_work
  sanity checks even when @delay is 0, which should be fine for any sane
  use cases.

  Unfortunately, megaraid was being overly ingenious.  It seemingly
  wanted to use cancel_delayed_work_sync() before cancel_work_sync() was
  introduced, but didn't want to waste the space for full delayed_work
  as it was only going to use 0 @delay.  So, it only allocated space for
  struct work_struct and then cast it to struct delayed_work and passed
  it into delayed_work functions - truly awesome engineering tradeoff to
  save some bytes.

  Xiaotian fixed it by making megaraid allocate full delayed_work for
  now.  It should be converted to use work_struct and cancel_work_sync()
  but I think we better do that after 3.7.

  I added another commit to change BUG_ON()s in __queue_delayed_work()
  to WARN_ON_ONCE()s so that the kernel doesn't crash even if there are
  more such abuses."

* 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s
  megaraid: fix BUG_ON() from incorrect use of delayed work
2012-12-04 09:02:45 -08:00
Ralf Baechle
d5563715a3 MIPS: N32: Fix preadv(2) and pwritev(2) entry points.
By using the native syscall entry point the kernel was also expecting
64-bit iovec structures.

This is broken since ddd9e91b71 [preadv/
pwritev: MIPS: Add preadv(2) and pwritev(2) syscalls.] which originally
added these two syscalls.  I walked through piles of code, including
libc and couldn't find anything that would have worked around the issue
so this changes the API to what it should always have been.

Noticed and patch suggested by Al Viro.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-04 17:59:39 +01:00
Linus Torvalds
609e3ff3ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
 "Two small fixes for Sparc, nobody uses sparc, so these are low risk :-)

   1) Piggyback is too picky about the symbol types that _start and _end
      have in the final kernel image, and it thus breaks with newer
      binutils.  Future proof by getting rid of the symbol type checks.

   2) exit_group() should kill register windows on sparc64 the same way
      we do for plain exit().  Thanks to Al Viro for spotting this."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Fix piggyback with newer binutils.
  sparc64: exit_group should kill register windows just like plain exit.
2012-12-04 08:42:29 -08:00
Linus Torvalds
57302e0ddf vfs: avoid "attempt to access beyond end of device" warnings
The block device access simplification that avoided accessing the (racy)
block size information (commit bbec0270bd: "blkdev_max_block: make
private to fs/buffer.c") no longer checks the maximum block size in the
block mapping path.

That was _almost_ as simple as just removing the code entirely, because
the readers and writers all check the size of the device anyway, so
under normal circumstances it "just worked".

However, the block size may be such that the end of the device may
straddle one single buffer_head.  At which point we may still want to
access the end of the device, but the buffer we use to access it
partially extends past the end.

The 'bd_set_size()' function intentionally sets the block size to avoid
this, but mounting the device - or setting the block size by hand to
some other value - can modify that block size.

So instead, teach 'submit_bh()' about the special case of the buffer
head straddling the end of the device, and turn such an access into a
smaller IO access, avoiding the problem.

This, btw, also means that unlike before, we can now access the whole
device regardless of device block size setting.  So now, even if the
device size is only 512-byte aligned, we can read and write even the
last sector even when having a much bigger block size for accessing the
rest of the device.

So with this, we could now get rid of the 'bd_set_size()' block size
code entirely - resulting in faster IO for the common case - but that
would be a separate patch.

Reported-and-tested-by: Romain Francoise <romain@orebokech.com>
Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-04 08:25:11 -08:00
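
A minimal userspace sketch of the truncation arithmetic described in the commit message above, assuming 512-byte sectors. The name `clamp_io_to_device` and its parameters are hypothetical and chosen only for illustration; the kernel's actual change is `guard_bh_eod()` in the diff further down, which also clears the tail of the buffer for reads.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): return how many bytes of an
 * I/O that starts at `sector` may be transferred on a device that is
 * `dev_bytes` long. Sectors are assumed to be 512 bytes.
 */
uint32_t clamp_io_to_device(uint64_t dev_bytes, uint64_t sector,
                            uint32_t io_bytes)
{
    uint64_t maxsector = dev_bytes >> 9;    /* device size in sectors */

    assert((io_bytes & 511) == 0);          /* sketch assumes whole sectors */

    /*
     * Unknown device size, or the whole I/O lies past the end: leave the
     * request untouched so a lower layer can turn it into an error.
     */
    if (maxsector == 0 || sector >= maxsector)
        return io_bytes;

    maxsector -= sector;                    /* sectors left from `sector` on */
    if ((io_bytes >> 9) <= maxsector)
        return io_bytes;                    /* request fits entirely */

    /* The request straddles the end of the device: shorten it. */
    return (uint32_t)(maxsector << 9);
}
```

With these assumptions, a 4096-byte buffer starting at sector 8 of a 5632-byte (11-sector) device would be shortened to 1536 bytes, while the same buffer on an 8192-byte device passes through unchanged.
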
Tejun Heo
fc4b514f27 workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s
8852aac25e ("workqueue: mod_delayed_work_on() shouldn't queue timer on
0 delay") unexpectedly uncovered a very nasty abuse of delayed_work in
megaraid - it allocated a work_struct, cast it to delayed_work and
then passed that into queue_delayed_work().

Previously, this was okay because 0 @delay short-circuited to
queue_work() before doing anything with delayed_work.  8852aac25e
moved the 0 @delay test into __queue_delayed_work() after the sanity
check on delayed_work, making megaraid trigger BUG_ON().

Although megaraid is already fixed by c1d390d8e6 ("megaraid: fix
BUG_ON() from incorrect use of delayed work"), this patch converts
BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that such
abusers, if there are more, trigger a warning but don't crash the
machine.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Xiaotian Feng <xtfeng@gmail.com>
2012-12-04 07:58:47 -08:00
David Daney
ac53c4fca4 MIPS: Avoid mcheck by flushing page range in huge_ptep_set_access_flags()
Problem:

1) Huge page mapping of anonymous memory is initially invalid.  Will be
   faulted in by copy-on-write mechanism.

2) Userspace attempts store at the end of the huge mapping.

3) TLB Refill exception handler fills the TLB with a normal (4K sized)
   invalid page at the end of the huge mapping virtual address range.

4) Userspace restarted, and re-attempts the store at the end of the
   huge mapping.

5) Page from #3 is invalid, we get a fault and go to the hugepage
   fault handler.  This tries to map a huge page and calls
   huge_ptep_set_access_flags() to install the mapping.

6) We just call the generic ptep_set_access_flags() to set up the page
   tables, but the flush there assumes a normal (4K sized) page and
   only tries to flush the first part of the huge page virtual address
   out of the TLB.  Since the existing entry from step #3 doesn't
   conflict with that first part, nothing is flushed.

7) We attempt to load the mapping into the TLB, but because it
   conflicts with the entry from step #3, we get a Machine Check
   exception.

The fix: Flush the entire range covered by the huge page in
huge_ptep_set_access_flags(), and remove the optimization in
local_flush_tlb_range() so that the flush actually does the correct
thing.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: Hillf Danton <dhillf@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/4661/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
(cherry picked from commit dd617f258cc39d36be26afee9912624a2d23112c)
2012-12-04 16:57:54 +01:00
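
The range arithmetic behind the fix can be sketched in userspace as follows. The page sizes (4K base pages, 2M huge pages) are assumptions made for the example, and `round_down_to`, `round_up_to` and `tlb_probe_count` are hypothetical names; the rounding and shift mirror the non-huge path that the diff keeps in `local_flush_tlb_range()`.

```c
#include <assert.h>

/* Illustrative constants: 4K base pages and 2M huge pages are assumptions. */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define HPAGE_SHIFT  21
#define HPAGE_SIZE   (1UL << HPAGE_SHIFT)

/* Power-of-two rounding helpers (hypothetical names). */
unsigned long round_down_to(unsigned long x, unsigned long align)
{
    return x & ~(align - 1);
}

unsigned long round_up_to(unsigned long x, unsigned long align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Number of probes needed to cover [start, end) when stepping in units of
 * PAGE_SIZE << 1, as the retained code in the diff does.
 */
unsigned long tlb_probe_count(unsigned long start, unsigned long end)
{
    unsigned long stride = PAGE_SIZE << 1;

    start = round_down_to(start, stride);
    end = round_up_to(end, stride);
    return (end - start) >> (PAGE_SHIFT + 1);
}
```

Under these assumptions, flushing `addr` to `addr + HPAGE_SIZE` walks 256 strides and therefore reaches a stale 4K entry near the end of the huge mapping, whereas a single-page flush at `addr` covers only the first stride. The diff also shows a `size <= current_cpu_data.tlbsize/2` check deciding whether this per-entry loop is used at all.
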
Xiaotian Feng
c1d390d8e6 megaraid: fix BUG_ON() from incorrect use of delayed work
megaraid uses INIT_WORK to declare a hotplug_work, but casts the
hotplug_work from work_struct to delayed_work and calls
schedule_delayed_work on it.  This is very dangerous, as the other parts
of delayed_work might be kernel memory allocated by others.

With commit 8852aac ("workqueue: mod_delayed_work_on() shouldn't queue
timer on 0 delay"), schedule_delayed_work() will check dwork->timer
before queue_work even when @delay is 0.  This causes megaraid code to
hit the BUG_ON() in workqueue code.  Change megaraid code to use
delayed work.

Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neela Syam Kolli <megaraidlinux@lsi.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: linux-scsi@vger.kernel.org
2012-12-04 07:29:47 -08:00
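
Why the cast was unsafe, and why the handler in the diff switches to `hotplug_work.work`, can be illustrated with a small userspace sketch. The struct layouts below are simplified stand-ins rather than the kernel definitions, and `aen_event` and `event_from_work` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; the fields are illustrative. */
struct work_struct { int pending; };
struct timer_list  { unsigned long expires; };

struct delayed_work {
    struct work_struct work;   /* what a work handler is handed */
    struct timer_list timer;   /* extra state a bare work_struct lacks */
};

/* Userspace version of container_of(), without the kernel's type checking. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct aen_event {
    int instance_id;
    struct delayed_work hotplug_work;
};

/*
 * The handler receives a pointer to the embedded work_struct, so the member
 * path has to name the inner `work` field of the delayed_work.
 */
struct aen_event *event_from_work(struct work_struct *work)
{
    return container_of(work, struct aen_event, hotplug_work.work);
}
```

Because `struct delayed_work` is larger than `struct work_struct`, casting a pointer to the smaller type and treating it as the larger one makes any access to `timer` read or write memory beyond the allocated object, which is the hazard the commit message describes.
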
Richard Weinberger
894aef2157 UBI: dont call ubi_self_check_all_ff() in __wl_get_peb()
As ubi_self_check_all_ff() might sleep we are not allowed
to call it from atomic context.
For now we call it only from ubi_wl_get_peb().
There are some code paths where it would also make sense,
but these paths are currently atomic and only enabled
when fastmap is used.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-12-04 16:04:31 +02:00
Richard Weinberger
ed4b7021cb UBI: remove PEB from free tree in get_peb_for_wl()
If UBI is built without fastmap, get_peb_for_wl() has to
remove the PEB manually from the free tree.
Otherwise the requested PEB lives in two trees.

Reported-by: Zach Sadecki <zsadecki@itwatchdogs.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-12-04 16:04:16 +02:00
David S. Miller
0032c85745 sparc: Fix piggyback with newer binutils.
Newer versions of binutils mark '_end' as 'B' instead of 'A' for
whatever reason.

To be honest, the piggyback code doesn't actually care what kind
of symbol _start and _end are, it just wants to find them and
record the address.

So remove the type from the match strings.

Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 11:24:25 -08:00
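
A small sketch of the type-agnostic match, assuming System.map lines of the form "address, space, one type character, space, name". The function name `line_names_symbol` is hypothetical, and the length checks are an addition for this example that the piggyback code in the diff does not contain.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if a System.map line ends in `suffix` (for example " _end\n"),
 * regardless of the symbol type character. Offsets 10 and 18 skip an 8- or
 * 16-digit hex address, the following space, and the one-character type.
 */
int line_names_symbol(const char *line, const char *suffix)
{
    size_t len = strlen(line);

    /* The length checks keep the offset inside the string. */
    if (len > 10 && strcmp(line + 10, suffix) == 0)
        return 1;
    if (len > 18 && strcmp(line + 18, suffix) == 0)
        return 1;
    return 0;
}
```

With this shape, both `f0379f79 A _end` and `f0379f79 B _end` match the `" _end\n"` suffix, so a change in how binutils classifies the symbol would no longer break the lookup.
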
David S. Miller
de7531e857 sparc64: exit_group should kill register windows just like plain exit.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 11:17:57 -08:00
James Hogan
84ecfd15f5 modsign: add symbol prefix to certificate list
Add the arch symbol prefix (if applicable) to the asm definition of
modsign_certificate_list and modsign_certificate_list_end. This uses the
recently defined SYMBOL_PREFIX which is derived from
CONFIG_SYMBOL_PREFIX.

This fixes the build of module signing on the blackfin and metag
architectures.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-03 13:06:25 +10:30
James Hogan
cbdbf2abb7 linux/kernel.h: define SYMBOL_PREFIX
Define SYMBOL_PREFIX to be the same as CONFIG_SYMBOL_PREFIX if set by
the architecture, or "" otherwise. This avoids the need for ugly #ifdefs
whenever symbols are referenced in asm blocks.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-03 13:05:54 +10:30
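
The effect of the macro can be seen in a short userspace sketch. The `"_"` prefix value is an assumption made for the example, and `cert_list_label` is a hypothetical function; only the `#ifdef` block mirrors the definition added to linux/kernel.h in the diff.

```c
#include <assert.h>
#include <string.h>

/* Assumption for illustration: pretend the architecture uses a "_" prefix. */
#define CONFIG_SYMBOL_PREFIX "_"

#ifdef CONFIG_SYMBOL_PREFIX
#define SYMBOL_PREFIX CONFIG_SYMBOL_PREFIX
#else
#define SYMBOL_PREFIX ""
#endif

/*
 * Adjacent string literals are concatenated at compile time, so the prefix
 * can be spliced into an asm label string without any #ifdef at the use site.
 */
const char *cert_list_label(void)
{
    return SYMBOL_PREFIX "modsign_certificate_list:\n";
}
```

If `CONFIG_SYMBOL_PREFIX` were left undefined, the same expression would yield the unprefixed label, which is the point of defining the fallback as an empty string.
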
43 changed files with 392 additions and 239 deletions

@@ -1,7 +1,7 @@
VERSION = 3
PATCHLEVEL = 7
SUBLEVEL = 0
EXTRAVERSION = -rc8
EXTRAVERSION =
NAME = Terrified Chipmunk
# *DOCUMENTATION*

@@ -95,7 +95,17 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
pte_t *ptep, pte_t pte,
int dirty)
{
return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
int changed = !pte_same(*ptep, pte);
if (changed) {
set_pte_at(vma->vm_mm, addr, ptep, pte);
/*
* There could be some standard sized pages in there,
* get them all.
*/
flush_tlb_range(vma, addr, addr + HPAGE_SIZE);
}
return changed;
}
static inline pte_t huge_ptep_get(pte_t *ptep)

@@ -510,7 +510,6 @@ static inline void cpu_probe_legacy(struct cpuinfo_mips *c, unsigned int cpu)
c->cputype = CPU_R3000A;
__cpu_name[cpu] = "R3000A";
}
break;
} else {
c->cputype = CPU_R3000;
__cpu_name[cpu] = "R3000";

@@ -36,6 +36,11 @@ FEXPORT(ret_from_exception)
FEXPORT(ret_from_irq)
LONG_S s0, TI_REGS($28)
FEXPORT(__ret_from_irq)
/*
* We can be coming here from a syscall done in the kernel space,
* e.g. a failed kernel_execve().
*/
resume_userspace_check:
LONG_L t0, PT_STATUS(sp) # returning to kernel mode?
andi t0, t0, KU_USER
beqz t0, resume_kernel
@@ -162,7 +167,7 @@ work_notifysig: # deal with pending signals and
move a0, sp
li a1, 0
jal do_notify_resume # a2 already loaded
j resume_userspace
j resume_userspace_check
FEXPORT(syscall_exit_partial)
local_irq_disable # make sure need_resched doesn't

@@ -397,14 +397,14 @@ EXPORT(sysn32_call_table)
PTR sys_timerfd_create
PTR compat_sys_timerfd_gettime /* 6285 */
PTR compat_sys_timerfd_settime
PTR sys_signalfd4
PTR compat_sys_signalfd4
PTR sys_eventfd2
PTR sys_epoll_create1
PTR sys_dup3 /* 6290 */
PTR sys_pipe2
PTR sys_inotify_init1
PTR sys_preadv
PTR sys_pwritev
PTR compat_sys_preadv
PTR compat_sys_pwritev
PTR compat_sys_rt_tgsigqueueinfo /* 6295 */
PTR sys_perf_event_open
PTR sys_accept4

@@ -120,18 +120,11 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
if (cpu_context(cpu, mm) != 0) {
unsigned long size, flags;
int huge = is_vm_hugetlb_page(vma);
ENTER_CRITICAL(flags);
if (huge) {
start = round_down(start, HPAGE_SIZE);
end = round_up(end, HPAGE_SIZE);
size = (end - start) >> HPAGE_SHIFT;
} else {
start = round_down(start, PAGE_SIZE << 1);
end = round_up(end, PAGE_SIZE << 1);
size = (end - start) >> (PAGE_SHIFT + 1);
}
start = round_down(start, PAGE_SIZE << 1);
end = round_up(end, PAGE_SIZE << 1);
size = (end - start) >> (PAGE_SHIFT + 1);
if (size <= current_cpu_data.tlbsize/2) {
int oldpid = read_c0_entryhi();
int newpid = cpu_asid(cpu, mm);
@@ -140,10 +133,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
int idx;
write_c0_entryhi(start | newpid);
if (huge)
start += HPAGE_SIZE;
else
start += (PAGE_SIZE << 1);
start += (PAGE_SIZE << 1);
mtc0_tlbw_hazard();
tlb_probe();
tlb_probe_hazard();

@@ -81,18 +81,18 @@ static void usage(void)
static int start_line(const char *line)
{
if (strcmp(line + 8, " T _start\n") == 0)
if (strcmp(line + 10, " _start\n") == 0)
return 1;
else if (strcmp(line + 16, " T _start\n") == 0)
else if (strcmp(line + 18, " _start\n") == 0)
return 1;
return 0;
}
static int end_line(const char *line)
{
if (strcmp(line + 8, " A _end\n") == 0)
if (strcmp(line + 10, " _end\n") == 0)
return 1;
else if (strcmp (line + 16, " A _end\n") == 0)
else if (strcmp (line + 18, " _end\n") == 0)
return 1;
return 0;
}
@@ -100,8 +100,8 @@ static int end_line(const char *line)
/*
* Find address for start and end in System.map.
* The file looks like this:
* f0004000 T _start
* f0379f79 A _end
* f0004000 ... _start
* f0379f79 ... _end
* 1234567890123456
* ^coloumn 1
* There is support for 64 bit addresses too.

@@ -47,7 +47,7 @@ STUB: sra REG1, 0, REG1; \
sra REG4, 0, REG4
SIGN1(sys32_exit, sparc_exit, %o0)
SIGN1(sys32_exit_group, sys_exit_group, %o0)
SIGN1(sys32_exit_group, sparc_exit_group, %o0)
SIGN1(sys32_wait4, compat_sys_wait4, %o2)
SIGN1(sys32_creat, sys_creat, %o1)
SIGN1(sys32_mknod, sys_mknod, %o1)

@@ -118,10 +118,20 @@ ret_from_syscall:
ba,pt %xcc, ret_sys_call
ldx [%sp + PTREGS_OFF + PT_V9_I0], %o0
.globl sparc_exit_group
.type sparc_exit_group,#function
sparc_exit_group:
sethi %hi(sys_exit_group), %g7
ba,pt %xcc, 1f
or %g7, %lo(sys_exit_group), %g7
.size sparc_exit_group,.-sparc_exit_group
.globl sparc_exit
.type sparc_exit,#function
sparc_exit:
rdpr %pstate, %g2
sethi %hi(sys_exit), %g7
or %g7, %lo(sys_exit), %g7
1: rdpr %pstate, %g2
wrpr %g2, PSTATE_IE, %pstate
rdpr %otherwin, %g1
rdpr %cansave, %g3
@@ -129,7 +139,7 @@ sparc_exit:
wrpr %g3, 0x0, %cansave
wrpr %g0, 0x0, %otherwin
wrpr %g2, 0x0, %pstate
ba,pt %xcc, sys_exit
jmpl %g7, %g0
stb %g0, [%g6 + TI_WSAVED]
.size sparc_exit,.-sparc_exit

@@ -133,7 +133,7 @@ sys_call_table:
/*170*/ .word sys_lsetxattr, sys_fsetxattr, sys_getxattr, sys_lgetxattr, sys_getdents
.word sys_setsid, sys_fchdir, sys_fgetxattr, sys_listxattr, sys_llistxattr
/*180*/ .word sys_flistxattr, sys_removexattr, sys_lremovexattr, sys_nis_syscall, sys_ni_syscall
.word sys_setpgid, sys_fremovexattr, sys_tkill, sys_exit_group, sys_newuname
.word sys_setpgid, sys_fremovexattr, sys_tkill, sparc_exit_group, sys_newuname
/*190*/ .word sys_init_module, sys_sparc64_personality, sys_remap_file_pages, sys_epoll_create, sys_epoll_ctl
.word sys_epoll_wait, sys_ioprio_set, sys_getppid, sys_nis_syscall, sys_sgetmask
/*200*/ .word sys_ssetmask, sys_nis_syscall, sys_newlstat, sys_uselib, sys_nis_syscall

@@ -1796,7 +1796,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
*/
mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
gfp = mapping_gfp_mask(mapping);
gfp |= __GFP_NORETRY | __GFP_NOWARN;
gfp |= __GFP_NORETRY | __GFP_NOWARN | __GFP_NO_KSWAPD;
gfp &= ~(__GFP_IO | __GFP_WAIT);
for_each_sg(st->sgl, sg, page_count, i) {
page = shmem_read_mapping_page_gfp(mapping, i, gfp);
@@ -1809,7 +1809,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
* our own buffer, now let the real VM do its job and
* go down in flames if truly OOM.
*/
gfp &= ~(__GFP_NORETRY | __GFP_NOWARN);
gfp &= ~(__GFP_NORETRY | __GFP_NOWARN | __GFP_NO_KSWAPD);
gfp |= __GFP_IO | __GFP_WAIT;
i915_gem_shrink_all(dev_priv);
@@ -1817,7 +1817,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
if (IS_ERR(page))
goto err_pages;
gfp |= __GFP_NORETRY | __GFP_NOWARN;
gfp |= __GFP_NORETRY | __GFP_NOWARN | __GFP_NO_KSWAPD;
gfp &= ~(__GFP_IO | __GFP_WAIT);
}

@@ -23,6 +23,7 @@
#include <linux/input.h>
#include <linux/of.h>
#include <linux/export.h>
#include <linux/module.h>
#include <linux/input/matrix_keypad.h>
static bool matrix_keypad_map_key(struct input_dev *input_dev,
@@ -161,3 +162,5 @@ int matrix_keypad_build_keymap(const struct matrix_keymap_data *keymap_data,
return 0;
}
EXPORT_SYMBOL(matrix_keypad_build_keymap);
MODULE_LICENSE("GPL");

@@ -373,18 +373,25 @@ static struct sdhci_ops sdhci_s3c_ops = {
static void sdhci_s3c_notify_change(struct platform_device *dev, int state)
{
struct sdhci_host *host = platform_get_drvdata(dev);
struct sdhci_s3c *sc = sdhci_priv(host);
unsigned long flags;
if (host) {
spin_lock_irqsave(&host->lock, flags);
if (state) {
dev_dbg(&dev->dev, "card inserted.\n");
#ifdef CONFIG_PM_RUNTIME
clk_prepare_enable(sc->clk_io);
#endif
host->flags &= ~SDHCI_DEVICE_DEAD;
host->quirks |= SDHCI_QUIRK_BROKEN_CARD_DETECTION;
} else {
dev_dbg(&dev->dev, "card removed.\n");
host->flags |= SDHCI_DEVICE_DEAD;
host->quirks &= ~SDHCI_QUIRK_BROKEN_CARD_DETECTION;
#ifdef CONFIG_PM_RUNTIME
clk_disable_unprepare(sc->clk_io);
#endif
}
tasklet_schedule(&host->card_tasklet);
spin_unlock_irqrestore(&host->lock, flags);

@@ -1104,7 +1104,6 @@ static irqreturn_t sh_mmcif_irqt(int irq, void *dev_id)
{
struct sh_mmcif_host *host = dev_id;
struct mmc_request *mrq = host->mrq;
struct mmc_data *data = mrq->data;
cancel_delayed_work_sync(&host->timeout_work);
@@ -1152,13 +1151,14 @@ static irqreturn_t sh_mmcif_irqt(int irq, void *dev_id)
case MMCIF_WAIT_FOR_READ_END:
case MMCIF_WAIT_FOR_WRITE_END:
if (host->sd_error)
data->error = sh_mmcif_error_manage(host);
mrq->data->error = sh_mmcif_error_manage(host);
break;
default:
BUG();
}
if (host->wait_for != MMCIF_WAIT_FOR_STOP) {
struct mmc_data *data = mrq->data;
if (!mrq->cmd->error && data && !data->error)
data->bytes_xfered =
data->blocks * data->blksz;
@@ -1231,10 +1231,6 @@ static irqreturn_t sh_mmcif_intr(int irq, void *dev_id)
host->sd_error = true;
dev_dbg(&host->pd->dev, "int err state = %08x\n", state);
}
if (host->state == STATE_IDLE) {
dev_info(&host->pd->dev, "Spurious IRQ status 0x%x", state);
return IRQ_HANDLED;
}
if (state & ~(INT_CMD12RBE | INT_CMD12CRE)) {
if (!host->dma_active)
return IRQ_WAKE_THREAD;

@@ -1077,7 +1077,8 @@ EXPORT_SYMBOL_GPL(mtd_writev);
* until the request succeeds or until the allocation size falls below
* the system page size. This attempts to make sure it does not adversely
* impact system performance, so when allocating more than one page, we
* ask the memory allocator to avoid re-trying.
* ask the memory allocator to avoid re-trying, swapping, writing back
* or performing I/O.
*
* Note, this function also makes sure that the allocated buffer is aligned to
* the MTD device's min. I/O unit, i.e. the "mtd->writesize" value.
@@ -1091,7 +1092,8 @@ EXPORT_SYMBOL_GPL(mtd_writev);
*/
void *mtd_kmalloc_up_to(const struct mtd_info *mtd, size_t *size)
{
gfp_t flags = __GFP_NOWARN | __GFP_WAIT | __GFP_NORETRY;
gfp_t flags = __GFP_NOWARN | __GFP_WAIT |
__GFP_NORETRY | __GFP_NO_KSWAPD;
size_t min_alloc = max_t(size_t, mtd->writesize, PAGE_SIZE);
void *kbuf;

@@ -498,7 +498,7 @@ out:
* @ubi: UBI device description object
*
* This function returns a physical eraseblock in case of success and a
* negative error code in case of failure. Might sleep.
* negative error code in case of failure.
*/
static int __wl_get_peb(struct ubi_device *ubi)
{
@@ -540,13 +540,6 @@ retry:
* ubi_wl_get_peb() after removing e from the pool. */
prot_queue_add(ubi, e);
#endif
err = ubi_self_check_all_ff(ubi, e->pnum, ubi->vid_hdr_aloffset,
ubi->peb_size - ubi->vid_hdr_aloffset);
if (err) {
ubi_err("new PEB %d does not contain all 0xFF bytes", e->pnum);
return err;
}
return e->pnum;
}
@@ -679,17 +672,30 @@ static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi)
#else
static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi)
{
return find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
struct ubi_wl_entry *e;
e = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
self_check_in_wl_tree(ubi, e, &ubi->free);
rb_erase(&e->u.rb, &ubi->free);
return e;
}
int ubi_wl_get_peb(struct ubi_device *ubi)
{
int peb;
int peb, err;
spin_lock(&ubi->wl_lock);
peb = __wl_get_peb(ubi);
spin_unlock(&ubi->wl_lock);
err = ubi_self_check_all_ff(ubi, peb, ubi->vid_hdr_aloffset,
ubi->peb_size - ubi->vid_hdr_aloffset);
if (err) {
ubi_err("new PEB %d does not contain all 0xFF bytes", peb);
return err;
}
return peb;
}
#endif

@@ -1276,7 +1276,7 @@ struct megasas_evt_detail {
} __attribute__ ((packed));
struct megasas_aen_event {
struct work_struct hotplug_work;
struct delayed_work hotplug_work;
struct megasas_instance *instance;
};

@@ -2060,9 +2060,9 @@ megasas_service_aen(struct megasas_instance *instance, struct megasas_cmd *cmd)
} else {
ev->instance = instance;
instance->ev = ev;
INIT_WORK(&ev->hotplug_work, megasas_aen_polling);
schedule_delayed_work(
(struct delayed_work *)&ev->hotplug_work, 0);
INIT_DELAYED_WORK(&ev->hotplug_work,
megasas_aen_polling);
schedule_delayed_work(&ev->hotplug_work, 0);
}
}
}
@@ -4352,8 +4352,7 @@ megasas_suspend(struct pci_dev *pdev, pm_message_t state)
/* cancel the delayed work if this work still in queue */
if (instance->ev != NULL) {
struct megasas_aen_event *ev = instance->ev;
cancel_delayed_work_sync(
(struct delayed_work *)&ev->hotplug_work);
cancel_delayed_work_sync(&ev->hotplug_work);
instance->ev = NULL;
}
@@ -4545,8 +4544,7 @@ static void __devexit megasas_detach_one(struct pci_dev *pdev)
/* cancel the delayed work if this work still in queue*/
if (instance->ev != NULL) {
struct megasas_aen_event *ev = instance->ev;
cancel_delayed_work_sync(
(struct delayed_work *)&ev->hotplug_work);
cancel_delayed_work_sync(&ev->hotplug_work);
instance->ev = NULL;
}
@@ -5190,7 +5188,7 @@ static void
megasas_aen_polling(struct work_struct *work)
{
struct megasas_aen_event *ev =
container_of(work, struct megasas_aen_event, hotplug_work);
container_of(work, struct megasas_aen_event, hotplug_work.work);
struct megasas_instance *instance = ev->instance;
union megasas_evt_class_locale class_locale;
struct Scsi_Host *host;

@@ -1544,6 +1544,22 @@ ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
}
EXPORT_SYMBOL_GPL(blkdev_aio_write);
static ssize_t blkdev_aio_read(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t pos)
{
struct file *file = iocb->ki_filp;
struct inode *bd_inode = file->f_mapping->host;
loff_t size = i_size_read(bd_inode);
if (pos >= size)
return 0;
size -= pos;
if (size < INT_MAX)
nr_segs = iov_shorten((struct iovec *)iov, nr_segs, size);
return generic_file_aio_read(iocb, iov, nr_segs, pos);
}
/*
* Try to release a page associated with block device when the system
* is under memory pressure.
@@ -1574,7 +1590,7 @@ const struct file_operations def_blk_fops = {
.llseek = block_llseek,
.read = do_sync_read,
.write = do_sync_write,
.aio_read = generic_file_aio_read,
.aio_read = blkdev_aio_read,
.aio_write = blkdev_aio_write,
.mmap = generic_file_mmap,
.fsync = blkdev_fsync,

@@ -2893,6 +2893,55 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
bio_put(bio);
}
/*
* This allows us to do IO even on the odd last sectors
* of a device, even if the bh block size is some multiple
* of the physical sector size.
*
* We'll just truncate the bio to the size of the device,
* and clear the end of the buffer head manually.
*
* Truly out-of-range accesses will turn into actual IO
* errors, this only handles the "we need to be able to
* do IO at the final sector" case.
*/
static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh)
{
sector_t maxsector;
unsigned bytes;
maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
if (!maxsector)
return;
/*
* If the *whole* IO is past the end of the device,
* let it through, and the IO layer will turn it into
* an EIO.
*/
if (unlikely(bio->bi_sector >= maxsector))
return;
maxsector -= bio->bi_sector;
bytes = bio->bi_size;
if (likely((bytes >> 9) <= maxsector))
return;
/* Uhhuh. We've got a bh that straddles the device size! */
bytes = maxsector << 9;
/* Truncate the bio.. */
bio->bi_size = bytes;
bio->bi_io_vec[0].bv_len = bytes;
/* ..and clear the end of the buffer for reads */
if ((rw & RW_MASK) == READ) {
void *kaddr = kmap_atomic(bh->b_page);
memset(kaddr + bh_offset(bh) + bytes, 0, bh->b_size - bytes);
kunmap_atomic(kaddr);
}
}
int submit_bh(int rw, struct buffer_head * bh)
{
struct bio *bio;
@@ -2929,6 +2978,9 @@ int submit_bh(int rw, struct buffer_head * bh)
bio->bi_end_io = end_bio_bh_io_sync;
bio->bi_private = bh;
/* Take care of bh's that straddle the end of the device */
guard_bh_eod(rw, bio, bh);
bio_get(bio);
submit_bio(rw, bio);

@@ -30,9 +30,10 @@ struct vm_area_struct;
#define ___GFP_HARDWALL 0x20000u
#define ___GFP_THISNODE 0x40000u
#define ___GFP_RECLAIMABLE 0x80000u
#define ___GFP_NOTRACK 0x100000u
#define ___GFP_OTHER_NODE 0x200000u
#define ___GFP_WRITE 0x400000u
#define ___GFP_NOTRACK 0x200000u
#define ___GFP_NO_KSWAPD 0x400000u
#define ___GFP_OTHER_NODE 0x800000u
#define ___GFP_WRITE 0x1000000u
/*
* GFP bitmasks..
@@ -85,6 +86,7 @@ struct vm_area_struct;
#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */
#define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */
#define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD)
#define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */
#define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */
@@ -94,7 +96,7 @@ struct vm_area_struct;
*/
#define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
#define __GFP_BITS_SHIFT 23 /* Room for N __GFP_FOO bits */
#define __GFP_BITS_SHIFT 25 /* Room for N __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
/* This equals 0, but use constants in case they ever change */
@@ -114,7 +116,8 @@ struct vm_area_struct;
__GFP_MOVABLE)
#define GFP_IOFS (__GFP_IO | __GFP_FS)
#define GFP_TRANSHUGE (GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN)
__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
__GFP_NO_KSWAPD)
#ifdef CONFIG_NUMA
#define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)

@@ -701,6 +701,13 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
#define COMPACTION_BUILD 0
#endif
/* This helps us to avoid #ifdef CONFIG_SYMBOL_PREFIX */
#ifdef CONFIG_SYMBOL_PREFIX
#define SYMBOL_PREFIX CONFIG_SYMBOL_PREFIX
#else
#define SYMBOL_PREFIX ""
#endif
/* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
# define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD

@@ -82,16 +82,6 @@ static inline void mpol_cond_put(struct mempolicy *pol)
__mpol_put(pol);
}
extern struct mempolicy *__mpol_cond_copy(struct mempolicy *tompol,
struct mempolicy *frompol);
static inline struct mempolicy *mpol_cond_copy(struct mempolicy *tompol,
struct mempolicy *frompol)
{
if (!frompol)
return frompol;
return __mpol_cond_copy(tompol, frompol);
}
extern struct mempolicy *__mpol_dup(struct mempolicy *pol);
static inline struct mempolicy *mpol_dup(struct mempolicy *pol)
{
@@ -215,12 +205,6 @@ static inline void mpol_cond_put(struct mempolicy *pol)
{
}
static inline struct mempolicy *mpol_cond_copy(struct mempolicy *to,
struct mempolicy *from)
{
return from;
}
static inline void mpol_get(struct mempolicy *pol)
{
}

@@ -1488,6 +1488,9 @@ struct napi_gro_cb {
/* Used in ipv6_gro_receive() */
int proto;
/* used in skb_gro_receive() slow path */
struct sk_buff *last;
};
#define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)

@@ -525,6 +525,7 @@ static inline __u32 cookie_v6_init_sequence(struct sock *sk,
extern void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
int nonagle);
extern bool tcp_may_send_now(struct sock *sk);
extern int __tcp_retransmit_skb(struct sock *, struct sk_buff *);
extern int tcp_retransmit_skb(struct sock *, struct sk_buff *);
extern void tcp_retransmit_timer(struct sock *sk);
extern void tcp_xmit_retransmit_queue(struct sock *);

@@ -36,6 +36,7 @@
{(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \
{(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \
{(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \
{(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \
{(unsigned long)__GFP_OTHER_NODE, "GFP_OTHER_NODE"} \
) : "GFP_NOWAIT"

@@ -21,10 +21,10 @@ struct key *modsign_keyring;
extern __initdata const u8 modsign_certificate_list[];
extern __initdata const u8 modsign_certificate_list_end[];
asm(".section .init.data,\"aw\"\n"
"modsign_certificate_list:\n"
SYMBOL_PREFIX "modsign_certificate_list:\n"
".incbin \"signing_key.x509\"\n"
".incbin \"extra_certificates\"\n"
"modsign_certificate_list_end:"
SYMBOL_PREFIX "modsign_certificate_list_end:"
);
/*

@@ -27,13 +27,13 @@
* - Information block
*/
struct module_signature {
enum pkey_algo algo : 8; /* Public-key crypto algorithm */
enum pkey_hash_algo hash : 8; /* Digest algorithm */
enum pkey_id_type id_type : 8; /* Key identifier type */
u8 signer_len; /* Length of signer's name */
u8 key_id_len; /* Length of key identifier */
u8 __pad[3];
__be32 sig_len; /* Length of signature data */
u8 algo; /* Public-key crypto algorithm [enum pkey_algo] */
u8 hash; /* Digest algorithm [enum pkey_hash_algo] */
u8 id_type; /* Key identifier type [enum pkey_id_type] */
u8 signer_len; /* Length of signer's name */
u8 key_id_len; /* Length of key identifier */
u8 __pad[3];
__be32 sig_len; /* Length of signature data */
};
/*

@@ -368,6 +368,9 @@ static void watchdog_disable(unsigned int cpu)
{
struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
if (!watchdog_enabled)
return;
watchdog_set_prio(SCHED_NORMAL, 0);
hrtimer_cancel(hrtimer);
/* disable the perf event */

@@ -1361,8 +1361,8 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
timer->data != (unsigned long)dwork);
BUG_ON(timer_pending(timer));
BUG_ON(!list_empty(&work->entry));
WARN_ON_ONCE(timer_pending(timer));
WARN_ON_ONCE(!list_empty(&work->entry));
/*
* If @delay is 0, queue @dwork->work immediately. This is for

@@ -163,7 +163,7 @@ $(obj)/crc32table.h: $(obj)/gen_crc32table
#
obj-$(CONFIG_OID_REGISTRY) += oid_registry.o
$(obj)/oid_registry.c: $(obj)/oid_registry_data.c
$(obj)/oid_registry.o: $(obj)/oid_registry_data.c
$(obj)/oid_registry_data.c: $(srctree)/include/linux/oid_registry.h \
$(src)/build_OID_registry

@@ -91,7 +91,7 @@ next_tag:
/* Extract the length */
len = data[dp++];
if (len < 0x7f) {
if (len <= 0x7f) {
dp += len;
goto next_tag;
}

@@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone,
/* Found a block suitable for isolating free pages from */
isolated = 0;
end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
/*
* As pfn may not start aligned, pfn+pageblock_nr_page
* may cross a MAX_ORDER_NR_PAGES boundary and miss
* a pfn_valid check. Ensure isolate_freepages_block()
* only scans within a pageblock
*/
end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
end_pfn = min(end_pfn, zone_end_pfn);
isolated = isolate_freepages_block(cc, pfn, end_pfn,
freelist, false);
nr_freepages += isolated;

@@ -2037,28 +2037,6 @@ struct mempolicy *__mpol_dup(struct mempolicy *old)
return new;
}
/*
* If *frompol needs [has] an extra ref, copy *frompol to *tompol ,
* eliminate the * MPOL_F_* flags that require conditional ref and
* [NOTE!!!] drop the extra ref. Not safe to reference *frompol directly
* after return. Use the returned value.
*
* Allows use of a mempolicy for, e.g., multiple allocations with a single
* policy lookup, even if the policy needs/has extra ref on lookup.
* shmem_readahead needs this.
*/
struct mempolicy *__mpol_cond_copy(struct mempolicy *tompol,
struct mempolicy *frompol)
{
if (!mpol_needs_cond_ref(frompol))
return frompol;
*tompol = *frompol;
tompol->flags &= ~MPOL_F_SHARED; /* copy doesn't need unref */
__mpol_put(frompol);
return tompol;
}
/* Slow path of a mempolicy comparison */
bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
{

@@ -2378,15 +2378,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
}
/* Returns true if the allocation is likely for THP */
static bool is_thp_alloc(gfp_t gfp_mask, unsigned int order)
{
if (order == pageblock_order &&
(gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
return true;
return false;
}
static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
@@ -2425,10 +2416,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto nopage;
restart:
/* The decision whether to wake kswapd for THP is made later */
if (!is_thp_alloc(gfp_mask, order))
if (!(gfp_mask & __GFP_NO_KSWAPD))
wake_all_kswapd(order, zonelist, high_zoneidx,
zone_idx(preferred_zone));
zone_idx(preferred_zone));
/*
* OK, we're below the kswapd watermark and have kicked background
@@ -2498,21 +2488,15 @@ rebalance:
goto got_pg;
sync_migration = true;
if (is_thp_alloc(gfp_mask, order)) {
/*
* If compaction is deferred for high-order allocations, it is
* because sync compaction recently failed. If this is the case
* and the caller requested a movable allocation that does not
* heavily disrupt the system then fail the allocation instead
* of entering direct reclaim.
*/
if (deferred_compaction || contended_compaction)
goto nopage;
/* If process is willing to reclaim/compact then wake kswapd */
wake_all_kswapd(order, zonelist, high_zoneidx,
zone_idx(preferred_zone));
}
/*
* If compaction is deferred for high-order allocations, it is because
* sync compaction recently failed. In this is the case and the caller
* requested a movable allocation that does not heavily disrupt the
* system then fail the allocation instead of entering direct reclaim.
*/
if ((deferred_compaction || contended_compaction) &&
(gfp_mask & __GFP_NO_KSWAPD))
goto nopage;
/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order,


@@ -909,26 +909,9 @@ static struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
struct shmem_inode_info *info, pgoff_t index)
{
struct mempolicy mpol, *spol;
struct vm_area_struct pvma;
spol = mpol_cond_copy(&mpol,
mpol_shared_policy_lookup(&info->policy, index));
/* Create a pseudo vma that just contains the policy */
pvma.vm_start = 0;
/* Bias interleave by inode number to distribute better across nodes */
pvma.vm_pgoff = index + info->vfs_inode.i_ino;
pvma.vm_ops = NULL;
pvma.vm_policy = spol;
return swapin_readahead(swap, gfp, &pvma, 0);
}
static struct page *shmem_alloc_page(gfp_t gfp,
struct shmem_inode_info *info, pgoff_t index)
{
struct vm_area_struct pvma;
struct page *page;
/* Create a pseudo vma that just contains the policy */
pvma.vm_start = 0;
@@ -937,10 +920,33 @@ static struct page *shmem_alloc_page(gfp_t gfp,
pvma.vm_ops = NULL;
pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
/*
* alloc_page_vma() will drop the shared policy reference
*/
return alloc_page_vma(gfp, &pvma, 0);
page = swapin_readahead(swap, gfp, &pvma, 0);
/* Drop reference taken by mpol_shared_policy_lookup() */
mpol_cond_put(pvma.vm_policy);
return page;
}
static struct page *shmem_alloc_page(gfp_t gfp,
struct shmem_inode_info *info, pgoff_t index)
{
struct vm_area_struct pvma;
struct page *page;
/* Create a pseudo vma that just contains the policy */
pvma.vm_start = 0;
/* Bias interleave by inode number to distribute better across nodes */
pvma.vm_pgoff = index + info->vfs_inode.i_ino;
pvma.vm_ops = NULL;
pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
page = alloc_page_vma(gfp, &pvma, 0);
/* Drop reference taken by mpol_shared_policy_lookup() */
mpol_cond_put(pvma.vm_policy);
return page;
}
#else /* !CONFIG_NUMA */
#ifdef CONFIG_TMPFS


@@ -2823,29 +2823,10 @@ out:
if (!populated_zone(zone))
continue;
if (zone->all_unreclaimable &&
sc.priority != DEF_PRIORITY)
continue;
/* Would compaction fail due to lack of free memory? */
if (COMPACTION_BUILD &&
compaction_suitable(zone, order) == COMPACT_SKIPPED)
goto loop_again;
/* Confirm the zone is balanced for order-0 */
if (!zone_watermark_ok(zone, 0,
high_wmark_pages(zone), 0, 0)) {
order = sc.order = 0;
goto loop_again;
}
/* Check if the memory needs to be defragmented. */
if (zone_watermark_ok(zone, order,
low_wmark_pages(zone), *classzone_idx, 0))
zones_need_compaction = 0;
/* If balanced, clear the congested flag */
zone_clear_flag(zone, ZONE_CONGESTED);
}
if (zones_need_compaction)


@@ -3451,6 +3451,8 @@ static int napi_gro_complete(struct sk_buff *skb)
struct list_head *head = &ptype_base[ntohs(type) & PTYPE_HASH_MASK];
int err = -ENOENT;
BUILD_BUG_ON(sizeof(struct napi_gro_cb) > sizeof(skb->cb));
if (NAPI_GRO_CB(skb)->count == 1) {
skb_shinfo(skb)->gso_size = 0;
goto out;


@@ -3004,7 +3004,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
skb_shinfo(nskb)->gso_size = pinfo->gso_size;
pinfo->gso_size = 0;
skb_header_release(p);
nskb->prev = p;
NAPI_GRO_CB(nskb)->last = p;
nskb->data_len += p->len;
nskb->truesize += p->truesize;
@@ -3030,8 +3030,8 @@ merge:
__skb_pull(skb, offset);
p->prev->next = skb;
p->prev = skb;
NAPI_GRO_CB(p)->last->next = skb;
NAPI_GRO_CB(p)->last = skb;
skb_header_release(skb);
done:


@@ -44,6 +44,10 @@ struct inet_diag_entry {
u16 dport;
u16 family;
u16 userlocks;
#if IS_ENABLED(CONFIG_IPV6)
struct in6_addr saddr_storage; /* for IPv4-mapped-IPv6 addresses */
struct in6_addr daddr_storage; /* for IPv4-mapped-IPv6 addresses */
#endif
};
static DEFINE_MUTEX(inet_diag_table_mutex);
@@ -428,25 +432,31 @@ static int inet_diag_bc_run(const struct nlattr *_bc,
break;
}
if (cond->prefix_len == 0)
break;
if (op->code == INET_DIAG_BC_S_COND)
addr = entry->saddr;
else
addr = entry->daddr;
if (cond->family != AF_UNSPEC &&
cond->family != entry->family) {
if (entry->family == AF_INET6 &&
cond->family == AF_INET) {
if (addr[0] == 0 && addr[1] == 0 &&
addr[2] == htonl(0xffff) &&
bitstring_match(addr + 3,
cond->addr,
cond->prefix_len))
break;
}
yes = 0;
break;
}
if (cond->prefix_len == 0)
break;
if (bitstring_match(addr, cond->addr,
cond->prefix_len))
break;
if (entry->family == AF_INET6 &&
cond->family == AF_INET) {
if (addr[0] == 0 && addr[1] == 0 &&
addr[2] == htonl(0xffff) &&
bitstring_match(addr + 3, cond->addr,
cond->prefix_len))
break;
}
yes = 0;
break;
}
@@ -509,6 +519,55 @@ static int valid_cc(const void *bc, int len, int cc)
return 0;
}
/* Validate an inet_diag_hostcond. */
static bool valid_hostcond(const struct inet_diag_bc_op *op, int len,
int *min_len)
{
int addr_len;
struct inet_diag_hostcond *cond;
/* Check hostcond space. */
*min_len += sizeof(struct inet_diag_hostcond);
if (len < *min_len)
return false;
cond = (struct inet_diag_hostcond *)(op + 1);
/* Check address family and address length. */
switch (cond->family) {
case AF_UNSPEC:
addr_len = 0;
break;
case AF_INET:
addr_len = sizeof(struct in_addr);
break;
case AF_INET6:
addr_len = sizeof(struct in6_addr);
break;
default:
return false;
}
*min_len += addr_len;
if (len < *min_len)
return false;
/* Check prefix length (in bits) vs address length (in bytes). */
if (cond->prefix_len > 8 * addr_len)
return false;
return true;
}
/* Validate a port comparison operator. */
static inline bool valid_port_comparison(const struct inet_diag_bc_op *op,
int len, int *min_len)
{
/* Port comparisons put the port in a follow-on inet_diag_bc_op. */
*min_len += sizeof(struct inet_diag_bc_op);
if (len < *min_len)
return false;
return true;
}
static int inet_diag_bc_audit(const void *bytecode, int bytecode_len)
{
const void *bc = bytecode;
@@ -516,29 +575,39 @@ static int inet_diag_bc_audit(const void *bytecode, int bytecode_len)
while (len > 0) {
const struct inet_diag_bc_op *op = bc;
int min_len = sizeof(struct inet_diag_bc_op);
//printk("BC: %d %d %d {%d} / %d\n", op->code, op->yes, op->no, op[1].no, len);
switch (op->code) {
case INET_DIAG_BC_AUTO:
case INET_DIAG_BC_S_COND:
case INET_DIAG_BC_D_COND:
if (!valid_hostcond(bc, len, &min_len))
return -EINVAL;
break;
case INET_DIAG_BC_S_GE:
case INET_DIAG_BC_S_LE:
case INET_DIAG_BC_D_GE:
case INET_DIAG_BC_D_LE:
case INET_DIAG_BC_JMP:
if (op->no < 4 || op->no > len + 4 || op->no & 3)
return -EINVAL;
if (op->no < len &&
!valid_cc(bytecode, bytecode_len, len - op->no))
if (!valid_port_comparison(bc, len, &min_len))
return -EINVAL;
break;
case INET_DIAG_BC_AUTO:
case INET_DIAG_BC_JMP:
case INET_DIAG_BC_NOP:
break;
default:
return -EINVAL;
}
if (op->yes < 4 || op->yes > len + 4 || op->yes & 3)
if (op->code != INET_DIAG_BC_NOP) {
if (op->no < min_len || op->no > len + 4 || op->no & 3)
return -EINVAL;
if (op->no < len &&
!valid_cc(bytecode, bytecode_len, len - op->no))
return -EINVAL;
}
if (op->yes < min_len || op->yes > len + 4 || op->yes & 3)
return -EINVAL;
bc += op->yes;
len -= op->yes;
@@ -596,6 +665,36 @@ static int inet_twsk_diag_dump(struct inet_timewait_sock *tw,
cb->nlh->nlmsg_seq, NLM_F_MULTI, cb->nlh);
}
/* Get the IPv4, IPv6, or IPv4-mapped-IPv6 local and remote addresses
* from a request_sock. For IPv4-mapped-IPv6 we must map IPv4 to IPv6.
*/
static inline void inet_diag_req_addrs(const struct sock *sk,
const struct request_sock *req,
struct inet_diag_entry *entry)
{
struct inet_request_sock *ireq = inet_rsk(req);
#if IS_ENABLED(CONFIG_IPV6)
if (sk->sk_family == AF_INET6) {
if (req->rsk_ops->family == AF_INET6) {
entry->saddr = inet6_rsk(req)->loc_addr.s6_addr32;
entry->daddr = inet6_rsk(req)->rmt_addr.s6_addr32;
} else if (req->rsk_ops->family == AF_INET) {
ipv6_addr_set_v4mapped(ireq->loc_addr,
&entry->saddr_storage);
ipv6_addr_set_v4mapped(ireq->rmt_addr,
&entry->daddr_storage);
entry->saddr = entry->saddr_storage.s6_addr32;
entry->daddr = entry->daddr_storage.s6_addr32;
}
} else
#endif
{
entry->saddr = &ireq->loc_addr;
entry->daddr = &ireq->rmt_addr;
}
}
static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
struct request_sock *req,
struct user_namespace *user_ns,
@@ -637,8 +736,10 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
r->idiag_inode = 0;
#if IS_ENABLED(CONFIG_IPV6)
if (r->idiag_family == AF_INET6) {
*(struct in6_addr *)r->id.idiag_src = inet6_rsk(req)->loc_addr;
*(struct in6_addr *)r->id.idiag_dst = inet6_rsk(req)->rmt_addr;
struct inet_diag_entry entry;
inet_diag_req_addrs(sk, req, &entry);
memcpy(r->id.idiag_src, entry.saddr, sizeof(struct in6_addr));
memcpy(r->id.idiag_dst, entry.daddr, sizeof(struct in6_addr));
}
#endif
@@ -691,18 +792,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
continue;
if (bc) {
entry.saddr =
#if IS_ENABLED(CONFIG_IPV6)
(entry.family == AF_INET6) ?
inet6_rsk(req)->loc_addr.s6_addr32 :
#endif
&ireq->loc_addr;
entry.daddr =
#if IS_ENABLED(CONFIG_IPV6)
(entry.family == AF_INET6) ?
inet6_rsk(req)->rmt_addr.s6_addr32 :
#endif
&ireq->rmt_addr;
inet_diag_req_addrs(sk, req, &entry);
entry.dport = ntohs(ireq->rmt_port);
if (!inet_diag_bc_run(bc, &entry))


@@ -707,28 +707,27 @@ EXPORT_SYMBOL(ip_defrag);
struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user)
{
const struct iphdr *iph;
struct iphdr iph;
u32 len;
if (skb->protocol != htons(ETH_P_IP))
return skb;
if (!pskb_may_pull(skb, sizeof(struct iphdr)))
if (!skb_copy_bits(skb, 0, &iph, sizeof(iph)))
return skb;
iph = ip_hdr(skb);
if (iph->ihl < 5 || iph->version != 4)
return skb;
if (!pskb_may_pull(skb, iph->ihl*4))
return skb;
iph = ip_hdr(skb);
len = ntohs(iph->tot_len);
if (skb->len < len || len < (iph->ihl * 4))
if (iph.ihl < 5 || iph.version != 4)
return skb;
if (ip_is_fragment(ip_hdr(skb))) {
len = ntohs(iph.tot_len);
if (skb->len < len || len < (iph.ihl * 4))
return skb;
if (ip_is_fragment(&iph)) {
skb = skb_share_check(skb, GFP_ATOMIC);
if (skb) {
if (!pskb_may_pull(skb, iph.ihl*4))
return skb;
if (pskb_trim_rcsum(skb, len))
return skb;
memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));


@@ -5645,7 +5645,11 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
tcp_fastopen_cache_set(sk, mss, cookie, syn_drop);
if (data) { /* Retransmit unacked data in SYN */
tcp_retransmit_skb(sk, data);
tcp_for_write_queue_from(data, sk) {
if (data == tcp_send_head(sk) ||
__tcp_retransmit_skb(sk, data))
break;
}
tcp_rearm_rto(sk);
return true;
}


@@ -2309,12 +2309,11 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
* state updates are done by the caller. Returns non-zero if an
* error occurred which prevented the send.
*/
int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
{
struct tcp_sock *tp = tcp_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
unsigned int cur_mss;
int err;
/* Inconclusive MTU probe */
if (icsk->icsk_mtup.probe_size) {
@@ -2387,11 +2386,17 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3))) {
struct sk_buff *nskb = __pskb_copy(skb, MAX_TCP_HEADER,
GFP_ATOMIC);
err = nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
-ENOBUFS;
return nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
-ENOBUFS;
} else {
err = tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
return tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
}
}
int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
{
struct tcp_sock *tp = tcp_sk(sk);
int err = __tcp_retransmit_skb(sk, skb);
if (err == 0) {
/* Update global TCP statistics. */