Commit Graph

44881 Commits

Author SHA1 Message Date
David S. Miller
c58543c869 sparc64: Set IRQF_DISABLED on LDC channel IRQs.
With lots of virtual devices it's easy to generate a lot of
events and chew up the kernel IRQ stack.

Reported-by: hyl <heyongli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 00:49:09 -07:00
Hidetoshi Seto
8968f9d3dc perf_event, x86, mce: Use TRACE_EVENT() for MCE logging
This approach is the first baby step towards solving many of the
structural problems the x86 MCE logging code is having today:

 - It has a private ring-buffer implementation that has a number
   of limitations and has been historically fragile and buggy.

 - It is using a quirky /dev/mcelog ioctl driven ABI that is MCE
   specific. /dev/mcelog is not part of any larger logging
   framework and hence has remained on the fringes for many years.

 - The MCE logging code is still very unclean partly due to its ABI
   limitations. Fields are being reused for multiple purposes, and
   the whole message structure is limited and x86 specific to begin
   with.

All in one, the x86 tree would like to move away from this private
implementation of an event logging facility to a broader framework.

By using perf events we gain the following advantages:

 - Multiple user-space agents can access MCE events. We can have an
   mcelog daemon running but also a system-wide tracer capturing
   important events in flight-recorder mode.

 - Sampling support: the kernel and the user-space call-chain of MCE
   events can be stored and analyzed as well. This way actual patterns
   of bad behavior can be matched to precisely what kind of activity
   happened in the kernel (and/or in the app) around that moment in
   time.

 - Coupling with other hardware and software events: the PMU can track a
   number of other anomalies - monitoring software might chose to
   monitor those plus the MCE events as well - in one coherent stream of
   events.

 - Discovery of MCE sources - tracepoints are enumerated and tools can
   act upon the existence (or non-existence) of various channels of MCE
   information.

 - Filtering support: we just subscribe to and act upon the events we
   are interested in. Then even on a per event source basis there's
   in-kernel filter expressions available that can restrict the amount
   of data that hits the event channel.

 - Arbitrary deep per cpu buffering of events - we can buffer 32
   entries or we can buffer as much as we want, as long as we have
   the RAM.

 - An NMI-safe ring-buffer implementation - mappable to user-space.

 - Built-in support for timestamping of events, PID markers, CPU
   markers, etc.

 - A rich ABI accessible over system call interface. Per cpu, per task
   and per workload monitoring of MCE events can be done this way. The
   ABI itself has a nice, meaningful structure.

 - Extensible ABI: new fields can be added without breaking tooling.
   New tracepoints can be added as the hardware side evolves. There's
   various parsers that can be used.

 - Lots of scheduling/buffering/batching modes of operandi for MCE
   events. poll() support. mmap() support. read() support. You name it.

 - Rich tooling support: even without any MCE specific extensions added
   the 'perf' tool today offers various views of MCE data: perf report,
   perf stat, perf trace can all be used to view logged MCE events and
   perhaps correlate them to certain user-space usage patterns. But it
   can be used directly as well, for user-space agents and policy action
   in mcelog, etc.

With this we hope to achieve significant code cleanup and feature
improvements in the MCE code, and we hope to be able to drop the
/dev/mcelog facility in the end.

This patch is just a plain dumb dump of mce_log() records to
the tracepoints / perf events framework - a first proof of
concept step.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4AD42A0D.7050104@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:43:38 +02:00
Ingo Molnar
9dbdd6c41c Merge commit 'v2.6.32-rc4' into perf/core
Merge reason: we were on an -rc1 base, merge up to -rc4.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:31:34 +02:00
Paul Mundt
c8afde7f40 sh: Don't profile return_address().
This adds return_address.c to the -pg exclusion list, as this is the
building block for CALLER_ADDRx we do not want to profile this.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 16:31:08 +09:00
Ingo Molnar
2c96c142e9 Merge branch 'tracing/urgent' into tracing/core
Merge reason: Pick up tracing/filters fix from the urgent queue,
              we will queue up dependent patches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:24:59 +02:00
Jeremy Fitzhardinge
71999d9862 x86/paravirt: Use normal calling sequences for irq enable/disable
Bastian Blank reported a boot crash with stackprotector enabled,
and debugged it back to edx register corruption.

For historical reasons irq enable/disable/save/restore had special
calling sequences to make them more efficient.  With the more
recent introduction of higher-level and more general optimisations
this is no longer necessary so we can just use the normal PVOP_
macros.

This fixes some residual bugs in the old implementations which left
edx liable to inadvertent clobbering. Also, fix some bugs in
__PVOP_VCALLEESAVE which were revealed by actual use.

Reported-by: Bastian Blank <bastian@waldi.eu.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4AD3BC9B.7040501@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:22:01 +02:00
Arnaldo Carvalho de Melo
a2e2725541 net: Introduce recvmmsg socket syscall
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
  sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
  works in the same fashion as the ppoll one.

  If the underlying protocol returns a datagram with MSG_OOB set, this
  will make recvmmsg return right away with as many datagrams (+ the OOB
  one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
  datagrams and then recvmsg returns an error, recvmmsg will return
  the successfully received datagrams, store the error and return it
  in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 23:40:10 -07:00
Ingo Molnar
7a693d3f0d perf_events, x86: Fix event constraints code
There was namespace overlap due to a rename i did - this caused
the following build warning, reported by Stephen Rothwell against
linux-next x86_64 allmodconfig:

  arch/x86/kernel/cpu/perf_event.c: In function 'intel_get_event_idx':
  arch/x86/kernel/cpu/perf_event.c:1445: warning: 'event_constraint' is used uninitialized in this function

This is a real bug not just a warning: fix it by renaming the
global event-constraints table pointer to 'event_constraints'.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091013144223.369d616d.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 08:19:53 +02:00
Paul Mundt
5a3abba77d sh: Tidy up the dwarf module helpers.
This enables us to build the dwarf unwinder both with modules enabled and
disabled in addition to reducing code size in the latter case. The
helpers are also consolidated, and modified to resemble the BUG module
helpers.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 13:32:19 +09:00
Paul Mundt
ac4fac8cb2 sh: Generalize CALLER_ADDRx support.
This splits out the unwinder implementation and adds a new
return_address() abstraction modelled after the ARM code. The DWARF
unwinder is tied in to this, returning NULL otherwise in the case of
being unable to support arbitrary depths.

This enables us to get correct behaviour with the unwinder enabled,
as well as disabling the arbitrary depth support when frame pointers are
enabled, as arbitrary depths with __builtin_return_address() are not
supported regardless.

With this abstraction it's also possible to layer on a simplified
implementation with frame pointers in the event that the unwinder isn't
enabled, although this is left as a future exercise.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 13:10:14 +09:00
Paul Mundt
5852b203ef Merge branch 'sh/stable-updates' 2009-10-13 12:45:08 +09:00
Paul Mundt
9922262242 sh: ftrace: Fix up syscall tracepoint support.
Sync up with latest core changes in the syscalls tracing area:

- tracing: Map syscall name to number (syscall_name_to_nr())
- tracing: Call arch_init_ftrace_syscalls at boot
- tracing: add support tracepoint ids (set_syscall_{enter,exit}_id())

Taken from the s390 change.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 12:42:48 +09:00
Paul Mundt
95019b48ad Merge branch 'sh/stable-updates' 2009-10-13 11:27:08 +09:00
Paul Mundt
964f7e5a56 sh: force dcache flush if dcache_dirty bit set.
This too follows the ARM change, given that the issue at hand applies to
all platforms that implement lazy D-cache writeback.

This fixes up the case when a page mapping disappears between the
flush_dcache_page() call (when PG_dcache_dirty is set for the page) and
the update_mmu_cache() call -- such as in the case of swap cache being
freed early. This kills off the mapping test in update_mmu_cache() and
switches to simply testing for PG_dcache_dirty.

Reported-by: Nitin Gupta <ngupta@vflare.org>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 11:18:34 +09:00
Paul Mundt
af67c3a9e6 sh: update die() output.
This follows the ARM change, as SH had all of the same issues:

Make die() better match x86:
- add printing of the last accessed sysfs file
- ensure console_verbose() is called under the lock
- ensure we panic outside of oops_exit()

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 10:57:52 +09:00
Paul Mundt
7a0064d672 Merge branch 'sh/ftrace' of git://github.com/mfleming/linux-2.6 2009-10-13 10:31:50 +09:00
H. Peter Anvin
98272ed0d2 x86: use kernel_stack_pointer() in kprobes.c
The way to obtain a kernel-mode stack pointer from a struct pt_regs in
32-bit mode is "subtle": the stack doesn't actually contain the stack
pointer, but rather the location where it would have been marks the
actual previous stack frame.  For clarity, use kernel_stack_pointer()
instead of coding this weirdness explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
2009-10-12 14:19:35 -07:00
H. Peter Anvin
5ca6c0ca5d x86: use kernel_stack_pointer() in kgdb.c
The way to obtain a kernel-mode stack pointer from a struct
pt_regs in 32-bit mode is "subtle": the stack doesn't actually
contain the stack pointer, but rather the location where it would
have been marks the actual previous stack frame.  For clarity, use
kernel_stack_pointer() instead of coding this weirdness
explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
2009-10-12 14:19:35 -07:00
H. Peter Anvin
a343c75d33 x86: use kernel_stack_pointer() in dumpstack.c
The way to obtain a kernel-mode stack pointer from a struct pt_regs in
32-bit mode is "subtle": the stack doesn't actually contain the stack
pointer, but rather the location where it would have been marks the
actual previous stack frame.  For clarity, use kernel_stack_pointer()
instead of coding this weirdness explicitly.

Furthermore, user_mode() is only valid when the process is known to
not run in V86 mode.  Use the safer user_mode_vm() instead.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-12 14:19:34 -07:00
H. Peter Anvin
def3c5d0a3 x86: use kernel_stack_pointer() in process_32.c
The way to obtain a kernel-mode stack pointer from a struct pt_regs in
32-bit mode is "subtle": the stack doesn't actually contain the stack
pointer, but rather the location where it would have been marks the
actual previous stack frame.  For clarity, use kernel_stack_pointer()
instead of coding this weirdness explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-12 14:19:34 -07:00
David Rientjes
adc1938994 x86: Interleave emulated nodes over physical nodes
Add interleaved NUMA emulation support

This patch interleaves emulated nodes over the system's physical
nodes. This is required for interleave optimizations since
mempolicies, for example, operate by iterating over a nodemask and
act without knowledge of node distances.  It can also be used for
testing memory latencies and NUMA bugs in the kernel.

There're a couple of ways to do this:

 - divide the number of emulated nodes by the number of physical
   nodes and allocate the result on each physical node, or

 - allocate each successive emulated node on a different physical
   node until all memory is exhausted.

The disadvantage of the first option is, depending on the asymmetry
in node capacities of each physical node, emulated nodes may
substantially differ in size on a particular physical node compared
to another.

The disadvantage of the second option is, also depending on the
asymmetry in node capacities of each physical node, there may be
more emulated nodes allocated on a single physical node as another.

This patch implements the second option; we sacrifice the
possibility that we may have slightly more emulated nodes on a
particular physical node compared to another in lieu of node size
asymmetry.

 [ Note that "node capacity" of a physical node is not only a
   function of its addressable range, but also is affected by
   subtracting out the amount of reserved memory over that range.
   NUMA emulation only deals with available, non-reserved memory
   quantities. ]

We ensure there is at least a minimal amount of available memory
allocated to each node.  We also make sure that at least this
amount of available memory is available in ZONE_DMA32 for any node
that includes both ZONE_DMA32 and ZONE_NORMAL.

This patch also cleans the emulation code up by no longer passing
the statically allocated struct bootnode array among the various
functions. This init.data array is not allocated on the stack since
it may be very large and thus it may be accessed at file scope.

The WARN_ON() for nodes_cover_memory() when faking proximity
domains is removed since it relies on successive nodes always
having greater start addresses than previous nodes; with
interleaving this is no longer always true.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:46 +02:00
David Rientjes
8716273cae x86: Export srat physical topology
This is the counterpart to "x86: export k8 physical topology" for
SRAT. It is not as invasive because the acpi code already seperates
node setup into detection and registration steps, with the
exception of registering e820 active regions in
acpi_numa_memory_affinity_init().  This is now moved to
acpi_scan_nodes() if NUMA emulation is disabled or deferred.

acpi_numa_init() now returns a value which specifies whether an
underlying SRAT was located.  If so, that topology can be used by
the emulation code to interleave emulated nodes over physical nodes
or to register the nodes for ACPI.

acpi_get_nodes() may now be used to export the srat physical
topology of the machine for NUMA emulation.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:46 +02:00
David Rientjes
8ee2debce3 x86: Export k8 physical topology
To eventually interleave emulated nodes over physical nodes, we
need to know the physical topology of the machine without actually
registering it.  This does the k8 node setup in two parts:
detection and registration.  NUMA emulation can then used the
physical topology detected to setup the address ranges of emulated
nodes accordingly.  If emulation isn't used, the k8 nodes are
registered as normal.

Two formals are added to the x86 NUMA setup functions: `acpi' and
`k8'. These represent whether ACPI or K8 NUMA has been detected;
both cannot be true at the same time.  This specifies to the NUMA
emulation code whether an underlying physical NUMA topology exists
and which interface to use.

This patch deals solely with separating the k8 setup path into
Northbridge detection and registration steps and leaves the ACPI
changes for a subsequent patch.  The `acpi' formal is added here,
however, to avoid touching all the header files again in the next
patch.

This approach also ensures emulated nodes will not span physical
nodes so the true memory latency is not misrepresented.

k8_get_nodes() may now be used to export the k8 physical topology
of the machine for NUMA emulation.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:45 +02:00
David Rientjes
1af5ba514f x86: Clean up and add missing log levels for k8
Convert all printk's in arch/x86/mm/k8topology_64.c to use
pr_info() or pr_err() appropriately.

Adds log levels for messages currently lacking them.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251517440.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:45 +02:00
Arjan van de Ven
ad8f4356af x86: Don't use the strict copy checks when branch profiling is in use
The branch profiling creates very complex code for each if
statement, to the point that gcc has trouble even analyzing
something as simple as

  if (count > 5)
      count = 5;

This then means that causing an error on code that gcc cannot
analyze for copy_from_user() and co is not very productive.

This patch excludes the strict copy checks in the case of branch
profiling being enabled.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091006070452.5e1fc119@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:29:51 +02:00
Neil Horman
3b885787ea net: Generalize socket rx gap / receive queue overflow cmsg
Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames.  This value was
exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option.  As such I've created this patch, It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
overflowed between any two given frames.  It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count).  Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if thats
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me.  This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d (my af packet cmsg patch) is reverted

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 13:26:31 -07:00
H. Peter Anvin
d1705c558c x86: fix kernel panic on 32 bits when profiling
Latest kernel has a kernel panic in booting on i386 machine when
profile=2 setting in cmdline.  It is due to 'sp' being incorrect in
profile_pc().

BUG: unable to handle kernel NULL pointer dereference at 00000246
IP: [<c01288b6>] profile_pc+0x2a/0x48
*pde = 00000000
Oops: 0000 [#1] SMP

This differs from the original version by Alex Shi in that we use the
kernel_stack_pointer() inline already defined in <asm/ptrace.h> for
this purpose, instead of #ifdef.

Originally-by: Alex Shi <alex.shi@intel.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-12 11:53:51 -07:00
Nitin Gupta
787b2faadc ARM: force dcache flush if dcache_dirty bit set
On ARM, update_mmu_cache() does dcache flush for a page only if
it has a kernel mapping (page_mapping(page) != NULL). The correct
behavior would be to force the flush based on dcache_dirty bit only.

One of the cases where present logic would be a problem is when
a RAM based block device[1] is used as a swap disk. In this case,
we would have in-memory data corruption as shown in steps below:

do_swap_page()
{
    - Allocate a new page (if not already in swap cache)
    - Issue read from swap disk
        - Block driver issues flush_dcache_page()
        - flush_dcache_page() simply sets PG_dcache_dirty bit and does not
          actually issue a flush since this page has no user space mapping yet.
    - Now, if swap disk is almost full, this newly read page is removed
      from swap cache and corrsponding swap slot is freed.
    - Map this page anonymously in user space.
    - update_mmu_cache()
        - Since this page does not have kernel mapping (its not in page/swap
          cache and is mapped anonymously), it does not issue dcache flush
          even if dcache_dirty bit is set by flush_dcache_page() above.

    <user now gets stale data since dcache was never flushed>
}

Same problem exists on mips too.

[1] example:
 - brd (RAM based block device)
 - ramzswap (RAM based compressed swap device)

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-10-12 17:52:26 +01:00
Brian Gerst
ae24ffe5ec x86, 64-bit: Move K8 B step iret fixup to fault entry asm
Move the handling of truncated %rip from an iret fault to the fault
entry path.

This allows x86-64 to use the standard search_extable() function.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <1255357103-5418-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 18:29:46 +02:00
Jan Beulich
7a4b7e5e74 x86: Fix Suspend to RAM freeze on Acer Aspire 1511Lmi laptop
Move the trampoline and accessors back out of .cpuinit.* for the
case of 64-bits+ACPI_SLEEP.

This solves s2ram hangs reported in:

  http://bugzilla.kernel.org/show_bug.cgi?id=14279

Reported-and-bisected-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: <bugzilla-daemon@bugzilla.kernel.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 18:06:48 +02:00
David Woodhouse
9a821b2316 x86: Move pci_iommu_init to rootfs_initcall()
We want this to happen after the PCI quirks, which are now running at
the very end of the fs_initcalls.

This works around the BIOS problems which were originally addressed by
commit db8be50c43 ('USB: Work around BIOS
bugs by quiescing USB controllers earlier'), which was reverted in
commit d93a8f829f.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-10-12 14:42:11 +01:00
Russell King
edc72786d2 Merge branch 'fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ycmiao/pxa-linux-2.6 2009-10-12 14:38:08 +01:00
Arjan van de Ven
c03cb3149d x86: Relegate CONFIG_PAT and CONFIG_MTRR configurability to EMBEDDED
MTRR and PAT support (which got added to CPUs over 10 years ago)
are no longer really optional in that more and more things are
depending on PAT just working, including various drivers and newer
versions of X.  (to not even speak of MTRR)

Having this as a regular config option just no longer makes sense.

This patch relegates CONFIG_X86_PAT to the EMBEDDED category so
ultra-embedded can still disable it if they really need to.

Also-Suggested-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
LKML-Reference: <20091011103302.62bded41@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 13:06:57 +02:00
Christoph Lameter
494f6a9e12 this_cpu: Use this_cpu_xx in nmi handling
this_cpu_inc/dec reduces the number of instructions needed.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-10-12 19:51:49 +09:00
Borislav Petkov
fb2531953f mce, edac: Use an atomic notifier for MCEs decoding
Add an atomic notifier which ensures proper locking when conveying
MCE info to EDAC for decoding. The actual notifier call overrides a
default, negative priority notifier.

Note: make sure we register the default decoder only once since
mcheck_init() runs on each CPU.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091003065752.GA8935@liondog.tnic>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 12:24:45 +02:00
David S. Miller
1a78cedb99 sparc64: Fix D-cache flushing on swapin from SW devices.
Thanks to tip form ARM folks and Russell King.

If flush_dcache_page() occurs on a swapin it will have a mapping
and we'll try to defer the flush by setting the dirty bit.

But when it hits update_dcache_page() we won't flush because the
page won't have a mapping any more.  So remove the mapping
requirement in flush_dcache().

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 03:20:57 -07:00
Dennis O'Brien
4367216a09 [ARM] pxa: workaround errata #37 by not using half turbo switching
PXA27x Errata #37 implies system will hang when switching into or out of
half turbo (HT bit in CLKCFG) mode, workaround this by not using it.

Signed-off-by: Dennis O'Brien <dennis.obrien@eqware.net>
Cc: stable-2.6.31 <stable@kernel.org>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2009-10-12 15:30:50 +08:00
Julia Lawall
c639ef4317 [ARM] pxa/csb726: adjust duplicate structure field initialization
Currently the irq_type field of the csb726_lan_config structure is
initialized twice.  The value in the first case,
SMSC911X_IRQ_POLARITY_ACTIVE_LOW, is normally stored in the irq_polarity
field, so I have renamed the field in the first initialization to that.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2009-10-12 15:30:50 +08:00
Joe Perches
3c355863fb testmmiotrace.c: Add and use pr_fmt(fmt)
- Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt.
- Strip MODULE_NAME from pr_<level>s.
- Remove MODULE_NAME definition.

Signed-off-by: Joe Perches <joe@perches.com>
LKML-Reference: <3bb66cc7f85f77b9416902e1be7076f7e3f4ad48.1254701151.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 08:05:41 +02:00
Joe Perches
3bb258bf43 ftrace.c: Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
- Remove prefixes from pr_<level>, use pr_fmt(fmt).

No change in output.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <9b377eefae9e28c599dd4a17bdc81172965e9931.1254701151.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 08:05:40 +02:00
Paul Mundt
8ec006c587 Merge branch 'sh/dwarf-unwinder'
Conflicts:
	arch/sh/kernel/dwarf.c
2009-10-12 08:50:07 +09:00
Paul Mundt
5ab78ff693 Merge branch 'sh/dwarf-unwinder' of git://github.com/mfleming/linux-2.6 into sh/dwarf-unwinder 2009-10-12 08:42:46 +09:00
Yinghai Lu
15b812f1d0 pci: increase alignment to make more space for hidden code
As reported in

	http://bugzilla.kernel.org/show_bug.cgi?id=13940

on some system when acpi are enabled, acpi clears some BAR for some
devices without reason, and kernel will need to allocate devices for
them.  It then apparently hits some undocumented resource conflict,
resulting in non-working devices.

Try to increase alignment to get more safe range for unassigned devices.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-11 14:43:36 -07:00
Linus Torvalds
f144c78e52 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (21 commits)
  [S390] dasd: fix race condition in resume code
  [S390] Add EX_TABLE for addressing exception in usercopy functions.
  [S390] 64-bit register support for 31-bit processes
  [S390] hibernate: Use correct place for CPU address in lowcore
  [S390] pm: ignore time spend in suspended state
  [S390] zcrypt: Improve some comments
  [S390] zcrypt: Fix sparse warning.
  [S390] perf_counter: fix vdso detection
  [S390] ftrace: drop nmi protection
  [S390] compat: fix truncate system call wrapper
  [S390] Provide arch specific mdelay implementation.
  [S390] Fix enabled udelay for short delays.
  [S390] cio: allow setting boxed devices offline
  [S390] cio: make not operational handling consistent
  [S390] cio: make disconnected handling consistent
  [S390] Fix memory leak in /proc/cio_ignore
  [S390] cio: channel path memory leak
  [S390] module: fix memory leak in s390 module loader
  [S390] Enable kmemleak on s390.
  [S390] 3270 console build fix
  ...
2009-10-11 11:34:50 -07:00
Alexey Dobriyan
d43c36dc6b headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-10-11 11:20:58 -07:00
Matt Fleming
d26cddbbd2 sh: tracing: Use the DWARF unwinder for CALLER_ADDRx
The major reason for implementing the DWARF unwinder in the first place
was so that we could stop using __builtin_return_address(n), which
doesn't work on SH for n > 0.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
2009-10-11 17:56:17 +01:00
Matt Fleming
c2d474d6f8 sh: Remove any reference to recursive functions from comments
Originally, dwarf_unwind_stack() was a recursive function and it seems
that some of the old comments were never updated.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
2009-10-11 17:12:32 +01:00
Matt Fleming
ed4fe7f488 sh: Fix memory leak in dwarf_unwind_stack()
If we broke out of the while (1) loop because the return address of
"frame" was zero, then "frame" needs to be free'd before we return.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
2009-10-11 17:12:28 +01:00
Matt Fleming
a6a2f2ad67 sh: Teach the DWARF unwinder about modules
Pass a module's .eh_frame section to the DWARF unwinder at module load
time so that the section's FDEs and CIEs can be registered with the
DWARF unwinder. This allows us to unwind the stack through module code
when generating backtraces.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
2009-10-11 16:41:44 +01:00
Russell King
6a5e293f1b ARM: Add kmap_atomic type debugging
Seemingly this support was missed when highmem was added, so
DEBUG_HIGHMEM wouldn't have checked the kmap_atomic type.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-10-11 16:29:48 +01:00