Commit Graph

47151 Commits

Author SHA1 Message Date
Michael Ellerman
aee7a283bb powerpc: Fix memory leak in axon_msi.c
cppcheck found a memory leak in axon_msi, if dcr_base or dcr_len are zero,
we have already allocated msic, so we should free it in the error path.

Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-14 16:58:36 +11:00
Benjamin Herrenschmidt
11a50873ef powerpc/pmac: Fix issues with sleep on some powerbooks
Since the change of how interrupts are disabled during suspend,
certain PowerBook models started exhibiting various issues during
suspend or resume from sleep.

I finally tracked it down to the code that runs various "platform"
functions (kind of little scripts extracted from the device-tree),
which uses our i2c and PMU drivers expecting interrutps to work,
and at a time where with the new scheme, they have been disabled.

This causes timeouts internally which for some reason results in
the PMU being unable to see the trackpad, among other issues, really
it depends on the machine. Most of the time, we fail to properly adjust
some clocks for suspend/resume so the results are not always
predictable.

This patch fixes it by using IRQF_TIMER for both the PMU and the I2C
interrupts. I prefer doing it this way than moving the call sites since
I really want those platform functions to still be called after all
drivers (and before sysdevs).

We also do a slight cleanup to via-pmu.c driver to make sure the
ADB autopoll mask is handled correctly when doing bus resets

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-14 16:58:35 +11:00
Paul Mundt
d780613acc sh: Only invalidate the I-cache range for secondary CPUs stack_start.
Secondary CPUs already take care of the D-cache bits through the common
cache initialization path, and the only thing that is necessary after
twiddling around with stack_start is ensuring that the I-cache changes
are visible (particularly since this tends to be the only part lacking
coherency).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-14 11:51:28 +09:00
Paul Mundt
36c8719926 sh: Provide CALLER_ADDRx definitions even when ftrace is disabled.
Despite being located in the ftrace header, the CALLER_ADDRx definitions
are used by generic code. As such, we have to provide it generically, and
given that there is no real dependence on ftrace in the first place, the
definitions can just be moved out.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-14 11:49:49 +09:00
Tony Luck
1502f08edc [IA64] SMT friendly version of spin_unlock_wait()
We can be kinder to SMT systems in spin_unlock_wait.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-10-13 14:28:31 -07:00
Steven Rostedt
be10ab1090 powerpc64/ftrace: use PACA to retrieve TOC in mod_return_to_handler
The mod_return_to_handler needs to switch to the kernel TOC before
jumping to a the kernel code. It currently does this by looking
at the kernel function data and retrieves the TOC that way.

Not only is this inefficient, it also breaks with a relocatable kernel.
The PACA contains the kernel TOC and we can easily retrieve it that
way.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-13 14:20:56 -07:00
Steven Rostedt
9135c3cc5a powerpc/ftrace: show real return addresses in modules
When the function graph tracer is enabled, it replaces the return address
with a hook back to the tracer. This makes back traces see the hook instead
of the actual return address.

The current code also shows the real address by checking if the return
address jumps to the return_to_handler. If it is, is also prints out
the saved real return address.

On powerpc64, some modules may return to mod_return_to_handler, which
is not checked. This patch will also show the real address if a return
is to mod_return_to_handler as well.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-13 14:20:55 -07:00
Marcin Slusarz
54f8dd3c99 [IA64] use printk_once() unaligned.c/io_common.c
Use printk_once() in a couple of places.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-10-13 12:48:25 -07:00
Matthew Wilcox
adcd740341 [IA64] Require SAL 3.2 in order to do extended config space ops
We had assumed that SAL firmware would return an error if it didn't
understand extended config space.  Unfortunately, the SAL on the SGI 750
doesn't do that, it panics the machine.  So, condition the extended PCI
config space accesses on SAL revision 3.2.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-10-13 10:44:42 -07:00
Roel Kluin
dec1798f81 [IA64] unsigned cannot be less than 0 in sn_hwperf_ioctl()
struct sn_hwperf_ioctl_args member arg (u64) cannot be less than 0.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-10-13 10:40:00 -07:00
Takao Indoh
29e4e025be [IA64] Restore registers in the stack on INIT
Registers are not saved anywhere when INIT comes during fsys mode and
we cannot know what happened when we investigate vmcore captured by
kdump. This patch adds new function finish_pt_regs() so registers can
be saved in such a case.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-10-13 10:37:14 -07:00
Linus Torvalds
80fa680d22 Merge git://git.infradead.org/~dwmw2/iommu-2.6.32
* git://git.infradead.org/~dwmw2/iommu-2.6.32:
  x86: Move pci_iommu_init to rootfs_initcall()
  Run pci_apply_final_quirks() sooner.
  Mark pci_apply_final_quirks() __init rather than __devinit
  Rename pci_init() to pci_apply_final_quirks(), move it to quirks.c
  intel-iommu: Yet another BIOS workaround: Isoch DMAR unit with no TLB space
  intel-iommu: Decode (and ignore) RHSA entries
  intel-iommu: Make "Unknown DMAR structure" message more informative
2009-10-13 10:04:40 -07:00
Ben Dooks
ed9d040d40 ASoC: S3C: Remove <plat/audio.h>
Remove the <plat/audio.h> include from arch/arm/plat-s3c/include/plat/audio.h
as it provides nothing to the current kernel and is not in any future plans
for the system.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Simtec Linux Team <linux@simtec.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2009-10-13 13:33:53 +01:00
Magnus Damm
748031f9fd net: allow sh_eth to get mac address through platform data
Extend the sh_eth driver to allow passing the mac address
using the platform data structure. This to simplify board
setup code.

Signed-off-by: Magnus Damm <damm@opensource.se>
Tested-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 03:44:05 -07:00
Valentin Longchamp
679bfef0e3 MXC: fix reset for mx31, mx35 and mx27 SoCs
The clock name for the watchdog devices was not set consistently
with mx21 on these platforms, resulting in the reset not to work.

Signed-off-by: Valentin Longchamp <valentin.longchamp@epfl.ch>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2009-10-13 10:24:10 +02:00
Guennadi Liakhovetski
d9e8b88478 fix pcm037_eet compilation with the new SPI driver
Fix pcm037_eet compilation with the new imx SPI driver by unifying
platform device names.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Valentin Longchamp <valentin.longchamp@epfl.ch>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2009-10-13 10:24:09 +02:00
Guennadi Liakhovetski
324c1aa3df fix compilation of i.MX31 platforms
mxc_iomux_v3_init() is defined in arch/arm/plat-mxc/iomux-v3.c, which is
not linked for i.MX31 and produces an undefined reference error. Fix this
by building the offending code only for i.MX35.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2009-10-13 10:24:08 +02:00
Sascha Hauer
a90c31a3b7 pcm970 mmc: Fix ro switch
We have to use mxc_gpio_mode() for the card detection pin
instead of mxc_gpio_setup_multiple_pins() because the latter
does a gpio_request() and thus a later gpio_request() fails.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2009-10-13 10:24:07 +02:00
Sascha Hauer
6153384161 pcm038: Add SPI/MC13783 support
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2009-10-13 10:24:02 +02:00
Paul Mundt
e4b053d96a sh: ftrace: Make code modification NMI safe.
This cribs the x86 implementation of ftrace_nmi_enter() and friends to
make ftrace_modify_code() NMI safe, particularly on SMP configurations.

For additional notes on the problems involved, see the comment below
ftrace_call_replace().

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 16:52:50 +09:00
David S. Miller
c58543c869 sparc64: Set IRQF_DISABLED on LDC channel IRQs.
With lots of virtual devices it's easy to generate a lot of
events and chew up the kernel IRQ stack.

Reported-by: hyl <heyongli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 00:49:09 -07:00
Hidetoshi Seto
8968f9d3dc perf_event, x86, mce: Use TRACE_EVENT() for MCE logging
This approach is the first baby step towards solving many of the
structural problems the x86 MCE logging code is having today:

 - It has a private ring-buffer implementation that has a number
   of limitations and has been historically fragile and buggy.

 - It is using a quirky /dev/mcelog ioctl driven ABI that is MCE
   specific. /dev/mcelog is not part of any larger logging
   framework and hence has remained on the fringes for many years.

 - The MCE logging code is still very unclean partly due to its ABI
   limitations. Fields are being reused for multiple purposes, and
   the whole message structure is limited and x86 specific to begin
   with.

All in one, the x86 tree would like to move away from this private
implementation of an event logging facility to a broader framework.

By using perf events we gain the following advantages:

 - Multiple user-space agents can access MCE events. We can have an
   mcelog daemon running but also a system-wide tracer capturing
   important events in flight-recorder mode.

 - Sampling support: the kernel and the user-space call-chain of MCE
   events can be stored and analyzed as well. This way actual patterns
   of bad behavior can be matched to precisely what kind of activity
   happened in the kernel (and/or in the app) around that moment in
   time.

 - Coupling with other hardware and software events: the PMU can track a
   number of other anomalies - monitoring software might chose to
   monitor those plus the MCE events as well - in one coherent stream of
   events.

 - Discovery of MCE sources - tracepoints are enumerated and tools can
   act upon the existence (or non-existence) of various channels of MCE
   information.

 - Filtering support: we just subscribe to and act upon the events we
   are interested in. Then even on a per event source basis there's
   in-kernel filter expressions available that can restrict the amount
   of data that hits the event channel.

 - Arbitrary deep per cpu buffering of events - we can buffer 32
   entries or we can buffer as much as we want, as long as we have
   the RAM.

 - An NMI-safe ring-buffer implementation - mappable to user-space.

 - Built-in support for timestamping of events, PID markers, CPU
   markers, etc.

 - A rich ABI accessible over system call interface. Per cpu, per task
   and per workload monitoring of MCE events can be done this way. The
   ABI itself has a nice, meaningful structure.

 - Extensible ABI: new fields can be added without breaking tooling.
   New tracepoints can be added as the hardware side evolves. There's
   various parsers that can be used.

 - Lots of scheduling/buffering/batching modes of operandi for MCE
   events. poll() support. mmap() support. read() support. You name it.

 - Rich tooling support: even without any MCE specific extensions added
   the 'perf' tool today offers various views of MCE data: perf report,
   perf stat, perf trace can all be used to view logged MCE events and
   perhaps correlate them to certain user-space usage patterns. But it
   can be used directly as well, for user-space agents and policy action
   in mcelog, etc.

With this we hope to achieve significant code cleanup and feature
improvements in the MCE code, and we hope to be able to drop the
/dev/mcelog facility in the end.

This patch is just a plain dumb dump of mce_log() records to
the tracepoints / perf events framework - a first proof of
concept step.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4AD42A0D.7050104@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:43:38 +02:00
Ingo Molnar
9dbdd6c41c Merge commit 'v2.6.32-rc4' into perf/core
Merge reason: we were on an -rc1 base, merge up to -rc4.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:31:34 +02:00
Paul Mundt
c8afde7f40 sh: Don't profile return_address().
This adds return_address.c to the -pg exclusion list, as this is the
building block for CALLER_ADDRx we do not want to profile this.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 16:31:08 +09:00
Ingo Molnar
2c96c142e9 Merge branch 'tracing/urgent' into tracing/core
Merge reason: Pick up tracing/filters fix from the urgent queue,
              we will queue up dependent patches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:24:59 +02:00
Jeremy Fitzhardinge
71999d9862 x86/paravirt: Use normal calling sequences for irq enable/disable
Bastian Blank reported a boot crash with stackprotector enabled,
and debugged it back to edx register corruption.

For historical reasons irq enable/disable/save/restore had special
calling sequences to make them more efficient.  With the more
recent introduction of higher-level and more general optimisations
this is no longer necessary so we can just use the normal PVOP_
macros.

This fixes some residual bugs in the old implementations which left
edx liable to inadvertent clobbering. Also, fix some bugs in
__PVOP_VCALLEESAVE which were revealed by actual use.

Reported-by: Bastian Blank <bastian@waldi.eu.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4AD3BC9B.7040501@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:22:01 +02:00
Arnaldo Carvalho de Melo
a2e2725541 net: Introduce recvmmsg socket syscall
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
  sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
  works in the same fashion as the ppoll one.

  If the underlying protocol returns a datagram with MSG_OOB set, this
  will make recvmmsg return right away with as many datagrams (+ the OOB
  one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
  datagrams and then recvmsg returns an error, recvmmsg will return
  the successfully received datagrams, store the error and return it
  in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 23:40:10 -07:00
Ingo Molnar
7a693d3f0d perf_events, x86: Fix event constraints code
There was namespace overlap due to a rename i did - this caused
the following build warning, reported by Stephen Rothwell against
linux-next x86_64 allmodconfig:

  arch/x86/kernel/cpu/perf_event.c: In function 'intel_get_event_idx':
  arch/x86/kernel/cpu/perf_event.c:1445: warning: 'event_constraint' is used uninitialized in this function

This is a real bug not just a warning: fix it by renaming the
global event-constraints table pointer to 'event_constraints'.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091013144223.369d616d.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 08:19:53 +02:00
Paul Mundt
5a3abba77d sh: Tidy up the dwarf module helpers.
This enables us to build the dwarf unwinder both with modules enabled and
disabled in addition to reducing code size in the latter case. The
helpers are also consolidated, and modified to resemble the BUG module
helpers.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 13:32:19 +09:00
Paul Mundt
ac4fac8cb2 sh: Generalize CALLER_ADDRx support.
This splits out the unwinder implementation and adds a new
return_address() abstraction modelled after the ARM code. The DWARF
unwinder is tied in to this, returning NULL otherwise in the case of
being unable to support arbitrary depths.

This enables us to get correct behaviour with the unwinder enabled,
as well as disabling the arbitrary depth support when frame pointers are
enabled, as arbitrary depths with __builtin_return_address() are not
supported regardless.

With this abstraction it's also possible to layer on a simplified
implementation with frame pointers in the event that the unwinder isn't
enabled, although this is left as a future exercise.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 13:10:14 +09:00
Paul Mundt
5852b203ef Merge branch 'sh/stable-updates' 2009-10-13 12:45:08 +09:00
Paul Mundt
9922262242 sh: ftrace: Fix up syscall tracepoint support.
Sync up with latest core changes in the syscalls tracing area:

- tracing: Map syscall name to number (syscall_name_to_nr())
- tracing: Call arch_init_ftrace_syscalls at boot
- tracing: add support tracepoint ids (set_syscall_{enter,exit}_id())

Taken from the s390 change.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 12:42:48 +09:00
Paul Mundt
95019b48ad Merge branch 'sh/stable-updates' 2009-10-13 11:27:08 +09:00
Paul Mundt
964f7e5a56 sh: force dcache flush if dcache_dirty bit set.
This too follows the ARM change, given that the issue at hand applies to
all platforms that implement lazy D-cache writeback.

This fixes up the case when a page mapping disappears between the
flush_dcache_page() call (when PG_dcache_dirty is set for the page) and
the update_mmu_cache() call -- such as in the case of swap cache being
freed early. This kills off the mapping test in update_mmu_cache() and
switches to simply testing for PG_dcache_dirty.

Reported-by: Nitin Gupta <ngupta@vflare.org>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 11:18:34 +09:00
Paul Mundt
af67c3a9e6 sh: update die() output.
This follows the ARM change, as SH had all of the same issues:

Make die() better match x86:
- add printing of the last accessed sysfs file
- ensure console_verbose() is called under the lock
- ensure we panic outside of oops_exit()

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-13 10:57:52 +09:00
Paul Mundt
7a0064d672 Merge branch 'sh/ftrace' of git://github.com/mfleming/linux-2.6 2009-10-13 10:31:50 +09:00
H. Peter Anvin
98272ed0d2 x86: use kernel_stack_pointer() in kprobes.c
The way to obtain a kernel-mode stack pointer from a struct pt_regs in
32-bit mode is "subtle": the stack doesn't actually contain the stack
pointer, but rather the location where it would have been marks the
actual previous stack frame.  For clarity, use kernel_stack_pointer()
instead of coding this weirdness explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
2009-10-12 14:19:35 -07:00
H. Peter Anvin
5ca6c0ca5d x86: use kernel_stack_pointer() in kgdb.c
The way to obtain a kernel-mode stack pointer from a struct
pt_regs in 32-bit mode is "subtle": the stack doesn't actually
contain the stack pointer, but rather the location where it would
have been marks the actual previous stack frame.  For clarity, use
kernel_stack_pointer() instead of coding this weirdness
explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
2009-10-12 14:19:35 -07:00
H. Peter Anvin
a343c75d33 x86: use kernel_stack_pointer() in dumpstack.c
The way to obtain a kernel-mode stack pointer from a struct pt_regs in
32-bit mode is "subtle": the stack doesn't actually contain the stack
pointer, but rather the location where it would have been marks the
actual previous stack frame.  For clarity, use kernel_stack_pointer()
instead of coding this weirdness explicitly.

Furthermore, user_mode() is only valid when the process is known to
not run in V86 mode.  Use the safer user_mode_vm() instead.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-12 14:19:34 -07:00
H. Peter Anvin
def3c5d0a3 x86: use kernel_stack_pointer() in process_32.c
The way to obtain a kernel-mode stack pointer from a struct pt_regs in
32-bit mode is "subtle": the stack doesn't actually contain the stack
pointer, but rather the location where it would have been marks the
actual previous stack frame.  For clarity, use kernel_stack_pointer()
instead of coding this weirdness explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-12 14:19:34 -07:00
David Rientjes
adc1938994 x86: Interleave emulated nodes over physical nodes
Add interleaved NUMA emulation support

This patch interleaves emulated nodes over the system's physical
nodes. This is required for interleave optimizations since
mempolicies, for example, operate by iterating over a nodemask and
act without knowledge of node distances.  It can also be used for
testing memory latencies and NUMA bugs in the kernel.

There're a couple of ways to do this:

 - divide the number of emulated nodes by the number of physical
   nodes and allocate the result on each physical node, or

 - allocate each successive emulated node on a different physical
   node until all memory is exhausted.

The disadvantage of the first option is, depending on the asymmetry
in node capacities of each physical node, emulated nodes may
substantially differ in size on a particular physical node compared
to another.

The disadvantage of the second option is, also depending on the
asymmetry in node capacities of each physical node, there may be
more emulated nodes allocated on a single physical node as another.

This patch implements the second option; we sacrifice the
possibility that we may have slightly more emulated nodes on a
particular physical node compared to another in lieu of node size
asymmetry.

 [ Note that "node capacity" of a physical node is not only a
   function of its addressable range, but also is affected by
   subtracting out the amount of reserved memory over that range.
   NUMA emulation only deals with available, non-reserved memory
   quantities. ]

We ensure there is at least a minimal amount of available memory
allocated to each node.  We also make sure that at least this
amount of available memory is available in ZONE_DMA32 for any node
that includes both ZONE_DMA32 and ZONE_NORMAL.

This patch also cleans the emulation code up by no longer passing
the statically allocated struct bootnode array among the various
functions. This init.data array is not allocated on the stack since
it may be very large and thus it may be accessed at file scope.

The WARN_ON() for nodes_cover_memory() when faking proximity
domains is removed since it relies on successive nodes always
having greater start addresses than previous nodes; with
interleaving this is no longer always true.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:46 +02:00
David Rientjes
8716273cae x86: Export srat physical topology
This is the counterpart to "x86: export k8 physical topology" for
SRAT. It is not as invasive because the acpi code already seperates
node setup into detection and registration steps, with the
exception of registering e820 active regions in
acpi_numa_memory_affinity_init().  This is now moved to
acpi_scan_nodes() if NUMA emulation is disabled or deferred.

acpi_numa_init() now returns a value which specifies whether an
underlying SRAT was located.  If so, that topology can be used by
the emulation code to interleave emulated nodes over physical nodes
or to register the nodes for ACPI.

acpi_get_nodes() may now be used to export the srat physical
topology of the machine for NUMA emulation.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:46 +02:00
David Rientjes
8ee2debce3 x86: Export k8 physical topology
To eventually interleave emulated nodes over physical nodes, we
need to know the physical topology of the machine without actually
registering it.  This does the k8 node setup in two parts:
detection and registration.  NUMA emulation can then used the
physical topology detected to setup the address ranges of emulated
nodes accordingly.  If emulation isn't used, the k8 nodes are
registered as normal.

Two formals are added to the x86 NUMA setup functions: `acpi' and
`k8'. These represent whether ACPI or K8 NUMA has been detected;
both cannot be true at the same time.  This specifies to the NUMA
emulation code whether an underlying physical NUMA topology exists
and which interface to use.

This patch deals solely with separating the k8 setup path into
Northbridge detection and registration steps and leaves the ACPI
changes for a subsequent patch.  The `acpi' formal is added here,
however, to avoid touching all the header files again in the next
patch.

This approach also ensures emulated nodes will not span physical
nodes so the true memory latency is not misrepresented.

k8_get_nodes() may now be used to export the k8 physical topology
of the machine for NUMA emulation.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:45 +02:00
David Rientjes
1af5ba514f x86: Clean up and add missing log levels for k8
Convert all printk's in arch/x86/mm/k8topology_64.c to use
pr_info() or pr_err() appropriately.

Adds log levels for messages currently lacking them.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251517440.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:45 +02:00
Arjan van de Ven
ad8f4356af x86: Don't use the strict copy checks when branch profiling is in use
The branch profiling creates very complex code for each if
statement, to the point that gcc has trouble even analyzing
something as simple as

  if (count > 5)
      count = 5;

This then means that causing an error on code that gcc cannot
analyze for copy_from_user() and co is not very productive.

This patch excludes the strict copy checks in the case of branch
profiling being enabled.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091006070452.5e1fc119@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:29:51 +02:00
Neil Horman
3b885787ea net: Generalize socket rx gap / receive queue overflow cmsg
Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames.  This value was
exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option.  As such I've created this patch, It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
overflowed between any two given frames.  It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count).  Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if thats
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me.  This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d (my af packet cmsg patch) is reverted

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 13:26:31 -07:00
H. Peter Anvin
d1705c558c x86: fix kernel panic on 32 bits when profiling
Latest kernel has a kernel panic in booting on i386 machine when
profile=2 setting in cmdline.  It is due to 'sp' being incorrect in
profile_pc().

BUG: unable to handle kernel NULL pointer dereference at 00000246
IP: [<c01288b6>] profile_pc+0x2a/0x48
*pde = 00000000
Oops: 0000 [#1] SMP

This differs from the original version by Alex Shi in that we use the
kernel_stack_pointer() inline already defined in <asm/ptrace.h> for
this purpose, instead of #ifdef.

Originally-by: Alex Shi <alex.shi@intel.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-12 11:53:51 -07:00
Nitin Gupta
787b2faadc ARM: force dcache flush if dcache_dirty bit set
On ARM, update_mmu_cache() does dcache flush for a page only if
it has a kernel mapping (page_mapping(page) != NULL). The correct
behavior would be to force the flush based on dcache_dirty bit only.

One of the cases where present logic would be a problem is when
a RAM based block device[1] is used as a swap disk. In this case,
we would have in-memory data corruption as shown in steps below:

do_swap_page()
{
    - Allocate a new page (if not already in swap cache)
    - Issue read from swap disk
        - Block driver issues flush_dcache_page()
        - flush_dcache_page() simply sets PG_dcache_dirty bit and does not
          actually issue a flush since this page has no user space mapping yet.
    - Now, if swap disk is almost full, this newly read page is removed
      from swap cache and corrsponding swap slot is freed.
    - Map this page anonymously in user space.
    - update_mmu_cache()
        - Since this page does not have kernel mapping (its not in page/swap
          cache and is mapped anonymously), it does not issue dcache flush
          even if dcache_dirty bit is set by flush_dcache_page() above.

    <user now gets stale data since dcache was never flushed>
}

Same problem exists on mips too.

[1] example:
 - brd (RAM based block device)
 - ramzswap (RAM based compressed swap device)

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-10-12 17:52:26 +01:00
Brian Gerst
ae24ffe5ec x86, 64-bit: Move K8 B step iret fixup to fault entry asm
Move the handling of truncated %rip from an iret fault to the fault
entry path.

This allows x86-64 to use the standard search_extable() function.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <1255357103-5418-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 18:29:46 +02:00
Jan Beulich
7a4b7e5e74 x86: Fix Suspend to RAM freeze on Acer Aspire 1511Lmi laptop
Move the trampoline and accessors back out of .cpuinit.* for the
case of 64-bits+ACPI_SLEEP.

This solves s2ram hangs reported in:

  http://bugzilla.kernel.org/show_bug.cgi?id=14279

Reported-and-bisected-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: <bugzilla-daemon@bugzilla.kernel.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 18:06:48 +02:00