Create the dev_set_name function now so that various subsystems can
start changing over to it before other changes in 2.6.27 will make it
compulsory.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
improve the sysbench ramp-up phase and its peak throughput on
a 16way NUMA box, by turning on WAKE_AFFINE:
tip/sched tip/sched+wake-affine
-------------------------------------------------
1: 700 830 +15.65%
2: 1465 1391 -5.28%
4: 3017 3105 +2.81%
8: 5100 6021 +15.30%
16: 10725 10745 +0.19%
32: 10135 10150 +0.16%
64: 9338 9240 -1.06%
128: 8599 8252 -4.21%
256: 8475 8144 -4.07%
-------------------------------------------------
SUM: 57558 57882 +0.56%
this change also improves lat_ctx from 6.69 usecs to 1.11 usec:
$ ./lat_ctx -s 0 2
"size=0k ovr=1.19
2 1.11
$ ./lat_ctx -s 0 2
"size=0k ovr=1.22
2 6.69
in sysbench it's an overall win with some weakness at the lots-of-clients
side. That happens because we now under-balance this workload
a bit. To counter that effect, turn on NEWIDLE:
wake-idle wake-idle+newidle
-------------------------------------------------
1: 830 834 +0.43%
2: 1391 1401 +0.65%
4: 3105 3091 -0.43%
8: 6021 6046 +0.42%
16: 10745 10736 -0.08%
32: 10150 10206 +0.55%
64: 9240 9533 +3.08%
128: 8252 8355 +1.24%
256: 8144 8384 +2.87%
-------------------------------------------------
SUM: 57882 58591 +1.21%
as a bonus this not only improves the many-clients case but
also improves the (more important) rampup phase.
sysbench is a workload that quickly breaks down if the
scheduler over-balances, so since it showed an improvement
under NEWIDLE this change is definitely good.
Previously, DCR support was configured at compile time to either use
MMIO or native dcr instructions. Although this works for most
platforms, it fails on FPGA platforms:
1) Systems may include more than one DCR bus.
2) Systems may be native DCR capable and still use memory mapped DCR interface.
This patch provides runtime support based on the device trees for the
case where CONFIG_PPC_DCR_MMIO and CONFIG_PPC_DCR_NATIVE are both
selected. Previously, this was a poorly defined configuration, which
happened to provide NATIVE support. The runtime selection is made
based on the dcr-controller having a 'dcr-access-method' attribute
in the device tree. If only one of the above options is selected,
then the code uses #defines to select only the used code in order to
avoid introducing overhead in existing usage.
Signed-off-by: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
I tried to group recovery related fields nearby (non-CA_Open related
variables, to be more accurate) so that one to three cachelines would
not be necessary in CA_Open. These are now contiguously deployed:
struct sk_buff_head out_of_order_queue; /* 1968 80 */
/* --- cacheline 32 boundary (2048 bytes) --- */
struct tcp_sack_block duplicate_sack[1]; /* 2048 8 */
struct tcp_sack_block selective_acks[4]; /* 2056 32 */
struct tcp_sack_block recv_sack_cache[4]; /* 2088 32 */
/* --- cacheline 33 boundary (2112 bytes) was 8 bytes ago --- */
struct sk_buff * highest_sack; /* 2120 8 */
int lost_cnt_hint; /* 2128 4 */
int retransmit_cnt_hint; /* 2132 4 */
u32 lost_retrans_low; /* 2136 4 */
u8 reordering; /* 2140 1 */
u8 keepalive_probes; /* 2141 1 */
/* XXX 2 bytes hole, try to pack */
u32 prior_ssthresh; /* 2144 4 */
u32 high_seq; /* 2148 4 */
u32 retrans_stamp; /* 2152 4 */
u32 undo_marker; /* 2156 4 */
int undo_retrans; /* 2160 4 */
u32 total_retrans; /* 2164 4 */
...and they're then followed by URG slowpath & keepalive related
variables.
Head of the out_of_order_queue always needed for empty checks, if
that's empty (and TCP is in CA_Open), following ~200 bytes (in 64-bit)
shouldn't be necessary for anything. If only OFO queue exists but TCP
is in CA_Open, selective_acks (and possibly duplicate_sack) are
necessary besides the out_of_order_queue but the rest of the block
again shouldn't be (ie., the other direction had losses).
As the cacheline boundaries depend on many factors in the preceeding
stuff, trying to align considering them doesn't make too much sense.
Commented one ordering hazard.
There are number of low utilized u8/16s that could be combined get 2
bytes less in total so that the hole could be made to vanish (includes
at least ecn_flags, urg_data, urg_mode, frto_counter, nonagle).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yanmin Zhang reported:
Comparing with 2.6.25, volanoMark has big regression with kernel 2.6.26-rc1.
It's about 50% on my 8-core stoakley, 16-core tigerton, and Itanium Montecito.
With bisect, I located the following patch:
| 18d95a2832 is first bad commit
| commit 18d95a2832
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Sat Apr 19 19:45:00 2008 +0200
|
| sched: fair-group: SMP-nice for group scheduling
Revert it so that we get v2.6.25 behavior.
Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-tip testing found the following build breakage:
drivers/built-in.o: In function `xen_suspend':
manage.c:(.text+0x4390f): undefined reference to `xen_console_resume'
with this config:
http://redhat.com/~mingo/misc/config-Thu_May_29_09_23_16_CEST_2008.bad
i have bisected it down to:
| commit 0e91398f2a
| Author: Jeremy Fitzhardinge <jeremy@goop.org>
| Date: Mon May 26 23:31:27 2008 +0100
|
| xen: implement save/restore
the problem is that drivers/xen/manage.c is built unconditionally if
CONFIG_XEN is enabled and makes use of xen_suspend(), but
drivers/char/hvc_xen.c, where the xen_suspend() method is implemented,
is only build if CONFIG_HVC_XEN=y as well.
i have solved this by providing a NOP implementation for xen_suspend()
in the !CONFIG_HVC_XEN case.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The SOC_DOUBLE_S8_TLV control type was originally implemented in the
UDA1380 driver by Philipp Zabel and was moved into the core by me.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The commit
commit 4b82b27770
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Sat May 24 19:36:35 2008 +0400
set nmi_watchdog to NMI_IO_APIC as by default. This causes hangs on some
machines with buggy watchdogs. Fix it - i.e. restore old behaviour.
Thanks to Sitsofe Wheeler and Adrian Bunk for catching the problem
and Maciej W. Rozycki for explanation what is going on there.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
CC: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
> +#define ARCH_KMALLOC_MINALIGN (sizeof(long) * 2)
> +#define ARCH_SLAB_MINALIGN (sizeof(long) * 2)
This doesn't work if SLAB is selected and slab debugging is enabled as
these are passed to the preprocessor, and the preprocessor doesn't
understand sizeof.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: fix RCU problem in cfq_cic_lookup()
block: make blktrace use per-cpu buffers for message notes
Added in elevator switch message to blktrace stream
Added in MESSAGE notes for blktraces
block: reorder cfq_queue to save space on 64bit builds
block: Move the second call to get_request to the end of the loop
splice: handle try_to_release_page() failure
splice: fix sendfile() issue with relay
Specify the minimum slab/kmalloc alignment to be 8 bytes. This fixes a
crash when SLOB is selected as the memory allocator. The FRV arch needs
this so that it can use the load- and store-double instructions without
faulting. By default SLOB sets the minimum to be 4 bytes.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently it uses a single static char array, but that risks
being corrupted when multiple users issue message notes at the
same time. Make the buffers dynamically allocated when the trace
is setup and make them per-cpu instead.
The default max message size of 1k is also very large, the
interface is mainly for small text notes. So shrink it to 128 bytes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Introduce pv_time_ops which adds hook to steal time accounting.
On virtualized environment, cpus are shared by many guests and
steal time is the time which is used for other guests.
On virtualized environtment, streal time should be accounted.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_irq_ops which adds hooks to paravirtualize irq related
operations.
On virtualized environment, interruption may be replaced by something
virtualization friendly. So the irq related operation also may need
paravirtualization.
This patch adds necessary hooks to paravirtualize irq related operations.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
add hooks to paravirtualize iosapic which is a real hardware resource.
On virtualized environment it may be replaced something virtualized
friendly.
Define pv_iosapic_ops and add the hooks.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
define pv_init_ops hooks which represents various initialization
hooks for paravirtualized environment. and add hooks.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make NR_IRQ overridable by each pv instances.
Pv instance may need each own number of irqs so that
NR_IRQS should be the maximum number of nr_irqs each
pv instances need.
Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize ia64_swtich_to, ia64_leave_syscall and ia64_leave_kernel.
They include sensitive or performance critical privileged instructions
so that they need paravirtualization.
To paravirtualize them by single source and multi compile
they are converted into indirect jump. And define each pv instances.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: "Dong, Eddie" <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize minstate.h which are hand written assembly code.
They include sensitive or performance critical privileged
instructions. So that they are appropriate for paravirtualization.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_cpu_ops to paravirtualize privleged instructions
which are defined by ia64 intrinsics.
make them indirect C function calls by introducing function
tables, pv_cpu_ops.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds a setup hook in the very early boot sequence
before start_kernel() to initialize paravirtualization stuff.
The hook will be set by each pv loader code or by using multi entry point.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_info which describes some randome info about
underlying execution environment.
Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
__local_irq_save() and local_save_flags() are used to mask interruptions.
They read all psr bits that requres whole bit emulation.
On the other hand, reading only psr.i, the single bit, can be virtualized
cheaply.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
make kernel paravirtualization friendly by introducing
ia64_set_rr0_to_rr4().
ia64/Xen will replace setting rr[0-4] with single hypercall later.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Move the LOAD_OFFSET definition from vmlinux.lds.S into system.h.
On paravirtualized environments, it is necessary to detect the
execution environment. One of the solutions is the multi entry point.
The multi entry point allows a boot loader to start the kernel execution
from the entry point which is different from the ELF entry point.
The non standard entry point will defined as the specialized elf note
which contains the LMA of the entry point symbol.
The constant, LOAD_OFFSET, is necessary to calculate the symbol's LMA.
Move the definition into the public header file to make it available
to the multi entry point support.
Cc: "He, Qing" <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
remove extern declaration of handle_IPI() in irq_ia64.c.
Instead, declare it in asm-ia64/smp.h.
Later handle_IPI() will be referenced from another file.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Problem: An application violating the architectural rules regarding
operation dependencies and having specific Register Stack Engine (RSE)
state at the time of the violation, may result in an illegal operation
fault and invalid RSE state. Such faults may initiate a cascade of
repeated illegal operation faults within OS interruption handlers.
The specific behavior is OS dependent.
Implication: An application causing an illegal operation fault with
specific RSE state may result in a series of illegal operation faults
and an eventual OS stack overflow condition.
Workaround: OS interruption handlers that switch to kernel backing
store implement a check for invalid RSE state to avoid the series
of illegal operation faults.
The core of the workaround is the RSE_WORKAROUND code sequence
inserted into each invocation of the SAVE_MIN_WITH_COVER and
SAVE_MIN_WITH_COVER_R19 macros. This sequence includes hard-coded
constants that depend on the number of stacked physical registers
being 96. The rest of this patch consists of code to disable this
workaround should this not be the case (with the presumption that
if a future Itanium processor increases the number of registers, it
would also remove the need for this patch).
Move the start of the RBS up to a mod32 boundary to avoid some
corner cases.
The dispatch_illegal_op_fault code outgrew the spot it was
squatting in when built with this patch and CONFIG_VIRT_CPU_ACCOUNTING=y
Move it out to the end of the ivt.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Replace CONFIG_SND_DEBUG_DETECT with CONFIG_SND_DEBUG_VERBOSE to
represent its meaning more better. This config isn't provided only
for the detection but for more verbose debug prints in general.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Hook into the device model to make sure that timekeeping's resume handler
is called. This deals with our clocksource's non-monotonicity over the
save/restore. Explicitly call clock_has_changed() to make sure that
all the timers get retriggered properly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch implements Xen save/restore and migration.
Saving is triggered via xenbus, which is polled in
drivers/xen/manage.c. When a suspend request comes in, the kernel
prepares itself for saving by:
1 - Freeze all processes. This is primarily to prevent any
partially-completed pagetable updates from confusing the suspend
process. If CONFIG_PREEMPT isn't defined, then this isn't necessary.
2 - Suspend xenbus and other devices
3 - Stop_machine, to make sure all the other vcpus are quiescent. The
Xen tools require the domain to run its save off vcpu0.
4 - Within the stop_machine state, it pins any unpinned pgds (under
construction or destruction), performs canonicalizes various other
pieces of state (mostly converting mfns to pfns), and finally
5 - Suspend the domain
Restore reverses the steps used to save the domain, ending when all
the frozen processes are thawed.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add code to:
1. Deal with the console page being canonicalized. During save, the
console's mfn in the start_info structure is canonicalized to a pfn.
In order to deal with that, we always use a copy of the pfn and
indirect off that all the time. However, we fall back to using the
mfn if the pfn hasn't been initialized yet.
2. Restore the console event channel, and rebind it to the existing irq.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add rebind_evtchn_irq(), which will rebind an device driver's existing
irq to a new event channel on restore. Since the new event channel
will be masked and bound to vcpu0, we update the state accordingly and
unmask the irq once everything is set up.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add a config option to set the max size of a Xen domain. This is used
to scale the size of the physical-to-machine array; it ends up using
around 1 page/GByte, so there's no reason to be very restrictive.
For a 32-bit guest, the default value of 8GB is probably sufficient;
there's not much point in giving a 32-bit machine much more memory
than that.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We now support the use of memory hotplug, so the physical to machine
page mapping structure must be dynamic. This is implemented as a
two-level radix tree structure, which allows us to efficiently
incrementally allocate memory for the p2m table as new pages are
added.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add xen handles realted definitions for xen memory which ia64/xen needs.
Pointer argumsnts for ia64/xen hypercall are passed in pseudo physical
address (guest physical address) so that it is required to convert
guest kernel virtual address into pseudo physical address.
The xen guest handle represents such arguments.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The pvfb backend indicates dynamic mode support by creating node
feature_resize with a non-zero value in its xenstore directory.
xen-fbfront sends a resize notification event on mode change. Fully
backwards compatible both ways.
Framebuffer size and initial resolution can be controlled through
kernel parameter xen_fbfront.video. The backend enforces a separate
size limit, which it advertises in node videoram in its xenstore
directory.
xen-kbdfront gets the maximum screen resolution from nodes width and
height in the backend's xenstore directory instead of hardcoding it.
Additional goodie: support for larger framebuffers (512M on a 64-bit
system with 4K pages).
Changing the number of bits per pixels dynamically is not supported,
yet.
Ported from
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/92f7b3144f41http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/bfc040135633
Signed-off-by: Pat Campbell <plc@novell.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Without console= arguments on the kernel command line, the first
console to register becomes enabled and the preferred console (the one
behind /dev/console). This is normally tty (assuming
CONFIG_VT_CONSOLE is enabled, which it commonly is).
This is okay as long tty is a useful console. But unless we have the
PV framebuffer, and it is enabled for this domain, tty0 in domU is
merely a dummy. In that case, we want the preferred console to be the
Xen console hvc0, and we want it without having to fiddle with the
kernel command line. Commit b8c2d3dfbc
did that for us.
Since we now have the PV framebuffer, we want to enable and prefer tty
again, but only when PVFB is enabled. But even then we still want to
enable the Xen console as well.
Problem: when tty registers, we can't yet know whether the PVFB is
enabled. By the time we can know (xenstore is up), the console setup
game is over.
Solution: enable console tty by default, but keep hvc as the preferred
console. Change the preferred console to tty when PVFB probes
successfully, unless we've been given console kernel parameters.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>