Impact: New perf_counter features
This primarily adds a way for perf_counter users to enable and disable
counters and groups. Enabling or disabling a counter or group also
enables or disables all of the child counters that have been cloned
from it to monitor children of the task monitored by the top-level
counter. The userspace interface to enable/disable counters is via
ioctl on the counter file descriptor.
Along the way this extends the code that handles child counters to
handle child counter groups properly. A group with multiple counters
will be cloned to child tasks if and only if the group leader has the
hw_event.inherit bit set - if it is set the whole group is cloned as a
group in the child task.
In order to be able to enable or disable all child counters of a given
top-level counter, we need a way to find them all. Hence I have added
a child_list field to struct perf_counter, which is the head of the
list of children for a top-level counter, or the link in that list for
a child counter. That list is protected by the perf_counter.mutex
field.
This also adds a mutex to the perf_counter_context struct. Previously
the list of counters was protected just by the lock field in the
context, which meant that perf_counter_init_task had to take that lock
and then take whatever lock/mutex protects the top-level counter's
child_list. But the counter enable/disable functions need to take
that lock in order to traverse the list, then for each counter take
the lock in that counter's context in order to change the counter's
state safely, which will lead to a deadlock.
To solve this, we now have both a mutex and a spinlock in the context,
and taking either is sufficient to ensure the list of counters can't
change - you have to take both before changing the list. Now
perf_counter_init_task takes the mutex instead of the lock (which
incidentally means that inherit_counter can use GFP_KERNEL instead of
GFP_ATOMIC) and thus avoids the possible deadlock. Similarly the new
enable/disable functions can take the mutex while traversing the list
of child counters without incurring a possible deadlock when the
counter manipulation code locks the context for a child counter.
We also had an misfeature that the first counter added to a context
would possibly not go on until the next sched-in, because we were
using ctx->nr_active to detect if the context was running on a CPU.
But nr_active is the number of active counters, and if that was zero
(because the context didn't have any counters yet) it would look like
the context wasn't running on a cpu and so the retry code in
__perf_install_in_context wouldn't retry. So this adds an 'is_active'
field that is set when the context is on a CPU, even if it has no
counters. The is_active field is only used for task contexts, not for
per-cpu contexts.
If we enable a subsidiary counter in a group that is active on a CPU,
and the arch code can't enable the counter, then we have to pull the
whole group off the CPU. We do this with group_sched_out, which gets
moved up in the file so it comes before all its callers. This also
adds similar logic to __perf_install_in_context so that the "all on,
or none" invariant of groups is preserved when adding a new counter to
a group.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The newly added PERCPU_*() macros define and use __per_cpu_load but
VMLINUX_SYMBOL() was missing from usages causing build failures on
archs where linker visible symbol is different from C symbols
(e.g. blackfin).
Signed-off-by: Tejun Heo <tj@kernel.org>
Fix (delete) more mac80211 kernel-doc:
Warning(linux-2.6.28-git13//include/net/mac80211.h:375): Excess struct/union/enum/typedef member 'retry_count' description in 'ieee80211_tx_info'
Warning(linux-2.6.28-git13//net/mac80211/sta_info.h:308): Excess struct/union/enum/typedef member 'last_txrate' description in 'sta_info'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There is a problem in our handling of suspend-resume of PCI devices that
many of them have their standard config registers restored with
interrupts enabled and they are put into the full power state with
interrupts enabled as well. This may lead to the following scenario:
* an interrupt vector is shared between two or more devices
* one device is resumed earlier and generates an interrupt
* the interrupt handler of another device tries to handle it and
attempts to access the device the config space of which hasn't been
restored yet and/or which still is in a low power state
* the system crashes as a result
To prevent this from happening we should restore the standard
configuration registers of all devices with interrupts disabled and we
should put them into the D0 power state right after that.
Unfortunately, this cannot be done using the existing
pci_set_power_state(), because it can sleep. Also, to do it we have to
make sure that the config spaces of all devices were actually saved
during suspend.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We implement dqget() and dqput() that need neither dqonoff_mutex nor dqptr_sem.
Then move dqget() and dqput() calls so that they are not called from under
dqptr_sem. This is important because filesystem callbacks aren't called from
under dqptr_sem which used to cause *lots* of problems with lock ranking
(and with OCFS2 they became close to unsolvable).
The patch also removes two functions which were introduced solely because OCFS2
needed them to cope with the old locking scheme. As time showed, they were not
enough for OCFS2 anyway and it would be unnecessary work to adapt them to the
new locking scheme in which they aren't needed. As a result OCFS2 needs the
following patch to compile properly with quotas. Sorry to any bisecters which
hit this in advance.
Signed-off-by: Jan Kara <jack@suse.cz>
Otherwise it can be very confusing to find a "EXT3-fs: " failure in
the middle of EXT4-fs failures, and it makes it harder to track the
source of the failure.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Provide a shared interrupt debug facility under CONFIG_DEBUG_SHIRQ:
it uses the existing irqpoll facilities to iterate through all
registered interrupt handlers and call those which can handle shared
IRQ lines.
This can be handy for suspend/resume debugging: if we call this function
early during resume we can trigger crashes in those drivers which have
incorrect assumptions about when exactly their ISRs will be called
during suspend/resume.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
sata_fsl: Return non-zero on error in probe()
drivers/ata/pata_ali.c: s/isa_bridge/ali_isa_bridge/ to fix alpha build
libata: New driver for OCTEON SOC Compact Flash interface (v7).
libata: Add another column to the ata_timing table.
sata_via: Add VT8261 support
pata_atiixp: update port enabledness test handling
[libata] get-identity ioctl: Fix use of invalid memory pointer
The forthcoming OCTEON SOC Compact Flash driver needs an additional
timing value that was not available in the ata_timing table. I add a
new column for dmack_hold time. The values were obtained from the
Compact Flash specification Rev 4.1.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
for SAS drivers.
Caught by Ke Wei (and team?) at Marvell.
Also, move the ata_scsi_ioctl export to libata-scsi.c, as that seems to be the
general trend.
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
It is an optimization and a cleanup, and adds the following new
generic percpu methods:
percpu_read()
percpu_write()
percpu_add()
percpu_sub()
percpu_and()
percpu_or()
percpu_xor()
and implements support for them on x86. (other architectures will fall
back to a default implementation)
The advantage is that for example to read a local percpu variable,
instead of this sequence:
return __get_cpu_var(var);
ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx
ffffffff8102ca32: 81
ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax
ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax
We can get a single instruction by using the optimized variants:
return percpu_read(var);
ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax
I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.
tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
* added percpu_and() for completeness's sake
* made generic percpu ops atomic against preemption
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
Currently pdas and percpu areas are allocated separately. %gs points
to local pda and percpu area can be reached using pda->data_offset.
This patch folds pda into percpu area.
Due to strange gcc requirement, pda needs to be at the beginning of
the percpu area so that pda->stack_canary is at %gs:40. To achieve
this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
added and used to reserve pda sized chunk at the start of the percpu
area.
After this change, for boot cpu, %gs first points to pda in the
data.init area and later during setup_per_cpu_areas() gets updated to
point to the actual pda. This means that setup_per_cpu_areas() need
to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
already has modified it when control reaches setup_per_cpu_areas().
This patch also removes now unnecessary get_local_pda() and its call
sites.
A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
This patch makes percpu symbols zerobased on x86_64 SMP by adding
PERCPU_VADDR() to vmlinux.lds.h which helps setting explicit vaddr on
the percpu output section and using it in vmlinux_64.lds.S. A new
PHDR is added as existing ones cannot contain sections near address
zero. PERCPU_VADDR() also adds a new symbol __per_cpu_load which
always points to the vaddr of the loaded percpu data.init region.
The following adjustments have been made to accomodate the address
change.
* code to locate percpu gdt_page in head_64.S is updated to add the
load address to the gdt_page offset.
* __per_cpu_load is used in places where access to the init data area
is necessary.
* pda->data_offset is initialized soon after C code is entered as zero
value doesn't work anymore.
This patch is mostly taken from Mike Travis' "x86_64: Base percpu
variables at zero" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Decoupling allows:
* hung tasks check to happen at very low priority
* hung tasks check and softlockup to be enabled/disabled independently
at compile and/or run-time
* individual panic settings to be enabled disabled independently
at compile and/or run-time
* softlockup threshold to be reduced without increasing hung tasks
poll frequency (hung task check is expensive relative to softlock watchdog)
* hung task check to be zero over-head when disabled at run-time
Signed-off-by: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephen Rothwell reported this linux-next build failure with !CONFIG_SCHEDSTATS:
| In file included from kernel/sched.c:1703:
| kernel/sched_fair.c: In function 'adaptive_gran':
| kernel/sched_fair.c:1324: error: 'struct sched_entity' has no member named 'avg_wakeup'
The start_runtime and avg_wakeup metrics are now not just for statistics,
but also for scheduling - so they always need to be available. (Also
move out the nr_migrations fields - for future perfcounters usage.)
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge header files for m68k and m68knommu to the single location:
arch/m68k/include/asm
The majority of this patch was the result of the
script that is included in the changelog below.
The script was originally written by Arnd Bergman and
exten by me to cover a few more files.
When the header files differed the script uses the following:
The original m68k file is named <file>_mm.h [mm for memory manager]
The m68knommu file is named <file>_no.h [no for no memory manager]
The files uses the following include guard:
This include gaurd works as the m68knommu toolchain set
the __uClinux__ symbol - so this should work in userspace too.
Merging the header files for m68k and m68knommu exposes the
(unexpected?) ABI differences thus it is easier to actually
identify these and thus to fix them.
The commit has been build tested with both a m68k and
a m68knommu toolchain - with success.
The commit has also been tested with "make headers_check"
and this patch fixes make headers_check for m68knommu.
The script used:
TARGET=arch/m68k/include/asm
SOURCE=arch/m68knommu/include/asm
INCLUDE="cachectl.h errno.h fcntl.h hwtest.h ioctls.h ipcbuf.h \
linkage.h math-emu.h md.h mman.h movs.h msgbuf.h openprom.h \
oplib.h poll.h posix_types.h resource.h rtc.h sembuf.h shmbuf.h \
shm.h shmparam.h socket.h sockios.h spinlock.h statfs.h stat.h \
termbits.h termios.h tlb.h types.h user.h"
EQUAL="auxvec.h cputime.h device.h emergency-restart.h futex.h \
ioctl.h irq_regs.h kdebug.h local.h mutex.h percpu.h \
sections.h topology.h"
NOMUUFILES="anchor.h bootstd.h coldfire.h commproc.h dbg.h \
elia.h flat.h m5206sim.h m520xsim.h m523xsim.h m5249sim.h \
m5272sim.h m527xsim.h m528xsim.h m5307sim.h m532xsim.h \
m5407sim.h m68360_enet.h m68360.h m68360_pram.h m68360_quicc.h \
m68360_regs.h MC68328.h MC68332.h MC68EZ328.h MC68VZ328.h \
mcfcache.h mcfdma.h mcfmbus.h mcfne.h mcfpci.h mcfpit.h \
mcfsim.h mcfsmc.h mcftimer.h mcfuart.h mcfwdebug.h \
nettel.h quicc_simple.h smp.h"
FILES="atomic.h bitops.h bootinfo.h bug.h bugs.h byteorder.h cache.h \
cacheflush.h checksum.h current.h delay.h div64.h \
dma-mapping.h dma.h elf.h entry.h fb.h fpu.h hardirq.h hw_irq.h io.h \
irq.h kmap_types.h machdep.h mc146818rtc.h mmu.h mmu_context.h \
module.h page.h page_offset.h param.h pci.h pgalloc.h \
pgtable.h processor.h ptrace.h scatterlist.h segment.h \
setup.h sigcontext.h siginfo.h signal.h string.h system.h swab.h \
thread_info.h timex.h tlbflush.h traps.h uaccess.h ucontext.h \
unaligned.h unistd.h"
mergefile() {
BASE=${1%.h}
git mv ${SOURCE}/$1 ${TARGET}/${BASE}_no.h
git mv ${TARGET}/$1 ${TARGET}/${BASE}_mm.h
cat << EOF > ${TARGET}/$1
EOF
git add ${TARGET}/$1
}
set -e
mkdir -p ${TARGET}
git mv include/asm-m68k/* ${TARGET}
rmdir include/asm-m68k
git rm ${SOURCE}/Kbuild
for F in $INCLUDE $EQUAL; do
git rm ${SOURCE}/$F
done
for F in $NOMUUFILES; do
git mv ${SOURCE}/$F ${TARGET}/$F
done
for F in $FILES ; do
mergefile $F
done
rmdir arch/m68knommu/include/asm
rmdir arch/m68knommu/include
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
When mode setting is first initialized, the driver will call into
drm_helper_initial_config() to set up an initial output and framebuffer
configuration. This routine is responsible for probing the available
connectors, encoders, and crtcs, looking for modes and putting together
something reasonable (where reasonable is defined as "allows kernel
messages to be visible on as many displays as possible").
However, the code was a bit too aggressive in setting default modes when
none were found on a given connector. Even if some connectors had modes,
any connectors found lacking modes would have the default 800x600 mode added
to their mode list, which in some cases could cause problems later down the
line. In my case, the LVDS was perfectly available, but the initial config
code added 800x600 modes to both of the detected but unavailable HDMI
connectors (which are on my non-existent docking station). This ended up
preventing later code from setting a mode on my LVDS, which is bad.
This patch fixes that behavior by making the initial config code walk
through the connectors first, counting the available modes, before it decides
to add any default modes to a possibly connected output. It also fixes the
logic in drm_target_preferred() that was causing zeroed out modes to be set
as the preferred mode for a given connector, even if no modes were available.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (95 commits)
b44: GFP_DMA skb should not escape from driver
korina: do not use IRQF_SHARED with IRQF_DISABLED
korina: do not stop queue here
korina: fix handling tx_chain_tail
korina: do tx at the right position
korina: do schedule napi after testing for it
korina: rework korina_rx() for use with napi
korina: disable napi on close and restart
korina: reset resource buffer size to 1536
korina: fix usage of driver_data
bnx2x: First slow path interrupt race
bnx2x: MTU Filter
bnx2x: Indirection table initialization index
bnx2x: Missing brackets
bnx2x: Fixing the doorbell size
bnx2x: Endianness issues
bnx2x: VLAN tagged packets without VLAN offload
bnx2x: Protecting the link change indication
bnx2x: Flow control updated before reporting the link
bnx2x: Missing mask when calculating flow control
...
Unlike other alphas, marvel doesn't have real PC-style CMOS clock hardware
- RTC accesses are emulated via PAL calls. Unfortunately, for unknown
reason these calls work only on CPU #0. So current implementation for
arbitrary CPU makes CMOS_READ/WRITE to be executed on CPU #0 via IPI.
However, for obvious reason this doesn't work with standard
get/set_rtc_time() functions, where a bunch of CMOS accesses is done with
disabled interrupts.
Solved by making the IPI calls for entire get/set_rtc_time() functions,
not for individual CMOS accesses. Which is also a lot more effective
performance-wise.
The patch is largely based on the code from Jay Estabrook.
My changes:
- tweak asm-generic/rtc.h by adding a couple of #defines to
avoid a massive code duplication in arch/alpha/include/asm/rtc.h;
- sys_marvel.c: fix get/set_rtc_time() return values (Jay's FIXMEs).
NOTE: this fixes *only* LIB_RTC drivers. Legacy (CONFIG_RTC) driver
wont't work on marvel. Actually I think that we should just disable
CONFIG_RTC on alpha (maybe in 2.6.30?), like most other arches - AFAIK,
all modern distributions use LIB_RTC anyway.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix __request_region() parameter kernel-doc notation and parameter name:
Warning(linux-2.6.28-git10//kernel/resource.c:627): No description found for parameter 'flags'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix jbd header file kernel-doc notation:
Warning(linux-2.6.28-git13//include/linux/jbd.h:823): No description found for parameter 'j_average_commit_time'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new avg_wakeup statistic.
avg_wakeup is a measure of how frequently a task wakes up other tasks, it
represents the average time between wakeups, with a limit of avg_runtime
for when it doesn't wake up anybody.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This adds an init_dummy_netdev() function that gets a network device
structure (allocation and lifetime entirely under caller's control) and
initialize the minimum amount of fields so it can be used to schedule
NAPI polls without registering a full blown interface. This is to be
used by drivers that need to tie several hardware interfaces to a single
NAPI poll scheduler due to HW limitations.
It also updates the ibm_newemac driver to use that, this fixing the
oops on 2.6.29 due to passing NULL as "dev" to netif_napi_add()
Symbol is exported GPL only a I don't think we want binary drivers doing
that sort of acrobatics (if we want them at all).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (29 commits)
powerpc/83xx: Move mcu_mpc8349emitx driver out of drivers/i2c/chips/
powerpc/83xx: Make serial ports work on MPC8315E-RDB w/ FSL U-Boots
powerpc/e500mc: Doorbells need to be taken w/exceptions disabled
powerpc: Enable PS3 options and QPACE in ppc64_defconfig
powerpc/powermac: Fix occasional SMP boot failure
powerpc/cacheinfo: Rename cache_dir per-cpu variable
hvc_console: Use kzalloc() instead of kmalloc() + memset()
hvc_console: Do not set low_latency when using interrupts
hvc_console: Call free_irq() only if request_irq() was successful
hvc_console: Change an mb() to smp_mb() and add some comments
powerpc: Cleanup from l64 to ll64 change: drivers/net
powerpc: Cleanup from l64 to ll64 change: drivers/char
powerpc: Cleanup from l64 to ll64 change: arch code
powerpc: Change u64/s64 to a long long integer type
powerpc/kexec: Check crash_base for relocatable kernel
powerpc: Make dummy section a valid note header
Xilinx: SPI: updated driver for device tree
drivers/of: Add the of_find_i2c_device_by_node function.
powerpc/xsysace: add compatible string for non-ipcore instance
powerpc/mpc52xx: remove dead code from GPIO driver
...
* 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (44 commits)
[CVE-2009-0029] s390 specific system call wrappers
[CVE-2009-0029] System call wrappers part 33
[CVE-2009-0029] System call wrappers part 32
[CVE-2009-0029] System call wrappers part 31
[CVE-2009-0029] System call wrappers part 30
[CVE-2009-0029] System call wrappers part 29
[CVE-2009-0029] System call wrappers part 28
[CVE-2009-0029] System call wrappers part 27
[CVE-2009-0029] System call wrappers part 26
[CVE-2009-0029] System call wrappers part 25
[CVE-2009-0029] System call wrappers part 24
[CVE-2009-0029] System call wrappers part 23
[CVE-2009-0029] System call wrappers part 22
[CVE-2009-0029] System call wrappers part 21
[CVE-2009-0029] System call wrappers part 20
[CVE-2009-0029] System call wrappers part 19
[CVE-2009-0029] System call wrappers part 18
[CVE-2009-0029] System call wrappers part 17
[CVE-2009-0029] System call wrappers part 16
[CVE-2009-0029] System call wrappers part 15
...
Add swab.h to kbuild.asm and remove the individual entries from
each arch, mark as unifdef as some arches have some kernel-only
bits inside.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
IDE: fix sparse signed-ness errors with host->host_busy
ide: fix suspend regression
tx4938ide: Fix build error due to read_sff_dma_status moving
ide: remove unused CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ
sl82c105: remove dead code
via82cxxx: fix cable warning message
ide: can't use SSD/non-rotational queue flag for all CFA devices
it821x.c: use dev->revision instead of pci_read_config_byte
it821x: Add ultra_mask quirk for Vortex86SX
ide: fix accidental LOCKDEP breakage caused by local_irq_set() removal
The host_busy field in struct ide_host defaults to a
signed-long, where most arch's test_and_set_bit_*
macros use an unsigned long.
Change to using an unsigned long, which on ARM removes
the following sparse errors:
drivers/ide/ide-io.c:681:8: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:681:8: expected unsigned long volatile *p
drivers/ide/ide-io.c:681:8: got long volatile *<noident>
drivers/ide/ide-io.c:681:8: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:681:8: expected unsigned long volatile *p
drivers/ide/ide-io.c:681:8: got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3: expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3: got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3: expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3: got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3: expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3: got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3: expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3: got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3: expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3: got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3: expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3: got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3: expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3: got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3: expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3: got long volatile *<noident>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Change mutex contention behaviour such that it will sometimes busy wait on
acquisition - moving its behaviour closer to that of spinlocks.
This concept got ported to mainline from the -rt tree, where it was originally
implemented for rtmutexes by Steven Rostedt, based on work by Gregory Haskins.
Testing with Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50)
gave a 345% boost for VFS scalability on my testbox:
# ./test-mutex-shm V 16 10 | grep "^avg ops"
avg ops/sec: 296604
# ./test-mutex-shm V 16 10 | grep "^avg ops"
avg ops/sec: 85870
The key criteria for the busy wait is that the lock owner has to be running on
a (different) cpu. The idea is that as long as the owner is running, there is a
fair chance it'll release the lock soon, and thus we'll be better off spinning
instead of blocking/scheduling.
Since regular mutexes (as opposed to rtmutexes) do not atomically track the
owner, we add the owner in a non-atomic fashion and deal with the races in
the slowpath.
Furthermore, to ease the testing of the performance impact of this new code,
there is means to disable this behaviour runtime (without having to reboot
the system), when scheduler debugging is enabled (CONFIG_SCHED_DEBUG=y),
by issuing the following command:
# echo NO_OWNER_SPIN > /debug/sched_features
This command re-enables spinning again (this is also the default):
# echo OWNER_SPIN > /debug/sched_features
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The problem is that dropping the spinlock right before schedule is a voluntary
preemption point and can cause a schedule, right after which we schedule again.
Fix this inefficiency by keeping preemption disabled until we schedule, do this
by explicity disabling preemption and providing a schedule() variant that
assumes preemption is already disabled.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This assertion is incorrect for lockless pagecache. By definition if we
have an unpinned page that we are trying to take a speculative reference
to, it may become the tail of a compound page at any time (if it is
freed, then reallocated as a compound page).
It was still a valid assertion for the vmscan.c LRU isolation case, but
it doesn't seem incredibly helpful... if somebody wants it, they can
put it back directly where it applies in the vmscan code.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This enables the use of syscall wrappers to do proper sign extension
for 64-bit programs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
By selecting HAVE_SYSCALL_WRAPPERS architectures can activate
system call wrappers in order to sign extend system call arguments.
All architectures where the ABI defines that the caller of a function
has to perform sign extension probably need this.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Convert all system calls to return a long. This should be a NOP since all
converted types should have the same size anyway.
With the exception of sys_exit_group which returned void. But that doesn't
matter since the system call doesn't return.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Impact: new tracer
The workqueue tracer provides some statistical informations
about each cpu workqueue thread such as the number of the
works inserted and executed since their creation. It can help
to evaluate the amount of work each of them have to perform.
For example it can help a developer to decide whether he should
choose a per cpu workqueue instead of a singlethreaded one.
It only traces statistical informations for now but it will probably later
provide event tracing too.
Such a tracer could help too, and be improved, to help rt priority sorted
workqueue development.
To have a snapshot of the workqueues state at any time, just do
cat /debugfs/tracing/trace_stat/workqueues
Ie:
1 125 125 reiserfs/1
1 0 0 scsi_tgtd/1
1 0 0 aio/1
1 0 0 ata/1
1 114 114 kblockd/1
1 0 0 kintegrityd/1
1 2147 2147 events/1
0 0 0 kpsmoused
0 105 105 reiserfs/0
0 0 0 scsi_tgtd/0
0 0 0 aio/0
0 0 0 ata_aux
0 0 0 ata/0
0 0 0 cqueue
0 0 0 kacpi_notify
0 0 0 kacpid
0 149 149 kblockd/0
0 0 0 kintegrityd/0
0 1000 1000 khelper
0 2270 2270 events/0
Changes in V2:
_ Drop the static array based on NR_CPU and dynamically allocate the stat array
with num_possible_cpus() and other cpu mask facilities....
_ Trace workqueue insertion at a bit lower level (insert_work instead of queue_work) to handle
even the workqueue barriers.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>