With the introduction of the new unified work queue thread pools,
we lost one feature: It's no longer possible to know which worker
is causing the CPU to wake out of idle. The result is that PowerTOP
now reports a lot of "kworker/a:b" instead of more readable results.
This patch adds a pair of tracepoints to the new workqueue code,
similar in style to the timer/hrtimer tracepoints.
With this pair of tracepoints, the next PowerTOP can correctly
report which work item caused the wakeup (and how long it took):
Interrupt (43) i915 time 3.51ms wakeups 141
Work ieee80211_iface_work time 0.81ms wakeups 29
Work do_dbs_timer time 0.55ms wakeups 24
Process Xorg time 21.36ms wakeups 4
Timer sched_rt_period_timer time 0.01ms wakeups 1
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's a really simple list, and several of the users want to go backwards
in it to find the previous vma. So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent modprobe and udev versions allow to create device nodes
for modules which are not loaded. Only the first access will cause
the in-kernel module loader to pull-in the module. Systems which
never access the device node will not needlessly load the module,
and no longer need init scripts or other facilities to unconditionally
load it.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Since handle_sysrq() does not take tty as argument anymore we can
drop it from usb_serial_handle_sysrq_char() as well.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Sysrq operations do not accept tty argument anymore so no need to pass
it to us.
[Stephen Rothwell <sfr@canb.auug.org.au>: fix build breakage in drm code
caused by sysrq using bool but not including linux/types.h]
[Sachin Sant <sachinp@in.ibm.com>: fix build breakage in s390 keyboadr
driver]
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Because list_empty() does not dereference any RCU-protected pointers, and
further does not pass such pointers to the caller (so that the caller
does not dereference them either), it is safe to use list_empty() on
RCU-protected lists. There is no need for a list_empty_rcu(). This
commit adds a comment stating this explicitly.
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The CONFIG_PREEMPT_RCU kernel configuration parameter was recently
re-introduced, but as an indication of the type of RCU (preemptible
vs. non-preemptible) instead of as selecting a given implementation.
This commit uses CONFIG_PREEMPT_RCU to combine duplicate code
from include/linux/rcutiny.h and include/linux/rcutree.h into
include/linux/rcupdate.h. This commit also combines a few other pieces
of duplicate code that have accumulated.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
It is illegal to wait for an SRCU grace period while within the
corresponding flavor of SRCU read-side critical section. Therefore,
this commit updates the srcu_read_lock() docbook accordingly.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Combine the duplicate definitions of ULONG_CMP_GE(), ULONG_CMP_LT(),
and rcu_preempt_depth() into include/linux/rcupdate.h.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
When using a kernel debugger, a long sojourn in the debugger can get
you lots of RCU CPU stall warnings once you resume. This might not be
helpful, especially if you are using the system console. This patch
therefore allows RCU CPU stall warnings to be suppressed, but only for
the duration of the current set of grace periods.
This differs from Jason's original patch in that it adds support for
tiny RCU and preemptible RCU, and uses a slightly different method for
suppressing the RCU CPU stall warning messages.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Jason Wessel <jason.wessel@windriver.com>
The comment says that blocking is illegal in rcu_read_lock()-style
RCU read-side critical sections, which is no longer entirely true
given preemptible RCU. This commit provides a fix.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Implement a small-memory-footprint uniprocessor-only implementation of
preemptible RCU. This implementation uses but a single blocked-tasks
list rather than the combinatorial number used per leaf rcu_node by
TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
processing. This version also takes advantage of uniprocessor execution
to accelerate grace periods in the case where there are no readers.
The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.
This implementation is a step towards having RCU implementation driven
off of the SMP and PREEMPT kernel configuration variables, which can
happen once this implementation has accumulated sufficient experience.
Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
suggested by Steve Rostedt in order to avoid the compiler-reordering
issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).
As can be seen below, CONFIG_TINY_PREEMPT_RCU represents almost 5Kbyte
savings compared to CONFIG_TREE_PREEMPT_RCU. Of course, for non-real-time
workloads, CONFIG_TINY_RCU is even better.
CONFIG_TREE_PREEMPT_RCU
text data bss dec filename
13 0 0 13 kernel/rcupdate.o
6170 825 28 7023 kernel/rcutree.o
----
7026 Total
CONFIG_TINY_PREEMPT_RCU
text data bss dec filename
13 0 0 13 kernel/rcupdate.o
2081 81 8 2170 kernel/rcutiny.o
----
2183 Total
CONFIG_TINY_RCU (non-preemptible)
text data bss dec filename
13 0 0 13 kernel/rcupdate.o
719 25 0 744 kernel/rcutiny.o
---
757 Total
Requested-by: Loïc Minier <loic.minier@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
net/sched: add ACT_CSUM action to update packets checksums
ACT_CSUM can be called just after ACT_PEDIT in order to re-compute some
altered checksums in IPv4 and IPv6 packets. The following checksums are
supported by this patch:
- IPv4: IPv4 header, ICMP, IGMP, TCP, UDP & UDPLite
- IPv6: ICMPv6, TCP, UDP & UDPLite
It's possible to request in the same action to update different kind of
checksums, if the packets flow mix TCP, UDP and UDPLite, ...
An example of usage is done in the associated iproute2 patch.
Version 3 changes:
- remove useless goto instructions
- improve IPv6 hop options decoding
Version 2 changes:
- coding style correction
- remove useless arguments of some functions
- use stack in tcf_csum_dump()
- add tcf_csum_skb_nextlayer() to factor code
Signed-off-by: Gregoire Baron <baronchon@n7mm.org>
Acked-by: jamal <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make it explicit that new RCU read-side critical sections that start
after call_rcu() and synchronize_rcu() start might still be running
after the end of the relevant grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
find_task_by_vpid() says "Must be called under rcu_read_lock().". But due to
commit 3120438 "rcu: Disable lockdep checking in RCU list-traversal primitives",
we are currently unable to catch "find_task_by_vpid() with tasklist_lock held
but RCU lock not held" errors due to the RCU-lockdep checks being
suppressed in the RCU variants of the struct list_head traversals.
This commit therefore places an explicit check for being in an RCU
read-side critical section in find_task_by_pid_ns().
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/pid.c:386 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by rc.sysinit/1102:
#0: (tasklist_lock){.+.+..}, at: [<c1048340>] sys_setpgid+0x40/0x160
stack backtrace:
Pid: 1102, comm: rc.sysinit Not tainted 2.6.35-rc3-dirty #1
Call Trace:
[<c105e714>] lockdep_rcu_dereference+0x94/0xb0
[<c104b4cd>] find_task_by_pid_ns+0x6d/0x70
[<c104b4e8>] find_task_by_vpid+0x18/0x20
[<c1048347>] sys_setpgid+0x47/0x160
[<c1002b50>] sysenter_do_call+0x12/0x36
Commit updated to use a new rcu_lockdep_assert() exported API rather than
the old internal __do_rcu_dereference().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This avoids warnings from missing __rcu annotations
in the rculist implementation, making it possible to
use the same lists in both RCU and non-RCU cases.
We can add rculist annotations later, together with
lockdep support for rculist, which is missing as well,
but that may involve changing all the users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit provides definitions for the __rcu annotation defined earlier.
This annotation permits sparse to check for correct use of RCU-protected
pointers. If a pointer that is annotated with __rcu is accessed
directly (as opposed to via rcu_dereference(), rcu_assign_pointer(),
or one of their variants), sparse can be made to complain. To enable
such complaints, use the new default-disabled CONFIG_SPARSE_RCU_POINTER
kernel configuration option. Please note that these sparse complaints are
intended to be a debugging aid, -not- a code-style-enforcement mechanism.
There are special rcu_dereference_protected() and rcu_access_pointer()
accessors for use when RCU read-side protection is not required, for
example, when no other CPU has access to the data structure in question
or while the current CPU hold the update-side lock.
This patch also updates a number of docbook comments that were showing
their age.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christopher Li <sparse@chrisli.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The task_cls_classid() function applies rcu_dereference() to integers,
which does not work with the shiny new sparse-based checking in
rcu_dereference(). This commit therefore moves to the new RCU API
rcu_dereference_index_check().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce proto_ports_offset() for getting the position of the ports or
SPI in the message of a protocol.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PowerTOP would like to be able to trace timers.
Unfortunately, the current timer tracing is not very useful: the
actual timer function is not recorded in the trace at the start
of timer execution.
Although this is recorded for timer "start" time (when it gets
armed), this is not useful; most timers get started early, and a
tracer like PowerTOP will never see this event, but will only
see the actual running of the timer.
This patch just adds the function to the timer tracing; I've
verified with PowerTOP that now it can get useful information
about timers.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: xiaoguangrong@cn.fujitsu.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org> # .35.x, .34.x, .33.x
LKML-Reference: <4C6C5FA9.3000405@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch removes the abstraction introduced by the union skb_shared_tx in
the shared skb data.
The access of the different union elements at several places led to some
confusion about accessing the shared tx_flags e.g. in skb_orphan_try().
http://marc.info/?l=linux-netdev&m=128084897415886&w=2
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
"make headers_check" issued the following warning:
CHECK include/linux/netfilter (64 files)
usr/include/linux/netfilter/xt_ipvs.h:19: found __[us]{8,16,32,64} type without #include <linux/types.h>
Fix this by as suggested including linux/types.h.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that software events don't have interrupt disabled anymore in
the event path, callchains can nest on any context. So seperating
nmi and others contexts in two buffers has become racy.
Fix this by providing one buffer per nesting level. Given the size
of the callchain entries (2040 bytes * 4), we now need to allocate
them dynamically.
v2: Fixed put_callchain_entry call after recursion.
Fix the type of the recursion, it must be an array.
v3: Use a manual pr cpu allocation (temporary solution until NMIs
can safely access vmalloc'ed memory).
Do a better separation between callchain reference tracking and
allocation. Make the "put" path lockless for non-release cases.
v4: Protect the callchain buffers with rcu.
v5: Do the cpu buffers allocations node affine.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David Miller <davem@davemloft.net>
Cc: Borislav Petkov <bp@amd64.org>
- Most archs use one callchain buffer per cpu, except x86 that needs
to deal with NMIs. Provide a default perf_callchain_buffer()
implementation that x86 overrides.
- Centralize all the kernel/user regs handling and invoke new arch
handlers from there: perf_callchain_user() / perf_callchain_kernel()
That avoid all the user_mode(), current->mm checks and so...
- Invert some parameters in perf_callchain_*() helpers: entry to the
left, regs to the right, following the traditional (dst, src).
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
Fix the declaration of sys_execve() in asm-generic/syscalls.h to have
various consts applied to its pointers.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
* 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6:
spi.h: missing kernel-doc notation, please fix
of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
of: Fix missing includes
ata: update for of_device to platform_device replacement
microblaze: Fix of: eliminate of_device->node and dev_archdata->{of,prom}_node
microblaze: Fix of/address: Merge all of the bus translation code
booting-without-of: Remove nonexistent chapters from TOC, fix numbering
The current code in pcm_lib.c do all checks using only the position
in the ring buffer. Unfortunately, where the interrupts gets delayed or
merged into one, we need another timing source to check when the
buffer size boundary overlaps to avoid the wrong updating of the
ring buffer pointers.
This code uses jiffies to check the right time window without any
performance impact.
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
With some hardware combinations, the PCM interrupts are acknowledged
before the period boundary from the emu10k1 chip. The midlevel PCM code
gets confused and the playback stream is interrupted.
It seems that the interrupt processing shift by 2 samples is enough
to fix this issue. This default value does not harm other,
non-affected hardware.
More information: Kernel bugzilla bug#16300
[A copmile warning fixed by tiwai]
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
fs: scale files_lock
Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).
One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variale in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.
However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.
A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.
Testing results:
On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.
Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)
So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.
Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.
throughput
2.6.34-rc2 24.5
+patch 24.9
us sys idle IO wait (in %)
2.6.34-rc2 51.25 28.25 17.25 3.25
+patch 53.75 18.5 19 8.75
So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.
Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.
Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>