Commit Graph

189053 Commits

Author SHA1 Message Date
Chuck Lever
07396051a5 SUNRPC: Use rpc_pton() in ip_map_parse()
The existing logic in ip_map_parse() can not currently parse
shorthanded IPv6 addresses (anything with a double colon), nor can
it parse an IPv6 presentation address with a scope ID.  An
IPv6-enabled mountd can pass down both.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-26 17:52:33 -05:00
Mike Frysinger
0368897034 tracing/documentation: Cover new frame pointer semantics
Update the graph tracer examples to cover the new frame pointer semantics
(in terms of passing it along).  Move the HAVE_FUNCTION_GRAPH_FP_TEST docs
out of the Kconfig, into the right place, and expand on the details.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
LKML-Reference: <1264165967-18938-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-26 17:00:39 -05:00
Yang Hongyang
6993b1bb1e tracing/documentation: Fix a typo in ftrace.txt
'ftrace' is no longer the name of the function tracer, to activate
the function trace 'echo function > current_tracer' is to be used instead
of 'echo ftrace > current_tracer'. Update the documentation to reflect
the current implementation.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
LKML-Reference: <4B5D0BA8.20106@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-26 16:59:33 -05:00
Tetsuo Handa
8e2d39a166 TOMOYO: Remove usage counter for temporary memory.
TOMOYO was using own memory usage counter for detecting memory leak.
But as kernel 2.6.31 introduced memory leak detection mechanism
( CONFIG_DEBUG_KMEMLEAK ), we no longer need to have own counter.

We remove usage counter for memory used for permission checks, but we keep
usage counter for memory used for policy so that we can apply quota.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>
2010-01-27 08:20:48 +11:00
Steven Rostedt
3c05d74827 ring-buffer: Check for end of page in iterator
If the iterator comes to an empty page for some reason, or if
the page is emptied by a consuming read. The iterator code currently
does not check if the iterator is pass the contents, and may
return a false entry.

This patch adds a check to the ring buffer iterator to test if the
current page has been completely read and sets the iterator to the
next page if necessary.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-26 16:14:08 -05:00
Steven Rostedt
492a74f421 ring-buffer: Check if ring buffer iterator has stale data
Usually reads of the ring buffer is performed by a single task.
There are two types of reads from the ring buffer.

One is a consuming read which will consume the entry that was read
and the next read will be the entry that follows.

The other is an iterator that will let the user read the contents of
the ring buffer without modifying it. When an iterator is allocated,
writes to the ring buffer are disabled to protect the iterator.

The problem exists when consuming reads happen while an iterator is
allocated. Specifically, the kind of read that swaps out an entire
page (used by splice) and replaces it with a new read. If the iterator
is on the page that is swapped out, then the next read may read
from this swapped out page and return garbage.

This patch adds a check when reading the iterator to make sure that
the iterator contents are still valid. If a consuming read has taken
place, the iterator is reset.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-26 16:09:30 -05:00
Trond Myklebust
a2c0b9e291 NFS: Ensure that we handle NFS4ERR_STALE_STATEID correctly
Even if the server is crazy, we should be able to mark the stateid as being
bad, to ensure it gets recovered.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:47 -05:00
Trond Myklebust
03391693a9 NFSv4.1: Don't call nfs4_schedule_state_recovery() unnecessarily
Currently, nfs4_handle_exception() will call it twice if called with an
error of -NFS4ERR_STALE_CLIENTID, -NFS4ERR_STALE_STATEID or
-NFS4ERR_EXPIRED.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:38 -05:00
Trond Myklebust
8e469ebd6d NFSv4: Don't allow posix locking against servers that don't support it
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:30 -05:00
Trond Myklebust
2bee72a6aa NFSv4: Ensure that the NFSv4 locking can recover from stateid errors
In most cases, we just want to mark the lock_stateid sequence id as being
uninitialised.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:21 -05:00
David Howells
b0706ca415 NFS: Avoid warnings when CONFIG_NFS_V4=n
Avoid the following warnings when CONFIG_NFS_V4=n:

	fs/nfs/sysctl.c:19: warning: unused variable `nfs_set_port_max'
	fs/nfs/sysctl.c:18: warning: unused variable `nfs_set_port_min'

by making those variables contingent on NFSv4 being configured.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:11 -05:00
H Hartley Sweeten
0aa05887af NFS: Make nfs_commitdata_release static
The symbol nfs_commitdata_release is only used locally
in this file. Make it static to prevent the following sparse warning:

warning: symbol 'nfs_commitdata_release' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:03 -05:00
Trond Myklebust
82be934a59 NFS: Try to commit unstable writes in nfs_release_page()
If someone calls nfs_release_page(), we presumably already know that the
page is clean, however it may be holding an unstable write.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:41:53 -05:00
Trond Myklebust
c9edda7140 NFS: Fix a reference leak in nfs_wb_cancel_page()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:41:34 -05:00
Stefan Richter
281e20323a firewire: core: fix use-after-free regression in FCP handler
Commit db5d247a "firewire: fix use of multiple AV/C devices, allow
multiple FCP listeners" introduced a regression into 2.6.33-rc3:
The core freed payloads of incoming requests to FCP_Request or
FCP_Response before a userspace driver accessed them.

We need to copy such payloads for each registered userspace client
and free the copies according to the lifetime rules of non-FCP client
request resources.

(This could possibly be optimized by reference counts instead of
copies.)

The presently only kernelspace driver which listens for FCP requests,
firedtv, was not affected because it already copies FCP frames into an
own buffer before returning to firewire-core's FCP handler dispatcher.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-01-26 20:54:50 +01:00
Stefan Richter
6d3faf6f43 firewire: cdev: add_descriptor documentation fix
struct fw_cdev_add_descriptor.length is in quadlets, not in bytes.
Also remove any doubts about the endianess of descriptor data.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-01-26 20:54:50 +01:00
Stefan Richter
e300839da4 firewire: core: add_descriptor size check
Presently, firewire-core only checks whether descriptors that are to be
added by userspace drivers to the local node's config ROM do not exceed
a size of 256 quadlets.  However, the sum of the bare minimum ROM plus
all descriptors (from firewire-core, from firewire-net, from userspace)
must not exceed 256 quadlets.

Otherwise, the bounds of a statically allocated buffer will be
overwritten.  If the kernel survives that, firewire-core will
subsequently be unable to parse the local node's config ROM.

(Note, userspace drivers can add descriptors only through device files
of local nodes.  These are usually only accessible by root, unlike
device files of remote nodes which may be accessible to lesser
privileged users.)

Therefore add a test which takes the actual present and required ROM
size into account for all descriptors of kernelspace and userspace
drivers.

Cc: stable@kernel.org
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-01-26 20:54:50 +01:00
Vladimir Zapolskiy
70c91a3849 ARM: IMX31: configure pins iomux for SDHC setup on litekit board.
This patch adds SDHC support, and corrects current pins setup.
Added irq handling on card removal.

Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Cc: Daniel Mack <daniel@caiaq.de>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2010-01-26 18:56:54 +01:00
Uwe Kleine-König
4e41db871e mx2/mx3: debug-macro.S needs deprecated symbols
This fixes:

	arch/arm/kernel/debug.S:147: Error: constant expression expected -- `ldrne r3,=(((UART1_BASE_ADDR)-AIPI_BASE_ADDR)+AIPI_BASE_ADDR_VIRT)'
	arch/arm/kernel/debug.S:163: Error: constant expression expected -- `ldrne r3,=(((UART1_BASE_ADDR)-AIPI_BASE_ADDR)+AIPI_BASE_ADDR_VIRT)'

when compiling for mx2 with CONFIG_DEBUG_LL=y.  A similar error exists
on mx3 and is fixed by this commit, too.

These were introduced by aae7019382.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2010-01-26 18:55:52 +01:00
Baruch Siach
1c57402374 mx25: make the FEC AHB clk secondary of the IPG
This makes the FEC clock configuration consistent with the UART one.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2010-01-26 18:54:41 +01:00
Baruch Siach
faed40665d mx25: fix time accounting
The gpt_clk rate function doesn't consider the PER divider. This causes a
significant drift in time accounting. Fix this by introducing the correct rate
calculation function.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2010-01-26 18:54:06 +01:00
Baruch Siach
828df43f13 mx25: properly initialize clocks
This patch disables all unnecessary clock in mx25_clocks_init() to make a clean
start, the same as is being done for the rest of the i.MX chips.

This patch was tested on i.MX25 PDK.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2010-01-26 18:54:06 +01:00
Baruch Siach
fadc095622 mx25: remove unused mx25_clocks_init() argument
The fref is needless on mx25 since the reference clock is fixed at 24MHz.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2010-01-26 18:54:06 +01:00
Sascha Hauer
4cd3f96cd4 i.MX25: implement secondary clocks for uarts and fec
For uarts and fec need two clocks, implement it using the secondary clock
field in struct clk.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2010-01-26 18:52:45 +01:00
Sascha Hauer
9611a9b6f6 i.MX25: Allow secondary clocks in DEFINE_CLOCK
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2010-01-26 18:52:40 +01:00
Takashi Iwai
d0d2c38e39 Merge remote branch 'alsa/devel' into topic/misc 2010-01-26 18:13:04 +01:00
Johannes Berg
56007a028c mac80211: wait for beacon before enabling powersave
Because DTIM information is required for powersave
but is only conveyed in beacons, wait for a beacon
before enabling powersave, and change the way the
information is conveyed to the driver accordingly.

mwl8k doesn't currently seem to implement PS but
requires the DTIM period in a different way; after
talking to Lennert we agreed to just have mwl8k do
the parsing itself in the finalize_join work.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-26 11:53:21 -05:00
Johannes Berg
c21dbf9214 cfg80211: export cfg80211_find_ie
This new function (previously a static function
called just "find_ie" can be used to find a
specific IE in a buffer of IEs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-26 11:53:20 -05:00
Zhu Yi
3092ad0544 mac80211: fix NULL pointer dereference when ftrace is enabled
I got below kernel oops when I try to bring down the network interface if
ftrace is enabled. The root cause is drv_ampdu_action() is passed with a
NULL ssn pointer in the BA session tear down case. We need to check and
avoid dereferencing it in trace entry assignment.

BUG: unable to handle kernel NULL pointer dereference
Modules linked in: at (null)
IP: [<f98fe02a>] ftrace_raw_event_drv_ampdu_action+0x10a/0x160 [mac80211]
*pde = 00000000
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[...]
Call Trace:
 [<f98fdf20>] ? ftrace_raw_event_drv_ampdu_action+0x0/0x160 [mac80211]
 [<f98dac4c>] ? __ieee80211_stop_rx_ba_session+0xfc/0x220 [mac80211]
 [<f98d97fb>] ? ieee80211_sta_tear_down_BA_sessions+0x3b/0x50 [mac80211]
 [<f98dc6f6>] ? ieee80211_set_disassoc+0xe6/0x230 [mac80211]
 [<f98dc6ac>] ? ieee80211_set_disassoc+0x9c/0x230 [mac80211]
 [<f98dcbb8>] ? ieee80211_mgd_deauth+0x158/0x170 [mac80211]
 [<f98e4bdb>] ? ieee80211_deauth+0x1b/0x20 [mac80211]
 [<f8987f49>] ? __cfg80211_mlme_deauth+0xe9/0x120 [cfg80211]
 [<f898b870>] ? __cfg80211_disconnect+0x170/0x1d0 [cfg80211]

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: stable@kernel.org
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-01-26 11:52:13 -05:00
Jaroslav Kysela
e763692578 ALSA: pcm_lib - return back hw_ptr_interrupt
Clemens Ladisch noted for hw_ptr_removal in "cleanup & merge hw_ptr
update functions" commit:

"It is possible for the status/delay ioctls to be called when the sound
card's pointer register alreay shows a position at the beginning of the
new period, but immediately before the interrupt is actually executed.
(This happens regularly on a SMP machine with mplayer.)  When that
happens, the code thinks that the position must be at least one period
ahead of the current position and drops an entire buffer of data."

Return back the hw_ptr_interrupt variable. The last interrupt pointer
is always computed from the latest hw_ptr instead of tracking it
separately (in this case all hw_ptr checks and modifications might
influence also hw_ptr_interrupt and it is difficult to keep it
consistent).

Signed-off-by: Jaroslav Kysela <perex@perex.cz>
2010-01-26 17:50:50 +01:00
Patrick McHardy
e578756c35 netfilter: ctnetlink: fix expectation mask dump
The protocol number is not initialized, so userspace can't interpret
the layer 4 data properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-26 17:04:02 +01:00
Thomas Gleixner
7b7422a566 clocksource: Prevent potential kgdb dead lock
commit 0f8e8ef7 (clocksource: Simplify clocksource watchdog resume
logic) introduced a potential kgdb dead lock. When the kernel is
stopped by kgdb inside code which holds watchdog_lock then kgdb dead
locks in clocksource_resume_watchdog().

clocksource_resume_watchdog() is called from kbdg via
clocksource_touch_watchdog() to avoid that the clock source watchdog
marks TSC unstable after the kernel has been stopped.

Solve this by replacing spin_lock with a spin_trylock and just return
in case the lock is held. Not resetting the watchdog might result in
TSC becoming marked unstable, but that's an acceptable penalty for
using kgdb.

The timekeeping is anyway easily screwed up by kgdb when the system
uses either jiffies or a clock source which wraps in short intervals
(e.g. pm_timer wraps about every 4.6s), so we really do not have to
worry about that occasional TSC marked unstable side effect.

The second caller of clocksource_resume_watchdog() is
clocksource_resume(). The trylock is safe here as well because the
system is UP at this point, interrupts are disabled and nothing else
can hold watchdog_lock().

Reported-by: Jason Wessel <jason.wessel@windriver.com>
LKML-Reference: <1264480000-6997-4-git-send-email-jason.wessel@windriver.com>
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-01-26 14:53:16 +01:00
David S. Miller
b747caf365 ariadne: Fix build.
References removed HAVE_MULTICAST.

Reporeted-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-26 05:17:00 -08:00
Shan Wei
c92b544bd5 ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure
The commit 0b5ccb2(title:ipv6: reassembly: use seperate reassembly queues for
conntrack and local delivery) has broken the saddr&&daddr member of
nf_ct_frag6_queue when creating new queue.  And then hash value
generated by nf_hashfn() was not equal with that generated by fq_find().
So, a new received fragment can't be inserted to right queue.

The patch fixes the bug with adding member of user to nf_ct_frag6_queue structure.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-26 05:13:27 -08:00
David S. Miller
6abce7711f sparc64: Fix UP build.
Can't reference irq_desc[].affinity when !SMP.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-26 04:16:49 -08:00
Chaithrika U S
e473b84742 ASoC: DaVinci: Fix stream restart error
Sometimes after a suspend-resume cycle, the ALSA application
restarts the stream when resume fails and McASP fails to work
as the clock is not enabled. This patch corrects this bug.

Testes on TI DA850/OMAP-L138 EVM.

Signed-off-by: Chaithrika U S <chaithrika@ti.com>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2010-01-26 11:55:54 +00:00
Wei Ni
ccc5df058d ALSA: hda - Add support for more the 8 streams
In azx_stream_start() and azx_stream_stop(),
it use azx_readb/azx_writeb to read/write SIE,
it just enable/disable 8 streams.
But according to the HDA spec, it support 30 streams,
and the new HDA controller will support more then 8
streams. So we should use azx_readl/azx_writel to
read/write SIE.

Signed-off-by: Wei Ni <wni@nvidia.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-01-26 10:40:03 +01:00
Ben Dooks
159a3ddd6c ARM: Merge next-smdk6410-defconfig
Merge branch 'next-smdk6410-defconfig' into next-samsung
2010-01-26 18:21:40 +09:00
Ben Dooks
50ee2d35a5 ARM: SAMSUNG: Add error printing to s3c24xx_register_clocks
Add an error print to s3c24xx_register_clocks to provide more useful
information when failing to register the clock.

I belive this was originally left out due to the possibility of a
problem with low-level debugging code. However, if the low-level
debug code is not functional by now there will be a whole other set of
problems being presented to the system.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2010-01-26 17:44:33 +09:00
Ben Dooks
8428d47a36 ARM: SAMSUNG: Add documentation to the clock registration calls.
Add some kerneldoc documentation to the s3c24xx_register_clock and the
s3c24xx_register_clocks() call.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2010-01-26 17:44:32 +09:00
Florian Zumbiehl
cf944ee55c ALSA: cs46xx: Fix cpu idling with resume
Make sure that capture DMA doesn't stay enabled after system resume
as that potentially prevents the processor from entering deep sleep
states.

Signed-off-by: Florian Zumbiehl <florz@florz.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-01-26 09:06:14 +01:00
Jesse Barnes
de3f440f8c drm/i915: handle non-flip pending case when unpinning the scanout buffer
The first page flip queued will replace the current front buffer, which
should have a 0 pending flip count.  So at finish time we need to handle
that case (i.e. if the flip count is 0 *or* dec_and_test is 0 we need to
wake the waiters).

Also fix up an error path in the queue function and add some debug
output (only enabled with driver debugging).

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
2010-01-25 22:01:12 -08:00
Dave Chinner
7d6a7bde52 xfs: Use delay write promotion for dquot flushing
xfs_qm_dqflock_pushbuf_wait() does a very similar trick to item
pushing used to do to flush out delayed write dquot buffers. Change
it to use the new promotion method rather than an async flush.

Also, xfs_qm_dqflock_pushbuf_wait() can return without the flush lock
held, yet the callers make the assumption that after this call the
flush lock is held. Always return with the flush lock held.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-01-26 15:13:41 +11:00
Dave Chinner
089716aa14 xfs: Sort delayed write buffers before dispatch
Currently when the xfsbufd writes delayed write buffers, it pushes
them to disk in the order they come off the delayed write list. If
there are lots of buffers ѕpread widely over the disk, this results
in overwhelming the elevator sort queues in the block layer and we
end up losing the posibility of merging adjacent buffers to minimise
the number of IOs.

Use the new generic list_sort function to sort the delwri dispatch
queue before issue to ensure that the buffers are pushed in the most
friendly order possible to the lower layers.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-01-26 15:13:25 +11:00
Dave Chinner
d808f617ad xfs: Don't issue buffer IO direct from AIL push V2
All buffers logged into the AIL are marked as delayed write.
When the AIL needs to push the buffer out, it issues an async write of the
buffer. This means that IO patterns are dependent on the order of
buffers in the AIL.

Instead of flushing the buffer, promote the buffer in the delayed
write list so that the next time the xfsbufd is run the buffer will
be flushed by the xfsbufd. Return the state to the xfsaild that the
buffer was promoted so that the xfsaild knows that it needs to cause
the xfsbufd to run to flush the buffers that were promoted.

Using the xfsbufd for issuing the IO allows us to dispatch all
buffer IO from the one queue. This means that we can make much more
enlightened decisions on what order to flush buffers to disk as
we don't have multiple places issuing IO. Optimisations to xfsbufd
will be in a future patch.

Version 2
- kill XFS_ITEM_FLUSHING as it is now unused.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-02 10:13:42 +11:00
Dave Chinner
c854363e80 xfs: Use delayed write for inodes rather than async V2
We currently do background inode flush asynchronously, resulting in
inodes being written in whatever order the background writeback
issues them. Not only that, there are also blocking and non-blocking
asynchronous inode flushes, depending on where the flush comes from.

This patch completely removes asynchronous inode writeback. It
removes all the strange writeback modes and replaces them with
either a synchronous flush or a non-blocking delayed write flush.
That is, inode flushes will only issue IO directly if they are
synchronous, and background flushing may do nothing if the operation
would block (e.g. on a pinned inode or buffer lock).

Delayed write flushes will now result in the inode buffer sitting in
the delwri queue of the buffer cache to be flushed by either an AIL
push or by the xfsbufd timing out the buffer. This will allow
accumulation of dirty inode buffers in memory and allow optimisation
of inode cluster writeback at the xfsbufd level where we have much
greater queue depths than the block layer elevators. We will also
get adjacent inode cluster buffer IO merging for free when a later
patch in the series allows sorting of the delayed write buffers
before dispatch.

This effectively means that any inode that is written back by
background writeback will be seen as flush locked during AIL
pushing, and will result in the buffers being pushed from there.
This writeback path is currently non-optimal, but the next patch
in the series will fix that problem.

A side effect of this delayed write mechanism is that background
inode reclaim will no longer directly flush inodes, nor can it wait
on the flush lock. The result is that inode reclaim must leave the
inode in the reclaimable state until it is clean. Hence attempts to
reclaim a dirty inode in the background will simply skip the inode
until it is clean and this allows other mechanisms (i.e. xfsbufd) to
do more optimal writeback of the dirty buffers. As a result, the
inode reclaim code has been rewritten so that it no longer relies on
the ambiguous return values of xfs_iflush() to determine whether it
is safe to reclaim an inode.

Portions of this patch are derived from patches by Christoph
Hellwig.

Version 2:
- cleanup reclaim code as suggested by Christoph
- log background reclaim inode flush errors
- just pass sync flags to xfs_iflush

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-06 12:39:36 +11:00
Dave Chinner
777df5afdb xfs: Make inode reclaim states explicit
A.K.A.: don't rely on xfs_iflush() return value in reclaim

We have gradually been moving checks out of the reclaim code because
they are duplicated in xfs_iflush(). We've had a history of problems
in this area, and many of them stem from the overloading of the
return values from xfs_iflush() and interaction with inode flush
locking to determine if the inode is safe to reclaim.

With the desire to move to delayed write flushing of inodes and
non-blocking inode tree reclaim walks, the overloading of the
return value of xfs_iflush makes it very difficult to determine
the correct thing to do next.

This patch explicitly re-adds the checks to the inode reclaim code,
removing the reliance on the return value of xfs_iflush() to
determine what to do next. It also means that we can clearly
document all the inode states that reclaim must handle and hence
we can easily see that we handled all the necessary cases.

This also removes the need for the xfs_inode_clean() check in
xfs_iflush() as all callers now check this first (safely).

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-06 12:37:26 +11:00
Eric Sandeen
d5db0f97fb xfs: more reserved blocks fixups
This mangles the reserved blocks counts a little more.

1) add a helper function for the default reserved count
2) add helper functions to save/restore counts on ro/rw
3) save/restore reserved blocks on freeze/thaw
4) disallow changing reserved count while readonly

V2: changed field name to match Dave's changes

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-02-08 17:41:48 -06:00
Dave Chinner
388f1f0c34 xfs: turn off sign warnings
Because they cause warnings in static inline functions conditionally
compiled into XFS from the VFS (e.g. fsnotify).

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-01-26 15:10:15 +11:00
Dave Chinner
cbe132a8bd xfs: don't hold onto reserved blocks on remount,ro
If we hold onto reserved blocks when doing a remount,ro we end
up writing the blocks used count to disk that includes the reserved
blocks. Reserved blocks are not actually used, so this results in
the values in the superblock being incorrect.

Hence if we run xfs_check or xfs_repair -n while the filesystem is
mounted remount,ro we end up with an inconsistent filesystem being
reported. Also, running xfs_copy on the remount,ro filesystem will
result in an inconsistent image being generated.

To fix this, unreserve the blocks when doing the remount,ro, and
reserved them again on remount,rw. This way a remount,ro filesystem
will appear consistent on disk to all utilities.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-01-26 15:08:49 +11:00