linux

Author	SHA1	Message	Date
Chuck Lever	07396051a5	SUNRPC: Use rpc_pton() in ip_map_parse() The existing logic in ip_map_parse() can not currently parse shorthanded IPv6 addresses (anything with a double colon), nor can it parse an IPv6 presentation address with a scope ID. An IPv6-enabled mountd can pass down both. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-01-26 17:52:33 -05:00
Mike Frysinger	0368897034	tracing/documentation: Cover new frame pointer semantics Update the graph tracer examples to cover the new frame pointer semantics (in terms of passing it along). Move the HAVE_FUNCTION_GRAPH_FP_TEST docs out of the Kconfig, into the right place, and expand on the details. Signed-off-by: Mike Frysinger <vapier@gentoo.org> LKML-Reference: <1264165967-18938-1-git-send-email-vapier@gentoo.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-01-26 17:00:39 -05:00
Yang Hongyang	6993b1bb1e	tracing/documentation: Fix a typo in ftrace.txt 'ftrace' is no longer the name of the function tracer, to activate the function trace 'echo function > current_tracer' is to be used instead of 'echo ftrace > current_tracer'. Update the documentation to reflect the current implementation. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> LKML-Reference: <4B5D0BA8.20106@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-01-26 16:59:33 -05:00
Tetsuo Handa	8e2d39a166	TOMOYO: Remove usage counter for temporary memory. TOMOYO was using own memory usage counter for detecting memory leak. But as kernel 2.6.31 introduced memory leak detection mechanism ( CONFIG_DEBUG_KMEMLEAK ), we no longer need to have own counter. We remove usage counter for memory used for permission checks, but we keep usage counter for memory used for policy so that we can apply quota. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: James Morris <jmorris@namei.org>	2010-01-27 08:20:48 +11:00
Steven Rostedt	3c05d74827	ring-buffer: Check for end of page in iterator If the iterator comes to an empty page for some reason, or if the page is emptied by a consuming read. The iterator code currently does not check if the iterator is pass the contents, and may return a false entry. This patch adds a check to the ring buffer iterator to test if the current page has been completely read and sets the iterator to the next page if necessary. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-01-26 16:14:08 -05:00
Steven Rostedt	492a74f421	ring-buffer: Check if ring buffer iterator has stale data Usually reads of the ring buffer is performed by a single task. There are two types of reads from the ring buffer. One is a consuming read which will consume the entry that was read and the next read will be the entry that follows. The other is an iterator that will let the user read the contents of the ring buffer without modifying it. When an iterator is allocated, writes to the ring buffer are disabled to protect the iterator. The problem exists when consuming reads happen while an iterator is allocated. Specifically, the kind of read that swaps out an entire page (used by splice) and replaces it with a new read. If the iterator is on the page that is swapped out, then the next read may read from this swapped out page and return garbage. This patch adds a check when reading the iterator to make sure that the iterator contents are still valid. If a consuming read has taken place, the iterator is reset. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-01-26 16:09:30 -05:00
Trond Myklebust	a2c0b9e291	NFS: Ensure that we handle NFS4ERR_STALE_STATEID correctly Even if the server is crazy, we should be able to mark the stateid as being bad, to ensure it gets recovered. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>	2010-01-26 15:42:47 -05:00
Trond Myklebust	03391693a9	NFSv4.1: Don't call nfs4_schedule_state_recovery() unnecessarily Currently, nfs4_handle_exception() will call it twice if called with an error of -NFS4ERR_STALE_CLIENTID, -NFS4ERR_STALE_STATEID or -NFS4ERR_EXPIRED. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>	2010-01-26 15:42:38 -05:00
Trond Myklebust	8e469ebd6d	NFSv4: Don't allow posix locking against servers that don't support it Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>	2010-01-26 15:42:30 -05:00
Trond Myklebust	2bee72a6aa	NFSv4: Ensure that the NFSv4 locking can recover from stateid errors In most cases, we just want to mark the lock_stateid sequence id as being uninitialised. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>	2010-01-26 15:42:21 -05:00
David Howells	b0706ca415	NFS: Avoid warnings when CONFIG_NFS_V4=n Avoid the following warnings when CONFIG_NFS_V4=n: fs/nfs/sysctl.c:19: warning: unused variable `nfs_set_port_max' fs/nfs/sysctl.c:18: warning: unused variable `nfs_set_port_min' by making those variables contingent on NFSv4 being configured. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>	2010-01-26 15:42:11 -05:00
H Hartley Sweeten	0aa05887af	NFS: Make nfs_commitdata_release static The symbol nfs_commitdata_release is only used locally in this file. Make it static to prevent the following sparse warning: warning: symbol 'nfs_commitdata_release' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>	2010-01-26 15:42:03 -05:00
Trond Myklebust	82be934a59	NFS: Try to commit unstable writes in nfs_release_page() If someone calls nfs_release_page(), we presumably already know that the page is clean, however it may be holding an unstable write. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>	2010-01-26 15:41:53 -05:00
Trond Myklebust	c9edda7140	NFS: Fix a reference leak in nfs_wb_cancel_page() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>	2010-01-26 15:41:34 -05:00
Stefan Richter	281e20323a	firewire: core: fix use-after-free regression in FCP handler Commit `db5d247a` "firewire: fix use of multiple AV/C devices, allow multiple FCP listeners" introduced a regression into 2.6.33-rc3: The core freed payloads of incoming requests to FCP_Request or FCP_Response before a userspace driver accessed them. We need to copy such payloads for each registered userspace client and free the copies according to the lifetime rules of non-FCP client request resources. (This could possibly be optimized by reference counts instead of copies.) The presently only kernelspace driver which listens for FCP requests, firedtv, was not affected because it already copies FCP frames into an own buffer before returning to firewire-core's FCP handler dispatcher. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2010-01-26 20:54:50 +01:00
Stefan Richter	6d3faf6f43	firewire: cdev: add_descriptor documentation fix struct fw_cdev_add_descriptor.length is in quadlets, not in bytes. Also remove any doubts about the endianess of descriptor data. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2010-01-26 20:54:50 +01:00
Stefan Richter	e300839da4	firewire: core: add_descriptor size check Presently, firewire-core only checks whether descriptors that are to be added by userspace drivers to the local node's config ROM do not exceed a size of 256 quadlets. However, the sum of the bare minimum ROM plus all descriptors (from firewire-core, from firewire-net, from userspace) must not exceed 256 quadlets. Otherwise, the bounds of a statically allocated buffer will be overwritten. If the kernel survives that, firewire-core will subsequently be unable to parse the local node's config ROM. (Note, userspace drivers can add descriptors only through device files of local nodes. These are usually only accessible by root, unlike device files of remote nodes which may be accessible to lesser privileged users.) Therefore add a test which takes the actual present and required ROM size into account for all descriptors of kernelspace and userspace drivers. Cc: stable@kernel.org Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2010-01-26 20:54:50 +01:00
Vladimir Zapolskiy	70c91a3849	ARM: IMX31: configure pins iomux for SDHC setup on litekit board. This patch adds SDHC support, and corrects current pins setup. Added irq handling on card removal. Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com> Cc: Daniel Mack <daniel@caiaq.de> Cc: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2010-01-26 18:56:54 +01:00
Uwe Kleine-König	4e41db871e	mx2/mx3: debug-macro.S needs deprecated symbols This fixes: arch/arm/kernel/debug.S:147: Error: constant expression expected -- `ldrne r3,=(((UART1_BASE_ADDR)-AIPI_BASE_ADDR)+AIPI_BASE_ADDR_VIRT)' arch/arm/kernel/debug.S:163: Error: constant expression expected -- `ldrne r3,=(((UART1_BASE_ADDR)-AIPI_BASE_ADDR)+AIPI_BASE_ADDR_VIRT)' when compiling for mx2 with CONFIG_DEBUG_LL=y. A similar error exists on mx3 and is fixed by this commit, too. These were introduced by `aae7019382`. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2010-01-26 18:55:52 +01:00
Baruch Siach	1c57402374	mx25: make the FEC AHB clk secondary of the IPG This makes the FEC clock configuration consistent with the UART one. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2010-01-26 18:54:41 +01:00
Baruch Siach	faed40665d	mx25: fix time accounting The gpt_clk rate function doesn't consider the PER divider. This causes a significant drift in time accounting. Fix this by introducing the correct rate calculation function. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2010-01-26 18:54:06 +01:00
Baruch Siach	828df43f13	mx25: properly initialize clocks This patch disables all unnecessary clock in mx25_clocks_init() to make a clean start, the same as is being done for the rest of the i.MX chips. This patch was tested on i.MX25 PDK. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2010-01-26 18:54:06 +01:00
Baruch Siach	fadc095622	mx25: remove unused mx25_clocks_init() argument The fref is needless on mx25 since the reference clock is fixed at 24MHz. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2010-01-26 18:54:06 +01:00
Sascha Hauer	4cd3f96cd4	i.MX25: implement secondary clocks for uarts and fec For uarts and fec need two clocks, implement it using the secondary clock field in struct clk. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2010-01-26 18:52:45 +01:00
Sascha Hauer	9611a9b6f6	i.MX25: Allow secondary clocks in DEFINE_CLOCK Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2010-01-26 18:52:40 +01:00
Takashi Iwai	d0d2c38e39	Merge remote branch 'alsa/devel' into topic/misc	2010-01-26 18:13:04 +01:00
Johannes Berg	56007a028c	mac80211: wait for beacon before enabling powersave Because DTIM information is required for powersave but is only conveyed in beacons, wait for a beacon before enabling powersave, and change the way the information is conveyed to the driver accordingly. mwl8k doesn't currently seem to implement PS but requires the DTIM period in a different way; after talking to Lennert we agreed to just have mwl8k do the parsing itself in the finalize_join work. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-01-26 11:53:21 -05:00
Johannes Berg	c21dbf9214	cfg80211: export cfg80211_find_ie This new function (previously a static function called just "find_ie" can be used to find a specific IE in a buffer of IEs. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-01-26 11:53:20 -05:00
Zhu Yi	3092ad0544	mac80211: fix NULL pointer dereference when ftrace is enabled I got below kernel oops when I try to bring down the network interface if ftrace is enabled. The root cause is drv_ampdu_action() is passed with a NULL ssn pointer in the BA session tear down case. We need to check and avoid dereferencing it in trace entry assignment. BUG: unable to handle kernel NULL pointer dereference Modules linked in: at (null) IP: [<f98fe02a>] ftrace_raw_event_drv_ampdu_action+0x10a/0x160 [mac80211] *pde = 00000000 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [...] Call Trace: [<f98fdf20>] ? ftrace_raw_event_drv_ampdu_action+0x0/0x160 [mac80211] [<f98dac4c>] ? __ieee80211_stop_rx_ba_session+0xfc/0x220 [mac80211] [<f98d97fb>] ? ieee80211_sta_tear_down_BA_sessions+0x3b/0x50 [mac80211] [<f98dc6f6>] ? ieee80211_set_disassoc+0xe6/0x230 [mac80211] [<f98dc6ac>] ? ieee80211_set_disassoc+0x9c/0x230 [mac80211] [<f98dcbb8>] ? ieee80211_mgd_deauth+0x158/0x170 [mac80211] [<f98e4bdb>] ? ieee80211_deauth+0x1b/0x20 [mac80211] [<f8987f49>] ? __cfg80211_mlme_deauth+0xe9/0x120 [cfg80211] [<f898b870>] ? __cfg80211_disconnect+0x170/0x1d0 [cfg80211] Cc: Johannes Berg <johannes@sipsolutions.net> Cc: stable@kernel.org Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-01-26 11:52:13 -05:00
Jaroslav Kysela	e763692578	ALSA: pcm_lib - return back hw_ptr_interrupt Clemens Ladisch noted for hw_ptr_removal in "cleanup & merge hw_ptr update functions" commit: "It is possible for the status/delay ioctls to be called when the sound card's pointer register alreay shows a position at the beginning of the new period, but immediately before the interrupt is actually executed. (This happens regularly on a SMP machine with mplayer.) When that happens, the code thinks that the position must be at least one period ahead of the current position and drops an entire buffer of data." Return back the hw_ptr_interrupt variable. The last interrupt pointer is always computed from the latest hw_ptr instead of tracking it separately (in this case all hw_ptr checks and modifications might influence also hw_ptr_interrupt and it is difficult to keep it consistent). Signed-off-by: Jaroslav Kysela <perex@perex.cz>	2010-01-26 17:50:50 +01:00
Patrick McHardy	e578756c35	netfilter: ctnetlink: fix expectation mask dump The protocol number is not initialized, so userspace can't interpret the layer 4 data properly. Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-01-26 17:04:02 +01:00
Thomas Gleixner	7b7422a566	clocksource: Prevent potential kgdb dead lock commit `0f8e8ef7` (clocksource: Simplify clocksource watchdog resume logic) introduced a potential kgdb dead lock. When the kernel is stopped by kgdb inside code which holds watchdog_lock then kgdb dead locks in clocksource_resume_watchdog(). clocksource_resume_watchdog() is called from kbdg via clocksource_touch_watchdog() to avoid that the clock source watchdog marks TSC unstable after the kernel has been stopped. Solve this by replacing spin_lock with a spin_trylock and just return in case the lock is held. Not resetting the watchdog might result in TSC becoming marked unstable, but that's an acceptable penalty for using kgdb. The timekeeping is anyway easily screwed up by kgdb when the system uses either jiffies or a clock source which wraps in short intervals (e.g. pm_timer wraps about every 4.6s), so we really do not have to worry about that occasional TSC marked unstable side effect. The second caller of clocksource_resume_watchdog() is clocksource_resume(). The trylock is safe here as well because the system is UP at this point, interrupts are disabled and nothing else can hold watchdog_lock(). Reported-by: Jason Wessel <jason.wessel@windriver.com> LKML-Reference: <1264480000-6997-4-git-send-email-jason.wessel@windriver.com> Cc: kgdb-bugreport@lists.sourceforge.net Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-01-26 14:53:16 +01:00
David S. Miller	b747caf365	ariadne: Fix build. References removed HAVE_MULTICAST. Reporeted-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-26 05:17:00 -08:00
Shan Wei	c92b544bd5	ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure The commit 0b5ccb2(title:ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery) has broken the saddr&&daddr member of nf_ct_frag6_queue when creating new queue. And then hash value generated by nf_hashfn() was not equal with that generated by fq_find(). So, a new received fragment can't be inserted to right queue. The patch fixes the bug with adding member of user to nf_ct_frag6_queue structure. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-26 05:13:27 -08:00
David S. Miller	6abce7711f	sparc64: Fix UP build. Can't reference irq_desc[].affinity when !SMP. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-26 04:16:49 -08:00
Chaithrika U S	e473b84742	ASoC: DaVinci: Fix stream restart error Sometimes after a suspend-resume cycle, the ALSA application restarts the stream when resume fails and McASP fails to work as the clock is not enabled. This patch corrects this bug. Testes on TI DA850/OMAP-L138 EVM. Signed-off-by: Chaithrika U S <chaithrika@ti.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>	2010-01-26 11:55:54 +00:00
Wei Ni	ccc5df058d	ALSA: hda - Add support for more the 8 streams In azx_stream_start() and azx_stream_stop(), it use azx_readb/azx_writeb to read/write SIE, it just enable/disable 8 streams. But according to the HDA spec, it support 30 streams, and the new HDA controller will support more then 8 streams. So we should use azx_readl/azx_writel to read/write SIE. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2010-01-26 10:40:03 +01:00
Ben Dooks	159a3ddd6c	ARM: Merge next-smdk6410-defconfig Merge branch 'next-smdk6410-defconfig' into next-samsung	2010-01-26 18:21:40 +09:00
Ben Dooks	50ee2d35a5	ARM: SAMSUNG: Add error printing to s3c24xx_register_clocks Add an error print to s3c24xx_register_clocks to provide more useful information when failing to register the clock. I belive this was originally left out due to the possibility of a problem with low-level debugging code. However, if the low-level debug code is not functional by now there will be a whole other set of problems being presented to the system. Signed-off-by: Ben Dooks <ben-linux@fluff.org>	2010-01-26 17:44:33 +09:00
Ben Dooks	8428d47a36	ARM: SAMSUNG: Add documentation to the clock registration calls. Add some kerneldoc documentation to the s3c24xx_register_clock and the s3c24xx_register_clocks() call. Signed-off-by: Ben Dooks <ben-linux@fluff.org>	2010-01-26 17:44:32 +09:00
Florian Zumbiehl	cf944ee55c	ALSA: cs46xx: Fix cpu idling with resume Make sure that capture DMA doesn't stay enabled after system resume as that potentially prevents the processor from entering deep sleep states. Signed-off-by: Florian Zumbiehl <florz@florz.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2010-01-26 09:06:14 +01:00
Jesse Barnes	de3f440f8c	drm/i915: handle non-flip pending case when unpinning the scanout buffer The first page flip queued will replace the current front buffer, which should have a 0 pending flip count. So at finish time we need to handle that case (i.e. if the flip count is 0 or dec_and_test is 0 we need to wake the waiters). Also fix up an error path in the queue function and add some debug output (only enabled with driver debugging). Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Eric Anholt <eric@anholt.net>	2010-01-25 22:01:12 -08:00
Dave Chinner	7d6a7bde52	xfs: Use delay write promotion for dquot flushing xfs_qm_dqflock_pushbuf_wait() does a very similar trick to item pushing used to do to flush out delayed write dquot buffers. Change it to use the new promotion method rather than an async flush. Also, xfs_qm_dqflock_pushbuf_wait() can return without the flush lock held, yet the callers make the assumption that after this call the flush lock is held. Always return with the flush lock held. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-01-26 15:13:41 +11:00
Dave Chinner	089716aa14	xfs: Sort delayed write buffers before dispatch Currently when the xfsbufd writes delayed write buffers, it pushes them to disk in the order they come off the delayed write list. If there are lots of buffers ѕpread widely over the disk, this results in overwhelming the elevator sort queues in the block layer and we end up losing the posibility of merging adjacent buffers to minimise the number of IOs. Use the new generic list_sort function to sort the delwri dispatch queue before issue to ensure that the buffers are pushed in the most friendly order possible to the lower layers. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-01-26 15:13:25 +11:00
Dave Chinner	d808f617ad	xfs: Don't issue buffer IO direct from AIL push V2 All buffers logged into the AIL are marked as delayed write. When the AIL needs to push the buffer out, it issues an async write of the buffer. This means that IO patterns are dependent on the order of buffers in the AIL. Instead of flushing the buffer, promote the buffer in the delayed write list so that the next time the xfsbufd is run the buffer will be flushed by the xfsbufd. Return the state to the xfsaild that the buffer was promoted so that the xfsaild knows that it needs to cause the xfsbufd to run to flush the buffers that were promoted. Using the xfsbufd for issuing the IO allows us to dispatch all buffer IO from the one queue. This means that we can make much more enlightened decisions on what order to flush buffers to disk as we don't have multiple places issuing IO. Optimisations to xfsbufd will be in a future patch. Version 2 - kill XFS_ITEM_FLUSHING as it is now unused. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-02-02 10:13:42 +11:00
Dave Chinner	c854363e80	xfs: Use delayed write for inodes rather than async V2 We currently do background inode flush asynchronously, resulting in inodes being written in whatever order the background writeback issues them. Not only that, there are also blocking and non-blocking asynchronous inode flushes, depending on where the flush comes from. This patch completely removes asynchronous inode writeback. It removes all the strange writeback modes and replaces them with either a synchronous flush or a non-blocking delayed write flush. That is, inode flushes will only issue IO directly if they are synchronous, and background flushing may do nothing if the operation would block (e.g. on a pinned inode or buffer lock). Delayed write flushes will now result in the inode buffer sitting in the delwri queue of the buffer cache to be flushed by either an AIL push or by the xfsbufd timing out the buffer. This will allow accumulation of dirty inode buffers in memory and allow optimisation of inode cluster writeback at the xfsbufd level where we have much greater queue depths than the block layer elevators. We will also get adjacent inode cluster buffer IO merging for free when a later patch in the series allows sorting of the delayed write buffers before dispatch. This effectively means that any inode that is written back by background writeback will be seen as flush locked during AIL pushing, and will result in the buffers being pushed from there. This writeback path is currently non-optimal, but the next patch in the series will fix that problem. A side effect of this delayed write mechanism is that background inode reclaim will no longer directly flush inodes, nor can it wait on the flush lock. The result is that inode reclaim must leave the inode in the reclaimable state until it is clean. Hence attempts to reclaim a dirty inode in the background will simply skip the inode until it is clean and this allows other mechanisms (i.e. xfsbufd) to do more optimal writeback of the dirty buffers. As a result, the inode reclaim code has been rewritten so that it no longer relies on the ambiguous return values of xfs_iflush() to determine whether it is safe to reclaim an inode. Portions of this patch are derived from patches by Christoph Hellwig. Version 2: - cleanup reclaim code as suggested by Christoph - log background reclaim inode flush errors - just pass sync flags to xfs_iflush Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-02-06 12:39:36 +11:00
Dave Chinner	777df5afdb	xfs: Make inode reclaim states explicit A.K.A.: don't rely on xfs_iflush() return value in reclaim We have gradually been moving checks out of the reclaim code because they are duplicated in xfs_iflush(). We've had a history of problems in this area, and many of them stem from the overloading of the return values from xfs_iflush() and interaction with inode flush locking to determine if the inode is safe to reclaim. With the desire to move to delayed write flushing of inodes and non-blocking inode tree reclaim walks, the overloading of the return value of xfs_iflush makes it very difficult to determine the correct thing to do next. This patch explicitly re-adds the checks to the inode reclaim code, removing the reliance on the return value of xfs_iflush() to determine what to do next. It also means that we can clearly document all the inode states that reclaim must handle and hence we can easily see that we handled all the necessary cases. This also removes the need for the xfs_inode_clean() check in xfs_iflush() as all callers now check this first (safely). Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-02-06 12:37:26 +11:00
Eric Sandeen	d5db0f97fb	xfs: more reserved blocks fixups This mangles the reserved blocks counts a little more. 1) add a helper function for the default reserved count 2) add helper functions to save/restore counts on ro/rw 3) save/restore reserved blocks on freeze/thaw 4) disallow changing reserved count while readonly V2: changed field name to match Dave's changes Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-02-08 17:41:48 -06:00
Dave Chinner	388f1f0c34	xfs: turn off sign warnings Because they cause warnings in static inline functions conditionally compiled into XFS from the VFS (e.g. fsnotify). Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-01-26 15:10:15 +11:00
Dave Chinner	cbe132a8bd	xfs: don't hold onto reserved blocks on remount,ro If we hold onto reserved blocks when doing a remount,ro we end up writing the blocks used count to disk that includes the reserved blocks. Reserved blocks are not actually used, so this results in the values in the superblock being incorrect. Hence if we run xfs_check or xfs_repair -n while the filesystem is mounted remount,ro we end up with an inconsistent filesystem being reported. Also, running xfs_copy on the remount,ro filesystem will result in an inconsistent image being generated. To fix this, unreserve the blocks when doing the remount,ro, and reserved them again on remount,rw. This way a remount,ro filesystem will appear consistent on disk to all utilities. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-01-26 15:08:49 +11:00

... 145 146 147 148 149 ...

189053 Commits