Commit Graph

18327 Commits

Author SHA1 Message Date
Aneesh Kumar K.V
515f41c33a ext4: Ensure zeroout blocks have no dirty metadata
This fixes a bug (found by Curt Wohlgemuth) in which new blocks
returned from an extent created with ext4_ext_zeroout() can have dirty
metadata still associated with them.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-29 23:39:06 -05:00
Frederic Weisbecker
98ea3f50bc reiserfs: Fix remaining in-reclaim-fs <-> reclaim-fs-on locking inversion
Commit 500f5a0bf5
(reiserfs: Fix possible recursive lock) fixed a vmalloc under reiserfs
lock that triggered a lockdep warning because of a
IN-FS-RECLAIM <-> RECLAIM-FS-ON locking dependency inversion.

But this patch has ommitted another vmalloc call in the same path
that allocates the journal. Relax the lock for this one too.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2009-12-29 22:34:59 +01:00
Richard Kennedy
2faf2e19dd ext4: return correct wbc.nr_to_write in ext4_da_writepages
When ext4_da_writepages increases the nr_to_write in writeback_control
then it must always re-base the return value.  Originally there was a
(misguided) attempt prevent wbc.nr_to_write from going negative.  In
fact, it's necessary to allow nr_to_write to be negative so that
wb_writeback() can correctly calculate how many pages were actually
written.  

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-25 15:46:07 -05:00
Tobias Klauser
33e189bd57 nilfs2: Storage class should be before const qualifier
The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-12-25 13:01:50 +09:00
Jiro SEKIBA
5ee5814832 nilfs2: trivial coding style fix
This is a trivial style fix patch to mend errors/warnings
reported by "checkpatch.pl --file".

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-12-25 13:01:50 +09:00
Linus Torvalds
45e62974fb Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c
  ocfs2/trivial: Use proper mask for 2 places in hearbeat.c
  Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink.
  Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode?
  ocfs2: Use FIEMAP_EXTENT_SHARED
  fiemap: Add new extent flag FIEMAP_EXTENT_SHARED
  ocfs2: replace u8 by __u8 in ocfs2_fs.h
  ocfs2: explicit declare uninitialized var in user_cluster_connect()
  ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock()
  ocfs2: return -EAGAIN instead of EAGAIN in dlm
  ocfs2/cluster: Make fence method configurable - v2
  ocfs2: Set MS_POSIXACL on remount
  ocfs2: Make acl use the default
  ocfs2: Always include ACL support
2009-12-24 12:59:11 -08:00
Tao Ma
8ff6af881d ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c
In ocfs2_value_metas_in_xattr_header, we should Use
le16_to_cpu for ocfs2_extent_list.l_next_free_rec.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-23 17:52:16 -08:00
Tao Ma
b31d308ddc ocfs2/trivial: Use proper mask for 2 places in hearbeat.c
I just noticed today that there are 2 places of "mlog(0,...)"
in  fs/ocfs2/cluster/heartbeat.c, but actually have no default
mask prefix in that file.
So change them to mlog(ML_HEARTBEAT,...).

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-23 17:52:13 -08:00
Tristan Ye
86239d59e2 Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink.
For fast symlink, it can be treated the same as inlined files since
the data extent we want to return of both case all were stored in
metadata block. For symlink, it can be simply treated the same as we
did for regular files.

Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-23 17:52:09 -08:00
Linus Torvalds
f793067eb9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  devtmpfs: unlock mutex in case of string allocation error
  Driver core: export platform_device_register_data as a GPL symbol
  driver core: Prevent reference to freed memory on error path
  Driver-core: Fix bogus 0 error return in device_add()
  Driver core: driver_attribute parameters can often be const*
  Driver core: bin_attribute parameters can often be const*
  Driver core: device_attribute parameters can often be const*
  Doc/stable rules: add new cherry-pick logic
  vfs: get_sb_single() - do not pass options twice
  devtmpfs: Convert dirlock to a mutex
2009-12-23 13:35:03 -08:00
Linus Torvalds
849254e9bb Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Set i_nlink properly during reflink.
  ocfs2: Add reflinked file's inode to inode hash eariler.
  ocfs2: refcounttree.c cleanup.
  ocfs2: Find proper end cpos for a leaf refcount block.
2009-12-23 13:33:07 -08:00
Sage Weil
93cea5bebf ceph: use ceph_pagelist for mds reconnect message; change encoding (protocol change)
Use the ceph_pagelist to encode the MDS reconnect message.  We change the
message encoding (protocol change!) at the same time to make our life
easier (we don't know how many snaprealms we have when we start encoding).

An empty message implies the session is closed/does not exist.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 12:21:51 -08:00
Sage Weil
58bb3b374b ceph: support ceph_pagelist for message payload
The ceph_pagelist is a simple list of whole pages, strung together via
their lru list_head.  It facilitates encoding to a "buffer" of unknown
size.  Allow its use in place of the ceph_msg page vector.

This will be used to fix the huge buffer preallocation woes of MDS
reconnection.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 12:12:31 -08:00
Phil Carmody
66ecb92be9 Driver core: bin_attribute parameters can often be const*
Many struct bin_attribute descriptors are purely read-only
structures, and there's no need to change them. Therefore
make the promise not to, which will let those descriptors
be put in a ro section.

Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23 11:23:43 -08:00
Kay Sievers
9329d1beae vfs: get_sb_single() - do not pass options twice
Filesystem code usually destroys the option buffer while
parsing it. This leads to errors when the same buffer is
passed twice. In case we fill a new superblock do not call
remount.

This is needed to quite a warning that the debugfs code
causes every boot.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23 11:23:43 -08:00
Sage Weil
04a419f908 ceph: add feature bits to connection handshake (protocol change)
Define supported and required feature set.  Fail connection if the server
requires features we do not support (TAG_FEATURES), or if the server does
not support features we require.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 09:30:21 -08:00
Sage Weil
6df058c025 ceph: include transaction id in ceph_msg_header (protocol change)
Many (most?) message types include a transaction id.  By including it in
the fixed size header, we always have it available even when we are unable
to allocate memory for the (larger, variable sized) message body.  This
will allow us to error out the appropriate request instead of (silently)
dropping the reply.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:22 -08:00
Sage Weil
0cf90ab5b0 ceph: more informative msgpool errors
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:21 -08:00
Sage Weil
350b1c32ea ceph: control access to page vector for incoming data
When we issue an OSD read, we specify a vector of pages that the data is to
be read into.  The request may be sent multiple times, to multiple OSDs, if
the osdmap changes, which means we can get more than one reply.

Only read data into the page vector if the reply is coming from the
OSD we last sent the request to.  Keep track of which connection is using
the vector by taking a reference.  If another connection was already
using the vector before and a new reply comes in on the right connection,
revoke the pages from the other connection.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:20 -08:00
Sage Weil
ec302645f4 ceph: use connection mutex to protect read and write stages
Use a single mutex (previously out_mutex) to protect both read and write
activity from concurrent ceph_con_* calls.  Drop the mutex when doing
callbacks to avoid nested locking (the callback may need to call something
like ceph_con_close).

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:19 -08:00
Sage Weil
529cfcc46f ceph: unregister canceled/timed out osd requests
Canceled or timed out osd requests were getting left in the request list
and never deallocated (until umount).  Unregister if they are canceled
(control-c) or time out.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:19 -08:00
Sage Weil
e0e3271074 ceph: only unregister registered bdi
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:18 -08:00
Sage Weil
5dacf09121 ceph: do not touch_caps while iterating over caps list
Avoid confusing iterate_session_caps(), flag the session while we are
iterating so that __touch_cap does not rearrange items on the list.

All other modifiers of session->s_caps do so under the protection of
s_mutex.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:14 -08:00
Andrew Morton
3ebfdf885a jbd2: don't use __GFP_NOFAIL in journal_init_common()
It triggers the warning in get_page_from_freelist(), and it isn't
appropriate to use __GFP_NOFAIL here anyway.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14843

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23 08:05:15 -05:00
Eric Sandeen
c8afb44682 ext4: flush delalloc blocks when space is low
Creating many small files in rapid succession on a small
filesystem can lead to spurious ENOSPC; on a 104MB filesystem:

for i in `seq 1 22500`; do
    echo -n > $SCRATCH_MNT/$i
    echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
done

leads to ENOSPC even though after a sync, 40% of the fs is free
again.

This is because we reserve worst-case metadata for delalloc writes,
and when data is allocated that worst-case reservation is not
usually needed.

When freespace is low, kicking off an async writeback will start
converting that worst-case space usage into something more realistic,
almost always freeing up space to continue.

This resolves the testcase for me, and survives all 4 generic
ENOSPC tests in xfstests.

We'll still need a hard synchronous sync to squeeze out the last bit,
but this fixes things up to a large degree.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23 07:58:12 -05:00
Eric Sandeen
17bd55d037 fs-writeback: Add helper function to start writeback if idle
ext4, at least, would like to start pushing on writeback if it starts
to get close to ENOSPC when reserving worst-case blocks for delalloc
writes.  Writing out delalloc data will convert those worst-case
predictions into usually smaller actual usage, freeing up space
before we hit ENOSPC based on this speculation.

Thanks to Jens for the suggestion for the helper function,
& the naming help.

I've made the helper return status on whether writeback was
started even though I don't plan to use it in the ext4 patch;
it seems like it would be potentially useful to test this
in some cases.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
2009-12-23 07:57:07 -05:00
Julia Lawall
d3533d72e7 ext4: Eliminate potential double free on error path
b_entry_name and buffer are initially NULL, are initialized within a loop
to the result of calling kmalloc, and are freed at the bottom of this loop.
The loop contains gotos to cleanup, which also frees b_entry_name and
buffer.  Some of these gotos are before the reinitializations of
b_entry_name and buffer.  To maintain the invariant that b_entry_name and
buffer are NULL at the top of the loop, and thus acceptable arguments to
kfree, these variables are now set to NULL after the kfrees.

This seems to be the simplest solution.  A more complicated solution
would be to introduce more labels in the error handling code at the end of
the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier E;
expression E1;
iterator I;
statement S;
@@

*kfree(E);
... when != E = E1
    when != I(E,...) S
    when != &E
*kfree(E);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23 07:52:31 -05:00
Andrew Morton
a6b43e3825 ext4: fix unsigned long long printk warning in super.c
sparc64 allmodconfig:

fs/ext4/super.c: In function `lifetime_write_kbytes_show':
fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4)
fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4)

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23 07:48:08 -05:00
Jan Kara
869835dfad quota: Improve checking of quota file header
When we are asked for vfsv0 quota format and the file is in vfsv1
format (or vice versa), refuse to use the quota file. Also return
with error when we don't like the header of quota file.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:13 +01:00
Yin Kangkai
765f836190 jbd: jbd-debug and jbd2-debug should be writable
jbd-debug and jbd2-debug is currently read-only (S_IRUGO), which is not
correct. Make it writable so that we can start debuging.

Signed-off-by: Yin Kangkai <kangkai.yin@intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:13 +01:00
Dmitry Monakhov
39bc680a81 ext4: fix sleep inside spinlock issue with quota and dealloc (#14739)
Unlock i_block_reservation_lock before vfs_dq_reserve_block().
This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=14739

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:12 +01:00
Dmitry Monakhov
d21cd8f163 ext4: Fix potential quota deadlock
We have to delay vfs_dq_claim_space() until allocation context destruction.
Currently we have following call-trace:
ext4_mb_new_blocks()
  /* task is already holding ac->alloc_semp */
 ->ext4_mb_mark_diskspace_used
    ->vfs_dq_claim_space()  /*  acquire dqptr_sem here. Possible deadlock */
 ->ext4_mb_release_context() /* drop ac->alloc_semp here */

Let's move quota claiming to ext4_da_update_reserve_space()

 =======================================================
 [ INFO: possible circular locking dependency detected ]
 2.6.32-rc7 #18
 -------------------------------------------------------
 write-truncate-/3465 is trying to acquire lock:
  (&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0

 but task is already holding lock:
  (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #3 (&meta_group_info[i]->alloc_sem){++++..}:
        [<c017d04b>] __lock_acquire+0xd7b/0x1260
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c0527191>] down_read+0x51/0x90
        [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
        [<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870
        [<c029c9d3>] ext4_free_blocks+0x73/0x130
        [<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0
        [<c02a8087>] ext4_truncate+0x187/0x5e0
        [<c01e0f7b>] vmtruncate+0x6b/0x70
        [<c022ec02>] inode_setattr+0x62/0x190
        [<c02a2d7a>] ext4_setattr+0x25a/0x370
        [<c022ee81>] notify_change+0x151/0x340
        [<c021349d>] do_truncate+0x6d/0xa0
        [<c0221034>] may_open+0x1d4/0x200
        [<c022412b>] do_filp_open+0x1eb/0x910
        [<c021244d>] do_sys_open+0x6d/0x140
        [<c021258e>] sys_open+0x2e/0x40
        [<c0103100>] sysenter_do_call+0x12/0x32

 -> #2 (&ei->i_data_sem){++++..}:
        [<c017d04b>] __lock_acquire+0xd7b/0x1260
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c0527191>] down_read+0x51/0x90
        [<c02a5787>] ext4_get_blocks+0x47/0x450
        [<c02a74c1>] ext4_getblk+0x61/0x1d0
        [<c02a7a7f>] ext4_bread+0x1f/0xa0
        [<c02bcddc>] ext4_quota_write+0x12c/0x310
        [<c0262d23>] qtree_write_dquot+0x93/0x120
        [<c0261708>] v2_write_dquot+0x28/0x30
        [<c025d3fb>] dquot_commit+0xab/0xf0
        [<c02be977>] ext4_write_dquot+0x77/0x90
        [<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50
        [<c025e321>] dquot_alloc_inode+0x101/0x180
        [<c029fec2>] ext4_new_inode+0x602/0xf00
        [<c02ad789>] ext4_create+0x89/0x150
        [<c0221ff2>] vfs_create+0xa2/0xc0
        [<c02246e7>] do_filp_open+0x7a7/0x910
        [<c021244d>] do_sys_open+0x6d/0x140
        [<c021258e>] sys_open+0x2e/0x40
        [<c0103100>] sysenter_do_call+0x12/0x32

 -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}:
        [<c017d04b>] __lock_acquire+0xd7b/0x1260
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c0526505>] mutex_lock_nested+0x65/0x2d0
        [<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0
        [<c02610af>] vfs_quota_on_path+0x5f/0x70
        [<c02bc812>] ext4_quota_on+0x112/0x190
        [<c026345a>] sys_quotactl+0x44a/0x8a0
        [<c0103100>] sysenter_do_call+0x12/0x32

 -> #0 (&s->s_dquot.dqptr_sem){++++..}:
        [<c017d361>] __lock_acquire+0x1091/0x1260
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c0527191>] down_read+0x51/0x90
        [<c025e73b>] dquot_claim_space+0x3b/0x1b0
        [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
        [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
        [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
        [<c02a5966>] ext4_get_blocks+0x226/0x450
        [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
        [<c02a6ed6>] ext4_da_writepages+0x506/0x790
        [<c01de272>] do_writepages+0x22/0x50
        [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
        [<c01d7b9b>] filemap_flush+0x2b/0x30
        [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
        [<c029e595>] ext4_release_file+0x75/0xb0
        [<c0216b59>] __fput+0xf9/0x210
        [<c0216c97>] fput+0x27/0x30
        [<c02122dc>] filp_close+0x4c/0x80
        [<c014510e>] put_files_struct+0x6e/0xd0
        [<c01451b7>] exit_files+0x47/0x60
        [<c0146a24>] do_exit+0x144/0x710
        [<c0147028>] do_group_exit+0x38/0xa0
        [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
        [<c0102849>] do_notify_resume+0xb9/0x890
        [<c01032d2>] work_notifysig+0x13/0x21

 other info that might help us debug this:

 3 locks held by write-truncate-/3465:
  #0:  (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0
  #1:  (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450
  #2:  (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370

 stack backtrace:
 Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18
 Call Trace:
  [<c0524cb3>] ? printk+0x1d/0x22
  [<c017ac9a>] print_circular_bug+0xca/0xd0
  [<c017d361>] __lock_acquire+0x1091/0x1260
  [<c016bca2>] ? sched_clock_local+0xd2/0x170
  [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
  [<c017d5ea>] lock_acquire+0xba/0xd0
  [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
  [<c0527191>] down_read+0x51/0x90
  [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
  [<c025e73b>] dquot_claim_space+0x3b/0x1b0
  [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
  [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
  [<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280
  [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
  [<c016bca2>] ? sched_clock_local+0xd2/0x170
  [<c016be60>] ? sched_clock_cpu+0x120/0x160
  [<c016beef>] ? cpu_clock+0x4f/0x60
  [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
  [<c052712c>] ? down_write+0x8c/0xa0
  [<c02a5966>] ext4_get_blocks+0x226/0x450
  [<c016be60>] ? sched_clock_cpu+0x120/0x160
  [<c016beef>] ? cpu_clock+0x4f/0x60
  [<c017908b>] ? trace_hardirqs_off+0xb/0x10
  [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
  [<c01d69cc>] ? find_get_pages_tag+0x16c/0x180
  [<c01d6860>] ? find_get_pages_tag+0x0/0x180
  [<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0
  [<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40
  [<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0
  [<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0
  [<c02a6ed6>] ext4_da_writepages+0x506/0x790
  [<c016beef>] ? cpu_clock+0x4f/0x60
  [<c016bca2>] ? sched_clock_local+0xd2/0x170
  [<c016be60>] ? sched_clock_cpu+0x120/0x160
  [<c016be60>] ? sched_clock_cpu+0x120/0x160
  [<c02a69d0>] ? ext4_da_writepages+0x0/0x790
  [<c01de272>] do_writepages+0x22/0x50
  [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
  [<c01d7b9b>] filemap_flush+0x2b/0x30
  [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
  [<c029e595>] ext4_release_file+0x75/0xb0
  [<c0216b59>] __fput+0xf9/0x210
  [<c0216c97>] fput+0x27/0x30
  [<c02122dc>] filp_close+0x4c/0x80
  [<c014510e>] put_files_struct+0x6e/0xd0
  [<c01451b7>] exit_files+0x47/0x60
  [<c0146a24>] do_exit+0x144/0x710
  [<c017b163>] ? lock_release_holdtime+0x33/0x210
  [<c0528137>] ? _spin_unlock_irq+0x27/0x30
  [<c0147028>] do_group_exit+0x38/0xa0
  [<c017babb>] ? trace_hardirqs_on+0xb/0x10
  [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
  [<c0102849>] do_notify_resume+0xb9/0x890
  [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
  [<c017b163>] ? lock_release_holdtime+0x33/0x210
  [<c0165b50>] ? autoremove_wake_function+0x0/0x50
  [<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190
  [<c017babb>] ? trace_hardirqs_on+0xb/0x10
  [<c0300ba4>] ? security_file_permission+0x14/0x20
  [<c0215761>] ? vfs_write+0x131/0x190
  [<c0214f50>] ? do_sync_write+0x0/0x120
  [<c0103115>] ? sysenter_do_call+0x27/0x32
  [<c01032d2>] work_notifysig+0x13/0x21

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:12 +01:00
Jan Kara
82fdfa928c quota: Fix 64-bit limits setting on 32-bit archs
Fix warnings:
fs/quota/quota_v2.c: In function ‘v2_read_file_info’:
fs/quota/quota_v2.c:123: warning: integer constant is too large for ‘long’ type
fs/quota/quota_v2.c:124: warning: integer constant is too large for ‘long’ type

Reported-by: Jerry Leo <jerryleo860202@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:12 +01:00
Eric Sandeen
96d2a495c2 ext3: Replace lock/unlock_super() with an explicit lock for resizing
Use a separate lock to protect s_groups_count and the other block
group descriptors which get changed via an on-line resize operation,
so we can stop overloading the use of lock_super().

Port of ext4 commit 32ed5058ce by
Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:12 +01:00
Eric Sandeen
b8a052d016 ext3: Replace lock/unlock_super() with an explicit lock for the orphan list
Use a separate lock to protect the orphan list, so we can stop
overloading the use of lock_super().

Port of ext4 commit 3b9d4ed266
by Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:11 +01:00
Eric Sandeen
4854a5f0cb ext3: ext3_mark_recovery_complete() doesn't need to use lock_super
The function ext3_mark_recovery_complete() is called from two call
paths: either (a) while mounting the filesystem, in which case there's
no danger of any other CPU calling write_super() until the mount is
completed, and (b) while remounting the filesystem read-write, in
which case the fs core has already locked the superblock.  This also
allows us to take out a very vile unlock_super()/lock_super() pair in
ext3_remount().

Port of ext4 commit a63c9eb2ce by
Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:11 +01:00
Eric Sandeen
ed505ee454 ext3: Remove outdated comment about lock_super()
ext3_fill_super() is no longer called by read_super(), and it is no
longer called with the superblock locked.  The
unlock_super()/lock_super() is no longer present, so this comment is
entirely superfluous.

Port of ext4 commit 32ed5058ce by
Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:43:50 +01:00
Dmitry Monakhov
dc52dd3a3a quota: Move duplicated code to separate functions
- for(..) { mark_dquot_dirty(); } -> mark_all_dquot_dirty()
- for(..) { dput(); }             -> dqput_all()

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:33:55 +01:00
Dmitry Monakhov
a9e7f44720 ext4: Convert to generic reserved quota's space management.
This patch also fixes write vs chown race condition.

Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:33:55 +01:00
Dmitry Monakhov
fd8fbfc170 quota: decouple fs reserved space from quota reservation
Currently inode_reservation is managed by fs itself and this
reservation is transfered on dquot_transfer(). This means what
inode_reservation must always be in sync with
dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result
in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be
triggered)
This is not easy because of complex locking order issues
for example http://bugzilla.kernel.org/show_bug.cgi?id=14739

The patch introduce quota reservation field for each fs-inode
(fs specific inode is used in order to prevent bloating generic
vfs inode). This reservation is managed by quota code internally
similar to i_blocks/i_bytes and may not be always in sync with
internal fs reservation.

Also perform some code rearrangement:
- Unify dquot_reserve_space() and dquot_reserve_space()
- Unify dquot_release_reserved_space() and dquot_free_space()
- Also this patch add missing warning update to release_rsv()
  dquot_release_reserved_space() must call flush_warnings() as
  dquot_free_space() does.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:33:54 +01:00
Dmitry Monakhov
b462707e7c Add unlocked version of inode_add_bytes() function
Quota code requires unlocked version of this function. Off course
we can just copy-paste the code, but copy-pasting is always an evil.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:33:54 +01:00
Dmitry Monakhov
c459001fa4 ext3: quota macros cleanup [V2]
Currently all quota block reservation macros contains hardcoded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:33:54 +01:00
Theodore Ts'o
cc3e1bea5d ext4, jbd2: Add barriers for file systems with exernal journals
This is a bit complicated because we are trying to optimize when we
send barriers to the fs data disk.  We could just throw in an extra
barrier to the data disk whenever we send a barrier to the journal
disk, but that's not always strictly necessary.

We only need to send a barrier during a commit when there are data
blocks which are must be written out due to an inode written in
ordered mode, or if fsync() depends on the commit to force data blocks
to disk.  Finally, before we drop transactions from the beginning of
the journal during a checkpoint operation, we need to guarantee that
any blocks that were flushed out to the data disk are firmly on the
rust platter before we drop the transaction from the journal.

Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23 06:52:08 -05:00
Alan Cox
28ba0ec64c jfs: Fix 32bit build warning
loff_t is a type that isn't entirely dependant upon 32 v 64bit choice

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-22 12:27:35 -05:00
Al Viro
5300990c03 Sanitize f_flags helpers
* pull ACC_MODE to fs.h; we have several copies all over the place
* nightmarish expression calculating f_mode by f_flags deserves a helper
too (OPEN_FMODE(flags))

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-22 12:27:34 -05:00
Al Viro
482928d59d Fix f_flags/f_mode in case of lookup_instantiate_filp() from open(pathname, 3)
Just set f_flags when shoving struct file into nameidata; don't
postpone that until __dentry_open().  do_filp_open() has correct
value; lookup_instantiate_filp() doesn't - we lose the difference
between O_RDWR and 3 by that point.

We still set .intent.open.flags, so no fs code needs to be changed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-22 12:27:34 -05:00
Roland Dreier
628ff7c1d8 anonfd: Allow making anon files read-only
It seems a couple places such as arch/ia64/kernel/perfmon.c and
drivers/infiniband/core/uverbs_main.c could use anon_inode_getfile()
instead of a private pseudo-fs + alloc_file(), if only there were a way
to get a read-only file.  So provide this by having anon_inode_getfile()
create a read-only file if we pass O_RDONLY in flags.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-22 12:27:34 -05:00
Arnd Bergmann
ed2617585f fs/compat_ioctl.c: fix build error when !BLOCK
No driver uses SG_SET_TRANSFORM any more in Linux, since the ide-scsi
driver was removed in 2.6.29. The compat-ioctl cleanup series moved
the handling for this around, which broke building without CONFIG_BLOCK.

Just remove the code handling it for compat mode.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-22 12:27:33 -05:00
Roland Dreier
385e3ed4f0 alloc_file(): simplify handling of mnt_clone_write() errors
When alloc_file() and init_file() were combined, the error handling of
mnt_clone_write() was taken into alloc_file() in a somewhat obfuscated
way.  Since we don't use the error code for anything except warning,
we might as well warn directly without an extra variable.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-22 12:27:33 -05:00
Sage Weil
7067f797b8 ceph: fix incremental osdmap pg_temp decoding bug
An incremental pg_temp wasn't being decoded properly (wrong bound on
for loop).

Also remove unused local variable, while we're at it.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:40:00 -08:00