Commit Graph

18327 Commits

Author SHA1 Message Date
Sage Weil
30dc6381bb ceph: fix error paths for corrupt osdmap messages
Both osdmap_decode() and osdmap_apply_incremental() should never return
NULL.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:59 -08:00
Sage Weil
5de7bf8afa ceph: do not drop lease during revalidate
We need to hold session s_mutex for __ceph_mdsc_drop_dentry_lease(), which
we don't, so skip it.  It was purely an optimization.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:58 -08:00
Sage Weil
c4a29f26d5 ceph: ensure rename target dentry fails revalidation
This works around a bug in vfs_rename_dir() that rehashes the target
dentry.  Ensure such dentries always fail revalidation by timing out the
dentry lease and kicking it out of the current directory lease gen.

This can be reverted when the vfs bug is fixed.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:57 -08:00
Yehuda Sadeh
2baba25019 ceph: writeback congestion control
Set bdi congestion bit when amount of write data in flight exceeds adjustable
threshold.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:56 -08:00
Yehuda Sadeh
dbd646a851 ceph: writepage grabs and releases inode
Fixes a deadlock that is triggered due to kswapd,
while the page was locked and the iput couldn't tear
down the address space.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2009-12-21 16:39:56 -08:00
Yehuda Sadeh
169e16ce81 ceph: remove unaccessible code
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2009-12-21 16:39:55 -08:00
Sage Weil
06edf046dd ceph: include link to bdi in debugfs
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:54 -08:00
Sage Weil
e2885f06ce ceph: make mds ops interruptible
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:53 -08:00
Sage Weil
cf3e5c409b ceph: plug leak of incoming message during connection fault/close
If we explicitly close a connection, or there is a socket error, we need
to drop any partially received message.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:53 -08:00
Sage Weil
9ec7cab14e ceph: hex dump corrupt server data to KERN_DEBUG
Also, print fsid using standard format, NOT hex dump.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:52 -08:00
Yehuda Sadeh
93c20d98c2 ceph: fix msgpool reservation leak
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2009-12-21 16:39:51 -08:00
Sage Weil
b3d1dbbdd5 ceph: don't save sent messages on lossy connections
For lossy connections we drop all state on socket errors, so there is no
reason to keep sent ceph_msg's around.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:50 -08:00
Sage Weil
92ac41d0a4 ceph: detect lossy state of connection
The server indicates whether a connection is lossy; set our LOSSYTX bit
appropriately.  Do not set lossy bit on outgoing connections.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:49 -08:00
Sage Weil
5e095e8b40 ceph: plug msg leak in con_fault
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:49 -08:00
Sage Weil
c86a2930cc ceph: carry explicit msg reference for currently sending message
Carry a ceph_msg reference for connection->out_msg.  This will allow us to
make out_sent optional.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:38 -08:00
J. Bruce Fields
3d354cbc43 nfsd: fix "insecure" export option
A typo in 12045a6ee9 "nfsd: let "insecure" flag vary by
pseudoflavor" reversed the sense of the "insecure" flag.

Reported-by: Michael Guntsche <mike@it-loops.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-20 20:19:51 -08:00
J. Bruce Fields
f69ac2f5a3 nfsd: fix "insecure" export option
A typo in 12045a6ee9 "nfsd: let "insecure" flag vary by
pseudoflavor" reversed the sense of the "insecure" flag.

Reported-by: Michael Guntsche <mike@it-loops.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-20 10:22:58 -05:00
Linus Torvalds
aac3d39693 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
  sched: Fix broken assertion
  sched: Assert task state bits at build time
  sched: Update task_state_arraypwith new states
  sched: Add missing state chars to TASK_STATE_TO_CHAR_STR
  sched: Move TASK_STATE_TO_CHAR_STR near the TASK_state bits
  sched: Teach might_sleep() about preemptible RCU
  sched: Make warning less noisy
  sched: Simplify set_task_cpu()
  sched: Remove the cfs_rq dependency from set_task_cpu()
  sched: Add pre and post wakeup hooks
  sched: Move kthread_bind() back to kthread.c
  sched: Fix select_task_rq() vs hotplug issues
  sched: Fix sched_exec() balancing
  sched: Ensure set_task_cpu() is never called on blocked tasks
  sched: Use TASK_WAKING for fork wakups
  sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE
  sched: Fix task_hot() test order
  sched: Fix set_cpu_active() in cpu_down()
  sched: Mark boot-cpu active before smp_init()
  sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
  ...
2009-12-19 09:47:49 -08:00
Tao Ma
10cf1a02f4 ocfs2: Set i_nlink properly during reflink.
We create a file in orphan dir for reflink so that if there
is any error, we don't create any wrong dentry in the dir.
But actually the file in orphan dir should be i_nlink = 0
so that it can be replayed and freed successfully.

This patch first set i_nlink to 0 when creating the file in
orphan dir and then set it to 1(reflink now only works for
regular file) when we move it to the dest dir.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-18 13:32:28 -08:00
Tao Ma
c7d260afcb ocfs2: Add reflinked file's inode to inode hash eariler.
We used to add reflinked file's inode to inode hash when
we add it to the dest dir. But actually there is a race.
Consider the following sequence.
1. reflink happens and create the inode in orphan dir.
2. reflink thread is scheduled out because of some io.
3. recovery begins to work and calls ocfs2_recover_orphans.
   It calls ocfs2_iget and get a new inode and i_count = 1.
   It calls iput then and delete inode. the buffer's
   uptodate state is cleared.

This patch move insert_inode_hash to the create function so
that it can be found by step 3 and prevented from deleting
because i_count > 1.

This resolves the bug
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1183.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-18 13:32:20 -08:00
Tristan Ye
55f4946ed2 Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode?
Let userspace have a chance to get the extent info of a
directory just like extN did.

Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-17 21:21:32 -08:00
Sunil Mushran
faf8b70f79 ocfs2: Use FIEMAP_EXTENT_SHARED
Adds FIEMAP_EXTENT_SHARED flag to refcounted extents.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-17 20:55:59 -08:00
Coly Li
9365454016 ocfs2: replace u8 by __u8 in ocfs2_fs.h
This patch replaces date type 'u8' with '__u8', which follows the coding style of ocfs2_fs.h, and portable to user space
for ocfs2-tools.

Signed-off-by: Coly Li <coly.li@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-17 20:55:54 -08:00
Coly Li
3a05d7961e ocfs2: explicit declare uninitialized var in user_cluster_connect()
This patch explicitly declares an uninitialized local variable in user_cluster_connect(), to remove a compiling warning.

Signed-off-by: Coly Li <coly.li@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-17 20:55:52 -08:00
Linus Torvalds
7c508e50be Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: make sure fallocate properly starts a transaction
  Btrfs: make metadata chunks smaller
  Btrfs: Show discard option in /proc/mounts
  Btrfs: deny sys_link across subvolumes.
  Btrfs: fail mount on bad mount options
  Btrfs: don't add extent 0 to the free space cache v2
  Btrfs: Fix per root used space accounting
  Btrfs: Fix btrfs_drop_extent_cache for skip pinned case
  Btrfs: Add delayed iput
  Btrfs: Pass transaction handle to security and ACL initialization functions
  Btrfs: Make truncate(2) more ENOSPC friendly
  Btrfs: Make fallocate(2) more ENOSPC friendly
  Btrfs: Avoid orphan inodes cleanup during committing transaction
  Btrfs: Avoid orphan inodes cleanup while replaying log
  Btrfs: Fix disk_i_size update corner case
  Btrfs: Rewrite btrfs_drop_extents
  Btrfs: Add btrfs_duplicate_item
  Btrfs: Avoid superfluous tree-log writeout
2009-12-17 16:01:03 -08:00
Masami Hiramatsu
f6151dfea2 mm: introduce coredump parameter structure
Introduce coredump parameter data structure (struct coredump_params) to
simplify binfmt->core_dump() arguments.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Suggested-by: Ingo Molnar <mingo@elte.hu>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17 15:45:31 -08:00
Oleg Nesterov
9cd80bbb07 do_wait() optimization: do not place sub-threads on task_struct->children list
Thanks to Roland who pointed out de_thread() issues.

Currently we add sub-threads to ->real_parent->children list.  This buys
nothing but slows down do_wait().

With this patch ->children contains only main threads (group leaders).
The only complication is that forget_original_parent() should iterate over
sub-threads by hand, and de_thread() needs another list_replace() when it
changes ->group_leader.

Henceforth do_wait_thread() can never see task_detached() && !EXIT_DEAD
tasks, we can remove this check (and we can unify do_wait_thread() and
ptrace_do_wait()).

This change can confuse the optimistic search in mm_update_next_owner(),
but this is fixable and minor.

Perhaps badness() and oom_kill_process() should be updated, but they
should be fixed in any case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ratan Nalumasu <rnalumasu@gmail.com>
Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17 15:45:31 -08:00
Mike Frysinger
0f67b0b039 nommu: ramfs: remove unused local var
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17 15:45:31 -08:00
Jan Kara
ec8e2f7466 reiserfs: truncate blocks not used by a write
It can happen that write does not use all the blocks allocated in
write_begin either because of some filesystem error (like ENOSPC) or
because page with data to write has been removed from memory.  We truncate
these blocks so that we don't have dangling blocks beyond i_size.

Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17 15:45:30 -08:00
Linus Torvalds
b6e3224fb2 Revert "task_struct: make journal_info conditional"
This reverts commit e4c570c4cb, as
requested by Alexey:

 "I think I gave a good enough arguments to not merge it.
  To iterate:
   * patch makes impossible to start using ext3 on EXT3_FS=n kernels
     without reboot.
   * this is done only for one pointer on task_struct"

  None of config options which define task_struct are tristate directly
  or effectively."

Requested-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17 13:23:24 -08:00
Chris Mason
7a5d24b106 Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable into for-linus 2009-12-17 16:01:41 -05:00
Linus Torvalds
a2770d86b3 Revert "fix mismerge with Trond's stuff (create_mnt_ns() export is gone now)"
This reverts commit e9496ff46a. Quoth Al:

 "it's dependent on a lot of other stuff not currently in mainline
  and badly broken with current fs/namespace.c.  Sorry, badly
  out-of-order cherry-pick from old queue.

  PS: there's a large pending series reworking the refcounting and
  lifetime rules for vfsmounts that will, among other things, allow to
  rip a subtree away _without_ dissolving connections in it, to be
  garbage-collected when all active references are gone.  It's
  considerably saner wrt "is the subtree busy" logics, but it's nowhere
  near being ready for merge at the moment; this changeset is one of the
  things becoming possible with that sucker, but it certainly shouldn't
  have been picked during this cycle.  My apologies..."

Noticed-by: Eric Paris <eparis@redhat.com>
Requested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17 12:51:05 -08:00
Chris Mason
3a1abec9f6 Btrfs: make sure fallocate properly starts a transaction
The recent patch to make fallocate enospc friendly would send
down a NULL trans handle to the allocator.  This moves the
transaction start to properly fix things.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 15:47:17 -05:00
Chris Mason
ebfee3d71d Merge branch btrfs-master into for-linus
Conflicts:
	fs/btrfs/acl.c
2009-12-17 15:02:22 -05:00
Josef Bacik
83d3c9696f Btrfs: make metadata chunks smaller
This patch makes us a bit less zealous about making sure we have enough free
metadata space by pearing down the size of new metadata chunks to 256mb instead
of 1gb.  Also, we used to try an allocate metadata chunks when allocating data,
but that sort of thing is done elsewhere now so we can just remove it.  With my
-ENOSPC test I used to have 3gb reserved for metadata out of 75gb, now I have
1.7gb.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:38 -05:00
Matthew Wilcox
20a5239a5d Btrfs: Show discard option in /proc/mounts
Christoph's patch e244a0aeb6 doesn't display
the discard option in /proc/mounts, leading to some confusion for me.
Here's the missing bit.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:37 -05:00
TARUISI Hiroaki
4a8be425a8 Btrfs: deny sys_link across subvolumes.
I rebased Christian Parpart's patch to deny hard link across
subvolumes. Original patch modifies also btrfs_rename, but
I excluded it because we can move across subvolumes now and
it make no problem.
-----------------

Hard link across subvolumes should not allowed in Btrfs.
btrfs_link checks root of 'to' directory is same as root
of 'from' file. If not same, btrfs_link returns -EPERM.

Signed-off-by: TARUISI Hiroaki <taruishi.hiroak@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:37 -05:00
Sage Weil
a7a3f7cadd Btrfs: fail mount on bad mount options
We shouldn't silently ignore unrecognized options.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:36 -05:00
Yan, Zheng
06b2331f83 Btrfs: don't add extent 0 to the free space cache v2
If block group 0 is completely free, btrfs_read_block_groups will
add extent [0, BTRFS_SUPER_INFO_OFFSET) to the free space cache.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:36 -05:00
Yan, Zheng
86b9f2eca5 Btrfs: Fix per root used space accounting
The bytes_used field in root item was originally planned to
trace the amount of used data and tree blocks. But it never
worked right since we can't trace freeing of data accurately.
This patch changes it to only trace the amount of tree blocks.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:35 -05:00
Yan, Zheng
55ef689900 Btrfs: Fix btrfs_drop_extent_cache for skip pinned case
The check for skip pinned case is wrong, it may breaks the
while loop too soon.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:35 -05:00
Yan, Zheng
24bbcf0442 Btrfs: Add delayed iput
iput() can trigger new transactions if we are dropping the
final reference, so calling it in btrfs_commit_transaction
may end up deadlock. This patch adds delayed iput to avoid
the issue.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:35 -05:00
Yan, Zheng
f34f57a3ab Btrfs: Pass transaction handle to security and ACL initialization functions
Pass transaction handle down to security and ACL initialization
functions, so we can avoid starting nested transactions

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:34 -05:00
Yan, Zheng
8082510e71 Btrfs: Make truncate(2) more ENOSPC friendly
truncating and deleting regular files are unbound operations,
so it's not good to do them in a single transaction. This
patch makes btrfs_truncate and btrfs_delete_inode start a
new transaction after all items in a tree leaf are deleted.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:34 -05:00
Yan, Zheng
5a303d5d4b Btrfs: Make fallocate(2) more ENOSPC friendly
fallocate(2) may allocate large number of file extents, so it's not
good to do it in a single transaction. This patch make fallocate(2)
start a new transaction for each file extents it allocates.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:33 -05:00
Yan, Zheng
2e4bfab970 Btrfs: Avoid orphan inodes cleanup during committing transaction
btrfs_lookup_dentry may trigger orphan cleanup, so it's not good
to call it while committing a transaction.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:33 -05:00
Yan, Zheng
c71bf099ab Btrfs: Avoid orphan inodes cleanup while replaying log
We do log replay in a single transaction, so it's not good to do unbound
operations. This patch cleans up orphan inodes cleanup after replaying
the log. It also avoids doing other unbound operations such as truncating
a file during replaying log. These unbound operations are postponed to
the orphan inode cleanup stage.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:33 -05:00
Yan, Zheng
c216775458 Btrfs: Fix disk_i_size update corner case
There are some cases file extents are inserted without involving
ordered struct. In these cases, we update disk_i_size directly,
without checking pending ordered extent and DELALLOC bit. This
patch extends btrfs_ordered_update_i_size() to handle these cases.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17 12:33:24 -05:00
Christoph Hellwig
eaff8079d4 kill I_LOCK
After I_SYNC was split from I_LOCK the leftover is always used together with
I_NEW and thus superflous.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-17 11:03:25 -05:00
Christoph Hellwig
7a0ad10c36 fold do_sync_file_range into sys_sync_file_range
We recently go rid of all callers of do_sync_file_range as they're better
served with vfs_fsync or the filemap_write_and_wait.  Now that
do_sync_file_range is down to a single caller fold it into it so that people
don't start using it again accidentally.  While at it also switch it from
using __filemap_fdatawrite_range(..., WB_SYNC_ALL) to the more clear
filemap_fdatawrite_range().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-17 11:03:25 -05:00