Commit Graph

17344 Commits

Author SHA1 Message Date
Sage Weil
ee7fdfaff7 ceph: include preferred osd in placement seed
Mix the preferred osd (if any) into the placement seed that is fed into
the CRUSH object placement calculation.  This prevents all the placement
pgs from peering with the same osds.

Rev the osd client protocol with this change.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-19 11:42:41 -07:00
Eric Dumazet
c720c7e838 inet: rename some inet_sock fields
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfert fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 18:52:53 -07:00
Wei Yongjun
3de0ef4f20 inotify: fix coalesce duplicate events into a single event in special case
If we do rename a dir entry, like this:

  rename("/tmp/ino7UrgoJ.rename1", "/tmp/ino7UrgoJ.rename2")
  rename("/tmp/ino7UrgoJ.rename2", "/tmp/ino7UrgoJ")

The duplicate events should be coalesced into a single event. But those two
events do not be coalesced into a single event, due to some bad check in
event_compare(). It can not match the two NULL inodes as the same event.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2009-10-18 15:49:38 -04:00
Eric Paris
9f0d793b52 fsnotify: do not set group for a mark before it is on the i_list
fsnotify_add_mark is supposed to add a mark to the g_list and i_list and to
set the group and inode for the mark.  fsnotify_destroy_mark_by_entry uses
the fact that ->group != NULL to know if this group should be destroyed or
if it's already been done.

But fsnotify_add_mark sets the group and inode before it actually adds the
mark to the i_list and g_list.  This can result in a race in inotify, it
requires 3 threads.

sys_inotify_add_watch("file")	sys_inotify_add_watch("file")	sys_inotify_rm_watch([a])
inotify_update_watch()
inotify_new_watch()
inotify_add_to_idr()
   ^--- returns wd = [a]
				inotfiy_update_watch()
				inotify_new_watch()
				inotify_add_to_idr()
				fsnotify_add_mark()
				   ^--- returns wd = [b]
				returns to userspace;
								inotify_idr_find([a])
								   ^--- gives us the pointer from task 1
fsnotify_add_mark()
   ^--- this is going to set the mark->group and mark->inode fields, but will
return -EEXIST because of the race with [b].
								fsnotify_destroy_mark()
								   ^--- since ->group != NULL we call back
									into inotify_freeing_mark() which calls
								inotify_remove_from_idr([a])

since fsnotify_add_mark() failed we call:
inotify_remove_from_idr([a])     <------WHOOPS it's not in the idr, this could
					have been any entry added later!

The fix is to make sure we don't set mark->group until we are sure the mark is
on the inode and fsnotify_add_mark will return success.

Signed-off-by: Eric Paris <eparis@redhat.com>
2009-10-18 15:49:38 -04:00
Sage Weil
8fa9765576 ceph: enable readahead
Initialized bdi->ra_pages to enable readahead.  Use 512KB default.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-16 14:44:43 -07:00
Sage Weil
76e3b390d4 ceph: move dirty caps code around
Cleanup only.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-15 18:14:44 -07:00
Sage Weil
8f3bc053c6 ceph: warn on allocation from msgpool with larger front_len
Pass the front_len we need when pulling a message off a msgpool,
and WARN if it is greater than the pool's size.  Then try to
allocate a new message (to continue without failing).

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-15 18:14:43 -07:00
Sage Weil
07bd10fb98 ceph: correct subscribe_ack msgpool payload size
Defined a struct for the SUBSCRIBE_ACK, and use that to size
the msgpool.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-15 18:14:42 -07:00
Sage Weil
afcdaea3f2 ceph: flush dirty caps via the cap_dirty list
Previously we were flushing dirty caps by passing an extra flag
when traversing the delayed caps list.  Besides being a bit ugly,
that can also miss caps that are dirty but didn't result in a
cap requeue: notably, mark_caps_dirty().

Separate the flushing into a separate helper, and traverse the
cap_dirty list.

This also brings i_dirty_item in line with i_dirty_caps: we are
on the list IFF caps != 0.  We carry an inode ref IFF
dirty_caps|flushing_caps != 0.

Lose the unused return value from __ceph_mark_caps_dirty().

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-15 18:14:35 -07:00
Linus Torvalds
8e7768d3f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: fix socket fd translation
  dlm: fix lowcomms_connect_node for sctp
2009-10-15 15:21:20 -07:00
Linus Torvalds
dcbeb0bec5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: always pin metadata in discard mode
  Btrfs: enable discard support
  Btrfs: add -o discard option
  Btrfs: properly wait log writers during log sync
  Btrfs: fix possible ENOSPC problems with truncate
  Btrfs: fix btrfs acl #ifdef checks
  Btrfs: streamline tree-log btree block writeout
  Btrfs: avoid tree log commit when there are no changes
  Btrfs: only write one super copy during fsync
2009-10-15 15:06:37 -07:00
Neil Brown
83db93f4de sysfs: Allow sysfs_notify_dirent to be called from interrupt context.
sysfs_notify_dirent is a simple atomic operation that can be used to
alert user-space that new data can be read from a sysfs attribute.

Unfortunately it cannot currently be called from non-process context
because of its use of spin_lock which is sometimes taken with
interrupts enabled.

So change all lockers of sysfs_open_dirent_lock to disable interrupts,
thus making sysfs_notify_dirent safe to be called from non-process
context (as drivers/md does in md_safemode_timeout).

sysfs_get_open_dirent is (documented as being) only called from
process context, so it uses spin_lock_irq.  Other places
use spin_lock_irqsave.

The usage for sysfs_notify_dirent in md_safemode_timeout was
introduced in 2.6.28, so this patch is suitable for that and more
recent kernels.

Reported-by: Joel Andres Granados <jgranado@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-14 15:16:25 -07:00
Cornelia Huck
a6a8357788 sysfs: Allow sysfs_move_dir(..., NULL) again.
As device_move() and kobject_move() both handle a NULL destination,
sysfs_move_dir() should do this as well (again) and fall back to
sysfs_root in that case.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-14 15:16:25 -07:00
Sage Weil
cdc35f9627 ceph: move generic flushing code into helper
Both callers of __mark_caps_flushing() do the same work; move it
into the helper.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-14 14:43:56 -07:00
Frederic Weisbecker
27b3a5c51b kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0()
We had a watchdog in _get_block_create_0() that jumped to a fixup retry
path in case the bkl got relaxed while calling kmap().
This is not necessary anymore since we now have a reiserfs lock that is
not implicitly relaxed while sleeping.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-10-14 23:34:31 +02:00
Frederic Weisbecker
205cb37b89 kill-the-bkl/reiserfs: definitely drop the bkl from reiserfs_ioctl()
The reiserfs ioctl path doesn't need the big kernel lock anymore , now
that the filesystem synchronizes through its own lock.

We can then turn reiserfs_ioctl() into an unlocked_ioctl callback.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-10-14 23:28:12 +02:00
Frederic Weisbecker
ac78a07893 kill-the-bkl/reiserfs: always lock the ioctl path
Reiserfs uses the ioctl callback for its file operations, which means
that its ioctl path is still locked by the bkl, this was synchronizing
with the rest of the filsystem operations. We have changed that by
locking it with the new reiserfs lock but we do that only from the
compat_ioctl callback.

Fix that by locking reiserfs_ioctl() everytime.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-10-14 23:27:57 +02:00
Sage Weil
f2cf418cec ceph: initialize sb->s_bdi, bdi_unregister after kill_anon_super
Writeback doesn't work without the bdi set, and writeback on
umount doesn't work if we unregister the bdi too early.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-14 14:09:07 -07:00
Sage Weil
c89136ea42 ceph: convert encode/decode macros to inlines
This avoids the fugly pass by reference and makes the code a bit easier
to read.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-14 09:59:09 -07:00
Chris Mason
444528b3e6 Btrfs: always pin metadata in discard mode
We have an optimization in btrfs to allow blocks to be
immediately freed if they were allocated in this transaction and never
written.  Otherwise they are pinned and freed when the transaction
commits.

This isn't optimal for discard mode because immediately freeing
them means immediately discarding them.  It is better to give the
block to the pinning code and letting the (slow) discard happen later.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-14 10:32:50 -04:00
Christoph Hellwig
0634857488 Btrfs: enable discard support
The discard support code in btrfs currently is guarded by ifdefs for
BIO_RW_DISCARD, which is never defines as it's the name of an enum
memeber.  Just remove the useless ifdefs to actually enable the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-14 10:32:49 -04:00
Christoph Hellwig
e244a0aeb6 Btrfs: add -o discard option
Enable discard by default is not a good idea given the the trim speed
of SSD prototypes we've seen, and the carecteristics for many high-end
arrays.  Turn of discards by default and require the -o discard option
to enable them on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-14 10:32:49 -04:00
Yan, Zheng
86df7eb921 Btrfs: properly wait log writers during log sync
A recently fsync optimization make btrfs_sync_log skip calling
wait_for_writer in the single log writer case. This is incorrect
since the writer count can also be increased by btrfs_pin_log.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-14 10:32:48 -04:00
Josef Bacik
5d5e103a70 Btrfs: fix possible ENOSPC problems with truncate
There's a problem where we don't do any space reservation for truncates, which
can cause you to OOPs because you will be allowed to go off in the weeds a bit
since we don't account for the delalloc bytes that are created as a result of
the truncate.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-14 10:32:47 -04:00
Alex Elder
ba313e68fa Merge branch 'master' of ssh://oss.sgi.com/oss/git/xfs/xfs into for-linus 2009-10-13 15:47:22 -05:00
Sage Weil
535bbb5307 ceph: add version field to message header
This makes it easier for individual message types to indicate
their particular encoding, and make future changes backward
compatible.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-13 12:55:26 -07:00
Christoph Hellwig
05277c75f6 xfs: fix double IRELE in xfs_dqrele_inode
xfs_dqrele_inode calls xfs_iput to release the ilock and a reference
and then also calls IRELE which does a second decrement of the reference
count.  This leads to a premature freeing of inodes when quotas were turned
off while the filesystem was mounted.

Thanks to Utako Kusaka for reporting the bug and provinding a good testcase.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Utako Kusaka <u-kusaka@wm.jp.nec.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-13 13:16:36 -05:00
Chris Mason
0eda294dfc Btrfs: fix btrfs acl #ifdef checks
The btrfs acl code was #ifdefing for a define
that didn't exist.  This correctly matches it
to the values used by the Kconfig file.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-13 13:51:39 -04:00
Chris Mason
690587d109 Btrfs: streamline tree-log btree block writeout
Syncing the tree log is a 3 phase operation.

1) write and wait for all the tree log blocks for a given root.

2) write and wait for all the tree log blocks for the
tree of tree log roots.

3) write and wait for the super blocks (barriers here)

This isn't as efficient as it could be because there is
no requirement to wait for the blocks from step one to hit the disk
before we start writing the blocks from step two.  This commit
changes the sequence so that we don't start waiting until
all the tree blocks from both steps one and two have been sent
to disk.

We do this by breaking up btrfs_write_wait_marked_extents into
two functions, which is trivial because it was already broken
up into two parts.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-13 13:35:12 -04:00
Chris Mason
257c62e1bc Btrfs: avoid tree log commit when there are no changes
rpm has a habit of running fdatasync when the file hasn't
changed.  We already detect if a file hasn't been changed
in the current transaction but it might have been sent to
the tree-log in this transaction and not changed since
the last call to fsync.

In this case, we want to avoid a tree log sync, which includes
a number of synchronous writes and barriers.  This commit
extends the existing tracking of the last transaction to change
a file to also track the last sub-transaction.

The end result is that rpm -ivh and -Uvh are roughly twice as fast,
and on par with ext3.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-13 13:35:12 -04:00
Chris Mason
4722607db6 Btrfs: only write one super copy during fsync
During a tree-log commit for fsync, we've been writing at least
two copies of the super block and forcing them to disk.

The other filesystems write only one, and this change brings us on
par with them.  A full transaction commit will write all the super
copies, so we still have redundant info written on a regular
basis.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-13 13:35:11 -04:00
Linus Torvalds
80f506918f Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cciss: Add cciss_allow_hpsa module parameter
  cciss: Fix multiple calls to pci_release_regions
  blk-settings: fix function parameter kernel-doc notation
  writeback: kill space in debugfs item name
  writeback: account IO throttling wait as iowait
  elv_iosched_store(): fix strstrip() misuse
  cfq-iosched: avoid probable slice overrun when idling
  cfq-iosched: apply bool value where we return 0/1
  cfq-iosched: fix think time allowed for seekers
  cfq-iosched: fix the slice residual sign
  cfq-iosched: abstract out the 'may this cfqq dispatch' logic
  block: use proper BLK_RW_ASYNC in blk_queue_start_tag()
  block: Seperate read and write statistics of in_flight requests v2
  block: get rid of kblock_schedule_delayed_work()
  cfq-iosched: fix possible problem with jiffies wraparound
  cfq-iosched: fix issue with rq-rq merging and fifo list ordering
2009-10-13 10:21:33 -07:00
Theodore Ts'o
96ec2e0a71 ext3: Don't update superblock write time when filesystem is read-only
This avoids updating the superblock write time when we are mounting
the root file system read/only but we need to replay the journal; at
that point, for people who are east of GMT and who make their clock
tick in localtime for Windows bug-for-bug compatibility, and this will
cause e2fsck to complain and force a full file system check.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-10-13 00:06:43 +02:00
Sage Weil
572033069d ceph: remove unused CEPH_MSG_{OSD,MDS}_GETMAP
Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-12 10:29:44 -07:00
Sage Weil
8fc57da4d3 ceph: ignore trailing data in monamp
This lets us extend the format more easily.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-12 10:29:24 -07:00
Stefan Richter
a1be9eee29 NFS: suppress a build warning
struct sockaddr_storage * can safely be used as struct sockaddr *.
Suppress an "incompatible pointer type" warning.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-12 10:25:12 -07:00
Tetsuo Handa
a27ab9f26b LSM: Pass original mount flags to security_sb_mount().
This patch allows LSM modules to determine based on original mount flags
passed to mount(). A LSM module can get masked mount flags (if needed) by

	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
		   MS_STRICTATIME);

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>
2009-10-12 10:56:03 +11:00
Tetsuo Handa
8b8efb4403 LSM: Add security_path_chroot().
This patch allows pathname based LSM modules to check chroot() operations.

This hook is used by TOMOYO.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>
2009-10-12 10:56:02 +11:00
Tetsuo Handa
89eda06837 LSM: Add security_path_chmod() and security_path_chown().
This patch allows pathname based LSM modules to check chmod()/chown()
operations. Since notify_change() does not receive "struct vfsmount *",
we add security_path_chmod() and security_path_chown() to the caller of
notify_change().

These hooks are used by TOMOYO.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>
2009-10-12 10:56:00 +11:00
Bernd Schmidt
ef1f7a7e87 ROMFS: fix length used with romfs_dev_strnlen() function
An interestingly corrupted romfs file system exposed a problem with the
romfs_dev_strnlen function: it's passing the wrong value to its helpers.
Rather than limit the string to the length passed in by the callers, it
uses the size of the device as the limit.

Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-11 11:33:56 -07:00
Linus Torvalds
474a503d4b Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: fix file clone ioctl for bookend extents
  Btrfs: fix uninit compiler warning in cow_file_range_nocow
  Btrfs: constify dentry_operations
  Btrfs: optimize back reference update during btrfs_drop_snapshot
  Btrfs: remove negative dentry when deleting subvolumne
  Btrfs: optimize fsync for the single writer case
  Btrfs: async delalloc flushing under space pressure
  Btrfs: release delalloc reservations on extent item insertion
  Btrfs: delay clearing EXTENT_DELALLOC for compressed extents
  Btrfs: cleanup extent_clear_unlock_delalloc flags
  Btrfs: fix possible softlockup in the allocator
  Btrfs: fix deadlock on async thread startup
2009-10-11 11:23:13 -07:00
Alexey Dobriyan
d43c36dc6b headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-10-11 11:20:58 -07:00
Sage Weil
752727a1b2 ceph: add file layout validation
This tracks updates to code shared with userspace.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-09 16:39:30 -07:00
Sage Weil
13e38c8ae7 ceph: update to mon client protocol v15
The mon request headers now include session_mon information that must
be properly initialized.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-09 16:39:27 -07:00
Linus Torvalds
4047df09a1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  ima: ecryptfs fix imbalance message
  eCryptfs: Remove Kconfig NET dependency and select MD5
  ecryptfs: depends on CRYPTO
2009-10-09 13:30:14 -07:00
Linus Torvalds
a372bf8b6a Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: stop calling filemap_fdatawait inside ->fsync
  fix readahead calculations in xfs_dir2_leaf_getdents()
  xfs: make sure xfs_sync_fsdata covers the log
  xfs: mark inodes dirty before issuing I/O
  xfs: cleanup ->sync_fs
  xfs: fix xfs_quiesce_data
  xfs: implement ->dirty_inode to fix timestamp handling
2009-10-09 13:29:42 -07:00
Sage Weil
266673db42 ceph: cancel osd requests before resending them
This ensures we don't submit the same request twice if we are kicking a
specific osd (as with an osd_reset), or when we hit a transient error and
resend.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-09 11:58:20 -07:00
Sage Weil
81b024e70f ceph: reset osd session on fault, not peer_reset
The peer_reset just takes longer (until we reconnect and discover the osd
dropped the session... which it will).

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-09 11:58:15 -07:00
Sage Weil
991abb6ecf ceph: fail gracefully on corrupt osdmap (bad pg_temp mapping)
Return an error and report a corrupt map instead of crying BUG().

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-09 11:58:11 -07:00
Sage Weil
0ba6478df7 ceph: revoke osd request message on request completion
If an osd has failed or returned and a request has been sent twice, it's
possible to get a reply and unregister the request while the request
message is queued for delivery.  Since the message references the caller's
page vector, we need to revoke it before completing.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-09 11:58:07 -07:00