Commit Graph

18327 Commits

Author SHA1 Message Date
Sage Weil
6a026589ba ceph: fix sync read eof check deadlock
If a sync read gets a short result from the OSD, it may need to do a
getattr to see if it is short due to reaching end-of-file.  The getattr
was being done while holding a reference to FILE_RD, which can lead to
a deadlock if the MDS is revoking that capability bit and can't process
the getattr until it does.

We fix this by setting a flag if EOF size validation is needed, and doing
the getattr in ceph_aio_read, after the RD cap ref is dropped.  If the
read needs to be continued, we loop and continue traversing the file.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:53 -08:00
Sage Weil
68c283236a ceph: do not retain caps that are being revoked
Never retain caps in __send_cap() that are being revoked.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:52 -08:00
Sage Weil
cbd0363591 ceph: cap revocation fixes
Try to invalidate pages in ceph_check_caps() if FILE_CACHE is being
revoked.  If we fail, queue an immediate async invalidate if FILE_CACHE
is being revoked.  (If it's not being revoked, we just queue the caps
for later evaluation later, as per the old behavior.)

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:52 -08:00
Yehuda Sadeh
29065a513a ceph: sync read/write considers page cache
In the cases where we either do a sync read or a write, we
need to make sure that everything in the page cache is flushed.
In the case of a sync write we invalidate the relevant pages,
so that subsequent read/write reflects the new data written.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:51 -08:00
Yehuda Sadeh
3d497d858a ceph: fix truncation when not holding caps
A truncation should occur when either we have the
specified caps for the file, or (in cases where we are
not the only ones referencing the file) when it is mapped
or when it is opened. The latter two cases were not
handled.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:51 -08:00
Yehuda Sadeh
4af6b2257e ceph: refactor ceph_write_begin, fix ceph_page_mkwrite
Originally ceph_page_mkwrite called ceph_write_begin, hoping that
the returned locked page would be the page that it was requested
to mkwrite. Factored out relevant part of ceph_page_mkwrite and
we lock the right page anyway.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:50 -08:00
Yehuda Sadeh
972f0d3ab1 ceph: fix short synchronous reads
Zeroing of holes was not done correctly: page_off was miscalculated and
zeroing the tail didn't not adjust the 'read' value to include the zeroed
portion.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:49 -08:00
Sage Weil
02f90c6109 ceph: add uid field to ceph_pg_pool
Also verify encoding version as we go.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:49 -08:00
Yehuda Sadeh
f5a2041bd9 ceph: put unused osd connections on lru
Instead of removing osd connection immediately when the
requests list is empty, put the osd connection on an lru.
Only if that osd has not been used for more than a specified
time, will it be removed.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:48 -08:00
Yehuda Sadeh
b056c8769d ceph: remove unused variable
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:48 -08:00
Sage Weil
ec0994e48e ceph: add support for auth_x authentication protocol
The auth_x protocol implements support for a kerberos-like mutual
authentication infrastructure used by Ceph.  We do not simply use vanilla
kerberos because of scalability and performance issues when dealing with
a large cluster of nodes providing a single logical service.

Auth_x provides mutual authentication of client and server and protects
against replay and man in the middle attacks.  It does not encrypt
the full session over the wire, however, so data payload may still be
snooped.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:45 -08:00
Sage Weil
07c8739c52 ceph: add struct version to auth encoding
Inlucde struct version in encoding. This will streamline future protocol
changes.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-10 15:04:48 -08:00
Sage Weil
9bd2e6f8ba ceph: allow renewal of auth credentials
Add infrastructure to allow the mon_client to periodically renew its auth
credentials.  Also add a messenger callback that will force such a renewal
if a peer rejects our authenticator.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-10 15:04:47 -08:00
Sage Weil
8b6e4f2d8b ceph: aes crypto and base64 encode/decode helpers
Helpers to encrypt/decrypt AES and base64.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-10 15:04:46 -08:00
Sage Weil
c7e337d649 ceph: buffer decoding helpers
Helper for decoding into a ceph_buffer, and other misc decoding helpers
we will need.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-10 15:04:39 -08:00
Li Zefan
66655de6d1 seq_file: Add helpers for iteration over a hlist
Some places in kernel need to iterate over a hlist in seq_file,
so provide some common helpers.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-10 11:12:06 -08:00
Arnd Bergmann
f79f118528 compat_ioctl: ignore RAID_VERSION ioctl
md ioctls are now handled by the md driver itself, but mdadm
may call RAID_VERSION on other devices as well. Mark the command
as IGNORE_IOCTL so this fails silently rather than printing
an annoying message.

Reported-by: "Michael S. Tsirkin" <m.s.tsirkin@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-10 07:36:16 -08:00
Linus Torvalds
5551638acb Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: fix dentry hash calculation for case-insensitive mounts
  [CIFS] Don't cache timestamps on utimes due to coarse granularity
  [CIFS] Maximum username length check in session setup does not match
  cifs: fix length calculation for converted unicode readdir names
  [CIFS] Add support for TCP_NODELAY
2010-02-10 07:16:44 -08:00
Kevin Dankwardt
eeb5b4ae81 fat: Fix stat->f_namelen
I found that the length of a file name when created cannot exceed 255
characters, yet, pathconf(), via statfs(), returns the maximum as 260.

Signed-off-by: Kevin Dankwardt <k@kcomputing.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2010-02-10 23:49:08 +09:00
Chuck Lever
f895c53f8a NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
For NFSv2 and v3:

O_DIRECT writes are always synchronous, and aren't cached, so nothing
should be flushed when closing an NFS O_DIRECT file descriptor.  Thus
there are no write errors to report on close(2).

In addition, there's no cached data to verify on the next open(2),
so we don't need clean GETATTR results at close time to compare with.

Thus, there's no need for the nfs_revalidate_inode() call when closing
an NFS O_DIRECT file.  This reduces the number of synchronous
on-the-wire requests for a simple open-write-close of an NFS O_DIRECT
file by roughly 20%.

For NFSv4:

Call nfs4_do_close() with wait set to zero when closing an NFS
O_DIRECT file.  The CLOSE will go on the wire, but the application
won't wait for it to complete.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:05 -05:00
Chuck Lever
7e381172cf NFS: Improve NFS iostat byte count accuracy for writes
The bytes counted by the performance counters for NFS writes should
reflect write and sync errors.  If the write(2) system call reports
an error, the bytes should not be counted.  And, if the write is
short, the actual number of bytes that was written should be counted,
not the number of bytes that was requested.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:04 -05:00
Chuck Lever
aa2f1ef10e NFS: Account for NFS bytes read via the splice API
Bytes read via the splice API should be accounted for in the NFS
performance statistics.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:03 -05:00
Chuck Lever
4184dcf2db NFS: Fix byte accounting for generic NFS reads
Currently, the NFS I/O counters count the number of bytes requested
by applications, rather than the number of bytes actually read by the
system calls.

The number of bytes requested for reads is actually not that useful,
because the value is usually a buffer size for reads.  That is, that
requested number is usually a maximum, and frequently doesn't reflect
the actual number of bytes read.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:03 -05:00
Chuck Lever
c2459dc462 NFS: Proper accounting for NFS VFS calls
Nit: The VFSOPEN and VFSFLUSH counters are function call counters.
Count every call to these routines.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:02 -05:00
Andy Adamson
9733f0d928 nfs41: cleanup callback code to use __be32 type
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:01 -05:00
Andy Adamson
41f54a5548 nfs41: clear NFS4CLNT_RECALL_SLOT bit on session reset
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:00 -05:00
Andy Adamson
bae0ac0ee1 nfs41: fix nfs4_callback_recallslot
Return NFS4_OK if target high slotid equals enforced high slotid.
Fix nfs_client reference leak.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:00 -05:00
Andy Adamson
104aeba484 nfs41: resize slot table in reset
When session is reset, client can renegotiate slot table size.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:59 -05:00
Andy Adamson
b9efa1b27e nfs41: implement cb_recall_slot
Drain the fore channel and reset the max_slots to the new value.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:59 -05:00
Andy Adamson
4911096f1a nfs41: back channel drc minimal implementation
For now the back channel ca_maxresponsesize_cached is 0 and there is no
backchannel DRC. Return NFS4ERR_REP_TOO_BIG_TO_CACHE when a cb_sequence
cachethis is true.  When it is false, return NFS4ERR_RETRY_UNCACHED_REP as the
next operation error.

Remember the replay error accross compound operation processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:58 -05:00
Andy Adamson
b2f28bd783 nfs41: prepare for back channel drc
Make all cb_sequence arguments available to verify_seqid which will make
replay decisions.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:58 -05:00
Andy Adamson
e95e60daee nfs41: remove uneeded checks in callback processing
All callback operations have arguments to decode and require processing.
The preprocess_nfs4X_op functions catch unsupported or illegal ops so
decode_args and process_op pointers are always non NULL.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:57 -05:00
Andy Adamson
b92b301900 nfs41: directly encode back channel error
Skip all other processing when error is encountered.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:56 -05:00
Andy Adamson
31d2b4356b nfs41: fix wrong error on callback header xdr overflow
Set NFS4ERR_RESOURCE as CB_COMPOUND status and do not return an op on
decode_op_hdr or encode_op_hdr buffer overflow.

NFS4ERR_RESOURCE is correct for v4.0. Will fix the return for v4.1 along with
all the other NFS4ERR_RESOURCE errors in a later patch.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:56 -05:00
Mike Sager
72ce2b3c06 nfs41: Process callback's referring call list
If a CB_SEQUENCE referring call triple matches a slot table entry, the
client is still waiting for a response to the original request.  In this
case, return NFS4ERR_DELAY as the response to the callback.

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:55 -05:00
Mike Sager
a7989c3e47 nfs41: Check slot table for referring calls
Traverse a list of referring calls and look for a session/slot/seq number
match.

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:55 -05:00
Mike Sager
8e0d46e138 nfs41: Adjust max cache response size value
For the CREATE_SESSION attribute ca_maxresponsesize_cached, calculate
the value based on the rpc reply header size plus the maximum nfs compound
reply size.

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:54 -05:00
Jeff Layton
97cefcc6d0 nfs: handle NFSv2 -EKEYEXPIRED returns from RPC layer appropriately
Add a wrapper around rpc_call_sync that handles -EKEYEXPIRED errors from
the RPC layer as it would an -EJUKEBOX error if NFSv2 had such a thing.
Also, add a handler for that error for async calls that makes it
resubmit the RPC on -EKEYEXPIRED.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:52 -05:00
Jeff Layton
b68d69b8c6 nfs: handle NFSv3 -EKEYEXPIRED errors as we would -EJUKEBOX
We're using -EKEYEXPIRED to indicate that a krb5 credcache contains an
expired ticket and that we should have the NFS layer retry the RPC call
instead of returning an error back to the caller. Handle this as we
would an -EJUKEBOX error return.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:51 -05:00
Jeff Layton
2c6434888c nfs4: handle -EKEYEXPIRED errors from RPC layer
If a KRB5 TGT ticket expires, we don't want to return an error
immediatel. If someone has a long running job and just forgets to run
"kinit" in time then this will make it fail.

Instead, we want to treat this situation as we would NFS4ERR_DELAY and
retry the upcall after delaying a bit with an exponential backoff.

This patch just makes any place that would handle NFS4ERR_DELAY also
handle -EKEYEXPIRED the same way. In the future, we may want to be more
sophisticated however and handle hard vs. soft mounts differently, or
specify some upper limit on how long we'll wait for a new TGT to be
acquired.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:50 -05:00
Trond Myklebust
fdcb45777a NFS: Fix the mapping of the NFSERR_SERVERFAULT error
It was recently pointed out that the NFSERR_SERVERFAULT error, which is
designed to inform the user of a serious internal error on the server, was
being mapped to an error value that is internal to the kernel.

This patch maps it to the error EREMOTEIO, which is exported to userland
through errno.h.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-02-09 14:29:29 -05:00
Trond Myklebust
7549ad5f9b NFS: Remove a redundant check for PageFsCache in nfs_migrate_page()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: David Howells <dhowells@redhat.com>
2010-02-09 14:29:21 -05:00
Trond Myklebust
2c1740098c NFS: Fix a bug in nfs_fscache_release_page()
Not having an fscache cookie is perfectly valid if the user didn't mount
with the fscache option.

This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=15234

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: stable@kernel.org
2010-02-09 14:29:10 -05:00
Linus Torvalds
3af9cf11b6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix p9_client_destroy unconditional calling v9fs_put_trans
  9p: fix memory leak in v9fs_parse_options()
  9p: Fix the kernel crash on a failed mount
  9p: fix option parsing
  9p: Include fsync support for 9p client
  net/9p: fix statsize inside twstat
  net/9p: fail when user specifies a transport which we can't find
  net/9p: fix virtio transport to correctly update status on connect
2010-02-09 11:19:06 -08:00
Jeremy Kerr
50ab2fe147 proc_devtree: include linux/of.h
Currenly, proc_devtree.c depends on asm/prom.h to include linux/of.h, to
provide some device-tree definitions (eg, struct property).

Instead, include linux/of.h directly. We still need asm/prom.h for
HAVE_ARCH_DEVTREE_FIXUPS.

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-02-09 08:34:10 -07:00
Jeremy Kerr
8cfb3343f7 of: make set_node_proc_entry private to proc_devtree.c
We only need set_node_proc_entry in proc_devtree.c, so move it there.

This fixes the !HAVE_ARCH_DEVTREE_FIXUPS build, as we can't make make
the definition in linux/of.h conditional on this #define (definitions in
asm/prom.h can't be exposed to linux/of.h, due to the enforced #include
ordering).

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-02-09 08:34:10 -07:00
Daniel Mack
3ad2f3fbb9 tree-wide: Assorted spelling fixes
In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-09 11:13:56 +01:00
Linus Torvalds
deb0c98c7f Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux:
  Revert "nfsd4: fix error return when pseudoroot missing"
2010-02-08 17:08:01 -08:00
Linus Torvalds
a5f28ae4df Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2/cluster: Make o2net connect messages KERN_NOTICE
  ocfs2/dlm: Fix printing of lockname
  ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map()
  ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node
  ocfs2: Plugs race between the dc thread and an unlock ast message
  ocfs2: Remove overzealous BUG_ON during blocked lock processing
  ocfs2: Do not downconvert if the lock level is already compatible
  ocfs2: Prevent a livelock in dlmglue
  ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast
  ocfs2: Use compat_ptr in reflink_arguments.
  ocfs2/dlm: Handle EAGAIN for compatibility - v2
  ocfs2: Add parenthesis to wrap the check for O_DIRECT.
  ocfs2: Only bug out when page size is larger than cluster size.
  ocfs2: Fix memory overflow in cow_by_page.
  ocfs2/dlm: Print more messages during lock migration
  ocfs2/dlm: Ignore LVBs of locks in the Blocked list
  ocfs2/trivial: Remove trailing whitespaces
  ocfs2: fix a misleading variable name
  ocfs2: Sync max_inline_data_with_xattr from tools.
  ocfs2: Fix refcnt leak on ocfs2_fast_follow_link() error path
2010-02-08 16:05:50 -08:00
Eric Van Hensbergen
bf2d29c64d 9p: fix memory leak in v9fs_parse_options()
If match_strdup() fail this function exits without freeing the options string.

Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Sigend-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 17:59:34 -06:00