If we run out of cpuid entries for extended request types
we should return -E2BIG, just like we do for the standard
request types.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Replace 0xc0010010 with MSR_K8_SYSCFG and 0xc0010015 with MSR_K7_HWCR.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The related MSRs are emulated. MCE capability is exported via
extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED. A new
vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
such as the mcg_cap. MCE is injected via vcpu ioctl command
KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
not implemented.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Use standard msr-index.h's MSR declaration.
MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves
80 column issue.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When reinjecting a software interrupt or exception, use the correct
instruction length provided by the hardware instead of a hardcoded 1.
Fixes problems running the suse 9.1 livecd boot loader.
Problem introduced by commit f0a3602c20 ("KVM: Move interrupt injection
logic to x86.c").
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The wakeup tracer, when enabled, has its own function tracer.
It only traces the functions on the CPU where the task it is following
is on. If a task is woken on one CPU but then migrates to another CPU
before it wakes up, the latency tracer will then start tracing functions
on the other CPU.
To find which CPU the task is on, the wakeup function tracer performs
a task_cpu(wakeup_task). But to make sure the task does not disappear
it grabs the wakeup_lock, which is also taken when the task wakes up.
By taking this lock, the function tracer does not need to worry about
the task being freed as it checks its cpu.
Jan Blunck found a problem with this approach on his 32 CPU box. When
a task is being traced by the wakeup tracer, all functions take this
lock. That means that on all 32 CPUs, each function call is taking
this one lock to see if the task is on that CPU. This lock has just
serialized all functions on all 32 CPUs. Needless to say, this caused
major issues on that box. It would even lockup.
This patch changes the wakeup latency to insert a probe on the migrate task
tracepoint. When a task changes its CPU that it will run on, the
probe will take note. Now the wakeup function tracer no longer needs
to take the lock. It only compares the current CPU with a variable that
holds the current CPU the task is on. We don't worry about races since
it is OK to add or miss a function trace.
Reported-by: Jan Blunck <jblunck@suse.de>
Tested-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
rb_buffer_peek() operates with struct ring_buffer_per_cpu *cpu_buffer
only. Thus, instead of passing variables buffer and cpu it is better
to use cpu_buffer directly. This also reduces the risk of races since
cpu_buffer is not calculated twice.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1249045084-3028-1-git-send-email-robert.richter@amd.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
We don't need to take the alloc_sem lock when we are adding new
groups, since mballoc won't see the new group added until we bump
sbi->s_groups_count.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
We should check for need init flag with the group's alloc_sem held, to
make sure while we are loading the buddy cache and holding a reference
to it, a file system resize can't add new blocks to same group.
The patch also drops the need init flag check in
ext4_mb_regular_allocator() because doing the check without holding
alloc_sem is racy.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
This moves the function around so that it can be called from
ext4_mb_load_buddy().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* lookup-permissions-cleanup:
jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
ext[234]: move over to 'check_acl' permission model
shmfs: use 'check_acl' instead of 'permission'
Make 'check_acl()' a first-class filesystem op
Simplify exec_permission_lite(), part 3
Simplify exec_permission_lite() further
Simplify exec_permission_lite() logic
Do not call 'ima_path_check()' for each path component
In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT.
Here is a small test case. (Yes, there are other, useful PT_INTERP
which have only .text and no .data/.bss.)
----- ptinterp.S
_start: .globl _start
nop
int3
-----
$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
$ ./hello
Segmentation fault # during execve() itself
After applying the patch:
$ ./hello
Trace trap # user-mode execution after execve() finishes
If the ELF headers are actually self-inconsistent, then dying is fine.
But having no PROT_WRITE segment is perfectly normal and correct if
there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser
suggested checking for PROT_WRITE in the bss logic. I think it makes
most sense to simply apply the bss logic only when there is bss.
This patch looks less trivial than it is due to some reindentation.
It just moves the "if (last_bss > elf_bss) {" test up to include the
partial-page bss logic as well as the more-pages bss logic.
Reported-by: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Teach ext4_write_inode() and ext4_do_update_inode() about non-journal
mode: If we're not using a journal, ext4_write_inode() now calls
ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc())
with a new "do_sync" parameter. If that parameter is nonzero _and_ we're
not using a journal, ext4_do_update_inode() calls sync_dirty_buffer()
instead of ext4_handle_dirty_metadata().
This problem was found in power-fail testing, checking the amount of
loss of files and blocks after a power failure when using fsync() and
when not using fsync(). It turned out that using fsync() was actually
worse than not doing so, possibly because it increased the likelihood
that the inodes would remain unflushed and would therefore be lost at
the power failure.
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When there is no journal present, we must attach buffer heads
associated with extent tree and indirect blocks to the inode's
mapping->private_list via mark_buffer_dirty_inode() so that
ext4_sync_file() --- which is called to service fsync() and
fdatasync() system calls --- can write out the inode's metadata blocks
by calling sync_mapping_buffers().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The definition of MCFSIM_PADDR and MCFSIM_PADAT now has MCF_BAR already added in.
Signed-off-by: Steven King <sfking@fdwdc.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
This adds the basic infrastructure used by all of the different Coldfire CPUs.
Signed-off-by: Steven King <sfking@fdwdc.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
The macro res_size in drivers/net/dm9000.c is a copy of resource_size in
linux/ioport.h. Remove the function and use resource_size instead.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory for the private data is allocated using kzalloc in
alloc_etherdev (or alloc_netdev_mq respectively) so there is no need to
set it to 0 again.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ext4 is using a journal, a metadata block which is deallocated
must be passed into the journal layer so it can be dropped from the
current transaction and/or revoked. This is done by calling the
functions ext4_journal_forget() and ext4_journal_revoke(), which call
jbd2_journal_forget(), and jbd2_journal_revoke(), respectively.
Since the jbd2_journal_forget() and jbd2_journal_revoke() call
bforget(), if ext4 is not using a journal, ext4_journal_forget() and
ext4_journal_revoke() must call bforget() to avoid a dirty metadata
block overwriting a block after it has been reallocated and reused for
another inode's data block.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix typo in checking dest ip has support before
programming destip addresses.
Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When new child qdiscs are attached to the mq qdisc, they are actually
attached as root qdiscs to the device queues. The lock selection for
new estimators incorrectly picks the root lock of the existing and
to be replaced qdisc, which results in a use-after-free once the old
qdisc has been destroyed.
Mark mq qdisc instances with a new flag and treat qdiscs attached to
mq as children similar to regular root qdiscs.
Additionally prevent estimators from being attached to the mq qdisc
itself since it only updates its byte and packet counters during dumps.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atombios will use the mc register access helper and R4XX hw have a
bigger mc range than R3XX so add R4XX specific mc register access
helper.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we stop CP and that it's still processing thing GPU hang might
happen, this patch wait for CP idle (the wait can timeout) so we
can avoid shutting down CP at bad time. This is especialy usefull
when reseting the GPU as it seems GPU reset fails to properly reset
CP when the CP wasn't stop after being idle.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch adds a setxattr handler to the file, directory, and symlink
inode_operations structures for sysfs. The patch uses hooks introduced in the
previous patch to handle the getting and setting of security information for
the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the
sysfs_dirent structure has been replaced by a structure which contains the
iattr, secdata and secdata length to allow the changes to persist in the event
that the inode representing the sysfs_dirent is evicted. Because sysfs only
stores this information when a change is made all the optional data is moved
into one dynamically allocated field.
This patch addresses an issue where SELinux was denying virtd access to the PCI
configuration entries in sysfs. The lack of setxattr handlers for sysfs
required that a single label be assigned to all entries in sysfs. Granting virtd
access to every entry in sysfs is not an acceptable solution so fine grained
labeling of sysfs is required such that individual entries can be labeled
appropriately.
[sds: Fixed compile-time warnings, coding style, and setting of inode security init flags.]
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
This patch introduces three new hooks. The inode_getsecctx hook is used to get
all relevant information from an LSM about an inode. The inode_setsecctx is
used to set both the in-core and on-disk state for the inode based on a context
derived from inode_getsecctx.The final hook inode_notifysecctx will notify the
LSM of a change for the in-core state of the inode in question. These hooks are
for use in the labeled NFS code and addresses concerns of how to set security
on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's
explanation of the reason for these hooks is pasted below.
Quote Stephen Smalley
inode_setsecctx: Change the security context of an inode. Updates the
in core security context managed by the security module and invokes the
fs code as needed (via __vfs_setxattr_noperm) to update any backing
xattrs that represent the context. Example usage: NFS server invokes
this hook to change the security context in its incore inode and on the
backing file system to a value provided by the client on a SETATTR
operation.
inode_notifysecctx: Notify the security module of what the security
context of an inode should be. Initializes the incore security context
managed by the security module for this inode. Example usage: NFS
client invokes this hook to initialize the security context in its
incore inode to the value provided by the server for the file when the
server returned the file's attributes to the client.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
This factors out the part of the vfs_setxattr function that performs the
setting of the xattr and its notification. This is needed so the SELinux
implementation of inode_setsecctx can handle the setting of the xattr while
maintaining the proper separation of layers.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
We need to have a stronger barrier between releasing the lock and
checking for any waiting spinners. A compiler barrier is not sufficient
because the CPU's ordering rules do not prevent the read xl->spinners
from happening before the unlock assignment, as they are different
memory locations.
We need to have an explicit barrier to enforce the write-read ordering
to different memory locations.
Because of it, I can't bring up > 4 HVM guests on one SMP machine.
[ Code and commit comments expanded -J ]
[ Impact: avoid deadlock when using Xen PV spinlocks ]
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Where possible we enable interrupts while waiting for a spinlock to
become free, in order to reduce big latency spikes in interrupt handling.
However, at present if we manage to pick up the spinlock just before
blocking, we'll end up holding the lock with interrupts enabled for a
while. This will cause a deadlock if we recieve an interrupt in that
window, and the interrupt handler tries to take the lock too.
Solve this by shrinking the interrupt-enabled region to just around the
blocking call.
[ Impact: avoid race/deadlock when using Xen PV spinlocks ]
Reported-by: "Yang, Xiaowei" <xiaowei.yang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-fstack-protector uses a special per-cpu "stack canary" value.
gcc generates special code in each function to test the canary to make
sure that the function's stack hasn't been overrun.
On x86-64, this is simply an offset of %gs, which is the usual per-cpu
base segment register, so setting it up simply requires loading %gs's
base as normal.
On i386, the stack protector segment is %gs (rather than the usual kernel
percpu %fs segment register). This requires setting up the full kernel
GDT and then loading %gs accordingly. We also need to make sure %gs is
initialized when bringing up secondary cpus too.
To keep things consistent, we do the full GDT/segment register setup on
both architectures.
Because we need to avoid -fstack-protected code before setting up the GDT
and because there's no way to disable it on a per-function basis, several
files need to have stack-protector inhibited.
[ Impact: allow Xen booting with stack-protector enabled ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
We added the ENOSPC handling patch in xfs_create just after it got mered
with xfs_mkdir. Change the log reservation to the variable for either
the create or mkdir value so it does the right thing if get here for creating
a directory.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>