The intention of commit aae8679b0e
("pagemap: fix bug in add_to_pagemap, require aligned-length reads of
/proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a
multiple of 8 bytes, but now it allows to read 0 bytes, which actually
puts some data to user's buffer. According to POSIX, if count is zero,
read() should return zero and has no other results.
Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Cc: Thomas Tuttle <ttuttle@google.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change page_mkwrite to allow implementations to return with the page
locked, and also change it's callers (in page fault paths) to hold the
lock until the page is marked dirty. This allows the filesystem to have
full control of page dirtying events coming from the VM.
Rather than simply hold the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow callers to return with
it locked, so filesystems can avoid LOR conditions with page lock.
The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its ->page_mkwrite. It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).
In this window, the VM could write out the page, clearing page-dirty. The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.
It is not always possible to perform the required metadata manipulation in
->set_page_dirty, because that function cannot block or fail. The
filesystem may need to allocate some data structure, for example.
And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.
This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window. This provides the filesystem with a strong
synchronisation against the VM here.
- Sage needs this race closed for ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation
under dirty pages might be susceptible to i_size changing under partial page
at the end of file (we also have a buffer.c-side problem here, but it cannot
be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their
page_mkwrite functions themselves.
[ This also moves page_mkwrite another step closer to fault, which should
eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
filesystem calldown and page lock/unlock cycle in __do_fault. ]
[akpm@linux-foundation.org: fix derefs of NULL ->mapping]
Cc: Sage Weil <sage@newdream.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a linux-next allyesconfig build:
kernel/trace/ring_buffer.c:1726:
warning: passing argument 1 of 'atomic_cmpxchg' from incompatible pointer type
linux-next/arch/s390/include/asm/atomic.h:112:
note: expected 'struct atomic_t *' but argument is of type 'struct atomic64_t *'
atomic_long_cmpxchg and atomic_long_xchg are incorrectly defined for 64
bit architectures. They should be mapped to the atomic64_* variants.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/serial/crisv10.c:4428: error: unknown field 'read_proc' specified in initializer
Commit 0f043a81eb ("proc tty: remove struct
tty_operations::read_proc") removes the read_proc entry from struct
tty_operations.
Rework the proc handling in the CRISv10 serial driver to use proc_fops
instead.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doing it in reverse order causes uevent to be sent before
we have a MAC address, which confuses udev.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0ba25ff4c6 ("br2684: convert to
net_device_ops") inadvertently deleted the initialization of the net_dev
pointer in the br2684_dev structure, leading to crashes. This patch
adds it back.
Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel should only be using the high 16 bits of a kernel
generated priority. Filter priorities in all other cases only
use the upper 16 bits of the u32 'prio' field of 'struct tcf_proto',
but when the kernel generates the priority of a filter is saves all
32 bits which can result in incorrect lookup failures when a filter
needs to be deleted or modified.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
acpi_rs_get_pci_routing_table_length is not performing sufficient
validation on the package returned from _PRT. It assumes a package of
packages and fails/faults if this is not the case.
We should validate each subpackage when extracted from the parent
package, and not accept objects of the wrong type, since that will just
cause the scanning to fail (likely with a kernel oops).
This can only happen with a serious BIOS bug, and is accompanied by a
warning something like this:
ACPI Warning (nspredef-0949): \_SB_.PCI0.PEG4._PRT: Return Package type mismatch at index 0 - found Integer, expected Package [20090320]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace #x with __stringify(x).
Also, #ifndef __STR is removed and undefine __STR macro at the beginning.
The __STR() macro is still remained, because the assembler.h might be
included from assembly codes as well as C codes.
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
This patch fixes the following build error of 2.6.30-rc3-git2:
AS arch/m32r/kernel/head.o
In file included from /include/linux/init.h:7,
from /arch/m32r/kernel/head.S:11:
/include/linux/stringify.h:9: error: syntax error in macro parameter list
/include/linux/stringify.h:10: error: syntax error in macro parameter list
This build error was caused at __HEAD macro in arch/m32r/kernel/head.S,
which uses __stringify() macro.
Remove -traditional option from EXTRA_AFLAGS for the m32r,
because the __stringify() macro depends on the gcc's variadic macro
extension function, due to commit:
Make __stringify support variable argument macros too
commit: 8f7c2c3731
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
The defines for TDM and synchronous clocks are not used - they are
mostly a legacy of the automatic clocking configuration. TDM will
require configuration of the number of timeslots and which ones to use
so can't be fit into the DAI format and synchronous mode is handled by
symmetric_rates (and needs to be done by constraints rather than when
the DAI format is being configured).
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
The AC97 wire format is completely fixed so CODECs don't have any choice
about the formats they accept but controllers accept a variety of data
formats and render them down onto the bus. Have a shared define so all
the CODEC drivers will interoperate with any of our controller drivers.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Export the following symbols using EXPORT_SYMBOL_GPL:
- clockevent_delta2ns
- clockevents_register_device
This allows us to build SuperH clockevent and clocksource
drivers as modules, see drivers/clocksource/sh_*.c
[ Impact: allow modular build of clockevent drivers ]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
LKML-Reference: <20090501055247.8286.64067.sendpatchset@rx1.opensource.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some arches don't supply their own clocksource. This is mainly the
case in architectures that get their inter-tick times by reading the
counter on their interval timer. Since these timers wrap every tick,
they're not really useful as clocksources. Wrapping them to act like
one is possible but not very efficient. So we provide a callout these
arches can implement for use with the jiffies clocksource to provide
finer then tick granular time.
[ Impact: ease the migration to generic time keeping ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Setup clocksource mult_orig in clocksource_enable().
Clocksource drivers can save power by using keeping the
device clock disabled while the clocksource is unused.
In practice this means that the enable() and disable()
callbacks perform clk_enable() and clk_disable().
The enable() callback may also use clk_get_rate() to get
the clock rate from the clock framework. This information
can then be used to calculate the shift and mult variables.
Currently the mult_orig variable is setup from mult at
registration time only. This is conflicting with the above
case since the clock is disabled and the mult variable is
not yet calculated at the time of registration.
Moving the mult_orig setup code to clocksource_enable()
allows us to both handle the common case with no enable()
callback and the mult-changed-after-enable() case.
[ Impact: allow dynamic clock source usage ]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
LKML-Reference: <20090501054546.8193.10688.sendpatchset@rx1.opensource.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In the current kernel implementation only kernel timers for time interval
tv1 are being deferred. This patch allows any timer that is configured as
deferrable to be defer regardless of time interval.
This patch was previously discussed in
http://marc.info/?l=linux-kernel&m=123196343531966&w=2 and was acked by
Venki Pallipadi, the author of the original deferrable timer patch.
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The variable tick_broadcast_device is not used outside of the
file where it is defined, so let's make it static.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
tick_handle_periodic() can lock up hard when a one shot clock event
device is used in combination with jiffies clocksource.
Avoid an endless loop issue by requiring that a highres valid
clocksource be installed before we call tick_periodic() in a loop when
using ONESHOT mode. The result is we will only increment jiffies once
per interrupt until a continuous hardware clocksource is available.
Without this, we can run into a endless loop, where each cycle through
the loop, jiffies is updated which increments time by tick_period or
more (due to clock steering), which can cause the event programming to
think the next event was before the newly incremented time and fail
causing tick_periodic() to be called again and the whole process loops
forever.
[ Impact: prevent hard lock up ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
We were avoiding calling sg_init* on scatterlists passed
into virtnet_send_command to prevent extraneous end markers.
This caused build warnings for uninitialized variables.
Cleanup the code to create proper scatterlists.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Carl Henrik Lunde reported and debugged this; the test for the
last allocated block was comparing bytes to blocks in this test:
if (logical + length - 1 == EXT_MAX_BLOCK ||
ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK)
flags |= FIEMAP_EXTENT_LAST;
so any extent which ended right at 4G was stopping the extent
walk. Just replacing these values with the extent block &
length should fix it.
Also give blksize_bits a saner type, and reverse the order
of the tests to make the more likely case tested first.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The fiemap and get_blk_size ioctls should be enabled even for
directories. So move it outisde file_ioctl.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
3c589_cs,3c574_cs,serial_cs:
(1)add cis(firmware) of 3Com lan&modem mulitifunction pcmcia card.
(2)load correct configuration register for 3Com card
Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
In memory-constrained systems with many partitions, the ~68K for each
partition for the mb_history buffer can be excessive.
This patch adds a new mount option, mb_history_length, as well as a
way of setting the default via a module parameter (or via a sysfs
parameter in /sys/module/ext4/parameter/default_mb_history_length).
If the mb_history_length is set to zero, the mb_history facility is
disabled entirely.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Move this out of a local variable into the nfs4_delegation object in
preparation for making this an async rpc call (at which point we'll need
any state like this in a common object that's preserved across function
calls).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There's no point in keeping this field around--it's always zero.
(Background: the protocol allows you to tell the client that the file is
about to be truncated, as an optimization to save the client from
writing back dirty pages that will just be discarded. We don't
implement this hint. If we do some day, adding this field back in will
be the least of the work involved.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The nfs4_cb_recall struct is used only in nfs4_delegation, so its
pointer to the containing delegation is unnecessary--we could just use
container_of().
But there's no real reason to have this a separate struct at all--just
move these fields to nfs4_delegation.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch makes the cleanup in bond_create nicer :) Also now the forgotten
free_netdev is called.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio_net.h uses the macro ETH_ALEN which is defined in linux/if_ether.h.
Discovered when hacking on virtio-over-pci patches.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds new logic to support a clock gating feature found on the
latest set of chipsets. The clock gating is performed on the tx/rx
engines when the link is disconnected. Clock gating helps in reducing
power consumption.
* modified based on comments from netdev
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LAN9512 and LAN9514 are USB hubs with an integrated 10/100 ethernet
controller. Logically this looks like an ethernet controller (similar
to LAN9500) permanently attached to one of the hub's downstream ports.
This patch adds the usb device id of the new ethernet controller to the
smsc95xx driver. This id is the same in both new devices.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SMSC LAN9500 has dual purpose GPIO/LED pins, and by default at power-on
these are configured as GPIOs. This means that if LEDs are fitted they
won't ever light.
This patch sets them to be LED outputs for speed, duplex and
link/activity.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When netconsole is loaded and a network interface fades away (e.g. on
rmmod $interface_driver_module) the rmmod remains stuck and some locks
are taken that prevent any additional module loading/unloading as well
as interface up/down changes.
In addition kernel logs (and console) get flooded at 10s interval with
[ 122.464065] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
[ 132.704059] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
This patch lets netconsole take NETDEV_UNREGISTER event into account
and release the affected interface if it was in use.
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xt_socket can use connection tracking, and checks whether it is a module.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device. Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used. These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_slave_info_query() should keep a read lock while accessing slave info,
or risk accessing stale data and corruption.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the device is not claimed by hid-input (i.e devices driver by userspace
hiddev/hidraw-based drivers, or completely detached from HID
and driver by libusb), we must not check the hid->inptus, as it
is not guaranteed to be initialized, as this is performed only for devices
handled by hid-input.
Reported-by: Guillaume Chazarain <guichaz@gmail.com>
Tested-by: Guillaume Chazarain <guichaz@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>