506942 Commits

Author SHA1 Message Date
Keith Busch
695a4fe79f NVMe: Fix SG_IO status values
We've only been setting the sg_io_hdr status values on SCSI commands
that require an nvme command to complete the translation. The fields
in the struct are output parameters, so we have to set them, otherwise
user space will see whatever was in memory from before. In the case of
compat SG_IO, this would reveal kernel memory. This fixes the issue by
initializing the sg_io_hdr with successful status.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:09 -07:00
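The idea behind the SG_IO status fix above (695a4fe79f) can be sketched as follows: output fields are set to a successful status before any early return, so the caller never sees stale memory. The struct `sg_status_out` and the helper `init_success_status` are hypothetical stand-ins for illustration; they mirror only a few of the status fields of `sg_io_hdr` and are not the driver's actual code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the status outputs of sg_io_hdr. */
struct sg_status_out {
	unsigned char status;
	unsigned char masked_status;
	unsigned char msg_status;
	unsigned short host_status;
	unsigned short driver_status;
};

/*
 * Set every output field to "success" (zero) up front. memset() also
 * clears padding bytes, so nothing left over from earlier use of the
 * memory is copied back to the caller.
 */
static void init_success_status(struct sg_status_out *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
}
```

Initializing the whole structure rather than individual fields is a design choice in this sketch; it keeps the helper correct if fields are added later.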
Keith Busch
e179729a82 NVMe: Remove duplicate compat SG_IO code
We can return -ENOIOCTLCMD and the ioctl will be handled by
fs/compat_ioctl.c instead. This removes a lot of duplicate code in the
nvme driver.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:09 -07:00
Keith Busch
a96d4f5c2d NVMe: Reference count pci device
If an nvme device is removed but user space has an open reference,
the nvme driver would have been holding an invalid reference to its pci
device. You may get a general protection fault on x86 h/w when the driver
uses that reference in dma_map_sg(), as is done in nvme_map_user_pages()
from the IOCTL interface.

This patch fixes the fault by taking a reference on the pci device and
holding it even after device removal until all opens on the nvme device
are closed.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reported-by: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:09 -07:00
Andreea-Cristina Bernat
062261be4e nvme: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
The use of "rcu_assign_pointer()" is NULLing out the pointer.
According to RCU_INIT_POINTER()'s block comment:
"1.   This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.

The following Coccinelle semantic patch was used:
@@
@@

- rcu_assign_pointer
+ RCU_INIT_POINTER
  (..., NULL)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:08 -07:00
Sam Bradshaw
5905535610 NVMe: Correctly handle IOCTL_SUBMIT_IO when cpus > online queues
nvme_submit_io_cmd() uses smp_processor_id() to pick an IO queue index.
This patch fixes the case where there are more cpus from which the ioctl
call can originate than online queues, which can happen when a device
supports or was allocated fewer interrupt vectors than exist cpu cores.

Thanks to Keith Busch for the implementation suggestion.

Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:08 -07:00
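One way to picture the queue selection problem described in 5905535610 above is to fold the CPU number onto the range of online I/O queues. This is a sketch only: the helper name `pick_io_queue` and the exact expression are assumptions, since the commit message does not show the code the driver uses. It relies on the NVMe convention that queue 0 is the admin queue and I/O queues are numbered from 1.

```c
#include <assert.h>

/*
 * Hypothetical helper: map a CPU number onto one of the online I/O queues.
 * Queue 0 is reserved for admin commands, so I/O queues are 1..io_queues.
 * The modulo keeps the result valid even when there are more CPUs than
 * queues.
 */
static unsigned int pick_io_queue(unsigned int cpu, unsigned int io_queues)
{
	assert(io_queues > 0);	/* caller must have at least one I/O queue */
	return (cpu % io_queues) + 1;
}
```

With four I/O queues, CPU 5 would land on queue 2 and CPU 3 on queue 4 under this mapping.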
Keith Busch
302c6727e5 NVMe: Fix filesystem sync deadlock on removal
This changes the order of deleting the gendisks so it happens after the
nvme IO queues are freed. If a device is removed while a filesystem has
associated dirty data, the removal will wait on these to complete before
proceeding from del_gendisk, which could have caused deadlock before.

The implication of this is that an orderly removal of a responsive
device won't necessarily wait for dirty data to be written, but we are
not guaranteed the device is even going to respond at this point either.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:08 -07:00
Keith Busch
f435c2825b NVMe: Call nvme_free_queue directly
Rather than relying on call_rcu, this patch directly frees the
nvme_queue's memory after ensuring no readers exist. Some arch specific
dma_free_coherent implementations may not be called from a call_rcu's
soft interrupt context, hence the change.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reported-by: Matthew Minter <matthew_minter@xyratex.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:08 -07:00
Dan McLeran
2484f40780 NVMe: Add shutdown timeout as module parameter.
The current implementation hard-codes the shutdown timeout to 2 seconds.
Some devices take longer than this to complete a normal shutdown.
Changing the shutdown timeout to a module parameter with a default
timeout of 5 seconds.

Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:08 -07:00
Keith Busch
7c1b245038 NVMe: Skip orderly shutdown on failed devices
Rather than skipping shutdown only for devices that have been removed,
skip the orderly shutdown on failed devices to avoid the long timeout
handling that inevitably happens when deleting queues on such a device.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:08 -07:00
Keith Busch
a67394790a NVMe: Whitespace fixes
Fixing tabs inadvertently converted to spaces.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:08 -07:00
Keith Busch
c81f49758a NVMe: Use pci_stop_and_remove_bus_device_locked()
Race conditions are theoretically possible between the NVMe PCI device
removal and the generic PCI bus rescan and device removal that can be
triggered via sysfs.

To avoid those race conditions make the NVMe code use
pci_stop_and_remove_bus_device_locked().

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:07 -07:00
Keith Busch
badc34d415 NVMe: Handling devices incapable of I/O
This is a minor refactor for handling devices that are incapable of IO.
The driver previously used special error codes to know that IO queues
are unavailable, but we have an online queue count now.

This also fixes an issue where the driver successfully sets the queue
count, but either is unable to allocate an IO queue or the device can't
create one for some reason.

If the driver can successfully enable the device and get responses to
admin commands, the driver will bring up a character device for management
but not create block devices.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:07 -07:00
Dan McLeran
01079522f9 NVMe: Change nvme_enable_ctrl to set EN and manage CC thru ctrl_config.
Change the behavior of nvme_enable_ctrl to set EN.
Clear CC.SH for both nvme_enable_ctrl and nvme_disable_ctrl.
Remove reading of the CC register and manage the state in
dev->ctrl_config.

Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
[removed an unwanted write to CC]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:07 -07:00
Keith Busch
1d09062460 NVMe: Mismatched host/device page size support
Adds support for devices with max page size smaller than the host's.
In the case we encounter such a host/device combination, the driver will
split a page into as many PRP entries as necessary for the device's page
size capabilities. If the device's reported minimum page size is greater
than the host's, the driver will not attempt to enable the device and
return an error instead.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:07 -07:00
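The page splitting described in 1d09062460 above can be illustrated with a small sketch. The function names, the flat array of PRP entries, and the assumption that both page sizes are powers of two are all hypothetical; the real driver builds PRP lists in DMA memory and handles offsets that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 0 if the device can be enabled, -1 if its minimum page size
 * exceeds the host page size (the case in which the driver gives up).
 */
static int check_min_page_size(uint32_t host_page, uint32_t dev_min_page)
{
	return dev_min_page > host_page ? -1 : 0;
}

/*
 * Split one host page starting at dma_addr into device-page-sized PRP
 * entries. Writes the entries to prps[] and returns how many were written.
 */
static size_t split_host_page(uint64_t dma_addr, uint32_t host_page,
			      uint32_t dev_page, uint64_t *prps, size_t max)
{
	size_t n, i;

	assert(dev_page > 0 && dev_page <= host_page);
	assert(host_page % dev_page == 0);

	n = host_page / dev_page;
	assert(n <= max);

	for (i = 0; i < n; i++)
		prps[i] = dma_addr + (uint64_t)i * dev_page;
	return n;
}
```

For example, a 64KB host page with a 4KB device page yields 16 entries, each 4KB further along than the previous one.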
Matthew Wilcox
a7dd7957ac NVMe: Update list of status codes
Taken from the draft NVMe 1.1b specification.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:07 -07:00
Keith Busch
6fccf9383b NVMe: Async event request
Submits NVMe asynchronous event requests, one event up to the controller
maximum or number of possible different event types (8), whichever is
smaller. Events successfully returned by the controller are logged.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:07 -07:00
Joe Perches
4d51abf9bc block: Use dma_zalloc_coherent
Use the zeroing function instead of dma_alloc_coherent & memset(,0,)

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 13:17:07 -07:00
Fabian Frederick
436f7c2068 igmp: remove camel case definitions
use standard uppercase for definitions

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:13:18 -05:00
Fabian Frederick
c18450a52a udp: remove else after return
else is unnecessary after return 0 in __udp4_lib_rcv()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:13:18 -05:00
Fabian Frederick
aa1f731e52 inet: frags: remove inline on static in c file
remove __inline__ / inline and let compiler decide what to do
with static functions

Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:13:18 -05:00
Fabian Frederick
0d3979b9c7 ipv4: remove 0/NULL assignment on static
static values are automatically initialized to 0

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:52 -05:00
Fabian Frederick
c9f503b006 ipv4: use seq_puts instead of seq_printf where possible
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:52 -05:00
Fabian Frederick
b92022f3e5 tcp: spelling s/plugable/pluggable
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:52 -05:00
Fabian Frederick
988b13438c cipso: remove NULL assignment on static
Also add blank line after structure declarations

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:52 -05:00
Fabian Frederick
4c787b1626 ipv4: include linux/bug.h instead of asm/bug.h
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:20 -05:00
Fabian Frederick
4973404f81 cipso: kerneldoc warning fix
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:20 -05:00
Dylan Reid
defcd98b16 ASoC: max98090: Different comp tables for different pclks
In addition, expand the table to handle other values of sysclk.  Instead
of making the table 3D, expand it to a more descriptive struct.  The
divisors are specified in Table 19 of the 98090 data sheet version
0p94.

The dmic frequency was previously assumed.  Instead make it explicit
and configurable through device tree.  This now handles independently
set pclk and dmic frequency.

Based on downstream work by Ralph Birt.

Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-11-04 19:59:21 +00:00
Mark Brown
33ebcd9b45 Merge branch 'fix/max98090' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into asoc-max98090
2014-11-04 19:58:58 +00:00
Dylan Reid
ece509c109 ASoC: max98090: Correct pclk divisor settings
The Baytrail-based chromebooks have a 20MHz mclk, the code was setting
the divisor incorrectly in this case.  According to the 98090
datasheet, the divisor should be set to DIV1 for 10 <= mclk <= 20.
Correct this and the surrounding clock ranges as well to match the
datasheet.

Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-11-04 19:58:02 +00:00
Linus Torvalds
a1cff6e25e Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal fixes from Eduardo Valentin:
 "Specifics:
   - a few code fixes improving the Exynos code base.  They remove dead
     and unreachable code.  No functional changes here
   - in Exynos code base, fixes regarding the right usage of features
     (TRIMINFO and TRIMRELOAD)
   - documentation of RCAR thermal
   - fix in the of-thermal, regarding the proper usage of of-APIs
   - fixes on thermal-core, removal of unreachable code"

[ Eduardo is sending the thermal fixes on behalf of Rui Zhang this time.
  Rui is currently unable to send pull requests due to troubles with his
  machine and he's currently on a business trip ]

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  Thermal:Remove usless if(!result) before return tz
  thermal: exynos: fix IRQ clearing on TMU initialization
  thermal: fix multiple disbalanced device node counters
  thermal: rcar: Add binding docs for new R-Car Gen2 SoCs
  thermal: exynos: Add support for TRIM_RELOAD feature at Exynos3250
  thermal: exynos: Add support for many TRIMINFO_CTRL registers
  thermal: samsung: Exynos5260 and Exynos5420 should not use TRIM_RELOAD flag
  thermal: exynos: remove identical values from exynos*_tmu_registers structures
  thermal: exynos: remove redundant pdata checks from exynos_tmu_control()
  thermal: exynos: cache non_hw_trigger_levels in pdata
  thermal: exynos: simplify temp_to_code() and code_to_temp()
  thermal: exynos: remove redundant threshold_code checks from exynos_tmu_initialize()
  thermal: exynos: remove redundant pdata checks from exynos_tmu_initialize()
  thermal: exynos: remove dead code for HW_MODE calibration
  thermal: exynos: remove unused struct exynos_tmu_registers entries
2014-11-04 11:57:27 -08:00
Torsten Fleischer
d1d8180252 spi: spi-gpio: Add dt support for a single device with no chip select
In order to describe a single slave device that has no chip select line
the 'num-chipselects' property has to be <0> and the 'cs-gpios' property
doesn't need to be set.

Signed-off-by: Torsten Fleischer <torfl6749@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-11-04 19:57:25 +00:00
Linus Torvalds
9319bc1ce0 Merge tag 'platform-drivers-x86-v3.18-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver updates from Darren Hart:
 "A short list of patches applying quirks and new DMI matches.  These
  pass my basic build tests and have spent 4 days in linux-next"

* tag 'platform-drivers-x86-v3.18-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  quirk for Lenovo Yoga 3: no rfkill switch
  acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
  samsung-laptop: Add broken-acpi-video quirk for NC210/NC110
  asus-nb-wmi: Add wapf4 quirk for the X550VB
  toshiba_acpi: Add Toshiba TECRA A50-A to the alt keymap dmi list
2014-11-04 11:52:45 -08:00
Jan Beulich
97b67ae559 x86-64: Use RIP-relative addressing for most per-CPU accesses
Observing that per-CPU data (in the SMP case) is reachable by
exploiting 64-bit address wraparound (building on the default kernel
load address being at 16MB), the one byte shorter RIP-relative
addressing form can be used for most per-CPU accesses. The one
exception is the "stable" reads, where the use of the "P" operand
modifier prevents the compiler from using RIP-relative addressing, but
is unavoidable due to the use of the "p" constraint (side note: with
gcc 4.9.x the intended effect of this isn't being achieved anymore,
see gcc bug 63637).

With the dependency on the minimum kernel load address, arbitrarily
low values for CONFIG_PHYSICAL_START are now no longer possible. A
link time assertion is being added, directing to the need to increase
that value when it triggers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-04 20:43:14 +01:00
Jan Beulich
6d24c5f72d x86-64: Handle PC-relative relocations on per-CPU data
This is in preparation of using RIP-relative addressing in many of the
per-CPU accesses.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5458A15A0200007800044A9A@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-04 20:43:14 +01:00
Linus Torvalds
8a97577a59 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes if you please"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc: use device_online/offline() instead of cpu_up/down()
  powerpc/powernv: Properly fix LPC debugfs endianness
  powerpc: do_notify_resume can be called with bad thread_info flags argument
  powerpc/fadump: Fix endianess issues in firmware assisted dump handling
  powerpc: Fix section mismatch warning
2014-11-04 11:18:29 -08:00
Jan Beulich
2c773dd31f x86: Convert a few more per-CPU items to read-mostly ones
Both this_cpu_off and cpu_info aren't getting modified post boot, yet
are being accessed on enough code paths that grouping them with other
frequently read items seems desirable. For cpu_info this at the same
time implies removing the cache line alignment (which afaict became
pointless when it got converted to per-CPU data years ago).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/54589BD20200007800044A84@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-04 20:13:28 +01:00
Linus Torvalds
1efa82ecb6 Merge tag 'ftracetest-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftracetest fix from Steven Rostedt:
 "Running the ftracetests on a machine that had the debugfs file system
  mounted in two locations caused the ftracetests to fail.  This is
  because the ftracetests script does a grep of the /proc/mounts file to
  find where the debugfs file system is mounted.  If it is mounted
  twice, then the grep returns two lines instead of just one.  This
  causes the ftracetests to get confused and fail.

  Use "head -1" to only return the first mount point for debugfs"

* tag 'ftracetest-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftracetest: Take the first debugfs mount found
2014-11-04 11:12:25 -08:00
Murali Karicheri
97ef1af9b2 ARM: keystone: defconfig: add options to enable PCI controller
This patch enables PCI controller driver for Keystone SoCs by
default.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
2014-11-04 10:29:39 -08:00
Murali Karicheri
443fcf6316 ARM: keystone: add pcie related options
Now that Keystone PCI controller is merged, add pcie related options
by default for keystone architecture so that driver can be enabled in
the build.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
2014-11-04 10:29:39 -08:00
Grygorii Strashko
4d5702b5f9 ARM: keystone_defconfig: enable mdio and marvell eth phys
Enable MDIO support for Keystone 2 SoCs and also
enable Marvell Ethernet PHYs support for Keystone 2 K2H EVM
which has two 1G Marvell 88E1111-B2 PHYs installed.

For more information see:
- http://www.advantech.com/Support/TI-EVM/EVMK2HX.aspx

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
2014-11-04 10:28:40 -08:00
Grygorii Strashko
930399bded ARM: keystone_defconfig: enable dsp irq and gpio support
Enable DSP IRQ controller and GPIOs support for Keystone 2.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
2014-11-04 10:28:40 -08:00
Murali Karicheri
fc89a57600 ARM: dts: keystone-k2e: add DT bindings for PCI controller for port 1
K2E SoC has a second PCI port based on Synopsys DesignWare PCIe h/w.
Add DT bindings to support PCI controller for port 1 for this SoC.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
2014-11-04 10:27:21 -08:00
Murali Karicheri
bed80507e1 ARM: dts: keystone: add DT bindings for PCI controller for port 0
Add common DT bindings to support PCI controller driver for port 0 on all
of the K2 SoCs that have Synopsys DesignWare based PCIe h/w.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
2014-11-04 10:27:21 -08:00
Grygorii Strashko
469fddd81c ARM: dts: k2l-evm: add 1g ethernet phys nodes
Keystone K2L-EVM has two 1G Marvell 88E1514 Ethernet PHYs
installed, which are compatible with 88E1510.
Hence, add corresponding child nodes for 1G MDIO bus.

For more information see:
  https://www.einfochips.com/index.php/partnerships/texas-instruments/k2l-evm.html

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
2014-11-04 10:24:50 -08:00
Grygorii Strashko
efa66b3074 ARM: dts: k2e-evm: add 1g ethernet phys nodes
Keystone K2E-EVM has two 1G Marvell 88E1514 Ethernet PHYs
installed, which are compatible with 88E1510.
Hence, add corresponding child nodes for 1G MDIO bus.

For more information see:
  https://www.einfochips.com/index.php/partnerships/texas-instruments/k2e-evm.html

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
2014-11-04 10:24:49 -08:00
Joe Thornber
c822ed967c dm thin: grab a virtual cell before looking up the mapping
Avoids normal IO racing with discard.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-11-04 13:05:53 -05:00
Antoine Tenart
9a23c1d6f0 ahci: fix AHCI parameters not taken into account
Changes to the AHCI subsystem have introduced a bug by not taking into
account the force_port_map and mask_port_map parameters when using the
ahci_pci_save_initial_config function. This commit fixes it by setting
the internal parameters of the ahci_port_priv structure.

Fixes: 725c7b570f

Reported-and-tested-by: Zlatko Calusic <zcalusic@bitsync.net>
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
2014-11-04 12:56:25 -05:00
Aravind Gopalakrishnan
bc4febe93c EDAC, MCE, AMD: Add decoding table for MC6 xec
Extended error code meanings are tabulated for other banks. Extend that
tradition for MC6 too.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1415122868-10969-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-04 18:49:20 +01:00
Tejun Heo
9c6ac78eb3 writeback: fix a subtle race condition in I_DIRTY clearing
After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and
tests inode->i_state locklessly to see whether it already has all the
necessary I_DIRTY bits set.  The comment above the barrier doesn't
contain any useful information - memory barriers can't ensure "changes
are seen by all cpus" by themselves.

And it sure enough was broken.  Please consider the following
scenario.

 CPU 0					CPU 1
 -------------------------------------------------------------------------------

					enters __writeback_single_inode()
					grabs inode->i_lock
					tests PAGECACHE_TAG_DIRTY which is clear
 enters __set_page_dirty()
 grabs mapping->tree_lock
 sets PAGECACHE_TAG_DIRTY
 releases mapping->tree_lock
 leaves __set_page_dirty()

 enters __mark_inode_dirty()
 smp_mb()
 sees I_DIRTY_PAGES set
 leaves __mark_inode_dirty()
					clears I_DIRTY_PAGES
					releases inode->i_lock

Now @inode has dirty pages w/ I_DIRTY_PAGES clear.  This doesn't seem
to lead to an immediately critical problem because requeue_inode()
later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when
deciding whether the inode needs to be requeued for IO and there are
enough unintentional memory barriers in between, so while the inode
ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the
IO list.

The lack of explicit barrier may also theoretically affect the other
I_DIRTY bits which deal with metadata dirtiness.  There is no
guarantee that a strong enough barrier exists between
I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied
inode.  Filesystem inode writeout path likely has enough stuff which
can behave as full barrier but it's theoretically possible that the
writeout may not see all the updates from ->dirty_inode().

Fix it by adding an explicit smp_mb() after I_DIRTY clearing.  Note
that I_DIRTY_PAGES needs a special treatment as it always needs to be
cleared to be interlocked with the lockless test on
__mark_inode_dirty() side.  It's cleared unconditionally and
reinstated after smp_mb() if the mapping still has dirty pages.

Also add comments explaining how and why the barriers are paired.

Lightly tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-04 10:42:23 -07:00
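The barrier pairing described in 9c6ac78eb3 above can be sketched in userspace with C11 atomics. Everything here is a simplified assumption: `tag_dirty` and `i_dirty_pages` stand in for PAGECACHE_TAG_DIRTY and I_DIRTY_PAGES, `atomic_thread_fence(memory_order_seq_cst)` stands in for smp_mb(), and the locks the kernel holds around these paths are left out. It is not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PAGECACHE_TAG_DIRTY and I_DIRTY_PAGES. */
static atomic_bool tag_dirty;
static atomic_bool i_dirty_pages;

/* Dirtying side: publish the page tag, full barrier, then test the flag. */
static void mark_dirty(void)
{
	atomic_store_explicit(&tag_dirty, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	if (!atomic_load_explicit(&i_dirty_pages, memory_order_relaxed))
		atomic_store_explicit(&i_dirty_pages, true, memory_order_relaxed);
}

/*
 * Writeback side: clear the flag unconditionally, full barrier, then
 * reinstate it if the mapping still has dirty pages.
 */
static void writeback_clear(void)
{
	atomic_store_explicit(&i_dirty_pages, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* pairs with mark_dirty() */
	if (atomic_load_explicit(&tag_dirty, memory_order_relaxed))
		atomic_store_explicit(&i_dirty_pages, true, memory_order_relaxed);
}

/* With no caller in flight, dirty pages must imply the flag is set. */
static void check_invariant(void)
{
	assert(!atomic_load(&tag_dirty) || atomic_load(&i_dirty_pages));
}
```

Because each side stores, fences, and then loads the other side's variable, at least one of the two must observe the other's store, so the flag cannot end up clear while the tag is set.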
Daniel J Blueman
bdee237c03 x86: mm: Use 2GB memory block size on large-memory x86-64 systems
On large-memory x86-64 systems of 64GB or more with memory hot-plug
enabled, use a 2GB memory block size. Eg with 64GB memory, this reduces
the number of directories in /sys/devices/system/memory from 512 to 32,
making it more manageable, and reducing the creation time accordingly.

The caveat is that the memory can't be offlined (for hotplug or
otherwise) with the finer default 128MB granularity, but this is
unimportant due to the high memory densities generally used with such
large-memory systems, where eg a single DIMM is the order of 16GB.

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1415089784-28779-4-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-04 18:19:27 +01:00
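The directory counts quoted in bdee237c03 above follow from simple division, sketched here. The helper names are hypothetical, and the 64GB threshold and the two block sizes are taken from the commit message rather than from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/*
 * Hypothetical helper: 2GB blocks on systems with 64GB or more,
 * otherwise the default 128MB blocks (figures from the commit message).
 */
static uint64_t pick_block_size(uint64_t total_bytes)
{
	return total_bytes >= 64 * GiB ? 2 * GiB : 128 * MiB;
}

/* Number of memory block directories for a given block size. */
static uint64_t block_count(uint64_t total_bytes, uint64_t block_size)
{
	assert(block_size > 0);
	return total_bytes / block_size;
}
```

For 64GB of memory this gives 512 blocks at 128MB and 32 blocks at 2GB, matching the numbers in the message.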