linux

Author	SHA1	Message	Date
Alexander Duyck	9f801abc3d	fm10k: Add support for IEEE DCBx This patch adds support for management of the limited QOS features of the FM10000 interface. Specifically we can support up to 8 traffic classes, however the part only provides 1 Rx and 1 Tx FIFO in the host interface and as a result this can lead to head-of-line blocking on Rx. This can be avoided by setting PFC only for priorities that cannot afford to drop frames. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:21 -07:00
Alexander Duyck	883a9ccbae	fm10k: Add support for SR-IOV to driver This patch combines the recently added VF messaging and configuration functionality with the interfaces provided by the kernel to allow for configuration and management of SR-IOV. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:21 -07:00
Alexander Duyck	c265386553	fm10k: Add support for SR-IOV to PF core files This change adds a set of functions to fm10k_pf.c which allows for configuring the VF via a set of standardized TLV messages. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:21 -07:00
Alexander Duyck	5cb8db4a4c	fm10k: Add support for VF This patch provides the functions necessary to configure the VF making use of the same API pointers as the PF. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:20 -07:00
Alexander Duyck	b651957c20	fm10k: Add support for PF <-> VF mailbox This patch adds support for the PF <-> VF mailbox. It functions similar to the PF <-> SM mailbox however there are several modifications made to improve the reliability of the mailbox itself. In addition the PF/VF mailbox is much smaller an only supports a total size of 16 DWORDs vs the 1024 DWORDS provided for the PF/SM mailbox. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:20 -07:00
Alexander Duyck	5cd5e2e982	fm10k: Add support for MACVLAN acceleration This patch adds support for L2 MACVLAN by making use of the fact that the RRC provides a unique tag per filter called a Global Resource Tag, or GLORT. In the case of this offload what I have done is assigned a linear block of these so that each GLORT represents one of the MACVLAN netdevs. By doing this I can share the Rx queues and Tx queues for all of the MACVLAN netdevs while allowing them to be demuxed in the Rx cleanup path. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:20 -07:00
Alexander Duyck	76a540d472	fm10k: Add support for netdev offloads This patch adds support for basic offloads including TSO, Tx checksum, Rx checksum, Rx hash, and the same features applied to VXLAN/NVGRE tunnels. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:19 -07:00
Alexander Duyck	aa3ac82268	fm10k: Add support for multiple queues This patch takes the driver from supporting a single queue to supporting multiple queues. The upper queue limit for the PF is 128 queues and the upper limit for the VF is (128 / num_vfs) rounded down to nearest power of 2. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:19 -07:00
Alexander Duyck	19ae1b3fb9	fm10k: Add support for PCI power management and error handling Add PCI power management and error handling to allow the device to support suspend/resume and recovery of any PCIe errors. The fm10k devices do not support wake on LAN, and there is no plan to add this as a feature. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:19 -07:00
Alexander Duyck	82dd0f7ee9	fm10k: Add ethtool support This patch adds basic ethtool support to the device to allow for configuration. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:18 -07:00
Alexander Duyck	b101c96264	fm10k: Add transmit and receive fastpath and interrupt handlers This change adds the transmit and receive fastpath and interrupt handlers. With this code in place the network device is now able to send and receive frames over the network interface using a single queue. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> CC: Rick Jones <rick.jones2@hp.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:18 -07:00
Alexander Duyck	3abaae42e1	fm10k: Add Tx/Rx hardware ring bring-up/tear-down This patch adds support for allocating, configuring, and freeing Tx/Rx ring resources. With these changes in place the descriptor queues are in a state where they are ready to transmit or receive if provided buffers. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:18 -07:00
Alexander Duyck	b7d8514c23	fm10k: Add service task to handle delayed events This patch adds support for the service task. The service task takes care of all processes that cannot be done in interrupt context such as resets, stats updates, TC prio updates, and checking for hung or detached devices. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:17 -07:00
Alexander Duyck	e27ef599ab	fm10k: add support for Tx/Rx rings This change adds the defines and structures necessary to support both Tx and Rx descriptor rings. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:17 -07:00
Alexander Duyck	18283cad0a	fm10k: Add interrupt support This patch set adds interrupt support for the fm10k interfaces. The interfaces themselves only support MSI-X, so neither MSI or legacy interrupts are used. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:16 -07:00
Alexander Duyck	504c5eac1d	fm10k: Add support for ndo_open/stop Add support for brining the interface up/down. This is still primitive yet as we have not yet added support for the descriptor queues. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:16 -07:00
Alexander Duyck	8f5e20d45c	fm10k: Add support for L2 filtering This patch adds support for L2 filtering. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:16 -07:00
Alexander Duyck	0e7b364408	fm10k: Add netdev Now that we have the ability to configure the basic settings on the device we can start allocating and configuring a netdev for the interface. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:15 -07:00
Alexander Duyck	401b5383c6	fm10k: Add support for configuring PF interface This patch adds support for the operations which will configure filters on the interface. In addition with these patches we begin to introduce the PF messages that will be sent to or received from the Switch Management entity. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:15 -07:00
Alexander Duyck	b6fec18fd1	fm10k: Add support for PF This patch adds basic support for the PF. With this it is possible to bring up the interface, but without being able to configure any of the filters on the interface itself. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:15 -07:00
Alexander Duyck	1337e6b977	fm10k: Implement PF <-> SM mailbox operations This patch adds support for the mailbox that connects the PF to the Switch Management entity. This mailbox will pass TLV formatted messages between the two entities by using a pair of shared ring buffers. The primary use of the mailbox is to configure L2 forwarding addresses, VLANs, and general resource allocation from the switch. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:14 -07:00
Alexander Duyck	6b1f201f1a	fm10k: Add support for mailbox This patch adds generic mailbox support. The general idea of the mailboxes is to use a pair of ring buffers, one for request, one for response to send data between the local driver and some remote entity be it the PF of the Switch Manager. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:14 -07:00
Alexander Duyck	04a5aefbfb	fm10k: Add support for basic interaction with hardware This patch adds the basic read/write operations for accessing the hardware. In addition to read read functionality the read functions also provide surprise remove detection in the event that the device either loses power or is removed. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:14 -07:00
Alexander Duyck	ae17db0ee5	fm10k: Add support for TLV message parsing and generation This patch adds support for the TVL message formats supported by the PF, VF, and Switch Management entity. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:13 -07:00
Alexander Duyck	6d2ce9001b	fm10k: Add register defines and basic structures This patch adds the basic defines and structures needed by the PF for operation. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:13 -07:00
Alexander Duyck	b3890e3074	fm10k: Add skeletal frame for Intel(R) FM10000 Ethernet Switch Host Interface Driver This patch adds the beginning framework onto which I am going to add the fm10k driver which supports the Intel(R) FM10000 Ethernet Switch Host Interface. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-09-23 03:59:13 -07:00
Andy Gross	a64efe15cf	dmaengine: qcom_adm: Add device tree binding Add device tree binding support for the QCOM ADM DMA driver. Signed-off-by: Andy Gross <agross@codeaurora.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>	2014-09-23 16:03:48 +05:30
Pramod Gurav	849a8c25c8	pinctrl: lantiq: Release gpiochip resources in fail case This patch releases gpiochip resources with of_gpiochip_remove and gpiochip_remove in failure cases. CC: John Crispin <blogic@openwrt.org> CC: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Pramod Gurav <pramod.gurav@smartplayin.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>	2014-09-23 12:23:09 +02:00
Stefan Agner	3dac1918a4	pinctrl: imx: detect uninitialized pins The pinctrl driver initialized the register offsets for the pins with 0. On Vybrid an offset of 0 is a valid offset for the pinctrl mux register. So far, this was solved using the ZERO_OFFSET_VALID flag which allowed offsets of 0. However, this does not allow to verify whether a pins struct imx_pmx_func was initialized or not. Use signed offset values for register offsets and initialize those with -1 in order to detect uninitialized offset values reliable. Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>	2014-09-23 11:45:47 +02:00
Bernhard Thaler	48e68ff5e5	Bluetooth: Check for SCO type before setting retransmission effort SCO connection cannot be setup to devices that do not support retransmission. Patch based on http://permalink.gmane.org/gmane.linux.bluez.kernel/7779 and adapted for this kernel version. Code changed to check SCO/eSCO type before setting retransmission effort and max. latency. The purpose of the patch is to support older devices not capable of eSCO. Tested on Blackberry 655+ headset which does not support retransmission. Credits go to Alexander Sommerhuber. Signed-off-by: Bernhard Thaler <bernhard.thaler@r-it.at> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-09-23 11:30:04 +02:00
David Henningsson	a5062dee82	ALSA: hda - add explicit include of err.h Since every caller of snd_hda_jack_detect_enable_callback needs to use the macros from err.h, it makes sense to include it directly from hda_jack.h. Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2014-09-23 09:42:17 +02:00
Stephen Boyd	48d11e067f	mmc: Consolidate emmc tuning blocks The same tuning block exists in the dw_mmc h.c and sdhci-msm.c files. Move these into mmc.c so that they can be shared across drivers. Reported-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>	2014-09-23 09:13:19 +02:00
Stephen Boyd	ffed1b94cb	mmc: sdhci-msm: Make tuning block table endian agnostic If we're tuning on a big-endian CPU we'll never determine we properly tuned the device because we compare the data we received from the controller with a table that assumes the CPU is little-endian. Change the table to be an array of bytes instead of 32-bit words so we can use memcmp() without needing to byte-swap every word depending on the endianess of the CPU. Cc: Asutosh Das <asutoshd@codeaurora.org> Cc: Venkat Gopalakrishnan <venkatg@codeaurora.org> Reviewed-by: Georgi Djakov <gdjakov@mm-sol.com> Fixes: `415b5a75da` "mmc: sdhci-msm: Add platform_execute_tuning implementation" Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>	2014-09-23 09:05:28 +02:00
Stephen Warren	d4d1144908	mmc: don't request CD IRQ until mmc_start_host() As soon as the CD IRQ is requested, it can trigger, since it's an externally controlled event. If it does, delayed_work host->detect will be scheduled. Many host controller probe()s are roughly structured as: _probe() { host = sdhci_pltfm_init(); mmc_of_parse(host->mmc); rc = sdhci_add_host(host); if (rc) { sdhci_pltfm_free(); return rc; } In 3.17, CD IRQs can are enabled quite early via _probe() -> mmc_of_parse() -> mmc_gpio_request_cd() -> mmc_gpiod_request_cd_irq(). Note that in linux-next, mmc_of_parse() calls mmc_gpiod_request_cd() rather than mmc_gpio_request_cd(), and mmc_gpiod_request_cd() doesn't call mmc_gpiod_request_cd_irq(). However, this issue still exists if mmc_gpio_request_cd() is called directly before mmc_start_host(). sdhci_add_host() may fail part way through (e.g. due to deferred probe for a vmmc regulator), and sdhci_pltfm_free() does nothing to unrequest the CD IRQ nor cancel the delayed_work. sdhci_pltfm_free() is coded to assume that if sdhci_add_host() failed, then the delayed_work cannot (or should not) have been triggered. This can lead to the following with CONFIG_DEBUG_OBJECTS_* enabled, when kfree(host) is eventually called inside sdhci_pltfm_free(): WARNING: CPU: 2 PID: 6 at lib/debugobjects.c:263 debug_print_object+0x8c/0xb4() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x18 The object being complained about is host->detect. There's no need to request the CD IRQ so early; mmc_start_host() already requests it. For most SDHCI hosts at least, the typical call path that does this is: _probe() -> sdhci_add_host() -> mmc_add_host() -> mmc_start_host(). Therefore, remove the call to mmc_gpiod_request_cd_irq() from mmc_gpio_request_cd(). This also matches mmc_gpiod*_request_cd(), which already doesn't call mmc_gpiod_request_cd_irq(). However, some host controller drivers call mmc_gpio_request_cd() after mmc_start_host() has already been called, and assume that this will also call mmc_gpiod_request_cd_irq(). Update those drivers to explicitly call mmc_gpiod_request_cd_irq() themselves. Ideally, these drivers should be modified to move their call to mmc_gpio_request_cd() before their call to mmc_add_host(). However that's too large a change for stable. This solves the problem (eliminates the kernel error message above), since it guarantees that the IRQ can't trigger before mmc_start_host() is called. The critical point here is that once sdhci_add_host() calls mmc_add_host() -> mmc_start_host(), sdhci_add_host() is coded not to fail. In other words, if there's a chance that mmc_start_host() may have been called, and CD IRQs triggered, and the delayed_work scheduled, sdhci_add_host() won't fail, and so cleanup is no longer via sdhci_pltfm_free() (which doesn't free the IRQ or cancel the work queue) but instead must be via sdhci_remove_host(), which calls mmc_remove_host() -> mmc_stop_host(), which does free the IRQ and cancel the work queue. CC: Russell King <linux@arm.linux.org.uk> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexandre Courbot <acourbot@nvidia.com> Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Stephen Warren <swarren@nvidia.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>	2014-09-23 09:01:36 +02:00
Dave Chinner	7abbb8f928	xfs: xfs_swap_extent_flush can be static Fix sparse warning introduced by commit `4ef897a` ("xfs: flush both inodes in xfs_swap_extents"). Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 16:20:11 +10:00
Dave Chinner	02cc18764c	xfs: xfs_buf_write_fail_rl_state can be static Fix sparse warning introduced by commit `ac8809f9` ("xfs: abort metadata writeback on permanent errors"). Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 16:15:45 +10:00
Josh Triplett	3cf6b0151b	Merge branches 'tiny/bloat-o-meter-no-SyS', 'tiny/more-procless', 'tiny/no-advice', 'tiny/tinyconfig' and 'tiny/x86-boot-compressed-use-yn' into tiny/next	2014-09-22 23:14:40 -07:00
Fengguang Wu	ea95961df7	xfs: xfs_rtget_summary can be static Fix sparse warning introduced by commit `afabfd3` ("xfs: combine xfs_rtmodify_summary and xfs_rtget_summary"). Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 16:11:43 +10:00
Fabian Frederick	e3cf17962a	xfs: remove second xfs_quota.h inclusion in xfs_icache.c xfs_quota.h was included twice. Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 16:05:55 +10:00
Linus Torvalds	f3670394c2	Revert "x86/efi: Fixup GOT in all boot code paths" This reverts commit `9cb0e39423`. It causes my Sony Vaio Pro 11 to immediately reboot at startup. Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-22 23:05:49 -07:00
Eric Sandeen	fb04013156	xfs: don't ASSERT on corrupt ftype xfs_dir3_data_get_ftype() gets the file type off disk, but ASSERTs if it's invalid: ASSERT(type < XFS_DIR3_FT_MAX); We shouldn't ASSERT on bad values read from disk. V3 dirs are CRC-protected, but V2 dirs + ftype are not. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 16:05:32 +10:00
Dave Chinner	8af3dcd3c8	xfs: xlog_cil_force_lsn doesn't always wait correctly When running a tight mount/unmount loop on an older kernel, RedHat QE found that unmount would occasionally hang in xfs_buf_unpin_wait() on the superblock buffer. Tracing and other debug work by Eric Sandeen indicated that it was hanging on the writing of the superblock during unmount immediately after logging the superblock counters in a synchronous transaction. Further debug indicated that the synchronous transaction was not waiting for completion correctly, and we narrowed it down to xlog_cil_force_lsn() returning NULLCOMMITLSN and hence not pushing the transaction in the iclog buffer to disk correctly. While this unmount superblock write code is now very different in mainline kernels, the xlog_cil_force_lsn() code is identical, and it was bisected to the backport of commit `f876e44` ("xfs: always do log forces via the workqueue"). This commit made the CIL push asynchronous for log forces and hence exposed a race condition that couldn't occur on a synchronous push. Essentially, the xlog_cil_force_lsn() relied implicitly on the fact that the sequence push would be complete by the time xlog_cil_push_now() returned, resulting in the context being pushed being in the committing list. When it was made asynchronous, it was recognised that there was a race condition in detecting whether an asynchronous push has started or not and code was added to handle it. Unfortunately, the fix was not quite right and left a race condition where it it would detect an empty CIL while a push was in progress before the context had been added to the committing list. This was incorrectly seen as a "nothing to do" condition and so would tell xfs_log_force_lsn() that there is nothing to wait for, and hence it would push the iclogbufs in memory. The fix is simple, but explaining the logic and the race condition is a lot more complex. The fix is to add the context to the committing list before we start emptying the CIL. This allows us to detect the difference between an empty "do nothing" push and a push that has not started by adding a discrete "emptying the CIL" state to avoid the transient, incorrect "empty" condition that the (unchanged) waiting code was seeing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 15:57:59 +10:00
Dave Chinner	f6d31f4b04	Merge branch 'xfs-shift-extents-rework' into for-next	2014-09-23 15:51:14 +10:00
Brian Foster	8b5279e33f	xfs: only writeback and truncate pages for the freed range xfs_free_file_space() only affects the range of the file for which space is being freed. It currently writes and truncates the page cache from the start offset of the free to EOF. Modify xfs_free_file_space() to write back and truncate page cache of just the range being freed. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 15:39:05 +10:00
Brian Foster	f71721d061	xfs: writeback and inval. file range to be shifted by collapse The collapse range operation currently writes the entire file before starting the collapse to avoid changes in the in-core extent list due to writeback causing the extent count to change. Now that collapse range is fsb based rather than extent index based it can sustain changes in the extent list during the shift sequence without disruption. Modify xfs_collapse_file_space() to writeback and invalidate pages associated with the range of the file to be shifted. xfs_free_file_space() currently has similar behavior, but the space free need only affect the region of the file that is freed and this could change in the future. Also update the comments to reflect the current implementation. We retain the eofblocks trim permanently as a best option for dealing with delalloc extents. We don't shift delalloc extents because this scenario only occurs with post-eof preallocation (since data must be flushed such that the cache can be invalidated and data can be shifted). That means said space must also be initialized before being shifted into the accessible region of the file only to be immediately truncated off as the last part of the collapse. In other words, the eofblocks trim will happen anyways, we just run it first to ensure the file remains in a consistent state throughout the collapse. Finally, detect and fail explicitly in the event of a delalloc extent during the extent shift. The implementation does not support delalloc extents and the caller is expected to prevent this scenario in advance as is done by collapse. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 15:39:05 +10:00
Brian Foster	a979bdfea1	xfs: refactor single extent shift into xfs_bmse_shift_one() helper xfs_bmap_shift_extents() has a variety of conditions and error checks that make the logic difficult to follow and indent heavy. Refactor the loop body of this function into a new xfs_bmse_shift_one() helper. This simplifies the error checks, eliminates index decrement on merge hack by pushing the index increment down into the helper, and makes the code more readable by reducing multiple levels of indentation. This is a code refactor only. The behavior of extent shift and collapse range is not modified. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 15:39:04 +10:00
Brian Foster	ddb19e3180	xfs: refactor shift-by-merge into xfs_bmse_merge() helper The extent shift mechanism in xfs_bmap_shift_extents() is complicated and handles several different, non-deterministic scenarios. These include extent shifts, extent merges and potential btree updates in either of the former scenarios. Refactor the code to be more linear and readable. The loop logic in xfs_bmap_shift_extents() and some initial error checking is adjusted slightly. The associated btree lookup and update/delete operations are condensed into single blocks of code. This reduces the number of btree-specific blocks and facilitates the separation of the merge operation into a new xfs_bmse_merge() and xfs_bmse_can_merge() helpers. This is a code refactor only. The behavior of extent shift and collapse range is not modified. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 15:38:09 +10:00
Brian Foster	2c845f5a5f	xfs: track collapse via file offset rather than extent index The collapse range implementation uses a transaction per extent shift. The progress of the overall operation is tracked via the current extent index of the in-core extent list. This is racy because the ilock must be dropped and reacquired for each transaction according to locking and log reservation rules. Therefore, writeback to prior regions of the file is possible and can change the extent count. This changes the extent to which the current index refers and causes the collapse to fail mid operation. To avoid this problem, the entire file is currently written back before the collapse operation starts. To eliminate the need to flush the entire file, use the file offset (fsb) to track the progress of the overall extent shift operation rather than the extent index. Modify xfs_bmap_shift_extents() to unconditionally convert the start_fsb parameter to an extent index and return the file offset of the extent where the shift left off, if further extents exist. The bulk of ths function can remain based on extent index as ilock is held by the caller. xfs_collapse_file_space() now uses the fsb output as the starting point for the subsequent shift. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 15:37:09 +10:00
Dave Chinner	0d085a529b	xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly XFS has been having trouble with stray delayed allocation extents beyond EOF for a long time. Recent changes to the collapse range code has triggered erroneous EBUSY errors on page invalidtion for block size smaller than page size filesystems. These have been caused by dirty buffers beyond EOF on a partial page which do not get written to disk during a sync. The issue is that write-ahead in xfs_cluster_write() finds such a partial page and handles it by leaving the page dirty but pushing it into a writeback state. This used to work just fine, as the write_cache_pages() code would then find the dirty partial page in the next mapping tree lookup as the dirty tag is still set. Unfortunately, when we moved to a mark and sweep approach to writeback to fix other writeback sync issues, we broken this. THe act of marking the page as under writeback now clears the TOWRITE tag in the radix tree, even though the page is still dirty. This causes the TOWRITE tag to be cleared, and hence the next lookup on the mapping tree does not find the dirty partial page and so doesn't try to write it again. This same writeback bug was found recently in ext4 and fixed in commit `1c8349a` ("ext4: fix data integrity sync in ordered mode") without communication to the wider filesystem community. We can use exactly the same fix here so the TOWRITE flag is not cleared on partial page writes. cc: stable@vger.kernel.org # dependent on `1c8349a171` Root-cause-found-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-23 15:36:27 +10:00
Ingo Molnar	6273143359	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull the v3.18 RCU changes from Paul E. McKenney: " * Update RCU documentation. These were posted to LKML at https://lkml.org/lkml/2014/8/28/378. * Miscellaneous fixes. These were posted to LKML at https://lkml.org/lkml/2014/8/28/386. An additional fix that eliminates a documented (but now inconvenient) deadlock between RCU hotplug and expedited grace periods was posted at https://lkml.org/lkml/2014/8/28/573. * Changes related to No-CBs CPUs and NO_HZ_FULL. These were posted to LKML at https://lkml.org/lkml/2014/8/28/412. * Torture-test updates. These were posted to LKML at https://lkml.org/lkml/2014/8/28/546 and at https://lkml.org/lkml/2014/9/11/1114. * RCU-tasks implementation. These were posted to LKML at https://lkml.org/lkml/2014/8/28/540. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-23 07:21:42 +02:00

... 113 114 115 116 117 ...

482213 Commits