linux

Author	SHA1	Message	Date
Daniel Borkmann	81164413ad	net: tcp: add per route congestion control This work adds the possibility to define a per route/destination congestion control algorithm. Generally, this opens up the possibility for a machine with different links to enforce specific congestion control algorithms with optimal strategies for each of them based on their network characteristics, even transparently for a single application listening on all links. For our specific use case, this additionally facilitates deployment of DCTCP, for example, applications can easily serve internal traffic/dsts in DCTCP and external one with CUBIC. Other scenarios would also allow for utilizing e.g. long living, low priority background flows for certain destinations/routes while still being able for normal traffic to utilize the default congestion control algorithm. We also thought about a per netns setting (where different defaults are possible), but given its actually a link specific property, we argue that a per route/destination setting is the most natural and flexible. The administrator can utilize this through ip-route(8) by appending "congctl [lock] <name>", where <name> denotes the name of a congestion control algorithm and the optional lock parameter allows to enforce the given algorithm so that applications in user space would not be allowed to overwrite that algorithm for that destination. The dst metric lookups are being done when a dst entry is already available in order to avoid a costly lookup and still before the algorithms are being initialized, thus overhead is very low when the feature is not being used. While the client side would need to drop the current reference on the module, on server side this can actually even be avoided as we just got a flat-copied socket clone. Joint work with Florian Westphal. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Daniel Borkmann	ea69763999	net: tcp: add RTAX_CC_ALGO fib handling This patch adds the minimum necessary for the RTAX_CC_ALGO congestion control metric to be set up and dumped back to user space. While the internal representation of RTAX_CC_ALGO is handled as a u32 key, we avoided to expose this implementation detail to user space, thus instead, we chose the netlink attribute that is being exchanged between user space to be the actual congestion control algorithm name, similarly as in the setsockopt(2) API in order to allow for maximum flexibility, even for 3rd party modules. It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as it should have been stored in RTAX_FEATURES instead, we first thought about reusing it for the congestion control key, but it brings more complications and/or confusion than worth it. Joint work with Florian Westphal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Daniel Borkmann	c5c6a8ab45	net: tcp: add key management to congestion control This patch adds necessary infrastructure to the congestion control framework for later per route congestion control support. For a per route congestion control possibility, our aim is to store a unique u32 key identifier into dst metrics, which can then be mapped into a tcp_congestion_ops struct. We argue that having a RTAX key entry is the most simple, generic and easy way to manage, and also keeps the memory footprint of dst entries lower on 64 bit than with storing a pointer directly, for example. Having a unique key id also allows for decoupling actual TCP congestion control module management from the FIB layer, i.e. we don't have to care about expensive module refcounting inside the FIB at this point. We first thought of using an IDR store for the realization, which takes over dynamic assignment of unused key space and also performs the key to pointer mapping in RCU. While doing so, we stumbled upon the issue that due to the nature of dynamic key distribution, it just so happens, arguably in very rare occasions, that excessive module loads and unloads can lead to a possible reuse of previously used key space. Thus, previously stale keys in the dst metric are now being reassigned to a different congestion control algorithm, which might lead to unexpected behaviour. One way to resolve this would have been to walk FIBs on the actually rare occasion of a module unload and reset the metric keys for each FIB in each netns, but that's just very costly. Therefore, we argue a better solution is to reuse the unique congestion control algorithm name member and map that into u32 key space through jhash. For that, we split the flags attribute (as it currently uses 2 bits only anyway) into two u32 attributes, flags and key, so that we can keep the cacheline boundary of 2 cachelines on x86_64 and cache the precalculated key at registration time for the fast path. On average we might expect 2 - 4 modules being loaded worst case perhaps 15, so a key collision possibility is extremely low, and guaranteed collision-free on LE/BE for all in-tree modules. Overall this results in much simpler code, and all without the overhead of an IDR. Due to the deterministic nature, modules can now be unloaded, the congestion control algorithm for a specific but unloaded key will fall back to the default one, and on module reload time it will switch back to the expected algorithm transparently. Joint work with Florian Westphal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Florian Westphal	e715b6d3a5	net: fib6: convert cfg metric to u32 outside of table write lock Do the nla validation earlier, outside the write lock. This is needed by followup patch which needs to be able to call request_module (which can sleep) if needed. Joint work with Daniel Borkmann. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Tom Herbert	ad6f939ab1	ip: Add offset parameter to ip_cmsg_recv Add ip_cmsg_recv_offset function which takes an offset argument that indicates the starting offset in skb where data is being received from. This will be useful in the case of UDP and provided checksum to user space. ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of zero. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:44:46 -05:00
Tom Herbert	5961de9f19	ip: Add offset parameter to ip_cmsg_recv Add ip_cmsg_recv_offset function which takes an offset argument that indicates the starting offset in skb where data is being received from. This will be useful in the case of UDP and provided checksum to user space. ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of zero. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:44:46 -05:00
Tom Herbert	c44d13d6f3	ip: IP cmsg cleanup Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that they can be referenced in other source files. Restructure ip_cmsg_recv to not go through flags using shift, check for flags by 'and'. This eliminates both the shift and a conditional per flag check. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:44:46 -05:00
Tom Herbert	224d019c4f	ip: Move checksum convert defines to inet Move convert_csum from udp_sock to inet_sock. This allows the possibility that we can use convert checksum for different types of sockets and also allows convert checksum to be enabled from inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:44:46 -05:00
Thomas Graf	149118d893	netlink: Warn on unordered or illegal nla_nest_cancel() or nlmsg_cancel() Calling nla_nest_cancel() in a different order as the nesting was built up can lead to negative offsets being calculated which results in skb_trim() being called with an underflowed unsigned int. Warn if mark < skb->data as it's definitely a bug. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:38:22 -05:00
Ying Xue	a3449ded12	list_nulls: fix missing header Fixup below build error: include/linux/list_nulls.h: In function ‘hlist_nulls_del’: include/linux/list_nulls.h:84:13: error: ‘LIST_POISON2’ undeclared (first use in this function) Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-04 23:11:43 -05:00
Ying Xue	86b35b64ed	rhashtable: fix missing header Fixup below build error: include/linux/rhashtable.h: At top level: include/linux/rhashtable.h:118:34: error: field ‘mutex’ has incomplete type Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-04 23:11:43 -05:00
Jesse Gross	df5dba8e52	geneve: Remove socket hash table. The hash table for open Geneve ports is used only on creation and deletion time. It is not performance critical and is not likely to grow to a large number of items. Therefore, this can be changed to use a simple linked list. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-04 22:21:33 -05:00
Jesse Gross	829a3ada9c	geneve: Simplify locking. The existing Geneve locking scheme was pulled over directly from VXLAN. However, VXLAN has a number of built in mechanisms which make the locking more complex and are unlikely to be necessary with Geneve. This simplifies the locking to use a basic scheme of a mutex when doing updates plus RCU on receive. In addition to making the code easier to read, this also avoids the possibility of a race when creating or destroying sockets since UDP sockets and the list of Geneve sockets are protected by different locks. After this change, the entire operation is atomic. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-04 22:21:33 -05:00
Jesse Gross	61f3cade76	geneve: Remove workqueue. The work queue is used only to free the UDP socket upon destruction. This is not necessary with Geneve and generally makes the code more difficult to reason about. It also introduces nondeterministic behavior such as when a socket is rapidly deleted and recreated, which could fail as the the deletion happens asynchronously. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-04 22:21:33 -05:00
Thomas Graf	f89bd6f87a	rhashtable: Supports for nulls marker In order to allow for wider usage of rhashtable, use a special nulls marker to terminate each chain. The reason for not using the existing nulls_list is that the prev pointer usage would not be valid as entries can be linked in two different buckets at the same time. The 4 nulls base bits can be set through the rhashtable_params structure like this: struct rhashtable_params params = { [...] .nulls_base = (1U << RHT_BASE_SHIFT), }; This reduces the hash length from 32 bits to 27 bits. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-03 14:32:57 -05:00
Thomas Graf	97defe1ecf	rhashtable: Per bucket locks & deferred expansion/shrinking Introduces an array of spinlocks to protect bucket mutations. The number of spinlocks per CPU is configurable and selected based on the hash of the bucket. This allows for parallel insertions and removals of entries which do not share a lock. The patch also defers expansion and shrinking to a worker queue which allows insertion and removal from atomic context. Insertions and deletions may occur in parallel to it and are only held up briefly while the particular bucket is linked or unzipped. Mutations of the bucket table pointer is protected by a new mutex, read access is RCU protected. In the event of an expansion or shrinking, the new bucket table allocated is exposed as a so called future table as soon as the resize process starts. Lookups, deletions, and insertions will briefly use both tables. The future table becomes the main table after an RCU grace period and initial linking of the old to the new table was performed. Optimization of the chains to make use of the new number of buckets follows only the new table is in use. The side effect of this is that during that RCU grace period, a bucket traversal using any rht_for_each() variant on the main table will not see any insertions performed during the RCU grace period which would at that point land in the future table. The lookup will see them as it searches both tables if needed. Having multiple insertions and removals occur in parallel requires nelems to become an atomic counter. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-03 14:32:57 -05:00
Thomas Graf	113948d841	spinlock: Add spin_lock_bh_nested() Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-03 14:32:57 -05:00
Thomas Graf	897362e446	nft_hash: Remove rhashtable_remove_pprev() The removal function of nft_hash currently stores a reference to the previous element during lookup which is used to optimize removal later on. This was possible because a lock is held throughout calling rhashtable_lookup() and rhashtable_remove(). With the introdution of deferred table resizing in parallel to lookups and insertions, the nftables lock will no longer synchronize all table mutations and the stored pprev may become invalid. Removing this optimization makes removal slightly more expensive on average but allows taking the resize cost out of the insert and remove path. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: netfilter-devel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-03 14:32:57 -05:00
Thomas Graf	88d6ed15ac	rhashtable: Convert bucket iterators to take table and index This patch is in preparation to introduce per bucket spinlocks. It extends all iterator macros to take the bucket table and bucket index. It also introduces a new rht_dereference_bucket() to handle protected accesses to buckets. It introduces a barrier() to the RCU iterators to the prevent the compiler from caching the first element. The lockdep verifier is introduced as stub which always succeeds and properly implement in the next patch when the locks are introduced. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-03 14:32:56 -05:00
Thomas Graf	8d24c0b431	rhashtable: Do hashing inside of rhashtable_lookup_compare() Hash the key inside of rhashtable_lookup_compare() like rhashtable_lookup() does. This allows to simplify the hashing functions and keep them private. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: netfilter-devel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-03 14:32:56 -05:00
Richard Cochran	1891172aa5	timecounter: provide a macro to initialize the cyclecounter mask field. There is no need for users of the timecounter/cyclecounter code to include clocksource.h just for a single macro. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-02 16:47:35 -05:00
David S. Miller	6c032edc8a	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg say: ==================== pull request: bluetooth-next 2014-12-31 Here's the first batch of bluetooth patches for 3.20. - Cleanups & fixes to ieee802154 drivers - Fix synchronization of mgmt commands with respective HCI commands - Add self-tests for LE pairing crypto functionality - Remove 'BlueFritz!' specific handling from core using a new quirk flag - Public address configuration support for ath3012 - Refactor debugfs support into a dedicated file - Initial support for LE Data Length Extension feature from Bluetooth 4.2 Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-02 15:58:21 -05:00
Jesse Gross	9b174d88c2	net: Add Transparent Ethernet Bridging GRO support. Currently the only tunnel protocol that supports GRO with encapsulated Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer so that it can be used by other tunnel protocols such as GRE and Geneve. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-02 15:46:41 -05:00
Roger Chen	3cf8e53a48	GMAC: define clock ID used for GMAC changes since v2: 1. remove SCLK_MAC_PLL Signed-off-by: Roger Chen <roger.chen@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-31 19:14:18 -05:00
Alexander Duyck	345e9b5426	fib_trie: Push rcu_read_lock/unlock to callers This change is to start cleaning up some of the rcu_read_lock/unlock handling. I realized while reviewing the code there are several spots that I don't believe are being handled correctly or are masking warnings by locally calling rcu_read_lock/unlock instead of calling them at the correct level. A common example is a call to fib_get_table followed by fib_table_lookup. The rcu_read_lock/unlock ought to wrap both but there are several spots where they were not wrapped. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-31 18:25:54 -05:00
Bill Hong	33f72e6f0c	l2tp : multicast notification to the registered listeners Previously l2tp module did not provide any means for the user space to get notified when tunnels/sessions are added/modified/deleted. This change contains the following - create a multicast group for the listeners to register. - notify the registered listeners when the tunnels/sessions are created/modified/deleted. Signed-off-by: Bill Hong <bhong@brocade.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Sven-Thorsten Dietrich <sven@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-31 14:17:20 -05:00
Nimrod Andy	de40ed31b3	net: fec: add Wake-on-LAN support Support for Wake-on-LAN using Magic Packet. ENET IP supports sleep mode in low power status, when system enter suspend status, Magic packet can wake up system even if all SOC clocks are gate. The patch doing below things: - flagging the device as a wakeup source for the system, as well as its Wake-on-LAN interrupt - prepare the hardware for entering WoL mode - add standard ethtool WOL interface - enable the ENET interrupt to wake us Tested on i.MX6q/dl sabresd, sabreauto boards, i.MX6SX arm2 boards. Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-31 13:06:50 -05:00
Dmitry Eremin-Solenikov	dd45077799	arm: sa1100: move irda header to linux/platform_data In the end asm/mach/irda.h header is not used by anybody except sa1100. Move the header to the platform data includes dir and rename it to irda-sa11x0.h. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-30 18:44:07 -05:00
Richard Cochran	2eebdde652	timecounter: keep track of accumulated fractional nanoseconds The current timecounter implementation will drop a variable amount of resolution, depending on the magnitude of the time delta. In other words, reading the clock too often or too close to a time stamp conversion will introduce errors into the time values. This patch fixes the issue by introducing a fractional nanosecond field that accumulates the low order bits. Reported-by: Janusz Użycki <j.uzycki@elproma.com.pl> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-30 18:29:27 -05:00
Richard Cochran	796c1efd6f	timecounter: provide a helper function to shift the time. Some PTP Hardware Clock drivers use a struct timecounter to represent their clock. To adjust the time by a given offset, these drivers all perform a two step read/write of their timecounter. However, it is better and simpler just to adjust the offset in one step. This patch introduces a little routine to help drivers implement the adjtime method. Suggested-by: Janusz Użycki <j.uzycki@elproma.com.pl> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-30 18:29:25 -05:00
Richard Cochran	74d23cc704	time: move the timecounter/cyclecounter code into its own file. The timecounter code has almost nothing to do with the clocksource code. Let it live in its own file. This will help isolate the timecounter users from the clocksource users in the source tree. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-30 18:29:25 -05:00
Linus Torvalds	2c90331cf5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen. 2) Fix receive checksum handling in enic driver, from Govindarajulu Varadarajan. 3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from Herbert Xu. Also, add code to detect drivers that have this mistake in the future. 4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai. 5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP input path,f rom Nicolas Dichtel. 6) Fix MPLS action validation in openvswitch, from Pravin B Shelar. 7) Fix double SKB free in vxlan driver, also from Pravin. 8) When we scrub a packet, which happens when we are switching the context of the packet (namespace, etc.), we should reset the secmark. From Thomas Graf. 9) ->ndo_gso_check() needs to do more than return true/false, it also has to allow the driver to clear netdev feature bits in order for the caller to be able to proceed properly. From Jesse Gross. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) genetlink: A genl_bind() to an out-of-range multicast group should not WARN(). netlink/genetlink: pass network namespace to bind/unbind ne2k-pci: Add pci_disable_device in error handling bonding: change error message to debug message in __bond_release_one() genetlink: pass multicast bind/unbind to families netlink: call unbind when releasing socket netlink: update listeners directly when removing socket genetlink: pass only network namespace to genl_has_listeners() netlink: rename netlink_unbind() to netlink_undo_bind() net: Generalize ndo_gso_check to ndo_features_check net: incorrect use of init_completion fixup neigh: remove next ptr from struct neigh_table net: xilinx: Remove unnecessary temac_property in the driver net: phy: micrel: use generic config_init for KSZ8021/KSZ8031 net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding openvswitch: fix odd_ptr_err.cocci warnings Bluetooth: Fix accepting connections when not using mgmt Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR brcmfmac: Do not crash if platform data is not populated ipw2200: select CFG80211_WEXT ...	2014-12-30 10:45:47 -08:00
Linus Torvalds	df90dcd100	Merge tag 'pm+acpi-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI material from Rafael J Wysocki: "These are fixes (operating performance points library, cpufreq-dt driver, cpufreq core, ACPI backlight, cpupower tool), cleanups (cpuidle), new processor IDs for the RAPL (Running Average Power Limit) power capping driver, and a modification of the generic power domains framework allowing modular drivers to call one of its helper functions. Specifics: - Fix for a potential NULL pointer dereference in the cpufreq core due to an initialization race condition (Ethan Zhao). - Fixes for abuse of the OPP (Operating Performance Points) API related to RCU and other minor issues in the OPP library and the cpufreq-dt driver (Dmitry Torokhov). - cpuidle governors cleanup making them measure idle duration in a better way without using the CPUIDLE_FLAG_TIME_INVALID flag which allows that flag to be dropped from the ACPI cpuidle driver and from the core too (Len Brown). - New ACPI backlight blacklist entries for Samsung machines without a working native backlight interface that need to use the ACPI backlight instead (Aaron Lu). - New CPU IDs of future Intel Xeon CPUs for the Intel RAPL power capping driver (Jacob Pan). - Generic power domains framework modification to export the of_genpd_get_from_provider() function to modular drivers that will allow future driver modifications to be based on the mainline (Amit Daniel Kachhap). - Two fixes for the cpupower tool (Michal Privoznik, Prarit Bhargava)" * tag 'pm+acpi-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / video: Add some Samsung models to disable_native_backlight list tools / cpupower: Fix no idle state information return value tools / cpupower: Correctly detect if running as root cpufreq: fix a NULL pointer dereference in __cpufreq_governor() cpufreq-dt: defer probing if OPP table is not ready PM / OPP: take RCU lock in dev_pm_opp_get_opp_count PM / OPP: fix warning in of_free_opp_table() PM / OPP: add some lockdep annotations powercap / RAPL: add IDs for future Xeon CPUs PM / Domains: Export of_genpd_get_from_provider function cpuidle / ACPI: remove unused CPUIDLE_FLAG_TIME_INVALID cpuidle: ladder: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID cpuidle: menu: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID	2014-12-29 18:50:02 -08:00
Linus Torvalds	4c5d499503	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: "First of all, the most important change is the thermal cpu cooling fixes. The major fix here is to have proper sequencing between cpufreq layer and thermal cpu cooling registration. A take away of this fix is an improvement in the thermal drivers code. Thermal drivers that require cpu cooling do not need to check for cpufreq layer. The requirement now is to propagate the error code, if any, while registering cpu cooling device. Thanks to Viresh for implementing the required CPUfreq changes. Second, a new driver is introduced for int340x processor thermal device. Given that int340x thermal is disabled by default, and this processor thermal device is only available on limited platforms, plus the driver does nothing but exposes some thermal limitation information for user space to use, thus I think it is safe to include it in this pull request after missing 3.19-rc2. Specifics: - Thermal cpu cooling fixes and cleanups. - introduce INT340X processor thermal reporting device driver. - several small fixes and cleanups for int340x thermal drivers" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (43 commits) Thermal/int340x/int3403: Free acpi notification handler Thermal/int340x/processor_thermal: Fix memory leak Thermal/int340x/int3403: Fix memory leak thermal: int340x: Introduce processor reporting device thermal: int340x_thermal: drop owner assignment from platform_drivers thermal: drop owner assignment from platform_drivers thermal: cpu_cooling: document node in struct cpufreq_cooling_device thermal/powerclamp: add ids for future xeon cpus Thermal/int340x: Handle properly the case when _trt or _art acpi entry is missing thermal: cpu_cooling: return ERR_PTR() for !CPU_THERMAL or !THERMAL_OF thermal: cpu_cooling: small memory leak on error thermal: ti-soc-thermal: Do not print error message in the EPROBE_DEFER case thermal: db8500: Do not print error message in the EPROBE_DEFER case thermal: imx: Do not print error message in the EPROBE_DEFER case thermal: Fix cdev registration with THERMAL_NO_LIMIT on 64bit drivers: thermal: Remove ARCH_HAS_BANDGAP dependency for samsung thermal:core:fix: Check return code of the ->get_max_state() callback thermal: cpu_cooling: update copyright tags thermal: cpu_cooling: Use cpufreq_dev->freq_table for finding level/freq thermal: cpu_cooling: Store frequencies in descending order ...	2014-12-29 13:13:41 -08:00
Michal Hocko	45f87de57f	mm: get rid of radix tree gfp mask for pagecache_get_page Commit `2457aec637` ("mm: non-atomically mark page accessed during page cache allocation where possible") has added a separate parameter for specifying gfp mask for radix tree allocations. Not only this is less than optimal from the API point of view because it is error prone, it is also buggy currently because grab_cache_page_write_begin is using GFP_KERNEL for radix tree and if fgp_flags doesn't contain FGP_NOFS (mostly controlled by fs by AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared then the radix tree allocation wouldn't obey the restriction and might recurse into filesystem and cause deadlocks. This is the case for most filesystems unfortunately because only ext4 and gfs2 are using AOP_FLAG_NOFS. Let's simply remove radix_gfp_mask parameter because the allocation context is same for both page cache and for the radix tree. Just make sure that the radix tree gets only the sane subset of the mask (e.g. do not pass __GFP_WRITE). Long term it is more preferable to convert remaining users of AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this interface even further. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-29 12:45:45 -08:00
Rafael J. Wysocki	4f2f277789	Merge branches 'pm-domains', 'powercap' and 'pm-tools' * pm-domains: PM / Domains: Export of_genpd_get_from_provider function * powercap: powercap / RAPL: add IDs for future Xeon CPUs * pm-tools: tools / cpupower: Fix no idle state information return value tools / cpupower: Correctly detect if running as root	2014-12-29 21:24:00 +01:00
Rafael J. Wysocki	ff23ab2441	Merge branches 'pm-cpufreq' and 'pm-cpuidle' * pm-cpufreq: cpufreq: fix a NULL pointer dereference in __cpufreq_governor() cpufreq-dt: defer probing if OPP table is not ready * pm-cpuidle: cpuidle / ACPI: remove unused CPUIDLE_FLAG_TIME_INVALID cpuidle: ladder: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID cpuidle: menu: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID	2014-12-29 21:23:41 +01:00
Johannes Berg	023e2cfa36	netlink/genetlink: pass network namespace to bind/unbind Netlink families can exist in multiple namespaces, and for the most part multicast subscriptions are per network namespace. Thus it only makes sense to have bind/unbind notifications per network namespace. To achieve this, pass the network namespace of a given client socket to the bind/unbind functions. Also do this in generic netlink, and there also make sure that any bind for multicast groups that only exist in init_net is rejected. This isn't really a problem if it is accepted since a client in a different namespace will never receive any notifications from such a group, but it can confuse the family if not rejected (it's also possible to silently (without telling the family) accept it, but it would also have to be ignored on unbind so families that take any kind of action on bind/unbind won't do unnecessary work for invalid clients like that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-27 03:07:50 -05:00
Johannes Berg	c380d9a7af	genetlink: pass multicast bind/unbind to families In order to make the newly fixed multicast bind/unbind functionality in generic netlink, pass them down to the appropriate family. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-27 02:20:23 -05:00
Johannes Berg	f8403a2e47	genetlink: pass only network namespace to genl_has_listeners() There's no point to force the caller to know about the internal genl_sock to use inside struct net, just have them pass the network namespace. This doesn't really change code generation since it's an inline, but makes the caller less magic - there's never any reason to pass another socket. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-27 02:20:23 -05:00
Jesse Gross	5f35227ea3	net: Generalize ndo_gso_check to ndo_features_check GSO isn't the only offload feature with restrictions that potentially can't be expressed with the current features mechanism. Checksum is another although it's a general issue that could in theory apply to anything. Even if it may be possible to implement these restrictions in other ways, it can result in duplicate code or inefficient per-packet behavior. This generalizes ndo_gso_check so that drivers can remove any features that don't make sense for a given packet, similar to netif_skb_features(). It also converts existing driver restrictions to the new format, completing the work that was done to support tunnel protocols since the issues apply to checksums as well. By actually removing features from the set that are used to do offloading, it solves another problem with the existing interface. In these cases, GSO would run with the original set of features and not do anything because it appears that segmentation is not required. CC: Tom Herbert <therbert@google.com> CC: Joe Stringer <joestringer@nicira.com> CC: Eric Dumazet <edumazet@google.com> CC: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Tom Herbert <therbert@google.com> Fixes: `04ffcb255f` ("net: Add ndo_gso_check") Tested-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-26 17:20:56 -05:00
Nicolas Dichtel	ef8f342b43	neigh: remove next ptr from struct neigh_table After commit `d7480fd3b1` ("neigh: remove dynamic neigh table registration support"), this field is not used anymore. CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-26 17:07:08 -05:00
Marcel Holtmann	711ffa78f4	Bluetooth: Introduce HCI_QUIRK_BROKEN_LOCAL_COMMANDS constant Some controllers advertise support for Bluetooth 1.2 specification, but they do not support the HCI Read Local Supported Commands command. If that is the case, then the driver can quirk the behavior and force the core to skip this command. This will allow removing vendor specific checks out of the core. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-12-26 20:16:10 +02:00
Linus Torvalds	08b022a965	Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Xmas fixes pull: core: one atomic fix, revert the WARN_ON dumb buffers patch. agp: fixup Dave J. nouveau: fix 3.18 regression for old userspace tegra fixes: vblank and iommu fixes amdkfd: fix bugs shown by testing with userspace, init apertures once msm: hdmi fixes and cleanup i915: misc fixes There is also a link ordering fix that I've asked to be cc'ed to you, putting iommu before gpu, it fixes an issue with amdkfd when things are all in the kernel, but I didn't like sending it via my tree without discussion. I'll probably be a bit on/off for a few weeks with pulls now, due to holidays and LCA, so don't be surprised if stuff gets a bit backed up, and things end up a bit large due to lag" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (28 commits) Revert "drm/gem: Warn on illegal use of the dumb buffer interface v2" agp: Fix up email address & attributions in AGP MODULE_AUTHOR tags nouveau: bring back legacy mmap handler drm/msm/hdmi: rework HDMI IRQ handler drm/msm/hdmi: enable regulators before clocks to avoid warnings drm/msm/mdp5: update irqs on crtc<->encoder link change drm/msm: block incoming update on pending updates drm/atomic: fix potential null ptr on plane enable drm/msm: Deletion of unnecessary checks before the function call "release_firmware" drm/msm: Deletion of unnecessary checks before two function calls drm/tegra: dc: Select root window for event dispatch drm/tegra: gem: Use the proper size for GEM objects drm/tegra: gem: Flush buffer objects upon allocation drm/tegra: dc: Fix a potential race on page-flip completion drm/tegra: dc: Consistently use the same pipe drm/irq: Add drm_crtc_vblank_count() drm/irq: Add drm_crtc_handle_vblank() drm/irq: Add drm_crtc_send_vblank_event() drm/i915: Disable PSMI sleep messages on all rings around context switches drm/i915: Force the CS stall for invalidate flushes ...	2014-12-25 16:04:15 -08:00
Dave Airlie	da6b51d007	Revert "drm/gem: Warn on illegal use of the dumb buffer interface v2" This reverts commit `355a701838`. This had some bad side effects under normal operation, and should have been dropped earlier. Signed-off-by: Dave Airlie <airlied@redhat.com>	2014-12-24 13:13:22 +10:00
Linus Torvalds	66b3f4f0a0	Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit Pull audit fixes from Paul Moore: "Four patches to fix various problems with the audit subsystem, all are fairly small and straightforward. One patch fixes a problem where we weren't using the correct gfp allocation flags (GFP_KERNEL regardless of context, oops), one patch fixes a problem with old userspace tools (this was broken for a while), one patch fixes a problem where we weren't recording pathnames correctly, and one fixes a problem with PID based filters. In general I don't think there is anything controversial with this patchset, and it fixes some rather unfortunate bugs; the allocation flag one can be particularly scary looking for users" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: audit: restore AUDIT_LOGINUID unset ABI audit: correctly record file names with different path name types audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb audit: don't attempt to lookup PIDs when changing PID filtering audit rules	2014-12-23 18:13:16 -08:00
Richard Guy Briggs	041d7b98ff	audit: restore AUDIT_LOGINUID unset ABI A regression was caused by commit `780a7654ce`: audit: Make testing for a valid loginuid explicit. (which in turn attempted to fix a regression caused by `e1760bd`) When audit_krule_to_data() fills in the rules to get a listing, there was a missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID. This broke userspace by not returning the same information that was sent and expected. The rule: auditctl -a exit,never -F auid=-1 gives: auditctl -l LIST_RULES: exit,never f24=0 syscall=all when it should give: LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all Tag it so that it is reported the same way it was set. Create a new private flags audit_krule field (pflags) to store it that won't interact with the public one from the API. Cc: stable@vger.kernel.org # v3.10-rc1+ Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>	2014-12-23 16:40:18 -05:00
Dave Airlie	fc556fb64e	Merge tag 'drm/tegra/for-3.19-rc1-fixes' of git://people.freedesktop.org/~tagr/linux into drm-fixes drm/tegra: Fixes for v3.19-rc1 This is a set of fixes for two regressions and one bug in the IOMMU mapping code. It turns out that all of these issues turn up primarily on Tegra30 hardware. The IOMMU mapping bug only manifests on buffers that aren't multiples of the page size. I happened to be testing HDMI with 1080p while writing the code and framebuffers for that happen to fit exactly within 2025 pages of 4 KiB each. One of the regressions is caused by the IOMMU code allocating pages from shmem which can have associated cache lines. If the pages aren't flushed then these cache lines may be flushed later on and cause framebuffer corruption. I'm not sure why I didn't see this before. Perhaps the board that I was using had enough RAM so that the pages shmem would hand out had a better chance of being unused. Or maybe I didn't look too closely. The fix for this is to fake up an SG table so that it can be passed to the DMA API. Ideally this would use drm_clflush_(), but implementing that for ARM causes DRM to fail to build as a module since some of the low-level cache maintenance functions aren't exported. Hopefully we can get a suitable API exported on ARM for the next release. The second regression is caused by a mismatch between the hardware pipe number and the CRTC's DRM index. These were used inconsistently, which could cause one code location to call drm_vblank_get() with a different pipe than the corresponding drm_vblank_put(), thereby causing the reference count to become unbalanced. Alexandre also reported a possible race condition related to this, which this series also fixes. tag 'drm/tegra/for-3.19-rc1-fixes' of git://people.freedesktop.org/~tagr/linux: drm/tegra: dc: Select root window for event dispatch drm/tegra: gem: Use the proper size for GEM objects drm/tegra: gem: Flush buffer objects upon allocation drm/tegra: dc: Fix a potential race on page-flip completion drm/tegra: dc: Consistently use the same pipe drm/irq: Add drm_crtc_vblank_count() drm/irq: Add drm_crtc_handle_vblank() drm/irq: Add drm_crtc_send_vblank_event()	2014-12-23 08:24:26 +10:00
stephen hemminger	6d08acd2d3	in6: fix conflict with glibc Resolve conflicts between glibc definition of IPV6 socket options and those defined in Linux headers. Looks like earlier efforts to solve this did not cover all the definitions. It resolves warnings during iproute2 build. Please consider for stable as well. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-22 16:12:36 -05:00
Zhang Rui	32c9edc4e3	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal into thermal-soc	2014-12-21 22:49:12 +08:00

1 2 3 4 5 ...

71709 Commits