The wakeup.flags.enabled flag in struct acpi_device is not used
consistently, as there is no reason why it should only apply
to the enabling/disabling of the wakeup GPE, so put the invocation
of acpi_enable_wakeup_device_power() under it too.
Moreover, it is not necessary to call
acpi_enable_wakeup_devices() and acpi_disable_wakeup_devices() for
suspend-to-idle, so don't do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change the log level of the "System wakeup enabled/disabled by ACPI"
message in acpi_pm_device_sleep_wake() to "debug" to reduce the log
noise level.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The work functions provided by the users of acpi_add_pm_notifier()
should be run synchronously before re-enabling the wakeup GPE in
case they are used to clear the status and/or disable the wakeup
signaling at the source. Otherwise, as is currently the case in
the PCI bus type code, the same wakeup event may be signaled
multiple times even though the work function handling it has
already been queued up.
Fortunately, acpi_add_pm_notifier() is only used by PCI and by
ACPI device PM code internally, so the change is relatively
straightforward to make.
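The ordering described above can be sketched in plain userspace C. This
is only a simplified model under stated assumptions: struct wake_source,
wake_notify() and clear_status_work() are hypothetical names invented for
illustration and are not the kernel's data structures or APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a wakeup source; not a kernel structure. */
struct wake_source {
	bool gpe_enabled;	/* is the wakeup GPE currently enabled? */
	bool status_pending;	/* wakeup status still set at the source */
	int handled;		/* how many times the work function ran */
	void (*work)(struct wake_source *ws);
};

/* Example work function: clears the wakeup status at the source. */
static void clear_status_work(struct wake_source *ws)
{
	/* The GPE must still be disabled while the source is quiesced. */
	assert(!ws->gpe_enabled);
	ws->status_pending = false;
	ws->handled++;
}

/* Notification path: run the work synchronously, then re-enable the GPE. */
static void wake_notify(struct wake_source *ws)
{
	ws->gpe_enabled = false;	/* GPE stays off while handling */
	if (ws->work)
		ws->work(ws);		/* direct call, not a queued item */
	ws->gpe_enabled = true;		/* source no longer signals */
}
```

If the work function were merely queued, the GPE would be re-enabled
while status_pending is still set, which is how the same event could be
signaled again.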
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
ACPICA commit ed0389cb11a61e63c568ac1f67948fc6a7bd1aeb
An invalid opcode indicates something seriously wrong with the
input AML file. The AML parser is immediately confused and lost,
causing the resulting parse tree to be ill-formed. The actual
disassembly can then cause numerous unrelated errors and faults.
This change aborts the disassembly upon discovery of such an
opcode during the AML parse phase.
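A toy illustration of the "abort on first invalid opcode" idea follows.
It assumes a made-up three-opcode language; toy_parse(),
toy_disassemble() and the opcode values are hypothetical and do not
reflect the real AML opcode table or the ACPICA parser interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcode set for illustration only; not the AML opcode table. */
static int toy_opcode_is_valid(uint8_t op)
{
	return op == 0x01 || op == 0x02 || op == 0x03;
}

/*
 * Walk the byte stream and stop at the first invalid opcode.
 * Returns 0 on success, or -1 with *bad_offset set to its position.
 */
static int toy_parse(const uint8_t *code, size_t len, size_t *bad_offset)
{
	assert(code != NULL && bad_offset != NULL);
	for (size_t i = 0; i < len; i++) {
		if (!toy_opcode_is_valid(code[i])) {
			*bad_offset = i;
			return -1;	/* abort during the parse phase */
		}
	}
	return 0;
}

/* Disassemble only if parsing succeeded; returns ops emitted or -1. */
static int toy_disassemble(const uint8_t *code, size_t len)
{
	size_t bad;

	if (toy_parse(code, len, &bad) != 0)
		return -1;	/* never walk an ill-formed parse tree */
	return (int)len;
}
```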
Link: https://github.com/acpica/acpica/commit/ed0389cb
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 99bc3beca92c6574ea1d69de42e54f872e6373ce
It is reported that on Linux, the RTC driver triggers a spurious warning
on hardware-reduced platforms:
[ 4.085420] ACPI Warning: Could not enable fixed event - real_time_clock (4) (20160422/evxface-654)
This patch fixes this by correctly adding a runtime reduced-hardware check.
Reported by Chandan Tagore, fixed by Lv Zheng.
Link: https://github.com/acpica/acpica/commit/99bc3bec
Tested-by: Chandan Tagore <tagore.chandan@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 51a92f414de7af1f7f7524de3f61daf5413cac9f
Acpiexec gives this warning when resources containing GPIOs are extracted
using Resource command:
**** Data mismatch in descriptor [00] type 8C, Offset 00000000 ****
Mismatch at byte offset 13: is 00, should be 25
**** Data mismatch in descriptor [01] type 8C, Offset 00000025 ****
Mismatch at byte offset 13: is 00, should be 25
This happens because we do not set VendorOffset when doing resource to AML
conversion. Fix this by always setting VendorOffset.
Link: https://github.com/acpica/acpica/commit/51a92f41
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 08b83591c0db751769d61fa889f4f50f575aeffb
PinGroupConfig() is analogous to PinGroupFunction() but instead of mode
(muxing), it is used to apply specific fine-grained configuration to a
set of referenced pins.
The format of this new resource is:
PinGroupConfig (Shared/Exclusive, PinConfigType, PinConfigValue,
ResourceSource, ResourceSourceIndex, ResourceSourceLabel,
ResourceUsage, DescriptorName, VendorData)
The PinConfigType/PinConfigValue are the same as those used by the
PinConfig() resource.
Here also the combination of ResourceSource and ResourceSourceLabel is
used to specify the PinGroup() this resource refers to.
Link: https://github.com/acpica/acpica/commit/08b83591
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit bd9a745749eac7137cd23085e6bdeb322de14ea2
PinGroupFunction() is a new resource introduced with ACPI 6.2. It is
used with PinGroup() to configure a specific mode for a set of pins
exposed by a GPIO controller.
The format of the resource is:
PinGroupFunction (Shared/Exclusive, FunctionNumber, ResourceSource,
ResourceSourceIndex, ResourceSourceLabel,
ResourceUsage, DescriptorName, VendorData)
The ResourceSource and ResourceSourceLabel fields are used to specify
the PinGroup() resource referenced by PinGroupFunction().
Device (GPIO)
{
Name (_CRS, ResourceTemplate () {
PinGroup ("group1") {2, 3}
PinGroup ("group2") {4, 5}
...
})
}
Device (I2C)
{
Name (_CRS, ResourceTemplate () {
PinGroupFunction (Exclusive, 6, "^GPIO", 0, "group2")
})
}
In the above example the PinGroupFunction() references the second
PinGroup() resource (using label "group2") and configures pins 4 and 5
into mode 6.
Link: https://github.com/acpica/acpica/commit/bd9a7457
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 7d928e3174fb19d7dc0066b03c30bea07c001563
ACPI 6.2 introduced a new resource that is used to declare a set of pins
belonging to a GPIO controller. This resource is referenced by new
PinGroupFunction() and PinGroupConfig() resources using ResourceSource
and ResourceLabel fields.
The PinGroup() resource looks like this:
PinGroup (ResourceLabel, ResourceUsage, DescriptorName,
VendorData) {Pin List}
This resource should be listed in _CRS under the GPIO/pincontroller
device providing these pins.
Link: https://github.com/acpica/acpica/commit/7d928e31
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit a06fdba686cefccd5dd5b93b52fa0f1e3f984906
ACPI 6.2 introduced a new resource that is used to specify fine-grained
configuration of a pin or set of pins used by a device. The ASL syntax of
this new resource looks like:
PinConfig (Shared/Exclusive, PinConfigType, PinConfigValue,
ResourceSource, ResourceSourceIndex, ResourceUsage,
DescriptorName, VendorData) {Pin List}
PinConfigType is an integer with following accepted values:
0x00 (Default) - No configuration is applied to the pin
0x01 (Bias Pull-up) - Pin is pulled up using certain size resistor
0x02 (Bias Pull-down) - Pin is pulled down using certain size resistor
0x03 (Bias Default) - Set to default biasing
0x04 (Bias Disable) - All bias settings will be disabled
0x05 (Bias High Impedance) - Configure the pin as Hi-Z
0x06 (Bias Bus Hold) - Configure the pin in a weak latch state where
it drives the last value on a tristate bus
0x07 (Drive Open Drain) - Configure the pin into open drain state
0x08 (Drive Open Source) - Configure the pin into open source state
0x09 (Drive Push Pull) - Configure the pin into push-pull state
0x0a (Drive Strength) - How much current the pin can supply
0x0b (Slew Rate) - Configure slew rate of the pin
0x0c (Input Debounce) - Enable input debouncer for the pin
0x0d (Input Schmitt Trigger) - Enable schmitt trigger for the pin
0x0e - 0x7f - Reserved
0x80 - 0xff - Vendor defined types
The PinConfigValue depends on the type and is expressed as units
suitable for that type (for example bias uses Ohms).
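The value ranges listed above could be decoded with a small lookup
helper like the sketch below. The type names are taken from the list;
pin_config_names[] and pin_config_type_name() are hypothetical helpers
written for illustration and are not ACPICA functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Names taken from the list above; the array itself is illustrative. */
static const char *const pin_config_names[] = {
	"Default", "Bias Pull-up", "Bias Pull-down", "Bias Default",
	"Bias Disable", "Bias High Impedance", "Bias Bus Hold",
	"Drive Open Drain", "Drive Open Source", "Drive Push Pull",
	"Drive Strength", "Slew Rate", "Input Debounce",
	"Input Schmitt Trigger",
};

#define PIN_CONFIG_NAMED_COUNT \
	(sizeof(pin_config_names) / sizeof(pin_config_names[0]))

/* Types 0x00-0x0d are named, so the table must hold 14 entries. */
static_assert(PIN_CONFIG_NAMED_COUNT == 0x0e, "name table out of sync");

/* Map a PinConfigType byte to a human-readable description. */
static const char *pin_config_type_name(uint8_t type)
{
	if (type < PIN_CONFIG_NAMED_COUNT)
		return pin_config_names[type];
	if (type <= 0x7f)
		return "Reserved";		/* 0x0e - 0x7f */
	return "Vendor defined";		/* 0x80 - 0xff */
}
```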
Link: https://github.com/acpica/acpica/commit/a06fdba6
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 6bbc6357f7061f1243601adde0ea45f7a89274e0
ACPI 6.2 introduced a new resource that is used to describe how certain
pins are muxed for a device. The ASL syntax of this new resource looks
like below:
PinFunction(Shared, PinConfig, FunctionNumber, ResourceSource,
ResourceSourceIndex, ResourceUsage, DescriptorName,
VendorData) {Pin List}
This is pretty similar to the GpioIo()/GpioInt() resources.
Teach ACPICA about this new resource.
Link: https://github.com/acpica/acpica/commit/6bbc6357
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 3c36625deffdfb034378b1793e2ead9c8fdd767e
Changes the resource descriptor parse tree walk to a general
preprocessing walk and calls the Switch conversion code from here.
Move Switch code to new dmswitch.c file. Also improves algorithm to
handle multiple levels of Switch statements and perform legacy
disassembly for older or otherwise non-spec compliant Switch
implementations.
Link: https://github.com/acpica/acpica/commit/3c36625d
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Consider this case:
1. A program opens a sysfs table file 65535 times. Each open can increase
validation_count, and the first increment causes the table to be mapped:
validation_count = 65535
2. AML execution causes "Load" to be executed on the same
table. This time validation_count cannot be increased, so
it remains:
validation_count = 65535
3. The program closes the sysfs table file 65535 times. Each close can
decrease validation_count, and the last decrement causes the table to be
unmapped:
validation_count = 0
4. AML code is still accessing the loaded table, so a kernel crash can be
observed.
To prevent that from happening, add a validation_count threshold.
When it is reached, the validation_count can no longer be
incremented or decremented, so the table descriptor is never invalidated
(which prevents the table from being unmapped).
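The threshold behavior can be modeled in userspace roughly as follows.
This is a sketch under assumptions, not the ACPICA code: struct
toy_table_desc, toy_get_table(), toy_put_table() and the MAX_VALIDATIONS
value are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VALIDATIONS UINT16_MAX	/* hypothetical threshold value */

struct toy_table_desc {
	uint16_t validation_count;
	bool mapped;
};

static void toy_get_table(struct toy_table_desc *d)
{
	assert(d != NULL);
	if (d->validation_count >= MAX_VALIDATIONS)
		return;			/* pinned: count stops changing */
	if (d->validation_count++ == 0)
		d->mapped = true;	/* first reference maps the table */
}

static void toy_put_table(struct toy_table_desc *d)
{
	assert(d != NULL);
	if (d->validation_count >= MAX_VALIDATIONS)
		return;			/* pinned: never unmapped again */
	if (d->validation_count == 0)
		return;			/* unbalanced put, nothing to do */
	if (--d->validation_count == 0)
		d->mapped = false;	/* last reference unmaps the table */
}
```

Replaying the scenario above against this model (65535 gets, one more
get for "Load", then 65535 puts) leaves the table mapped, because the
count stays at the threshold once it has been reached.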
Note that code added in acpi_tb_put_table() is actually a no-op but
changes the warning message into a "warn once" one. Lv Zheng.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw: Changelog, comments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull IOMMU fixes from Joerg Roedel:
- another compile-fix for my header cleanup
- a couple of fixes for the recently merged IOMMU probe deferral code
- fixes for ACPI/IORT code necessary with IOMMU probe deferral
* tag 'iommu-fixes-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
arm: dma-mapping: Reset the device's dma_ops
ACPI/IORT: Move the check to get iommu_ops from translated fwspec
ARM: dma-mapping: Don't tear down third-party mappings
ACPI/IORT: Ignore all errors except EPROBE_DEFER
iommu/of: Ignore all errors except EPROBE_DEFER
iommu/of: Fix check for returning EPROBE_DEFER
iommu/dma: Fix function declaration
The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".
Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, that guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol, and they fall back to copy_from_iter_nocache() and plain
memcpy() otherwise.
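The config-symbol gating with a plain memcpy() fallback can be pictured
with the following userspace sketch. TOY_HAS_FLUSHCACHE is a
hypothetical stand-in for CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE, and the
toy_* names are invented here; the actual kernel helpers may be declared
differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * TOY_HAS_FLUSHCACHE is a hypothetical stand-in for
 * CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE.
 */
#ifdef TOY_HAS_FLUSHCACHE
/* An architecture would supply a cache-bypassing copy here. */
void toy_arch_memcpy_flushcache(void *dst, const void *src, size_t n);
#define toy_memcpy_flushcache toy_arch_memcpy_flushcache
#else
/* Fallback: plain memcpy(), with no guarantee about cache state. */
static inline void toy_memcpy_flushcache(void *dst, const void *src,
					 size_t n)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n);
}
#endif
```

Callers use one name in both configurations, so only the architecture
that opts in pays for the cache maintenance.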
This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].
The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Revert commit eed4d47efe (ACPI / sleep: Ignore spurious SCI wakeups
from suspend-to-idle) as it turned out to be premature and triggered
a number of different issues on various systems.
That includes, but is not limited to, premature suspend-to-RAM aborts
on Dell XPS 13 (9343) reported by Dominik.
The issue the commit in question attempted to address is real and
will need to be taken care of going forward, but evidently more work
is needed for this purpose.
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Revert commit da28e1955d (ACPICA: Disassembler: Enhance resource
descriptor detection) as it is based on an assumption that doesn't
hold all the time and causes problems to happen because of that.
Reported-by: Linda Knippers <linda.knippers@hpe.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are new types and helpers that are supposed to be used in new code.
As a preparation to get rid of legacy types and API functions do
the conversion here.
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There are new types and helpers that are supposed to be used in new code.
As a preparation to get rid of legacy types and API functions do
the conversion here.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There are new types and helpers that are supposed to be used in new code.
As a preparation to get rid of legacy types and API functions do
the conversion here.
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There are new types and helpers that are supposed to be used in new code.
As a preparation to get rid of legacy types and API functions do
the conversion here.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
With IOMMU probe deferral, iort_iommu_configure can be called
multiple times for the same device. Hence we have a check
to see if the device's fwspec is already translated and return
the iommu_ops from that directly. But the check is wrongly
placed in iort_iommu_xlate, which breaks devices with multiple
sids. Move the check to iort_iommu_configure.
Fixes: 5a1bb638d5 ("drivers: acpi: Handle IOMMU lookup failure with deferred probing or error")
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
While deferring the probe of IOMMU masters, xlate and
add_device callbacks called from iort_iommu_configure
can pass back error values like -ENODEV, which means
the IOMMU cannot be connected with that master for real
reasons. Before the IOMMU probe deferral, all such errors
were ignored. Now all those errors are propagated back,
killing the master's probe for such errors. Instead ignore
all the errors except EPROBE_DEFER, which is the only one
of concern and let the master work without IOMMU, thus
restoring the old behavior. Also make explicit that
acpi_dma_configure handles only -EPROBE_DEFER from
iort_iommu_configure.
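The error filtering described above amounts to something like the
sketch below. toy_filter_iommu_error() is a hypothetical helper, not a
kernel function, and EPROBE_DEFER is defined locally because it is a
kernel-internal errno (517) that userspace <errno.h> does not provide.

```c
#include <assert.h>
#include <errno.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno value */
#endif

/*
 * Pass -EPROBE_DEFER through so the master's probe is retried later.
 * Swallow every other error so the master works without an IOMMU.
 */
static int toy_filter_iommu_error(int err)
{
	assert(err <= 0);	/* 0 or a negative errno is expected */
	return err == -EPROBE_DEFER ? err : 0;
}
```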
Fixes: 5a1bb638d5 ("drivers: acpi: Handle IOMMU lookup failure with deferred probing or error")
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In the Linux kernel, acpi_get_table() "clones" haven't been fully
balanced by acpi_put_table() invocations. In upstream ACPICA, due to
the design change, there are also unbalanced acpi_get_table_by_index()
invocations requiring special care.
acpi_get_table() reference counting mismatches may occur due to that
and printing error messages related to them is not useful at this
point. The strict balanced validation count check should only be
enabled after confirming that all invocations are safe and aligned
with their designed purposes.
Thus this patch removes the error value returned by acpi_tb_get_table()
in that case along with the accompanying error message to fix the
issue.
Fixes: 174cc7187e (ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel)
Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
Reported-by: Anush Seetharaman <anush.seetharaman@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Revert commit 77e9a4aa9d (ACPI / button: Change default behavior to
lid_init_state=open) which changed the kernel's behavior on laptops
that boot with closed lids and expect the lid switch state to be
reported accurately by the kernel.
If you boot or resume your laptop with the lid closed on a docking
station while using an external monitor connected to it, both internal
and external displays will light up, while only the external should.
There is a design choice in gdm to only provide the greeter on the
internal display when it is lit, so users only see a gray area on the
external monitor. Also, the cursor will not show up as it's by
default on the internal display too.
To "fix" that, users have to open the laptop once and close it once
again to sync the state of the switch with the hardware state.
Even if the "method" operation mode implementation can be buggy on
some platforms, the "open" choice is worse. It breaks docking
stations basically and there is no way to have a user-space hwdb to
fix that.
On the contrary, it's rather easy in user-space to have a hwdb
with the problematic platforms. Then, libinput (1.7.0+) can fix
the state of the lid switch for us: you need to set the udev
property LIBINPUT_ATTR_LID_SWITCH_RELIABILITY to 'write_open'.
When libinput detects internal keyboard events, it will overwrite the
state of the switch to open, making it reliable again. Given that
logind only checks the lid switch value after a timeout, we can
assume the user will use the internal keyboard before this timeout
expires.
For example, such a hwdb entry is:
libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open
Link: https://bugzilla.gnome.org/show_bug.cgi?id=782380
Cc: 4.11+ <stable@vger.kernel.org> # 4.11+
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull RAS fixes from Thomas Gleixner:
"Two fixlets for RAS:
- Export memory_error() so the NFIT module can utilize it
- Handle memory errors in NFIT correctly"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
acpi, nfit: Fix the memory error check in nfit_handle_mce()
x86/MCE: Export memory_error()
With the enhanced CPU hotplug lockdep coverage the following lockdep splat
happens:
======================================================
WARNING: possible circular locking dependency detected
4.12.0-rc2+ #84 Tainted: G W
------------------------------------------------------
cpuhp/1/15 is trying to acquire lock:
flush_work+0x39/0x2f0
but task is already holding lock:
cpuhp_thread_fun+0x30/0x160
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (cpuhp_state){+.+.+.}:
lock_acquire+0xb4/0x200
cpuhp_kick_ap_work+0x72/0x330
_cpu_down+0x8b/0x100
do_cpu_down+0x3e/0x60
cpu_down+0x10/0x20
cpu_subsys_offline+0x14/0x20
device_offline+0x88/0xb0
online_store+0x4c/0xa0
dev_attr_store+0x18/0x30
sysfs_kf_write+0x45/0x60
kernfs_fop_write+0x156/0x1e0
__vfs_write+0x37/0x160
vfs_write+0xca/0x1c0
SyS_write+0x58/0xc0
entry_SYSCALL_64_fastpath+0x23/0xc2
-> #1 (cpu_hotplug_lock.rw_sem){++++++}:
lock_acquire+0xb4/0x200
cpus_read_lock+0x3d/0xb0
apply_workqueue_attrs+0x17/0x50
__alloc_workqueue_key+0x1e1/0x530
scsi_host_alloc+0x373/0x480 [scsi_mod]
ata_scsi_add_hosts+0xcb/0x130 [libata]
ata_host_register+0x11a/0x2c0 [libata]
ata_host_activate+0xf0/0x150 [libata]
ahci_host_activate+0x13e/0x170 [libahci]
ahci_init_one+0xa3a/0xd3f [ahci]
local_pci_probe+0x45/0xa0
work_for_cpu_fn+0x14/0x20
process_one_work+0x1f9/0x690
worker_thread+0x200/0x3d0
kthread+0x138/0x170
ret_from_fork+0x31/0x40
-> #0 ((&wfc.work)){+.+.+.}:
__lock_acquire+0x11e1/0x13e0
lock_acquire+0xb4/0x200
flush_work+0x5c/0x2f0
work_on_cpu+0xa1/0xd0
acpi_processor_get_throttling+0x3d/0x50
acpi_processor_reevaluate_tstate+0x2c/0x50
acpi_soft_cpu_online+0x69/0xd0
cpuhp_invoke_callback+0xb4/0x8b0
cpuhp_up_callbacks+0x36/0xc0
cpuhp_thread_fun+0x14e/0x160
smpboot_thread_fn+0x1e8/0x300
kthread+0x138/0x170
ret_from_fork+0x31/0x40
other info that might help us debug this:
Chain exists of:
(&wfc.work) --> cpu_hotplug_lock.rw_sem --> cpuhp_state
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(cpuhp_state);
                               lock(cpu_hotplug_lock.rw_sem);
                               lock(cpuhp_state);
  lock((&wfc.work));
*** DEADLOCK ***
1 lock held by cpuhp/1/15:
cpuhp_thread_fun+0x30/0x160
stack backtrace:
CPU: 1 PID: 15 Comm: cpuhp/1 Tainted: G W 4.12.0-rc2+ #84
Hardware name: Supermicro SYS-4048B-TR4FT/X10QBi, BIOS 1.1a 07/29/2015
Call Trace:
dump_stack+0x85/0xc4
print_circular_bug+0x209/0x217
__lock_acquire+0x11e1/0x13e0
lock_acquire+0xb4/0x200
? lock_acquire+0xb4/0x200
? flush_work+0x39/0x2f0
? acpi_processor_start+0x50/0x50
flush_work+0x5c/0x2f0
? flush_work+0x39/0x2f0
? acpi_processor_start+0x50/0x50
? mark_held_locks+0x6d/0x90
? queue_work_on+0x56/0x90
? trace_hardirqs_on_caller+0x154/0x1c0
? trace_hardirqs_on+0xd/0x10
? acpi_processor_start+0x50/0x50
work_on_cpu+0xa1/0xd0
? find_worker_executing_work+0x50/0x50
? acpi_processor_power_exit+0x70/0x70
acpi_processor_get_throttling+0x3d/0x50
acpi_processor_reevaluate_tstate+0x2c/0x50
acpi_soft_cpu_online+0x69/0xd0
cpuhp_invoke_callback+0xb4/0x8b0
? lock_acquire+0xb4/0x200
? padata_replace+0x120/0x120
cpuhp_up_callbacks+0x36/0xc0
cpuhp_thread_fun+0x14e/0x160
smpboot_thread_fn+0x1e8/0x300
kthread+0x138/0x170
? sort_range+0x30/0x30
? kthread_create_on_node+0x70/0x70
ret_from_fork+0x31/0x40
The problem is that the work is scheduled on the current CPU from the
hotplug thread associated with that CPU.
It's not required to invoke these functions via the workqueue because the
hotplug thread runs on the target CPU already.
Check whether current is a per cpu thread pinned on the target CPU and
invoke the function directly to avoid the workqueue.
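The "call directly when already on the target CPU" check can be sketched
in userspace as below. Everything here is a hypothetical model: struct
toy_thread, toy_call_on_cpu() and toy_queue_on_cpu() are invented names,
and the queue path is a counting stub rather than a real workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long (*toy_cpu_fn)(void *arg);

/* Hypothetical view of the calling thread. */
struct toy_thread {
	bool per_cpu;	/* true for a per-CPU thread */
	int cpu;	/* CPU the thread is pinned to */
};

/* Stand-in for the workqueue path; only counts how often it is taken. */
static int toy_queued_calls;

static long toy_queue_on_cpu(int cpu, toy_cpu_fn fn, void *arg)
{
	(void)cpu;
	toy_queued_calls++;
	return fn(arg);
}

/* Example callback: returns the long that arg points to. */
static long toy_read_value(void *arg)
{
	return *(long *)arg;
}

static long toy_call_on_cpu(const struct toy_thread *cur, int cpu,
			    toy_cpu_fn fn, void *arg)
{
	assert(cur != NULL && fn != NULL);
	if (cur->per_cpu && cur->cpu == cpu)
		return fn(arg);	/* already there: no work item needed */
	return toy_queue_on_cpu(cpu, fn, arg);
}
```

A hotplug thread pinned to CPU 1 that targets CPU 1 takes the direct
branch, so no work item is flushed while the hotplug state is held.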
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-acpi@vger.kernel.org
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170524081549.620489733@linutronix.de
Converting the hotplug locking, i.e. get_online_cpus(), to a percpu rwsem
unearthed a circular lock dependency which was hidden from lockdep due to
the lockdep annotation of get_online_cpus() which prevents lockdep from
creating full dependency chains.
       CPU0                    CPU1
       ----                    ----
  lock((&wfc.work));
                               lock(cpu_hotplug_lock.rw_sem);
                               lock((&wfc.work));
  lock(cpu_hotplug_lock.rw_sem);
This dependency is established via acpi_processor_start() which calls into
the work queue code. And the work queue code establishes the reverse
dependency.
This is not a problem of get_online_cpus() recursion, it's a possible
deadlock undetected by lockdep so far.
The cure is to use cpu_hotplug_disable() instead of get_online_cpus() to
protect the probing from acpi_processor_start().
There is a side effect to this: cpu_hotplug_disable() makes a concurrent
cpu hotplug attempt via the sysfs interfaces fail with -EBUSY, but that
probing usually happens during the boot process where no interaction is
possible. Any later invocations are infrequent enough and concurrent
hotplug attempts are so unlikely that the danger of user space visible
regressions is very close to zero. Anyway, that's preferable to a real
deadlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-acpi@vger.kernel.org
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170524081548.851588594@linutronix.de