If cur_state for the powerclamp cooling device is set to the default
minimum state of 0, without setting first to cur_state > 0, this results
in NULL pointer access.
This NULL pointer access happens in the powercap core idle-inject
function idle_inject_set_duration() as there is no NULL check for
idle_inject_device pointer. This pointer must be allocated by calling
idle_inject_register() or idle_inject_register_full().
In the function powerclamp_set_cur_state(), idle_inject_device pointer
is allocated only when the cur_state > 0. But setting 0 without changing
to any other state, idle_inject_set_duration() will be called with a
NULL idle_inject_device pointer.
To address this, just return from powerclamp_set_cur_state() if the
current cooling device state is the same as the last one. Since the
power-up default cooling device state is 0, changing the state to 0
again here will return without calling idle_inject_set_duration().
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: 8526eb7fc7 ("thermal: intel: powerclamp: Use powercap idle-inject feature")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217386
Tested-by: Risto A. Paju <teknohog@iki.fi>
Cc: 6.3+ <stable@kernel.org> # 6.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull more thermal control updates from Rafael Wysocki:
"These are mostly cleanups on top of the previously merged thermal
control changes plus some driver fixes and the removal of the Intel
Menlow thermal driver.
Specifics:
- Add compatible DT bindings for imx6sll and imx6ul to fix a dtbs
check warning (Stefan Wahren)
- Update the example in the DT bindings to reflect changes with the
ADC node name for QCom TM and TM5 (Marijn Suijten)
- Fix comments for the cpuidle_cooling_register() function to match
the function prototype (Chenggang Wang)
- Fix inconsistent temperature read and some Mediatek variant board
reboot by reverting a change and handling the temperature
differently (AngeloGioacchino Del Regno)
- Fix a memory leak in the initialization error path for the Mediatek
driver (Kang Chen)
- Use of_address_to_resource() in the Mediatek driver (Rob Herring)
- Fix unit address in the QCom tsens driver DT bindings (Krzysztof
Kozlowski)
- Clean up the step-wise thermal governor (Zhang Rui)
- Introduce thermal_zone_device() for accessing the device field of
struct thermal_zone_device and two drivers use it (Daniel Lezcano)
- Clean up the ACPI thermal driver a bit (Daniel Lezcano)
- Delete the thermal driver for Intel Menlow platforms that is not
expected to have any users (Rafael Wysocki)"
* tag 'thermal-6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: menlow: Get rid of this driver
ACPI: thermal: Move to dedicated function sysfs extra attr creation
ACPI: thermal: Use thermal_zone_device()
thermal: intel: pch_thermal: Use thermal driver device to write a trace
thermal: core: Encapsulate tz->device field
thermal: gov_step_wise: Adjust code logic to match comment
thermal: gov_step_wise: Delete obsolete comment
dt-bindings: thermal: qcom-tsens: Correct unit address
thermal/drivers/mediatek: Use of_address_to_resource()
thermal/drivers/mediatek: Change clk_prepare_enable to devm_clk_get_enabled in mtk_thermal_probe
thermal/drivers/mediatek: Use devm_of_iomap to avoid resource leak in mtk_thermal_probe
thermal/drivers/mediatek: Add temperature constraints to validate read
Revert "thermal/drivers/mediatek: Add delay after thermal banks initialization"
thermal/drivers/cpuidle_cooling: Delete unmatched comments
dt-bindings: thermal: Use generic ADC node name in examples
dt-bindings: imx-thermal: Add imx6sll and imx6ul compatible
Merge additional thermal core and ACPI thermal changes for 6.4-rc1:
- Clean up the step-wise thermal governor (Zhang Rui).
- Introduce thermal_zone_device() for accessing the device field of
struct thermal_zone_device and two drivers use it (Daniel Lezcano).
- Clean up the ACPI thermal driver a bit (Daniel Lezcano).
- Delete the thermal driver for Intel Menlow platforms that is not
expected to have any users (Rafael Wysocki).
* thermal-core:
thermal: intel: menlow: Get rid of this driver
ACPI: thermal: Move to dedicated function sysfs extra attr creation
ACPI: thermal: Use thermal_zone_device()
thermal: intel: pch_thermal: Use thermal driver device to write a trace
thermal: core: Encapsulate tz->device field
thermal: gov_step_wise: Adjust code logic to match comment
thermal: gov_step_wise: Delete obsolete comment
According to my information, there are no active users of this driver in
the field.
Moreover, it does some really questionable things and gets in the way of
thermal core improvements, so drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The pch_critical() callback accesses the thermal zone device structure
internals, it dereferences the thermal zone struct device and the 'type'.
Use the available accessors instead of accessing the structure directly.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are still some drivers needing to play with the thermal zone
device internals. That is not the best but until we can figure out if
the information is really needed, let's encapsulate the field used in
the thermal zone device structure, so we can move forward relocating
the thermal zone device structure definition in the thermal framework
private headers.
Some drivers are accessing tz->device, that implies they need to have
the knowledge of the thermal_zone_device structure but we want to
self-encapsulate this structure and reduce the scope of the structure
to the thermal core only.
By adding this wrapper, these drivers won't need the thermal zone
device structure definition and are no longer an obstacle to its
relocation to the private thermal core headers.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For the algorithm of choosing the next target state in step_wise
governor, the code does the right thing but is implemented in a
way different from what the comment describes. And this hurts the code
readability.
As the logic in the comment is simpler, adjust the code logic to align
with the comment.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 4102c4042a ("thermal/core: Remove DROP_FULL and RAISE_FULL")
removes support for THERMAL_TREND_RAISE_FULL/DROP_FULL but leaves the
comment unchanged.
Delete the obsolte comment about THERMAL_TREND_RAISE_FULL/DROP_FULL.
Fixes: 4102c4042a ("thermal/core: Remove DROP_FULL and RAISE_FULL")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull more devicetree updates from Rob Herring:
- First part of DT header detangling dropping cpu.h from of_device.h
and replacing some includes with forward declarations. A handful of
drivers needed some adjustment to their includes as a result.
- Refactor of_device.h to be used by bus drivers rather than various
device drivers. This moves non-bus related functions out of
of_device.h. The end goal is for of_platform.h and of_device.h to
stop including each other.
- Refactor open coded parsing of "ranges" in some bus drivers to use DT
address parsing functions
- Add some new address parsing functions of_property_read_reg(),
of_range_count(), and of_range_to_resource() in preparation to
convert more open coded parsing of DT addresses to use them.
- Treewide clean-ups to use of_property_read_bool() and
of_property_present() as appropriate. The ones here are the ones that
didn't get picked up elsewhere.
* tag 'devicetree-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (34 commits)
bus: tegra-gmi: Replace of_platform.h with explicit includes
hte: Use of_property_present() for testing DT property presence
w1: w1-gpio: Use of_property_read_bool() for boolean properties
virt: fsl: Use of_property_present() for testing DT property presence
soc: fsl: Use of_property_present() for testing DT property presence
sbus: display7seg: Use of_property_read_bool() for boolean properties
sparc: Use of_property_read_bool() for boolean properties
sparc: Use of_property_present() for testing DT property presence
bus: mvebu-mbus: Remove open coded "ranges" parsing
of/address: Add of_property_read_reg() helper
of/address: Add of_range_count() helper
of/address: Add support for 3 address cell bus
of/address: Add of_range_to_resource() helper
of: unittest: Add bus address range parsing tests
of: Drop cpu.h include from of_device.h
OPP: Adjust includes to remove of_device.h
irqchip: loongson-eiointc: Add explicit include for cpuhotplug.h
cpuidle: Adjust includes to remove of_device.h
cpufreq: sun50i: Add explicit include for cpu.h
cpufreq: Adjust includes to remove of_device.h
...
The AUXADC thermal v1 allows reading temperature range between -20°C to
150°C and any value out of this range is invalid.
Add new definitions for MT8173_TEMP_{MIN_MAX} and a new small helper
mtk_thermal_temp_is_valid() to check if new readings are in range: if
not, we tell to the API that the reading is invalid by returning
THERMAL_TEMP_INVALID.
It was chosen to introduce the helper function because, even though this
temperature range is realistically ok for all, it comes from a downstream
kernel driver for version 1, but here we also support v2 and v3 which may
may have wider constraints.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230419061146.22246-3-angelogioacchino.delregno@collabora.com
Some more testing revealed that this commit introduces a regression on some
MT8173 Chromebooks and at least on one MT6795 Sony Xperia M5 smartphone due
to the delay being apparently variable and machine specific.
Another solution would be to delay for a bit more (~70ms) but this is not
feasible for two reasons: first of all, we're adding an even bigger delay
in a probe function; second, some machines need less, some may need even
more, making the msleep at probe solution highly suboptimal.
This reverts commit 10debf8c2d.
Fixes: 10debf8c2d ("thermal/drivers/mediatek: Add delay after thermal banks initialization")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230419061146.22246-2-angelogioacchino.delregno@collabora.com
Add support for DLVR (Digital Linear Voltage Regulator) attributes,
which can be used to control RFIM.
Here instead of "fivr" another directory "dlvr" is created with DLVR
attributes:
/sys/bus/pci/devices/0000:00:04.0/dlvr
├── dlvr_freq_mhz
├── dlvr_freq_select
├── dlvr_hardware_rev
├── dlvr_pll_busy
├── dlvr_rfim_enable
└── dlvr_spread_spectrum_pct
└── dlvr_control_mode
└── dlvr_control_lock
Attributes
dlvr_freq_mhz (RO):
Current DLVR PLL frequency in MHz.
dlvr_freq_select (RW):
Sets DLVR PLL clock frequency.
dlvr_hardware_rev (RO):
DLVR hardware revision.
dlvr_pll_busy (RO):
PLL can't accept frequency change when set.
dlvr_rfim_enable (RW):
0: Disable RF frequency hopping, 1: Enable RF frequency hopping.
dlvr_control_mode (RW):
Specifies how frequencies are spread. 0: Down spread, 1: Spread in Center.
dlvr_control_lock (RW):
1: future writes are ignored.
dlvr_spread_spectrum_pct (RW)
A write to this register updates the DLVR spread spectrum percent value.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull more thermal control changes for 6.4-rc1 from Daniel Lezcano:
"- Do preparating cleaning and DT bindings for RK3588 support
(Sebastian Reichel)
- Add driver support for RK3588 (Finley Xiao)
- Use devm_reset_control_array_get_exclusive() for the Rockchip driver
(Ye Xingchen)
- Detect power gated thermal zones and return -EAGAIN when reading the
temperature (Mikko Perttunen)
- Remove thermal_bind_params structure as it is unused (Zhang Rui)
- Drop unneeded quotes in DT bindings allowing to run yamllint (Rob
Herring)
- Update the power allocator documentation according to the thermal
trace relocation (Lukas Bulwahn)
- Fix sensor 1 interrupt status bitmask for the Mediatek LVTS sensor
(Chen-Yu Tsai)
- Use the dev_err_probe() helper in the Amlogic driver (Ye Xingchen)
- Add AP domain support to LVTS thermal controllers for mt8195
(Balsam CHIHI)
- Remove buggy call to thermal_of_zone_unregister() (Daniel Lezcano)
- Make thermal_of_zone_[un]register() private to the thermal OF code
(Daniel Lezcano)
- Create a private copy of the thermal zone device parameters
structure when registering a thermal zone (Daniel Lezcano)"
* tag 'thermal-v6.4-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal/core: Alloc-copy-free the thermal zone parameters structure
thermal/of: Unexport unused OF functions
thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister
thermal/drivers/mediatek/lvts_thermal: Add AP domain for mt8195
dt-bindings: thermal: mediatek: Add AP domain to LVTS thermal controllers for mt8195
thermal: amlogic: Use dev_err_probe()
thermal/drivers/mediatek/lvts_thermal: Fix sensor 1 interrupt status bitmask
MAINTAINERS: adjust entry in THERMAL/POWER_ALLOCATOR after header movement
dt-bindings: thermal: Drop unneeded quotes
thermal/core: Remove thermal_bind_params structure
thermal/drivers/tegra-bpmp: Handle offline zones
thermal/drivers/rockchip: use devm_reset_control_array_get_exclusive()
dt-bindings: rockchip-thermal: Support the RK3588 SoC compatible
thermal/drivers/rockchip: Support RK3588 SoC in the thermal driver
thermal/drivers/rockchip: Support dynamic sized sensor array
thermal/drivers/rockchip: Simplify channel id logic
thermal/drivers/rockchip: Use dev_err_probe
thermal/drivers/rockchip: Simplify clock logic
thermal/drivers/rockchip: Simplify getting match data
Some older processors don't allow BIT(13) and BIT(15) in the current
mask set by "THERM_STATUS_CLEAR_CORE_MASK". This results in:
unchecked MSR access error: WRMSR to 0x19c (tried to
write 0x000000000000aaa8) at rIP: 0xffffffff816f66a6
(throttle_active_work+0xa6/0x1d0)
To avoid unchecked MSR issues, check CPUID for each relevant feature and
use that information to set the supported feature bits only in the
"clear" mask for cores. Do the same for the analogous package mask set
by "THERM_STATUS_CLEAR_PKG_MASK".
Introduce functions thermal_intr_init_core_clear_mask() and
thermal_intr_init_pkg_clear_mask() to set core and package mask bits,
respectively. These functions are called during initialization.
Fixes: 6fe1e64b60 ("thermal: intel: Prevent accidental clearing of HFI status")
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Link: https://lore.kernel.org/lkml/cdf43fb423368ee3994124a9e8c9b4f8d00712c6.camel@linux.intel.com/T/
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 6.2+ <stable@kernel.org> # 6.2+
[ rjw: Renamed 2 funtions and 2 static variables, edited subject and
changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The caller of the function thermal_zone_device_register_with_trips()
can pass a thermal_zone_params structure parameter.
This one is used by the thermal core code until the thermal zone is
destroyed. That forces the caller, so the driver, to keep the pointer
valid until it unregisters the thermal zone if we want to make the
thermal zone device structure private the core code.
As the thermal zone device structure would be private, the driver can
not access to thermal zone device structure to retrieve the tzp field
after it passed it to register the thermal zone.
So instead of forcing the users of the function to deal with the tzp
structure life cycle, make the usage easier by allocating our own
thermal zone params, copying the parameter content and by freeing at
unregister time. The user can then create the parameters on the stack,
pass it to the registering function and forget about it.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230404075138.2914680-3-daniel.lezcano@linaro.org
The driver is using the devm_thermal_of_zone_device_register().
In the error path of the function calling
devm_thermal_of_zone_device_register(), the function
devm_thermal_of_zone_unregister() should be called instead of
thermal_of_zone_unregister(), otherwise this one will be called twice
when the device is freed.
The same happens for the remove function where the devm_ guarantee the
thermal_of_zone_unregister() will be called, so adding this call in
the remove function will lead to a double free also.
Use devm_ variant in the error path of the probe function.
Remove thermal_of_zone_unregister() in the remove function.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ray Jui <rjui@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230404075138.2914680-1-daniel.lezcano@linaro.org
The binary representation for sensor 1 interrupt status was incorrectly
assembled, when compared to the full table given in the same comment
section. The conversion into hex was also incorrect, leading to
incorrect interrupt status bitmask for sensor 1. This would cause the
driver to incorrectly identify changes for sensor 1, when in fact it
was sensor 0, or a sensor access time out.
Fix the binary and hex representations in the comments, and the actual
bitmask macro.
Fixes: f5f633b182 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230328031017.1360976-1-wenst@chromium.org
Thermal zones located in power domains may not be accessible when
the domain is powergated. In this situation, reading the temperature
will return -BPMP_EFAULT. When evaluating trips, BPMP will internally
use -256C as the temperature for offline zones.
For smooth operation, for offline zones, return -EAGAIN when reading
the temperature and allow registration of zones even if they are
offline during probe.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230330094904.2589428-1-cyndis@kapsi.fi
Pull thermal control material for 6.4-rc1 from Daniel Lezcano:
"- Add more thermal zone device encapsulation: prevent setting
structure field directly, access the sensor device instead the
thermal zone's device for trace, relocate the traces in
drivers/thermal (Daniel Lezcano)
- Use the generic trip point for the i.MX and remove the get_trip_temp
ops (Daniel Lezcano)
- Use the devm_platform_ioremap_resource() in the Hisilicon driver
(Yang Li)
- Remove R-Car H3 ES1.* handling as public has only access to the ES2
version and the upstream support for the ES1 has been shutdown (Wolfram
Sang)
- Add a delay after initializing the bank in order to let the time to
the hardware to initialze itself before reading the temperature
(Amjad Ouled-Ameur)
- Add MT8365 support (Amjad Ouled-Ameur)"
* tag 'thermal-v6.4-rc1-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal/drivers/ti: Use fixed update interval
thermal/drivers/stm: Don't set no_hwmon to false
thermal/drivers/db8500: Use driver dev instead of tz->device
thermal/core: Relocate the traces definition in thermal directory
thermal/drivers/hisi: Use devm_platform_ioremap_resource()
thermal/drivers/imx: Use the thermal framework for the trip point
thermal/drivers/imx: Remove get_trip_temp ops
thermal/drivers/rcar_gen3_thermal: Remove R-Car H3 ES1.* handling
thermal/drivers/mediatek: Add delay after thermal banks initialization
thermal/drivers/mediatek: Add support for MT8365 SoC
thermal/drivers/mediatek: Control buffer enablement tweaks
dt-bindings: thermal: mediatek: Add binding documentation for MT8365 SoC
Once thermal_list_lock has been acquired in
__thermal_cooling_device_register(), it is not necessary to drop it
and take it again until all of the thermal zones have been updated,
so change the code accordingly.
No expected functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently the TI thermal driver sets the sensor update interval based
on the polling of the thermal zone. In order to get the polling rate,
the code inspects the thermal zone device structure internals, thus
breaking the self-encapsulation of the thermal framework core
framework.
On the other side, we see the common polling rates set in the device
tree for the platforms using this driver are 500 or 1000 ms.
Setting the polling rate to 250 ms would be far enough to cover the
combination we found in the device tree.
Instead of accessing the thermal zone device structure polling rate,
let's use a common update interval of 250 ms for the driver.
Cc: Keerthy <j-keerthy@ti.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Acked-by: Keerthy <j-keerthy@ti.com>
Link: https://lore.kernel.org/r/20230307133735.90772-7-daniel.lezcano@linaro.org
The traces are exported but only local to the thermal core code. On
the other side, the traces take the thermal zone device structure as
argument, thus they have to rely on the exported thermal.h header
file. As we want to move the structure to the private thermal core
header, first we have to relocate those traces to the same place as
many drivers do.
Cc: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230307133735.90772-2-daniel.lezcano@linaro.org
The thermal framework provides an API to get the trip related to a
trip point id. We want to consolidate the generic trip points code,
thus preventing the different drivers to deal with the trip points
after they registered them.
The set_trip_temp ops will be changed regarding the above changes but
first we need to rework a bit the different implementation in the
drivers.
The goal is to prevent using the trip id but use a trip point passed
as parameter which will contain all the needed information.
As we don't have the trip point passed as parameter yet, we get the
trip point using the generic trip thermal framewrok APIs and use it to
take exactly the same decisions.
The difference with this change and the previous code is from where we
get the thermal trip point (which is the same).
No functional change intended.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20230309092821.1590586-2-daniel.lezcano@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>