Fixes for v6.5-rc4
Display:
+ Fix to correct the UBWC programming for decoder version 4.3 seen
on SM8550
+ Add the missing flush and fetch bits for DMA4 and DMA5 SSPPs.
+ Fix to drop the unused dpu_core_perf_data_bus_id enum from the code
+ Drop the unused dsi_phy_14nm_17mA_regulators from QCM 2290 DSI cfg.
GPU:
+ Fix warn splat for newer devices without revn
+ Remove name/revn for a690.. we shouldn't be populating these for
newer devices, for consistency, but it slipped through review
+ Fix a6xx gpu snapshot BINDLESS_DATA size (was listed in bytes
instead of dwords, causing AHB faults on a6xx gen4/a660-family)
+ Disallow submit with fence id 0
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs9MwCSfiyv8i7yWAsJKYEzCDyzaTx=ujX80Y23rZd9RA@mail.gmail.com
A fence id of zero is expected to be invalid, and is not removed from
the fence_idr table. If userspace is requesting to specify the fence
id with the FENCE_SN_IN flag, we need to reject a zero fence id value.
Fixes: 17154addc5 ("drm/msm: Add MSM_SUBMIT_FENCE_SN_IN")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/549180/
On GFX v9.4.3, compute queue MQD is populated using the values in HQD
persistent state register. Hence don't clear the values on module
unload, instead restore it to the default reset value so that MQD is
initialized correctly during next module load. In particular, preload
flag needs to be set on compute queue MQD, otherwise it could cause
uninitialized values being used at device reset state resulting in EDC.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This error path needs to unlock the "aconnector->handle_mst_msg_ready"
mutex before returning.
Fixes: 4f6d9e38c4 ("drm/amd/display: Add polling method to handle MST reply packet")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Description]
It is not valid to set the WDIVIDER value to 0, so do not
re-write to DISPCLK_WDIVIDER if the current value is 0
(i.e., it is at it's initial value and we have not made any
requests to change DISPCLK yet).
Reviewed-by: Saaem Rizvi <syedsaaem.rizvi@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Report current GFX clock also from average clock value as the original
CurrClock data is not valid/accurate any more as per FW team
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the second call to amdgpu_bo_create_kernel() fails, the memory
allocated from the first call should be cleared. If the third call
fails, the memory from the second call should be cleared.
Fixes: b95b539168 ("drm/amdgpu/psp: move PSP memory alloc from hw_init to sw_init")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
An instance of for_each_inst() was not changed to match its new
behaviour and is causing a loop.
v2: remove tmp_mask variable
Fixes: b579ea632f ("drm/amdgpu: Modify for_each_inst macro")
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This requires a bit of background. Properly done a modeset driver's
unload/remove sequence should be
drm_dev_unplug();
drm_atomic_helper_shutdown();
drm_dev_put();
The trouble is that the drm_dev_unplugged() checks are by design racy,
they do not synchronize against all outstanding ioctl. This is because
those ioctl could block forever (both for modeset and for driver
specific ioctls), leading to deadlocks in hotunplug. Instead the code
sections that touch the hardware need to be annotated with
drm_dev_enter/exit, to avoid accessing hardware resources after the
unload/remove has finished.
To avoid use-after-free issues all the involved userspace visible
objects are supposed to hold a reference on the underlying drm_device,
like drm_file does.
The issue now is that we missed one, the atomic modeset ioctl can be run
in a nonblocking fashion, and in that case it cannot rely on the implied
drm_device reference provided by the ioctl calling context. This can
result in a use-after-free if an nonblocking atomic commit is carefully
raced against a driver unload.
Fix this by unconditionally grabbing a drm_device reference for any
drm_atomic_state structures. Strictly speaking this isn't required for
blocking commits and TEST_ONLY calls, but it's the simpler approach.
Thanks to shanzhulig for the initial idea of grabbing an unconditional
reference, I just added comments, a condensed commit message and fixed a
minor potential issue in where exactly we drop the final reference.
Reported-by: shanzhulig <shanzhulig@gmail.com>
Suggested-by: shanzhulig <shanzhulig@gmail.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent code set xcp_id stored from file private data when opening
device to amdgpu bo for accounting memory usage etc, but not all
VMs are attached to this fpriv structure like the vm cases in
amdgpu_mes_self_test, otherwise, KASAN will complain below out
of bound access. And more importantly, VM code should not touch
fpriv structure, so drop fpriv code handling from amdgpu_vm_pt.
[ 77.292314] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[ 77.293845] Read of size 4 at addr ffff888102c48a48 by task modprobe/1069
[ 77.294146] Call Trace:
[ 77.294178] <TASK>
[ 77.294208] dump_stack_lvl+0x49/0x63
[ 77.294260] print_report+0x16f/0x4a6
[ 77.294307] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[ 77.295979] ? kasan_complete_mode_report_info+0x3c/0x200
[ 77.296057] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[ 77.297556] kasan_report+0xb4/0x130
[ 77.297609] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[ 77.299202] __asan_load4+0x6f/0x90
[ 77.299272] amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[ 77.300796] ? amdgpu_init+0x6e/0x1000 [amdgpu]
[ 77.302222] ? amdgpu_vm_pt_clear+0x750/0x750 [amdgpu]
[ 77.303721] ? preempt_count_sub+0x18/0xc0
[ 77.303786] amdgpu_vm_init+0x39e/0x870 [amdgpu]
[ 77.305186] ? amdgpu_vm_wait_idle+0x90/0x90 [amdgpu]
[ 77.306683] ? kasan_set_track+0x25/0x30
[ 77.306737] ? kasan_save_alloc_info+0x1b/0x30
[ 77.306795] ? __kasan_kmalloc+0x87/0xa0
[ 77.306852] amdgpu_mes_self_test+0x169/0x620 [amdgpu]
v2: without specifying xcp partition for PD/PT bo, the xcp id is -1.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2686
Fixes: 3ebfd221c1 ("drm/amdkfd: Store xcp partition id to amdgpu bo")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
file_priv needs to be setup firstly, otherwise, root PD
will always be allocated on partition 0, even if opening
the device from other partitions.
Fixes: 3ebfd221c1 ("drm/amdkfd: Store xcp partition id to amdgpu bo")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Specific TBT4 dock doesn't send out short HPD to notify source
that IRQ event DOWN_REP_MSG_RDY is set. Which violates the spec
and cause source can't send out streams to mst sinks.
[How]
To cover this misbehavior, add an additional polling method to detect
DOWN_REP_MSG_RDY is set. HPD driven handling method is still kept.
Just hook up our handler to drm mgr->cbs->poll_hpd_irq().
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jerry Zuo <jerry.zuo@amd.com>
Acked-by: Alan Liu <haoping.liu@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the following errors & warnings reported by checkpatch:
ERROR: space required before the open brace '{'
ERROR: space required before the open parenthesis '('
ERROR: that open brace { should be on the previous line
ERROR: space prohibited before that ',' (ctx:WxW)
ERROR: else should follow close brace '}'
ERROR: open brace '{' following function definitions go on the next line
ERROR: code indent should use tabs where possible
WARNING: braces {} are not necessary for single statement blocks
WARNING: void function return statements are not generally useful
WARNING: Block comments use * on subsequent lines
WARNING: Block comments use a trailing */ on a separate line
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Up until now, amdgpu was silently degrading to vsync when
user-space requested an async flip but the hardware didn't support
it.
The hardware doesn't support immediate flips when the update changes
the FB pitch, the DCC state, the rotation, enables or disables CRTCs
or planes, etc. This is reflected in the dm_crtc_state.update_type
field: UPDATE_TYPE_FAST means that immediate flip is supported.
Silently degrading async flips to vsync is not the expected behavior
from a uAPI point-of-view. Xorg expects async flips to fail if
unsupported, to be able to fall back to a blit. i915 already behaves
this way.
This patch aligns amdgpu with uAPI expectations and returns a failure
when an async flip is not possible.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
In below thousands of screen rotation loop tests with virtual display
enabled, a CPU hard lockup issue may happen, leading system to unresponsive
and crash.
do {
xrandr --output Virtual --rotate inverted
xrandr --output Virtual --rotate right
xrandr --output Virtual --rotate left
xrandr --output Virtual --rotate normal
} while (1);
NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
? hrtimer_run_softirq+0x140/0x140
? store_vblank+0xe0/0xe0 [drm]
hrtimer_cancel+0x15/0x30
amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
drm_vblank_disable_and_save+0x185/0x1f0 [drm]
drm_crtc_vblank_off+0x159/0x4c0 [drm]
? record_print_text.cold+0x11/0x11
? wait_for_completion_timeout+0x232/0x280
? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
? bit_wait_io_timeout+0xe0/0xe0
? wait_for_completion_interruptible+0x1d7/0x320
? mutex_unlock+0x81/0xd0
amdgpu_vkms_crtc_atomic_disable
It's caused by a stuck in lock dependency in such scenario on different
CPUs.
CPU1 CPU2
drm_crtc_vblank_off hrtimer_interrupt
grab event_lock (irq disabled) __hrtimer_run_queues
grab vbl_lock/vblank_time_block amdgpu_vkms_vblank_simulate
amdgpu_vkms_disable_vblank drm_handle_vblank
hrtimer_cancel grab dev->event_lock
So CPU1 stucks in hrtimer_cancel as timer callback is running endless on
current clock base, as that timer queue on CPU2 has no chance to finish it
because of failing to hold the lock. So NMI watchdog will throw the errors
after its threshold, and all later CPUs are impacted/blocked.
So use hrtimer_try_to_cancel to fix this, as disable_vblank callback
does not need to wait the handler to finish. And also it's not necessary
to check the return value of hrtimer_try_to_cancel, because even if it's
-1 which means current timer callback is running, it will be reprogrammed
in hrtimer_start with calling enable_vblank to make it works.
v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
tag as well
v3: drop warn printing (Christian)
v4: drop superfluous check of blank->enabled in timer function, as it's
guaranteed in drm_handle_vblank (Christian)
Fixes: 84ec374bd5 ("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable@vger.kernel.org
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why&How]
DCN301 does not have FAMS hence the workaround needed on other DCN3x
variants related to OTG min/max selector programming is not applicable for it.
Hence isolate it and have it use the old sequence without workaround.
Fixes: 1598fc5764 ("drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+")
Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why&How]
Make a few functions non static so that they can be reused for other
asic. This is in preparation for separating out OTG programming sequence
for DCN301
Fixes: 1598fc5764 ("drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+")
Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SMU7 does a check if the dGPU is inserted into a Rocket Lake system,
to turn off DPM. Extend this check to all systems that have problems
with dynamic switching by using the
amdgpu_device_pcie_dynamic_switching_supported() helper.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In an error path where the submit is free'd without the job being run,
the hw_fence pointer is simply a kzalloc'd block of memory. In this
case we should just kfree() it, rather than trying to decrement it's
reference count. Fortunately we can tell that this is the case by
checking for a zero refcount, since if the job was run, the submit would
be holding a reference to the hw_fence.
Fixes: f94e6a51e1 ("drm/msm: Pre-allocate hw_fence")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/547088/