When invoke memfd_pin_folios, we need offer an array to save each folio
which we pinned.
The current way is dynamic alloc an array(use kvmalloc), get folios,
save into udmabuf and then free.
Depend on the size, kvmalloc can do something different:
Below PAGE_SIZE, slab allocator will be used, which have good alloc
performance, due to it cached page.
PAGE_SIZE - PCP Order, PCP(per-cpu-pageset) also given buddy page a
cache in each CPU, so different CPU no need to hold some lock(zone or
some) to get the locally page. If PCP cached page, the access also fast.
PAGE_SIZE - BUDDY_MAX, try to get page from buddy, due to kvmalloc adjusted
the gfp flags, if zone freelist can't alloc page(fast path), we will not
enter slowpath to reclaim memory. Due to need hold lock and check, may
slow, but still fast than vmalloc.
Anything wrong will fallback into vmalloc to alloc memory, it obtains
contiguous virtual addresses by loop alloc order 0 page(PAGE_SIZE), and
then map it into vmalloc area. If necessary, page alloc may enter
slowpath to reclaim memory. Hence, if fallback into vmalloc, it's slow.
When create, we need to iter each udmabuf item, then pin it's range
folios, if each item's range folio's count is large, we may fallback each
into vmalloc.
This patch find the largest range folio in items, then alloc this size's
folio array. When pin range folios, reuse this array.
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-8-link@vivo.com
Currently, udmabuf handles folio by create an unpin list to record
each folio obtained from the list and unpinning them when released. To
maintain this, many struct have been established.
However, maintain this requires a significant amount of memory and
iter the list is a substantial overhead, which is not friendly to the
CPU cache.
When create, we arranged the folio array in the order of pin and set
the offset according to pgcnt. So, if record each pinned folio when
create, then can easy unpin it. Compare to use list to record it,
an array also can do this.
Hence, this patch setup a pinned_folios array(size is the pgcnt) to
instead of udmabuf_folio struct, it record each folio which pinned when
invoke memfd_pin_folios, then unpin folio by iter pinned_folios.
Note that, since a folio may be pinned multiple times, each folio can be
added to pinned_folios multiple times, depend on how many times the
folio has been pinned when create.
Compare to udmabuf_folio(24 byte size), a folio pointer is 8 byte, if no
large folio - each folio is PAGE_SIZE - and need to unpin when release.
So need to record each folio, by this patch, each folio can save 16 byte.
But if large folio used, depend on the large folio's number, the
pinned_folios array may take more memory, but it still can makes unpin
access more cache-friendly.
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-7-link@vivo.com
After udmabuf is allocated, its resources need to be initialized,
including various array structures. The current array structure has
already been greatly expanded.
Also, before udmabuf needs to be kfree, the occupied resources need to
be released.
This part is repetitive and maybe overlooked.
This patch give a helper function when init and deinit, by this,
reduce duplicate code.
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-6-link@vivo.com
Currently vmap_udmabuf set page's array by each folio.
But, ubuf->folios is only contain's the folio's head page.
That mean we repeatedly mapped the folio head page to the vmalloc area.
Due to udmabuf can use hugetlb, if HVO enabled, tail page may not exist,
so, we can't use page array to map, instead, use pfn array.
By this, we removed page usage in udmabuf totally.
Fixes: 5e72b2b41a ("udmabuf: convert udmabuf driver to use folios")
Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-4-link@vivo.com
When PAGE_SIZE 4096, MAX_PAGE_ORDER 10, 64bit machine,
page_alloc only support 4MB.
If above this, trigger this warn and return NULL.
udmabuf can change size limit, if change it to 3072(3GB), and then alloc
3GB udmabuf, will fail create.
[ 4080.876581] ------------[ cut here ]------------
[ 4080.876843] WARNING: CPU: 3 PID: 2015 at mm/page_alloc.c:4556 __alloc_pages+0x2c8/0x350
[ 4080.878839] RIP: 0010:__alloc_pages+0x2c8/0x350
[ 4080.879470] Call Trace:
[ 4080.879473] <TASK>
[ 4080.879473] ? __alloc_pages+0x2c8/0x350
[ 4080.879475] ? __warn.cold+0x8e/0xe8
[ 4080.880647] ? __alloc_pages+0x2c8/0x350
[ 4080.880909] ? report_bug+0xff/0x140
[ 4080.881175] ? handle_bug+0x3c/0x80
[ 4080.881556] ? exc_invalid_op+0x17/0x70
[ 4080.881559] ? asm_exc_invalid_op+0x1a/0x20
[ 4080.882077] ? udmabuf_create+0x131/0x400
Because MAX_PAGE_ORDER, kmalloc can max alloc 4096 * (1 << 10), 4MB
memory, each array entry is pointer(8byte), so can save 524288 pages(2GB).
Further more, costly order(order 3) may not be guaranteed that it can be
applied for, due to fragmentation.
This patch change udmabuf array use kvmalloc_array, this can fallback
alloc into vmalloc, which can guarantee allocation for any size and does
not affect the performance of kmalloc allocations.
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-3-link@vivo.com
The current udmabuf mmap only fills the physical memory to the
corresponding virtual address when the user actually accesses the
virtual address.
However, the current udmabuf has already obtained and pinned the folio
upon completion of the creation.This means that the physical memory has
already been acquired, rather than being accessed dynamically.
As a result, the page fault has lost its purpose as a demanding
page. Due to the fact that page fault requires trapping into kernel mode
and filling in when accessing the corresponding virtual address in mmap,
when creating a large size udmabuf, this represents a considerable
overhead.
This patch fill the pfn into page table, and then pre-fault each pfn
into vma, when first access.
Notice, if anything wrong , we do not return an error during this
pre-fault step. However, an error will be returned if the failure occurs
when the addr is truly accessed
Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-2-link@vivo.com
Condsider the following call sequence:
/* Upper layer */
dma_fence_begin_signalling();
lock(tainted_shared_lock);
/* Driver callback */
dma_fence_begin_signalling();
...
The driver might here use a utility that is annotated as intended for the
dma-fence signalling critical path. Now if the upper layer isn't correctly
annotated yet for whatever reason, resulting in
/* Upper layer */
lock(tainted_shared_lock);
/* Driver callback */
dma_fence_begin_signalling();
We will receive a false lockdep locking order violation notification from
dma_fence_begin_signalling(). However entering a dma-fence signalling
critical section itself doesn't block and could not cause a deadlock.
So use a successful read_trylock() annotation instead for
dma_fence_begin_signalling(). That will make sure that the locking order
is correctly registered in the first case, and doesn't register any
locking order in the second case.
The alternative is of course to make sure that the "Upper layer" is always
correctly annotated. But experience shows that's not easily achievable
in all cases.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20230428125233.228353-1-thomas.hellstrom@linux.intel.com
On RK3399 the VOPL is loaded before VOPB and get registered as crtc-0.
However, on RK3288 and PX30 VOPB is gets registered as crtc-0 instead of
VOPL.
With VOPL registered as crtc-0 the kernel kms client is not able to
enable 4K display modes for console use on RK3399.
Load VOPB before VOPL to help kernel kms client make use of 4K display
modes for console use on RK3399.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240908145511.3331451-8-jonas@kwiboo.se
The previous tables for mpll_cfg and curr_ctrl were created using the
20-pages of example settings provided by the PHY vendor. Those
example settings weren't particularly dense, so there were places
where we were guessing what the settings would be for 10-bit and
12-bit (not that we use those anyway). It was also always a lot of
extra work every time we wanted to add a new clock rate since we had
to cross-reference several tables.
In <https://crrev.com/c/285855> I've gone through the work to figure
out how to generate this table automatically. Let's now use the
automatically generated table and then we'll never need to look at it
again.
We only support 8-bit mode right now and only support a small number
of clock rates and I've verified that the only 8-bit rate that was
affected was 148.5. That mode appears to have been wrong in the old
table.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240908145511.3331451-6-jonas@kwiboo.se
Dut to the high HDMI signal voltage driver, Mickey have meet
a serious RF/EMI problem, so we decided to reduce HDMI signal
voltage to a proper value.
The default params for phy is cklvl = 20 & txlvl = 13 (RF/EMI failed)
ck: lvl = 13, term=100, vlo = 2.71, vhi=3.14, vswing = 0.43
tx: lvl = 20, term=100, vlo = 2.81, vhi=3.16, vswing = 0.35
1. We decided to reduce voltage value to lower, but VSwing still
keep high, RF/EMI have been improved but still failed.
ck: lvl = 6, term=100, vlo = 2.61, vhi=3.11, vswing = 0.50
tx: lvl = 6, term=100, vlo = 2.61, vhi=3.11, vswing = 0.50
2. We try to keep voltage value and vswing both lower, then RF/EMI
test all passed ;)
ck: lvl = 11, term= 66, vlo = 2.68, vhi=3.09, vswing = 0.40
tx: lvl = 11, term= 66, vlo = 2.68, vhi=3.09, vswing = 0.40
When we back to run HDMI different test and single-end test, we see
different test passed, but signle-end test failed. The oscilloscope
show that simgle-end clock's VL value is 1.78v (which remind LowLimit
should not lower then 2.6v).
3. That's to say there are some different between PHY document and
measure value. And according to experiment 2 results, we need to
higher clock voltage and lower data voltage, then we can keep RF/EMI
satisfied and single-end & differen test passed.
ck: lvl = 9, term=100, vlo = 2.65, vhi=3.12, vswing = 0.47
tx: lvl = 16, term=100, vlo = 2.75, vhi=3.15, vswing = 0.39
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240908145511.3331451-3-jonas@kwiboo.se
RK3228 and RK3328 clock rate is being validated against a mpll config
table intended for a Synopsys phy, and not the used inno-hdmi-phy.
Instead get a reference to the hdmiphy clk and validate rates against
it to enable use of HDMI2.0 modes, e.g. 4K@60Hz, on RK3228 and RK3328.
For Synopsis phy the max_tmds_clock validation is sufficient.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Tested-by: Diederik de Haas <didi.debian@cknow.org> # Rock64
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240908145511.3331451-2-jonas@kwiboo.se
sparse reports:
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:122:16: warning: incorrect type in argument 1 (different address spaces)
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:122:16: expected void const volatile [noderef] __iomem *addr
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:122:16: got unsigned int [usertype] *wa_dma_data
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:130:9: warning: incorrect type in argument 2 (different address spaces)
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:130:9: expected void volatile [noderef] __iomem *addr
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:130:9: got unsigned int [usertype] *wa_dma_data
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:414:9: warning: incorrect type in argument 1 (different address spaces)
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:414:9: expected void const volatile [noderef] __iomem *addr
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c:414:9: got unsigned int *
These come from pieces of code which do essentially:
p = dma_alloc_coherent()
dma_transfer_to_p()
readl(p)
writel(x, p)
dma_transfer_from_p()
I think we would do just fine without readl() and writel(), accessing
the memory without any extras, but ensuring that the necessary barriers
are in place. But this code is for a legacy platform, has been working
for ages, and it's doing work-arounds for hardware issues, and those
hardware issues are very difficult to trigger... So I would just rather
leave the code be as it is now.
However, the warnings are not nice. Hide the warnings by a (__iomem void
*) typecast.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407311737.VsJ0Sr1w-lkp@intel.com/
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240806-omapdrm-misc-fixes-v1-2-15d31aea0831@ideasonboard.com
smatch reports:
drivers/gpu/drm/omapdrm/dss/base.c:176 omapdss_device_disconnect() error: we previously assumed 'src' could be null (see line 169)
This code is mostly from a time when omapdrm had its own display device
model. I can't honestly remember the details, and I don't think it's
worth digging in deeply into that for a legacy driver.
However, it looks like we only call omapdss_device_disconnect() and
omapdss_device_connect() with NULL as the src parameter. We can thus
drop the src parameter from both functions, and fix the smatch warning.
I don't think omapdss_device_disconnect() ever gets NULL for the dst
parameter (if it did, we'd crash soon after returning from the
function), but I have kept the !dst check, just in case, but I added a
WARN_ON() there.
Also, if the dst parameter can be NULL, we can't always get the struct
dss_device pointer from dst->dss (which is only used for a debug print).
To make sure we can't hit that issue, do it similarly to the
omapdss_device_connect() function: add 'struct dss_device *dss' as the
first parameter, so that we always have it regardless of the dst.
Fixes: 79107f274b ("drm/omap: Add support for drm_bridge")
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240806-omapdrm-misc-fixes-v1-1-15d31aea0831@ideasonboard.com
Inline ast_vga_connector_init() into its only caller. The helper
currently only does half of the connector-init work and is trivial
enough to be inlined. While at it, remove the error message from the
call to ast_ddc_create(). The function already warns on errors.
Also set the local variables for encoder and connector as late as
possible, so that the compiler warns if we use them before having
initialized the instance.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911115347.899148-9-tzimmermann@suse.de
Inline ast_sil164_connector_init() into its only caller. The helper
currently only does half of the connector-init work and is trivial
enough to be inlined. While at it, remove the error message from the
call to ast_ddc_create(). The function already warns on errors.
Also set the local variables for encoder and connector as late as
possible, so that the compiler warns if we use them before having
initialized the instance.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911115347.899148-8-tzimmermann@suse.de
Inline ast_dp501_connector_init() into its only caller. The helper
currently only does half of the connector-init work and is trivial
enough to be inlined.
Also set the local variables for encoder and connector as late as
possible, so that the compiler warns if we use them before having
initialized the instance.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911115347.899148-6-tzimmermann@suse.de
Replace ast_dp_set_on_off() with ast_dp_set_enable(). The helper's
new name reflects the performed operation. If enabling fails, the
new helper prints a warning. The code that waits for the programmed
effect to take place is now located in __ast_dp_wait_enable().
Also align the register constants with the rest of the code.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911115347.899148-5-tzimmermann@suse.de
Replace the helper for controlling power on the physical connector,
ast_dp_power_on_off(), with ast_dp_set_phy_sleep(). The new name
reflects the effect of the operation. Simplify the implementation.
The call now controls sleeping, hence semantics are inversed. Each
'on' becomes an 'off' operation and vice versa.
Do the same for ast_dp_power_is_on() and also align naming of the
register constant with the rest of the code.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911115347.899148-4-tzimmermann@suse.de
Inline ast_astdp_connector_init() into its only caller. The helper
currently only does half of the connector-init work and is trivial
enough to be inlined.
Also set the local variables for encoder and connector as late as
possible, so that the compiler warns if we use them before having
initialized the instance.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911115347.899148-2-tzimmermann@suse.de