Go to file
Tong Tiangen d256d1cd8d mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs()
We found a softlock issue in our test, analyzed the logs, and found that
the relevant CPU call trace as follows:

CPU0:
  _do_fork
    -> copy_process()
      -> write_lock_irq(&tasklist_lock)  //Disable irq,waiting for
      					 //tasklist_lock

CPU1:
  wp_page_copy()
    ->pte_offset_map_lock()
      -> spin_lock(&page->ptl);        //Hold page->ptl
    -> ptep_clear_flush()
      -> flush_tlb_others() ...
        -> smp_call_function_many()
          -> arch_send_call_function_ipi_mask()
            -> csd_lock_wait()         //Waiting for other CPUs respond
	                               //IPI

CPU2:
  collect_procs_anon()
    -> read_lock(&tasklist_lock)       //Hold tasklist_lock
      ->for_each_process(tsk)
        -> page_mapped_in_vma()
          -> page_vma_mapped_walk()
	    -> map_pte()
              ->spin_lock(&page->ptl)  //Waiting for page->ptl

We can see that CPU1 waiting for CPU0 respond IPI,CPU0 waiting for CPU2
unlock tasklist_lock, CPU2 waiting for CPU1 unlock page->ptl. As a result,
softlockup is triggered.

For collect_procs_anon(), what we're doing is task list iteration, during
the iteration, with the help of call_rcu(), the task_struct object is freed
only after one or more grace periods elapse. the logic as follows:

release_task()
  -> __exit_signal()
    -> __unhash_process()
      -> list_del_rcu()

  -> put_task_struct_rcu_user()
    -> call_rcu(&task->rcu, delayed_put_task_struct)

delayed_put_task_struct()
  -> put_task_struct()
  -> if (refcount_sub_and_test())
     	__put_task_struct()
          -> free_task()

Therefore, under the protection of the rcu lock, we can safely use
get_task_struct() to ensure a safe reference to task_struct during the
iteration.

By removing the use of tasklist_lock in task list iteration, we can break
the softlock chain above.

The same logic can also be applied to:
 - collect_procs_file()
 - collect_procs_fsdax()
 - collect_procs_ksm()

Link: https://lkml.kernel.org/r/20230828022527.241693-1-tongtiangen@huawei.com
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-05 11:11:52 -07:00
Documentation More thermal control updates for 6.6-rc1 2023-09-04 15:17:28 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
arch ARC updates for v6.6 2023-09-04 15:38:24 -07:00
block for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto This update includes the following changes: 2023-08-29 11:23:29 -07:00
drivers More power management updates for 6.6-rc1 2023-09-04 15:21:55 -07:00
fs f2fs update for 6.6-rc1 2023-09-02 15:37:59 -07:00
include More power management updates for 6.6-rc1 2023-09-04 15:21:55 -07:00
init workqueue: Changes for v6.6 2023-09-01 16:06:32 -07:00
io_uring for-6.6/io_uring-2023-08-28 2023-08-29 20:11:33 -07:00
ipc Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
kernel printk changes for 6.6 2023-09-04 13:20:19 -07:00
lib printk changes for 6.6 2023-09-04 13:20:19 -07:00
mm mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs() 2023-09-05 11:11:52 -07:00
net TTY/Serial driver changes for 6.6-rc1 2023-09-01 09:38:00 -07:00
rust Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
samples VFIO updates for v6.6-rc1 2023-08-30 20:36:01 -07:00
scripts Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
security Three cleanup patches, no behavior changes. 2023-09-04 10:38:35 -07:00
sound This pull request contains the following changes for UML: 2023-09-04 11:32:21 -07:00
tools tools/mm: fix undefined reference to pthread_once 2023-09-05 10:13:45 -07:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt VFIO updates for v6.6-rc1 2023-08-30 20:36:01 -07:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Revert ".gitignore: ignore *.cover and *.mbx" 2023-07-04 15:05:12 -07:00
.mailmap for-linus-2023083101 2023-09-01 12:31:44 -07:00
.rustfmt.toml rust: add `.rustfmt.toml` 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS More thermal control updates for 6.6-rc1 2023-09-04 15:17:28 -07:00
Makefile Rust changes for v6.6 2023-08-29 08:19:46 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.