| CVE | Vendors | Products | Updated | CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
The sysfs sriov_numvfs_store() path acquires the device lock before the
config space access lock:
sriov_numvfs_store
  device_lock                         # A (1) acquire device lock
  sriov_configure
    vfio_pci_sriov_configure          # (for example)
      vfio_pci_core_sriov_configure
        pci_disable_sriov
          sriov_disable
            pci_cfg_access_lock
              pci_wait_cfg            # B (4) wait for dev->block_cfg_access == 0
Previously, pci_dev_lock() acquired the config space access lock before the
device lock:
pci_dev_lock
  pci_cfg_access_lock
    dev->block_cfg_access = 1         # B (2) set dev->block_cfg_access = 1
  device_lock                         # A (3) wait for device lock
Any path that uses pci_dev_lock(), e.g., pci_reset_function(), may
deadlock with sriov_numvfs_store() if the operations occur in the sequence
(1) (2) (3) (4).
Avoid the deadlock by reversing the order in pci_dev_lock() so it acquires
the device lock before the config space access lock, the same as the
sriov_numvfs_store() path.
[bhelgaas: combined and adapted commit log from Jay Zhou's independent
subsequent posting:
https://lore.kernel.org/r/20220404062539.1710-1-jianjay.zhou@huawei.com] |
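The lock-ordering rule from the PCI entry above can be illustrated in userspace. This is a minimal sketch, assuming POSIX threads; `dev_lock`, `cfg_lock`, `reset_path`, `sriov_path` and `run_both_paths` are hypothetical stand-ins and not kernel APIs. Both paths take lock A before lock B, so neither can hold B while waiting for A.

```c
#include <pthread.h>

/* Hypothetical userspace stand-ins for the two kernel locks. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER; /* A: device lock */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER; /* B: config access lock */
static int shared_counter;

/* Takes A, then B, does one unit of work, and releases in reverse order. */
static void locked_increment(void)
{
    pthread_mutex_lock(&dev_lock);
    pthread_mutex_lock(&cfg_lock);
    shared_counter++;
    pthread_mutex_unlock(&cfg_lock);
    pthread_mutex_unlock(&dev_lock);
}

/* Models the reordered pci_dev_lock() path: A first, then B. */
static void *reset_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        locked_increment();
    return NULL;
}

/* Models the sriov_numvfs_store() path: also A first, then B. */
static void *sriov_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        locked_increment();
    return NULL;
}

/* Runs both paths concurrently and returns the final counter value. */
int run_both_paths(void)
{
    pthread_t t1, t2;

    shared_counter = 0;
    pthread_create(&t1, NULL, reset_path, NULL);
    pthread_create(&t2, NULL, sriov_path, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

If one of the two paths took `cfg_lock` first, the interleaving (1) (2) (3) (4) described in the entry would become possible and `run_both_paths()` might never return.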
| In the Linux kernel, the following vulnerability has been resolved:
net/9p: use a dedicated spinlock for trans_fd
Shamelessly copying the explanation from Tetsuo Handa's suggested
patch[1] (slightly reworded):
syzbot is reporting inconsistent lock state in p9_req_put()[2],
for p9_tag_remove() from p9_req_put() from IRQ context is using
spin_lock_irqsave() on "struct p9_client"->lock but trans_fd
(not from IRQ context) is using spin_lock().
Since the locks actually protect different things in client.c and in
trans_fd.c, just replace trans_fd.c's lock by a new one specific to the
transport (client.c's protects the idr for fid/tag allocations,
while trans_fd.c's protects its own req list and request status field
that acts as the transport's state machine) |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg()
In an attempt to log message 0126 with LOG_TRACE_EVENT, the following hard
lockup call trace hangs the system.
Call Trace:
_raw_spin_lock_irqsave+0x32/0x40
lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc]
lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc]
lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc]
lpfc_els_flush_cmd+0x43c/0x670 [lpfc]
lpfc_els_flush_all_cmd+0x37/0x60 [lpfc]
lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc]
lpfc_do_work+0x1485/0x1d70 [lpfc]
kthread+0x112/0x130
ret_from_fork+0x1f/0x40
Kernel panic - not syncing: Hard LOCKUP
The same CPU tries to claim the phba->port_list_lock twice.
Move the cfg_log_verbose checks as part of the lpfc_printf_vlog() and
lpfc_printf_log() macros before calling lpfc_dmp_dbg(). There is no need
to take the phba->port_list_lock within lpfc_dmp_dbg(). |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: lpfc: Fix SCSI I/O completion and abort handler deadlock
During stress I/O tests with 500+ vports, hard LOCKUP call traces are
observed.
CPU A:
native_queued_spin_lock_slowpath+0x192
_raw_spin_lock_irqsave+0x32
lpfc_handle_fcp_err+0x4c6
lpfc_fcp_io_cmd_wqe_cmpl+0x964
lpfc_sli4_fp_handle_cqe+0x266
__lpfc_sli4_process_cq+0x105
__lpfc_sli4_hba_process_cq+0x3c
lpfc_cq_poll_hdler+0x16
irq_poll_softirq+0x76
__softirqentry_text_start+0xe4
irq_exit+0xf7
do_IRQ+0x7f
CPU B:
native_queued_spin_lock_slowpath+0x5b
_raw_spin_lock+0x1c
lpfc_abort_handler+0x13e
scmd_eh_abort_handler+0x85
process_one_work+0x1a7
worker_thread+0x30
kthread+0x112
ret_from_fork+0x1f
Diagram of lockup:
CPUA                            CPUB
----                            ----
lpfc_cmd->buf_lock
                                phba->hbalock
                                lpfc_cmd->buf_lock
phba->hbalock
Fix by reordering lock acquisition in the lpfc_abort_handler() routine so
that it takes lpfc_cmd->buf_lock before phba->hbalock. |
| In the Linux kernel, the following vulnerability has been resolved:
loop: implement ->free_disk
Ensure that the lo_device which is stored in the gendisk private
data is valid until the gendisk is freed. Currently the loop driver
uses a lot of effort to make sure a device is not freed when it is
still in use, but to fix a potential deadlock this will be relaxed
a bit soon. |
| In the Linux kernel, the following vulnerability has been resolved:
ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
When user_dlm_destroy_lock() fails, it does not clean up the flags it set
before exiting. For USER_LOCK_IN_TEARDOWN, if this function fails because the
lock is still in use, the next time unlink invokes this function it
returns success, and unlink then removes the inode and dentry if the lock
is no longer in use (file closed). However, the dlm lock is still linked in the
dlm lock resource, so when a bast comes in, it triggers a panic due to a
use-after-free. See the following panic call trace. To fix this,
USER_LOCK_IN_TEARDOWN should be reverted on failure. In addition, an error
should be returned if USER_LOCK_IN_TEARDOWN is set, to let the user know that
unlink failed.
For the case of ocfs2_dlm_unlock() failure, USER_LOCK_BUSY must also be
cleared in addition to USER_LOCK_IN_TEARDOWN. The spin lock is released in
between, but USER_LOCK_IN_TEARDOWN is still set. If every place that waits on
USER_LOCK_BUSY first checks USER_LOCK_IN_TEARDOWN and bails out, no flow
waits on the busy flag set by user_dlm_destroy_lock(), and USER_LOCK_BUSY can
simply be reverted when ocfs2_dlm_unlock() fails. Fix
user_dlm_cluster_lock(), which is the only function not following this.
[ 941.336392] (python,26174,16):dlmfs_unlink:562 ERROR: unlink
004fb0000060000b5a90b8c847b72e1, error -16 from destroy
[ 989.757536] ------------[ cut here ]------------
[ 989.757709] kernel BUG at fs/ocfs2/dlmfs/userdlm.c:173!
[ 989.757876] invalid opcode: 0000 [#1] SMP
[ 989.758027] Modules linked in: ksplice_2zhuk2jr_ib_ipoib_new(O)
ksplice_2zhuk2jr(O) mptctl mptbase xen_netback xen_blkback xen_gntalloc
xen_gntdev xen_evtchn cdc_ether usbnet mii ocfs2 jbd2 rpcsec_gss_krb5
auth_rpcgss nfsv4 nfsv3 nfs_acl nfs fscache lockd grace ocfs2_dlmfs
ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc
fcoe libfcoe libfc scsi_transport_fc sunrpc ipmi_devintf bridge stp llc
rds_rdma rds bonding ib_sdp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
rdma_cm ib_cm iw_cm falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE)
mlx4_vnic falcon_kal(E) falcon_lsm_pinned_13402(E) mlx4_ib ib_sa ib_mad
ib_core ib_addr xenfs xen_privcmd dm_multipath iTCO_wdt iTCO_vendor_support
pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core ipmi_ssif i2c_core ipmi_si
ipmi_msghandler
[ 989.760686] ioatdma sg ext3 jbd mbcache sd_mod ahci libahci ixgbe dca ptp
pps_core vxlan udp_tunnel ip6_udp_tunnel megaraid_sas mlx4_core crc32c_intel
be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio
libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi wmi
dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
ksplice_2zhuk2jr_ib_ipoib_old]
[ 989.761987] CPU: 10 PID: 19102 Comm: dlm_thread Tainted: P OE
4.1.12-124.57.1.el6uek.x86_64 #2
[ 989.762290] Hardware name: Oracle Corporation ORACLE SERVER
X5-2/ASM,MOTHERBOARD,1U, BIOS 30350100 06/17/2021
[ 989.762599] task: ffff880178af6200 ti: ffff88017f7c8000 task.ti:
ffff88017f7c8000
[ 989.762848] RIP: e030:[<ffffffffc07d4316>] [<ffffffffc07d4316>]
__user_dlm_queue_lockres.part.4+0x76/0x80 [ocfs2_dlmfs]
[ 989.763185] RSP: e02b:ffff88017f7cbcb8 EFLAGS: 00010246
[ 989.763353] RAX: 0000000000000000 RBX: ffff880174d48008 RCX:
0000000000000003
[ 989.763565] RDX: 0000000000120012 RSI: 0000000000000003 RDI:
ffff880174d48170
[ 989.763778] RBP: ffff88017f7cbcc8 R08: ffff88021f4293b0 R09:
0000000000000000
[ 989.763991] R10: ffff880179c8c000 R11: 0000000000000003 R12:
ffff880174d48008
[ 989.764204] R13: 0000000000000003 R14: ffff880179c8c000 R15:
ffff88021db7a000
[ 989.764422] FS: 0000000000000000(0000) GS:ffff880247480000(0000)
knlGS:ffff880247480000
[ 989.764685] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 989.764865] CR2: ffff8000007f6800 CR3: 0000000001ae0000 CR4:
0000000000042660
[ 989.765081] Stack:
[ 989.765167] 00000000000
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
tracing: Fix sleeping function called from invalid context on RT kernel
When setting bootparams="trace_event=initcall:initcall_start tp_printk=1" in the
cmdline, output_printk() is called, and spin_lock_irqsave() is called in
atomic and IRQ-disabled context. On a PREEMPT_RT kernel,
these locks are replaced with sleepable rt-spinlocks, so the call trace below
is triggered.
Fix it by using raw_spin_lock_irqsave() when PREEMPT_RT and "trace_event=initcall:initcall_start
tp_printk=1" are enabled.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
Preemption disabled at:
[<ffffffff8992303e>] try_to_wake_up+0x7e/0xba0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt17+ #19 34c5812404187a875f32bee7977f7367f9679ea7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x60/0x8c
dump_stack+0x10/0x12
__might_resched.cold+0x11d/0x155
rt_spin_lock+0x40/0x70
trace_event_buffer_commit+0x2fa/0x4c0
? map_vsyscall+0x93/0x93
trace_event_raw_event_initcall_start+0xbe/0x110
? perf_trace_initcall_finish+0x210/0x210
? probe_sched_wakeup+0x34/0x40
? ttwu_do_wakeup+0xda/0x310
? trace_hardirqs_on+0x35/0x170
? map_vsyscall+0x93/0x93
do_one_initcall+0x217/0x3c0
? trace_event_raw_event_initcall_level+0x170/0x170
? push_cpu_stop+0x400/0x400
? cblist_init_generic+0x241/0x290
kernel_init_freeable+0x1ac/0x347
? _raw_spin_unlock_irq+0x65/0x80
? rest_init+0xf0/0xf0
kernel_init+0x1e/0x150
ret_from_fork+0x22/0x30
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop()
There is a deadlock in rtllib_beacons_stop(), which is shown
below:
(Thread 1) | (Thread 2)
| rtllib_send_beacon()
rtllib_beacons_stop() | mod_timer()
spin_lock_irqsave() //(1) | (wait a time)
... | rtllib_send_beacon_cb()
del_timer_sync() | spin_lock_irqsave() //(2)
(wait timer to stop) | ...
We hold ieee->beacon_lock at position (1) in thread 1 and
use del_timer_sync() to wait for the timer to stop, but the timer handler
also needs ieee->beacon_lock at position (2) in thread 2.
As a result, rtllib_beacons_stop() will block forever.
This patch moves del_timer_sync() out from under the protection of
spin_lock_irqsave(), which lets the timer handler obtain
the needed lock. |
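The pattern behind this fix (and the similar timer deadlocks in the entries that follow) can be sketched in userspace. This is a minimal sketch, assuming POSIX threads: a worker thread plays the role of the timer handler and `pthread_join()` stands in for del_timer_sync(). The names `beacon_lock`, `beacon_timer` and `beacons_start_then_stop` are hypothetical and only loosely mirror the driver.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for ieee->beacon_lock and the state it protects. */
static pthread_mutex_t beacon_lock = PTHREAD_MUTEX_INITIALIZER;
static bool beacons_enabled;
static int beacons_sent;

/* Plays the role of the timer handler: it needs beacon_lock on every tick. */
static void *beacon_timer(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&beacon_lock);
        if (!beacons_enabled) {
            pthread_mutex_unlock(&beacon_lock);
            return NULL;
        }
        beacons_sent++;
        pthread_mutex_unlock(&beacon_lock);
        sched_yield(); /* give other threads a chance before the next tick */
    }
}

/*
 * Starts the handler, lets it tick at least once, then stops it.
 * The state change happens under the lock, but the wait for the handler
 * happens only after the lock has been dropped.
 * Returns the number of ticks the handler completed.
 */
int beacons_start_then_stop(void)
{
    pthread_t timer;
    int sent = 0;

    pthread_mutex_lock(&beacon_lock);
    beacons_enabled = true;
    beacons_sent = 0;
    pthread_mutex_unlock(&beacon_lock);
    pthread_create(&timer, NULL, beacon_timer, NULL);

    while (sent == 0) { /* wait until the handler has run at least once */
        pthread_mutex_lock(&beacon_lock);
        sent = beacons_sent;
        pthread_mutex_unlock(&beacon_lock);
        sched_yield();
    }

    pthread_mutex_lock(&beacon_lock);
    beacons_enabled = false;
    pthread_mutex_unlock(&beacon_lock);

    /* Joining while still holding beacon_lock could block forever. */
    pthread_join(timer, NULL);
    return beacons_sent;
}
```

Moving the `pthread_join()` call inside the last locked region would recreate the reported deadlock: the handler would wait for the lock while the stop routine waits for the handler.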
| In the Linux kernel, the following vulnerability has been resolved:
drivers: usb: host: Fix deadlock in oxu_bus_suspend()
There is a deadlock in oxu_bus_suspend(), which is shown below:
(Thread 1) | (Thread 2)
| timer_action()
oxu_bus_suspend() | mod_timer()
spin_lock_irq() //(1) | (wait a time)
... | oxu_watchdog()
del_timer_sync() | spin_lock_irq() //(2)
(wait timer to stop) | ...
We hold oxu->lock at position (1) in thread 1 and use
del_timer_sync() to wait for the timer to stop, but the timer handler
also needs oxu->lock at position (2) in thread 2. As a result,
oxu_bus_suspend() will block forever.
This patch moves del_timer_sync() out from under the protection of
spin_lock_irq(), which lets the timer handler obtain
the needed lock. |
| In the Linux kernel, the following vulnerability has been resolved:
drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop()
There is a deadlock in ieee80211_beacons_stop(), which is shown below:
(Thread 1) | (Thread 2)
| ieee80211_send_beacon()
ieee80211_beacons_stop() | mod_timer()
spin_lock_irqsave() //(1) | (wait a time)
... | ieee80211_send_beacon_cb()
del_timer_sync() | spin_lock_irqsave() //(2)
(wait timer to stop) | ...
We hold ieee->beacon_lock at position (1) in thread 1 and use
del_timer_sync() to wait for the timer to stop, but the timer handler
also needs ieee->beacon_lock at position (2) in thread 2.
As a result, ieee80211_beacons_stop() will block forever.
This patch moves del_timer_sync() out from under the protection of
spin_lock_irqsave(), which lets the timer handler obtain
the needed lock. |
| In the Linux kernel, the following vulnerability has been resolved:
drivers: tty: serial: Fix deadlock in sa1100_set_termios()
There is a deadlock in sa1100_set_termios(), which is shown
below:
(Thread 1) | (Thread 2)
| sa1100_enable_ms()
sa1100_set_termios() | mod_timer()
spin_lock_irqsave() //(1) | (wait a time)
... | sa1100_timeout()
del_timer_sync() | spin_lock_irqsave() //(2)
(wait timer to stop) | ...
We hold sport->port.lock at position (1) in thread 1 and
use del_timer_sync() to wait for the timer to stop, but the timer handler
also needs sport->port.lock at position (2) in thread 2. As a result,
sa1100_set_termios() will block forever.
This patch moves del_timer_sync() before spin_lock_irqsave()
in order to prevent the deadlock. |
| In the Linux kernel, the following vulnerability has been resolved:
drivers: staging: rtl8192eu: Fix deadlock in rtw_joinbss_event_prehandle
There is a deadlock in rtw_joinbss_event_prehandle(), which is shown below:
(Thread 1) | (Thread 2)
| _set_timer()
rtw_joinbss_event_prehandle()| mod_timer()
spin_lock_bh() //(1) | (wait a time)
... | rtw_join_timeout_handler()
| _rtw_join_timeout_handler()
del_timer_sync() | spin_lock_bh() //(2)
(wait timer to stop) | ...
We hold pmlmepriv->lock at position (1) in thread 1 and
use del_timer_sync() to wait for the timer to stop, but the timer handler
also needs pmlmepriv->lock at position (2) in thread 2.
As a result, rtw_joinbss_event_prehandle() will block forever.
This patch moves del_timer_sync() out from under the protection of
spin_lock_bh(), which lets the timer handler obtain
the needed lock. In addition, it changes spin_lock_bh() to
spin_lock_irq() in _rtw_join_timeout_handler() in order to
prevent deadlock. |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: get rid of warning on transaction commit when using flushoncommit
When using the flushoncommit mount option, during almost every transaction
commit we trigger a warning from __writeback_inodes_sb_nr():
$ cat fs/fs-writeback.c:
(...)
static void __writeback_inodes_sb_nr(struct super_block *sb, ...
{
(...)
WARN_ON(!rwsem_is_locked(&sb->s_umount));
(...)
}
(...)
The trace produced in dmesg looks like the following:
[947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3
[947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif
[947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186
[947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3
[947.502097] Code: 24 10 4c 89 44 24 18 c6 (...)
[947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246
[947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000
[947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50
[947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000
[947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488
[947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460
[947.553621] FS: 0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000
[947.560537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0
[947.571072] Call Trace:
[947.572354] <TASK>
[947.573266] btrfs_commit_transaction+0x1f1/0x998
[947.576785] ? start_transaction+0x3ab/0x44e
[947.579867] ? schedule_timeout+0x8a/0xdd
[947.582716] transaction_kthread+0xe9/0x156
[947.585721] ? btrfs_cleanup_transaction.isra.0+0x407/0x407
[947.590104] kthread+0x131/0x139
[947.592168] ? set_kthread_struct+0x32/0x32
[947.595174] ret_from_fork+0x22/0x30
[947.597561] </TASK>
[947.598553] ---[ end trace 644721052755541c ]---
This is because we started using writeback_inodes_sb() to flush delalloc
when committing a transaction (when using -o flushoncommit), in order to
avoid deadlocks with filesystem freeze operations. This change was made
by commit ce8ea7cc6eb313 ("btrfs: don't call btrfs_start_delalloc_roots
in flushoncommit"). After that change we started producing that warning,
and every now and then a user reports this since the warning happens too
often, it spams dmesg/syslog, and a user is unsure if this reflects any
problem that might compromise the filesystem's reliability.
We cannot just lock the sb->s_umount semaphore before calling
writeback_inodes_sb(), because that would at least deadlock with
filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem()
is called while we are holding that semaphore in write mode, and that can
trigger a transaction commit, resulting in a deadlock. It would also
trigger the same type of deadlock in the unmount path. Possibly, it could
also introduce some other locking dependencies that lockdep would report.
To fix this, call try_to_writeback_inodes_sb() instead of
writeback_inodes_sb(), because that will try to read lock sb->s_umount
and then will only call writeback_inodes_sb() if it was able to lock it.
This is fine because the cases where it can't read lock sb->s_umount
are during a filesystem unmount or during a filesystem freeze - in those
cases sb->s_umount is write locked and sync_filesystem() is called, which
calls writeback_inodes_sb(). In other words, in all cases where we can't
take a read lock on sb->s_umount, writeback is already being triggered
elsewhere.
An alternative would be to call btrfs_start_delalloc_roots() with a
number of pages different from LONG_MAX, for example matching the number
of delalloc bytes we currently have, in
---truncated--- |
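The "try to read lock, otherwise skip" idea from the btrfs entry above can be sketched in userspace. This is a minimal sketch, assuming POSIX rwlocks and semaphores; `s_umount` here is only a stand-in for sb->s_umount, and `try_to_start_writeback`, `freezer` and `writeback_demo` are hypothetical names. A second thread plays the role of a freeze or unmount that holds the lock for writing.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-in for sb->s_umount. */
static pthread_rwlock_t s_umount = PTHREAD_RWLOCK_INITIALIZER;
static sem_t writer_has_lock, writer_may_unlock;
static int writebacks_started;

/*
 * Try to take the lock for reading and do the work only on success.
 * When a writer already holds the lock, skip instead of blocking.
 */
bool try_to_start_writeback(void)
{
    if (pthread_rwlock_tryrdlock(&s_umount) != 0)
        return false;
    writebacks_started++; /* stand-in for the actual writeback */
    pthread_rwlock_unlock(&s_umount);
    return true;
}

/* Plays the role of a freeze/unmount holding the lock in write mode. */
static void *freezer(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&s_umount);
    sem_post(&writer_has_lock);
    sem_wait(&writer_may_unlock);
    pthread_rwlock_unlock(&s_umount);
    return NULL;
}

/*
 * Attempts writeback once while the lock is write-held and once after.
 * Bit 1 of the result is set if the first attempt ran, bit 0 if the
 * second attempt ran.
 */
int writeback_demo(void)
{
    pthread_t t;
    bool during, after;

    writebacks_started = 0;
    sem_init(&writer_has_lock, 0, 0);
    sem_init(&writer_may_unlock, 0, 0);
    pthread_create(&t, NULL, freezer, NULL);

    sem_wait(&writer_has_lock);
    during = try_to_start_writeback();
    sem_post(&writer_may_unlock);
    pthread_join(t, NULL);
    after = try_to_start_writeback();

    sem_destroy(&writer_has_lock);
    sem_destroy(&writer_may_unlock);
    return (during ? 2 : 0) | (after ? 1 : 0);
}
```

In this sketch the attempt made while the writer holds the lock is skipped rather than blocked, which is what avoids the deadlock with freeze and unmount described in the entry.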
| In the Linux kernel, the following vulnerability has been resolved:
drm/vc4: Fix deadlock on DSI device attach error
A DSI device is attached to the DSI host with the host device's lock
held.
Unregistering the host in the "device attach" error path (e.g., probe retry)
therefore results in a deadlock, with the call trace below, and a
non-operational DSI display.
Startup Call trace:
[ 35.043036] rt_mutex_slowlock.constprop.21+0x184/0x1b8
[ 35.043048] mutex_lock_nested+0x7c/0xc8
[ 35.043060] device_del+0x4c/0x3e8
[ 35.043075] device_unregister+0x20/0x40
[ 35.043082] mipi_dsi_remove_device_fn+0x18/0x28
[ 35.043093] device_for_each_child+0x68/0xb0
[ 35.043105] mipi_dsi_host_unregister+0x40/0x90
[ 35.043115] vc4_dsi_host_attach+0xf0/0x120 [vc4]
[ 35.043199] mipi_dsi_attach+0x30/0x48
[ 35.043209] tc358762_probe+0x128/0x164 [tc358762]
[ 35.043225] mipi_dsi_drv_probe+0x28/0x38
[ 35.043234] really_probe+0xc0/0x318
[ 35.043244] __driver_probe_device+0x80/0xe8
[ 35.043254] driver_probe_device+0xb8/0x118
[ 35.043263] __device_attach_driver+0x98/0xe8
[ 35.043273] bus_for_each_drv+0x84/0xd8
[ 35.043281] __device_attach+0xf0/0x150
[ 35.043290] device_initial_probe+0x1c/0x28
[ 35.043300] bus_probe_device+0xa4/0xb0
[ 35.043308] deferred_probe_work_func+0xa0/0xe0
[ 35.043318] process_one_work+0x254/0x700
[ 35.043330] worker_thread+0x4c/0x448
[ 35.043339] kthread+0x19c/0x1a8
[ 35.043348] ret_from_fork+0x10/0x20
Shutdown Call trace:
[ 365.565417] Call trace:
[ 365.565423] __switch_to+0x148/0x200
[ 365.565452] __schedule+0x340/0x9c8
[ 365.565467] schedule+0x48/0x110
[ 365.565479] schedule_timeout+0x3b0/0x448
[ 365.565496] wait_for_completion+0xac/0x138
[ 365.565509] __flush_work+0x218/0x4e0
[ 365.565523] flush_work+0x1c/0x28
[ 365.565536] wait_for_device_probe+0x68/0x158
[ 365.565550] device_shutdown+0x24/0x348
[ 365.565561] kernel_restart_prepare+0x40/0x50
[ 365.565578] kernel_restart+0x20/0x70
[ 365.565591] __do_sys_reboot+0x10c/0x220
[ 365.565605] __arm64_sys_reboot+0x2c/0x38
[ 365.565619] invoke_syscall+0x4c/0x110
[ 365.565634] el0_svc_common.constprop.3+0xfc/0x120
[ 365.565648] do_el0_svc+0x2c/0x90
[ 365.565661] el0_svc+0x4c/0xf0
[ 365.565671] el0t_64_sync_handler+0x90/0xb8
[ 365.565682] el0t_64_sync+0x180/0x184 |
| In the Linux kernel, the following vulnerability has been resolved:
USB: core: Fix hang in usb_kill_urb by adding memory barriers
The syzbot fuzzer has identified a bug in which processes hang waiting
for usb_kill_urb() to return. It turns out the issue is not unlinking
the URB; that works just fine. Rather, the problem arises when the
wakeup notification that the URB has completed is not received.
The reason is memory-access ordering on SMP systems. In outline form,
usb_kill_urb() and __usb_hcd_giveback_urb() operating concurrently on
different CPUs perform the following actions:
CPU 0                                   CPU 1
----------------------------            ---------------------------------
usb_kill_urb():                         __usb_hcd_giveback_urb():
  ...                                     ...
  atomic_inc(&urb->reject);               atomic_dec(&urb->use_count);
  ...                                     ...
  wait_event(usb_kill_urb_queue,
        atomic_read(&urb->use_count) == 0);
                                          if (atomic_read(&urb->reject))
                                                wake_up(&usb_kill_urb_queue);
Confining your attention to urb->reject and urb->use_count, you can
see that the overall pattern of accesses on CPU 0 is:
write urb->reject, then read urb->use_count;
whereas the overall pattern of accesses on CPU 1 is:
write urb->use_count, then read urb->reject.
This pattern is referred to in memory-model circles as SB (for "Store
Buffering"), and it is well known that without suitable enforcement of
the desired order of accesses -- in the form of memory barriers -- it
is entirely possible for one or both CPUs to execute their reads ahead
of their writes. The end result will be that sometimes CPU 0 sees the
old un-decremented value of urb->use_count while CPU 1 sees the old
un-incremented value of urb->reject. Consequently CPU 0 ends up on
the wait queue and never gets woken up, leading to the observed hang
in usb_kill_urb().
The same pattern of accesses occurs in usb_poison_urb() and the
failure pathway of usb_hcd_submit_urb().
The problem is fixed by adding suitable memory barriers. To provide
proper memory-access ordering in the SB pattern, a full barrier is
required on both CPUs. The atomic_inc() and atomic_dec() accesses
themselves don't provide any memory ordering, but since they are
present, we can use the optimized smp_mb__after_atomic() memory
barrier in the various routines to obtain the desired effect.
This patch adds the necessary memory barriers. |
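The store-buffering pattern from the USB entry above can be reproduced in userspace with C11 atomics. This is a minimal sketch, not the kernel code: `atomic_thread_fence(memory_order_seq_cst)` is used here as a rough analog of the full barrier that smp_mb__after_atomic() provides, and `cpu0`, `cpu1` and `count_lost_wakeups` are hypothetical names. Each thread writes one variable, issues a full fence, then reads the other variable.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Stand-ins for urb->reject and urb->use_count. */
static atomic_int reject, use_count;
static int saw_use_count, saw_reject;

/* Models usb_kill_urb(): write reject, full barrier, read use_count. */
static void *cpu0(void *arg)
{
    (void)arg;
    atomic_fetch_add_explicit(&reject, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    saw_use_count = atomic_load_explicit(&use_count, memory_order_relaxed);
    return NULL;
}

/* Models __usb_hcd_giveback_urb(): write use_count, full barrier, read reject. */
static void *cpu1(void *arg)
{
    (void)arg;
    atomic_fetch_sub_explicit(&use_count, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    saw_reject = atomic_load_explicit(&reject, memory_order_relaxed);
    return NULL;
}

/*
 * Runs the two sides concurrently `runs` times and counts the runs in which
 * both sides read the old values (use_count still 1 and reject still 0),
 * which is the outcome that leads to the missed wakeup.
 */
int count_lost_wakeups(int runs)
{
    int lost = 0;

    for (int i = 0; i < runs; i++) {
        pthread_t a, b;

        atomic_store(&reject, 0);
        atomic_store(&use_count, 1);
        pthread_create(&a, NULL, cpu0, NULL);
        pthread_create(&b, NULL, cpu1, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (saw_use_count == 1 && saw_reject == 0)
            lost++;
    }
    return lost;
}
```

With a sequentially consistent fence on both sides, the C11 memory model forbids the outcome where both reads return the old values. Removing either fence permits that outcome again, although how often it shows up in practice depends on the hardware.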
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix deadlock between quota disable and qgroup rescan worker
Quota disable ioctl starts a transaction before waiting for the qgroup
rescan worker to complete. However, this wait can be infinite and result
in a deadlock because of a circular dependency among the quota disable
ioctl, the qgroup rescan worker, and another task with a transaction, such
as a block group relocation task.
The deadlock happens with the following steps:
1) Task A calls ioctl to disable quota. It starts a transaction and
waits for the qgroup rescan worker to complete.
2) Task B, such as a block group relocation task, starts a transaction and
joins the transaction that task A started. Then task B commits
the transaction. In this commit, task B waits for a commit by task A.
3) Task C, the qgroup rescan worker, starts its job and starts a
transaction. In this transaction start, task C waits for completion
of the transaction that task A started and task B committed.
This deadlock was found with fstests test case btrfs/115 and a zoned
null_blk device. The test case enables and disables quota, and the
block group reclaim was triggered during the quota disable by chance.
The deadlock was also observed by running quota enable and disable in
parallel with 'btrfs balance' command on regular null_blk devices.
An example report of the deadlock:
[372.469894] INFO: task kworker/u16:6:103 blocked for more than 122 seconds.
[372.479944] Not tainted 5.16.0-rc8 #7
[372.485067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[372.493898] task:kworker/u16:6 state:D stack: 0 pid: 103 ppid: 2 flags:0x00004000
[372.503285] Workqueue: btrfs-qgroup-rescan btrfs_work_helper [btrfs]
[372.510782] Call Trace:
[372.514092] <TASK>
[372.521684] __schedule+0xb56/0x4850
[372.530104] ? io_schedule_timeout+0x190/0x190
[372.538842] ? lockdep_hardirqs_on+0x7e/0x100
[372.547092] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[372.555591] schedule+0xe0/0x270
[372.561894] btrfs_commit_transaction+0x18bb/0x2610 [btrfs]
[372.570506] ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
[372.578875] ? free_unref_page+0x3f2/0x650
[372.585484] ? finish_wait+0x270/0x270
[372.591594] ? release_extent_buffer+0x224/0x420 [btrfs]
[372.599264] btrfs_qgroup_rescan_worker+0xc13/0x10c0 [btrfs]
[372.607157] ? lock_release+0x3a9/0x6d0
[372.613054] ? btrfs_qgroup_account_extent+0xda0/0xda0 [btrfs]
[372.620960] ? do_raw_spin_lock+0x11e/0x250
[372.627137] ? rwlock_bug.part.0+0x90/0x90
[372.633215] ? lock_is_held_type+0xe4/0x140
[372.639404] btrfs_work_helper+0x1ae/0xa90 [btrfs]
[372.646268] process_one_work+0x7e9/0x1320
[372.652321] ? lock_release+0x6d0/0x6d0
[372.658081] ? pwq_dec_nr_in_flight+0x230/0x230
[372.664513] ? rwlock_bug.part.0+0x90/0x90
[372.670529] worker_thread+0x59e/0xf90
[372.676172] ? process_one_work+0x1320/0x1320
[372.682440] kthread+0x3b9/0x490
[372.687550] ? _raw_spin_unlock_irq+0x24/0x50
[372.693811] ? set_kthread_struct+0x100/0x100
[372.700052] ret_from_fork+0x22/0x30
[372.705517] </TASK>
[372.709747] INFO: task btrfs-transacti:2347 blocked for more than 123 seconds.
[372.729827] Not tainted 5.16.0-rc8 #7
[372.745907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[372.767106] task:btrfs-transacti state:D stack: 0 pid: 2347 ppid: 2 flags:0x00004000
[372.787776] Call Trace:
[372.801652] <TASK>
[372.812961] __schedule+0xb56/0x4850
[372.830011] ? io_schedule_timeout+0x190/0x190
[372.852547] ? lockdep_hardirqs_on+0x7e/0x100
[372.871761] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[372.886792] schedule+0xe0/0x270
[372.901685] wait_current_trans+0x22c/0x310 [btrfs]
[372.919743] ? btrfs_put_transaction+0x3d0/0x3d0 [btrfs]
[372.938923] ? finish_wait+0x270/0x270
[372.959085] ? join_transaction+0xc7
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
mm/kmemleak: avoid scanning potential huge holes
When using devm_request_free_mem_region() and devm_memremap_pages() to
add ZONE_DEVICE memory, if the requested free mem region's end pfn is
huge (e.g., 0x400000000), node_end_pfn() will also be huge (see
move_pfn_range_to_zone()). This creates a huge hole between
node_start_pfn() and node_end_pfn().
We found that on some AMD APUs, amdkfd requested such a free mem region and
created a huge hole. In such a case, the following code snippet just
busy-loops on test_bit() across the huge hole.
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
        struct page *page = pfn_to_online_page(pfn);
        if (!page)
                continue;
        ...
}
So we got a soft lockup:
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
RIP: 0010:pfn_to_online_page+0x5/0xd0
Call Trace:
? kmemleak_scan+0x16a/0x440
kmemleak_write+0x306/0x3a0
? common_file_perm+0x72/0x170
full_proxy_write+0x5c/0x90
vfs_write+0xb9/0x260
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
I did some tests with the patch.
(1) amdgpu module unloaded
before the patch:
real 0m0.976s
user 0m0.000s
sys 0m0.968s
after the patch:
real 0m0.981s
user 0m0.000s
sys 0m0.973s
(2) amdgpu module loaded
before the patch:
real 0m35.365s
user 0m0.000s
sys 0m35.354s
after the patch:
real 0m1.049s
user 0m0.000s
sys 0m1.042s |
| In the Linux kernel, the following vulnerability has been resolved:
clk: Get runtime PM before walking tree during disable_unused
Doug reported [1] the following hung task:
INFO: task swapper/0:1 blocked for more than 122 seconds.
Not tainted 5.15.149-21875-gf795ebc40eb8 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008
Call trace:
__switch_to+0xf4/0x1f4
__schedule+0x418/0xb80
schedule+0x5c/0x10c
rpm_resume+0xe0/0x52c
rpm_resume+0x178/0x52c
__pm_runtime_resume+0x58/0x98
clk_pm_runtime_get+0x30/0xb0
clk_disable_unused_subtree+0x58/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused+0x4c/0xe4
do_one_initcall+0xcc/0x2d8
do_initcall_level+0xa4/0x148
do_initcalls+0x5c/0x9c
do_basic_setup+0x24/0x30
kernel_init_freeable+0xec/0x164
kernel_init+0x28/0x120
ret_from_fork+0x10/0x20
INFO: task kworker/u16:0:9 blocked for more than 122 seconds.
Not tainted 5.15.149-21875-gf795ebc40eb8 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u16:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00000008
Workqueue: events_unbound deferred_probe_work_func
Call trace:
__switch_to+0xf4/0x1f4
__schedule+0x418/0xb80
schedule+0x5c/0x10c
schedule_preempt_disabled+0x2c/0x48
__mutex_lock+0x238/0x488
__mutex_lock_slowpath+0x1c/0x28
mutex_lock+0x50/0x74
clk_prepare_lock+0x7c/0x9c
clk_core_prepare_lock+0x20/0x44
clk_prepare+0x24/0x30
clk_bulk_prepare+0x40/0xb0
mdss_runtime_resume+0x54/0x1c8
pm_generic_runtime_resume+0x30/0x44
__genpd_runtime_resume+0x68/0x7c
genpd_runtime_resume+0x108/0x1f4
__rpm_callback+0x84/0x144
rpm_callback+0x30/0x88
rpm_resume+0x1f4/0x52c
rpm_resume+0x178/0x52c
__pm_runtime_resume+0x58/0x98
__device_attach+0xe0/0x170
device_initial_probe+0x1c/0x28
bus_probe_device+0x3c/0x9c
device_add+0x644/0x814
mipi_dsi_device_register_full+0xe4/0x170
devm_mipi_dsi_device_register_full+0x28/0x70
ti_sn_bridge_probe+0x1dc/0x2c0
auxiliary_bus_probe+0x4c/0x94
really_probe+0xcc/0x2c8
__driver_probe_device+0xa8/0x130
driver_probe_device+0x48/0x110
__device_attach_driver+0xa4/0xcc
bus_for_each_drv+0x8c/0xd8
__device_attach+0xf8/0x170
device_initial_probe+0x1c/0x28
bus_probe_device+0x3c/0x9c
deferred_probe_work_func+0x9c/0xd8
process_one_work+0x148/0x518
worker_thread+0x138/0x350
kthread+0x138/0x1e0
ret_from_fork+0x10/0x20
The first thread is walking the clk tree and calling
clk_pm_runtime_get() to power on devices required to read the clk
hardware via struct clk_ops::is_enabled(). This thread holds the clk
prepare_lock, and is trying to runtime PM resume a device, when it finds
that the device is in the process of resuming so the thread schedule()s
away waiting for the device to finish resuming before continuing. The
second thread is runtime PM resuming the same device, but the runtime
resume callback is calling clk_prepare(), trying to grab the
prepare_lock waiting on the first thread.
This is a classic ABBA deadlock. To properly fix the deadlock, we must
never runtime PM resume or suspend a device with the clk prepare_lock
held. Actually doing that is near impossible today because the global
prepare_lock would have to be dropped in the middle of the tree, the
device runtime PM resumed/suspended, and then the prepare_lock grabbed
again to ensure consistency of the clk tree topology. If anything
changes with the clk tree in the meantime, we've lost and will need to
start the operation all over again.
Luckily, most of the time we're simply incrementing or decrementing the
runtime PM count on an active device, so we don't have the chance to
schedule away with the prepare_lock held. Let's fix this immediate
problem that can be
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
KVM: arm64: Fix circular locking dependency
The rule inside kvm enforces that the vcpu->mutex is taken *inside*
kvm->lock. The rule is violated by the pkvm_create_hyp_vm() which acquires
the kvm->lock while already holding the vcpu->mutex lock from
kvm_vcpu_ioctl(). Avoid the circular locking dependency altogether by
protecting the hyp vm handle with the config_lock, much like we already
do for other forms of VM-scoped data. |
| In the Linux kernel, the following vulnerability has been resolved:
ftrace: Add cond_resched() to ftrace_graph_set_hash()
When the kernel contains a large number of functions that can be traced,
the loop in ftrace_graph_set_hash() may take a lot of time to execute.
This may trigger the softlockup watchdog.
Add cond_resched() within the loop to allow the kernel to remain
responsive even when processing a large number of functions.
This matches the cond_resched() that is used in other locations of the
code that iterates over all functions that can be traced. |
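The idea in the ftrace entry above, yielding periodically inside a long loop, can be sketched in userspace. This is a minimal sketch, not the kernel change: `sched_yield()` is only a loose stand-in for cond_resched(), and `count_matches` and its `yield_every` parameter are hypothetical (the kernel helper needs no such counter).

```c
#include <sched.h>
#include <stddef.h>

/*
 * Counts the entries equal to `wanted` in a possibly very large table.
 * Every `yield_every` iterations the loop offers the CPU to other runnable
 * threads, so one long pass does not monopolize it. A `yield_every` of 0
 * disables yielding.
 */
size_t count_matches(const int *table, size_t n, int wanted, size_t yield_every)
{
    size_t matches = 0;

    for (size_t i = 0; i < n; i++) {
        if (table[i] == wanted)
            matches++;
        if (yield_every != 0 && (i + 1) % yield_every == 0)
            sched_yield();
    }
    return matches;
}
```

The result of the loop is unchanged by the yield points. Only the scheduling behavior differs, which is the property the entry relies on to keep the softlockup watchdog quiet.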