| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
iommu/vt-d: Fix double list_add when enabling VMD in scalable mode
When enabling VMD and IOMMU scalable mode, the following kernel panic
call trace/kernel log is shown in Eagle Stream platform (Sapphire Rapids
CPU) during booting:
pci 0000:59:00.5: Adding to iommu group 42
...
vmd 0000:59:00.5: PCI host bridge to bus 10000:80
pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:01.0: enabling Extended Tags
pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:03.0: enabling Extended Tags
pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
Workqueue: events work_for_cpu_fn
RIP: 0010:__list_add_valid.cold+0x26/0x3f
Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f
0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1
fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9
9e e8 8b b1 fe
RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
FS: 0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
intel_pasid_alloc_table+0x9c/0x1d0
dmar_insert_one_dev_info+0x423/0x540
? device_to_iommu+0x12d/0x2f0
intel_iommu_attach_device+0x116/0x290
__iommu_attach_device+0x1a/0x90
iommu_group_add_device+0x190/0x2c0
__iommu_probe_device+0x13e/0x250
iommu_probe_device+0x24/0x150
iommu_bus_notifier+0x69/0x90
blocking_notifier_call_chain+0x5a/0x80
device_add+0x3db/0x7b0
? arch_memremap_can_ram_remap+0x19/0x50
? memremap+0x75/0x140
pci_device_add+0x193/0x1d0
pci_scan_single_device+0xb9/0xf0
pci_scan_slot+0x4c/0x110
pci_scan_child_bus_extend+0x3a/0x290
vmd_enable_domain.constprop.0+0x63e/0x820
vmd_probe+0x163/0x190
local_pci_probe+0x42/0x80
work_for_cpu_fn+0x13/0x20
process_one_work+0x1e2/0x3b0
worker_thread+0x1c4/0x3a0
? rescuer_thread+0x370/0x370
kthread+0xc7/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
...
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
The following 'lspci' output shows devices '10000:80:*' are subdevices of
the VMD device 0000:59:00.5:
$ lspci
...
0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
...
10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
10000:82:00
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
audit: improve robustness of the audit queue handling
If the audit daemon were ever to get stuck in a stopped state the
kernel's kauditd_thread() could get blocked attempting to send audit
records to the userspace audit daemon. With the kernel thread
blocked it is possible that the audit queue could grow unbounded as
certain audit record generating events must be exempt from the queue
limits else the system enter a deadlock state.
This patch resolves this problem by lowering the kernel thread's
socket sending timeout from MAX_SCHEDULE_TIMEOUT to HZ/10 and tweaks
the kauditd_send_queue() function to better manage the various audit
queues when connection problems occur between the kernel and the
audit daemon. With this patch, the backlog may temporarily grow
beyond the defined limits when the audit daemon is stopped and the
system is under heavy audit pressure, but kauditd_thread() will
continue to make progress and drain the queues as it would for other
connection problems. For example, with the audit daemon put into a
stopped state and the system configured to audit every syscall it
was still possible to shutdown the system without a kernel panic,
deadlock, etc.; granted, the system was slow to shutdown but that is
to be expected given the extreme pressure of recording every syscall.
The timeout value of HZ/10 was chosen primarily through
experimentation and this developer's "gut feeling". There is likely
no one perfect value, but as this scenario is limited in scope (root
privileges would be needed to send SIGSTOP to the audit daemon), it
is likely not worth exposing this as a tunable at present. This can
always be done at a later date if it proves necessary. |
| In the Linux kernel, the following vulnerability has been resolved:
s390/qeth: fix deadlock during failing recovery
Commit 0b9902c1fcc5 ("s390/qeth: fix deadlock during recovery") removed
taking discipline_mutex inside qeth_do_reset(), fixing potential
deadlocks. An error path was missed though, that still takes
discipline_mutex and thus has the original deadlock potential.
Intermittent deadlocks were seen when a qeth channel path is configured
offline, causing a race between qeth_do_reset and ccwgroup_remove.
Call qeth_set_offline() directly in the qeth_do_reset() error case and
then a new variant of ccwgroup_set_offline(), without taking
discipline_mutex. |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: core: sysfs: Fix hang when device state is set via sysfs
This fixes a regression added with:
commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
offlinining device")
The problem is that after iSCSI recovery, iscsid will call into the kernel
to set the dev's state to running, and with that patch we now call
scsi_rescan_device() with the state_mutex held. If the SCSI error handler
thread is just starting to test the device in scsi_send_eh_cmnd() then it's
going to try to grab the state_mutex.
We are then stuck, because when scsi_rescan_device() tries to send its I/O
scsi_queue_rq() calls -> scsi_host_queue_ready() -> scsi_host_in_recovery()
which will return true (the host state is still in recovery) and I/O will
just be requeued. scsi_send_eh_cmnd() will then never be able to grab the
state_mutex to finish error handling.
To prevent the deadlock move the rescan-related code to after we drop the
state_mutex.
This also adds a check for if we are already in the running state. This
prevents extra scans and helps the iscsid case where if the transport class
has already onlined the device during its recovery process then we don't
need userspace to do it again plus possibly block that daemon. |
| In the Linux kernel, the following vulnerability has been resolved:
mtd: require write permissions for locking and badblock ioctls
MEMLOCK, MEMUNLOCK and OTPLOCK modify protection bits. Thus require
write permission. Depending on the hardware MEMLOCK might even be
write-once, e.g. for SPI-NOR flashes with their WP# tied to GND. OTPLOCK
is always write-once.
MEMSETBADBLOCK modifies the bad block table. |
| In the Linux kernel, the following vulnerability has been resolved:
bnxt_en: Fix RX consumer index logic in the error path.
In bnxt_rx_pkt(), the RX buffers are expected to complete in order.
If the RX consumer index indicates an out of order buffer completion,
it means we are hitting a hardware bug and the driver will abort all
remaining RX packets and reset the RX ring. The RX consumer index
that we pass to bnxt_discard_rx() is not correct. We should be
passing the current index (tmp_raw_cons) instead of the old index
(raw_cons). This bug can cause us to be at the wrong index when
trying to abort the next RX packet. It can crash like this:
#0 [ffff9bbcdf5c39a8] machine_kexec at ffffffff9b05e007
#1 [ffff9bbcdf5c3a00] __crash_kexec at ffffffff9b111232
#2 [ffff9bbcdf5c3ad0] panic at ffffffff9b07d61e
#3 [ffff9bbcdf5c3b50] oops_end at ffffffff9b030978
#4 [ffff9bbcdf5c3b78] no_context at ffffffff9b06aaf0
#5 [ffff9bbcdf5c3bd8] __bad_area_nosemaphore at ffffffff9b06ae2e
#6 [ffff9bbcdf5c3c28] bad_area_nosemaphore at ffffffff9b06af24
#7 [ffff9bbcdf5c3c38] __do_page_fault at ffffffff9b06b67e
#8 [ffff9bbcdf5c3cb0] do_page_fault at ffffffff9b06bb12
#9 [ffff9bbcdf5c3ce0] page_fault at ffffffff9bc015c5
[exception RIP: bnxt_rx_pkt+237]
RIP: ffffffffc0259cdd RSP: ffff9bbcdf5c3d98 RFLAGS: 00010213
RAX: 000000005dd8097f RBX: ffff9ba4cb11b7e0 RCX: ffffa923cf6e9000
RDX: 0000000000000fff RSI: 0000000000000627 RDI: 0000000000001000
RBP: ffff9bbcdf5c3e60 R8: 0000000000420003 R9: 000000000000020d
R10: ffffa923cf6ec138 R11: ffff9bbcdf5c3e83 R12: ffff9ba4d6f928c0
R13: ffff9ba4cac28080 R14: ffff9ba4cb11b7f0 R15: ffff9ba4d5a30000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 |
| In the Linux kernel, the following vulnerability has been resolved:
dmaengine: idxd: fix wq cleanup of WQCFG registers
A pre-release silicon erratum workaround where wq reset does not clear
WQCFG registers was leaked into upstream code. Use wq reset command
instead of blasting the MMIO region. This also address an issue where
we clobber registers in future devices. |
| In the Linux kernel, the following vulnerability has been resolved:
x86/fred: Clear WFE in missing-ENDBRANCH #CPs
An indirect branch instruction sets the CPU indirect branch tracker
(IBT) into WAIT_FOR_ENDBRANCH (WFE) state and WFE stays asserted
across the instruction boundary. When the decoder finds an
inappropriate instruction while WFE is set ENDBR, the CPU raises a #CP
fault.
For the "kernel IBT no ENDBR" selftest where #CPs are deliberately
triggered, the WFE state of the interrupted context needs to be
cleared to let execution continue. Otherwise when the CPU resumes
from the instruction that just caused the previous #CP, another
missing-ENDBRANCH #CP is raised and the CPU enters a dead loop.
This is not a problem with IDT because it doesn't preserve WFE and
IRET doesn't set WFE. But FRED provides space on the entry stack
(in an expanded CS area) to save and restore the WFE state, thus the
WFE state is no longer clobbered, so software must clear it.
Clear WFE to avoid dead looping in ibt_clear_fred_wfe() and the
!ibt_fatal code path when execution is allowed to continue.
Clobbering WFE in any other circumstance is a security-relevant bug.
[ dhansen: changelog rewording ] |
| In the Linux kernel, the following vulnerability has been resolved:
drm/panthor: Lock XArray when getting entries for the VM
Similar to commit cac075706f29 ("drm/panthor: Fix race when converting
group handle to group object") we need to use the XArray's internal
locking when retrieving a vm pointer from there.
v2: Removed part of the patch that was trying to protect fetching
the heap pointer from XArray, as that operation is protected by
the @pool->lock. |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: ufs: core: Fix another deadlock during RTC update
If ufshcd_rtc_work calls ufshcd_rpm_put_sync() and the pm's usage_count
is 0, we will enter the runtime suspend callback. However, the runtime
suspend callback will wait to flush ufshcd_rtc_work, causing a deadlock.
Replace ufshcd_rpm_put_sync() with ufshcd_rpm_put() to avoid the
deadlock. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/fbdev-dma: Only cleanup deferred I/O if necessary
Commit 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if
necessary") initializes deferred I/O only if it is used.
drm_fbdev_dma_fb_destroy() however calls fb_deferred_io_cleanup()
unconditionally with struct fb_info.fbdefio == NULL. KASAN with the
out-of-tree Apple silicon display driver posts following warning from
__flush_work() of a random struct work_struct instead of the expected
NULL pointer derefs.
[ 22.053799] ------------[ cut here ]------------
[ 22.054832] WARNING: CPU: 2 PID: 1 at kernel/workqueue.c:4177 __flush_work+0x4d8/0x580
[ 22.056597] Modules linked in: uhid bnep uinput nls_ascii ip6_tables ip_tables i2c_dev loop fuse dm_multipath nfnetlink zram hid_magicmouse btrfs xor xor_neon brcmfmac_wcc raid6_pq hci_bcm4377 bluetooth brcmfmac hid_apple brcmutil nvmem_spmi_mfd simple_mfd_spmi dockchannel_hid cfg80211 joydev regmap_spmi nvme_apple ecdh_generic ecc macsmc_hid rfkill dwc3 appledrm snd_soc_macaudio macsmc_power nvme_core apple_isp phy_apple_atc apple_sart apple_rtkit_helper apple_dockchannel tps6598x macsmc_hwmon snd_soc_cs42l84 videobuf2_v4l2 spmi_apple_controller nvmem_apple_efuses videobuf2_dma_sg apple_z2 videobuf2_memops spi_nor panel_summit videobuf2_common asahi videodev pwm_apple apple_dcp snd_soc_apple_mca apple_admac spi_apple clk_apple_nco i2c_pasemi_platform snd_pcm_dmaengine mc i2c_pasemi_core mux_core ofpart adpdrm drm_dma_helper apple_dart apple_soc_cpufreq leds_pwm phram
[ 22.073768] CPU: 2 UID: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.11.2-asahi+ #asahi-dev
[ 22.075612] Hardware name: Apple MacBook Pro (13-inch, M2, 2022) (DT)
[ 22.077032] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 22.078567] pc : __flush_work+0x4d8/0x580
[ 22.079471] lr : __flush_work+0x54/0x580
[ 22.080345] sp : ffffc000836ef820
[ 22.081089] x29: ffffc000836ef880 x28: 0000000000000000 x27: ffff80002ddb7128
[ 22.082678] x26: dfffc00000000000 x25: 1ffff000096f0c57 x24: ffffc00082d3e358
[ 22.084263] x23: ffff80004b7862b8 x22: dfffc00000000000 x21: ffff80005aa1d470
[ 22.085855] x20: ffff80004b786000 x19: ffff80004b7862a0 x18: 0000000000000000
[ 22.087439] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000005
[ 22.089030] x14: 1ffff800106ddf0a x13: 0000000000000000 x12: 0000000000000000
[ 22.090618] x11: ffffb800106ddf0f x10: dfffc00000000000 x9 : 1ffff800106ddf0e
[ 22.092206] x8 : 0000000000000000 x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000001
[ 22.093790] x5 : ffffc000836ef728 x4 : 0000000000000000 x3 : 0000000000000020
[ 22.095368] x2 : 0000000000000008 x1 : 00000000000000aa x0 : 0000000000000000
[ 22.096955] Call trace:
[ 22.097505] __flush_work+0x4d8/0x580
[ 22.098330] flush_delayed_work+0x80/0xb8
[ 22.099231] fb_deferred_io_cleanup+0x3c/0x130
[ 22.100217] drm_fbdev_dma_fb_destroy+0x6c/0xe0 [drm_dma_helper]
[ 22.101559] unregister_framebuffer+0x210/0x2f0
[ 22.102575] drm_fb_helper_unregister_info+0x48/0x60
[ 22.103683] drm_fbdev_dma_client_unregister+0x4c/0x80 [drm_dma_helper]
[ 22.105147] drm_client_dev_unregister+0x1cc/0x230
[ 22.106217] drm_dev_unregister+0x58/0x570
[ 22.107125] apple_drm_unbind+0x50/0x98 [appledrm]
[ 22.108199] component_del+0x1f8/0x3a8
[ 22.109042] dcp_platform_shutdown+0x24/0x38 [apple_dcp]
[ 22.110357] platform_shutdown+0x70/0x90
[ 22.111219] device_shutdown+0x368/0x4d8
[ 22.112095] kernel_restart+0x6c/0x1d0
[ 22.112946] __arm64_sys_reboot+0x1c8/0x328
[ 22.113868] invoke_syscall+0x78/0x1a8
[ 22.114703] do_el0_svc+0x124/0x1a0
[ 22.115498] el0_svc+0x3c/0xe0
[ 22.116181] el0t_64_sync_handler+0x70/0xc0
[ 22.117110] el0t_64_sync+0x190/0x198
[ 22.117931] ---[ end trace 0000000000000000 ]--- |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: fnic: Move flush_work initialization out of if block
After commit 379a58caa199 ("scsi: fnic: Move fnic_fnic_flush_tx() to a
work queue"), it can happen that a work item is sent to an uninitialized
work queue. This may has the effect that the item being queued is never
actually queued, and any further actions depending on it will not
proceed.
The following warning is observed while the fnic driver is loaded:
kernel: WARNING: CPU: 11 PID: 0 at ../kernel/workqueue.c:1524 __queue_work+0x373/0x410
kernel: <IRQ>
kernel: queue_work_on+0x3a/0x50
kernel: fnic_wq_copy_cmpl_handler+0x54a/0x730 [fnic 62fbff0c42e7fb825c60a55cde2fb91facb2ed24]
kernel: fnic_isr_msix_wq_copy+0x2d/0x60 [fnic 62fbff0c42e7fb825c60a55cde2fb91facb2ed24]
kernel: __handle_irq_event_percpu+0x36/0x1a0
kernel: handle_irq_event_percpu+0x30/0x70
kernel: handle_irq_event+0x34/0x60
kernel: handle_edge_irq+0x7e/0x1a0
kernel: __common_interrupt+0x3b/0xb0
kernel: common_interrupt+0x58/0xa0
kernel: </IRQ>
It has been observed that this may break the rediscovery of Fibre
Channel devices after a temporary fabric failure.
This patch fixes it by moving the work queue initialization out of
an if block in fnic_probe(). |
| In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35
[WHY & HOW]
Mismatch in DCN35 DML2 cause bw validation failed to acquire unexpected DPP pipe to cause
grey screen and system hang. Remove EnhancedPrefetchScheduleAccelerationFinal value override
to match HW spec.
(cherry picked from commit 9dad21f910fcea2bdcff4af46159101d7f9cd8ba) |
| In the Linux kernel, the following vulnerability has been resolved:
vrf: revert "vrf: Remove unnecessary RCU-bh critical section"
This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
dev_queue_xmit_nit is expected to be called with BH disabled.
__dev_queue_xmit has the following:
/* Disable soft irqs for various locks below. Also
* stops preemption for RCU.
*/
rcu_read_lock_bh();
VRF must follow this invariant. The referenced commit removed this
protection. Which triggered a lockdep warning:
================================
WARNING: inconsistent lock state
6.11.0 #1 Tainted: G W
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
{IN-SOFTIRQ-W} state was registered at:
lock_acquire+0x19a/0x4f0
_raw_spin_lock+0x27/0x40
packet_rcv+0xa33/0x1320
__netif_receive_skb_core.constprop.0+0xcb0/0x3a90
__netif_receive_skb_list_core+0x2c9/0x890
netif_receive_skb_list_internal+0x610/0xcc0
[...]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(rlock-AF_PACKET);
<Interrupt>
lock(rlock-AF_PACKET);
*** DEADLOCK ***
Call Trace:
<TASK>
dump_stack_lvl+0x73/0xa0
mark_lock+0x102e/0x16b0
__lock_acquire+0x9ae/0x6170
lock_acquire+0x19a/0x4f0
_raw_spin_lock+0x27/0x40
tpacket_rcv+0x863/0x3b30
dev_queue_xmit_nit+0x709/0xa40
vrf_finish_direct+0x26e/0x340 [vrf]
vrf_l3_out+0x5f4/0xe80 [vrf]
__ip_local_out+0x51e/0x7a0
[...] |
| In the Linux kernel, the following vulnerability has been resolved:
tracing/timerlat: Drop interface_lock in stop_kthread()
stop_kthread() is the offline callback for "trace/osnoise:online", since
commit 5bfbcd1ee57b ("tracing/timerlat: Add interface_lock around clearing
of kthread in stop_kthread()"), the following ABBA deadlock scenario is
introduced:
T1 | T2 [BP] | T3 [AP]
osnoise_hotplug_workfn() | work_for_cpu_fn() | cpuhp_thread_fun()
| _cpu_down() | osnoise_cpu_die()
mutex_lock(&interface_lock) | | stop_kthread()
| cpus_write_lock() | mutex_lock(&interface_lock)
cpus_read_lock() | cpuhp_kick_ap() |
As the interface_lock here in just for protecting the "kthread" field of
the osn_var, use xchg() instead to fix this issue. Also use
for_each_online_cpu() back in stop_per_cpu_kthreads() as it can take
cpu_read_lock() again. |
| In the Linux kernel, the following vulnerability has been resolved:
net/mlx5e: Fix crash caused by calling __xfrm_state_delete() twice
The km.state is not checked in driver's delayed work. When
xfrm_state_check_expire() is called, the state can be reset to
XFRM_STATE_EXPIRED, even if it is XFRM_STATE_DEAD already. This
happens when xfrm state is deleted, but not freed yet. As
__xfrm_state_delete() is called again in xfrm timer, the following
crash occurs.
To fix this issue, skip xfrm_state_check_expire() if km.state is not
XFRM_STATE_VALID.
Oops: general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP
CPU: 5 UID: 0 PID: 7448 Comm: kworker/u102:2 Not tainted 6.11.0-rc2+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5e_ipsec: eth%d mlx5e_ipsec_handle_sw_limits [mlx5_core]
RIP: 0010:__xfrm_state_delete+0x3d/0x1b0
Code: 0f 84 8b 01 00 00 48 89 fd c6 87 c8 00 00 00 05 48 8d bb 40 10 00 00 e8 11 04 1a 00 48 8b 95 b8 00 00 00 48 8b 85 c0 00 00 00 <48> 89 42 08 48 89 10 48 8b 55 10 48 b8 00 01 00 00 00 00 ad de 48
RSP: 0018:ffff88885f945ec8 EFLAGS: 00010246
RAX: dead000000000122 RBX: ffffffff82afa940 RCX: 0000000000000036
RDX: dead000000000100 RSI: 0000000000000000 RDI: ffffffff82afb980
RBP: ffff888109a20340 R08: ffff88885f945ea0 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88885f945ff8 R12: 0000000000000246
R13: ffff888109a20340 R14: ffff88885f95f420 R15: ffff88885f95f400
FS: 0000000000000000(0000) GS:ffff88885f940000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2163102430 CR3: 00000001128d6001 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
? die_addr+0x33/0x90
? exc_general_protection+0x1a2/0x390
? asm_exc_general_protection+0x22/0x30
? __xfrm_state_delete+0x3d/0x1b0
? __xfrm_state_delete+0x2f/0x1b0
xfrm_timer_handler+0x174/0x350
? __xfrm_state_delete+0x1b0/0x1b0
__hrtimer_run_queues+0x121/0x270
hrtimer_run_softirq+0x88/0xd0
handle_softirqs+0xcc/0x270
do_softirq+0x3c/0x50
</IRQ>
<TASK>
__local_bh_enable_ip+0x47/0x50
mlx5e_ipsec_handle_sw_limits+0x7d/0x90 [mlx5_core]
process_one_work+0x137/0x2d0
worker_thread+0x28d/0x3a0
? rescuer_thread+0x480/0x480
kthread+0xb8/0xe0
? kthread_park+0x80/0x80
ret_from_fork+0x2d/0x50
? kthread_park+0x80/0x80
ret_from_fork_asm+0x11/0x20
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe/guc_submit: add missing locking in wedged_fini
Any non-wedged queue can have a zero refcount here and can be running
concurrently with an async queue destroy, therefore dereferencing the
queue ptr to check wedge status after the lookup can trigger UAF if
queue is not wedged. Fix this by keeping the submission_state lock held
around the check to postpone the free and make the check safe, before
dropping again around the put() to avoid the deadlock.
(cherry picked from commit d28af0b6b9580b9f90c265a7da0315b0ad20bbfd) |
| In the Linux kernel, the following vulnerability has been resolved:
fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set
This may be a typo. The comment has said shared locks are
not allowed when this bit is set. If using shared lock, the
wait in `fuse_file_cached_io_open` may be forever. |
| In the Linux kernel, the following vulnerability has been resolved:
KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
on x86 due to a chain of locks and SRCU synchronizations. Translating the
below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
fairness of r/w semaphores).
CPU0 CPU1 CPU2
1 lock(&kvm->slots_lock);
2 lock(&vcpu->mutex);
3 lock(&kvm->srcu);
4 lock(cpu_hotplug_lock);
5 lock(kvm_lock);
6 lock(&kvm->slots_lock);
7 lock(cpu_hotplug_lock);
8 sync(&kvm->srcu);
Note, there are likely more potential deadlocks in KVM x86, e.g. the same
pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
__kvmclock_cpufreq_notifier():
cpuhp_cpufreq_online()
|
-> cpufreq_online()
|
-> cpufreq_gov_performance_limits()
|
-> __cpufreq_driver_target()
|
-> __target_index()
|
-> cpufreq_freq_transition_begin()
|
-> cpufreq_notify_transition()
|
-> ... __kvmclock_cpufreq_notifier()
But, actually triggering such deadlocks is beyond rare due to the
combination of dependencies and timings involved. E.g. the cpufreq
notifier is only used on older CPUs without a constant TSC, mucking with
the NX hugepage mitigation while VMs are running is very uncommon, and
doing so while also onlining/offlining a CPU (necessary to generate
contention on cpu_hotplug_lock) would be even more unusual.
The most robust solution to the general cpu_hotplug_lock issue is likely
to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq
notifier doesn't to take kvm_lock. For now, settle for fixing the most
blatant deadlock, as switching to an RCU-protected list is a much more
involved change, but add a comment in locking.rst to call out that care
needs to be taken when walking holding kvm_lock and walking vm_list.
======================================================
WARNING: possible circular locking dependency detected
6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O
------------------------------------------------------
tee/35048 is trying to acquire lock:
ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]
but task is already holding lock:
ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (kvm_lock){+.+.}-{3:3}:
__mutex_lock+0x6a/0xb40
mutex_lock_nested+0x1f/0x30
kvm_dev_ioctl+0x4fb/0xe50 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #2 (cpu_hotplug_lock){++++}-{0:0}:
cpus_read_lock+0x2e/0xb0
static_key_slow_inc+0x16/0x30
kvm_lapic_set_base+0x6a/0x1c0 [kvm]
kvm_set_apic_base+0x8f/0xe0 [kvm]
kvm_set_msr_common+0x9ae/0xf80 [kvm]
vmx_set_msr+0xa54/0xbe0 [kvm_intel]
__kvm_set_msr+0xb6/0x1a0 [kvm]
kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm]
kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&kvm->srcu){.+.+}-{0:0}:
__synchronize_srcu+0x44/0x1a0
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
erofs: handle overlapped pclusters out of crafted images properly
syzbot reported a task hang issue due to a deadlock case where it is
waiting for the folio lock of a cached folio that will be used for
cache I/Os.
After looking into the crafted fuzzed image, I found it's formed with
several overlapped big pclusters as below:
Ext: logical offset | length : physical offset | length
0: 0.. 16384 | 16384 : 151552.. 167936 | 16384
1: 16384.. 32768 | 16384 : 155648.. 172032 | 16384
2: 32768.. 49152 | 16384 : 537223168.. 537239552 | 16384
...
Here, extent 0/1 are physically overlapped although it's entirely
_impossible_ for normal filesystem images generated by mkfs.
First, managed folios containing compressed data will be marked as
up-to-date and then unlocked immediately (unlike in-place folios) when
compressed I/Os are complete. If physical blocks are not submitted in
the incremental order, there should be separate BIOs to avoid dependency
issues. However, the current code mis-arranges z_erofs_fill_bio_vec()
and BIO submission which causes unexpected BIO waits.
Second, managed folios will be connected to their own pclusters for
efficient inter-queries. However, this is somewhat hard to implement
easily if overlapped big pclusters exist. Again, these only appear in
fuzzed images so let's simply fall back to temporary short-lived pages
for correctness.
Additionally, it justifies that referenced managed folios cannot be
truncated for now and reverts part of commit 2080ca1ed3e4 ("erofs: tidy
up `struct z_erofs_bvec`") for simplicity although it shouldn't be any
difference. |