| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc, afs: Fix peer hash locking vs RCU callback
In its address list, afs now retains pointers to and refs on one or more
rxrpc_peer objects. The address list is freed under RCU and at this time,
it puts the refs on those peers.
Now, when an rxrpc_peer object runs out of refs, it gets removed from the
peer hash table and, for that, rxrpc has to take a spinlock. However, it
is now being called from afs's RCU cleanup, which takes place in BH
context - but it is just taking an ordinary spinlock.
The put may also be called from non-BH context, and so there exists the
possibility of deadlock if the BH-based RCU cleanup happens whilst the hash
spinlock is held. This led to the attached lockdep complaint.
Fix this by changing spinlocks of rxnet->peer_hash_lock back to
BH-disabling locks.
================================
WARNING: inconsistent lock state
6.13.0-rc5-build2+ #1223 Tainted: G E
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff88810babe228 (&rxnet->peer_hash_lock){+.?.}-{3:3}, at: rxrpc_put_peer+0xcb/0x180
{SOFTIRQ-ON-W} state was registered at:
mark_usage+0x164/0x180
__lock_acquire+0x544/0x990
lock_acquire.part.0+0x103/0x280
_raw_spin_lock+0x2f/0x40
rxrpc_peer_keepalive_worker+0x144/0x440
process_one_work+0x486/0x7c0
process_scheduled_works+0x73/0x90
worker_thread+0x1c8/0x2a0
kthread+0x19b/0x1b0
ret_from_fork+0x24/0x40
ret_from_fork_asm+0x1a/0x30
irq event stamp: 972402
hardirqs last enabled at (972402): [<ffffffff8244360e>] _raw_spin_unlock_irqrestore+0x2e/0x50
hardirqs last disabled at (972401): [<ffffffff82443328>] _raw_spin_lock_irqsave+0x18/0x60
softirqs last enabled at (972300): [<ffffffff810ffbbe>] handle_softirqs+0x3ee/0x430
softirqs last disabled at (972313): [<ffffffff810ffc54>] __irq_exit_rcu+0x44/0x110
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&rxnet->peer_hash_lock);
<Interrupt>
lock(&rxnet->peer_hash_lock);
*** DEADLOCK ***
1 lock held by swapper/1/0:
#0: ffffffff83576be0 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire+0x7/0x30
stack backtrace:
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G E 6.13.0-rc5-build2+ #1223
Tainted: [E]=UNSIGNED_MODULE
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x57/0x80
print_usage_bug.part.0+0x227/0x240
valid_state+0x53/0x70
mark_lock_irq+0xa5/0x2f0
mark_lock+0xf7/0x170
mark_usage+0xe1/0x180
__lock_acquire+0x544/0x990
lock_acquire.part.0+0x103/0x280
_raw_spin_lock+0x2f/0x40
rxrpc_put_peer+0xcb/0x180
afs_free_addrlist+0x46/0x90 [kafs]
rcu_do_batch+0x2d2/0x640
rcu_core+0x2f7/0x350
handle_softirqs+0x1ee/0x430
__irq_exit_rcu+0x44/0x110
irq_exit_rcu+0xa/0x30
sysvec_apic_timer_interrupt+0x7f/0xa0
</IRQ> |
| In the Linux kernel, the following vulnerability has been resolved:
net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel
Attempt to enable IPsec packet offload in tunnel mode in debug kernel
generates the following kernel panic, which is happening due to two
issues:
1. In SA add section, the should be _bh() variant when marking SA mode.
2. There is not needed flush_workqueue in SA delete routine. It is not
needed as at this stage as it is removed from SADB and the running work
will be canceled later in SA free.
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.12.0+ #4 Not tainted
-----------------------------------------------------
charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
and this task is already holding:
ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
which would create a new lock dependency:
(&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&x->lock){+.-.}-{3:3}
... which became SOFTIRQ-irq-safe at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_timer_handler+0x91/0xd70
__hrtimer_run_queues+0x1dd/0xa60
hrtimer_run_softirq+0x146/0x2e0
handle_softirqs+0x266/0x860
irq_exit_rcu+0x115/0x1a0
sysvec_apic_timer_interrupt+0x6e/0x90
asm_sysvec_apic_timer_interrupt+0x16/0x20
default_idle+0x13/0x20
default_idle_call+0x67/0xa0
do_idle+0x2da/0x320
cpu_startup_entry+0x50/0x60
start_secondary+0x213/0x2a0
common_startup_64+0x129/0x138
to a SOFTIRQ-irq-unsafe lock:
(&xa->xa_lock#24){+.+.}-{3:3}
... which became SOFTIRQ-irq-unsafe at:
...
lock_acquire+0x1be/0x520
_raw_spin_lock+0x2c/0x40
xa_set_mark+0x70/0x110
mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
xfrm_dev_state_add+0x3bb/0xd70
xfrm_add_sa+0x2451/0x4a90
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&xa->xa_lock#24);
local_irq_disable();
lock(&x->lock);
lock(&xa->xa_lock#24);
<Interrupt>
lock(&x->lock);
*** DEADLOCK ***
2 locks held by charon/1337:
#0: ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90
#1: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&x->lock){+.-.}-{3:3} ops: 29 {
HARDIRQ-ON-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_alloc_spi+0xc0/0xe60
xfrm_alloc_userspi+0x5f6/0xbc0
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
IN-SOFTIRQ-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_timer_handler+0x91/0xd70
__hrtimer_run_queues+0x1dd/0xa60
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
afs: Fix merge preference rule failure condition
syzbot reported a lock held when returning to userspace[1]. This is
because if argc is less than 0 and the function returns directly, the held
inode lock is not released.
Fix this by store the error in ret and jump to done to clean up instead of
returning directly.
[dh: Modified Lizhi Xu's original patch to make it honour the error code
from afs_split_string()]
[1]
WARNING: lock held when returning to user space!
6.13.0-rc3-syzkaller-00209-g499551201b5f #0 Not tainted
------------------------------------------------
syz-executor133/5823 is leaving the kernel with locks still held!
1 lock held by syz-executor133/5823:
#0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: inode_lock include/linux/fs.h:818 [inline]
#0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: afs_proc_addr_prefs_write+0x2bb/0x14e0 fs/afs/addr_prefs.c:388 |
| In the Linux kernel, the following vulnerability has been resolved:
cgroup/cpuset: remove kernfs active break
A warning was found:
WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
FS: 00007f84cd0be740(0000) GS:ffff8887ddc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
kernfs_drain+0x15e/0x2f0
__kernfs_remove+0x165/0x300
kernfs_remove_by_name_ns+0x7b/0xc0
cgroup_rm_file+0x154/0x1c0
cgroup_addrm_files+0x1c2/0x1f0
css_clear_dir+0x77/0x110
kill_css+0x4c/0x1b0
cgroup_destroy_locked+0x194/0x380
cgroup_rmdir+0x2a/0x140
It can be explained by:
rmdir echo 1 > cpuset.cpus
kernfs_fop_write_iter // active=0
cgroup_rm_file
kernfs_remove_by_name_ns kernfs_get_active // active=1
__kernfs_remove // active=0x80000002
kernfs_drain cpuset_write_resmask
wait_event
//waiting (active == 0x80000001)
kernfs_break_active_protection
// active = 0x80000001
// continue
kernfs_unbreak_active_protection
// active = 0x80000002
...
kernfs_should_drain_open_files
// warning occurs
kernfs_put_active
This warning is caused by 'kernfs_break_active_protection' when it is
writing to cpuset.cpus, and the cgroup is removed concurrently.
The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
get_online_cpus()") made cpuset_hotplug_workfn asynchronous, This change
involves calling flush_work(), which can create a multiple processes
circular locking dependency that involve cgroup_mutex, potentially leading
to a deadlock. To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset: break
kernfs active protection in cpuset_write_resmask()") added
'kernfs_break_active_protection' in the cpuset_write_resmask. This could
lead to this warning.
After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
processing synchronous"), the cpuset_write_resmask no longer needs to
wait the hotplug to finish, which means that concurrent hotplug and cpuset
operations are no longer possible. Therefore, the deadlock doesn't exist
anymore and it does not have to 'break active protection' now. To fix this
warning, just remove kernfs_break_active_protection operation in the
'cpuset_write_resmask'. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix deadlock when freeing cgroup storage
The following commit
bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
first introduced deadlock prevention for fentry/fexit programs attaching
on bpf_task_storage helpers. That commit also employed the logic in map
free path in its v6 version.
Later bpf_cgrp_storage was first introduced in
c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")
which faces the same issue as bpf_task_storage, instead of its busy
counter, NULL was passed to bpf_local_storage_map_free() which opened
a window to cause deadlock:
<TASK>
(acquiring local_storage->lock)
_raw_spin_lock_irqsave+0x3d/0x50
bpf_local_storage_update+0xd1/0x460
bpf_cgrp_storage_get+0x109/0x130
bpf_prog_a4d4a370ba857314_cgrp_ptr+0x139/0x170
? __bpf_prog_enter_recur+0x16/0x80
bpf_trampoline_6442485186+0x43/0xa4
cgroup_storage_ptr+0x9/0x20
(holding local_storage->lock)
bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160
bpf_selem_unlink_storage+0x6f/0x110
bpf_local_storage_map_free+0xa2/0x110
bpf_map_free_deferred+0x5b/0x90
process_one_work+0x17c/0x390
worker_thread+0x251/0x360
kthread+0xd2/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Progs:
- A: SEC("fentry/cgroup_storage_ptr")
- cgid (BPF_MAP_TYPE_HASH)
Record the id of the cgroup the current task belonging
to in this hash map, using the address of the cgroup
as the map key.
- cgrpa (BPF_MAP_TYPE_CGRP_STORAGE)
If current task is a kworker, lookup the above hash
map using function parameter @owner as the key to get
its corresponding cgroup id which is then used to get
a trusted pointer to the cgroup through
bpf_cgroup_from_id(). This trusted pointer can then
be passed to bpf_cgrp_storage_get() to finally trigger
the deadlock issue.
- B: SEC("tp_btf/sys_enter")
- cgrpb (BPF_MAP_TYPE_CGRP_STORAGE)
The only purpose of this prog is to fill Prog A's
hash map by calling bpf_cgrp_storage_get() for as
many userspace tasks as possible.
Steps to reproduce:
- Run A;
- while (true) { Run B; Destroy B; }
Fix this issue by passing its busy counter to the free procedure so
it can be properly incremented before storage/smap locking. |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: fix racy issue from session lookup and expire
Increment the session reference count within the lock for lookup to avoid
racy issue with session expire. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: bpf_local_storage: Always use bpf_mem_alloc in PREEMPT_RT
In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non preemptible
context. bpf_mem_alloc must be used in PREEMPT_RT. This patch is
to enforce bpf_mem_alloc in the bpf_local_storage when CONFIG_PREEMPT_RT
is enabled.
[ 35.118559] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 35.118566] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1832, name: test_progs
[ 35.118569] preempt_count: 1, expected: 0
[ 35.118571] RCU nest depth: 1, expected: 1
[ 35.118577] INFO: lockdep is turned off.
...
[ 35.118647] __might_resched+0x433/0x5b0
[ 35.118677] rt_spin_lock+0xc3/0x290
[ 35.118700] ___slab_alloc+0x72/0xc40
[ 35.118723] __kmalloc_noprof+0x13f/0x4e0
[ 35.118732] bpf_map_kzalloc+0xe5/0x220
[ 35.118740] bpf_selem_alloc+0x1d2/0x7b0
[ 35.118755] bpf_local_storage_update+0x2fa/0x8b0
[ 35.118784] bpf_sk_storage_get_tracing+0x15a/0x1d0
[ 35.118791] bpf_prog_9a118d86fca78ebb_trace_inet_sock_set_state+0x44/0x66
[ 35.118795] bpf_trace_run3+0x222/0x400
[ 35.118820] __bpf_trace_inet_sock_set_state+0x11/0x20
[ 35.118824] trace_inet_sock_set_state+0x112/0x130
[ 35.118830] inet_sk_state_store+0x41/0x90
[ 35.118836] tcp_set_state+0x3b3/0x640
There is no need to adjust the gfp_flags passing to the
bpf_mem_cache_alloc_flags() which only honors the GFP_KERNEL.
The verifier has ensured GFP_KERNEL is passed only in sleepable context.
It has been an old issue since the first introduction of the
bpf_local_storage ~5 years ago, so this patch targets the bpf-next.
bpf_mem_alloc is needed to solve it, so the Fixes tag is set
to the commit when bpf_mem_alloc was first used in the bpf_local_storage. |
| In the Linux kernel, the following vulnerability has been resolved:
media: uvcvideo: Fix deadlock during uvc_probe
If uvc_probe() fails, it can end up calling uvc_status_unregister() before
uvc_status_init() is called.
Fix this by checking if dev->status is NULL or not in
uvc_status_unregister(). |
| In the Linux kernel, the following vulnerability has been resolved:
rhashtable: Fix potential deadlock by moving schedule_work outside lock
Move the hash table growth check and work scheduling outside the
rht lock to prevent a possible circular locking dependency.
The original implementation could trigger a lockdep warning due to
a potential deadlock scenario involving nested locks between
rhashtable bucket, rq lock, and dsq lock. By relocating the
growth check and work scheduling after releasing the rth lock, we break
this potential deadlock chain.
This change expands the flexibility of rhashtable by removing
restrictive locking that previously limited its use in scheduler
and workqueue contexts.
Import to say that this calls rht_grow_above_75(), which reads from
struct rhashtable without holding the lock, if this is a problem, we can
move the check to the lock, and schedule the workqueue after the lock.
Modified so that atomic_inc is also moved outside of the bucket
lock along with the growth above 75% check. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/nouveau/gr/gf100: Fix missing unlock in gf100_gr_chan_new()
When the call to gf100_grctx_generate() fails, unlock gr->fecs.mutex
before returning the error.
Fixes smatch warning:
drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:480 gf100_gr_chan_new() warn: inconsistent returns '&gr->fecs.mutex'. |
| In the Linux kernel, the following vulnerability has been resolved:
nfs_common: must not hold RCU while calling nfsd_file_put_local
Move holding the RCU from nfs_to_nfsd_file_put_local to
nfs_to_nfsd_net_put. It is the call to nfs_to->nfsd_serv_put that
requires the RCU anyway (the puts for nfsd_file and netns were
combined to avoid an extra indirect reference but that
micro-optimization isn't possible now).
This fixes xfstests generic/013 and it triggering:
"Voluntary context switch within RCU read-side critical section!"
[ 143.545738] Call Trace:
[ 143.546206] <TASK>
[ 143.546625] ? show_regs+0x6d/0x80
[ 143.547267] ? __warn+0x91/0x140
[ 143.547951] ? rcu_note_context_switch+0x496/0x5d0
[ 143.548856] ? report_bug+0x193/0x1a0
[ 143.549557] ? handle_bug+0x63/0xa0
[ 143.550214] ? exc_invalid_op+0x1d/0x80
[ 143.550938] ? asm_exc_invalid_op+0x1f/0x30
[ 143.551736] ? rcu_note_context_switch+0x496/0x5d0
[ 143.552634] ? wakeup_preempt+0x62/0x70
[ 143.553358] __schedule+0xaa/0x1380
[ 143.554025] ? _raw_spin_unlock_irqrestore+0x12/0x40
[ 143.554958] ? try_to_wake_up+0x1fe/0x6b0
[ 143.555715] ? wake_up_process+0x19/0x20
[ 143.556452] schedule+0x2e/0x120
[ 143.557066] schedule_preempt_disabled+0x19/0x30
[ 143.557933] rwsem_down_read_slowpath+0x24d/0x4a0
[ 143.558818] ? xfs_efi_item_format+0x50/0xc0 [xfs]
[ 143.559894] down_read+0x4e/0xb0
[ 143.560519] xlog_cil_commit+0x1b2/0xbc0 [xfs]
[ 143.561460] ? _raw_spin_unlock+0x12/0x30
[ 143.562212] ? xfs_inode_item_precommit+0xc7/0x220 [xfs]
[ 143.563309] ? xfs_trans_run_precommits+0x69/0xd0 [xfs]
[ 143.564394] __xfs_trans_commit+0xb5/0x330 [xfs]
[ 143.565367] xfs_trans_roll+0x48/0xc0 [xfs]
[ 143.566262] xfs_defer_trans_roll+0x57/0x100 [xfs]
[ 143.567278] xfs_defer_finish_noroll+0x27a/0x490 [xfs]
[ 143.568342] xfs_defer_finish+0x1a/0x80 [xfs]
[ 143.569267] xfs_bunmapi_range+0x4d/0xb0 [xfs]
[ 143.570208] xfs_itruncate_extents_flags+0x13d/0x230 [xfs]
[ 143.571353] xfs_free_eofblocks+0x12e/0x190 [xfs]
[ 143.572359] xfs_file_release+0x12d/0x140 [xfs]
[ 143.573324] __fput+0xe8/0x2d0
[ 143.573922] __fput_sync+0x1d/0x30
[ 143.574574] nfsd_filp_close+0x33/0x60 [nfsd]
[ 143.575430] nfsd_file_free+0x96/0x150 [nfsd]
[ 143.576274] nfsd_file_put+0xf7/0x1a0 [nfsd]
[ 143.577104] nfsd_file_put_local+0x18/0x30 [nfsd]
[ 143.578070] nfs_close_local_fh+0x101/0x110 [nfs_localio]
[ 143.579079] __put_nfs_open_context+0xc9/0x180 [nfs]
[ 143.580031] nfs_file_clear_open_context+0x4a/0x60 [nfs]
[ 143.581038] nfs_file_release+0x3e/0x60 [nfs]
[ 143.581879] __fput+0xe8/0x2d0
[ 143.582464] __fput_sync+0x1d/0x30
[ 143.583108] __x64_sys_close+0x41/0x80
[ 143.583823] x64_sys_call+0x189a/0x20d0
[ 143.584552] do_syscall_64+0x64/0x170
[ 143.585240] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 143.586185] RIP: 0033:0x7f3c5153efd7 |
| In the Linux kernel, the following vulnerability has been resolved:
block: Prevent potential deadlocks in zone write plug error recovery
Zone write plugging for handling writes to zones of a zoned block
device always execute a zone report whenever a write BIO to a zone
fails. The intent of this is to ensure that the tracking of a zone write
pointer is always correct to ensure that the alignment to a zone write
pointer of write BIOs can be checked on submission and that we can
always correctly emulate zone append operations using regular write
BIOs.
However, this error recovery scheme introduces a potential deadlock if a
device queue freeze is initiated while BIOs are still plugged in a zone
write plug and one of these write operation fails. In such case, the
disk zone write plug error recovery work is scheduled and executes a
report zone. This in turn can result in a request allocation in the
underlying driver to issue the report zones command to the device. But
with the device queue freeze already started, this allocation will
block, preventing the report zone execution and the continuation of the
processing of the plugged BIOs. As plugged BIOs hold a queue usage
reference, the queue freeze itself will never complete, resulting in a
deadlock.
Avoid this problem by completely removing from the zone write plugging
code the use of report zones operations after a failed write operation,
instead relying on the device user to either execute a report zones,
reset the zone, finish the zone, or give up writing to the device (which
is a fairly common pattern for file systems which degrade to read-only
after write failures). This is not an unreasonnable requirement as all
well-behaved applications, FSes and device mapper already use report
zones to recover from write errors whenever possible by comparing the
current position of a zone write pointer with what their assumption
about the position is.
The changes to remove the automatic error recovery are as follows:
- Completely remove the error recovery work and its associated
resources (zone write plug list head, disk error list, and disk
zone_wplugs_work work struct). This also removes the functions
disk_zone_wplug_set_error() and disk_zone_wplug_clear_error().
- Change the BLK_ZONE_WPLUG_ERROR zone write plug flag into
BLK_ZONE_WPLUG_NEED_WP_UPDATE. This new flag is set for a zone write
plug whenever a write opration targetting the zone of the zone write
plug fails. This flag indicates that the zone write pointer offset is
not reliable and that it must be updated when the next report zone,
reset zone, finish zone or disk revalidation is executed.
- Modify blk_zone_write_plug_bio_endio() to set the
BLK_ZONE_WPLUG_NEED_WP_UPDATE flag for the target zone of a failed
write BIO.
- Modify the function disk_zone_wplug_set_wp_offset() to clear this
new flag, thus implementing recovery of a correct write pointer
offset with the reset (all) zone and finish zone operations.
- Modify blkdev_report_zones() to always use the disk_report_zones_cb()
callback so that disk_zone_wplug_sync_wp_offset() can be called for
any zone marked with the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag.
This implements recovery of a correct write pointer offset for zone
write plugs marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE and within
the range of the report zones operation executed by the user.
- Modify blk_revalidate_seq_zone() to call
disk_zone_wplug_sync_wp_offset() for all sequential write required
zones when a zoned block device is revalidated, thus always resolving
any inconsistency between the write pointer offset of zone write
plugs and the actual write pointer position of sequential zones. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: IDLETIMER: Fix for possible ABBA deadlock
Deletion of the last rule referencing a given idletimer may happen at
the same time as a read of its file in sysfs:
| ======================================================
| WARNING: possible circular locking dependency detected
| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
| ------------------------------------------------------
| iptables/3303 is trying to acquire lock:
| ffff8881057e04b8 (kn->active#48){++++}-{0:0}, at: __kernfs_remove+0x20
|
| but task is already holding lock:
| ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v]
|
| which lock already depends on the new lock.
A simple reproducer is:
| #!/bin/bash
|
| while true; do
| iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
| iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
| done &
| while true; do
| cat /sys/class/xt_idletimer/timers/testme >/dev/null
| done
Avoid this by freeing list_mutex right after deleting the element from
the list, then continuing with the teardown. |
| In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: iso: Fix circular lock in iso_listen_bis
This fixes the circular locking dependency warning below, by
releasing the socket lock before enterning iso_listen_bis, to
avoid any potential deadlock with hdev lock.
[ 75.307983] ======================================================
[ 75.307984] WARNING: possible circular locking dependency detected
[ 75.307985] 6.12.0-rc6+ #22 Not tainted
[ 75.307987] ------------------------------------------------------
[ 75.307987] kworker/u81:2/2623 is trying to acquire lock:
[ 75.307988] ffff8fde1769da58 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO)
at: iso_connect_cfm+0x253/0x840 [bluetooth]
[ 75.308021]
but task is already holding lock:
[ 75.308022] ffff8fdd61a10078 (&hdev->lock)
at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[ 75.308053]
which lock already depends on the new lock.
[ 75.308054]
the existing dependency chain (in reverse order) is:
[ 75.308055]
-> #1 (&hdev->lock){+.+.}-{3:3}:
[ 75.308057] __mutex_lock+0xad/0xc50
[ 75.308061] mutex_lock_nested+0x1b/0x30
[ 75.308063] iso_sock_listen+0x143/0x5c0 [bluetooth]
[ 75.308085] __sys_listen_socket+0x49/0x60
[ 75.308088] __x64_sys_listen+0x4c/0x90
[ 75.308090] x64_sys_call+0x2517/0x25f0
[ 75.308092] do_syscall_64+0x87/0x150
[ 75.308095] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 75.308098]
-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
[ 75.308100] __lock_acquire+0x155e/0x25f0
[ 75.308103] lock_acquire+0xc9/0x300
[ 75.308105] lock_sock_nested+0x32/0x90
[ 75.308107] iso_connect_cfm+0x253/0x840 [bluetooth]
[ 75.308128] hci_connect_cfm+0x6c/0x190 [bluetooth]
[ 75.308155] hci_le_per_adv_report_evt+0x27b/0x2f0 [bluetooth]
[ 75.308180] hci_le_meta_evt+0xe7/0x200 [bluetooth]
[ 75.308206] hci_event_packet+0x21f/0x5c0 [bluetooth]
[ 75.308230] hci_rx_work+0x3ae/0xb10 [bluetooth]
[ 75.308254] process_one_work+0x212/0x740
[ 75.308256] worker_thread+0x1bd/0x3a0
[ 75.308258] kthread+0xe4/0x120
[ 75.308259] ret_from_fork+0x44/0x70
[ 75.308261] ret_from_fork_asm+0x1a/0x30
[ 75.308263]
other info that might help us debug this:
[ 75.308264] Possible unsafe locking scenario:
[ 75.308264] CPU0 CPU1
[ 75.308265] ---- ----
[ 75.308265] lock(&hdev->lock);
[ 75.308267] lock(sk_lock-
AF_BLUETOOTH-BTPROTO_ISO);
[ 75.308268] lock(&hdev->lock);
[ 75.308269] lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
[ 75.308270]
*** DEADLOCK ***
[ 75.308271] 4 locks held by kworker/u81:2/2623:
[ 75.308272] #0: ffff8fdd66e52148 ((wq_completion)hci0#2){+.+.}-{0:0},
at: process_one_work+0x443/0x740
[ 75.308276] #1: ffffafb488b7fe48 ((work_completion)(&hdev->rx_work)),
at: process_one_work+0x1ce/0x740
[ 75.308280] #2: ffff8fdd61a10078 (&hdev->lock){+.+.}-{3:3}
at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[ 75.308304] #3: ffffffffb6ba4900 (rcu_read_lock){....}-{1:2},
at: hci_connect_cfm+0x29/0x190 [bluetooth] |
| In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: iso: Fix circular lock in iso_conn_big_sync
This fixes the circular locking dependency warning below, by reworking
iso_sock_recvmsg, to ensure that the socket lock is always released
before calling a function that locks hdev.
[ 561.670344] ======================================================
[ 561.670346] WARNING: possible circular locking dependency detected
[ 561.670349] 6.12.0-rc6+ #26 Not tainted
[ 561.670351] ------------------------------------------------------
[ 561.670353] iso-tester/3289 is trying to acquire lock:
[ 561.670355] ffff88811f600078 (&hdev->lock){+.+.}-{3:3},
at: iso_conn_big_sync+0x73/0x260 [bluetooth]
[ 561.670405]
but task is already holding lock:
[ 561.670407] ffff88815af58258 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0},
at: iso_sock_recvmsg+0xbf/0x500 [bluetooth]
[ 561.670450]
which lock already depends on the new lock.
[ 561.670452]
the existing dependency chain (in reverse order) is:
[ 561.670453]
-> #2 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0}:
[ 561.670458] lock_acquire+0x7c/0xc0
[ 561.670463] lock_sock_nested+0x3b/0xf0
[ 561.670467] bt_accept_dequeue+0x1a5/0x4d0 [bluetooth]
[ 561.670510] iso_sock_accept+0x271/0x830 [bluetooth]
[ 561.670547] do_accept+0x3dd/0x610
[ 561.670550] __sys_accept4+0xd8/0x170
[ 561.670553] __x64_sys_accept+0x74/0xc0
[ 561.670556] x64_sys_call+0x17d6/0x25f0
[ 561.670559] do_syscall_64+0x87/0x150
[ 561.670563] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 561.670567]
-> #1 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
[ 561.670571] lock_acquire+0x7c/0xc0
[ 561.670574] lock_sock_nested+0x3b/0xf0
[ 561.670577] iso_sock_listen+0x2de/0xf30 [bluetooth]
[ 561.670617] __sys_listen_socket+0xef/0x130
[ 561.670620] __x64_sys_listen+0xe1/0x190
[ 561.670623] x64_sys_call+0x2517/0x25f0
[ 561.670626] do_syscall_64+0x87/0x150
[ 561.670629] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 561.670632]
-> #0 (&hdev->lock){+.+.}-{3:3}:
[ 561.670636] __lock_acquire+0x32ad/0x6ab0
[ 561.670639] lock_acquire.part.0+0x118/0x360
[ 561.670642] lock_acquire+0x7c/0xc0
[ 561.670644] __mutex_lock+0x18d/0x12f0
[ 561.670647] mutex_lock_nested+0x1b/0x30
[ 561.670651] iso_conn_big_sync+0x73/0x260 [bluetooth]
[ 561.670687] iso_sock_recvmsg+0x3e9/0x500 [bluetooth]
[ 561.670722] sock_recvmsg+0x1d5/0x240
[ 561.670725] sock_read_iter+0x27d/0x470
[ 561.670727] vfs_read+0x9a0/0xd30
[ 561.670731] ksys_read+0x1a8/0x250
[ 561.670733] __x64_sys_read+0x72/0xc0
[ 561.670736] x64_sys_call+0x1b12/0x25f0
[ 561.670738] do_syscall_64+0x87/0x150
[ 561.670741] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 561.670744]
other info that might help us debug this:
[ 561.670745] Chain exists of:
&hdev->lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> sk_lock-AF_BLUETOOTH
[ 561.670751] Possible unsafe locking scenario:
[ 561.670753] CPU0 CPU1
[ 561.670754] ---- ----
[ 561.670756] lock(sk_lock-AF_BLUETOOTH);
[ 561.670758] lock(sk_lock
AF_BLUETOOTH-BTPROTO_ISO);
[ 561.670761] lock(sk_lock-AF_BLUETOOTH);
[ 561.670764] lock(&hdev->lock);
[ 561.670767]
*** DEADLOCK *** |
| In the Linux kernel, the following vulnerability has been resolved:
ptdma: pt_core_execute_cmd() should use spinlock
The interrupt handler (pt_core_irq_handler()) of the ptdma
driver can be called from interrupt context. The code flow
in this function can lead down to pt_core_execute_cmd() which
will attempt to grab a mutex, which is not appropriate in
interrupt context and ultimately leads to a kernel panic.
The fix here changes this mutex to a spinlock, which has
been verified to resolve the issue. |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix deadlock between concurrent dio writes when low on free data space
When reserving data space for a direct IO write we can end up deadlocking
if we have multiple tasks attempting a write to the same file range, there
are multiple extents covered by that file range, we are low on available
space for data and the writes don't expand the inode's i_size.
The deadlock can happen like this:
1) We have a file with an i_size of 1M, at offset 0 it has an extent with
a size of 128K and at offset 128K it has another extent also with a
size of 128K;
2) Task A does a direct IO write against file range [0, 256K), and because
the write is within the i_size boundary, it takes the inode's lock (VFS
level) in shared mode;
3) Task A locks the file range [0, 256K) at btrfs_dio_iomap_begin(), and
then gets the extent map for the extent covering the range [0, 128K).
At btrfs_get_blocks_direct_write(), it creates an ordered extent for
that file range ([0, 128K));
4) Before returning from btrfs_dio_iomap_begin(), it unlocks the file
range [0, 256K);
5) Task A executes btrfs_dio_iomap_begin() again, this time for the file
range [128K, 256K), and locks the file range [128K, 256K);
6) Task B starts a direct IO write against file range [0, 256K) as well.
It also locks the inode in shared mode, as it's within the i_size limit,
and then tries to lock file range [0, 256K). It is able to lock the
subrange [0, 128K) but then blocks waiting for the range [128K, 256K),
as it is currently locked by task A;
7) Task A enters btrfs_get_blocks_direct_write() and tries to reserve data
space. Because we are low on available free space, it triggers the
async data reclaim task, and waits for it to reserve data space;
8) The async reclaim task decides to wait for all existing ordered extents
to complete (through btrfs_wait_ordered_roots()).
It finds the ordered extent previously created by task A for the file
range [0, 128K) and waits for it to complete;
9) The ordered extent for the file range [0, 128K) can not complete
because it blocks at btrfs_finish_ordered_io() when trying to lock the
file range [0, 128K).
This results in a deadlock, because:
- task B is holding the file range [0, 128K) locked, waiting for the
range [128K, 256K) to be unlocked by task A;
- task A is holding the file range [128K, 256K) locked and it's waiting
for the async data reclaim task to satisfy its space reservation
request;
- the async data reclaim task is waiting for ordered extent [0, 128K)
to complete, but the ordered extent can not complete because the
file range [0, 128K) is currently locked by task B, which is waiting
on task A to unlock file range [128K, 256K) and task A waiting
on the async data reclaim task.
This results in a deadlock between 4 task: task A, task B, the async
data reclaim task and the task doing ordered extent completion (a work
queue task).
This type of deadlock can sporadically be triggered by the test case
generic/300 from fstests, and results in a stack trace like the following:
[12084.033689] INFO: task kworker/u16:7:123749 blocked for more than 241 seconds.
[12084.034877] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.035562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.036548] task:kworker/u16:7 state:D stack: 0 pid:123749 ppid: 2 flags:0x00004000
[12084.036554] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
[12084.036599] Call Trace:
[12084.036601] <TASK>
[12084.036606] __schedule+0x3cb/0xed0
[12084.036616] schedule+0x4e/0xb0
[12084.036620] btrfs_start_ordered_extent+0x109/0x1c0 [btrfs]
[12084.036651] ? prepare_to_wait_exclusive+0xc0/0xc0
[12084.036659] btrfs_run_ordered_extent_work+0x1a/0x30 [btrfs]
[12084.036688] btrfs_work_helper+0xf8/0x400 [btrfs]
[12084.0367
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
media: mediatek: vcodec: prevent kernel crash when rmmod mtk-vcodec-dec.ko
If the driver support subdev mode, the parameter "dev->pm.dev" will be
NULL in mtk_vcodec_dec_remove. Kernel will crash when try to rmmod
mtk-vcodec-dec.ko.
[ 4380.702726] pc : do_raw_spin_trylock+0x4/0x80
[ 4380.707075] lr : _raw_spin_lock_irq+0x90/0x14c
[ 4380.711509] sp : ffff80000819bc10
[ 4380.714811] x29: ffff80000819bc10 x28: ffff3600c03e4000 x27: 0000000000000000
[ 4380.721934] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 4380.729057] x23: ffff3600c0f34930 x22: ffffd5e923549000 x21: 0000000000000220
[ 4380.736179] x20: 0000000000000208 x19: ffffd5e9213e8ebc x18: 0000000000000020
[ 4380.743298] x17: 0000002000000000 x16: ffffd5e9213e8e90 x15: 696c346f65646976
[ 4380.750420] x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000040
[ 4380.757542] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 4380.764664] x8 : 0000000000000000 x7 : ffff3600c7273ae8 x6 : ffffd5e9213e8ebc
[ 4380.771786] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
[ 4380.778908] x2 : 0000000000000000 x1 : ffff3600c03e4000 x0 : 0000000000000208
[ 4380.786031] Call trace:
[ 4380.788465] do_raw_spin_trylock+0x4/0x80
[ 4380.792462] __pm_runtime_disable+0x2c/0x1b0
[ 4380.796723] mtk_vcodec_dec_remove+0x5c/0xa0 [mtk_vcodec_dec]
[ 4380.802466] platform_remove+0x2c/0x60
[ 4380.806204] __device_release_driver+0x194/0x250
[ 4380.810810] driver_detach+0xc8/0x15c
[ 4380.814462] bus_remove_driver+0x5c/0xb0
[ 4380.818375] driver_unregister+0x34/0x64
[ 4380.822288] platform_driver_unregister+0x18/0x24
[ 4380.826979] mtk_vcodec_dec_driver_exit+0x1c/0x888 [mtk_vcodec_dec]
[ 4380.833240] __arm64_sys_delete_module+0x190/0x224
[ 4380.838020] invoke_syscall+0x48/0x114
[ 4380.841760] el0_svc_common.constprop.0+0x60/0x11c
[ 4380.846540] do_el0_svc+0x28/0x90
[ 4380.849844] el0_svc+0x4c/0x100
[ 4380.852975] el0t_64_sync_handler+0xec/0xf0
[ 4380.857148] el0t_64_sync+0x190/0x194
[ 4380.860801] Code: 94431515 17ffffca d503201f d503245f (b9400004) |
| In the Linux kernel, the following vulnerability has been resolved:
nvdimm: Fix firmware activation deadlock scenarios
Lockdep reports the following deadlock scenarios for CXL root device
power-management, device_prepare(), operations, and device_shutdown()
operations for 'nd_region' devices:
Chain exists of:
&nvdimm_region_key --> &nvdimm_bus->reconfig_mutex --> system_transition_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(system_transition_mutex);
lock(&nvdimm_bus->reconfig_mutex);
lock(system_transition_mutex);
lock(&nvdimm_region_key);
Chain exists of:
&cxl_nvdimm_bridge_key --> acpi_scan_lock --> &cxl_root_key
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&cxl_root_key);
lock(acpi_scan_lock);
lock(&cxl_root_key);
lock(&cxl_nvdimm_bridge_key);
These stem from holding nvdimm_bus_lock() over hibernate_quiet_exec()
which walks the entire system device topology taking device_lock() along
the way. The nvdimm_bus_lock() is protecting against unregistration,
multiple simultaneous ops callers, and preventing activate_show() from
racing activate_store(). For the first 2, the lock is redundant.
Unregistration already flushes all ops users, and sysfs already prevents
multiple threads to be active in an ops handler at the same time. For
the last userspace should already be waiting for its last
activate_store() to complete, and does not need activate_show() to flush
the write side, so this lock usage can be deleted in these attributes. |
| In the Linux kernel, the following vulnerability has been resolved:
tty: fix deadlock caused by calling printk() under tty_port->lock
pty_write() invokes kmalloc() which may invoke a normal printk() to print
failure message. This can cause a deadlock in the scenario reported by
syz-bot below:
CPU0 CPU1 CPU2
---- ---- ----
lock(console_owner);
lock(&port_lock_key);
lock(&port->lock);
lock(&port_lock_key);
lock(&port->lock);
lock(console_owner);
As commit dbdda842fe96 ("printk: Add console owner and waiter logic to
load balance console writes") said, such deadlock can be prevented by
using printk_deferred() in kmalloc() (which is invoked in the section
guarded by the port->lock). But there are too many printk() on the
kmalloc() path, and kmalloc() can be called from anywhere, so changing
printk() to printk_deferred() is too complicated and inelegant.
Therefore, this patch chooses to specify __GFP_NOWARN to kmalloc(), so
that printk() will not be called, and this deadlock problem can be
avoided.
Syzbot reported the following lockdep error:
======================================================
WARNING: possible circular locking dependency detected
5.4.143-00237-g08ccc19a-dirty #10 Not tainted
------------------------------------------------------
syz-executor.4/29420 is trying to acquire lock:
ffffffff8aedb2a0 (console_owner){....}-{0:0}, at: console_trylock_spinning kernel/printk/printk.c:1752 [inline]
ffffffff8aedb2a0 (console_owner){....}-{0:0}, at: vprintk_emit+0x2ca/0x470 kernel/printk/printk.c:2023
but task is already holding lock:
ffff8880119c9158 (&port->lock){-.-.}-{2:2}, at: pty_write+0xf4/0x1f0 drivers/tty/pty.c:120
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&port->lock){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
tty_port_tty_get drivers/tty/tty_port.c:288 [inline] <-- lock(&port->lock);
tty_port_default_wakeup+0x1d/0xb0 drivers/tty/tty_port.c:47
serial8250_tx_chars+0x530/0xa80 drivers/tty/serial/8250/8250_port.c:1767
serial8250_handle_irq.part.0+0x31f/0x3d0 drivers/tty/serial/8250/8250_port.c:1854
serial8250_handle_irq drivers/tty/serial/8250/8250_port.c:1827 [inline] <-- lock(&port_lock_key);
serial8250_default_handle_irq+0xb2/0x220 drivers/tty/serial/8250/8250_port.c:1870
serial8250_interrupt+0xfd/0x200 drivers/tty/serial/8250/8250_core.c:126
__handle_irq_event_percpu+0x109/0xa50 kernel/irq/handle.c:156
[...]
-> #1 (&port_lock_key){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
serial8250_console_write+0x184/0xa40 drivers/tty/serial/8250/8250_port.c:3198
<-- lock(&port_lock_key);
call_console_drivers kernel/printk/printk.c:1819 [inline]
console_unlock+0x8cb/0xd00 kernel/printk/printk.c:2504
vprintk_emit+0x1b5/0x470 kernel/printk/printk.c:2024 <-- lock(console_owner);
vprintk_func+0x8d/0x250 kernel/printk/printk_safe.c:394
printk+0xba/0xed kernel/printk/printk.c:2084
register_console+0x8b3/0xc10 kernel/printk/printk.c:2829
univ8250_console_init+0x3a/0x46 drivers/tty/serial/8250/8250_core.c:681
console_init+0x49d/0x6d3 kernel/printk/printk.c:2915
start_kernel+0x5e9/0x879 init/main.c:713
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
-> #0 (console_owner){....}-{0:0}:
[...]
lock_acquire+0x127/0x340 kernel/locking/lockdep.c:4734
console_trylock_spinning kernel/printk/printk.c:1773
---truncated--- |