| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: ath12k: fix read pointer after free in ath12k_mac_assign_vif_to_vdev()
In ath12k_mac_assign_vif_to_vdev(), if arvif is created on a different
radio, it gets deleted from that radio through a call to
ath12k_mac_unassign_link_vif(). This action frees the arvif pointer.
Subsequently, there is a check involving arvif, which will result in a
read-after-free scenario.
Fix this by moving this check after arvif is again assigned via call to
ath12k_mac_assign_link_vif().
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00173-QCAHKSWPL_SILICONZ-1 |
| Commvault Web Server has an unspecified vulnerability that can be exploited by a remote, authenticated attacker. According to the Commvault advisory: "Webservers can be compromised through bad actors creating and executing webshells." Fixed in version 11.36.46, 11.32.89, 11.28.141, and 11.20.217 for Windows and Linux platforms. This vulnerability was added to the CISA Known Exploited Vulnerabilities (KEV) Catalog on 2025-04-28. |
| In the Linux kernel, the following vulnerability has been resolved:
RDMA/mlx5: Fix page_size variable overflow
Change all variables storing mlx5_umem_mkc_find_best_pgsz() result to
unsigned long to support values larger than 31 and avoid overflow.
For example: If we try to register 4GB of memory that is contiguous in
physical memory, the driver will optimize the page_size and try to use
an mkey with 4GB entity size. The 'unsigned int' page_size variable will
overflow to '0' and we'll hit the WARN_ON() in alloc_cacheable_mr().
WARNING: CPU: 2 PID: 1203 at drivers/infiniband/hw/mlx5/mr.c:1124 alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
Modules linked in: mlx5_ib mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_rxe rdma_ucm ib_uverbs ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm fuse ib_core [last unloaded: mlx5_core]
CPU: 2 UID: 70878 PID: 1203 Comm: rdma_resource_l Tainted: G W 6.14.0-rc4-dirty #43
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
Code: 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 41 52 53 48 83 ec 30 f6 46 28 04 4c 8b 77 08 75 21 <0f> 0b 49 c7 c2 ea ff ff ff 48 8d 65 d0 4c 89 d0 5b 41 5a 41 5c 41
RSP: 0018:ffffc900006ffac8 EFLAGS: 00010246
RAX: 0000000004c0d0d0 RBX: ffff888217a22000 RCX: 0000000000100001
RDX: 00007fb7ac480000 RSI: ffff8882037b1240 RDI: ffff8882046f0600
RBP: ffffc900006ffb28 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000000007e0 R11: ffffea0008011d40 R12: ffff8882037b1240
R13: ffff8882046f0600 R14: ffff888217a22000 R15: ffffc900006ffe00
FS: 00007fb7ed013340(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb7ed1d8000 CR3: 00000001fd8f6006 CR4: 0000000000772eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x81/0x130
? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
? report_bug+0xfc/0x1e0
? handle_bug+0x55/0x90
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
create_real_mr+0x54/0x150 [mlx5_ib]
ib_uverbs_reg_mr+0x17f/0x2a0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xca/0x140 [ib_uverbs]
ib_uverbs_run_method+0x6d0/0x780 [ib_uverbs]
? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
ib_uverbs_cmd_verbs+0x19b/0x360 [ib_uverbs]
? walk_system_ram_range+0x79/0xd0
? ___pte_offset_map+0x1b/0x110
? __pte_offset_map_lock+0x80/0x100
ib_uverbs_ioctl+0xac/0x110 [ib_uverbs]
__x64_sys_ioctl+0x94/0xb0
do_syscall_64+0x50/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fb7ecf0737b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdbe03ecc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffdbe03edb8 RCX: 00007fb7ecf0737b
RDX: 00007ffdbe03eda0 RSI: 00000000c0181b01 RDI: 0000000000000003
RBP: 00007ffdbe03ed80 R08: 00007fb7ecc84010 R09: 00007ffdbe03eed4
R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffdbe03eed4
R13: 000000000000000c R14: 000000000000000c R15: 00007fb7ecc84150
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
iio: backend: make sure to NULL terminate stack buffer
Make sure to NULL terminate the buffer in
iio_backend_debugfs_write_reg() before passing it to sscanf(). It is a
stack variable so we should not assume it will 0 initialized. |
| In the Linux kernel, the following vulnerability has been resolved:
vhost-scsi: Fix handling of multiple calls to vhost_scsi_set_endpoint
If vhost_scsi_set_endpoint is called multiple times without a
vhost_scsi_clear_endpoint between them, we can hit multiple bugs
found by Haoran Zhang:
1. Use-after-free when no tpgs are found:
This fixes a use after free that occurs when vhost_scsi_set_endpoint is
called more than once and calls after the first call do not find any
tpgs to add to the vs_tpg. When vhost_scsi_set_endpoint first finds
tpgs to add to the vs_tpg array match=true, so we will do:
vhost_vq_set_backend(vq, vs_tpg);
...
kfree(vs->vs_tpg);
vs->vs_tpg = vs_tpg;
If vhost_scsi_set_endpoint is called again and no tpgs are found
match=false so we skip the vhost_vq_set_backend call leaving the
pointer to the vs_tpg we then free via:
kfree(vs->vs_tpg);
vs->vs_tpg = vs_tpg;
If a scsi request is then sent we do:
vhost_scsi_handle_vq -> vhost_scsi_get_req -> vhost_vq_get_backend
which sees the vs_tpg we just did a kfree on.
2. Tpg dir removal hang:
This patch fixes an issue where we cannot remove a LIO/target layer
tpg (and structs above it like the target) dir due to the refcount
dropping to -1.
The problem is that if vhost_scsi_set_endpoint detects a tpg is already
in the vs->vs_tpg array or if the tpg has been removed so
target_depend_item fails, the undepend goto handler will do
target_undepend_item on all tpgs in the vs_tpg array dropping their
refcount to 0. At this time vs_tpg contains both the tpgs we have added
in the current vhost_scsi_set_endpoint call as well as tpgs we added in
previous calls which are also in vs->vs_tpg.
Later, when vhost_scsi_clear_endpoint runs it will do
target_undepend_item on all the tpgs in the vs->vs_tpg which will drop
their refcount to -1. Userspace will then not be able to remove the tpg
and will hang when it tries to do rmdir on the tpg dir.
3. Tpg leak:
This fixes a bug where we can leak tpgs and cause them to be
un-removable because the target name is overwritten when
vhost_scsi_set_endpoint is called multiple times but with different
target names.
The bug occurs if a user has called VHOST_SCSI_SET_ENDPOINT and setup
a vhost-scsi device to target/tpg mapping, then calls
VHOST_SCSI_SET_ENDPOINT again with a new target name that has tpgs we
haven't seen before (target1 has tpg1 but target2 has tpg2). When this
happens we don't teardown the old target tpg mapping and just overwrite
the target name and the vs->vs_tpg array. Later when we do
vhost_scsi_clear_endpoint, we are passed in either target1 or target2's
name and we will only match that target's tpgs when we loop over the
vs->vs_tpg. We will then return from the function without doing
target_undepend_item on the tpgs.
Because of all these bugs, it looks like being able to call
vhost_scsi_set_endpoint multiple times was never supported. The major
user, QEMU, already has checks to prevent this use case. So to fix the
issues, this patch prevents vhost_scsi_set_endpoint from being called
if it's already successfully added tpgs. To add, remove or change the
tpg config or target name, you must do a vhost_scsi_clear_endpoint
first. |
| In the Linux kernel, the following vulnerability has been resolved:
w1: fix NULL pointer dereference in probe
The w1_uart_probe() function calls w1_uart_serdev_open() (which includes
devm_serdev_device_open()) before setting the client ops via
serdev_device_set_client_ops(). This ordering can trigger a NULL pointer
dereference in the serdev controller's receive_buf handler, as it assumes
serdev->ops is valid when SERPORT_ACTIVE is set.
This is similar to the issue fixed in commit 5e700b384ec1
("platform/chrome: cros_ec_uart: properly fix race condition") where
devm_serdev_device_open() was called before fully initializing the
device.
Fix the race by ensuring client ops are set before enabling the port via
w1_uart_serdev_open(). |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix array bounds error with may_goto
may_goto uses an additional 8 bytes on the stack, which causes the
interpreters[] array to go out of bounds when calculating index by
stack_size.
1. If a BPF program is rewritten, re-evaluate the stack size. For non-JIT
cases, reject loading directly.
2. For non-JIT cases, calculating interpreters[idx] may still cause
out-of-bounds array access, and just warn about it.
3. For jit_requested cases, the execution of bpf_func also needs to be
warned. So move the definition of function __bpf_prog_ret0_warn out of
the macro definition CONFIG_BPF_JIT_ALWAYS_ON. |
| In the Linux kernel, the following vulnerability has been resolved:
PCI: Fix NULL dereference in SR-IOV VF creation error path
Clean up when virtfn setup fails to prevent NULL pointer dereference
during device removal. The kernel oops below occurred due to incorrect
error handling flow when pci_setup_device() fails.
Add pci_iov_scan_device(), which handles virtfn allocation and setup and
cleans up if pci_setup_device() fails, so pci_iov_add_virtfn() doesn't need
to call pci_stop_and_remove_bus_device(). This prevents accessing
partially initialized virtfn devices during removal.
BUG: kernel NULL pointer dereference, address: 00000000000000d0
RIP: 0010:device_del+0x3d/0x3d0
Call Trace:
pci_remove_bus_device+0x7c/0x100
pci_iov_add_virtfn+0xfa/0x200
sriov_enable+0x208/0x420
mlx5_core_sriov_configure+0x6a/0x160 [mlx5_core]
sriov_numvfs_store+0xae/0x1a0
[bhelgaas: commit log, return ERR_PTR(-ENOMEM) directly] |
| In the Linux kernel, the following vulnerability has been resolved:
powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu'
Commit 176cda0619b6 ("powerpc/perf: Add perf interface to expose vpa
counters") introduced 'vpa_pmu' to expose Book3s-HV nested APIv2 provided
L1<->L2 context switch latency counters to L1 user-space via
perf-events. However the newly introduced PMU named 'vpa_pmu' doesn't
assign ownership of the PMU to the module 'vpa_pmu'. Consequently the
module 'vpa_pmu' can be unloaded while one of the perf-events are still
active, which can lead to kernel oops and panic of the form below on a
Pseries-LPAR:
BUG: Kernel NULL pointer dereference on read at 0x00000058
<snip>
NIP [c000000000506cb8] event_sched_out+0x40/0x258
LR [c00000000050e8a4] __perf_remove_from_context+0x7c/0x2b0
Call Trace:
[c00000025fc3fc30] [c00000025f8457a8] 0xc00000025f8457a8 (unreliable)
[c00000025fc3fc80] [fffffffffffffee0] 0xfffffffffffffee0
[c00000025fc3fcd0] [c000000000501e70] event_function+0xa8/0x120
<snip>
Kernel panic - not syncing: Aiee, killing interrupt handler!
Fix this by adding the module ownership to 'vpa_pmu' so that the module
'vpa_pmu' is ref-counted and prevented from being unloaded when perf-events
are initialized. |
| In the Linux kernel, the following vulnerability has been resolved:
net: airoha: Fix qid report in airoha_tc_get_htb_get_leaf_queue()
Fix the following kernel warning deleting HTB offloaded leafs and/or root
HTB qdisc in airoha_eth driver properly reporting qid in
airoha_tc_get_htb_get_leaf_queue routine.
$tc qdisc replace dev eth1 root handle 10: htb offload
$tc class add dev eth1 arent 10: classid 10:4 htb rate 100mbit ceil 100mbit
$tc qdisc replace dev eth1 parent 10:4 handle 4: ets bands 8 \
quanta 1514 3028 4542 6056 7570 9084 10598 12112
$tc qdisc del dev eth1 root
[ 55.827864] ------------[ cut here ]------------
[ 55.832493] WARNING: CPU: 3 PID: 2678 at 0xffffffc0798695a4
[ 55.956510] CPU: 3 PID: 2678 Comm: tc Tainted: G O 6.6.71 #0
[ 55.963557] Hardware name: Airoha AN7581 Evaluation Board (DT)
[ 55.969383] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 55.976344] pc : 0xffffffc0798695a4
[ 55.979851] lr : 0xffffffc079869a20
[ 55.983358] sp : ffffffc0850536a0
[ 55.986665] x29: ffffffc0850536a0 x28: 0000000000000024 x27: 0000000000000001
[ 55.993800] x26: 0000000000000000 x25: ffffff8008b19000 x24: ffffff800222e800
[ 56.000935] x23: 0000000000000001 x22: 0000000000000000 x21: ffffff8008b19000
[ 56.008071] x20: ffffff8002225800 x19: ffffff800379d000 x18: 0000000000000000
[ 56.015206] x17: ffffffbf9ea59000 x16: ffffffc080018000 x15: 0000000000000000
[ 56.022342] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
[ 56.029478] x11: ffffffc081471008 x10: ffffffc081575a98 x9 : 0000000000000000
[ 56.036614] x8 : ffffffc08167fd40 x7 : ffffffc08069e104 x6 : ffffff8007f86000
[ 56.043748] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000001
[ 56.050884] x2 : 0000000000000000 x1 : 0000000000000250 x0 : ffffff800222c000
[ 56.058020] Call trace:
[ 56.060459] 0xffffffc0798695a4
[ 56.063618] 0xffffffc079869a20
[ 56.066777] __qdisc_destroy+0x40/0xa0
[ 56.070528] qdisc_put+0x54/0x6c
[ 56.073748] qdisc_graft+0x41c/0x648
[ 56.077324] tc_get_qdisc+0x168/0x2f8
[ 56.080978] rtnetlink_rcv_msg+0x230/0x330
[ 56.085076] netlink_rcv_skb+0x5c/0x128
[ 56.088913] rtnetlink_rcv+0x14/0x1c
[ 56.092490] netlink_unicast+0x1e0/0x2c8
[ 56.096413] netlink_sendmsg+0x198/0x3c8
[ 56.100337] ____sys_sendmsg+0x1c4/0x274
[ 56.104261] ___sys_sendmsg+0x7c/0xc0
[ 56.107924] __sys_sendmsg+0x44/0x98
[ 56.111492] __arm64_sys_sendmsg+0x20/0x28
[ 56.115580] invoke_syscall.constprop.0+0x58/0xfc
[ 56.120285] do_el0_svc+0x3c/0xbc
[ 56.123592] el0_svc+0x18/0x4c
[ 56.126647] el0t_64_sync_handler+0x118/0x124
[ 56.131005] el0t_64_sync+0x150/0x154
[ 56.134660] ---[ end trace 0000000000000000 ]--- |
| In the Linux kernel, the following vulnerability has been resolved:
exfat: fix missing shutdown check
xfstests generic/730 test failed because after deleting the device
that still had dirty data, the file could still be read without
returning an error. The reason is the missing shutdown check in
->read_iter.
I also noticed that shutdown checks were missing from ->write_iter,
->splice_read, and ->mmap. This commit adds shutdown checks to all
of them. |
| In the Linux kernel, the following vulnerability has been resolved:
Revert "smb: client: fix TCP timers deadlock after rmmod"
This reverts commit e9f2517a3e18a54a3943c098d2226b245d488801.
Commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
rmmod") is intended to fix a null-ptr-deref in LOCKDEP, which is
mentioned as CVE-2024-54680, but is actually did not fix anything;
The issue can be reproduced on top of it. [0]
Also, it reverted the change by commit ef7134c7fc48 ("smb: client:
Fix use-after-free of network namespace.") and introduced a real
issue by reviving the kernel TCP socket.
When a reconnect happens for a CIFS connection, the socket state
transitions to FIN_WAIT_1. Then, inet_csk_clear_xmit_timers_sync()
in tcp_close() stops all timers for the socket.
If an incoming FIN packet is lost, the socket will stay at FIN_WAIT_1
forever, and such sockets could be leaked up to net.ipv4.tcp_max_orphans.
Usually, FIN can be retransmitted by the peer, but if the peer aborts
the connection, the issue comes into reality.
I warned about this privately by pointing out the exact report [1],
but the bogus fix was finally merged.
So, we should not stop the timers to finally kill the connection on
our side in that case, meaning we must not use a kernel socket for
TCP whose sk->sk_net_refcnt is 0.
The kernel socket does not have a reference to its netns to make it
possible to tear down netns without cleaning up every resource in it.
For example, tunnel devices use a UDP socket internally, but we can
destroy netns without removing such devices and let it complete
during exit. Otherwise, netns would be leaked when the last application
died.
However, this is problematic for TCP sockets because TCP has timers to
close the connection gracefully even after the socket is close()d. The
lifetime of the socket and its netns is different from the lifetime of
the underlying connection.
If the socket user does not maintain the netns lifetime, the timer could
be fired after the socket is close()d and its netns is freed up, resulting
in use-after-free.
Actually, we have seen so many similar issues and converted such sockets
to have a reference to netns.
That's why I converted the CIFS client socket to have a reference to
netns (sk->sk_net_refcnt == 1), which is somehow mentioned as out-of-scope
of CIFS and technically wrong in e9f2517a3e18, but **is in-scope and right
fix**.
Regarding the LOCKDEP issue, we can prevent the module unload by
bumping the module refcount when switching the LOCKDDEP key in
sock_lock_init_class_and_name(). [2]
For a while, let's revert the bogus fix.
Note that now we can use sk_net_refcnt_upgrade() for the socket
conversion, but I'll do so later separately to make backport easy. |
| In the Linux kernel, the following vulnerability has been resolved:
staging: vchiq_arm: Fix possible NPR of keep-alive thread
In case vchiq_platform_conn_state_changed() is never called or fails before
driver removal, ka_thread won't be a valid pointer to a task_struct. So
do the necessary checks before calling kthread_stop to avoid a crash. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: don't unregister hook when table is dormant
When nf_tables_updchain encounters an error, hook registration needs to
be rolled back.
This should only be done if the hook has been registered, which won't
happen when the table is flagged as dormant (inactive).
Just move the assignment into the registration block. |
| In the Linux kernel, the following vulnerability has been resolved:
uprobes/x86: Harden uretprobe syscall trampoline check
Jann reported a possible issue when trampoline_check_ip returns
address near the bottom of the address space that is allowed to
call into the syscall if uretprobes are not set up:
https://lore.kernel.org/bpf/202502081235.5A6F352985@keescook/T/#m9d416df341b8fbc11737dacbcd29f0054413cbbf
Though the mmap minimum address restrictions will typically prevent
creating mappings there, let's make sure uretprobe syscall checks
for that. |
| In the Linux kernel, the following vulnerability has been resolved:
x86/microcode/AMD: Fix __apply_microcode_amd()'s return value
When verify_sha256_digest() fails, __apply_microcode_amd() should propagate
the failure by returning false (and not -1 which is promoted to true). |
| In the Linux kernel, the following vulnerability has been resolved:
LoongArch: BPF: Don't override subprog's return value
The verifier test `calls: div by 0 in subprog` triggers a panic at the
ld.bu instruction. The ld.bu insn is trying to load byte from memory
address returned by the subprog. The subprog actually set the correct
address at the a5 register (dedicated register for BPF return values).
But at commit 73c359d1d356 ("LoongArch: BPF: Sign-extend return values")
we also sign extended a5 to the a0 register (return value in LoongArch).
For function call insn, we later propagate the a0 register back to a5
register. This is right for native calls but wrong for bpf2bpf calls
which expect zero-extended return value in a5 register. So only move a0
to a5 for native calls (i.e. non-BPF_PSEUDO_CALL). |
| In the Linux kernel, the following vulnerability has been resolved:
net: ibmveth: make veth_pool_store stop hanging
v2:
- Created a single error handling unlock and exit in veth_pool_store
- Greatly expanded commit message with previous explanatory-only text
Summary: Use rtnl_mutex to synchronize veth_pool_store with itself,
ibmveth_close and ibmveth_open, preventing multiple calls in a row to
napi_disable.
Background: Two (or more) threads could call veth_pool_store through
writing to /sys/devices/vio/30000002/pool*/*. You can do this easily
with a little shell script. This causes a hang.
I configured LOCKDEP, compiled ibmveth.c with DEBUG, and built a new
kernel. I ran this test again and saw:
Setting pool0/active to 0
Setting pool1/active to 1
[ 73.911067][ T4365] ibmveth 30000002 eth0: close starting
Setting pool1/active to 1
Setting pool1/active to 0
[ 73.911367][ T4366] ibmveth 30000002 eth0: close starting
[ 73.916056][ T4365] ibmveth 30000002 eth0: close complete
[ 73.916064][ T4365] ibmveth 30000002 eth0: open starting
[ 110.808564][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
[ 230.808495][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
[ 243.683786][ T123] INFO: task stress.sh:4365 blocked for more than 122 seconds.
[ 243.683827][ T123] Not tainted 6.14.0-01103-g2df0c02dab82-dirty #8
[ 243.683833][ T123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.683838][ T123] task:stress.sh state:D stack:28096 pid:4365 tgid:4365 ppid:4364 task_flags:0x400040 flags:0x00042000
[ 243.683852][ T123] Call Trace:
[ 243.683857][ T123] [c00000000c38f690] [0000000000000001] 0x1 (unreliable)
[ 243.683868][ T123] [c00000000c38f840] [c00000000001f908] __switch_to+0x318/0x4e0
[ 243.683878][ T123] [c00000000c38f8a0] [c000000001549a70] __schedule+0x500/0x12a0
[ 243.683888][ T123] [c00000000c38f9a0] [c00000000154a878] schedule+0x68/0x210
[ 243.683896][ T123] [c00000000c38f9d0] [c00000000154ac80] schedule_preempt_disabled+0x30/0x50
[ 243.683904][ T123] [c00000000c38fa00] [c00000000154dbb0] __mutex_lock+0x730/0x10f0
[ 243.683913][ T123] [c00000000c38fb10] [c000000001154d40] napi_enable+0x30/0x60
[ 243.683921][ T123] [c00000000c38fb40] [c000000000f4ae94] ibmveth_open+0x68/0x5dc
[ 243.683928][ T123] [c00000000c38fbe0] [c000000000f4aa20] veth_pool_store+0x220/0x270
[ 243.683936][ T123] [c00000000c38fc70] [c000000000826278] sysfs_kf_write+0x68/0xb0
[ 243.683944][ T123] [c00000000c38fcb0] [c0000000008240b8] kernfs_fop_write_iter+0x198/0x2d0
[ 243.683951][ T123] [c00000000c38fd00] [c00000000071b9ac] vfs_write+0x34c/0x650
[ 243.683958][ T123] [c00000000c38fdc0] [c00000000071bea8] ksys_write+0x88/0x150
[ 243.683966][ T123] [c00000000c38fe10] [c0000000000317f4] system_call_exception+0x124/0x340
[ 243.683973][ T123] [c00000000c38fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
...
[ 243.684087][ T123] Showing all locks held in the system:
[ 243.684095][ T123] 1 lock held by khungtaskd/123:
[ 243.684099][ T123] #0: c00000000278e370 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x50/0x248
[ 243.684114][ T123] 4 locks held by stress.sh/4365:
[ 243.684119][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
[ 243.684132][ T123] #1: c000000041aea888 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
[ 243.684143][ T123] #2: c0000000366fb9a8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
[ 243.684155][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_enable+0x30/0x60
[ 243.684166][ T123] 5 locks held by stress.sh/4366:
[ 243.684170][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
[ 243.
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
net: decrease cached dst counters in dst_release
Upstream fix ac888d58869b ("net: do not delay dst_entries_add() in
dst_release()") moved decrementing the dst count from dst_destroy to
dst_release to avoid accessing already freed data in case of netns
dismantle. However in case CONFIG_DST_CACHE is enabled and OvS+tunnels
are used, this fix is incomplete as the same issue will be seen for
cached dsts:
Unable to handle kernel paging request at virtual address ffff5aabf6b5c000
Call trace:
percpu_counter_add_batch+0x3c/0x160 (P)
dst_release+0xec/0x108
dst_cache_destroy+0x68/0xd8
dst_destroy+0x13c/0x168
dst_destroy_rcu+0x1c/0xb0
rcu_do_batch+0x18c/0x7d0
rcu_core+0x174/0x378
rcu_core_si+0x18/0x30
Fix this by invalidating the cache, and thus decrementing cached dst
counters, in dst_release too. |
| In the Linux kernel, the following vulnerability has been resolved:
mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
Patch series "mm: fixes for device-exclusive entries (hmm)", v2.
Discussing the PageTail() call in make_device_exclusive_range() with
Willy, I recently discovered [1] that device-exclusive handling does not
properly work with THP, making the hmm-tests selftests fail if THPs are
enabled on the system.
Looking into more details, I found that hugetlb is not properly fenced,
and I realized that something that was bugging me for longer -- how
device-exclusive entries interact with mapcounts -- completely breaks
migration/swapout/split/hwpoison handling of these folios while they have
device-exclusive PTEs.
The program below can be used to allocate 1 GiB worth of pages and making
them device-exclusive on a kernel with CONFIG_TEST_HMM.
Once they are device-exclusive, these folios cannot get swapped out
(proc$pid/smaps_rollup will always indicate 1 GiB RSS no matter how much
one forces memory reclaim), and when having a memory block onlined to
ZONE_MOVABLE, trying to offline it will loop forever and complain about
failed migration of a page that should be movable.
# echo offline > /sys/devices/system/memory/memory136/state
# echo online_movable > /sys/devices/system/memory/memory136/state
# ./hmm-swap &
... wait until everything is device-exclusive
# echo offline > /sys/devices/system/memory/memory136/state
[ 285.193431][T14882] page: refcount:2 mapcount:0 mapping:0000000000000000
index:0x7f20671f7 pfn:0x442b6a
[ 285.196618][T14882] memcg:ffff888179298000
[ 285.198085][T14882] anon flags: 0x5fff0000002091c(referenced|uptodate|
dirty|active|owner_2|swapbacked|node=1|zone=3|lastcpupid=0x7ff)
[ 285.201734][T14882] raw: ...
[ 285.204464][T14882] raw: ...
[ 285.207196][T14882] page dumped because: migration failure
[ 285.209072][T14882] page_owner tracks the page as allocated
[ 285.210915][T14882] page last allocated via order 0, migratetype
Movable, gfp_mask 0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO),
id 14926, tgid 14926 (hmm-swap), ts 254506295376, free_ts 227402023774
[ 285.216765][T14882] post_alloc_hook+0x197/0x1b0
[ 285.218874][T14882] get_page_from_freelist+0x76e/0x3280
[ 285.220864][T14882] __alloc_frozen_pages_noprof+0x38e/0x2740
[ 285.223302][T14882] alloc_pages_mpol+0x1fc/0x540
[ 285.225130][T14882] folio_alloc_mpol_noprof+0x36/0x340
[ 285.227222][T14882] vma_alloc_folio_noprof+0xee/0x1a0
[ 285.229074][T14882] __handle_mm_fault+0x2b38/0x56a0
[ 285.230822][T14882] handle_mm_fault+0x368/0x9f0
...
This series fixes all issues I found so far. There is no easy way to fix
without a bigger rework/cleanup. I have a bunch of cleanups on top (some
previous sent, some the result of the discussion in v1) that I will send
out separately once this landed and I get to it.
I wish we could just use some special present PROT_NONE PTEs instead of
these (non-present, non-none) fake-swap entries; but that just results in
the same problem we keep having (lack of spare PTE bits), and staring at
other similar fake-swap entries, that ship has sailed.
With this series, make_device_exclusive() doesn't actually belong into
mm/rmap.c anymore, but I'll leave moving that for another day.
I only tested this series with the hmm-tests selftests due to lack of HW,
so I'd appreciate some testing, especially if the interaction between two
GPUs wanting a device-exclusive entry works as expected.
<program>
#include <stdio.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x05, struct hmm_dmirror_cmd)
struct hmm_dmirror_cmd {
__u64 addr;
__u64 ptr;
__u64 npages;
__u64 cpages;
__u64 faults;
};
const size_t size = 1 * 1024 * 1024 * 1024ul;
const size_t chunk_size = 2 * 1024 * 1024ul;
int m
---truncated--- |