14
326 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Vlad Buslov
|
ec694ad393 |
net/sched: cls_api: Fix lockup on flushing explicitly created chain
[ Upstream commit c9a82bec02c339cdda99b37c5e62b3b71fc4209c ]
Mingshuai Ren reports:
When a new chain is added by using tc, one soft lockup alarm will be
generated after delete the prio 0 filter of the chain. To reproduce
the problem, perform the following steps:
(1) tc qdisc add dev eth0 root handle 1: htb default 1
(2) tc chain add dev eth0
(3) tc filter del dev eth0 chain 0 parent 1: prio 0
(4) tc filter add dev eth0 chain 0 parent 1:
Fix the issue by accounting for additional reference to chains that are
explicitly created by RTM_NEWCHAIN message as opposed to implicitly by
RTM_NEWTFILTER message.
Fixes:
|
||
Hangyu Hua
|
198da74a4e |
net: sched: fix possible refcount leak in tc_chain_tmplt_add()
[ Upstream commit 44f8baaf230c655c249467ca415b570deca8df77 ]
try_module_get will be called in tcf_proto_lookup_ops. So module_put needs
to be called to drop the refcount if ops don't implement the required
function.
Fixes:
|
||
Eric Dumazet
|
8f7cbd6d5e |
net: sched: move rtm_tca_policy declaration to include file
[ Upstream commit 886bc7d6ed3357975c5f1d3c784da96000d4bbb4 ]
rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c,
thus should be declared in an include file.
This fixes the following sparse warning:
net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static?
Fixes:
|
||
Vlad Buslov
|
cc5fe387c6 |
net/sched: cls_api: remove block_cb from driver_list before freeing
[ Upstream commit da94a7781fc3c92e7df7832bc2746f4d39bc624e ]
Error handler of tcf_block_bind() frees the whole bo->cb_list on error.
However, by that time the flow_block_cb instances are already in the driver
list because driver ndo_setup_tc() callback is called before that up the
call chain in tcf_block_offload_cmd(). This leaves dangling pointers to
freed objects in the list and causes use-after-free[0]. Fix it by also
removing flow_block_cb instances from driver_list before deallocating them.
[0]:
[ 279.868433] ==================================================================
[ 279.869964] BUG: KASAN: slab-use-after-free in flow_block_cb_setup_simple+0x631/0x7c0
[ 279.871527] Read of size 8 at addr ffff888147e2bf20 by task tc/2963
[ 279.873151] CPU: 6 PID: 2963 Comm: tc Not tainted 6.3.0-rc6+ #4
[ 279.874273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 279.876295] Call Trace:
[ 279.876882] <TASK>
[ 279.877413] dump_stack_lvl+0x33/0x50
[ 279.878198] print_report+0xc2/0x610
[ 279.878987] ? flow_block_cb_setup_simple+0x631/0x7c0
[ 279.879994] kasan_report+0xae/0xe0
[ 279.880750] ? flow_block_cb_setup_simple+0x631/0x7c0
[ 279.881744] ? mlx5e_tc_reoffload_flows_work+0x240/0x240 [mlx5_core]
[ 279.883047] flow_block_cb_setup_simple+0x631/0x7c0
[ 279.884027] tcf_block_offload_cmd.isra.0+0x189/0x2d0
[ 279.885037] ? tcf_block_setup+0x6b0/0x6b0
[ 279.885901] ? mutex_lock+0x7d/0xd0
[ 279.886669] ? __mutex_unlock_slowpath.constprop.0+0x2d0/0x2d0
[ 279.887844] ? ingress_init+0x1c0/0x1c0 [sch_ingress]
[ 279.888846] tcf_block_get_ext+0x61c/0x1200
[ 279.889711] ingress_init+0x112/0x1c0 [sch_ingress]
[ 279.890682] ? clsact_init+0x2b0/0x2b0 [sch_ingress]
[ 279.891701] qdisc_create+0x401/0xea0
[ 279.892485] ? qdisc_tree_reduce_backlog+0x470/0x470
[ 279.893473] tc_modify_qdisc+0x6f7/0x16d0
[ 279.894344] ? tc_get_qdisc+0xac0/0xac0
[ 279.895213] ? mutex_lock+0x7d/0xd0
[ 279.896005] ? __mutex_lock_slowpath+0x10/0x10
[ 279.896910] rtnetlink_rcv_msg+0x5fe/0x9d0
[ 279.897770] ? rtnl_calcit.isra.0+0x2b0/0x2b0
[ 279.898672] ? __sys_sendmsg+0xb5/0x140
[ 279.899494] ? do_syscall_64+0x3d/0x90
[ 279.900302] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.901337] ? kasan_save_stack+0x2e/0x40
[ 279.902177] ? kasan_save_stack+0x1e/0x40
[ 279.903058] ? kasan_set_track+0x21/0x30
[ 279.903913] ? kasan_save_free_info+0x2a/0x40
[ 279.904836] ? ____kasan_slab_free+0x11a/0x1b0
[ 279.905741] ? kmem_cache_free+0x179/0x400
[ 279.906599] netlink_rcv_skb+0x12c/0x360
[ 279.907450] ? rtnl_calcit.isra.0+0x2b0/0x2b0
[ 279.908360] ? netlink_ack+0x1550/0x1550
[ 279.909192] ? rhashtable_walk_peek+0x170/0x170
[ 279.910135] ? kmem_cache_alloc_node+0x1af/0x390
[ 279.911086] ? _copy_from_iter+0x3d6/0xc70
[ 279.912031] netlink_unicast+0x553/0x790
[ 279.912864] ? netlink_attachskb+0x6a0/0x6a0
[ 279.913763] ? netlink_recvmsg+0x416/0xb50
[ 279.914627] netlink_sendmsg+0x7a1/0xcb0
[ 279.915473] ? netlink_unicast+0x790/0x790
[ 279.916334] ? iovec_from_user.part.0+0x4d/0x220
[ 279.917293] ? netlink_unicast+0x790/0x790
[ 279.918159] sock_sendmsg+0xc5/0x190
[ 279.918938] ____sys_sendmsg+0x535/0x6b0
[ 279.919813] ? import_iovec+0x7/0x10
[ 279.920601] ? kernel_sendmsg+0x30/0x30
[ 279.921423] ? __copy_msghdr+0x3c0/0x3c0
[ 279.922254] ? import_iovec+0x7/0x10
[ 279.923041] ___sys_sendmsg+0xeb/0x170
[ 279.923854] ? copy_msghdr_from_user+0x110/0x110
[ 279.924797] ? ___sys_recvmsg+0xd9/0x130
[ 279.925630] ? __perf_event_task_sched_in+0x183/0x470
[ 279.926656] ? ___sys_sendmsg+0x170/0x170
[ 279.927529] ? ctx_sched_in+0x530/0x530
[ 279.928369] ? update_curr+0x283/0x4f0
[ 279.929185] ? perf_event_update_userpage+0x570/0x570
[ 279.930201] ? __fget_light+0x57/0x520
[ 279.931023] ? __switch_to+0x53d/0xe70
[ 279.931846] ? sockfd_lookup_light+0x1a/0x140
[ 279.932761] __sys_sendmsg+0xb5/0x140
[ 279.933560] ? __sys_sendmsg_sock+0x20/0x20
[ 279.934436] ? fpregs_assert_state_consistent+0x1d/0xa0
[ 279.935490] do_syscall_64+0x3d/0x90
[ 279.936300] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.937311] RIP: 0033:0x7f21c814f887
[ 279.938085] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 279.941448] RSP: 002b:00007fff11efd478 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 279.942964] RAX: ffffffffffffffda RBX: 0000000064401979 RCX: 00007f21c814f887
[ 279.944337] RDX: 0000000000000000 RSI: 00007fff11efd4e0 RDI: 0000000000000003
[ 279.945660] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 279.947003] R10: 00007f21c8008708 R11: 0000000000000246 R12: 0000000000000001
[ 279.948345] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400
[ 279.949690] </TASK>
[ 279.950706] Allocated by task 2960:
[ 279.951471] kasan_save_stack+0x1e/0x40
[ 279.952338] kasan_set_track+0x21/0x30
[ 279.953165] __kasan_kmalloc+0x77/0x90
[ 279.954006] flow_block_cb_setup_simple+0x3dd/0x7c0
[ 279.955001] tcf_block_offload_cmd.isra.0+0x189/0x2d0
[ 279.956020] tcf_block_get_ext+0x61c/0x1200
[ 279.956881] ingress_init+0x112/0x1c0 [sch_ingress]
[ 279.957873] qdisc_create+0x401/0xea0
[ 279.958656] tc_modify_qdisc+0x6f7/0x16d0
[ 279.959506] rtnetlink_rcv_msg+0x5fe/0x9d0
[ 279.960392] netlink_rcv_skb+0x12c/0x360
[ 279.961216] netlink_unicast+0x553/0x790
[ 279.962044] netlink_sendmsg+0x7a1/0xcb0
[ 279.962906] sock_sendmsg+0xc5/0x190
[ 279.963702] ____sys_sendmsg+0x535/0x6b0
[ 279.964534] ___sys_sendmsg+0xeb/0x170
[ 279.965343] __sys_sendmsg+0xb5/0x140
[ 279.966132] do_syscall_64+0x3d/0x90
[ 279.966908] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.968407] Freed by task 2960:
[ 279.969114] kasan_save_stack+0x1e/0x40
[ 279.969929] kasan_set_track+0x21/0x30
[ 279.970729] kasan_save_free_info+0x2a/0x40
[ 279.971603] ____kasan_slab_free+0x11a/0x1b0
[ 279.972483] __kmem_cache_free+0x14d/0x280
[ 279.973337] tcf_block_setup+0x29d/0x6b0
[ 279.974173] tcf_block_offload_cmd.isra.0+0x226/0x2d0
[ 279.975186] tcf_block_get_ext+0x61c/0x1200
[ 279.976080] ingress_init+0x112/0x1c0 [sch_ingress]
[ 279.977065] qdisc_create+0x401/0xea0
[ 279.977857] tc_modify_qdisc+0x6f7/0x16d0
[ 279.978695] rtnetlink_rcv_msg+0x5fe/0x9d0
[ 279.979562] netlink_rcv_skb+0x12c/0x360
[ 279.980388] netlink_unicast+0x553/0x790
[ 279.981214] netlink_sendmsg+0x7a1/0xcb0
[ 279.982043] sock_sendmsg+0xc5/0x190
[ 279.982827] ____sys_sendmsg+0x535/0x6b0
[ 279.983703] ___sys_sendmsg+0xeb/0x170
[ 279.984510] __sys_sendmsg+0xb5/0x140
[ 279.985298] do_syscall_64+0x3d/0x90
[ 279.986076] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.987532] The buggy address belongs to the object at ffff888147e2bf00
which belongs to the cache kmalloc-192 of size 192
[ 279.989747] The buggy address is located 32 bytes inside of
freed 192-byte region [ffff888147e2bf00, ffff888147e2bfc0)
[ 279.992367] The buggy address belongs to the physical page:
[ 279.993430] page:00000000550f405c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147e2a
[ 279.995182] head:00000000550f405c order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 279.996713] anon flags: 0x200000000010200(slab|head|node=0|zone=2)
[ 279.997878] raw: 0200000000010200 ffff888100042a00 0000000000000000 dead000000000001
[ 279.999384] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[ 280.000894] page dumped because: kasan: bad access detected
[ 280.002386] Memory state around the buggy address:
[ 280.003338] ffff888147e2be00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 280.004781] ffff888147e2be80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 280.006224] >ffff888147e2bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 280.007700] ^
[ 280.008592] ffff888147e2bf80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 280.010035] ffff888147e2c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 280.011564] ==================================================================
Fixes:
|
||
Eric Dumazet
|
22d95b5449 |
net_sched: add __rcu annotation to netdev->qdisc
commit 5891cd5ec46c2c2eb6427cb54d214b149635dd0e upstream.
syzbot found a data-race [1] which lead me to add __rcu
annotations to netdev->qdisc, and proper accessors
to get LOCKDEP support.
[1]
BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu
write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
__dev_open+0x2e9/0x3a0 net/core/dev.c:1416
__dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
__rtnl_newlink net/core/rtnetlink.c:3489 [inline]
rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmsg+0x195/0x230 net/socket.c:2496
__do_sys_sendmsg net/socket.c:2505 [inline]
__se_sys_sendmsg net/socket.c:2503 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
__tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmsg+0x195/0x230 net/socket.c:2496
__do_sys_sendmsg net/socket.c:2505 [inline]
__se_sys_sendmsg net/socket.c:2503 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes:
|
||
Hangyu Hua
|
903f7d322c |
net: sched: fix possible refcount leak in tc_new_tfilter()
[ Upstream commit c2e1cfefcac35e0eea229e148c8284088ce437b5 ]
tfilter_put need to be called to put the refount got by tp->ops->get to
avoid possible refcount leak when chain->tmplt_ops != NULL and
chain->tmplt_ops != tp->ops.
Fixes:
|
||
Marcelo Ricardo Leitner
|
94541468c1 |
net/sched: fix initialization order when updating chain 0 head
[ Upstream commit e65812fd22eba32f11abe28cb377cbd64cfb1ba0 ]
Currently, when inserting a new filter that needs to sit at the head
of chain 0, it will first update the heads pointer on all devices using
the (shared) block, and only then complete the initialization of the new
element so that it has a "next" element.
This can lead to a situation that the chain 0 head is propagated to
another CPU before the "next" initialization is done. When this race
condition is triggered, packets being matched on that CPU will simply
miss all other filters, and will flow through the stack as if there were
no other filters installed. If the system is using OVS + TC, such
packets will get handled by vswitchd via upcall, which results in much
higher latency and reordering. For other applications it may result in
packet drops.
This is reproducible with a tc only setup, but it varies from system to
system. It could be reproduced with a shared block amongst 10 veth
tunnels, and an ingress filter mirroring packets to another veth.
That's because using the last added veth tunnel to the shared block to
do the actual traffic, it makes the race window bigger and easier to
trigger.
The fix is rather simple, to just initialize the next pointer of the new
filter instance (tp) before propagating the head change.
The fixes tag is pointing to the original code though this issue should
only be observed when using it unlocked.
Fixes:
|
||
Eric Dumazet
|
b1d17e920d |
net: sched: fix use-after-free in tc_new_tfilter()
commit 04c2a47ffb13c29778e2a14e414ad4cb5a5db4b5 upstream.
Whenever tc_new_tfilter() jumps back to replay: label,
we need to make sure @q and @chain local variables are cleared again,
or risk use-after-free as in [1]
For consistency, apply the same fix in tc_ctl_chain()
BUG: KASAN: use-after-free in mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581
Write of size 8 at addr ffff8880985c4b08 by task syz-executor.4/1945
CPU: 0 PID: 1945 Comm: syz-executor.4 Not tainted 5.17.0-rc1-syzkaller-00495-gff58831fa02d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
__kasan_report mm/kasan/report.c:442 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581
tcf_chain_head_change_item net/sched/cls_api.c:372 [inline]
tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:386
tcf_chain_tp_insert net/sched/cls_api.c:1657 [inline]
tcf_chain_tp_insert_unique net/sched/cls_api.c:1707 [inline]
tc_new_tfilter+0x1e67/0x2350 net/sched/cls_api.c:2086
rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x331/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmmsg+0x195/0x470 net/socket.c:2553
__do_sys_sendmmsg net/socket.c:2582 [inline]
__se_sys_sendmmsg net/socket.c:2579 [inline]
__x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2647172059
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f2645aa5168 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f2647285100 RCX: 00007f2647172059
RDX: 040000000000009f RSI: 00000000200002c0 RDI: 0000000000000006
RBP: 00007f26471cc08d R08: 0000000000000000 R09: 0000000000000000
R10: 9e00000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fffb3f7f02f R14: 00007f2645aa5300 R15: 0000000000022000
</TASK>
Allocated by task 1944:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:436 [inline]
____kasan_kmalloc mm/kasan/common.c:515 [inline]
____kasan_kmalloc mm/kasan/common.c:474 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:524
kmalloc_node include/linux/slab.h:604 [inline]
kzalloc_node include/linux/slab.h:726 [inline]
qdisc_alloc+0xac/0xa10 net/sched/sch_generic.c:941
qdisc_create.constprop.0+0xce/0x10f0 net/sched/sch_api.c:1211
tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5592
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x331/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmmsg+0x195/0x470 net/socket.c:2553
__do_sys_sendmmsg net/socket.c:2582 [inline]
__se_sys_sendmmsg net/socket.c:2579 [inline]
__x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 3609:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:366 [inline]
____kasan_slab_free+0x130/0x160 mm/kasan/common.c:328
kasan_slab_free include/linux/kasan.h:236 [inline]
slab_free_hook mm/slub.c:1728 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1754
slab_free mm/slub.c:3509 [inline]
kfree+0xcb/0x280 mm/slub.c:4562
rcu_do_batch kernel/rcu/tree.c:2527 [inline]
rcu_core+0x7b8/0x1540 kernel/rcu/tree.c:2778
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
Last potentially related work creation:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
__kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
__call_rcu kernel/rcu/tree.c:3026 [inline]
call_rcu+0xb1/0x740 kernel/rcu/tree.c:3106
qdisc_put_unlocked+0x6f/0x90 net/sched/sch_generic.c:1109
tcf_block_release+0x86/0x90 net/sched/cls_api.c:1238
tc_new_tfilter+0xc0d/0x2350 net/sched/cls_api.c:2148
rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x331/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmmsg+0x195/0x470 net/socket.c:2553
__do_sys_sendmmsg net/socket.c:2582 [inline]
__se_sys_sendmmsg net/socket.c:2579 [inline]
__x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff8880985c4800
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 776 bytes inside of
1024-byte region [ffff8880985c4800, ffff8880985c4c00)
The buggy address belongs to the page:
page:ffffea0002617000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x985c0
head:ffffea0002617000 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888010c41dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 1941, ts 1038999441284, free_ts 1033444432829
prep_new_page mm/page_alloc.c:2434 [inline]
get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389
alloc_pages+0x1aa/0x310 mm/mempolicy.c:2271
alloc_slab_page mm/slub.c:1799 [inline]
allocate_slab mm/slub.c:1944 [inline]
new_slab+0x28a/0x3b0 mm/slub.c:2004
___slab_alloc+0x87c/0xe90 mm/slub.c:3018
__slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3105
slab_alloc_node mm/slub.c:3196 [inline]
slab_alloc mm/slub.c:3238 [inline]
__kmalloc+0x2fb/0x340 mm/slub.c:4420
kmalloc include/linux/slab.h:586 [inline]
kzalloc include/linux/slab.h:715 [inline]
__register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1335
neigh_sysctl_register+0x2c8/0x5e0 net/core/neighbour.c:3787
devinet_sysctl_register+0xb1/0x230 net/ipv4/devinet.c:2618
inetdev_init+0x286/0x580 net/ipv4/devinet.c:278
inetdev_event+0xa8a/0x15d0 net/ipv4/devinet.c:1532
notifier_call_chain+0xb5/0x200 kernel/notifier.c:84
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1919
call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
call_netdevice_notifiers net/core/dev.c:1945 [inline]
register_netdevice+0x1073/0x1500 net/core/dev.c:9698
veth_newlink+0x59c/0xa90 drivers/net/veth.c:1722
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1352 [inline]
free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404
free_unref_page_prepare mm/page_alloc.c:3325 [inline]
free_unref_page+0x19/0x690 mm/page_alloc.c:3404
release_pages+0x748/0x1220 mm/swap.c:956
tlb_batch_pages_flush mm/mmu_gather.c:50 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:243 [inline]
tlb_flush_mmu+0xe9/0x6b0 mm/mmu_gather.c:250
zap_pte_range mm/memory.c:1441 [inline]
zap_pmd_range mm/memory.c:1490 [inline]
zap_pud_range mm/memory.c:1519 [inline]
zap_p4d_range mm/memory.c:1540 [inline]
unmap_page_range+0x1d1d/0x2a30 mm/memory.c:1561
unmap_single_vma+0x198/0x310 mm/memory.c:1606
unmap_vmas+0x16b/0x2f0 mm/memory.c:1638
exit_mmap+0x201/0x670 mm/mmap.c:3178
__mmput+0x122/0x4b0 kernel/fork.c:1114
mmput+0x56/0x60 kernel/fork.c:1135
exit_mm kernel/exit.c:507 [inline]
do_exit+0xa3c/0x2a30 kernel/exit.c:793
do_group_exit+0xd2/0x2f0 kernel/exit.c:935
__do_sys_exit_group kernel/exit.c:946 [inline]
__se_sys_exit_group kernel/exit.c:944 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Memory state around the buggy address:
ffff8880985c4a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880985c4a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880985c4b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880985c4b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880985c4c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Fixes:
|
||
Baowen Zheng
|
9cb405ee53 |
flow_offload: return EOPNOTSUPP for the unsupported mpls action type
[ Upstream commit 166b6a46b78bf8b9559a6620c3032f9fe492e082 ]
We need to return EOPNOTSUPP for the unsupported mpls action type when
setup the flow action.
In the original implement, we will return 0 for the unsupported mpls
action type, actually we do not setup it and the following actions
to the flow action entry.
Fixes:
|
||
Vlad Buslov
|
066a637d1c |
net: sched: lock action when translating it to flow_action infra
[ Upstream commit 7a47281439ba00b11fc098f36695522184ce5a82 ] In order to remove dependency on rtnl lock, take action's tcfa_lock when constructing its representation as flow_action_entry structure. Refactor tcf_sample_get_group() to assume that caller holds tcf_lock and don't take it manually. This callback is only called from flow_action infra representation translator which now calls it with tcf_lock held, so this refactoring is necessary to prevent deadlock. Allocate memory with GFP_ATOMIC flag for ip_tunnel_info copy because tcf_tunnel_info_copy() is only called from flow_action representation infra code with tcf_lock spinlock taken. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Yajun Deng
|
2e6ab87f8e |
net: sched: cls_api: Fix the the wrong parameter
[ Upstream commit 9d85a6f44bd5585761947f40f7821c9cd78a1bbe ]
The 4th parameter in tc_chain_notify() should be flags rather than seq.
Let's change it back correctly.
Fixes:
|
||
Vlad Buslov
|
976ee31ea3 |
net: sched: fix police ext initialization
commit 396d7f23adf9e8c436dd81a69488b5b6a865acf8 upstream. When police action is created by cls API tcf_exts_validate() first conditional that calls tcf_action_init_1() directly, the action idr is not updated according to latest changes in action API that require caller to commit newly created action to idr with tcf_idr_insert_many(). This results such action not being accessible through act API and causes crash reported by syzbot: ================================================================== BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline] BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204 CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 __kasan_report mm/kasan/report.c:400 [inline] kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] __tcf_idr_release net/sched/act_api.c:178 [inline] tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 tc_action_net_exit include/net/act_api.h:151 [inline] police_exit_net+0x168/0x360 net/sched/act_police.c:390 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 ================================================================== Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G B 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 panic+0x306/0x73d kernel/panic.c:231 end_report+0x58/0x5e mm/kasan/report.c:100 __kasan_report mm/kasan/report.c:403 [inline] kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] __tcf_idr_release net/sched/act_api.c:178 [inline] tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 tc_action_net_exit include/net/act_api.h:151 [inline] police_exit_net+0x168/0x360 net/sched/act_police.c:390 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Kernel Offset: disabled Fix the issue by calling tcf_idr_insert_many() after successful action initialization. Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Toke Høiland-Jørgensen
|
9b7fd81cf9 |
sched: consistently handle layer3 header accesses in the presence of VLANs
[ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]
There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.
However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).
To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.
To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.
v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
bpf_skb_ecn_set_ce()
v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
calling the helper twice
Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
Fixes:
|
||
Cong Wang
|
7d8da6d7d9 |
net_sched: fix tcm_parent in tc filter dump
[ Upstream commit a7df4870d79b00742da6cc93ca2f336a71db77f7 ] When we tell kernel to dump filters from root (ffff:ffff), those filters on ingress (ffff:0000) are matched, but their true parents must be dumped as they are. However, kernel dumps just whatever we tell it, that is either ffff:ffff or ffff:0000: $ nl-cls-list --dev=dummy0 --parent=root cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all $ nl-cls-list --dev=dummy0 --parent=ffff: cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all This is confusing and misleading, more importantly this is a regression since 4.15, so the old behavior must be restored. And, when tc filters are installed on a tc class, the parent should be the classid, rather than the qdisc handle. Commit |
||
Eric Dumazet
|
9b60a32108 |
net_sched: use validated TCA_KIND attribute in tc_new_tfilter()
[ Upstream commit 36d79af7fb59d6d9106feb9c1855eb93d6d53fe6 ]
sysbot found another issue in tc_new_tfilter().
We probably should use @name which contains the sanitized
version of TCA_KIND.
BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:608 [inline]
BUG: KMSAN: uninit-value in string+0x522/0x690 lib/vsprintf.c:689
CPU: 1 PID: 10753 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
string_nocheck lib/vsprintf.c:608 [inline]
string+0x522/0x690 lib/vsprintf.c:689
vsnprintf+0x207d/0x31b0 lib/vsprintf.c:2574
__request_module+0x2ad/0x11c0 kernel/kmod.c:143
tcf_proto_lookup_ops+0x241/0x720 net/sched/cls_api.c:139
tcf_proto_create net/sched/cls_api.c:262 [inline]
tc_new_tfilter+0x2a4e/0x5010 net/sched/cls_api.c:2058
rtnetlink_rcv_msg+0xcb7/0x1570 net/core/rtnetlink.c:5415
netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg net/socket.c:659 [inline]
____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
___sys_sendmsg net/socket.c:2384 [inline]
__sys_sendmsg+0x451/0x5f0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45b349
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f88b3948c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f88b39496d4 RCX: 000000000045b349
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000099f R14: 00000000004cb163 R15: 000000000075bfd4
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
slab_alloc_node mm/slub.c:2774 [inline]
__kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382
__kmalloc_reserve net/core/skbuff.c:141 [inline]
__alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209
alloc_skb include/linux/skbuff.h:1049 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg net/socket.c:659 [inline]
____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
___sys_sendmsg net/socket.c:2384 [inline]
__sys_sendmsg+0x451/0x5f0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes:
|
||
Davide Caratti
|
c40e059513 |
net/sched: add delete_empty() to filters and use it in cls_flower
[ Upstream commit a5b72a083da197b493c7ed1e5730d62d3199f7d6 ]
Revert "net/sched: cls_u32: fix refcount leak in the error path of
u32_change()", and fix the u32 refcount leak in a more generic way that
preserves the semantic of rule dumping.
On tc filters that don't support lockless insertion/removal, there is no
need to guard against concurrent insertion when a removal is in progress.
Therefore, for most of them we can avoid a full walk() when deleting, and
just decrease the refcount, like it was done on older Linux kernels.
This fixes situations where walk() was wrongly detecting a non-empty
filter, like it happened with cls_u32 in the error path of change(), thus
leading to failures in the following tdc selftests:
6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
On cls_flower, and on (future) lockless filters, this check is necessary:
move all the check_empty() logic in a callback so that each filter
can have its own implementation. For cls_flower, it's sufficient to check
if no IDRs have been allocated.
This reverts commit 275c44aa194b7159d1191817b20e076f55f0e620.
Changes since v1:
- document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
is used, thanks to Vlad Buslov
- implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
- squash revert and new fix in a single patch, to be nice with bisect
tests that run tdc on u32 filter, thanks to Dave Miller
Fixes: 275c44aa194b ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
Fixes:
|
||
John Hurley
|
554d2e14c5 |
net: sched: allow indirect blocks to bind to clsact in TC
[ Upstream commit 25a443f74bcff2c4d506a39eae62fc15ad7c618a ]
When a device is bound to a clsact qdisc, bind events are triggered to
registered drivers for both ingress and egress. However, if a driver
registers to such a device using the indirect block routines then it is
assumed that it is only interested in ingress offload and so only replays
ingress bind/unbind messages.
The NFP driver supports the offload of some egress filters when
registering to a block with qdisc of type clsact. However, on unregister,
if the block is still active, it will not receive an unbind egress
notification which can prevent proper cleanup of other registered
callbacks.
Modify the indirect block callback command in TC to send messages of
ingress and/or egress bind depending on the qdisc in use. NFP currently
supports egress offload for TC flower offload so the changes are only
added to TC.
Fixes:
|
||
John Hurley
|
1b511a9d2c |
net: core: rename indirect block ingress cb function
[ Upstream commit dbad3408896c3c5722ec9cda065468b3df16c5bf ]
With indirect blocks, a driver can register for callbacks from a device
that is does not 'own', for example, a tunnel device. When registering to
or unregistering from a new device, a callback is triggered to generate
a bind/unbind event. This, in turn, allows the driver to receive any
existing rules or to properly clean up installed rules.
When first added, it was assumed that all indirect block registrations
would be for ingress offloads. However, the NFP driver can, in some
instances, support clsact qdisc binds for egress offload.
Change the name of the indirect block callback command in flow_offload to
remove the 'ingress' identifier from it. While this does not change
functionality, a follow up patch will implement a more more generic
callback than just those currently just supporting ingress offload.
Fixes:
|
||
Eric Dumazet
|
c774abc607 |
net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
[ Upstream commit 2dd5616ecdcebdf5a8d007af64e040d4e9214efe ]
Use the new tcf_proto_check_kind() helper to make sure user
provided value is well formed.
BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:606 [inline]
BUG: KMSAN: uninit-value in string+0x4be/0x600 lib/vsprintf.c:668
CPU: 0 PID: 12358 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
__msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245
string_nocheck lib/vsprintf.c:606 [inline]
string+0x4be/0x600 lib/vsprintf.c:668
vsnprintf+0x218f/0x3210 lib/vsprintf.c:2510
__request_module+0x2b1/0x11c0 kernel/kmod.c:143
tcf_proto_lookup_ops+0x171/0x700 net/sched/cls_api.c:139
tc_chain_tmplt_add net/sched/cls_api.c:2730 [inline]
tc_ctl_chain+0x1904/0x38a0 net/sched/cls_api.c:2850
rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5224
netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5242
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg net/socket.c:657 [inline]
___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
__sys_sendmsg net/socket.c:2356 [inline]
__do_sys_sendmsg net/socket.c:2365 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2363
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45a649
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f0790795c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045a649
RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000006
RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f07907966d4
R13: 00000000004c8db5 R14: 00000000004df630 R15: 00000000ffffffff
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86
slab_alloc_node mm/slub.c:2773 [inline]
__kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381
__kmalloc_reserve net/core/skbuff.c:141 [inline]
__alloc_skb+0x306/0xa10 net/core/skbuff.c:209
alloc_skb include/linux/skbuff.h:1049 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
netlink_sendmsg+0x783/0x1330 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg net/socket.c:657 [inline]
___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
__sys_sendmsg net/socket.c:2356 [inline]
__do_sys_sendmsg net/socket.c:2365 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2363
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes:
|
||
John Hurley
|
59eb87cb52 |
net: sched: prevent duplicate flower rules from tcf_proto destroy race
When a new filter is added to cls_api, the function
tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to
determine if the tcf_proto is duplicated in the chain's hashtable. It then
creates a new entry or continues with an existing one. In cls_flower, this
allows the function fl_ht_insert_unque to determine if a filter is a
duplicate and reject appropriately, meaning that the duplicate will not be
passed to drivers via the offload hooks. However, when a tcf_proto is
destroyed it is removed from its chain before a hardware remove hook is
hit. This can lead to a race whereby the driver has not received the
remove message but duplicate flows can be accepted. This, in turn, can
lead to the offload driver receiving incorrect duplicate flows and out of
order add/delete messages.
Prevent duplicates by utilising an approach suggested by Vlad Buslov. A
hash table per block stores each unique chain/protocol/prio being
destroyed. This entry is only removed when the full destroy (and hardware
offload) has completed. If a new flow is being added with the same
identiers as a tc_proto being detroyed, then the add request is replayed
until the destroy is complete.
Fixes:
|
||
Cong Wang
|
6f96c3c690 |
net_sched: fix backward compatibility for TCA_KIND
Marcelo noticed a backward compatibility issue of TCA_KIND
after we move from NLA_STRING to NLA_NUL_STRING, so it is probably
too late to change it.
Instead, to make everyone happy, we can just insert a NUL to
terminate the string with nla_strlcpy() like we do for TC actions.
Fixes:
|
||
Eric Dumazet
|
3d66b89c30 |
net: sched: fix possible crash in tcf_action_destroy()
If the allocation done in tcf_exts_init() failed,
we end up with a NULL pointer in exts->actions.
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8198 Comm: syz-executor.3 Not tainted 5.3.0-rc8+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tcf_action_destroy+0x71/0x160 net/sched/act_api.c:705
Code: c3 08 44 89 ee e8 4f cb bb fb 41 83 fd 20 0f 84 c9 00 00 00 e8 c0 c9 bb fb 48 89 d8 48 b9 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 08 00 0f 85 c0 00 00 00 4c 8b 33 4d 85 f6 0f 84 9d 00 00 00
RSP: 0018:ffff888096e16ff0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: 0000000000040000 RSI: ffffffff85b6ab30 RDI: 0000000000000000
RBP: ffff888096e17020 R08: ffff8880993f6140 R09: fffffbfff11cae67
R10: fffffbfff11cae66 R11: ffffffff88e57333 R12: 0000000000000000
R13: 0000000000000000 R14: ffff888096e177a0 R15: 0000000000000001
FS: 00007f62bc84a700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000758040 CR3: 0000000088b64000 CR4: 00000000001426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_exts_destroy+0x38/0xb0 net/sched/cls_api.c:3030
tcindex_set_parms+0xf7f/0x1e50 net/sched/cls_tcindex.c:488
tcindex_change+0x230/0x318 net/sched/cls_tcindex.c:519
tc_new_tfilter+0xa4b/0x1c70 net/sched/cls_api.c:2152
rtnetlink_rcv_msg+0x838/0xb00 net/core/rtnetlink.c:5214
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:657
___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
__sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
Fixes:
|
||
Vlad Buslov
|
470d5060e6 |
net: sched: use get_dev() action API in flow_action infra
When filling in hardware intermediate representation tc_setup_flow_action()
directly obtains, checks and takes reference to dev used by mirred action,
instead of using act->ops->get_dev() API created specifically for this
purpose. In order to remove code duplication, refactor flow_action infra to
use action API when obtaining mirred action target dev. Extend get_dev()
with additional argument that is used to provide dev destructor to the
user.
Fixes:
|
||
Vlad Buslov
|
4a5da47d5c |
net: sched: take reference to psample group in flow_action infra
With recent patch set that removed rtnl lock dependency from cls hardware
offload API rtnl lock is only taken when reading action data and can be
released after action-specific data is parsed into intermediate
representation. However, sample action psample group is passed by pointer
without obtaining reference to it first, which makes it possible to
concurrently overwrite the action and deallocate object pointed by
psample_group pointer after rtnl lock is released but before driver
finished using the pointer.
To prevent such race condition, obtain reference to psample group while it
is used by flow_action infra. Extend psample API with function
psample_group_take() that increments psample group reference counter.
Extend struct tc_action_ops with new get_psample_group() API. Implement the
API for action sample using psample_group_take() and already existing
psample_group_put() as a destructor. Use it in tc_setup_flow_action() to
take reference to psample group pointed to by entry->sample.psample_group
and release it in tc_cleanup_flow_action().
Disable bh when taking psample_groups_lock. The lock is now taken while
holding action tcf_lock that is used by data path and requires bh to be
disabled, so doing the same for psample_groups_lock is necessary to
preserve SOFTIRQ-irq-safety.
Fixes:
|
||
Vlad Buslov
|
1158958a21 |
net: sched: extend flow_action_entry with destructor
Generalize flow_action_entry cleanup by extending the structure with pointer to destructor function. Set the destructor in tc_setup_flow_action(). Refactor tc_cleanup_flow_action() to call entry->destructor() instead of using switch that dispatches by entry->id and manually executes cleanup. This refactoring is necessary for following patches in this series that require destructor to use tc_action->ops callbacks that can't be easily obtained in tc_cleanup_flow_action(). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Paul Blakey
|
95a7233c45 |
net: openvswitch: Set OvS recirc_id from tc chain index
Offloaded OvS datapath rules are translated one to one to tc rules, for example the following simplified OvS rule: recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2) Will be translated to the following tc rule: $ tc filter add dev dev1 ingress \ prio 1 chain 0 proto ip \ flower tcp ct_state -trk \ action ct pipe \ action goto chain 2 Received packets will first travel though tc, and if they aren't stolen by it, like in the above rule, they will continue to OvS datapath. Since we already did some actions (action ct in this case) which might modify the packets, and updated action stats, we would like to continue the proccessing with the correct recirc_id in OvS (here recirc_id(2)) where we left off. To support this, introduce a new skb extension for tc, which will be used for translating tc chain to ovs recirc_id to handle these miss cases. Last tc chain index will be set by tc goto chain action and read by OvS datapath. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
1444c175a3 |
net: sched: copy tunnel info when setting flow_action entry->tunnel
In order to remove dependency on rtnl lock, modify tc_setup_flow_action() to copy tunnel info, instead of just saving pointer to tunnel_key action tunnel info. This is necessary to prevent concurrent action overwrite from releasing tunnel info while it is being used by rtnl-unlocked driver. Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info with all its options to dynamically allocated memory block. Modify tc_cleanup_flow_action() to free dynamically allocated tunnel info. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
5a6ff4b13d |
net: sched: take reference to action dev before calling offloads
In order to remove dependency on rtnl lock when calling hardware offload API, take reference to action mirred dev when initializing flow_action structure in tc_setup_flow_action(). Implement function tc_cleanup_flow_action(), use it to release the device after hardware offload API is done using it. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
9838b20a7f |
net: sched: take rtnl lock in tc_setup_flow_action()
In order to allow using new flow_action infrastructure from unlocked classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held' argument. Take rtnl lock before accessing tc_action data. This is necessary to protect from concurrent action replace. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
11bd634da2 |
net: sched: conditionally obtain rtnl lock in cls hw offloads API
In order to remove dependency on rtnl lock from offloads code of classifiers, take rtnl lock conditionally before executing driver callbacks. Only obtain rtnl lock if block is bound to devices that require it. Block bind/unbind code is rtnl-locked and obtains block->cb_lock while holding rtnl lock. Obtain locks in same order in tc_setup_cb_*() functions to prevent deadlock. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
c9f14470d0 |
net: sched: add API for registering unlocked offload block callbacks
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow registering and unregistering block hardware offload callbacks that do not require caller to hold rtnl lock. Extend tcf_block with additional lockeddevcnt counter that is incremented for each non-unlocked driver callback attached to device. This counter is necessary to conditionally obtain rtnl lock before calling hardware callbacks in following patches. Register mlx5 tc block offload callbacks as "unlocked". Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
a449a3e77a |
net: sched: notify classifier on successful offload add/delete
To remove dependency on rtnl lock, extend classifier ops with new ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while holding cb_lock every time filter if successfully added to or deleted from hardware. Implement the new API in flower classifier. Use it to manage hw_filters list under cb_lock protection, instead of relying on rtnl lock to synchronize with concurrent fl_reoffload() call. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
4011921137 |
net: sched: refactor block offloads counter usage
Without rtnl lock protection filters can no longer safely manage block offloads counter themselves. Refactor cls API to protect block offloadcnt with tcf_block->cb_lock that is already used to protect driver callback list and nooffloaddevcnt counter. The counter can be modified by concurrent tasks by new functions that execute block callbacks (which is safe with previous patch that changed its type to atomic_t), however, block bind/unbind code that checks the counter value takes cb_lock in write mode to exclude any concurrent modifications. This approach prevents race conditions between bind/unbind and callback execution code but allows for concurrency for tc rule update path. Move block offload counter, filter in hardware counter and filter flags management from classifiers into cls hardware offloads API. Make functions tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API private. Implement following new cls API to be used instead: tc_setup_cb_add() - non-destructive filter add. If filter that wasn't already in hardware is successfully offloaded, increment block offloads counter, set filter in hardware counter and flag. On failure, previously offloaded filter is considered to be intact and offloads counter is not decremented. tc_setup_cb_replace() - destructive filter replace. Release existing filter block offload counter and reset its in hardware counter and flag. Set new filter in hardware counter and flag. On failure, previously offloaded filter is considered to be destroyed and offload counter is decremented. tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block offloads counter. tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and call tc_cls_offload_cnt_update() if cb() didn't return an error. Refactor all offload-capable classifiers to atomically offload filters to hardware, change block offload counter, and set filter in hardware counter and flag by means of the new cls API functions. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
97394bef56 |
net: sched: change tcf block offload counter type to atomic_t
As a preparation for running proto ops functions without rtnl lock, change offload counter type to atomic. This is necessary to allow updating the counter by multiple concurrent users when offloading filters to hardware from unlocked classifiers. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
4f8116c850 |
net: sched: protect block offload-related fields with rw_semaphore
In order to remove dependency on rtnl lock, extend tcf_block with 'cb_lock' rwsem and use it to protect flow_block->cb_list and related counters from concurrent modification. The lock is taken in read mode for read-only traversal of cb_list in tc_setup_cb_call() and write mode in all other cases. This approach ensures that: - cb_list is not changed concurrently while filters is being offloaded on block. - block->nooffloaddevcnt is checked while holding the lock in read mode, but is only changed by bind/unbind code when holding the cb_lock in write mode to prevent concurrent modification. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
wenxu
|
1150ab0f1b |
flow_offload: support get multi-subsystem block
It provide a callback list to find the blocks of tc and nft subsystems Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
wenxu
|
4e481908c5 |
flow_offload: move tc indirect block to flow offload
move tc indirect block to flow_offload and rename it to flow indirect block.The nf_tables can use the indr block architecture. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
wenxu
|
e4da910211 |
cls_api: add flow_indr_block_call function
This patch make indr_block_call don't access struct tc_indr_block_cb and tc_indr_block_dev directly Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
wenxu
|
f843698857 |
cls_api: remove the tcf_block cache
Remove the tcf_block in the tc_indr_block_dev for muti-subsystem support. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
wenxu
|
242453c227 |
cls_api: modify the tc_indr_block_ing_cmd parameters.
This patch make tc_indr_block_ing_cmd can't access struct tc_indr_block_dev and tc_indr_block_cb. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
John Hurley
|
48e584ac58 |
net: sched: add ingress mirred action to hardware IR
TC mirred actions (redirect and mirred) can send to egress or ingress of a device. Currently only egress is used for hw offload rules. Modify the intermediate representation for hw offload to include mirred actions that go to ingress. This gives drivers access to such rules and can decide whether or not to offload them. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
John Hurley
|
fb1b775a24 |
net: sched: add skbedit of ptype action to hardware IR
TC rules can impliment skbedit actions. Currently actions that modify the skb mark are passed to offloading drivers via the hardware intermediate representation in the flow_offload API. Extend this to include skbedit actions that modify the packet type of the skb. Such actions may be used to set the ptype to HOST when redirecting a packet to ingress. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
John Hurley
|
6749d59016 |
net: sched: include mpls actions in hardware intermediate representation
A recent addition to TC actions is the ability to manipulate the MPLS headers on packets. In preparation to offload such actions to hardware, update the IR code to accept and prepare the new actions. Note that no driver currently impliments the MPLS dec_ttl action so this is not included. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vlad Buslov
|
503d81d428 |
net: sched: verify that q!=NULL before setting q->flags
In function int tc_new_tfilter() q pointer can be NULL when adding filter
on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after
filter creation, following NULL pointer dereference happens in case parent
block is shared:
[ 212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 212.925445] #PF: supervisor write access in kernel mode
[ 212.925709] #PF: error_code(0x0002) - not-present page
[ 212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0
[ 212.926302] Oops: 0002 [#1] SMP KASAN PTI
[ 212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G B 5.2.0+ #512
[ 212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40
[ 212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f
b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8
[ 212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296
[ 212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
[ 212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297
[ 212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d
[ 212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040
[ 212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680
[ 212.929999] FS: 00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000
[ 212.930235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0
[ 212.930588] Call Trace:
[ 212.930682] ? tc_del_tfilter+0xa40/0xa40
[ 212.930811] ? __lock_acquire+0x5b5/0x2460
[ 212.930948] ? find_held_lock+0x85/0xa0
[ 212.931081] ? tc_del_tfilter+0xa40/0xa40
[ 212.931201] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 212.931332] ? rtnl_dellink+0x490/0x490
[ 212.931454] ? lockdep_hardirqs_on+0x260/0x260
[ 212.931589] ? netlink_deliver_tap+0xab/0x5a0
[ 212.931717] ? match_held_lock+0x1b/0x240
[ 212.931844] netlink_rcv_skb+0xd0/0x200
[ 212.931958] ? rtnl_dellink+0x490/0x490
[ 212.932079] ? netlink_ack+0x440/0x440
[ 212.932205] ? netlink_deliver_tap+0x161/0x5a0
[ 212.932335] ? lock_downgrade+0x360/0x360
[ 212.932457] ? lock_acquire+0xe5/0x210
[ 212.932579] netlink_unicast+0x296/0x350
[ 212.932705] ? netlink_attachskb+0x390/0x390
[ 212.932834] ? _copy_from_iter_full+0xe0/0x3a0
[ 212.932976] netlink_sendmsg+0x394/0x600
[ 212.937998] ? netlink_unicast+0x350/0x350
[ 212.943033] ? move_addr_to_kernel.part.0+0x90/0x90
[ 212.948115] ? netlink_unicast+0x350/0x350
[ 212.953185] sock_sendmsg+0x96/0xa0
[ 212.958099] ___sys_sendmsg+0x482/0x520
[ 212.962881] ? match_held_lock+0x1b/0x240
[ 212.967618] ? copy_msghdr_from_user+0x250/0x250
[ 212.972337] ? lock_downgrade+0x360/0x360
[ 212.976973] ? rwlock_bug.part.0+0x60/0x60
[ 212.981548] ? __mod_node_page_state+0x1f/0xa0
[ 212.986060] ? match_held_lock+0x1b/0x240
[ 212.990567] ? find_held_lock+0x85/0xa0
[ 212.994989] ? do_user_addr_fault+0x349/0x5b0
[ 212.999387] ? lock_downgrade+0x360/0x360
[ 213.003713] ? find_held_lock+0x85/0xa0
[ 213.007972] ? __fget_light+0xa1/0xf0
[ 213.012143] ? sockfd_lookup_light+0x91/0xb0
[ 213.016165] __sys_sendmsg+0xba/0x130
[ 213.020040] ? __sys_sendmsg_sock+0xb0/0xb0
[ 213.023870] ? handle_mm_fault+0x337/0x470
[ 213.027592] ? page_fault+0x8/0x30
[ 213.031316] ? lockdep_hardirqs_off+0xbe/0x100
[ 213.034999] ? mark_held_locks+0x24/0x90
[ 213.038671] ? do_syscall_64+0x1e/0xe0
[ 213.042297] do_syscall_64+0x74/0xe0
[ 213.045828] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 213.049354] RIP: 0033:0x7fe7c527c7b8
[ 213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f
0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
[ 213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8
[ 213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003
[ 213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0
[ 213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080
[ 213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000
[ 213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common
[<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc
lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi
_ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe
r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
[ 213.112326] CR2: 0000000000000010
[ 213.117429] ---[ end trace adb58eb0a4ee6283 ]---
Verify that q pointer is not NULL before setting the 'flags' field.
Fixes:
|
||
Pablo Neira Ayuso
|
14bfb13f0e |
net: flow_offload: add flow_block structure and use it
This object stores the flow block callbacks that are attached to this
block. Update flow_block_cb_lookup() to take this new object.
This patch restores the block sharing feature.
Fixes:
|
||
Pablo Neira Ayuso
|
a732331151 |
net: flow_offload: rename tc_setup_cb_t to flow_setup_cb_t
Rename this type definition and adapt users. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Cong Wang
|
3f05e6886a |
net_sched: unset TCQ_F_CAN_BYPASS when adding filters
For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
notably fq_codel, it makes no sense to let packets bypass the TC
filters we setup in any scenario, otherwise our packets steering
policy could not be enforced.
This can be reproduced easily with the following script:
ip li add dev dummy0 type dummy
ifconfig dummy0 up
tc qd add dev dummy0 root fq_codel
tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
ping -I dummy0 192.168.112.1
Without this patch, packets are sent directly to dummy0 without
hitting any of the filters. With this patch, packets are redirected
to loopback as expected.
This fix is not perfect, it only unsets the flag but does not set it back
because we have to save the information somewhere in the qdisc if we
really want that. Note, both fq_codel and sfq clear this flag in their
->bind_tcf() but this is clearly not sufficient when we don't use any
class ID.
Fixes:
|
||
Vlad Buslov
|
c1a970d06f |
net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd()
After recent refactoring of block offlads infrastructure, indr_dev->block
pointer is dereferenced before it is verified to be non-NULL. Example stack
trace where this behavior leads to NULL-pointer dereference error when
creating vxlan dev on system with mlx5 NIC with offloads enabled:
[ 1157.852938] ==================================================================
[ 1157.866877] BUG: KASAN: null-ptr-deref in tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.880877] Read of size 4 at addr 0000000000000090 by task ip/3829
[ 1157.901637] CPU: 22 PID: 3829 Comm: ip Not tainted 5.2.0-rc6+ #488
[ 1157.914438] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1157.929031] Call Trace:
[ 1157.938318] dump_stack+0x9a/0xeb
[ 1157.948362] ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.960262] ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.972082] __kasan_report+0x176/0x192
[ 1157.982513] ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1157.994348] kasan_report+0xe/0x20
[ 1158.004324] tc_indr_block_ing_cmd.isra.41+0x9c/0x160
[ 1158.015950] ? tcf_block_setup+0x430/0x430
[ 1158.026558] ? kasan_unpoison_shadow+0x30/0x40
[ 1158.037464] __tc_indr_block_cb_register+0x5f5/0xf20
[ 1158.049288] ? mlx5e_rep_indr_tc_block_unbind+0xa0/0xa0 [mlx5_core]
[ 1158.062344] ? tc_indr_block_dev_put.part.47+0x5c0/0x5c0
[ 1158.074498] ? rdma_roce_rescan_device+0x20/0x20 [ib_core]
[ 1158.086580] ? br_device_event+0x98/0x480 [bridge]
[ 1158.097870] ? strcmp+0x30/0x50
[ 1158.107578] mlx5e_nic_rep_netdevice_event+0xdd/0x180 [mlx5_core]
[ 1158.120212] notifier_call_chain+0x6d/0xa0
[ 1158.130753] register_netdevice+0x6fc/0x7e0
[ 1158.141322] ? netdev_change_features+0xa0/0xa0
[ 1158.152218] ? vxlan_config_apply+0x210/0x310 [vxlan]
[ 1158.163593] __vxlan_dev_create+0x2ad/0x520 [vxlan]
[ 1158.174770] ? vxlan_changelink+0x490/0x490 [vxlan]
[ 1158.185870] ? rcu_read_unlock+0x60/0x60 [vxlan]
[ 1158.196798] vxlan_newlink+0x99/0xf0 [vxlan]
[ 1158.207303] ? __vxlan_dev_create+0x520/0x520 [vxlan]
[ 1158.218601] ? rtnl_create_link+0x3d0/0x450
[ 1158.228900] __rtnl_newlink+0x8a7/0xb00
[ 1158.238701] ? stack_access_ok+0x35/0x80
[ 1158.248450] ? rtnl_link_unregister+0x1a0/0x1a0
[ 1158.258735] ? find_held_lock+0x6d/0xd0
[ 1158.268379] ? is_bpf_text_address+0x67/0xf0
[ 1158.278330] ? lock_acquire+0xc1/0x1f0
[ 1158.287686] ? is_bpf_text_address+0x5/0xf0
[ 1158.297449] ? is_bpf_text_address+0x86/0xf0
[ 1158.307310] ? kernel_text_address+0xec/0x100
[ 1158.317155] ? arch_stack_walk+0x92/0xe0
[ 1158.326497] ? __kernel_text_address+0xe/0x30
[ 1158.336213] ? unwind_get_return_address+0x2f/0x50
[ 1158.346267] ? create_prof_cpu_mask+0x20/0x20
[ 1158.355936] ? arch_stack_walk+0x92/0xe0
[ 1158.365117] ? stack_trace_save+0x8a/0xb0
[ 1158.374272] ? stack_trace_consume_entry+0x80/0x80
[ 1158.384226] ? match_held_lock+0x33/0x210
[ 1158.393216] ? kasan_unpoison_shadow+0x30/0x40
[ 1158.402593] rtnl_newlink+0x53/0x80
[ 1158.410925] rtnetlink_rcv_msg+0x3a5/0x600
[ 1158.419777] ? validate_linkmsg+0x400/0x400
[ 1158.428620] ? find_held_lock+0x6d/0xd0
[ 1158.437117] ? match_held_lock+0x1b/0x210
[ 1158.445760] ? validate_linkmsg+0x400/0x400
[ 1158.454642] netlink_rcv_skb+0xc7/0x1f0
[ 1158.463150] ? netlink_ack+0x470/0x470
[ 1158.471538] ? netlink_deliver_tap+0x1f3/0x5a0
[ 1158.480607] netlink_unicast+0x2ae/0x350
[ 1158.489099] ? netlink_attachskb+0x340/0x340
[ 1158.497935] ? _copy_from_iter_full+0xde/0x3b0
[ 1158.506945] ? __virt_addr_valid+0xb6/0xf0
[ 1158.515578] ? __check_object_size+0x159/0x240
[ 1158.524515] netlink_sendmsg+0x4d3/0x630
[ 1158.532879] ? netlink_unicast+0x350/0x350
[ 1158.541400] ? netlink_unicast+0x350/0x350
[ 1158.549805] sock_sendmsg+0x94/0xa0
[ 1158.557561] ___sys_sendmsg+0x49d/0x570
[ 1158.565625] ? copy_msghdr_from_user+0x210/0x210
[ 1158.574457] ? __fput+0x1e2/0x330
[ 1158.581948] ? __kasan_slab_free+0x130/0x180
[ 1158.590407] ? kmem_cache_free+0xb6/0x2d0
[ 1158.598574] ? mark_lock+0xc7/0x790
[ 1158.606177] ? task_work_run+0xcf/0x100
[ 1158.614165] ? exit_to_usermode_loop+0x102/0x110
[ 1158.622954] ? __lock_acquire+0x963/0x1ee0
[ 1158.631199] ? lockdep_hardirqs_on+0x260/0x260
[ 1158.639777] ? match_held_lock+0x1b/0x210
[ 1158.647918] ? lockdep_hardirqs_on+0x260/0x260
[ 1158.656501] ? match_held_lock+0x1b/0x210
[ 1158.664643] ? __fget_light+0xa6/0xe0
[ 1158.672423] ? __sys_sendmsg+0xd2/0x150
[ 1158.680334] __sys_sendmsg+0xd2/0x150
[ 1158.688063] ? __ia32_sys_shutdown+0x30/0x30
[ 1158.696435] ? lock_downgrade+0x2e0/0x2e0
[ 1158.704541] ? mark_held_locks+0x1a/0x90
[ 1158.712611] ? mark_held_locks+0x1a/0x90
[ 1158.720619] ? do_syscall_64+0x1e/0x2c0
[ 1158.728530] do_syscall_64+0x78/0x2c0
[ 1158.736254] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1158.745414] RIP: 0033:0x7f62d505cb87
[ 1158.753070] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00[87/1817]
48 89 f3 48
[ 1158.780924] RSP: 002b:00007fffd9832268 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1158.793204] RAX: ffffffffffffffda RBX: 000000005d26048f RCX: 00007f62d505cb87
[ 1158.805111] RDX: 0000000000000000 RSI: 00007fffd98322d0 RDI: 0000000000000003
[ 1158.817055] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1158.828987] R10: 00007f62d50ce260 R11: 0000000000000246 R12: 0000000000000001
[ 1158.840909] R13: 000000000067e540 R14: 0000000000000000 R15: 000000000067ed20
[ 1158.852873] ==================================================================
Introduce new function tcf_block_non_null_shared() that verifies block
pointer before dereferencing it to obtain index. Use the function in
tc_indr_block_ing_cmd() to prevent NULL pointer dereference.
Fixes:
|
||
Pablo Neira Ayuso
|
722d36e6e2 |
net: sched: remove tcf block API
Unused, now replaced by flow block API. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Pablo Neira Ayuso
|
955bcb6ea0 |
drivers: net: use flow block API
This patch updates flow_block_cb_setup_simple() to use the flow block API. Several drivers are also adjusted to use it. This patch introduces the per-driver list of flow blocks to account for blocks that are already in use. Remove tc_block_offload alias. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> |