If T2-shutdown timer is expired on a removed transport, kernel
panic will occur when we do failure management on that transport.
You can reproduce this use the following sequence:
Endpoint A Endpoint B
(ESTABLISHED) (ESTABLISHED)
<----------------- SHUTDOWN
(SRC=X)
ASCONF ----------------->
(Delete IP Address = X)
<----------------- ASCONF-ACK
(Success Indication)
<----------------- SHUTDOWN
(T2-shutdown timer expire)
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
If socket is create by PF_INET type, it can not used IPv6 address
to send/recv DATA. So only enable IPv6 address support on PF_INET6
socket.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Just fix a typo in net/sctp/sm_statetable.c.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Use Unresolvable Address error cause instead of Invalid Mandatory
Parameter error cause when process ASCONF chunk with invalid address
since address parameters are not mandatory in the ASCONF chunk.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
RFC5061 Section 5.2. Upon Reception of an ASCONF Chunk
V2) In processing the chunk, the receiver should build a
response message with the appropriate error TLVs, as
specified in the Parameter type bits, for any ASCONF
Parameter it does not understand. To indicate an
unrecognized parameter, Cause Type 8 should be used as
defined in the ERROR in Section 3.3.10.8, [RFC4960]. The
endpoint may also use the response to carry rejections for
other reasons, such as resource shortages, etc., using the
Error Cause TLV and an appropriate error condition.
So we should indicate an unrecognized parameter with error
SCTP_ERROR_UNKNOWN_PARAM in ACSONF-ACK chunk, not
SCTP_ERROR_INV_PARAM.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
RFC5061 had changed the error cause codes for Dynamic Address
Reconfiguration As the following:
Cause Code
Value Cause Code
--------- ----------------
0x00A0 Request to Delete Last Remaining IP Address
0x00A1 Operation Refused Due to Resource Shortage
0x00A2 Request to Delete Source IP Address
0x00A3 Association Aborted Due to Illegal ASCONF-ACK
0x00A4 Request Refused - No Authorization
This patch fix the error cause codes.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Current code makes the UCC whose register range includes the current mdio
register to be the MII managemnt interface master of the QE. If there is more
than one mdio bus for QE, the UCC of the last mdio bus will be the MII
management interface master which will make the primary mdio bus working
unproperly, e.g. can not get the right clock. Normally the primary mdio bus is
the first UEC's mdio bus.
This patch allows the first UCC to be the MII management interface master of the
multiple UCC mdio buses.
Signed-off-by: Haiying Wang <Haiying.Wang@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Disable fiber/copper auto selection for Marvell m88e1111 SGMII support.
Signed-off-by: Haiying Wang <Haiying.Wang@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Can remove anonymous union now it has one field.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb
Delete skb->rtable field
Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct sk_buff uses one union to define dst and rtable fields.
We want to replace direct access to these pointers by accessors.
First patch adds a new "unsigned long _skb_dst;" opaque field
in this union.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With bi-directional stress traffic, the receiver could hang causing the
hardware to stop and a "Detected Tx Unit Hang" message dumped to the system
logfile. Temporarily workaround this issue by disabling Tx flow control by
default. The issue is currently being investigated and a follow-on patch
will be provided to revert this when it is resolved.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides support for the next generation Intel desktop
and mobile gigabit ethernet LOM adapters. These adapters are the
follow-on parts to the LOMs tied to the prior ICH chipsets and are
comprised of a MAC in the PCH chipset and an external PHY (82577 for
mobile and 82578 for desktop versions). New features consist of PHY
wakeup to save power by completely turning off the MAC while in Sx
state, and 4K jumbo frames.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By putting the maximum frame size supported by the hardware into the
adapter structure, the change_mtu entry point function can be cleaned
up of checks for all the different max frame sizes supported by
Signed-off-by: David S. Miller <davem@davemloft.net>
The flow control thresholds, i.e. high and low watermarks of the Rx
FIFO for when the hardware should transmit PAUSE frames (XON and XOFF,
respectively), need to be tuned for more efficient use of the FIFO.
The logic to set the thresholds for parts that support early-receive
(ERT) was also wrong in that it should check whether jumbo frames are
in use.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During reset, the driver was attempting to disable the Smart Powerdown
feature even if the part does not support Smart Powerdown. Check for
support before attempting to disable the feature.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CRC stripping should be enabled by default but was not if it was not
specified as a module parameter.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the notify chain infrastructure and replace it
by a simple function pointer. This issue has been mentioned in the
mailing list several times: the use of the notify chain adds
too much overhead for something that is only used by ctnetlink.
This patch also changes nfnetlink_send(). It seems that gfp_any()
returns GFP_KERNEL for user-context request, like those via
ctnetlink, inside the RCU read-side section which is not valid.
Using GFP_KERNEL is also evil since netlink may schedule(),
this leads to "scheduling while atomic" bug reports.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch simplifies the conntrack event caching system by removing
several events:
* IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted
since the have no clients.
* IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter
days.
* IPCT_REFRESH which is not of any use since we always include the
timeout in the messages.
After this patch, the existing events are:
* IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify
addition and deletion of entries.
* IPCT_STATUS, that notes that the status bits have changes,
eg. IPS_SEEN_REPLY and IPS_ASSURED.
* IPCT_PROTOINFO, that reports that internal protocol information has
changed, eg. the TCP, DCCP and SCTP protocol state.
* IPCT_HELPER, that a helper has been assigned or unassigned to this
entry.
* IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this
covers the case when a mark is set to zero.
* IPCT_NATSEQADJ, to report that there's updates in the NAT sequence
adjustment.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch moves the event flags from linux/netfilter/nf_conntrack_common.h
to net/netfilter/nf_conntrack_ecache.h. This flags are not of any use
from userspace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
During the module removal there are no possible event listeners
since ctnetlink must be removed before to allow removing
nf_conntrack. This patch removes the event reporting for the
module removal case which is not of any use in the existing code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch cleans up the message calculation to make it similar
to rtnetlink, moreover, it removes unneeded verbose information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch is a cleanup, it removes the `nowait' parameter
from all *fill_info() function since it is always set to one.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch cleans up the message handling path in two aspects:
* it uses NLMSG_LENGTH() instead of NLMSG_SPACE() like rtnetlink
does in this case to check if there is enough room for the
Netlink/nfnetlink headers. No need to check for the padding room.
* it removes a redundant header size checking that has been
already do at the beginning of the function.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The patch below adds supporting TCP simultaneous open to conntrack. The
unused LISTEN state is replaced by a new state (SYN_SENT2) denoting the
second SYN sent from the reply direction in the new case. The state table
is updated and the function tcp_in_window is modified to handle
simultaneous open.
The functionality can fairly easily be tested by socat. A sample tcpdump
recording
23:21:34.244733 IP (tos 0x0, ttl 64, id 49224, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xe75f (correct), 3383710133:3383710133(0) win 5840 <mss 1460,sackOK,timestamp 173445629 0,nop,wscale 7>
23:21:34.244783 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.0.1.2020 > 192.168.0.254.2020: R, cksum 0x0253 (correct), 0:0(0) ack 3383710134 win 0
23:21:36.038680 IP (tos 0x0, ttl 64, id 28092, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.1.2020 > 192.168.0.254.2020: S, cksum 0x704b (correct), 2634546729:2634546729(0) win 5840 <mss 1460,sackOK,timestamp 824213 0,nop,wscale 1>
23:21:36.038777 IP (tos 0x0, ttl 64, id 49225, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xb179 (correct), 3383710133:3383710133(0) ack 2634546730 win 5840 <mss 1460,sackOK,timestamp 173447423 824213,nop,wscale 7>
23:21:36.038847 IP (tos 0x0, ttl 64, id 28093, offset 0, flags [DF], proto TCP (6), length 52) 192.168.0.1.2020 > 192.168.0.254.2020: ., cksum 0xebad (correct), ack 3383710134 win 2920 <nop,nop,timestamp 824213 173447423>
and the corresponding netlink events:
[NEW] tcp 6 120 SYN_SENT src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 [UNREPLIED] src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
[UPDATE] tcp 6 120 LISTEN src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
[UPDATE] tcp 6 60 SYN_RECV src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
[UPDATE] tcp 6 432000 ESTABLISHED src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020 [ASSURED]
The RST packet was dropped in the raw table, thus it did not reach
conntrack. nfnetlink_conntrack is unpatched so it shows the new SYN_SENT2
state as the old unused LISTEN.
With TCP simultaneous open support we satisfy REQ-2 in RFC 5382 ;-) .
Additional minor correction in this patch is that in order to catch
uninitialized reply directions, "td_maxwin == 0" is used instead of
"td_end == 0" because the former can't be true except in uninitialized
state while td_end may accidentally be equal to zero in the mid of a
connection.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
By default the driver opens 8 TX queues (defined by MLX4_EN_NUM_TX_RINGS).
If the driver is configured to support Per Priority Flow Control, we open
8 additional TX rings.
dev->real_num_tx_queues is always set to be MLX4_EN_NUM_TX_RINGS.
The mlx4_en_select_queue() function uses standard hashing (skb_tx_hash)
in case that PPFC is not supported or the skb contain a vlan tag,
otherwise the queue is selected according to vlan priority.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interrupt moderation should not depend on number of incoming
bytes, but on number of incoming packets.
The previous scheme caused very high interrupts rate for small
messages when big MTU was configured.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the initialization of one of the ports failed,
there is no need to fail the other one as well.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
en_params.c file now only handles Ethtool functionality
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
For each debug message, the message will show interface name in case
that the net device was registered, and PCI bus ID with port number
if we were not registered yet. Messages that are not port/netdev specific
stayed in the old format
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a bug which unconfigured struct tcf_proto keeps
chaining in tc_ctl_tfilter(), and avoids kernel panic in
cls_cgroup_classify() when we use cls_cgroup.
When we execute 'tc filter add', tcf_proto is allocated, initialized
by classifier's init(), and chained. After it's chained,
tc_ctl_tfilter() calls classifier's change(). When classifier's
change() fails, tc_ctl_tfilter() does not free and keeps tcf_proto.
In addition, cls_cgroup is initialized in change() not in init(). It
accesses unconfigured struct tcf_proto which is chained before
change(), then hits Oops.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch to fix bad length checking in e1000. E1000 by default does two
things:
1) Spans rx descriptors for packets that don't fit into 1 skb on recieve
2) Strips the crc from a frame by subtracting 4 bytes from the length prior to
doing an skb_put
Since the e1000 driver isn't written to support receiving packets that span
multiple rx buffers, it checks the End of Packet bit of every frame, and
discards it if its not set. This places us in a situation where, if we have a
spanning packet, the first part is discarded, but the second part is not (since
it is the end of packet, and it passes the EOP bit test). If the second part of
the frame is small (4 bytes or less), we subtract 4 from it to remove its crc,
underflow the length, and wind up in skb_over_panic, when we try to skb_put a
huge number of bytes into the skb. This amounts to a remote DOS attack through
careful selection of frame size in relation to interface MTU. The fix for this
is already in the e1000e driver, as well as the e1000 sourceforge driver, but no
one ever pushed it to e1000. This is lifted straight from e1000e, and prevents
small frames from causing the underflow described above
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
After some discussion offline with Christoph Lameter and David Stevens
regarding multicast behaviour in Linux, I'm submitting a slightly
modified patch from the one Christoph submitted earlier.
This patch provides a new socket option IP_MULTICAST_ALL.
In this case, default behaviour is _unchanged_ from the current
Linux standard. The socket option is set by default to provide
original behaviour. Sockets wishing to receive data only from
multicast groups they join explicitly will need to clear this
socket option.
Signed-off-by: Nivedita Singhvi <niv@us.ibm.com>
Signed-off-by: Christoph Lameter<cl@linux.com>
Acked-by: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Print-out the error value when sock_alloc_send_skb() fails in
the IPv6 neighbor discovery code - can be useful for debugging.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a phy_power_down parameter to forcedeth: set to 1 to power down the
phy and disable the link when an interface goes down; set to 0 to always
leave the phy powered up.
The phy power state persists across reboots; Windows, some BIOSes, and
older versions of Linux don't bother to power up the phy again, forcing
users to remove all power to get the interface working (see
http://bugzilla.kernel.org/show_bug.cgi?id=13072). Leaving the phy
powered on is the safest default behavior. Users accustomed to seeing
the link state reflect the interface state and/or wanting to minimize
power consumption can set phy_power_down=1 if compatibility with other
OSes is not an issue.
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Network device TX is never run in IRQ context, and skb is freed outside
of the IRQ-disabling spin lock. So checking for IRQ was a waste of time
here.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the unlikely event that gprs_writeable() and gprs_xmit() check for
writeability at the same, we could stop the device queue forever.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>