Commit Graph

618 Commits

Author SHA1 Message Date
Gerrit Renker
b235dc4abb dccp ccid-2: Phase out the use of boolean Ack Vector sysctl
This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, which is now handled fully dynamically via feature negotiation;
i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

	if (dccp_msk(sk)->dccpms_send_ack_vector)
		/* ... */
with
	if (dp->dccps_hc_rx_ackvec != NULL)
		/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
 * server when the Ack marking the transition RESPOND => OPEN arrives;
 * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been 
removed, since
 (a) connection establishment implies that the Response has been received;
 (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
     this entry will always be ignored;
 (c) it can not be used for anything useful - to detect loss for instance, only
     packets received after the loss can serve as pseudo-dupacks.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:31 +02:00
Gerrit Renker
68e074bfce dccp: Remove manual influence on NDP Count feature
Updating the NDP count feature is handled automatically now:
 * for CCID-2 it is disabled, since the code does not use NDP counts;
 * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).

This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.

At startup the initialisation of the NDP count feature is with the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:31 +02:00
Gerrit Renker
78673e24df dccp: Remove obsolete parts of the old CCID interface
The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
case, their value equals initially that of the sysctl, but at the end of
feature negotiation may be something different.

The old interface removed by this patch thus has been replaced by the newer
interface to dynamically query the currently loaded CCIDs earlier in this
patch set.

Also removed the constructors for the TX CCID and the RX CCID, since the
switch rx/non-rx is done by the handler in minisocks.c (and the handler is
the only place in the code where CCIDs are loaded).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:31 +02:00
Gerrit Renker
23479cbfd3 dccp: Clean up old feature-negotiation infrastructure
The code removed by this patch is no longer referenced or used, the added
lines update documentation and copyrights.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:30 +02:00
Gerrit Renker
c49b22729f dccp: Integration of dynamic feature activation - part 3 (client side)
This integrates feature-activation in the client, with these details:

 1. When dccp_parse_options() fails, the reset code is already set, request_sent
    _state_process() currently overrides this with `Packet Error', which is not
    intended - so changed to use the reset code set in dccp_parse_options();

 2. There was a FIXME to change the error code when dccp_ackvec_add() fails.
    I have looked this up and found that: 
    * the check whether ackno < ISN is already made earlier,
    * this Response is likely the 1st packet with an Ackno that the client gets,
    * so when dccp_ackvec_add() fails, the reason is likely not a packet error.

 3. When feature negotiation fails, the socket should be marked as not usable,
    so that the application is notified that an error occurs. This is achieved
    by a new label, which uses an error code of `Aborted' and which sets the
    socket state to CLOSED, as well as sk_err.

 4. Avoids parsing the Ack twice in Respond state by not doing option processing
    again in dccp_rcv_respond_partopen_state_process (as option processing has
    already been done on the request_sock in dccp_check_req).    

Since this addresses congestion-control initialisation, a corresponding
FIXME has been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:30 +02:00
Gerrit Renker
e70cacb90d dccp: Integration of dynamic feature activation - part 2 (server side)
This patch integrates the activation of features at the end of negotiation
into the server-side code.

Note: 
  In dccp_create_openreq_child the request_sock argument is no longer constant,
  since dccp_activate_values() uses the feature-negotiation list on dreq to sort
  out the initialisation values for the different features of the child socket;
  and purges this queue after use (but the `req' argument to openreq_child
  can and does still remain constant).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:30 +02:00
Gerrit Renker
3a53a9adfa dccp: Integration of dynamic feature activation - part 1 (socket setup)
This first patch out of three replaces the hardcoded default settings with
initialisation code for the dynamic feature negotiation.

Note on retransmitting Confirm options:
---------------------------------------
This patch also defers flushing the client feature-negotiation queue,
due to the following considerations.

As long as the client is in PARTOPEN, it needs to retransmit the Confirm
options for the Change options received on the DCCP-Response from the server.

Otherwise, if the packet containing the Confirm options gets dropped in the 
network, the connection aborts due to undefined feature negotiation state.

Thanks to Leandro Melo de Sales who reported a bug in an earlier revision
of the patch set, resulting from not retransmitting the Confirm options.

The patch now ensures that the client feature-negotiation queue is flushed only
when entering the OPEN state. Since confirmed Change options are removed as
soon as they are confirmed (in the DCCP-Response), this ensures that Confirm
options are retransmitted.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:30 +02:00
Gerrit Renker
c926c6aed3 dccp: Feature activation handlers
This patch provides the post-processing of feature negotiation state, after
the negotiation has completed.

To this purpose, handlers are used and added to the dccp_feat_table. Each
handler is passed a boolean flag whether the RX or TX side of the feature
is meant.

Several handlers are provided already, new handlers can easily be added.

The initialisation is now fully dynamic, i.e. CCIDs are activated only
after the feature negotiation. The integration of this dynamic activation
is done in the subsequent patches.

Thanks to Wei Yongjun for pointing out the necessity of skipping over empty
Confirm options while copying the negotiated feature values.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:30 +02:00
Gerrit Renker
d2150b7bff dccp: Processing Confirm options
Analogous to the previous patch, this adds code to interpret incoming Confirm
feature-negotiation options. Both functions operate on the feature-negotiation
list of either the request_sock (server) or the dccp_sock (client).

Thanks to Wei Yongjun for pointing out that it is overly restrictive to check
the entire list of confirmed SP values.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:29 +02:00
Gerrit Renker
5a146b97d5 dccp: Process incoming Change feature-negotiation options
This adds/replaces code for processing incoming ChangeL/R options.
The main difference is that:
 * mandatory FN options are now interpreted inside the function
  (there are too many individual cases to do this externally);
 * the function returns an appropriate Reset code or 0,
   which is then used to fill in the data for the Reset packet.

Old code, which is no longer used or referenced, has been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:29 +02:00
Gerrit Renker
c664d4f4e2 dccp: Preference list reconciliation
This provides two functions to
 * reconcile preference lists (with appropriate return codes) and
 * reorder the preference list if successful reconciliation changed the
   preferred value.

The patch also removes the old code for processing SP/NN Change options, since
new code to process these is mostly there already; related references have been
commented out.

The code for processing Change options follows in the next patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:29 +02:00
Gerrit Renker
f8a644c07e dccp: Integrate feature-negotiation insertion code
The patch implements insertion of feature negotiation at the server (listening
and request socket) and the client (connecting socket).

In dccp_insert_options(), several statements have been grouped together now
to achieve (I hope) better efficiency by reducing the number of tests each
packet has to go through:
 - Ack Vectors are sent if the packet is neither a Data or a Request packet;
 - a previous issue is corrected - feature negotiation options are allowed
   on DataAck packets (5.8).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:29 +02:00
Gerrit Renker
0ef118a017 dccp: Insert feature-negotiation options into skb
This patch replaces the earlier insertion routine from options.c, so that
code specific to feature negotiation can remain in feat.c. This is possible
by calling a function already existing in options.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:29 +02:00
Gerrit Renker
cf9ddf73b9 dccp: Header option insertion routine for feature-negotiation
The patch extends existing code:
 * Confirm options divide into the confirmed value plus an optional preference
   list for SP values. Previously only the preference list was echoed for SP
   values, now the confirmed value is added as per RFC 4340, 6.1;
 * length and sanity checks are added to avoid illegal memory (or NULL) access.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:29 +02:00
Gerrit Renker
d0440ee6f6 dccp: Support for Mandatory options
Support for Mandatory options is provided by this patch, which will
be used by subsequent feature-negotiation patches.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2008-09-04 07:45:28 +02:00
Gerrit Renker
b9aaac1c53 dccp: Increase the scope of variable-length htonl/ntohl functions
This extends the scope of two available functions, encode|decode_value_var,
to work up to 6 (8) bytes, to match maximum requirements in the RFC.

These functions are going to be used both by general option processing and 
feature negotiation code, hence declarations have been put into feat.h.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2008-09-04 07:45:28 +02:00
Gerrit Renker
c8041e264b dccp: API to query the current TX/RX CCID
This provides function to query the current TX/RX CCID dynamically, without
reliance on the minisock value, using dynamic information available in the
currently loaded CCID module.

This query function is then used to 
 (a) provide the getsockopt part for getting/setting CCIDs via sockopts;
 (b) replace the current test for "which CCID is in use" in probe.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:28 +02:00
Gerrit Renker
fade756f18 dccp: Set per-connection CCIDs via socket options
With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which
overrides the defaults set by the global sysctl variables for TX/RX CCIDs.

To make full use of this facility, the remaining patches of this patch set are
needed, which track dependencies and activate negotiated feature values.

Note on the maximum number of CCIDs that can be registered:
-----------------------------------------------------------
The maximum number of CCIDs that can be registered on the socket is constrained
by the space in a Confirm/Change feature negotiation option. 

The space in these in turn depends on the size of header options as defined
in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from
ackvec.h into linux/dccp.h, clarifying its purpose.

Relative to this size, the maximum number of CCID identifiers that can be 
present in a Confirm option (which always consumes 1 byte more than a Change
option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the
CCID-feature-type and one for the selected value.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:28 +02:00
Gerrit Renker
73bbe095bb dccp: Tidy up setsockopt calls
This splits the setsockopt calls into two groups, depending on whether an
integer argument (val) is required and whether routines being called do
their own locking.

Some options (such as setting the CCID) use u8 rather than int, so that for
these the test with regard to integer-sizeof can not be used.

The second switch-case statement now only has those statements which need
locking and which make use of `val'.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>
2008-09-04 07:45:28 +02:00
Gerrit Renker
17c30b40ed dccp: Deprecate Ack Ratio sysctl
This patch deprecates the Ack Ratio sysctl, since
 * Ack Ratio is entirely ignored by CCID-3 and CCID-4,
 * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
 * even if it would work in CCID-2, there is no point for a user to change it:
   - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
   - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts 
     (since waiting for Acks which will never arrive in this window),
   - cwnd is not a user-configurable value.	

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
 * Ack Ratio is resolved as a dependency of the selected CCID;
 * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
   the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
   Ratio 2 for both endpoints";
 * what happens then is part of another patch set, since it concerns the 
   dynamic update of Ack Ratio while the connection is in full flight.

Thanks to Tomasz Grobelny for discussion leading up to this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2008-09-04 07:45:28 +02:00
Gerrit Renker
20f41eee82 dccp: Feature negotiation for minimum-checksum-coverage
This provides feature negotiation for server minimum checksum coverage
which so far has been missing.

Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.

Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:27 +02:00
Gerrit Renker
668144f7b4 dccp: Deprecate old setsockopt framework
The previous setsockopt interface, which passed socket options via struct 
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.

This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation. These are 
essentially wrappers around the internal __feat_register functions, with 
checking added to avoid
 * wrong usage (type);
 * changing values while the connection is in progress.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:27 +02:00
Gerrit Renker
d4c8741c43 dccp: Mechanism to resolve CCID dependencies
This adds a hook to resolve features whose value depends on the choice of
CCID. It is done at the server since it can only be done after the CCID
values have been negotiated; i.e. the client will add its CCID preference
list on the Change options sent in the Request, which will be reconciled
with the local preference list of the server.

The concept is documented on 
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
				implementation_notes.html#ccid_dependencies

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:27 +02:00
Gerrit Renker
093e1f46cf dccp: Resolve dependencies of features on choice of CCID
This provides a missing link in the code chain, as several features implicitly
depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

For Send Ack Vector, the situation is as follows:
 * since CCID2 mandates the use of Ack Vectors, there is no point in allowing 
   endpoints which use CCID2 to disable Ack Vector features such a connection;

 * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
   with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

 * for all other CCIDs, the use of (Send) Ack Vector is optional and thus
   negotiable. However, this implies that the code negotiating the use of Ack
   Vectors also supports it (i.e. is able to supply and to either parse or
   ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
   Vector support), the use of Ack Vectors is here disabled, with a comment
   in the source code.

An analogous consideration arises for the Send Loss Event Rate feature,
since the CCID-3 implementation does not support the loss interval options
of RFC 4342. To make such use explicit, corresponding feature-negotiation
options are inserted which signal the use of the loss event rate option,
as it is used by the CCID3 code.

Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

The patch implements this as a function which is called after the user has
made all other registrations for changing default values of features.

The table is variable-length, the reserved (and hence for feature-negotiation
invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
is used to mark the end of the table.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:27 +02:00
Gerrit Renker
71bb49596b dccp: Query supported CCIDs
This provides a data structure to record which CCIDs are locally supported
and three accessor functions:
 - a test function for internal use which is used to validate CCID requests
   made by the user;
 - a copy function so that the list can be used for feature-negotiation;   
 - documented getsockopt() support so that the user can query capabilities.

The data structure is a table which is filled in at compile-time with the
list of available CCIDs (which in turn depends on the Kconfig choices).

Using the copy function for cloning the list of supported CCIDs is useful for
feature negotiation, since the negotiation is now with the full list of available
CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation 
will not fail if the peer requests to use CCID3 instead of CCID2. 

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:27 +02:00
Gerrit Renker
86349c8d9c dccp: Registration routines for changing feature values
Two registration routines, for SP and NN features, are provided by this patch,
replacing a previous routine which was used for both feature types.

These are internal-only routines and therefore start with `__feat_register'.

It further exports the known limits of Sequence Window and Ack Ratio as symbolic
constants.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:27 +02:00
Gerrit Renker
5591d28628 dccp: Limit feature negotiation to connection setup phase
This patch starts the new implementation of feature negotiation:
 1. Although it is theoretically possible to perform feature negotiation at any
    time (and RFC 4340 supports this), in practice this is prohibitively complex,
    as it requires to put traffic on hold for each new negotiation.
 2. As a byproduct of restricting feature negotiation to connection setup, the
    feature-negotiation retransmit timer is no longer required. This part is now
    mapped onto the protocol-level retransmission.
    Details indicating why timers are no longer needed can be found on
    http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
	                                      implementation_notes.html

This patch disables anytime negotiation, subsequent patches work out full
feature negotiation support for connection setup.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:27 +02:00
Gerrit Renker
702083839b dccp: Cleanup routines for feature negotiation
This inserts the required de-allocation routines for memory allocated by 
feature negotiation in the socket destructors, replacing dccp_feat_clean()
in one instance.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:26 +02:00
Gerrit Renker
828755cee0 dccp: Per-socket initialisation of feature negotiation
This provides feature-negotiation initialisation for both DCCP sockets and
DCCP request_sockets, to support feature negotiation during connection setup.

It also resolves a FIXME regarding the congestion control initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:26 +02:00
Gerrit Renker
3001fc0569 dccp: List management for new feature negotiation
This adds list fields and list management functions for the new feature
negotiation implementation. The new code is kept in parallel to the old
code, until removed at the end of the patch set.

Thanks to Arnaldo for suggestions to improve the code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:26 +02:00
Gerrit Renker
b4eec20637 dccp: Implement lookup table for feature-negotiation information
A lookup table for feature-negotiation information, extracted from RFC 4340/42,
is provided by this patch. All currently known features can be found in this 
table, along with their feature location, their default value, and type.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:26 +02:00
Gerrit Renker
5c7c9451f1 dccp: Basic data structure for feature negotiation
This patch prepares for the new and extended feature-negotiation routines.

The following feature-negotiation data structures are provided:
	* a container for the various (SP or NN) values,
	* symbolic state names to track feature states,
	* an entry struct which holds all current information together,
	* elementary functions to fill in and process these structures.

Entry structs are arranged as FIFO for the following reason: RFC 4340 specifies
that if multiple options of the same type are present, they are processed in the
order of their appearance in the packet; which means that this order needs to be
preserved in the local data structure (the later insertion code also respects
this order).

The struct list_head has been chosen for the following reasons: the most 
frequent operations are
 * add new entry at tail (when receiving Change or setting socket options);
 * delete entry (when Confirm has been received);
 * deep copy of entire list (cloning from listening socket onto request socket).

The NN value has been set to 64 bit, which is a currently sufficient upper limit
(Sequence Window feature has 48 bit).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04 07:45:26 +02:00
Gerrit Renker
959fd992f0 dccp ccid-3: Replace lazy BUG_ON with condition
The BUG_ON(w_tot == 0) only holds if there is no more than 1 loss interval in
the loss history. If there is only a single loss interval, the calc_i_mean()
routine need in fact not be called (RFC 3448, 6.3.1). 

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:25 +02:00
Gerrit Renker
432649916b dccp: Toggle debug output without module unloading
This sets the sysfs permissions so that root can toggle the `debug'
parameter available for nearly every DCCP module. This is useful 
since there are various module inter-dependencies. The debug flag
can now be toggled at runtime using

  echo 1 > /sys/module/dccp/parameters/dccp_debug
  echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
  echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
  echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug

The last is not very useful yet, since no code at the moment calls
the tfrc_debug() macro.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:25 +02:00
Gerrit Renker
48816322ad dccp: Empty the write queue when disconnecting
dccp_disconnect() can be called due to several reasons:

 1. when the connection setup failed (inet_stream_connect());
 2. when shutting down (inet_shutdown(), inet_csk_listen_stop());
 3. when aborting the connection (dccp_close() with 0 linger time).

In case (1) the write queue is empty. This patch empties the write queue,
if in case (2) or (3) it was not yet empty.

This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues()
later on.

It also seems natural to do: when breaking an association, to delete all
packets that were originally intended for the soon-disconnected end (compare
with call to tcp_write_queue_purge in tcp_disconnect()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:25 +02:00
Gerrit Renker
eac7726bf5 dccp: Fill in the Data fields for "Option Error" Resets
This updates the use of the `out_invalid_option' label, which produces a 
Reset (code 5, "Option Error"), to fill in the  Data1...Data3 fields as
specified in RFC 4340, 5.6.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:25 +02:00
Gerrit Renker
faf61c3319 dccp: Silently ignore options with nonsensical lengths
This updates the option-parsing code with regard to RFC 4340, 5.8:
 "[..] options with nonsensical lengths (length byte less than two or more
  than the remaining space in the options portion of the header) MUST be
  ignored, and any option space following an option with nonsensical length
  MUST likewise be ignored."

Hence in the following cases erratic options will be ignored:
 1. The type byte of a multi-byte option is the last byte of the header
    options (i.e. effective option length of 1).
 2. The value of the length byte is less than the minimum 2. This has been 
    changed from previously 3: although no multi-byte option with a length
    less than 3 yet exists (cf. table 3 in 5.8), a length of 2 is valid.
    (The switch-statement in dccp_parse has further per-option length checks.)
 3. The option length exceeds the length of the remaining option space.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:24 +02:00
Wei Yongjun
ba1a6c7bc0 dccp: Always generate a Reset in response to option errors
RFC4340 states that if a packet is received with an option error (such as a
Mandatory Option as the last byte of the option list), the endpoint should
repond with a Reset.

In the LISTEN and RESPOND states, the endpoint correctly reponds with Reset,
while in the REQUEST/OPEN states, packets with option errors are just ignored.

The packet sequence is as follows:

Case 1:

  Endpoint A                           Endpoint B
  (CLOSED)                             (CLOSED)

               <----------------       REQUEST

  RESPONSE     ----------------->      (*1)
  (with invalid option)
               <----------------       RESET
                                       (with Reset Code 5, "Option Error")

  (*1) currently just ignored, no Reset is sent

Case 2:

  Endpoint A                           Endpoint B
  (OPEN)                               (OPEN)

  DATA-ACK     ----------------->      (*2)
  (with invalid option)
               <----------------       RESET
                                       (with Reset Code 5, "Option Error")

  (*2) currently just ignored, no Reset is sent

This patch fixes the problem, by generating a Reset instead of silently
ignoring option errors.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04 07:45:24 +02:00
Gerrit Renker
d28934ad8a dccp: Fix panic caused by too early termination of retransmission mechanism
Thanks is due to Wei Yongjun for the detailed analysis and description of this
bug at http://marc.info/?l=dccp&m=121739364909199&w=2

The problem is that invalid packets received by a client in state REQUEST cause
the retransmission timer for the DCCP-Request to be reset. This includes freeing
the Request-skb ( in dccp_rcv_request_sent_state_process() ). As a consequence,
 * the arrival of further packets cause a double-free, triggering a panic(),
 * the connection then may hang, since further retransmissions are blocked.

This patch changes the order of statements so that the retransmission timer is
reset, and the pending Request freed, only if a valid Response has arrived (or
the number of sysctl-retries has been exhausted).

Further changes:
----------------
To be on the safe side, replaced __kfree_skb with kfree_skb so that if due to
unexpected circumstances the sk_send_head is NULL the WARN_ON is used instead.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-18 21:14:20 -07:00
Arnaldo Carvalho de Melo
3e8a0a559c dccp: change L/R must have at least one byte in the dccpsf_val field
Thanks to Eugene Teo for reporting this problem.
    
Signed-off-by: Eugene Teo <eugenete@kernel.sg>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-13 13:48:39 -07:00
Gui Jianfeng
6edafaaf6f tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup
If the following packet flow happen, kernel will panic.
MathineA			MathineB
		SYN
	---------------------->    
        	SYN+ACK
	<----------------------
		ACK(bad seq)
	---------------------->
When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr))
is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is 
NULL at that moment, so kernel panic happens.
This patch fixes this bug.

OOPS output is as following:
[  302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42
[  302.817075] Oops: 0000 [#1] SMP 
[  302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
[  302.849946] 
[  302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
[  302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0
[  302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
[  302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
[  302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
[  302.868333]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
[  302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 
[  302.883275]        c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 
[  302.890971]        ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 
[  302.900140] Call Trace:
[  302.902392]  [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35
[  302.907060]  [<c05d28f8>] tcp_check_req+0x156/0x372
[  302.910082]  [<c04250a3>] printk+0x14/0x18
[  302.912868]  [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf
[  302.917423]  [<c05d26be>] tcp_v4_rcv+0x563/0x5b9
[  302.920453]  [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183
[  302.923865]  [<c05bb10a>] ip_rcv_finish+0x286/0x2a3
[  302.928569]  [<c059e438>] dev_alloc_skb+0x11/0x25
[  302.931563]  [<c05a211f>] netif_receive_skb+0x2d6/0x33a
[  302.934914]  [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32]
[  302.938735]  [<c05a3b48>] net_rx_action+0x5c/0xfe
[  302.941792]  [<c042856b>] __do_softirq+0x5d/0xc1
[  302.944788]  [<c042850e>] __do_softirq+0x0/0xc1
[  302.948999]  [<c040564b>] do_softirq+0x55/0x88
[  302.951870]  [<c04501b1>] handle_fasteoi_irq+0x0/0xa4
[  302.954986]  [<c04284da>] irq_exit+0x35/0x69
[  302.959081]  [<c0405717>] do_IRQ+0x99/0xae
[  302.961896]  [<c040422b>] common_interrupt+0x23/0x28
[  302.966279]  [<c040819d>] default_idle+0x2a/0x3d
[  302.969212]  [<c0402552>] cpu_idle+0xb2/0xd2
[  302.972169]  =======================
[  302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 
[  303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
[  303.018360] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-06 23:50:04 -07:00
Wei Yongjun
860239c56b dccp: Add check for truncated ICMPv6 DCCP error packets
This patch adds a minimum-length check for ICMPv6 packets, as per the previous
patch for ICMPv4 payloads.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-07-26 11:59:11 +01:00
Wei Yongjun
18e1d83600 dccp: Fix incorrect length check for ICMPv4 packets
Unlike TCP, which only needs 8 octets of original packet data, DCCP requires
minimally 12 or 16 bytes for ICMP-payload sequence number checks.

This patch replaces the insufficient length constant of 8 with a two-stage
test, making sure that 12 bytes are available, before computing the basic
header length required for sequence number checks.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-07-26 11:59:10 +01:00
Wei Yongjun
e0bcfb0c6a dccp: Add check for sequence number in ICMPv6 message
This adds a sequence number check for ICMPv6 DCCP error packets, in the same
manner as it has been done for ICMPv4 in the previous patch.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-07-26 11:59:10 +01:00
Wei Yongjun
d68f0866f7 dccp: Fix sequence number check for ICMPv4 packets
The payload of ICMP message is a part of the packet sent by ourself,
so the sequence number check must use AWL and AWH, not SWL and SWH.

For example:
     Endpoint A                  Endpoint B

     DATA-ACK       -------->
     (SEQ=X)
                    <--------    ICMP (Fragmentation Needed)
                                 (SEQ=X)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-07-26 11:59:10 +01:00
Gerrit Renker
73f18fdbca dccp: Bug-Fix - AWL was never updated
The AWL lower Ack validity window advances in proportion to GSS, the greatest
sequence number sent. Updating AWL other than at connection setup (in the
DCCP-Request sent by dccp_v{4,6}_connect()) was missing in the DCCP code.

This bug lead to syslog messages such as

 "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...] 
  P.ackno exists or LAWL(82947089) <= P.ackno(82948208)
                                   <= S.AWH(82948728), sending SYNC..."

The difference between AWL/AWH here is 1639 packets, while the expected value
(the Sequence Window) would have been 100 (the default).  A closer look showed
that LAWL = AWL = 82947089 equalled the ISS on the Response.

The patch now updates AWL with each increase of GSS.


Further changes:
----------------
The patch also enforces more stringent checks on the ISS sequence number:

 * AWL is initialised to ISS at connection setup and remains at this value;
 * AWH is then always set to GSS (via dccp_update_gss());
 * so on the first Request: AWL =      AWH = ISS,
   and on the n-th Request: AWL = ISS, AWH = ISS + n.

As a consequence, only Response packets that refer to Requests sent by this
host will pass, all others are discarded. This is the intention and in effect 
implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>   
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-07-26 11:59:10 +01:00
Gerrit Renker
59435444a1 dccp: Allow to distinguish original and retransmitted packets
This patch allows the sender to distinguish original and retransmitted packets,
which is in particular needed for the retransmission of DCCP-Requests:
 * the first Request uses ISS (generated in net/dccp/ip*.c), and sets GSS = ISS;
 * all retransmitted Requests use GSS' = GSS + 1, so that the n-th retransmitted
   Request has sequence number ISS + n (mod 48).

To add generic support, the patch reorganises existing code so that:
 * icsk_retransmits == 0     for the original packet and
 * icsk_retransmits = n > 0  for the n-th retransmitted packet
at the time dccp_transmit_skb() is called, via dccp_retransmit_skb().
 
Thanks to Wei Yongjun for pointing this problem out.

Further changes:
----------------
 * removed the `skb' argument from dccp_retransmit_skb(), since sk_send_head
   is used for all retransmissions (the exception is client-Acks in PARTOPEN
   state, but these do not use sk_send_head);
 * since sk_send_head always contains the original skb (via dccp_entail()),
   skb_cloned() never evaluated to true and thus pskb_copy() was never used.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-07-26 11:59:09 +01:00
Ilpo Järvinen
547b792cac net: convert BUG_TRAP to generic WARN_ON
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25 21:43:18 -07:00
Pavel Emelyanov
de0744af1f mib: add net to NET_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:31:16 -07:00
Pavel Emelyanov
ca12a1a443 inet: prepare net on the stack for NET accounting macros
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:28:42 -07:00