Commit Graph

251 Commits

Author SHA1 Message Date
Patrick McHardy
b4e9b520ca [NET_SCHED]: Add mask support to fwmark classifier
Support masking the nfmark value before the search. The mask value is
global for all filters contained in one instance. It can only be set
when a new instance is created, all filters must specify the same mask.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:12 -07:00
Patrick McHardy
efa741656e [NETFILTER]: x_tables: remove unused size argument to check/destroy functions
The size is verified by x_tables and isn't needed by the modules anymore.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:34 -07:00
Patrick McHardy
fe1cb10873 [NETFILTER]: x_tables: remove unused argument to target functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:33 -07:00
David S. Miller
e9ce1cd3cf [PKT_SCHED]: Kill pkt_act.h inlining.
This was simply making templates of functions and mostly causing a lot
of code duplication in the classifier action modules.

We solve this more cleanly by having a common "struct tcf_common" that
hash worker functions contained once in act_api.c can work with.

Callers work with real action objects that have the common struct
plus their module specific struct members.  You go from a common
object to the higher level one using a "to_foo()" macro which makes
use of container_of() to do the dirty work.

This also kills off act_generic.h which was only used by act_simple.c
and keeping it around was more work than the it's value.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:10 -07:00
Thomas Graf
2942e90050 [RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:48 -07:00
Stephen Hemminger
3696f625e2 [HTB]: rbtree cleanup
Add code to initialize rb tree nodes, and check for double deletion.
This is not a real fix, but I can make it trap sometimes and may
be a bandaid for: http://bugzilla.kernel.org/show_bug.cgi?id=6681

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:34 -07:00
Stephen Hemminger
0cef296da9 [HTB]: Use hlist for hash lists.
Use hlist instead of list for the hash list. This saves
space, and we can check for double delete better.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:34 -07:00
Stephen Hemminger
87990467d3 [HTB]: Lindent
Code was a mess in terms of indentation.  Run through Lindent
script, and cleanup the damage. Also, don't use, vim magic
comment, and substitute inline for __inline__.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:33 -07:00
Stephen Hemminger
18a63e868b [HTB]: HTB_HYSTERESIS cleanup
Change the conditional compilation around HTB_HYSTERSIS
since code was splitting mid expression.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:32 -07:00
Stephen Hemminger
9ac961ee05 [HTB]: Remove lock macro.
Get rid of the macro's being used to obscure the locking.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:31 -07:00
Stephen Hemminger
3bf72957d2 [HTB]: Remove broken debug code.
The HTB network scheduler had debug code that wouldn't compile
and confused and obfuscated the code, remove it.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:30 -07:00
Patrick McHardy
84fa7933a3 [NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).

Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:53 -07:00
Herbert Xu
d7811e623d [NET]: Drop tx lock in dev_watchdog_up
Fix lockdep warning with GRE, iptables and Speedtouch ADSL, PPP over ATM.

On Sat, Sep 02, 2006 at 08:39:28PM +0000, Krzysztof Halasa wrote:
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
>  (&dev->queue_lock){-+..}, at: [<c02c8c46>] dev_queue_xmit+0x56/0x290
> 
> but task is already holding lock:
>  (&dev->_xmit_lock){-+..}, at: [<c02c8e14>] dev_queue_xmit+0x224/0x290
> 
> which lock already depends on the new lock.

This turns out to be a genuine bug.  The queue lock and xmit lock are
intentionally taken out of order.  Two things are supposed to prevent
dead-locks from occuring:

1) When we hold the queue_lock we're supposed to only do try_lock on the
tx_lock.

2) We always drop the queue_lock after taking the tx_lock and before doing
anything else.

> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&dev->_xmit_lock){-+..}:
>        [<c012e7b6>] lock_acquire+0x76/0xa0
>        [<c0336241>] _spin_lock_bh+0x31/0x40
>        [<c02d25a9>] dev_activate+0x69/0x120

This path obviously breaks assumption 1) and therefore can lead to ABBA
dead-locks.

I've looked at the history and there seems to be no reason for the lock
to be held at all in dev_watchdog_up.  The lock appeared in day one and
even there it was unnecessary.  In fact, people added __dev_watchdog_up
precisely in order to get around the tx lock there.

The function dev_watchdog_up is already serialised by rtnl_lock since
its only caller dev_activate is always called under it.

So here is a simple patch to remove the tx lock from dev_watchdog_up.
In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and
replace it with dev_watchdog_up.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-18 00:22:30 -07:00
Ralf Hildebrandt
c0956bd251 [PKT_SCHED] cls_u32: Fix typo.
Signed-off-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:54 -07:00
Jamal Hadi Salim
b9e2cc0f0e [PKT_SCHED]: Return ENOENT if qdisc module is unavailable
Return ENOENT if qdisc module is unavailable

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-04 22:59:49 -07:00
Panagiotis Issaris
0da974f4f3 [NET]: Conversions from kmalloc+memset to k(z|c)alloc.
Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 14:51:30 -07:00
Guillaume Chazarain
89e1df74f8 [PKT_SCHED] netem: Fix slab corruption with netem (2nd try)
CONFIG_DEBUG_SLAB found the following bug:
netem_enqueue() in sch_netem.c gets a pointer inside a slab object:
struct netem_skb_cb *cb = (struct netem_skb_cb *)skb->cb;
But then, the slab object may be freed:
skb = skb_unshare(skb, GFP_ATOMIC)
cb is still pointing inside the freed skb, so here is a patch to
initialize cb later, and make it clear that initializing it sooner
is a bad idea.

[From Stephen Hemminger: leave cb unitialized in order to let gcc
complain in case of use before initialization]

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 14:45:25 -07:00
Dave Jones
2724a1a55f [PATCH] sch_htb compile fix.
net/sched/sch_htb.c: In function 'htb_change_class':
net/sched/sch_htb.c:1605: error: expected ';' before 'do_gettimeofday'

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-15 11:40:20 -07:00
Stephen Hemminger
b3a6251915 [PKT_SCHED] HTB: initialize upper bound properly
The upper bound for HTB time diff needs to be scaled to PSCHED
units rather than just assuming usecs.  The field mbuffer is used
in TDIFF_SAFE(), as an upper bound.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-14 16:32:27 -07:00
Stephen Hemminger
781b456a98 [MAINTAINERS]: Add proper entry for TC classifier
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-12 13:58:48 -07:00
Thomas Graf
ebbaeab18b [PKT_SCHED]: act_api: Fix module leak while flushing actions
Module reference needs to be given back if message header
construction fails.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-09 11:36:23 -07:00
Thomas Graf
4fe683f50d [PKT_SCHED]: Fix error handling while dumping actions
"return -err" and blindly inheriting the error code in the netlink
failure exception handler causes errors codes to be returned as
positive value therefore making them being ignored by the caller.

May lead to sending out incomplete netlink messages.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-05 20:47:28 -07:00
Thomas Graf
d152b4e1e9 [PKT_SCHED]: Return ENOENT if action module is unavailable
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-05 20:45:57 -07:00
Thomas Graf
26dab8930b [PKT_SCHED]: Fix illegal memory dereferences when dumping actions
The TCA_ACT_KIND attribute is used without checking its
availability when dumping actions therefore leading to a
value of 0x4 being dereferenced.

The use of strcmp() in tc_lookup_action_n() isn't safe
when fed with string from an attribute without enforcing
proper NUL termination.

Both bugs can be triggered with malformed netlink message
and don't require any privileges.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-05 20:45:06 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Matt LaPlante
3539c272f1 Kconfig: Typos in net/sched/Kconfig
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 18:53:46 +02:00
Herbert Xu
f6a78bfcb1 [NET]: Add generic segmentation offload
This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.

The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:31 -07:00
Herbert Xu
d4828d85d1 [NET]: Prevent transmission after dev_deactivate
The dev_deactivate function has bit-rotted since the introduction of
lockless drivers.  In particular, the spin_unlock_wait call at the end
has no effect on the xmit routine of lockless drivers.

With a little bit of work, we can make it much more useful by providing
the guarantee that when it returns, no more calls to the xmit routine
of the underlying driver will be made.

The idea is simple.  There are two entry points in to the xmit routine.
The first comes from dev_queue_xmit.  That one is easily stopped by
using synchronize_rcu.  This works because we set the qdisc to noop_qdisc
before the synchronize_rcu call.  That in turn causes all subsequent
packets sent to dev_queue_xmit to be dropped.  The synchronize_rcu call
also ensures all outstanding calls leave their critical section.

The other entry point is from qdisc_run.  Since we now have a bit that
indicates whether it's running, all we have to do is to wait until the
bit is off.

I've removed the loop to wait for __LINK_STATE_SCHED to clear.  This is
useless because netif_wake_queue can cause it to be set again.  It is
also harmless because we've disarmed qdisc_run.

I've also removed the spin_unlock_wait on xmit_lock because its only
purpose of making sure that all outstanding xmit_lock holders have
exited is also given by dev_watchdog_down.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:26 -07:00
Herbert Xu
48d83325b6 [NET]: Prevent multiple qdisc runs
Having two or more qdisc_run's contend against each other is bad because
it can induce packet reordering if the packets have to be requeued.  It
appears that this is an unintended consequence of relinquinshing the queue
lock while transmitting.  That in turn is needed for devices that spend a
lot of time in their transmit routine.

There are no advantages to be had as devices with queues are inherently
single-threaded (the loopback device is not but then it doesn't have a
queue).

Even if you were to add a queue to a parallel virtual device (e.g., bolt
a tbf filter in front of an ipip tunnel device), you would still want to
process the queue in sequence to ensure that the packets are ordered
correctly.

The solution here is to steal a bit from net_device to prevent this.

BTW, as qdisc_restart is no longer used by anyone as a module inside the
kernel (IIRC it used to with netif_wake_queue), I have not exported the
new __qdisc_run function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-19 23:57:59 -07:00
Herbert Xu
932ff279a4 [NET]: Add netif_tx_lock
Various drivers use xmit_lock internally to synchronise with their
transmission routines.  They do so without setting xmit_lock_owner.
This is fine as long as netpoll is not in use.

With netpoll it is possible for deadlocks to occur if xmit_lock_owner
isn't set.  This is because if a printk occurs while xmit_lock is held
and xmit_lock_owner is not set can cause netpoll to attempt to take
xmit_lock recursively.

While it is possible to resolve this by getting netpoll to use
trylock, it is suboptimal because netpoll's sole objective is to
maximise the chance of getting the printk out on the wire.  So
delaying or dropping the message is to be avoided as much as possible.

So the only alternative is to always set xmit_lock_owner.  The
following patch does this by introducing the netif_tx_lock family of
functions that take care of setting/unsetting xmit_lock_owner.

I renamed xmit_lock to _xmit_lock to indicate that it should not be
used directly.  I didn't provide irq versions of the netif_tx_lock
functions since xmit_lock is meant to be a BH-disabling lock.

This is pretty much a straight text substitution except for a small
bug fix in winbond.  It currently uses
netif_stop_queue/spin_unlock_wait to stop transmission.  This is
unsafe as an IRQ can potentially wake up the queue.  So it is safer to
use netif_tx_disable.

The hamradio bits used spin_lock_irq but it is unnecessary as
xmit_lock must never be taken in an IRQ handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:30:14 -07:00
Stephen Hemminger
338f7566e5 [PKT_SCHED]: Potential jiffy wrap bug in dev_watchdog().
There is a potential jiffy wraparound bug in the transmit watchdog
that is easily avoided by using time_after().

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-16 15:02:12 -07:00
Patrick McHardy
210525d65d [NET_SCHED]: HFSC: fix thinko in hfsc_adjust_levels()
When deleting the last child the level of a class should drop to zero.

Noticed by Andreas Mueller <andreas@stapelspeicher.org>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-11 12:22:03 -07:00
Stephen Hemminger
89bbb0a361 [PKT_SCHED] netem: fix loss
The following one line fix is needed to make loss function of
netem work right when doing loss on the local host.
Otherwise, higher layers just recover.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-29 18:33:12 -07:00
Patrick McHardy
18118cdbfd [NETFILTER]: ipt action: use xt_check_target for basic verification
The targets don't do the basic verification themselves anymore so
the ipt action needs to take care of it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-24 17:27:34 -07:00
Jamal Hadi Salim
83b950c89c [PKT_SCHED] act_police: Rename methods.
Rename policer specific _generic_ methods to be specific to
_act_police_

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09 22:25:46 -07:00
Patrick McHardy
1ae39a430b [NET_SCHED]: cls_u32: remove unnecessary NULL-ptr check
In both cases n can't be NULL without crashing anyway.

Coverity #78

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-23 01:16:48 -08:00
Adrian Bunk
afb5bb5744 [PKT_SCHED]: Let NET_CLS_ACT no longer depend on EXPERIMENTAL
This option should IMHO no longer depend on EXPERIMENTAL.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:44:24 -08:00
Stephen Hemminger
1533306186 [NET]: dev_put/dev_hold cleanup
Get rid of the old __dev_put macro that is just a hold over from pre 2.6
kernel.  And turn dev_hold into an inline instead of a macro.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:32:28 -08:00
Patrick McHardy
f38c39d6ce [PKT_SCHED]: Convert sch_red to a classful qdisc
Convert sch_red to a classful qdisc. All qdiscs that maintain accurate
backlog counters are eligible as child qdiscs. When a queue limit larger
than zero is given, a bfifo qdisc is used for backwards compatibility.
Current versions of tc enforce a limit larger than zero, other users
can avoid creating the default qdisc by using zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:20:44 -08:00
Patrick McHardy
f5539eb8ca [PKT_SCHED]: Keep backlog counter in sch_sfq
Keep backlog counter in SFQ qdisc to make it usable as child qdisc
with RED.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:01:38 -08:00
Patrick McHardy
053cfed75d [PKT_SCHED]: Restore TBF change semantic
When TBF was converted to a classful qdisc, the semantic of the limit
parameter was broken. On initilization an inner bfifo qdisc is created
for backwards compatibility, when changing parameters however the new
limit is ignored and the current child qdisc remains in place.

Always replace the child qdisc by the default bfifo when limit is above
zero, otherwise don't touch the inner qdisc. Current tc version enforce
a limit above zero, other users can avoid creating the inner qdisc by
using zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:01:21 -08:00
Patrick McHardy
cdc7f8e362 [PKT_SCHED]: Dump child qdisc handle in sch_{atm,dsmark}
A qdisc should set tcm_info to the child qdisc handle in its class
dump function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:01:06 -08:00
Patrick McHardy
6d037a26f0 [PKT_SCHED]: Qdisc drop operation is optional
The drop operation is optional and qdiscs must check if childs support it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:00:49 -08:00
Patrick McHardy
1c524830d0 [NETFILTER]: x_tables: pass registered match/target data to match/target functions
This allows to make decisions based on the revision (and address family
with a follow-up patch) at runtime.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:02:15 -08:00
Patrick McHardy
f6e57464df [NET_SCHED]: act_api: fix skb leak in error path
The skb is allocated by the function, so it needs to be freed instead
of trimmed on overrun.

Coverity #614

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12 20:39:36 -08:00
Patrick McHardy
ae82af54d7 [PKT_SCHED]: Handle SCTP/DCCP in sfq_hash
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 13:01:06 -08:00
Amnon Aaronsohn
dd914b4082 [PKT_SCHED] sch_prio: fix qdisc bands init
Currently when PRIO is configured to use N bands, it lets the packets be
directed to any of the bands 0..N-1. However, PRIO attaches a fifo qdisc
only to the bands that appear in the priomap; the rest of the N bands
remain with a noop qdisc attached. This patch changes PRIO's behavior so
that it attaches a fifo qdisc to all of the N bands.

Signed-off-by: Amnon Aaronsohn <bla@cs.huji.ac.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:24:26 -08:00
Patrick McHardy
dca80b962a [PKT_SCHED]: Change default clock source to gettimeofday
The default of using jiffies is very bad and results in
underutilization except with very low bandwidth.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13 14:36:55 -08:00
Harald Welte
2e4e6a17af [NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables
This monster-patch tries to do the best job for unifying the data
structures and backend interfaces for the three evil clones ip_tables,
ip6_tables and arp_tables.  In an ideal world we would never have
allowed this kind of copy+paste programming... but well, our world
isn't (yet?) ideal.

o introduce a new x_tables module
o {ip,arp,ip6}_tables depend on this x_tables module
o registration functions for tables, matches and targets are only
  wrappers around x_tables provided functions
o all matches/targets that are used from ip_tables and ip6_tables
  are now implemented as xt_FOOBAR.c files and provide module aliases
  to ipt_FOOBAR and ip6t_FOOBAR
o header files for xt_matches are in include/linux/netfilter/,
  include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers
  around the xt_FOOBAR.h headers

Based on this patchset we're going to further unify the code,
gradually getting rid of all the layer 3 specific assumptions.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-12 14:06:43 -08:00
Adrian Bunk
bb7e8c5a55 [PKT_SCHED] net/sched/Kconfig: fix typo in NET_EMATCH_META description
Noted by Matt LaPlante <webmaster@cyberdogtech.com>.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 16:40:30 -08:00
Evgeniy Polyakov
54608b7099 [PKT_SCHED] ematch: Remove bogus include.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 16:32:16 -08:00
Jamal Hadi Salim
29f1df6cc1 [PKT_SCHED]: Fix qdisc return code.
The mapping between TC_ACTION_SHOT and the qdisc return codes is better
suited to NET_XMIT_BYPASS so as not to confuse TCP

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:26 -08:00
Patrick McHardy
4bba392592 [PKT_SCHED]: Prefix tc actions with act_
Clean up the net/sched directory a bit by prefix all actions with act_.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:14 -08:00
Patrick McHardy
541673c859 [PKT_SCHED]: Fix memory leak when dumping in pedit action
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:12 -08:00
Patrick McHardy
31bd06eb33 [PKT_SCHED]: Remove some obsolete policer exports
Also make sure the legacy code is only built when CONFIG_NET_CLS_ACT
is not set.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:10 -08:00
Patrick McHardy
f43c5a0df3 [PKT_SCHED]: Convert tc action functions to single skb pointers
tcf_action_exec only gets a single skb pointer and doesn't own the skb,
but passes double skb pointers (to a local variable) to the action
functions. Change to use single skb pointers everywhere.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:08 -08:00
Patrick McHardy
538e43a4bd [PKT_SCHED]: Use USEC_PER_SEC
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:05 -08:00
Patrick McHardy
2941a48631 [NET]: Convert net/{ipv4,ipv6,sched} to netdev_priv
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:03 -08:00
Arnaldo Carvalho de Melo
14c850212e [INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:21 -08:00
Stephen Hemminger
c865e5d99e [PKT_SCHED] netem: packet corruption option
Here is a new feature for netem in 2.6.16. It adds the ability to
randomly corrupt packets with netem. A version was done by
Hagen Paul Pfeifer, but I redid it to handle the cases of backwards
compatibility with netlink interface and presence of hardware checksum
offload. It is useful for testing hardware offload in devices.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:05 -08:00
David S. Miller
2edc2689f8 [PKT_SCHED]: Disable debug tracing logs by default in packet action API.
Noticed by Andi Kleen.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-13 22:59:50 -08:00
Andrea Bittau
aa8751667d [PKT_SCHED]: sch_netem: correctly order packets to be sent simultaneously
If two packets were queued to be sent at the same time in the future,
their order would be reversed.  This would occur because the queue is
traversed back to front, and a position is found by checking whether
the new packet needs to be sent before the packet being examined.  If
the new packet is to be sent at the same time of a previous packet, it
would end up before the old packet in the queue.  This patch places
packets in the correct order when they are queued to be sent at a same
time in the future.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-20 13:41:05 -08:00
Roman Zippel
05b8b0fafd [NET]: Sanitize NET_SCHED protection in /net/sched/Kconfig
On Thu, 17 Nov 2005, David Gmez wrote:

> I found out that if i select NET_CLS_ROUTE4, save my changes and exit
> menuconfig, execute again make menuconfig and go to QoS options, then the new
> available options are visible. So menuconfig has some problem refreshing
> contents :?

No, they were there before too, but you have to go up one level to see 
them.

It's better in 2.6.15-rc1-git5, but the menu structure is still a little 
messed up, the patch below properly indents all menu entries.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-17 15:22:39 -08:00
Jesper Juhl
a51482bde2 [NET]: kfree cleanup
From: Jesper Juhl <jesper.juhl@gmail.com>

This is the net/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in net/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-11-08 09:41:34 -08:00
Thomas Graf
b541ca2c5a [PKT_SCHED]: Correctly handle empty ematch trees
Fixes an invalid memory reference when the basic classifier
is used without any ematches but just actions.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:39:17 -08:00
Arnaldo Carvalho de Melo
2d43f1128a Merge branch 'red' of 84.73.165.173:/home/tgr/repos/net-2.6 2005-11-05 22:30:29 -02:00
Stephen Hemminger
eb229c4cdc [NETEM]: Add version string
Add a version string to help support issues.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 20:59:21 -02:00
Stephen Hemminger
300ce174eb [NETEM]: Support time based reordering
Change netem to support packets getting reordered because of variations in
delay. Introduce a special case version of FIFO that queues packets in order
based on the netem delay.

Since netem is classful, those users that don't want jitter based reordering
can just insert a pfifo instead of the default.

This required changes to generic skbuff code to allow finer grain manipulation
of sk_buff_head.  Insertion into the middle and reverse walk.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 20:56:41 -02:00
Thomas Graf
bdc450a0bb [PKT_SCHED]: (G)RED: Introduce hard dropping
Introduces a new flag TC_RED_HARDDROP which specifies that if ECN
marking is enabled packets should still be dropped once the
average queue length exceeds the maximum threshold.

This _may_ help to avoid global synchronisation during small
bursts of peers advertising but not caring about ECN. Use this
option very carefully, it does more harm than good if
(qth_max - qth_min) does not cover at least two average burst
cycles.

The difference to the current behaviour, in which we'd run into
the hard queue limit, is that due to the low pass filter of RED
short bursts are less likely to cause a global synchronisation.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:29 +01:00
Thomas Graf
b38c7eef7e [PKT_SCHED]: GRED: Support ECN marking
Adds a new u8 flags in a unused padding area of the netlink
message. Adds ECN marking support to be used instead of dropping
packets immediately.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:29 +01:00
Thomas Graf
d8f64e1960 [PKT_SCHED]: GRED: Fix restart of idle period in WRED mode upon dequeue and drop
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
1e4dfaf9b9 [PKT_SCHED]: GRED: Cleanup and remove unnecessary code
Removes unnecessary includes, initializers, and simplifies
the code a bit.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
6214e653cc [PKT_SCHED]: GRED: Remove auto-creation of default VQ
Since we are no longer depending on the default VQ to be always
allocated we can leave it up to the user to actually create it.
This gives the user the ability to leave it out on purpose and
enqueue packets directly to the device without applying the RED
algorithm.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
7051703b99 [PKT_SCHED]: GRED: Dont abuse default VQ for equalizing
Introduces a new red parameter set for use in equalize mode,
although only the qavg variable and the idle period marker are
being used for now this makes it possible to allow a separate
parameter set to be used for equalize later on.

The use of this separate parameter set fixes a bogus start of
an idle period in gred_drop() which did start an idle period
on the default VQ even if equalize mode was disabled.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
4a591834cf [PKT_SCHED]: GRED: Remove initd flag
The case when the default VQ is not set up yet is already handled
in a less error prone way.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
18e3fb84e6 [PKT_SCHED]: GRED: Improve error handling and messages
Try to enqueue packets if we cannot associate it with a VQ, this
basically means that the default VQ has not been set up yet.

We must check if the VQ still exists while requeueing, the VQ
might have been changed between dequeue and the requeue of the
underlying qdisc.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
716a1b40b0 [PKT_SCHED]: GRED: Introduce tc_index_to_dp()
Adds a transformation function returning the DP index for a
given skb according to its tc_index.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
edf7a7b1f0 [PKT_SCHED]: GRED: Use generic queue management interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
c3b553cdaf [PKT_SCHED]: GRED: Report congestion related drops as NET_XMIT_CN
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
301d063c29 [PKT_SCHED]: GRED: Do not reset statistics in gred_reset/gred_change
Qdiscs are not supposed to reset statistics in reset() and while
changing parameters. My argumentation is that if the user wants
the counters to be reset he can simply remove and readd the
qdiscs, that's what most users do anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
22b33429ab [PKT_SCHED]: GRED: Use new generic red interface
Simplifies code a lot by separating the red algorithm and the
queueing logic. We now differentiate between probability marks
and forced marks but sum them together again to not break
backwards compatibility.

This brings GRED back to the level of RED and improves the
accuracy of the averge queue length calculations when stab
suggests a zero shift.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
f62d6b936d [PKT_SCHED]: GRED: Use central VQ change procedure
Introduces a function gred_change_vq() acting as a central point
to change VQ parameters. Fixes priority inheritance in rio mode
when the default DP equals 0. Adds proper locking during changes.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
a8aaa9958e [PKT_SCHED]: GRED: Report out-of-bound DPs as illegal
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:26 +01:00
Thomas Graf
6639607ed9 [PKT_SCHED]: GRED: Use a central table definition change procedure
Introduces a function gred_change_table_def() acting as a central
point to change the table definition.

Adds missing validations for table definition: MAX_DPs > DPs > 0
and def_DP < DPs thus fixing possible invalid memory reference
oopses. Only root could do it but having a typo crashing the
machine is a bit hard.

Adds missing locking while changing the table definition, the
operation of changing the number of DPs and removing shadowed VQs
may not be interrupted by a dequeue.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:26 +01:00
Thomas Graf
e06368221c [PKT_SCHED]: GRED: Dump table definition
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:26 +01:00
Thomas Graf
05f1cc01b4 [PKT_SCHED]: GRED: Cleanup dumping
Avoids the allocation of a buffer by appending the VQs directly
to the skb and simplifies the code by using the appropriate
message construction macros.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:26 +01:00
Thomas Graf
d6fd4e9667 [PKT_SCHED]: GRED: Transform grio to GRED_RIO_MODE
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
dea3f62852 [PKT_SCHED]: GRED: Cleanup equalize flag and add new WRED mode detection
Introduces a flags variable using bitops and transforms eqp to use
it. Converts the conditions of the form (wred && rio) to (wred)
since wred can only be enabled in rio mode anyway.

The patch also improves WRED mode detection. The current behaviour
does not allow WRED mode to be turned off again without removing
the whole qdisc first. The new algorithm checks each VQ against
each other looking for equal priorities every time a VQ is changed
or added. The performance is poor, O(n**2), but it's used only
during administrative tasks and the number of VQs is strictly
limited.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
dba051f36a [PKT_SCHED]: RED: Cleanup and remove unnecessary code
Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary includes,
initializers, and simplifies the code a bit. Removes Jamal's
obsolete email addresses upon his own request.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
6a1b63d467 [PKT_SCHED]: RED: Dont start idle periods while already idling
We should not interrupt and restart an idle period while idling already.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
9e178ff27c [PKT_SCHED]: RED: Use generic queue management interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
6b31b28a44 [PKT_SCHED]: RED: Use new generic red interface
Simplifies code a lot by separating the red algorithm and the
queueing logic. We now differentiate between probability marks
and forced marks but sum them together again to not break
backwards compatibility.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Stephen Hemminger
07aaa11540 [NETEM]: use PSCHED_LESS
Convert netem to use PSCHED_LESS and warn if requeue fails.
With some of the psched clock sources, the subtraction doesn't
work always work right without wrapping.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 17:03:46 -02:00
Thomas Graf
52ab4ac258 [PKT_SCHED]: Rework QoS and/or fair queueing configuration
Make "QoS and/or fair queueing" have its own menu, it's too big to be
inlined into "Network options". Remove the obsolete NET_QOS option.
Automatically select NET_CLS if needed. Do the same for NET_ESTIMATOR
but allow it to be selected manually for statistical purposes. Add
comments to separate queueing from classification. Fix dependencies
and ordering of classifiers. Improve descriptions/help texts and
remove outdated pieces.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-03 02:29:06 -02:00
Andi Kleen
34cb711ba9 [NET]: Disable NET_SCH_CLK_CPU for SMP x86 hosts
Opterons with frequency scaling have fully unsynchronized TSCs
running at different frequencies, so using TSCs there is not a good idea. 
Also some other x86 boxes have this problem. gettimeofday should be good 
enough, so just disable it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13 14:41:44 -07:00
Eric Dumazet
81c3d5470e [INET]: speedup inet (tcp/dccp) lookups
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &head->chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))
     ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie))   &&  \
     ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))                 &&  \
     (inet_sk(__sk)->daddr          == (__saddr))   &&  \
     (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))


- Adds a prefetch(head->chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&head->lock);

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:13:38 -07:00
Ingo Molnar
8d06afab73 [PATCH] timer initialization cleanup: DEFINE_TIMER
Clean up timer initialization by introducing DEFINE_TIMER a'la
DEFINE_SPINLOCK.  Build and boot-tested on x86.  A similar patch has been
been in the -RT tree for some time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:48 -07:00
David S. Miller
29cb9f9c55 [LIB]: Make TEXTSEARCH_BM plain tristate like the others
And select it when the relevant modules are enabled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:11:11 -07:00
Patrick McHardy
ac6d439d20 [NETLINK]: Convert netlink users to use group numbers instead of bitmasks
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:00:54 -07:00
Denis Vlasenko
0a242efc4f [NET]: Deinline netif_carrier_{on,off}().
# grep -r 'netif_carrier_o[nf]' linux-2.6.12 | wc -l
246

# size vmlinux.org vmlinux.carrier
text    data     bss     dec     hex filename
4339634 1054414  259296 5653344  564360 vmlinux.org
4337710 1054414  259296 5651420  563bdc vmlinux.carrier

And this ain't an allyesconfig kernel!

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:57:08 -07:00
Patrick McHardy
abc3bc5804 [NET]: Kill skb->tc_classid
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:31:18 -07:00
Thomas Graf
0fbbeb1ba4 [PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()
qdisc_create_dflt() is missing to destroy the newly allocated
default qdisc if the initialization fails resulting in leaks
of all kinds. The only caller in mainline which may trigger
this bug is sch_tbf.c in tbf_create_dflt_qdisc().

Note: qdisc_create_dflt() doesn't fulfill the official locking
      requirements of qdisc_destroy() but since the qdisc could
      never be seen by the outside world this doesn't matter
      and it can stay as-is until the locking of pkt_sched
      is cleaned up.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:12:44 -07:00
Patrick McHardy
7686ee1ad9 [EMATCH]: Remove feature ifdefs in meta ematch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:44:23 -07:00
David S. Miller
261688d01e [PKT_SCHED]: em_meta: Kill TCF_META_ID_{INDEV,SECURITY,TCVERDICT}
More unusable TCF_META_* match types that need to get eliminated
before 2.6.13 goes out the door.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22 14:43:52 -07:00
David S. Miller
28e212fb36 [PKT_SCHED]: Kill TCF_META_ID_REALDEV from meta ematch.
It won't exist any longer when we shrink the SKB in 2.6.14,
and we should kill this off before anyone in userspace starts
using it.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22 11:47:25 -07:00
David S. Miller
3f1c81ff10 [EMATCH]: Kill TCF_META_ID_TCCLASSID reference from meta ematch as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 17:10:55 -07:00
Thomas Graf
452f299da3 [PKT_SCHED]: Reduce branch mispredictions in pfifo_fast_dequeue
The current call to __qdisc_dequeue_head leads to a branch
misprediction for every loop iteration, the fact that the
most common priority is 2 makes this even worse.  This issue
has been brought up by Eric Dumazet <dada1@cosmosbay.com>
but unlike his solution which was to manually unroll the loop,
this approach preserves the possibility to increase the number
of bands at compile time. 

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:30:53 -07:00
Thomas Graf
d7c7ed4dbc [PKT_SCHED]: Remove debugging leftover from textsearch ematch
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:29:49 -07:00
Sam Ravnborg
6a2e9b738c [NET]: move config options out to individual protocols
Move the protocol specific config options out to the specific protocols.
With this change net/Kconfig now starts to become readable and serve as a
good basis for further re-structuring.

The menu structure is left almost intact, except that indention is
fixed in most cases. Most visible are the INET changes where several
"depends on INET" are replaced with a single ifdef INET / endif pair.

Several new files were created to accomplish this change - they are
small but serve the purpose that config options are now distributed
out where they belongs.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 21:13:56 -07:00
David S. Miller
b03efcfb21 [NET]: Transform skb_queue_len() binary tests into skb_queue_empty()
This is part of the grand scheme to eliminate the qlen
member of skb_queue_head, and subsequently remove the
'list' member of sk_buff.

Most users of skb_queue_len() want to know if the queue is
empty or not, and that's trivially done with skb_queue_empty()
which doesn't use the skb_queue_head->qlen member and instead
uses the queue list emptyness as the test.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 14:57:23 -07:00
Thomas Graf
63d886c96b [PKT_SCHED]: Blackhole queueing discipline
Useful in combination with classful qdiscs to drop or
temporary disable certain flows, e.g. one could block
specific ds flows with dsmark.

Unlike the noop qdisc it can be controlled by the user and
statistic accounting is done.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:29:16 -07:00
Thomas Graf
023e09a767 [PKT_SCHED]: Report rate estimator configuration errors during qdisc allocation
Current behaviour is to not report an error if a rate
estimator is created together with a qdisc and the
configuration of the rate estimator is bogus. This leads
to unexpected behaviour because the user is not notified.

New behaviour is to report the error and let the whole
qdisc creation operation fail so the user is able to fix
his mistake.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:15:53 -07:00
Thomas Graf
3d54b82fdf [PKT_SCHED]: Cleanup qdisc creation and alignment macros
Adds qdisc_alloc() to share code between qdisc_create()
and qdisc_create_dflt(). Hides the qdisc alignment behind
macros and makes use of them.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:15:09 -07:00
Thomas Graf
e176fe8954 [NET]: Remove unused security member in sk_buff
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:12:44 -07:00
Patrick McHardy
8a47077a0b [NETLINK]: Missing padding fields in dumped structures
Plug holes with padding fields and initialized them to zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 12:56:45 -07:00
Patrick McHardy
9ef1d4c7c7 [NETLINK]: Missing initializations in dumped data
Mostly missing initialization of padding fields of 1 or 2 bytes length,
two instances of uninitialized nlmsgerr->msg of 16 bytes length.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 12:55:30 -07:00
David S. Miller
f7704347a7 [PKT_SCHED]: Make TEXTSEARCH* options only selected.
Do not present these confusing new options to the user
unless he picked some facility that makes use of it,
such as NET_EMATCH_TEXT.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24 17:39:03 -07:00
David S. Miller
f2d368fa3e [PKT_SCHED]: Make NET_EMATCH_TEXT select TEXTSEARCh
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 23:55:41 -07:00
Thomas Graf
d675c989ed [PKT_SCHED]: Packet classification based on textsearch (ematch)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 21:00:58 -07:00
Thomas Graf
94df109a8c [PKT_SCHED]: noop/noqueue qdisc style cleanups
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:59:08 -07:00
Thomas Graf
f87a9c3ddf [PKT_SCHED]: Cleanup pfifo_fast qdisc and remove unnecessary code
Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary initializers,
and simplifies the code a bit.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:53 -07:00
Thomas Graf
321090e7a4 [PKT_SCHED]: Add and use prio2list() in the pfifo_fast qdisc
prio2list() returns the relevant sk_buff_head for the
band specified by the priority for a given skb.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:35 -07:00
Thomas Graf
821d24ae74 [PKT_SCHED]: Transform pfifo_fast to use generic queue management interface
Gives pfifo_fast a byte based backlog.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:15 -07:00
Thomas Graf
6fc8e84f4c [PKT_SCHED]: Cleanup fifo qdisc and remove unnecessary code
Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary includes,
initializers, and simplifies the code a bit.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:00 -07:00
Thomas Graf
aaae3013d1 [PKT_SCHED]: Transform fifo qdisc to use generic queue management interface
The simplicity of the fifo qdisc allows several qdisc operations to be
redirected to the relevant queue management function directly. Saves
a lot of code lines and gives the pfifo a byte based backlog.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:57:42 -07:00
Jamal Hadi Salim
e431b8c004 [NETLINK]: Explicit typing
This patch converts "unsigned flags" to use more explict types like u16
instead and incrementally introduces NLMSG_NEW().
 
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:55:31 -07:00
Thomas Graf
af0d114176 [PKT_SCHED]: Logic simplifications and codingstyle/whitespace cleanups
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:53:29 -07:00
Thomas Graf
02f23f095f [PKT_SCHED]: Make dsmark use the new dumping macros
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:53:12 -07:00
Thomas Graf
758cc43c6d [PKT_SCHED]: Fix dsmark to apply changes consistent
Fixes dsmark to do all configuration sanity checks first and
only apply the changes if all of them can be applied without
any errors. Also fixes the weak sanity checks for DSMARK_VALUE
and DSMASK_MASK.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:52:54 -07:00
Ralf Baechle
979b6c135f [NET]: Move the netdev list to vger.kernel.org.
From: Ralf Baechle <ralf@linux-mips.org>

There are archives of the old list at http://oss.sgi.com/archives/netdev

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 14:30:40 -07:00
Thomas Graf
98e5640552 [PKT_SCHED]: Fix numeric comparison in meta ematch
This patch is brought to you by the department of applied stupidity.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 15:11:19 -07:00
Thomas Graf
e1e284a4bd [PKT_SCHED]: Dump classification result for basic classifier
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 15:11:02 -07:00
Thomas Graf
4890062960 [PKT_SCHED]: Allow socket attributes to be matched on via meta ematch
Adds meta collectors for all socket attributes that make sense
to be filtered upon. Some of them are only useful for debugging
but having them doesn't hurt.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 15:10:48 -07:00
Thomas Graf
b824979aec [PKT_SCHED]: Fix typo in NET_EMATCH_STACK help text
Spotted by Geert Uytterhoeven <geert@linux-m68k.org>.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 15:10:22 -07:00
Thomas Graf
08e9cd1fc5 [PKT_SCHED]: Disable dsmark debugging messages by default
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31 15:17:28 -07:00
Thomas Graf
486b53e59c [PKT_SCHED]: make dsmark try using pfifo instead of noop while grafting
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31 15:16:52 -07:00
Thomas Graf
0451eb074e [PKT_SCHED]: Fix dsmark to count ignored indices while walking
Unused indices which are ignored while walking must still
be counted to avoid dumping the same index twice.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31 15:15:58 -07:00
Stephen Hemminger
0dca51d362 [PKT_SCHED] netem: allow random reordering (with fix)
Here is a fixed up version of the reorder feature of netem.
It is the same as the earlier patch plus with the bugfix from Julio merged in.
Has expected backwards compatibility behaviour.

Go ahead and merge this one, the TCP strangeness I was seeing was due
to the reordering bug, and previous version of TSO patch.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:55:48 -07:00
Stephen Hemminger
0f9f32ac65 [PKT_SCHED] netem: use only inner qdisc -- no private skbuff queue
Netem works better if there if packets are just queued in the inner discipline
rather than having a separate delayed queue. Change to use the dequeue/requeue
to peek like TBF does.

By doing this potential qlen problems with the old method are avoided. The problems
happened when the netem_run that moved packets from the inner discipline to the nested
discipline failed (because inner queue was full). This happened in dequeue, so the
effective qlen of the netem would be decreased (because of the drop), but there was
no way to keep the outer qdisc (caller of netem dequeue) in sync.

The problem window is still there since this patch doesn't address the issue of
requeue failing in netem_dequeue, but that shouldn't happen since the sequence dequeue/requeue
should always work.  Long term correct fix is to implement qdisc->peek in all the qdisc's
to allow for this (needed by several other qdisc's as well).

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:55:01 -07:00
Stephen Hemminger
0afb51e728 [PKT_SCHED]: netem: reinsert for duplication
Handle duplication of packets in netem by re-inserting at top of qdisc tree.
This avoid problems with qlen accounting with nested qdisc. This recursion
requires no additional locking but will potentially increase stack depth.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26 12:53:49 -07:00
J Hadi Salim
14d50e78f9 [PKT_SCHED]: Action repeat
Long standing bug.
Policy to repeat an action never worked.

Signed-off-by: J Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 16:29:13 -07:00
Stephen Hemminger
d5d75cd6b1 [PKT_SCHED]: netetm: adjust parent qlen when duplicating
Fix qlen underrun when doing duplication with netem. If netem is used
as leaf discipline, then the parent needs to be tweaked when packets
are duplicated.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 16:24:57 -07:00
Stephen Hemminger
771018e76a [PKT_SCHED]: netetm: make qdisc friendly to outer disciplines
Netem currently dumps packets into the queue when timer expires. This
patch makes work by self-clocking (more like TBF).  It fixes a bug
when 0 delay is requested (only doing loss or duplication).

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 16:24:32 -07:00
Stephen Hemminger
8cbe1d46d6 [PKT_SCHED]: netetm: trap infinite loop hange on qlen underflow
Due to bugs in netem (fixed by later patches), it is possible to get qdisc
qlen to go negative. If this happens the CPU ends up spinning forever
in qdisc_run(). So add a BUG_ON() to trap it.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 16:24:03 -07:00
Tommy S. Christensen
cacaddf57e [NET]: Disable queueing when carrier is lost.
Some network drivers call netif_stop_queue() when detecting loss of
carrier. This leads to packets being queued up at the qdisc level for
an unbound period of time. In order to prevent this effect, the core
networking stack will now cease to queue packets for any device, that
is operationally down (i.e. the queue is flushed and disabled).

Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 16:18:52 -07:00
Asim Shankar
033d899904 [PKT_SCHED]: HTB: Drop packet when direct queue is full
htb_enqueue(): Free skb and return NET_XMIT_DROP if a packet is
destined for the direct_queue but the direct_queue is full. (Before
this: erroneously returned NET_XMIT_SUCCESS even though the packet was
not enqueued)

Signed-off-by: Asim Shankar <asimshankar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 14:39:33 -07:00
Lucas Correia Villa Real
20cc6befa2 [PKT_SCHED]: fix typo on Kconfig
This is a trivial fix for a typo on Kconfig, where the Generic Random Early 
Detection algorithm is abbreviated as RED instead of GRED.

Signed-off-by: Lucas Correia Villa Real <lucasvr@gobolinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 14:34:20 -07:00
David S. Miller
cbdbf00aaf [PKT_SCHED]: Eliminate unnecessary includes in simple.c
Noted by Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-25 12:15:01 -07:00
Thomas Graf
c5c13fafd6 [PKT_SCHED]: improve hashing performance of cls_fw
Calculate hashtable size to fit into a page instead of a hardcoded
256 buckets hash table. Results in a 1024 buckets hashtable on
most systems.

Replace old naive extract-8-lsb-bits algorithm with a better
algorithm xor'ing 3 or 4 bit fields at the size of the hashtable
array index in order to improve distribution if the majority of
the lower bits are unused while keeping zero collision behaviour
for the most common use case.

Thanks to Wang Jian <lark@linux.net.cn> for bringing this issue
to attention and to Eran Mann <emann@mrv.com> for the initial
idea for this new algorithm.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24 20:19:54 -07:00
Jamal Hadi Salim
db75307979 [PKT_SCHED]: Introduce simple actions.
And provide an example simply action in order to
demonstrate usage.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24 20:10:16 -07:00