The queue handlers registered by ip[6]_queue.ko at initialization should
not be unregistered according to requests from userland program
using nfnetlink_queue. If we allow that, there is no way to register
the handlers of built-in ip[6]_queue again.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adjust structure size and don't expect pointers passed in from
userspace to be valid. Also replace an enum in an ABI structure
by a fixed size type.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick NcHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Along comes... xt_u32, a revamped ipt_u32 from POM-NG,
Plus:
* 2007-06-02: added ipv6 support
* 2007-06-05: uses kmalloc for the big buffer
* 2007-06-05: added inversion
* 2007-06-20: use skb_copy_bits() and get rid of the big buffer
and lock (suggested by Pablo Neira Ayuso)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the return type of target checkentry functions to boolean.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the return type of match functions to boolean
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the return type of match functions to boolean
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the "hotdrop" variables to boolean
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This explains the allowed upper protocol numbers. IP6T_F_NOPROTO was
introduced to use 0 as Hop-by-Hop option header, not wildcard. But that
seemed to be forgotten. 0 has been used as wildcard since 2002-08-23.
Signed-off-by: Yasuyuki Kozakai <yasuyuki@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Through the IrDA netlink set mode command, we switch to IrDA monitor
mode, where one IrLAP instance receives all the packets on the media,
without ever responding to them.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
First IrDA configuration netlink layer implementation.
Currently, we only support the set/get mode commands.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new syscall TUNSETGROUP for group ownership setting of tap
devices. The user now is allowed to send packages if either his euid or
his egid matches the one specified via tunctl (via -u or -g
respecitvely). If both, gid and uid, are set via tunctl, both have to
match.
Signed-off-by: Guido Guenther <agx@sigxcpu.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a reference to an existing address is increased or decreased without
hitting zero, the address count is incorrectly adjusted.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the new sch_rr qdisc for multiqueue network device support. Allow
sch_prio and sch_rr to be compiled with or without multiqueue hardware
support.
sch_rr is part of sch_prio, and is referenced from MODULE_ALIAS. This
was done since sch_prio and sch_rr only differ in their dequeue
routine.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the multiqueue hardware device support API to the core network
stack. Allow drivers to allocate multiple queues and manage them at
the netdev level if they choose to do so.
Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add struct sockaddr_pppol2tp to carry L2TP-specific address
information for the PPPoX (PPPoL2TP) socket. Unfortunately we can't
use the union inside struct sockaddr_pppox because the L2TP-specific
data is larger than the current size of the union and we must preserve
the size of struct sockaddr_pppox for binary compatibility.
Also add a PPPIOCGL2TPSTATS ioctl to allow userspace to obtain
L2TP counters and state from the kernel.
Add new if_pppol2tp.h header.
[ Modified to use aligned_u64 in statistics structure -DaveM ]
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new UDP_ENCAP_L2TPINUDP encapsulation type for UDP
sockets. When a UDP socket's encap_type is UDP_ENCAP_L2TPINUDP, the
skb is delivered to a function pointed to by the udp_sock's
encap_rcv funcptr. If the skb isn't wanted by L2TP, it returns >0, which
causes it to be passed through to UDP.
Include padding to put the new encap_rcv field on a 4-byte boundary.
Previously, the only user of UDP encap sockets was ESP, so when
CONFIG_XFRM was not defined, some of the encap code was compiled
out. This patch changes that. As a result, udp_encap_rcv() will
now do a little more work when CONFIG_XFRM is not defined.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for configuring secondary unicast addresses on network
devices. To support this devices capable of filtering multiple
unicast addresses need to change their set_multicast_list function
to configure unicast filters as well and assign it to dev->set_rx_mode
instead of dev->set_multicast_list. Other devices are put into promiscous
mode when secondary unicast addresses are present.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use generic net_device address lists for multicast list handling.
Some defines are used to keep drivers working.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce struct dev_addr_list and list maintenance functions
based on dev_mc_list and the related functions. This will be
used by follow-up patches for both multicast and secondary
unicast addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.
This patch:
* adds NETIF_F_IPV6_CSUM for those devices
* fixes bnx2 and tg3 devices that need it
* add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
* fixes assumptions about NETIF_F_ALL_CSUM in nat
* adjusts bridge union of checksumming computation
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes MIPv6 loadable module named "mip6".
Here is a modprobe.conf(5) example to load it automatically
when user application uses XFRM state for MIPv6:
alias xfrm-type-10-43 mip6
alias xfrm-type-10-60 mip6
Some MIPv6 feature is not included by this modular, however,
it should not be affected to other features like either IPsec
or IPv6 with and without the patch.
We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt
separately for future work.
Loadable features:
* MH receiving check (to send ICMP error back)
* RO header parsing and building (i.e. RH2 and HAO in DSTOPTS)
* XFRM policy/state database handling for RO
These are NOT covered as loadable:
* Home Address flags and its rule on source address selection
* XFRM sub policy (depends on its own kernel option)
* XFRM functions to receive RO as IPv6 extension header
* MH sending/receiving through raw socket if user application
opens it (since raw socket allows to do so)
* RH2 sending as ancillary data
* RH2 operation with setsockopt(2)
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.
The attribute looks like this:
struct {
[ compat contents ]
struct rtattr {
.rta_len = total size,
.rta_type = type,
} rta;
struct old_structure struct;
[ nested top-level attribute ]
struct rtattr {
.rta_len = nest size,
.rta_type = type,
} nest_attr;
[ optional 0 .. n nested attributes ]
struct rtattr {
.rta_len = private attribute len,
.rta_type = private attribute typ,
} nested_attr;
struct nested_data data;
};
Since both userspace and kernel deal correctly with attributes that are
larger than expected old versions will just parse the compat part and
ignore the rest.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently NAT (and others) that want to modify cloned skbs copy them,
even if in the vast majority of cases its not necessary because the
skb is a clone made by TCP and the portion NAT wants to modify is
actually writable because TCP release the header reference before
cloning.
The problem is that there is no clean way for NAT to find out how
long the writable header area is, so this patch introduces skb->hdr_len
to hold this length. When a headerless skb is cloned skb->hdr_len
is set to the current headroom, for regular clones it is copied from
the original. A new function skb_clone_writable(skb, len) returns
whether the skb is writable up to len bytes from skb->data. To avoid
enlarging the skb the mac_len field is reduced to 16 bit and the
new hdr_len field is put in the remaining 16 bit.
I've done a few rough benchmarks of NAT (not with this exact patch,
but a very similar one). As expected it saves huge amounts of system
time in case of sendfile, bringing it down to basically the same
amount as without NAT, with sendmsg it only helps on loopback,
probably because of the large MTU.
Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and
without NAT:
- sendfile eth0, no NAT: sys 0m0.388s
- sendfile eth0, NAT: sys 0m1.835s
- sendfile eth0: NAT + path: sys 0m0.370s (~ -80%)
- sendfile lo, no NAT: sys 0m0.258s
- sendfile lo, NAT: sys 0m2.609s
- sendfile lo, NAT + patch: sys 0m0.260s (~ -90%)
- sendmsg eth0, no NAT: sys 0m2.508s
- sendmsg eth0, NAT: sys 0m2.539s
- sendmsg eth0, NAT + patch: sys 0m2.445s (no change)
- sendmsg lo, no NAT: sys 0m2.151s
- sendmsg lo, NAT: sys 0m3.557s
- sendmsg lo, NAT + patch: sys 0m2.159s (~ -40%)
I expect other users can see a similar performance improvement,
packet mangling iptables targets, ipip and ip_gre come to mind ..
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides a reusable time difference function which returns the difference in
microseconds, as often used in the DCCP code.
Commiter note: renamed ktime_delta to ktime_us_delta and put it in ktime.h.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Keep track of the number of configured ingress/egress QoS mappings to
avoid iteration while calculating the netlink attribute size.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb->priority has only 32 bits and even VLAN uses 32 bit values in its API.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rtnetlink API for creating, changing and deleting software devices.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It hasn't "summed" anything in over 7 years, and it's
just a straight mempcy ala skb_copy_to_linear_data()
so just get rid of it.
Signed-off-by: David S. Miller <davem@davemloft.net>
Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.
This makes code about 300 lines smaller:
The first version of this patch made the helper functions static inline
in the seq_file.h header. This patch moves them to the fs/seq_file.c as
Andrew proposed. The vmlinux .text section sizes are as follows:
2.6.22-rc1-mm1: 0x001794d5
with the previous version: 0x00179505
with this patch: 0x00179135
The config file used was make allnoconfig with the "y" inclusion of all
the possible options to make the files modified by the patch compile plus
drivers I have on the test node.
This patch:
Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a hybrid version of the patch to add the LZO1X compression
algorithm to the kernel. Nitin and myself have merged the best parts of
the various patches to form this version which we're both happy with (and
are jointly signing off).
The performance of this version is equivalent to the original minilzo code
it was based on. Bytecode comparisons have also been made on ARM, i386 and
x86_64 with favourable results.
There are several users of LZO lined up including jffs2, crypto and reiser4
since its much faster than zlib.
Signed-off-by: Nitin Gupta <nitingupta910@gmail.com>
Signed-off-by: Richard Purdie <rpurdie@openedhand.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
mmc: at91_mci: fix hanging and rework to match flowcharts
mmc: at91_mci typo
sdhci: Fix "Unexpected interrupt" handling
mmc: fix silly copy-and-paste error
mmc: move layer init and workqueue to core file
mmc: refactor host class handling
mmc: refactor bus operations
sdhci: add ene controller id
mmc: bounce requests for simple hosts
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (32 commits)
[libata] sata_mv: print out additional chip info during probe
[libata] Use ATA_UDMAx standard masks when filling driver's udma_mask info
[libata] AHCI: Add support for Marvell AHCI-like chips (initially 6145)
[libata] Clean up driver udma_mask initializers
libata: Support chips with 64K PRD quirk
Add a PCI ID for santa rosa's PATA controller.
sata_sil24: sil24_interrupt() micro-optimisation
Add irq_flags to struct pata_platform_info
sata_promise: cleanups
[libata] pata_ixp4xx: kill unused var
ata_piix: fix pio/mwdma programming
[libata] ahci: minor internal cleanups
[ATA] Add named constant for ATAPI command DEVICE RESET
[libata] sata_sx4, sata_via: minor documentation updates
[libata] ahci: minor internal cleanups
[libata] ahci: Factor out SATA port init into a separate function
[libata] pata_sil680: minor cleanups from benh
[libata] sata_sx4: named constant cleanup
[libata] pata_ixp4xx: convert to new EH
[libata] pdc_adma: Reorder initializers with a couple structs
...
* 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block:
pipe: add documentation and comments
pipe: change the ->pin() operation to ->confirm()
Remove remnants of sendfile()
xip sendfile removal
splice: completely document external interface with kerneldoc
sendfile: remove bad_sendfile() from bad_file_ops
shmem: convert to using splice instead of sendfile()
relay: use splice_to_pipe() instead of open-coding the pipe loop
pipe: allow passing around of ops private pointer
splice: divorce the splice structure/function definitions from the pipe header
splice: relay support
sendfile: convert nfsd to splice_direct_to_actor()
sendfile: convert nfs to using splice_read()
loop: convert to using splice_direct_to_actor() instead of sendfile()
splice: add void cookie to the actor data
sendfile: kill generic_file_sendfile()
sendfile: remove .sendfile from filesystems that use generic_file_sendfile()
sys_sendfile: switch to using ->splice_read, if available
vmsplice: add vmsplice-to-user support
splice: abstract out actor data
* 'trivial-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block:
Documentation/block/barrier.txt is not in sync with the actual code: - blk_queue_ordered() no longer has a gfp_mask parameter - blk_queue_ordered_locked() no longer exists - sd_prepare_flush() looks slightly different
Use list_for_each_entry() instead of list_for_each() in the block device
Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
block/Kconfig already has its own "menuconfig" so remove these
Use menuconfigs instead of menus, so the whole menu can be disabled at once
cfq-iosched: fix async queue behaviour
unexport bio_{,un}map_user
Remove legacy CDROM drivers
[PATCH] fix request->cmd == INT cases
cciss: add new controller support for P700m
[PATCH] Remove acsi.c
[BLOCK] drop unnecessary bvec rewinding from flush_dry_bio_endio
[PATCH] cdrom_sysctl_info fix
blk_hw_contig_segment(): bad segment size checks
[TRIVIAL PATCH] Kill blk_congestion_wait() stub for !CONFIG_BLOCK
With the cfq_queue hash removal, we inadvertently got rid of the
async queue sharing. This was not intentional, in fact CFQ purposely
shares the async queue per priority level to get good merging for
async writes.
So put some logic in cfq_get_queue() to track the shared queues.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Common power driver for PDAs and phones with one or two external
power supplies (AC/USB) connected to main and backup batteries,
and optional builtin charger.
It's used to stop logic duplication through different embedded
devices. So, power supply *logic* is here. pda_power register
power supplies, and will take care about notifying batteries
about power changes through external power interface.
Currently, power consumption legal limits (including USB power
consumption) should be handled by platform code, inside set_charge
function.
Signed-off-by: Anton Vorontsov <cbou@mail.ru>
Signed-off-by: Roman Moravcik <roman.moravcik@gmail.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This class is result of "external power" and "battery" classes merge,
as suggested by David Woodhouse. He also implemented uevent support.
Here how userspace seeing it now:
# ls /sys/class/power\ supply/
ac main-battery usb
# cat /sys/class/power\ supply/ac/type
AC
# cat /sys/class/power\ supply/usb/type
USB
# cat /sys/class/power\ supply/main-battery/type
Battery
# cat /sys/class/power\ supply/ac/online
1
# cat /sys/class/power\ supply/usb/online
0
# cat /sys/class/power\ supply/main-battery/status
Charging
# cat /sys/class/leds/h5400\:red-left/trigger
none h5400-radio timer hwtimer ac-online usb-online
main-battery-charging-or-full [main-battery-charging]
main-battery-full
Signed-off-by: Anton Vorontsov <cbou@mail.ru>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As per Andrew Mortons request, here's a set of documentation for
the generic pipe_buf_operations hooks, the pipe, and pipe_buffer
structures.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>