Commit Graph

2735 Commits

Author SHA1 Message Date
Malcolm Turnbull
4856c84c13 ipvs: load balance IPv4 connections from a local process
This allows IPVS to load balance connections made by a local process.
For example a proxy server running locally.

External client --> pound:443 -> Local:443 --> IPVS:80 --> RealServer

Signed-off-by: Siim Põder <siim@p6drad-teel.net>
Signed-off-by: Malcolm Turnbull <malcolm@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:13 +10:00
Julius Volz
f94fd04140 IPVS: Allow adding IPv6 services from userspace
Allow adding IPv6 services through the genetlink interface and add checks
to see if the chosen scheduler is supported with IPv6 and whether the
supplied prefix length is sane. Make sure the service count exported via
the sockopt interface only counts IPv4 services.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:13 +10:00
Julius Volz
473b23d37b IPVS: Activate IPv6 Netfilter hooks
Register the previously defined or adapted netfilter hook functions for
IPv6 as PF_INET6 hooks.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:13 +10:00
Julius Volz
cfc78c5a09 IPVS: Adjust various debug outputs to use new macros
Adjust various debug outputs to use the new *_BUF macro variants for
correct output of v4/v6 addresses.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:12 +10:00
Vince Busam
09571c7ae3 IPVS: Add function to determine if IPv6 address is local
Add __ip_vs_addr_is_local_v6() to find out if an IPv6 address belongs to a
local interface. Use this function to decide whether to set the
IP_VS_CONN_F_LOCALNODE flag for IPv6 destinations.

Signed-off-by: Vince Busam <vbusam@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:12 +10:00
Julius Volz
a0eb662f9e IPVS: Turn off FTP application helper for IPv6
Immediately return from FTP application helper and do nothing when dealing
with IPv6 packets. IPv6 is not supported by this helper yet.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:11 +10:00
Julius Volz
c6883f5873 IVPS: Disable sync daemon for IPv6 connections
Disable the sync daemon for IPv6 connections, works only with IPv4 for now.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:11 +10:00
Vince Busam
667a5f1816 IPVS: Convert procfs files for IPv6 entry output
Correctly output IPv6 connection/service/dest entries in procfs files.

Signed-off-by: Vince Busam <vbusam@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:10 +10:00
Julius Volz
7937df1564 IPVS: Convert real server lookup functions
Convert functions for looking up destinations (real servers) to support
IPv6 services/dests.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:10 +10:00
Julius Volz
2a3b791e6e IPVS: Add/adjust Netfilter hook functions and helpers for v6
Add Netfilter hook functions or modify existing ones, if possible, to
process IPv6 packets. Some support functions are also added/modified for
this. ip_vs_nat_icmp_v6() was already added in the patch that added the v6
xmit functions, as it is called from one of them.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:09 +10:00
Julius Volz
cd17f9ed09 IPVS: Extend scheduling functions for IPv6 support
Convert ip_vs_schedule() and ip_vs_sched_persist() to support scheduling of
IPv6 connections.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:09 +10:00
Julius Volz
b3cdd2a738 IPVS: Add and bind IPv6 xmit functions
Add xmit functions for IPv6. Also add the already needed __ip_vs_get_out_rt_v6()
to ip_vs_core.c. Bind the new xmit functions to v6 connections.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:08 +10:00
Julius Volz
38cdcc9a03 IPVS: Add IPv6 support to xmit() support functions
Add IPv6 support to IP_VS_XMIT() and to the xmit routing cache, introducing
a new function __ip_vs_get_out_rt_v6().

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:08 +10:00
Julius Volz
28364a59f3 IPVS: Extend functions for getting/creating connections
Extend functions for getting/creating connections and connection
templates for IPv6 support and fix the callers.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:08 +10:00
Julius Volz
0bbdd42b7e IPVS: Extend protocol DNAT/SNAT and state handlers
Extend protocol DNAT/SNAT and state handlers to work with IPv6. Also
change/introduce new checksumming helper functions for this.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:07 +10:00
Julius Volz
3b047d9d04 IPVS: Add protocol debug functions for IPv6
Add protocol (TCP, UDP, AH, ESP) debug functions for IPv6 packet debug
output.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:06 +10:00
Julius Volz
51ef348b14 IPVS: Add 'af' args to protocol handler functions
Add 'af' arguments to conn_schedule(), conn_in_get(), conn_out_get() and
csum_check() function pointers in struct ip_vs_protocol. Extend the
respective functions for TCP, UDP, AH and ESP and adjust the callers.

The changes in the callers need to be somewhat extensive, since they now
need to pass a filled out struct ip_vs_iphdr * to the modified functions
instead of a struct iphdr *.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:06 +10:00
Julius Volz
b14198f6c1 IPVS: Add IPv6 support flag to schedulers
Add 'supports_ipv6' flag to struct ip_vs_scheduler to indicate whether a
scheduler supports IPv6. Set the flag to 1 in schedulers that work with
IPv6, 0 otherwise. This flag is checked in a later patch while trying to
add a service with a specific scheduler. Adjust debug in v6-supporting
schedulers to work with both address families.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:06 +10:00
Julius Volz
3c2e0505d2 IPVS: Add v6 support to ip_vs_service_get()
Add support for selecting services based on their address family to
ip_vs_service_get() and adjust the callers.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:05 +10:00
Julius Volz
b18610de9e IPVS: Convert __ip_vs_svc_get() and __ip_vs_fwm_get()
Add support for getting services based on their address family to
__ip_vs_service_get(), __ip_vs_fwm_get() and the helper hash function
ip_vs_svc_hashkey(). Adjust the callers.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:05 +10:00
Julius Volz
c860c6b147 IPVS: Add internal versions of sockopt interface structs
Add extended internal versions of struct ip_vs_service_user and struct
ip_vs_dest_user (the originals can't be modified as they are part
of the old sockopt interface). Adjust ip_vs_ctl.c to work with the new
data structures and add some minor AF-awareness.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:04 +10:00
Julius Volz
e7ade46a53 IPVS: Change IPVS data structures to support IPv6 addresses
Introduce new 'af' fields into IPVS data structures for specifying an
entry's address family. Convert IP addresses to be of type union
nf_inet_addr.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:03 +10:00
Julius Volz
fab0de02fb IPVS: Add CONFIG_IP_VS_IPV6 option for IPv6 support
Add boolean config option CONFIG_IP_VS_IPV6 for enabling experimental IPv6
support in IPVS. Only visible if IPv6 support is set to 'y' or both IPv6
and IPVS are modules.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-09-05 11:17:02 +10:00
David S. Miller
b171e19ed0 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	net/mac80211/mlme.c
2008-08-29 23:06:00 -07:00
Eric Dumazet
a627266570 ip: speedup /proc/net/rt_cache handling
When scanning route cache hash table, we can avoid taking locks for
empty buckets.  Both /proc/net/rt_cache and NETLINK RTM_GETROUTE
interface are taken into account.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-28 01:11:25 -07:00
Andi Kleen
6be547a61d inet_diag: Add empty bucket optimization to inet_diag too
Skip quickly over empty buckets in inet_diag.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-28 01:09:54 -07:00
Andi Kleen
6eac560407 tcp: Skip empty hash buckets faster in /proc/net/tcp
On most systems most of the TCP established/time-wait hash buckets are empty.
When walking the hash table for /proc/net/tcp their read locks would
always be aquired just to find out they're empty. This patch changes the code
to check first if the buckets have any entries before taking the lock, which
is much cheaper than taking a lock. Since the hash tables are large
this makes a measurable difference on processing /proc/net/tcp, 
especially on architectures with slow read_lock (e.g. PPC) 

On a 2GB Core2 system time cat /proc/net/tcp > /dev/null (with a mostly
empty hash table) goes from 0.046s to 0.005s.

On systems with slower atomics (like P4 or POWER4) or larger hash tables
(more RAM) the difference is much higher.

This can be noticeable because there are some daemons around who regularly
scan /proc/net/tcp.

Original idea for this patch from Marcus Meissner, but redone by me.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-28 01:08:02 -07:00
Hugh Dickins
d994af0d50 ipv4: mode 0555 in ipv4_skeleton
vpnc on today's kernel says Cannot open "/proc/sys/net/ipv4/route/flush":
d--------- 0 root root 0 2008-08-26 11:32 /proc/sys/net/ipv4/route
d--------- 0 root root 0 2008-08-26 19:16 /proc/sys/net/ipv4/neigh

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-27 02:35:18 -07:00
Philip Love
7982d5e1b3 tcp: fix tcp header size miscalculation when window scale is unused
The size of the TCP header is miscalculated when the window scale ends
up being 0. Additionally, this can be induced by sending a SYN to a
passive open port with a window scale option with value 0.

Signed-off-by: Philip Love <love_phil@emc.com>
Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-27 02:33:50 -07:00
Simon Horman
7fd1067851 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-2.6 into lvs-next-2.6 2008-08-27 15:11:37 +10:00
Julius Volz
e3c2ced8d2 IPVS: Rename ip_vs_proto_ah.c to ip_vs_proto_ah_esp.c
After integrating ESP into ip_vs_proto_ah, rename it (and the references to
it) to ip_vs_proto_ah_esp.c and delete the old ip_vs_proto_esp.c.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-08-27 13:50:37 +10:00
Julius Volz
409a19669e IPVS: Integrate ESP protocol into ip_vs_proto_ah.c
Rename all ah_* functions to ah_esp_* (and adjust comments). Move ESP
protocol definition into ip_vs_proto_ah.c and remove all usage of
ip_vs_proto_esp.c.

Make the compilation of ip_vs_proto_ah.c dependent on a new config
variable, IP_VS_PROTO_AH_ESP, which is selected either by
IP_VS_PROTO_ESP or IP_VS_PROTO_AH. Only compile the selected protocols'
structures within this file.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-08-27 13:50:35 +10:00
Al Viro
2f4520d35d ipv4: sysctl fixes
net.ipv4.neigh should be a part of skeleton to avoid ordering problems

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-25 15:17:44 -07:00
Ilpo Järvinen
a4356b2920 tcp: Add tcp_parse_aligned_timestamp
Some duplicated code lying around. Located with my suffix tree
tool.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-23 05:12:29 -07:00
Ilpo Järvinen
2cf46637b5 tcp: Add tcp_collapse_one to eliminate duplicated code
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-23 05:11:41 -07:00
Ilpo Järvinen
cbe2d128a0 tcp: Add tcp_validate_incoming & put duplicated code there
Large block of code duplication removed.

Sadly, the return value thing is a bit tricky here but it
seems the most sensible way to return positive from validator
on success rather than negative.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-23 05:10:12 -07:00
Denis V. Lunev
fdc0bde90a icmp: icmp_sk() should not use smp_processor_id() in preemptible code
Pass namespace into icmp_xmit_lock, obtain socket inside and return
it as a result for caller.

Thanks Alexey Dobryan for this report:

Steps to reproduce:

	CONFIG_PREEMPT=y
	CONFIG_DEBUG_PREEMPT=y
	tracepath <something>

BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205
caller is icmp_sk+0x15/0x30
Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1

Call Trace:
 [<ffffffff8031af14>] debug_smp_processor_id+0xe4/0xf0
 [<ffffffff80409405>] icmp_sk+0x15/0x30
 [<ffffffff8040a17b>] icmp_send+0x4b/0x3f0
 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8023a475>] ? local_bh_enable_ip+0x95/0x110
 [<ffffffff804285b9>] ? _spin_unlock_bh+0x39/0x40
 [<ffffffff8025a26c>] ? mark_held_locks+0x4c/0x90
 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
 [<ffffffff803e91b4>] ip_fragment+0x8d4/0x900
 [<ffffffff803e7030>] ? ip_finish_output2+0x0/0x290
 [<ffffffff803e91e0>] ? ip_finish_output+0x0/0x60
 [<ffffffff803e6650>] ? dst_output+0x0/0x10
 [<ffffffff803e922c>] ip_finish_output+0x4c/0x60
 [<ffffffff803e92e3>] ip_output+0xa3/0xf0
 [<ffffffff803e68d0>] ip_local_out+0x20/0x30
 [<ffffffff803e753f>] ip_push_pending_frames+0x27f/0x400
 [<ffffffff80406313>] udp_push_pending_frames+0x233/0x3d0
 [<ffffffff804067d1>] udp_sendmsg+0x321/0x6f0
 [<ffffffff8040d155>] inet_sendmsg+0x45/0x80
 [<ffffffff803b967f>] sock_sendmsg+0xdf/0x110
 [<ffffffff8024a100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80257ce5>] ? validate_chain+0x415/0x1010
 [<ffffffff8027dc10>] ? __do_fault+0x140/0x450
 [<ffffffff802597d0>] ? __lock_acquire+0x260/0x590
 [<ffffffff803b9e55>] ? sockfd_lookup_light+0x45/0x80
 [<ffffffff803ba50a>] sys_sendto+0xea/0x120
 [<ffffffff80428e42>] ? _spin_unlock_irqrestore+0x42/0x80
 [<ffffffff803134bc>] ? __up_read+0x4c/0xb0
 [<ffffffff8024e0c6>] ? up_read+0x26/0x30
 [<ffffffff8020b8bb>] system_call_fastpath+0x16/0x1b

icmp6_sk() is similar.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-23 04:43:33 -07:00
Sven Wegener
f728bafb56 ipvs: Fix race conditions in lblcr scheduler
We can't access the cache entry outside of our critical read-locked region,
because someone may free that entry. Also getting an entry under read lock,
then locking for write and trying to delete that entry looks fishy, but should
be no problem here, because we're only comparing a pointer. Also there is no
need for our own rwlock, there is already one in the service structure for use
in the schedulers.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-08-19 17:37:08 +10:00
Sven Wegener
39ac50d0c7 ipvs: Fix race conditions in lblc scheduler
We can't access the cache entry outside of our critical read-locked region,
because someone may free that entry. And we also need to check in the critical
region wether the destination is still available, i.e. it's not in the trash.
If we drop our reference counter, the destination can be purged from the trash
at any time. Our caller only guarantees that no destination is moved to the
trash, while we are scheduling. Also there is no need for our own rwlock,
there is already one in the service structure for use in the schedulers.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-08-19 17:37:04 +10:00
Simon Horman
3f087668c4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-08-19 17:36:22 +10:00
Stephen Hemminger
9f59365374 nf_nat: use secure_ipv4_port_ephemeral() for NAT port randomization
Use incoming network tuple as seed for NAT port randomization.
This avoids concerns of leaking net_random() bits, and also gives better
port distribution. Don't have NAT server, compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

[ added missing EXPORT_SYMBOL_GPL ]

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-18 21:32:32 -07:00
Anders Grafström
46faec9858 netfilter: ipt_addrtype: Fix matching of inverted destination address type
This patch fixes matching of inverted destination address type.

Signed-off-by: Anders Grafström <grfstrm@users.sourceforge.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-18 21:29:57 -07:00
Simon Horman
51df190139 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-08-16 14:44:17 +10:00
Herbert Xu
c6153b5b77 ipv4: Disable route secret interval on zero interval
Let me first state that disabling the route cache hash rebuild
should not be done without extensive analysis on the risk profile
and careful deliberation.

However, there are times when this can be done safely or for
testing.  For example, when you have mechanisms for ensuring
that offending parties do not exist in your network.

This patch lets the user disable the rebuild if the interval is
set to zero.  This also incidentally fixes a divide-by-zero error
with name-spaces.

In addition, this patch makes the effect of an interval change
immediate rather than it taking effect at the next rebuild as
is currently the case.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-15 13:44:31 -07:00
Simon Horman
4a031b0e6a ipvs: rename __ip_vs_wlc_schedule in lblc and lblcr schedulers
For the sake of clarity, rename __ip_vs_wlc_schedule() in lblc.c to
__ip_vs_lblc_schedule() and the version in lblcr.c to __ip_vs_lblc_schedule().

I guess the original name stuck from a copy and paste.

Cc: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-08-15 09:26:15 +10:00
Sven Wegener
a919cf4b6b ipvs: Create init functions for estimator code
Commit 8ab19ea36c ("ipvs: Fix possible deadlock
in estimator code") fixed a deadlock condition, but that condition can only
happen during unload of IPVS, because during normal operation there is at least
our global stats structure in the estimator list. The mod_timer() and
del_timer_sync() calls are actually initialization and cleanup code in
disguise. Let's make it explicit and move them to their own init and cleanup
function.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-08-15 09:26:15 +10:00
Sven Wegener
82dfb6f322 ipvs: Only call init_service, update_service and done_service for schedulers if defined
There are schedulers that only schedule based on data available in the service
or destination structures and they don't need any persistent storage or
initialization routine. These schedulers currently provide dummy functions for
the init_service, update_service and/or done_service functions. For the
init_service and done_service cases we already have code that only calls these
functions, if the scheduler provides them. Do the same for the update_service
case and remove the dummy functions from all schedulers.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-08-15 09:26:14 +10:00
Julius Volz
9a812198ae IPVS: Add genetlink interface implementation
Add the implementation of the new Generic Netlink interface to IPVS and
keep the old set/getsockopt interface for userspace backwards
compatibility.

Signed-off-by: Julius Volz <juliusv@google.com>
Acked-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
2008-08-15 09:26:14 +10:00
Daniel Lezcano
877acedc0d netns: Fix crash by making igmp per namespace
This patch makes the multicast socket to be per namespace.

When a network namespace is created, other than the init_net and a
multicast packet is received, the kernel goes to a hang or a kernel panic.

How to reproduce ?

 * create a child network namespace
 * create a pair virtual device veth
    * ip link add type veth
 * move one side to the pair network device to the child namespace
    * ip link set netns <childpid> dev veth1
 * ping -I veth0 224.0.0.1

The bug appears because the function ip_mc_init_dev does not initialize
the different multicast fields as it exits because it is not the init_net.

BUG: soft lockup - CPU#0 stuck for 61s! [avahi-daemon:2695]
Modules linked in:
irq event stamp: 50350
hardirqs last  enabled at (50349): [<c03ee949>] _spin_unlock_irqrestore+0x34/0x39
hardirqs last disabled at (50350): [<c03ec639>] schedule+0x9f/0x5ff
softirqs last  enabled at (45712): [<c0374d4b>] ip_setsockopt+0x8e7/0x909
softirqs last disabled at (45710): [<c03ee682>] _spin_lock_bh+0x8/0x27

Pid: 2695, comm: avahi-daemon Not tainted (2.6.27-rc2-00029-g0872073 #3)
EIP: 0060:[<c03ee47c>] EFLAGS: 00000297 CPU: 0
EIP is at __read_lock_failed+0x8/0x10
EAX: c4f38810 EBX: c4f38810 ECX: 00000000 EDX: c04cc22e
ESI: fb0000e0 EDI: 00000011 EBP: 0f02000a ESP: c4e3faa0
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: 44618a40 CR3: 04e37000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c02311f8>] ? _raw_read_lock+0x23/0x25
 [<c0390666>] ? ip_check_mc+0x1c/0x83
 [<c036d478>] ? ip_route_input+0x229/0xe92
 [<c022e2e4>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c0104c9c>] ? do_IRQ+0x69/0x7d
 [<c0102e64>] ? restore_nocheck_notrace+0x0/0xe
 [<c036fdba>] ? ip_rcv+0x227/0x505
 [<c0358764>] ? netif_receive_skb+0xfe/0x2b3
 [<c03588d2>] ? netif_receive_skb+0x26c/0x2b3
 [<c035af31>] ? process_backlog+0x73/0xbd
 [<c035a8cd>] ? net_rx_action+0xc1/0x1ae
 [<c01218a8>] ? __do_softirq+0x7b/0xef
 [<c0121953>] ? do_softirq+0x37/0x4d
 [<c035b50d>] ? dev_queue_xmit+0x3d4/0x40b
 [<c0122037>] ? local_bh_enable+0x96/0xab
 [<c035b50d>] ? dev_queue_xmit+0x3d4/0x40b
 [<c012181e>] ? _local_bh_enable+0x79/0x88
 [<c035fcb8>] ? neigh_resolve_output+0x20f/0x239
 [<c0373118>] ? ip_finish_output+0x1df/0x209
 [<c0373364>] ? ip_dev_loopback_xmit+0x62/0x66
 [<c0371db5>] ? ip_local_out+0x15/0x17
 [<c0372013>] ? ip_push_pending_frames+0x25c/0x2bb
 [<c03891b8>] ? udp_push_pending_frames+0x2bb/0x30e
 [<c038a189>] ? udp_sendmsg+0x413/0x51d
 [<c038a1a9>] ? udp_sendmsg+0x433/0x51d
 [<c038f927>] ? inet_sendmsg+0x35/0x3f
 [<c034f092>] ? sock_sendmsg+0xb8/0xd1
 [<c012d554>] ? autoremove_wake_function+0x0/0x2b
 [<c022e6de>] ? copy_from_user+0x32/0x5e
 [<c022e6de>] ? copy_from_user+0x32/0x5e
 [<c034f238>] ? sys_sendmsg+0x18d/0x1f0
 [<c0175e90>] ? pipe_write+0x3cb/0x3d7
 [<c0170347>] ? do_sync_write+0xbe/0x105
 [<c012d554>] ? autoremove_wake_function+0x0/0x2b
 [<c03503b2>] ? sys_socketcall+0x176/0x1b0
 [<c01085ea>] ? syscall_trace_enter+0x6c/0x7b
 [<c0102e1a>] ? syscall_call+0x7/0xb

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-13 16:15:57 -07:00
David S. Miller
0a37c10ed4 Merge branch 'stealer/ipvs/for-davem' of git://git.stealer.net/linux-2.6 2008-08-11 18:04:35 -07:00