execve arguments can be quite large. There is no limit on the number of
arguments and a 4G limit on the size of an argument.
this patch prints those aruguments in bite sized pieces. a userspace size
limitation of 8k was discovered so this keeps messages around 7.5k
single arguments larger than 7.5k in length are split into multiple records
and can be identified as aX[Y]=
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch adds an end of event record type. It will be sent by the kernel as
the last record when a multi-record event is triggered. This will aid realtime
analysis programs since they will now reliably know they have the last record
to complete an event. The audit daemon filters this and will not write it to
disk.
Signed-off-by: Steve Grubb <sgrubb redhat com>
Signed-off-by: Eric Paris <eparis@redhat.com>
In order to correlate audit records to an individual login add a session
id. This is incremented every time a user logs in and is included in
almost all messages which currently output the auid. The field is
labeled ses= or oses=
Signed-off-by: Eric Paris <eparis@redhat.com>
Keeping loginuid in audit_context is racy and results in messier
code. Taken to task_struct, out of the way of ->audit_context
changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Avoid section mismatch involving arch_register_cpu.
Marking arch_register_cpu as __init and removing the export
for non-hotplug-cpu configurations makes the following warning
go away:
Section mismatch in reference from the function
arch_register_cpu() to the function .devinit.text:register_cpu()
The function arch_register_cpu() references
the function __devinit register_cpu().
This is often because arch_register_cpu lacks a __devinit
annotation or the annotation of register_cpu is wrong.
The only external user of arch_register_cpu in the tree is
in drivers/acpi/processor_core.c where it is guarded by
ACPI_HOTPLUG_CPU (which depends on HOTPLUG_CPU).
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To allow the implementation of optimized rw-locks in user space, glibc
needs a possibility to select waiters for wakeup depending on a bitset
mask.
This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS
These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
expecting an absolute timeout value instead of the relative one, which
is used for the FUTEX_WAIT OP.
FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
stored in the futex_q structure, which is used to enqueue the waiter
into the hashed futex waitqueue.
FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
function logically ANDs the bitset with the bitset stored in each
waiters futex_q structure. If the result is zero (i.e. none of the set
bits in the bitsets is matching), then the waiter is not woken up. If
the result is not zero (i.e. one of the set bits in the bitsets is
matching), then the waiter is woken.
The bitset provided by the caller must be non zero. In case the
provided bitset is zero the kernel returns EINVAL.
Internaly the new OPs are only extensions to the existing FUTEX_WAIT
and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
set into the futex_wait() and futex_wake() functions.
Signed-off-by: Thomas Gleixner <tgxl@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The exception fixup for the futex macros __futex_atomic_op1/2 and
futex_atomic_cmpxchg_inatomic() is missing an entry when the lock
prefix is replaced by a NOP via SMP alternatives.
Chuck Ebert tracked this down from the information provided in:
https://bugzilla.redhat.com/show_bug.cgi?id=429412
A possible solution would be to add another fixup after the
LOCK_PREFIX, so both the LOCK and NOP case have their own entry in the
exception table, but it's not really worth the trouble.
Simply replace LOCK_PREFIX with lock and keep those untouched by SMP
alternatives.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To allow better diagnosis of tick-sched related, especially NOHZ
related problems, we need to know when the last wakeup via an irq
happened and when the CPU left the idle state.
Add two fields (idle_waketime, idle_exittime) to the tick_sched
structure and add them to the timer_list output.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xtime_cache needs to be updated whenever xtime and or wall_to_monotic
are changed. Otherwise users of xtime_cache might see a stale (and in
the case of timezone changes utterly wrong) value until the next
update happens.
Fixup the obvious places, which miss this update.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: kill swap_io_context()
as-iosched: fix inconsistent ioc->lock context
ide-cd: fix leftover data BUG
block: make elevator lib checkpatch compliant
cfq-iosched: make checkpatch compliant
block: make core bits checkpatch compliant
block: new end request handling interface should take unsigned byte counts
unexport add_disk_randomness
block/sunvdc.c:print_version() must be __devinit
splice: always updated atime in direct splice
It blindly copies everything in the io_context, including the lock.
That doesn't work so well for either lock ordering or lockdep.
There seems zero point in swapping io contexts on a request to request
merge, so the best point of action is to just remove it.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Fix problems with the 528x ColdFire CPU cache setup.
Do not cache the flash region (if present), and make the runtime
settings consistent with the init setting.
Problems pointed out by Bernd Buttner <b.buettner@mkc-gmbh.de>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The namespace is not available in the fib_sync_down_addr, add it as a
parameter.
Looking up a device by the pointer to it is OK. Looking up using a
result from fib_trie/fib_hash table lookup is also safe. No need to
fix that at all. So, just fix lookup by address and insertion to the
hash table path.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is required to make fib_info lookups namespace aware. In the
other case initial namespace devices are marked as dead in the local
routing table during other namespace stop.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_sync_down can be called with an address and with a device. In
reality it is called either with address OR with a device. The
codepath inside is completely different, so lets separate it into two
calls for these two cases.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new "flow" classifier, which is meant to extend the SFQ hashing
capabilities without hard-coding new hash functions and also allows
deterministic mappings of keys to classes, replacing some out of tree
iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps
IPs to marks, with fw filters to classes), ...
Some examples:
- Classic SFQ hash:
tc filter add ... flow hash \
keys src,dst,proto,proto-src,proto-dst divisor 1024
- Classic SFQ hash, but using information from conntrack to work properly in
combination with NAT:
tc filter add ... flow hash \
keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024
- Map destination IPs of 192.168.0.0/24 to classids 1-257:
tc filter add ... flow map \
key dst addend -192.168.0.0 divisor 256
- alternatively:
tc filter add ... flow map \
key dst and 0xff
- similar, but reverse ordered:
tc filter add ... flow map \
key dst and 0xff xor 0xff
Perturbation is currently not supported because we can't reliable kill the
timer on destruction.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for dumping statistics and make internal queues visible as
classes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ipv4_devconf can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Outbound sequence number overflow error status
is counted as XfrmOutStateSeqError.
o Additionaly, it changes inbound sequence number replay
error name from XfrmInSeqOutOfWindow to XfrmInStateSeqError
to apply name scheme above.
o Inbound IPv4 UDP encapsuling type mismatch error is wrongly
mapped to XfrmInStateInvalid then this patch fiex the error
to XfrmInStateMismatch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current ip route cache implementation is not suited to large caches.
We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.
A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.
Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.
This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
Reuse the existing logic for multicast list synchronization for the
unicast address list. The core of dev_mc_sync/unsync are split out as
__dev_addr_sync/unsync and moved from dev_mcast.c to dev.c. These are
then used to implement dev_unicast_sync/unsync as well.
I'm working on cleaning up Intel's FCoE stack, which generates new MAC
addresses from the fibre channel device id assigned by the fabric as
per the current draft specification in T11. When using such a
protocol in a VLAN environment it would be nice to not always be
forced into promiscuous mode, assuming the underlying Ethernet driver
supports multiple unicast addresses as well.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Normally during a dump the key of the last dumped entry is used for
continuation, but since lock is dropped it might be lost. In that case
fallback to the old counter based N^2 behaviour. This means the dump
will end up skipping some routes which matches what FIB_HASH does.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a net argument to inet6_lookup and propagate it further.
Actually, this is tcp-v6 implementation of what was done for
tcp-v4 sockets in a previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a net argument to inet_lookup and propagate it further
into lookup calls. Plus tune the __inet_check_established.
The dccp and inet_diag, which use that lookup functions
pass the init_net into them.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This tags the inet_bind_bucket struct with net pointer,
initializes it during creation and makes a filtering
during lookup.
A better hashfn, that takes the net into account is to
be done in the future, but currently all bind buckets
with similar port will be in one hash chain.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These two functions are the same except for what they call
to "check_established" and "hash" for a socket.
This saves half-a-kilo for ipv4 and ipv6.
add/remove: 1/0 grow/shrink: 1/4 up/down: 582/-1128 (-546)
function old new delta
__inet_hash_connect - 577 +577
arp_ignore 108 113 +5
static.hint 8 4 -4
rt_worker_func 376 372 -4
inet6_hash_connect 584 25 -559
inet_hash_connect 586 25 -561
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have INET_MATCH, INET_TW_MATCH and INET6_MATCH to test sockets and
twbuckets for matching, but ipv6 twbuckets are tested manually.
Here's the INET6_TW_MATCH to help with it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Constify a few data tables use const qualifiers on variables where
possible in the nf_*_proto_tcp sources.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces the xt_hashlimit match revision 1. It adds support for
kernel-level inversion and grouping source and/or destination IP
addresses, allowing to limit on a per-subnet basis. While this would
technically obsolete xt_limit, xt_hashlimit is a more expensive due
to the hashbucketing.
Kernel-level inversion: Previously you had to do user-level inversion:
iptables -N foo
iptables -A foo -m hashlimit --hashlimit(-upto) 5/s -j RETURN
iptables -A foo -j DROP
iptables -A INPUT -j foo
now it is simpler:
iptables -A INPUT -m hashlimit --hashlimit-over 5/s -j DROP
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename all "conntrack" variables to "ct" for more consistency and
avoiding some overly long lines.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reorder struct nf_conntrack_l4proto so all members used during packet
processing are in the same cacheline.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_ct_tuple_src_equal() and nf_ct_tuple_dst_equal() both compare the protocol
numbers. Unfortunately gcc doesn't optimize out the second comparison, so
remove it and prefix both functions with __ to indicate that they should not
be used directly.
Saves another 16 byte of text in __nf_conntrack_find() on x86_64:
nf_conntrack_tuple_taken | -20 # 320 -> 300, size inlines: 181 -> 161
__nf_conntrack_find | -16 # 267 -> 251, size inlines: 127 -> 115
__nf_conntrack_confirm | -40 # 875 -> 835, size inlines: 570 -> 537
3 functions changed, 76 bytes removed
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ignoring specific entries in __nf_conntrack_find() is only needed by NAT
for nf_conntrack_tuple_taken(). Remove it from __nf_conntrack_find()
and make nf_conntrack_tuple_taken() search the hash itself.
Saves 54 bytes of text in the hotpath on x86_64:
__nf_conntrack_find | -54 # 321 -> 267, # inlines: 3 -> 2, size inlines: 181 -> 127
nf_conntrack_tuple_taken | +305 # 15 -> 320, lexblocks: 0 -> 3, # inlines: 0 -> 3, size inlines: 0 -> 181
nf_conntrack_find_get | -2 # 90 -> 88
3 functions changed, 305 bytes added, 56 bytes removed, diff: +249
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the RCU conversion only write_lock usages of nf_conntrack_lock are
left (except one read_lock that should actually use write_lock in the
H.323 helper). Switch to a spinlock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU for expectation hash. This doesn't buy much for conntrack
runtime performance, but allows to reduce the use of nf_conntrack_lock
for /proc and nf_netlink_conntrack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
CHECK net/ipv4/netfilter/ip_tables.c
net/ipv4/netfilter/ip_tables.c:1453:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1453:8: expected int *size
net/ipv4/netfilter/ip_tables.c:1453:8: got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1458:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1458:44: expected int *size
net/ipv4/netfilter/ip_tables.c:1458:44: got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1603:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1603:2: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1603:2: got int *<noident>
net/ipv4/netfilter/ip_tables.c:1627:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1627:8: expected int *size
net/ipv4/netfilter/ip_tables.c:1627:8: got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1634:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1634:40: expected int *size
net/ipv4/netfilter/ip_tables.c:1634:40: got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1653:8: warning: incorrect type in argument 5 (different signedness)
net/ipv4/netfilter/ip_tables.c:1653:8: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1653:8: got int *<noident>
net/ipv4/netfilter/ip_tables.c:1666:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1666:2: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1666:2: got int *<noident>
CHECK net/ipv4/netfilter/arp_tables.c
net/ipv4/netfilter/arp_tables.c:1285:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1285:40: expected int *size
net/ipv4/netfilter/arp_tables.c:1285:40: got unsigned int *size
net/ipv4/netfilter/arp_tables.c:1543:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1543:44: expected int *size
net/ipv4/netfilter/arp_tables.c:1543:44: got unsigned int [usertype] *size
CHECK net/ipv6/netfilter/ip6_tables.c
net/ipv6/netfilter/ip6_tables.c:1481:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1481:8: expected int *size
net/ipv6/netfilter/ip6_tables.c:1481:8: got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1486:44: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1486:44: expected int *size
net/ipv6/netfilter/ip6_tables.c:1486:44: got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1631:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1631:2: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1631:2: got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1655:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1655:8: expected int *size
net/ipv6/netfilter/ip6_tables.c:1655:8: got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1662:40: expected int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1680:8: warning: incorrect type in argument 5 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1680:8: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1680:8: got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1693:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1693:2: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1693:2: got int *<noident>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hashtable size is really unsigned so sparse complains when you pass
a signed integer. Change all uses to make it consistent.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for ranges to the new revision. This doesn't affect
compatibility since the new revision was not released yet.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Propagate netns from userspace.
* arpt_register_table() registers table in supplied netns.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now it's possible to list and manipulate per-netns ip6tables rules.
Filtering decisions are based on init_net's table so far.
P.S.: remove init_net check in inet6_create() to see the effect
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Propagate netns from userspace down to xt_find_table_lock()
* Register ip6 tables in netns (modules still use init_net)
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now, iptables show and configure different set of rules in different
netnss'. Filtering decisions are still made by consulting only
init_net's set.
Changes are identical except naming so no splitting.
P.S.: one need to remove init_net checks in nf_sockopt.c and inet_create()
to see the effect.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Typical table module registers xt_table structure (i.e. packet_filter)
and link it to list during it. We can't use one template for it because
corresponding list_head will become corrupted. We also can't unregister
with template because it wasn't changed at all and thus doesn't know in
which list it is.
So, we duplicate template at the very first step of table registration.
Table modules will save it for use during unregistration time and actual
filtering.
Do it at once to not screw bisection.
P.S.: renaming i.e. packet_filter => __packet_filter is temporary until
full netnsization of table modules is done.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In fact all we want is per-netns set of rules, however doing that will
unnecessary complicate routines such as ipt_hook()/ipt_do_table, so
make full xt_table array per-netns.
Every user stubbed with init_net for a while.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch from 0/-E to ptr/PTR_ERR convention.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the xt_conntrack match revision 1 by port matching (all four
{orig,repl}{src,dst}) and by packet direction matching.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before the removal of the deferred output hooks, netoutdev was used in
case of VLANs on top of a bridge to store the VLAN device, so the
deferred hooks would see the correct output device. This isn't
necessary anymore since we're calling the output hooks for the correct
device directly in the IP stack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The address of IPv6 raw sockets was shown in the wrong format, from
IPv4 ones. The problem has been introduced by the commit
42a73808ed ("[RAW]: Consolidate proc
interface.")
Thanks to Adrian Bunk who originally noticed the problem.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different hashtables are used for IPv6 and IPv4 raw sockets, so no
need to check the socket family in the iterator over hashtables. Clean
this out.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A userspace program may wish to set the mark for each packets its send
without using the netfilter MARK target. Changing the mark can be used
for mark based routing without netfilter or for packet filtering.
It requires CAP_NET_ADMIN capability.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for combined mode algorithms with GCM being
the first algorithm supported.
Combined mode algorithms can be added through the xfrm_user interface
using the new algorithm payload type XFRMA_ALG_AEAD. Each algorithms
is identified by its name and the ICV length.
For the purposes of matching algorithms in xfrm_tmpl structures,
combined mode algorithms occupy the same name space as encryption
algorithms. This is in line with how they are negotiated using IKE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts ESP to use the crypto_aead interface and in particular
the authenc algorithm. This lays the foundations for future support of
combined mode algorithms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move headers usbnet.h and rndis_host.h to include/linux/usb and fix includes
for drivers/net/usb modules. Headers are moved because rndis_wlan will be
outside drivers/net/usb in drivers/net/wireless and yet need these headers.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Teach rfkill about wimax radios.
Had to define a KEY_WIMAX as a 'key for disabling only wimax radios',
as other radio technologies have. This makes sense as hardware has
specific keys for disabling specific radios.
The RFKILL enabling part is, otherwise, a copy and paste of any other
radio technology.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
Remove commented-out code copied from NFS
NFS: Switch from intr mount option to TASK_KILLABLE
Add wait_for_completion_killable
Add wait_event_killable
Add schedule_timeout_killable
Use mutex_lock_killable in vfs_readdir
Add mutex_lock_killable
Use lock_page_killable
Add lock_page_killable
Add fatal_signal_pending
Add TASK_WAKEKILL
exit: Use task_is_*
signal: Use task_is_*
sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
ptrace: Use task_is_*
power: Use task_is_*
wait: Use TASK_NORMAL
proc/base.c: Use task_is_*
proc/array.c: Use TASK_REPORT
perfmon: Use task_is_*
...
Fixed up conflicts in NFS/sunrpc manually..
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/perex/alsa: (299 commits)
[ALSA] version 1.0.16rc2
[ALSA] hda: fix Mic in as output
[ALSA] emu10k1 - Another EMU0404 Board ID
[ALSA] emu10k1 - Fix kthread handling at resume
[ALSA] emu10k1: General cleanup, add new locks, fix alsa bug#3501, kernel bug#9304.
[ALSA] emu10k1 - Use enum for emu_model types
[ALSA] emu10k1 - Don't create emu1010 controls for non-emu boards
[ALSA] emu10k1 - 1616(M) cardbus improvements
[ALSA] snd:emu10k1: E-Mu updates. Fixes to firmware loading and support for 0404.
[ALSA] emu10k1: Add comments regarding E-Mu ins and outs.
[ALSA] oxygen: revert SPI clock frequency change for AK4396/WM8785
[ALSA] es1938 - improve capture hw pointer reads
[ALSA] HDA-Intel - Add support for Intel SCH
[ALSA] hda: Add GPIO mute support to STAC9205
[ALSA] hda-codec - Add Dell T3400 support
[ALSA] hda-codec - Add model for HP DV9553EG laptop
[ALSA] hda-codec - Control SPDIF as slave
[ALSA] hda_intel: ALSA HD Audio patch for Intel ICH10 DeviceID's
[ALSA] Fix Oops with PCM OSS sync
[ALSA] hda-codec - Add speaker automute to ALC262 HP models
...
bring back the avr32, blackfin, sh, sparc architectures into working order,
by reverting the effects of this change that came in via the x86 tree:
commit a5a19c63f4
Author: Jeremy Fitzhardinge <jeremy@goop.org>
Date: Wed Jan 30 13:33:39 2008 +0100
x86: demacro asm-x86/pgalloc_32.h
Sorry about that!
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch improves E-Mu 1616(M) cardbus support. It adds definitions of the
new Microdock and 1010 cardbus registers (thanks again for descriptions
James) and improves mixer for this card. Now you can use S/PDIF and ADAT on
Mirodock and also use headpohone output on host cardbus card as another
independent output.
Signed-off-by: Ctirad Fertr <c.fertr@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This is improvement of the early support of the FM-only cards where the
fm801 chip represents the PCI to tuner bridge.
The tuner initialization isn't included the mute on as well as mute support
via V4L request. Proposed patch should fix this at least for 64-PCR model.
Signed-off-by: Andy Shevchenko <andy@smile.org.ua>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Change semantics for SNDRV_PCM_TSTAMP_MMAP. Doing timestamping only in
the interrupt handler might cause that hw_ptr is not related to actual
timestamp. With this change, grab timestamp at every hw_ptr update to
have always valid timestamp + ring buffer position pair.
With this change, SNDRV_PCM_TSTAMP_MMAP was renamed to
SNDRV_PCM_TSTAMP_ENABLE. It's no regression (I think).
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This fixes a bug whereby PCMs were not being suspended when the rest of the
audio subsystem was suspended.
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Added a device level dapm event so that both the machine and codec are informed
when dapm events occur.
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This header file exists only for some hacks to adapt alsa-driver
tree. It's useless for building in the kernel. Let's move a few
lines in it to sound/core.h and remove it.
With this patch, sound/driver.h isn't removed but has just a single
compile warning to include it. This should be really killed in
future.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
The 'tick' in PCM is set (again) via sw_params. And, nobody uses
this feature at all except for a command line option of aplay.
(This is literally 'nobody', as I checked alsa-lib API calls in all
programs in major distros.)
Above all, if we need finer wake-ups for the position update, it's
basically an issue that the driver should solve, not tuned by each
application.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
The xfer_align sw_params parameter has never been used in a sane manner,
and no one understands what this does exactly. The current
implementation looks also buggy because it allows write of shorter size
than xfer_align. So, if you do partial writes, the write isn't actually
aligned at all.
Removing this parameter will make some pcm_lib_* code more readable
(and less buggy).
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This patch removes the indirect control access to the control elements.
The indirect access has never been used and is even broken on 32bit
ioctl wrapper. Let's clean it up.
The pointers still remain in snd_ctl_elem_* structs just to make sure
that the struct size won't change. Once after checking the size
consistency, we can get rid of them, too.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
info_oss: move prototype of snd_card_info_read_oss to info.h
Signed-off-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
We need an accurate and continuous (monotonic) time sources to do
accurate synchronization among more timing sources. This patch allows
to enable monotonic timestamps for ALSA PCM devices and enables monotonic
timestamps for ALSA timer devices.
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
* support for switching rate in STAC9460 - using set_rate_val of the akm
infrastructure
* listing all STAC9460 registers in proc
* disabling mpu401 device for Prodigy192 - otherwise the currently
flawed mpu401 code hangs kernel when opening the midi device
* removing old unused commented-out code
Signed-off-by: Pavel Hofman <dustin@seznam.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
remove dead config symbols from sound code
Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Updated the forgotten SNDRV_HWDEP_IFACE_LAST to point the really last member.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Remove sequencer instrument layer from the tree.
This mechanism hasn't been used much with the actual devices. The only
reasonable user was OPL3 loader, and now it was rewritten to use hwdep
instead. So, let's remove the rest of rotten codes.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Use the exclusive access lock in hwdep instead of the own one.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Use the hwdep device for loading OPL2/3 patch data instead of the
messy sequencer instrument layer.
Due to this change, the sbiload program should be updated, too.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (454 commits)
[POWERPC] Cell IOMMU fixed mapping support
[POWERPC] Split out the ioid fetching/checking logic
[POWERPC] Add support to cell_iommu_setup_page_tables() for multiple windows
[POWERPC] Split out the IOMMU logic from cell_dma_dev_setup()
[POWERPC] Split cell_iommu_setup_hardware() into two parts
[POWERPC] Split out the logic that allocates struct iommus
[POWERPC] Allocate the hash table under 1G on cell
[POWERPC] Add set_dma_ops() to match get_dma_ops()
[POWERPC] 83xx: Clean up / convert mpc83xx board DTS files to v1 format.
[POWERPC] 85xx: Only invalidate TLB0 and TLB1
[POWERPC] 83xx: Fix typo in mpc837x compatible entries
[POWERPC] 85xx: convert sbc85* boards to use machine_device_initcall
[POWERPC] 83xx: rework platform Kconfig
[POWERPC] 85xx: rework platform Kconfig
[POWERPC] 86xx: Remove unused IRQ defines
[POWERPC] QE: Explicitly set address-cells and size cells for muram
[POWERPC] Convert StorCenter DTS file to /dts-v1/ format.
[POWERPC] 86xx: Convert all 86xx DTS files to /dts-v1/ format.
[PPC] Remove 85xx from arch/ppc
[PPC] Remove 83xx from arch/ppc
...
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
alpha: fix x86.git merge build error
ia64: on UP percpu variables are not small memory model
x86: fix arch/x86/kernel/test_nx.c modular build bug
s390: use generic percpu linux-2.6.git
POWERPC: use generic per cpu
ia64: use generic percpu
SPARC64: use generic percpu
percpu: change Kconfig to HAVE_SETUP_PER_CPU_AREA
modules: fold percpu_modcopy into module.c
x86: export copy_from_user_ll_nocache[_nozero]
x86: fix duplicated TIF on 64-bit
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
lguest: use __PAGE_KERNEL instead of _PAGE_KERNEL
lguest: Use explicit includes rateher than indirect
lguest: get rid of lg variable assignments
lguest: change gpte_addr header
lguest: move changed bitmap to lg_cpu
lguest: move last_pages to lg_cpu
lguest: change last_guest to last_cpu
lguest: change spte_addr header
lguest: per-vcpu lguest pgdir management
lguest: make pending notifications per-vcpu
lguest: makes special fields be per-vcpu
lguest: per-vcpu lguest task management
lguest: replace lguest_arch with lg_cpu_arch.
lguest: make registers per-vcpu
lguest: make emulate_insn receive a vcpu struct.
lguest: map_switcher_in_guest() per-vcpu
lguest: per-vcpu interrupt processing.
lguest: per-vcpu lguest timers
lguest: make hypercalls use the vcpu struct
lguest: make write() operation smp aware
...
Manual conflict resolved (maybe even correctly, who knows) in
drivers/lguest/x86/core.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
security: compile capabilities by default
selinux: make selinux_set_mnt_opts() static
SELinux: Add warning messages on network denial due to error
SELinux: Add network ingress and egress control permission checks
NetLabel: Add auditing to the static labeling mechanism
NetLabel: Introduce static network labels for unlabeled connections
SELinux: Allow NetLabel to directly cache SIDs
SELinux: Enable dynamic enable/disable of the network access checks
SELinux: Better integration between peer labeling subsystems
SELinux: Add a new peer class and permissions to the Flask definitions
SELinux: Add a capabilities bitmap to SELinux policy version 22
SELinux: Add a network node caching mechanism similar to the sel_netif_*() functions
SELinux: Only store the network interface's ifindex
SELinux: Convert the netif code to use ifindex values
NetLabel: Add IP address family information to the netlbl_skbuff_getattr() function
NetLabel: Add secid token support to the NetLabel secattr struct
NetLabel: Consolidate the LSM domain mapping/hashing locks
NetLabel: Cleanup the LSM domain hash functions
NetLabel: Remove unneeded RCU read locks
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
PPC: Fix powerpc vio_find_name to not use devices_subsys
Driver core: add bus_find_device_by_name function
Module: check to see if we have a built in module with the same name
x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
Driver core: Fix up build when CONFIG_BLOCK=N
a5a19c63f4 removed the include of
asm/pgalloc.h from asm-generic/tlb.h. That works fine on most
architectures, but broke ALPHA.
Fixup ALPHA by adding the include to asm-alpha/tlbflush.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tony says:
| The CONFIG_SMP=n path in ia64 makes quite radical changes ... rather
| than putting all the per-cpu stuff into the top 64K of address space
| and providing a per-cpu TLB mapping for that range to a different
| physical address ... it just makes all the per-cpu stuff link as ordinary
| variables in .data.
the new generic percpu code got confused about this as PER_CPU_ATTRIBUTES
was defined even on UP, so it picked up that small memory model - which
was not possible to get linked. The right fix is to only define that
on SMP. This resolved the build failures in my cross-compiling environment.
also link these variables into the .percpu section even on UP - some
assembly code has offset dependencies. (such as GET_IA64_MCA_DATA() in
arch/ia64/kernel/mca_asm.S)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Tony Luck <tony.luck@intel.com>
Change s390 percpu.h to use asm-generic/percpu.h
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Powerpc has a way to determine the address of the per cpu area of the
currently executing processor via the paca and the array of per cpu
offsets is avoided by looking up the per cpu area from the remote
paca's (copying x86_64).
Cc: Paul Mackerras <paulus@samba.org>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Olof Johansson <olof@lixom.net>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
ia64 has a special processor specific mapping that can be used to locate the
offset for the current per cpu area.
Cc: linux-ia64@vger.kernel.org
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sparc64 has a way of providing the base address for the per cpu area of the
currently executing processor in a global register.
Sparc64 also provides a way to calculate the address of a per cpu area
from a base address instead of performing an array lookup.
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
percpu_modcopy() is defined multiple times in arch files. However, the only
user is module.c. Put a static definition into module.c and remove
the definitions from the arch files.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit 7e9916040b
and commit eee3af4a2c
Both use the same TIF number (25) in thread_info_64.h.
This patch changes the TIF ids.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.
Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
At the block level bidi request uses req->next_rq pointer for a second
bidi_read request.
At Scsi-midlayer a second scsi_data_buffer structure is used for the
bidi_read part. This bidi scsi_data_buffer is put on
request->next_rq->special. Struct scsi_cmnd is not changed.
- Define scsi_bidi_cmnd() to return true if it is a bidi request and a
second sgtable was allocated.
- Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer
from this command This API is to isolate users from the mechanics of
bidi.
- Define scsi_end_bidi_request() to do what scsi_end_request() does but
for a bidi request. This is necessary because bidi commands are a bit
tricky here. (See comments in body)
- scsi_release_buffers() will also release the bidi_read scsi_data_buffer
- scsi_io_completion() on bidi commands will now call
scsi_end_bidi_request() and return.
- The previous work done in scsi_init_io() is now done in a new
scsi_init_sgtable() (which is 99% identical to old scsi_init_io())
The new scsi_init_io() will call the above twice if needed also for
the bidi_read command. Only at this point is a command bidi.
- In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not
confused by a get-sense command that looks like bidi. This is done
by puting NULL at request->next_rq, and restoring.
[jejb: update to sg_table and resolve conflicts
also update to blk-end-request and resolve conflicts]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In preparation for bidi we abstract all IO members of scsi_cmnd,
that will need to duplicate, into a substructure.
- Group all IO members of scsi_cmnd into a scsi_data_buffer
structure.
- Adjust accessors to new members.
- scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of
scsi_cmnd. And work on it.
- Adjust scsi_init_io() and scsi_release_buffers() for above
change.
- Fix other parts of scsi_lib/scsi.c to members migration. Use
accessors where appropriate.
- fix Documentation about scsi_cmnd in scsi_host.h
- scsi_error.c
* Changed needed members of struct scsi_eh_save.
* Careful considerations in scsi_eh_prep/restore_cmnd.
- sd.c and sr.c
* sd and sr would adjust IO size to align on device's block
size so code needs to change once we move to scsi_data_buff
implementation.
* Convert code to use scsi_for_each_sg
* Use data accessors where appropriate.
- tgt: convert libsrp to use scsi_data_buffer
- isd200: This driver still bangs on scsi_cmnd IO members,
so need changing
[jejb: rebased on top of sg_table patches fixed up conflicts
and used the synergy to eliminate use_sg and sg_count]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If we export scsi_init_io()/scsi_release_buffers() instead of
scsi_{alloc,free}_sgtable() from scsi_lib than tgt code is much more
insulated from scsi_lib changes. As a bonus it will also gain bidi
capability when it comes.
[jejb: rebase on to sg_table and fix up rejections]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port. Move migration to the regular
vcpu execution path, triggered by a new bitflag.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm_para.h potentially contains definitions that are to be used by userspace,
so it should not be included inside the __KERNEL__ block. To protect its own
data structures, kvm_para.h already includes its own __KERNEL__ block.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Acked-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves kvm_fpu asm-x86/kvm.h to allow every architecture to
define an own representation used for KVM_GET_FPU/KVM_SET_FPU.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When executing a test program called "crashme", we found the KVM guest cannot
survive more than ten seconds, then encounterd kernel panic. The basic concept
of "crashme" is generating random assembly code and trying to execute it.
After some fixes on emulator insn validity judgment, we found it's hard to
get the current emulator handle the invalid instructions correctly, for the
#UD trap for hypercall patching caused troubles. The problem is, if the opcode
itself was OK, but combination of opcode and modrm_reg was invalid, and one
operand of the opcode was memory (SrcMem or DstMem), the emulator will fetch
the memory operand first rather than checking the validity, and may encounter
an error there. For example, ".byte 0xfe, 0x34, 0xcd" has this problem.
In the patch, we simply check that if the invalid opcode wasn't vmcall/vmmcall,
then return from emulate_instruction() and inject a #UD to guest. With the
patch, the guest had been running for more than 12 hours.
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.
Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Since gfn_to_page() is a sleeping function, and we want to make the core mmu
spinlocked, we need to pass the page from the walker context (which can sleep)
to the shadow context (which cannot).
[marcelo: avoid recursive locking of mmap_sem]
Signed-off-by: Avi Kivity <avi@qumranet.com>
In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel. This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.
Signed-off-by: Avi Kivity <avi@qumranet.com>
IA64 also needs to see ioapic structure in irqchip.
Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Moving kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declarations in <linux/kvm_host.h>
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This paves the way for multiple architecture support. Note that while
ioapic.c could potentially be shared with ia64, it is also moved.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
includes, not existing. Rather than add a zillion <asm/kvm.h>s, export kvm.h
only if the arch actually supports it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch fixes a small issue where sturctures:
kvm_pic_state
kvm_ioapic_state
are defined inside x86 specific code and may or may not
be defined in anyway for other architectures. The problem
caused is one cannot compile userspace apps (ex. libkvm)
for other archs since a size cannot be determined for these
structures.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The current cpuid management suffers from several problems, which inhibit
passing through the host feature set to the guest:
- No way to tell which features the host supports
While some features can be supported with no changes to kvm, others
need explicit support. That means kvm needs to vet the feature set
before it is passed to the guest.
- No support for indexed or stateful cpuid entries
Some cpuid entries depend on ecx as well as on eax, or on internal
state in the processor (running cpuid multiple times with the same
input returns different output). The current cpuid machinery only
supports keying on eax.
- No support for save/restore/migrate
The internal state above needs to be exposed to userspace so it can
be saved or migrated.
This patch adds extended cpuid support by means of three new ioctls:
- KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
supports
- KVM_SET_CPUID2: sets the vcpu's cpuid table
- KVM_GET_CPUID2: gets the vcpu's cpuid table, including hidden state
[avi: fix original KVM_SET_CPUID not removing nx on non-nx hosts as it did
before]
Signed-off-by: Dan Kenigsberg <danken@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structures:
kvm_cpuid_entry
kvm_cpuid
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move structures:
kvm_sregs
kvm_msr_entry
kvm_msrs
kvm_msr_list
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structures:
kvm_segment
kvm_dtable
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structure lapic_state from include/linux/kvm.h
to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>