Remove some of the dependence on the host_set struct
in preparation for supporting SAS HBAs. Adds a struct device
pointer to the ata_port struct.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
At the moment libata doesn't pass pm_message_t down ata_device_suspend.
This causes drives to be powered down when we just want a freeze,
causing unnecessary wear and tear. This patch gets pm_message_t passed
down so that it can be used to determine whether to power down the
drive.
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
drivers/scsi/libata-core.c | 5 +++--
drivers/scsi/libata-scsi.c | 4 ++--
drivers/scsi/scsi_sysfs.c | 2 +-
include/linux/libata.h | 4 ++--
include/scsi/scsi_host.h | 2 +-
5 files changed, 9 insertions(+), 8 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add per-dev pio/mwdma/udma_mask. All transfer mode limits used to be
applied to ap->*_mask which unnecessarily restricted other devices
sharing the port. This change will also benefit later EH speed down
and hotplug.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* 'blktrace' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
[PATCH] relay: consolidate sendfile() and read() code
[PATCH] relay: add sendfile() support
[PATCH] relay: migrate from relayfs to a generic relay API
Add missing 'const' to pci_request_region[s] 'res_name' arg,
since we pass it directly to __request_region(), whose 'name' arg
is also const.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turns out AMD 8131 quirk only affects MSI for devices behind the 8131 bridge.
Handle this by adding a flags field in pci_bus, inherited from parent to child.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change the semantics of this call to return the max reserved
bus number instead of just the max assigned bus number.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add Broadcom HT-1000 south bridge's PCI ID to i2c-piix driver. Note
that at least on Supermicro H8SSL it uses non-standard SMBHSTCFG = 3
and standard values like 0 or 9 causes hangup.
Signed-off-by: Martin Devera <devik@cdi.cz>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Drop the i2c-frodo bus driver. It isn't referenced by the build
system, and depends on code which was never included in 2.6 kernels.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
semaphore to mutex conversion.
the conversion was generated via scripts, and the result was validated
automatically via a script as well.
build tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch refactors SENSOR_DEVICE_ATTR_2 macro, following pattern set by
SENSOR_ATTR. First it creates a new macro SENSOR_ATTR_2() which expands
to an initialization expression, then it uses that in SENSOR_DEVICE_ATTR_2,
which declares and initializes a struct sensor_device_attribute_2.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch refactors SENSOR_DEVICE_ATTR macro. First it creates a new
macro SENSOR_ATTR() which expands to an initialization expression, then
it uses that in SENSOR_DEVICE_ATTR, which declares and initializes a
struct sensor_device_attribute.
IOW, SENSOR_ATTR() imitates __ATTR() in include/linux/device.h.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().
This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
test to use the preferred helper macros.
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eliminate a handful of cache references by keeping current in a register
instead of reloading (helps x86) and avoiding the overhead of a function
call. Inlining eventpoll_init_file() saves 24 bytes. Also reorder file
initialization to make writes occur more sequentially.
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes two needlessly global structs static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attempt to fix the problem wherein people's oops reports scroll off the screen
due to repeated oopsing or to oopses on other CPUs.
If this happens the user can reboot with the `pause_on_oops=<seconds>' option.
It will allow the first oopsing CPU to print an oops record just a single
time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs
to enter a tight loop until the specified number of seconds have elapsed.
The patch implements the infrastructure generically in the expectation that
architectures other than x86 will find it useful.
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes all occurances of _INLINE_ in the kernel.
With the exception of tty_flip.h, I've simply removed the inline's since
gcc should know best which functions to be inlined.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Declarations use struct notifier_block on both legs of the ifdef, so move the
notifier_block forward declaration outside the ifdef.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The fat code uses the fat_lock always in a mutex way (taking and releasing
the lock in the same function), the patch below converts it into the new
mutex primitive. Please consider this patch for the code.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ext3's truncate_sem is always released in the same function it's taken
and it otherwise is a mutex as well..
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Seems like needless clutter having a bunch of #if defined(CONFIG_$ARCH) in
include/linux/cache.h. Move the per architecture section definition to
asm/cache.h, and keep the if-not-defined dummy case in linux/cache.h to
catch architectures which don't implement the section.
Verified that symbols still go in .data.read_mostly on parisc,
and the compile doesn't break.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since early 2.4.x all cdrom drivers implement the block_device methods
themselves, so they can handle additional ioctls directly instead of going
through the cdrom layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
platforms, lowering kmalloc() allocated space by 50%.
2) Reduce the size of (files_struct), using a special 32 bits (or
64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
close_on_exec_init and open_fds_init fields. This save some ram (248
bytes per task) as most tasks dont open more than 32 files. D-Cache
footprint for such tasks is also reduced to the minimum.
3) Reduce size of allocated fdset. Currently two full pages are
allocated, that is 32768 bits on x86 for example, and way too much. The
minimum is now L1_CACHE_BYTES.
UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the
same cache line)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus points out that ext3_readdir's readahead only cuts in when
ext3_readdir() is operating at the very start of the directory. So for large
directories we end up performing no readahead at all and we suck.
So take it all out and use the core VM's page_cache_readahead(). This means
that ext3 directory reads will use all of readahead's dynamic sizing goop.
Note that we're using the directory's filp->f_ra to hold the readahead state,
but readahead is actually being performed against the underlying blockdev's
address_space. Fortunately the readahead code is all set up to handle this.
Tested with printk. It works. I was struggling to find a real workload which
actually cared.
(The patch also exports page_cache_readahead() to GPL modules)
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is unsafe to suspend devices if the hardware is controlled by X. Add an
extra check to prevent this from happening.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move externs from C source files to header files.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce the low level interface that can be used for handling the
snapshot of the system memory by the in-kernel swap-writing/reading code of
swsusp and the userland interface code (to be introduced shortly).
Also change the way in which swsusp records the allocated swap pages and,
consequently, simplifies the in-kernel swap-writing/reading code (this is
necessary for the userland interface too). To this end, it introduces two
helper functions in mm/swapfile.c, so that the swsusp code does not refer
directly to the swap internals.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both R1BIO_Barrier and R1BIO_Returned are 4 !!!!
This means that barrier requests don't get returned (i.e. b_endio called)
because it looks like they already have been.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is version 20 of the Wireless Extensions. This is the
completion of the RtNetlink work I started early 2004, it enables the
full Wireless Extension API over RtNetlink.
Few comments on the patch :
o totally driver transparent, no change in drivers needed.
o iwevent were already RtNetlink based since they were created
(around 2.5.7). This adds all the regular SET and GET requests over
RtNetlink, using the exact same mechanism and data format as iwevents.
o This is a Kconfig option, as currently most people have no
need for it. Surprisingly, patch is actually small and well
encapsulated.
o Tested on SMP, attention as been paid to make it 64 bits clean.
o Code do probably too many checks and could be further
optimised, but better safe than sorry.
o RtNetlink based version of the Wireless Tools available on
my web page for people inclined to try out this stuff.
I would also like to thank Alexey Kuznetsov for his helpful
suggestions to make this patch better.
Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add support for new chip 5755 which is very similar to 5787.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To really make sense of route notifications in the presence of
multiple tables, userspace also needs to be notified about routing
rule updates. Notifications are sent to the so far unused
RTNLGRP_NOP1 (now RTNLGRP_RULE) group.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TCP]: Do not use inet->id of global tcp_socket when sending RST.
[NETFILTER]: Fix undefined references to get_h225_addr
[NETFILTER]: futher {ip,ip6,arp}_tables unification
[NETFILTER]: Fix xt_policy address matching
[NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand
[NETFILTER]: x_tables: set the protocol family in x_tables targets/matches
[NETFILTER]: conntrack: cleanup the conntrack ID initialization
[NETFILTER]: nfnetlink_queue: fix nfnetlink message size
[NETFILTER]: ctnetlink: Fix expectaction mask dumping
[NETFILTER]: Fix Kconfig typos
[NETFILTER]: Fix ip6tables breakage from {get,set}sockopt compat layer
* master.kernel.org:/home/rmk/linux-2.6-serial:
[SERIAL] Merge avlab serial board entries in parport_serial
[SERIAL] kernel console should send CRLF not LFCR
This patch moves {ip,ip6,arp}t_entry_{match,target} definitions to
x_tables.h. This move simplifies code and future compatibility fixes.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the family field in xt_[matches|targets] registered.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TG3]: Bump driver version and reldate.
[TG3]: Skip phy power down on some devices
[TG3]: Fix SRAM access during tg3_init_one()
[X25]: dte facilities 32 64 ioctl conversion
[X25]: allow ITU-T DTE facilities for x25
[X25]: fix kernel error message 64 bit kernel
[X25]: ioctl conversion 32 bit user to 64 bit kernel
[NET]: socket timestamp 32 bit handler for 64 bit kernel
[NET]: allow 32 bit socket ioctl in 64 bit kernel
[BLUETOOTH]: Return negative error constant
Centralize the page migration functions in anticipation of additional
tinkering. Creates a new file mm/migrate.c
1. Extract buffer_migrate_page() from fs/buffer.c
2. Extract central migration code from vmscan.c
3. Extract some components from mempolicy.c
4. Export pageout() and remove_from_swap() from vmscan.c
5. Make it possible to configure NUMA systems without page migration
and non-NUMA systems with page migration.
I had to so some #ifdeffing in mempolicy.c that may need a cleanup.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.
Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range(). On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.
In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).
This patch cleans up by abolishing is_aligned_hugepage_range(). Instead
prepare_hugepage_range() is defined directly. Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage. ia64 and powerpc define custom versions. The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned. The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.
No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The optional hugepage callback, hugetlb_free_pgd_range() is presently
implemented non-trivially only on ia64 (but I plan to add one for powerpc
shortly). It has its own prototype for the function in asm-ia64/pgtable.h.
However, since the function is called from generic code, it make sense for
its prototype to be in the generic hugetlb.h header file, as the protypes
other arch callbacks already are (prepare_hugepage_range(),
set_huge_pte_at(), etc.). This patch makes it so.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
of the normal free_pgd_range() on hugepage VMAs. However, the test it uses
to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
range at the start of the vma. is_hugepage_only_range() will return true
if the given range has any intersection with a hugepage address region, and
in this case the given region need not be hugepage aligned. So, for
example, this test can return true if called on, say, a 4k VMA immediately
preceding a (nicely aligned) hugepage VMA.
At present we get away with this because the powerpc version of
hugetlb_free_pgd_range() is just a call to free_pgd_range(). On ia64 (the
only other arch with a non-trivial is_hugepage_only_range()) we get away
with it for a different reason; the hugepage area is not contiguous with
the rest of the user address space, and VMAs are not permitted in between,
so the test can't return a false positive there.
Nonetheless this should be fixed. We do that in the patch below by
replacing the is_hugepage_only_range() test with an explicit test of the
VMA using is_vm_hugetlb_page().
This in turn changes behaviour for platforms where is_hugepage_only_range()
returns false always (everything except powerpc and ia64). We address this
by ensuring that hugetlb_free_pgd_range() is defined to be identical to
free_pgd_range() (instead of a no-op) on everything except ia64. Even so,
it will prevent some otherwise possible coalescing of calls down to
free_pgd_range(). Since this only happens for hugepage VMAs, removing this
small optimization seems unlikely to cause any trouble.
This patch causes no regressions on the libhugetlbfs testsuite - ppc64
POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Originally, mm/hugetlb.c just handled the hugepage physical allocation path
and its {alloc,free}_huge_page() functions were used from the arch specific
hugepage code. These days those functions are only used with mm/hugetlb.c
itself. Therefore, this patch makes them static and removes their
prototypes from hugetlb.h. This requires a small rearrangement of code in
mm/hugetlb.c to avoid a forward declaration.
This patch causes no regressions on the libhugetlbfs testsuite (ppc64,
POWER5).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These days, hugepages are demand-allocated at first fault time. There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.
A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations. In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available. In particular, this defeats such
a program which will detect a mapping failure and adjust its hugepage usage
downward accordingly.
The patch below addresses this problem, by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped, but not
instatiated. MAP_SHARED mappings are thus "safe" - they will fail on
mmap(), not later with an OOM SIGKILL. MAP_PRIVATE mappings can still
trigger an OOM. (Actually SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation)
This patch appears to address the problem at hand - it allows DB2 to start
correctly, for instance, which previously suffered the failure described
above.
This patch causes no regressions on the libhugetblfs testsuite, and makes a
test (designed to catch this problem) pass which previously failed (ppc64,
POWER5).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
mprotect.
From: David Gibson <david@gibson.dropbear.id.au>
Remove a test from the mprotect() path which checks that the mprotect()ed
range on a hugepage VMA is hugepage aligned (yes, really, the sense of
is_aligned_hugepage_range() is the opposite of what you'd guess :-/).
In fact, we don't need this test. If the given addresses match the
beginning/end of a hugepage VMA they must already be suitably aligned. If
they don't, then mprotect_fixup() will attempt to split the VMA. The very
first test in split_vma() will check for a badly aligned address on a
hugepage VMA and return -EINVAL if necessary.
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE. The
identify of hugetlb pte is lost when changing page protection via mprotect.
A page fault occurs later will trigger a bug check in huge_pte_alloc().
The fix is to always make new pte a hugetlb pte and also to clean up
legacy code where _PAGE_PRESENT is forced on in the pre-faulting day.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimise page_count compound page test and make it consistent with similar
functions.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().
This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that compound page handling is properly fixed in the VM, move nommu
over to using compound pages rather than rolling their own refcounting.
nommu vm page refcounting is broken anyway, but there is no need to have
divergent code in the core VM now, nor when it gets fixed.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: David Howells <dhowells@redhat.com>
(Needs testing, please).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove __put_page from outside the core mm/. It is dangerous because it does
not handle compound pages nicely, and misses 1->0 transitions. If a user
later appears that really needs the extra speed we can reevaluate.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Turn basically everything in vmscan.c into `unsigned long'. This is to avoid
the possibility that some piece of code in there might decide to operate upon
more than 4G (or even 2G) of pages in one hit.
This might be silly, but we'll need it one day.
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When on_each_cpu() runs the callback on other CPUs, it runs with local
interrupts disabled. So we should run the function with local interrupts
disabled on this CPU, too.
And do the same for UP, so the callback is run in the same environment on both
UP and SMP. (strictly it should do preempt_disable() too, but I think
local_irq_disable is sufficiently equivalent).
Also uninlines on_each_cpu(). softirq.c was the most appropriate file I could
find, but it doesn't seem to justify creating a new file.
Oh, and fix up that comment over (under?) x86's smp_call_function(). It
drives me nuts.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_NO_REAP is documented as an option that will cause this slab not to be
reaped under memory pressure. However, that is not what happens. The only
thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
slab elements that were allocated in batch in cache_reap(). Cache_reap()
is run every few seconds independently of memory pressure.
Could we remove the whole thing? Its only used by three slabs anyways and
I cannot find a reason for having this option.
There is an additional problem with SLAB_NO_REAP. If set then the recovery
of objects from alien caches is switched off. Objects not freed on the
same node where they were initially allocated will only be reused if a
certain amount of objects accumulates from one alien node (not very likely)
or if the cache is explicitly shrunk. (Strangely __cache_shrink does not
check for SLAB_NO_REAP)
Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since size_t has the same size as a long on all architectures, it's enough
for overflow checks to check against ULONG_MAX.
This change could allow a compiler better optimization (especially in the
n=1 case).
The practical effect seems to be positive, but quite small:
text data bss dec hex filename
21762380 5859870 1848928 29471178 1c1b1ca vmlinux-old
21762211 5859870 1848928 29471009 1c1b121 vmlinux-patched
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clarify that preemption needs to be guarded against with the
__xxx_page_state functions.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Have an explicit mm call to split higher order pages into individual pages.
Should help to avoid bugs and be more explicit about the code's intention.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
atomic_add_unless (atomic_inc_not_zero) no longer requires an offset refcount
to function correctly.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The VM has an interesting race where a page refcount can drop to zero, but it
is still on the LRU lists for a short time. This was solved by testing a 0->1
refcount transition when picking up pages from the LRU, and dropping the
refcount in that case.
Instead, use atomic_add_unless to ensure we never pick up a 0 refcount page
from the LRU, thus a 0 refcount page will never have its refcount elevated
until it is allocated again.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
More atomic operation removal from page allocator
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the page release paths, we can be sure that nobody will mess with our
page->flags because the refcount has dropped to 0. So no need for atomic
operations here.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
PG_active is protected by zone->lru_lock, it does not need TestSet/TestClear
operations.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
PG_lru is protected by zone->lru_lock. It does not need TestSet/TestClear
operations.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the DMA_28BIT_MASK constant in dma-mapping.h
ALSA drivers using this mask are changed to use the new constant.
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Jaroslav Kysela <perex@suse.cz>
Allows use of the optional user facility to insert ITU-T
(http://www.itu.int/ITU-T/) specified DTE facilities in call set-up x25
packets. This feature is optional; no facilities will be added if the ioctl
is not used, and call setup packet remains the same as before.
If the ioctls provided by the patch are used, then a facility marker will be
added to the x25 packet header so that the called dte address extension
facility can be differentiated from other types of facilities (as described in
the ITU-T X.25 recommendation) that are also allowed in the x25 packet header.
Facility markers are made up of two octets, and may be present in the x25
packet headers of call-request, incoming call, call accepted, clear request,
and clear indication packets. The first of the two octets represents the
facility code field and is set to zero by this patch. The second octet of the
marker represents the facility parameter field and is set to 0x0F because the
marker will be inserted before ITU-T type DTE facilities.
Since according to ITU-T X.25 Recommendation X.25(10/96)- 7.1 "All networks
will support the facility markers with a facility parameter field set to all
ones or to 00001111", therefore this patch should work with all x.25 networks.
While there are many ITU-T DTE facilities, this patch implements only the
called and calling address extension, with placeholders in the
x25_dte_facilities structure for the rest of the facilities.
Testing:
This patch was tested using a cisco xot router connected on its serial ports
to an X.25 network, and on its lan ports to a host running an xotd daemon.
It is also possible to test this patch using an xotd daemon and an x25tap
patch, where the xotd daemons work back-to-back without actually using an x.25
network. See www.fyonne.net for details on how to do this.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Andrew Hendry <ahendry@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the register_ioctl32_conversion() patch in the kernel is now obsolete,
provide another method to allow 32 bit user space ioctls to reach the kernel.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
HPA presence/enabled
HPA commands
Also add ata_id_is_cfa() as that is needed to detect and handle CF cards
which currently we reject.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add some dummy noop functions for use by libata clients
that do not need to do anything. Future SAS patches will
utilize these functions.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
SCSI midlayer has moved hostt->eh_timed_out to transport template. As
libata doesn't need full-blown transport support yet, implement
minimal transport for libata. No transport class or whatsoever, just
empty transport template with ->eh_timed_out hook.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6:
[CRYPTO] aes: Fixed array boundary violation
[CRYPTO] tcrypt: Fix key alignment
[CRYPTO] all: Add missing cra_alignmask
[CRYPTO] all: Use kzalloc where possible
[CRYPTO] api: Align tfm context as wide as possible
[CRYPTO] twofish: Use rol32/ror32 where appropriate
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (23 commits)
[PATCH] sysfs: fix a kobject leak in sysfs_add_link on the error path
[PATCH] sysfs: don't export dir symbols
[PATCH] get_cpu_sysdev() signedness fix
[PATCH] kobject_add_dir
[PATCH] debugfs: Add debugfs_create_blob() helper for exporting binary data
[PATCH] sysfs: fix problem with duplicate sysfs directories and files
[PATCH] Kobject: kobject.h: fix a typo
[PATCH] Kobject: provide better warning messages when people do stupid things
[PATCH] Driver core: add macros notice(), dev_notice()
[PATCH] firmware: fix BUG: in fw_realloc_buffer
[PATCH] sysfs: kzalloc conversion
[PATCH] fix module sysfs files reference counting
[PATCH] add EXPORT_SYMBOL_GPL_FUTURE() to USB subsystem
[PATCH] add EXPORT_SYMBOL_GPL_FUTURE() to RCU subsystem
[PATCH] add EXPORT_SYMBOL_GPL_FUTURE()
[PATCH] Clean up module.c symbol searching logic
[PATCH] kobj_map semaphore to mutex conversion
[PATCH] kref: avoid an atomic operation in kref_put()
[PATCH] handle errors returned by platform_get_irq*()
[PATCH] driver core: platform_get_irq*(): return -ENXIO on error
...
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] Fix cosmetic typo in asm/irq.h
[ARM] 3367/1: CLCD mode no longer supported on the RealView boards
[ARM] 3366/1: Allow the 16bpp mode configuration in the CLCD control register
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (112 commits)
[libata] sata_mv: fix irq port status usage
[PATCH] libata: move IDENTIFY info printing from ata_dev_read_id() to ata_dev_configure()
[PATCH] libata: use local *id instead of dev->id in ata_dev_configure()
[PATCH] libata: check Word 88 validity in ata_id_xfer_mask()
[PATCH] libata: fix class handling in ata_bus_probe()
[PATCH] ahci: enable prefetching for PACKET commands
libata: turn on ATAPI by default
[PATCH] sata_sil24: lengthen softreset timeout
[PATCH] sata_sil24: exit early from softreset if SStatus reports no device
[PATCH] libata: fix missing classes[] initialization in ata_bus_probe()
[PATCH] libata: kill unused xfer_mode functions
[PATCH] libata: reimplement ata_set_mode() using xfer_mask helpers
[PATCH] libata: use xfer_mask helpers in ata_dev_set_mode()
[PATCH] libata: use ata_id_xfermask() in ata_dev_configure()
[PATCH] libata: add xfer_mask handling functions
[PATCH] libata: improve xfer mask constants and update ata_mode_string()
[PATCH] libata: rename ATA_FLAG_FLUSH_PIO_TASK to ATA_FLAG_FLUSH_PORT_TASK
[PATCH] libata: kill unused pio_task and packet_task
[PATCH] libata: convert pio_task and packet_task to port_task
[PATCH] libata: implement port_task
...
This merges the DVB tree, but fixes up the history that had gotten
screwed up by a broken commit.
The history is fixed up by re-doing the commit properly (taking the
resolve from the final result of the original), and then cherry-picking
the commits that followed the broken merge.
* dvb: (190 commits)
V4L/DVB (3545): Fixed no_overlay option and quirks on saa7134 driver
V4L/DVB (3543): Fix Makefile to adapt to bt8xx/ conversion
V4L/DVB (3538): Bt8xx documentation update
V4L/DVB (3537a): Whitespace cleanup
V4L/DVB (3533): Add WSS (wide screen signalling) module parameters
V4L/DVB (3532): Moved duplicated code of ALPS BSRU6 tuner to a standalone file.
V4L/DVB (3530): Kconfig: remove VIDEO_AUDIO_DECODER
V4L/DVB (3529): Kconfig: add menu items for cs53l32a and wm8775 A/D converters
V4L/DVB (3528): Kconfig: fix ATSC frontend menu item names by manufacturer
V4L/DVB (3527): VIDEO_CPIA2 must depend on USB
V4L/DVB (3525): Kconfig: remove VIDEO_DECODER
V4L/DVB (3524): Kconfig: add menu items for saa7115 and saa7127
V4L/DVB (3494): Kconfig: select VIDEO_MSP3400 to build msp3400.ko
V4L/DVB (3522): Fixed a trouble with other PAL standards
V4L/DVB (3521): Avoid warnings at video-buf.c
V4L/DVB (3514): SAA7113 doesn't have auto std chroma detection mode
V4L/DVB (3513): Remove saa711x driver
V4L/DVB (3509): Make a needlessly global function static.
V4L/DVB (3506): Cinergy T2 dmx cleanup on disconnect
V4L/DVB (3504): Medion 7134: Autodetect second bridge chip
...
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Conflicts:
Documentation/video4linux/CARDLIST.cx88
drivers/media/video/cx88/Kconfig
drivers/media/video/em28xx/em28xx-video.c
drivers/media/video/saa7134/saa7134-dvb.c
Resolved as in the original merge by Mauro Carvalho Chehab
Since tfm contexts can contain arbitrary types we should provide at least
natural alignment (__attribute__ ((__aligned__))) for them. In particular,
this is needed on the Xscale which is a 32-bit architecture with a u64 type
that requires 64-bit alignment. This problem was reported by Ronen Shitrit.
The crypto_tfm structure's size was 44 bytes on 32-bit architectures and
80 bytes on 64-bit architectures. So adding this requirement only means
that we have to add an extra 4 bytes on 32-bit architectures.
On i386 the natural alignment is 16 bytes which also benefits the VIA
Padlock as it no longer has to manually align its context structure to
128 bits.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jing Min Zhao <zhaojignmin@hotmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move nf_bridge_alloc from header file to the one place it is
used and optimize it.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will later be included in struct dccp_request_sock so that we can
have per connection feature negotiation state while in the 3way
handshake, when we clone the DCCP_ROLE_LISTEN socket (in
dccp_create_openreq_child) we'll just copy this state from
dreq_minisock to dccps_minisock.
Also the feature negotiation and option parsing code will mostly touch
dccps_minisock, which will simplify some stuff.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're now starting to have quite a number of places that do skb_pull
followed immediately by an skb_postpull_rcsum. We can merge these two
operations into one function with skb_pull_rcsum. This makes sense
since most pull operations on receive skb's need to update the
checksum.
I've decided to make this out-of-line since it is fairly big and the
fast path where hardware checksums are enabled need to call
csum_partial anyway.
Since this is a brand new function we get to add an extra check on the
len argument. As it is most callers of skb_pull ignore its return
value which essentially means that there is no check on the len
argument.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The typedef for dn_address has been removed in favour of using __le16
or __u16 directly as appropriate. All the DECnet header files are
updated accordingly.
The byte ordering of dn_eth2dn() and dn_dn2eth() are both changed
since just about all their callers wanted network order rather than
host order, so the conversion is now done in the functions themselves.
Several missed endianess conversions have been picked up during the
conversion process. The nh_gw field in struct dn_fib_info has been
changed from a 32 bit field to 16 bits as it ought to be.
One or two cases of using htons rather than dn_htons in the routing
code have been found and fixed.
There are still a few warnings to fix, but this patch deals with the
important cases.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements an application of the LSM-IPSec networking
controls whereby an application can determine the label of the
security association its TCP or UDP sockets are currently connected to
via getsockopt and the auxiliary data mechanism of recvmsg.
Patch purpose:
This patch enables a security-aware application to retrieve the
security context of an IPSec security association a particular TCP or
UDP socket is using. The application can then use this security
context to determine the security context for processing on behalf of
the peer at the other end of this connection. In the case of UDP, the
security context is for each individual packet. An example
application is the inetd daemon, which could be modified to start
daemons running at security contexts dependent on the remote client.
Patch design approach:
- Design for TCP
The patch enables the SELinux LSM to set the peer security context for
a socket based on the security context of the IPSec security
association. The application may retrieve this context using
getsockopt. When called, the kernel determines if the socket is a
connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
cache on the socket to retrieve the security associations. If a
security association has a security context, the context string is
returned, as for UNIX domain sockets.
- Design for UDP
Unlike TCP, UDP is connectionless. This requires a somewhat different
API to retrieve the peer security context. With TCP, the peer
security context stays the same throughout the connection, thus it can
be retrieved at any time between when the connection is established
and when it is torn down. With UDP, each read/write can have
different peer and thus the security context might change every time.
As a result the security context retrieval must be done TOGETHER with
the packet retrieval.
The solution is to build upon the existing Unix domain socket API for
retrieving user credentials. Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).
Patch implementation details:
- Implementation for TCP
The security context can be retrieved by applications using getsockopt
with the existing SO_PEERSEC flag. As an example (ignoring error
checking):
getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
printf("Socket peer context is: %s\n", optbuf);
The SELinux function, selinux_socket_getpeersec, is extended to check
for labeled security associations for connected (TCP_ESTABLISHED ==
sk->sk_state) TCP sockets only. If so, the socket has a dst_cache of
struct dst_entry values that may refer to security associations. If
these have security associations with security contexts, the security
context is returned.
getsockopt returns a buffer that contains a security context string or
the buffer is unmodified.
- Implementation for UDP
To retrieve the security context, the application first indicates to
the kernel such desire by setting the IP_PASSSEC option via
getsockopt. Then the application retrieves the security context using
the auxiliary data mechanism.
An example server application for UDP should look like this:
toggle = 1;
toggle_len = sizeof(toggle);
setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
cmsg_hdr->cmsg_level == SOL_IP &&
cmsg_hdr->cmsg_type == SCM_SECURITY) {
memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
}
}
ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow
a server socket to receive security context of the peer. A new
ancillary message type SCM_SECURITY.
When the packet is received we get the security context from the
sec_path pointer which is contained in the sk_buff, and copy it to the
ancillary message space. An additional LSM hook,
selinux_socket_getpeersec_udp, is defined to retrieve the security
context from the SELinux space. The existing function,
selinux_socket_getpeersec does not suit our purpose, because the
security context is copied directly to user space, rather than to
kernel space.
Testing:
We have tested the patch by setting up TCP and UDP connections between
applications on two machines using the IPSec policies that result in
labeled security associations being built. For TCP, we can then
extract the peer security context using getsockopt on either end. For
UDP, the receiving end can retrieve the security context using the
auxiliary data mechanism of recvmsg.
Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in the dark ages, we had to be conservative and only allow 15-bit
window fields if the window scale option was not negotiated. Some
ancient stacks used a signed 16-bit quantity for the window field of
the TCP header and would get confused.
Those days are long gone, so we can use the full 16-bits by default
now.
There is a sysctl added so that we can still interact with such old
stacks
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the old __dev_put macro that is just a hold over from pre 2.6
kernel. And turn dev_hold into an inline instead of a macro.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and
gets rid of some of the leftover legacy.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here goes a patch for supporting TOIM3232 based serial IrDA dongles.
The code is based on the tekram dongle code.
It's been tested with a TOIM3232 based IRWave 320S dongle. It may work
for TOIM4232 dongles, although it's not been tested.
Signed-off-by: David Basden <davidb-irda@rcpt.to>
Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves some TCP-specific MTU probing state out of
inet_connection_sock back to tcp_sock.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Uninline kfree_skb, which saves some 15k of object code on my notebook.
o Allow kfree_skb to be called with a NULL argument.
Subsequent patches can remove conditional from drivers and further
reduce source and object size.
Signed-off-by: Jrn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct xfrm_aevent_id needs to be 32-bit + 64-bit align friendly.
Based upon suggestions from Yoshifuji.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nlmsvc_traverse_shares return value is always zero, hence useless.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Note that we never return non-zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use a spinlock to ensure unique sequence numbers when creating krb5 gss tokens.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[root@qemu ~]# for a in /proc/sys/net/dccp/default/* ; do echo $a ; cat $a ; done
/proc/sys/net/dccp/default/ack_ratio
2
/proc/sys/net/dccp/default/rx_ccid
3
/proc/sys/net/dccp/default/send_ackvec
1
/proc/sys/net/dccp/default/send_ndp
1
/proc/sys/net/dccp/default/seq_window
100
/proc/sys/net/dccp/default/tx_ccid
3
[root@qemu ~]#
So if wanting to test ccid3 as the tx CCID one can just do:
[root@qemu ~]# echo 3 > /proc/sys/net/dccp/default/tx_ccid
[root@qemu ~]# echo 2 > /proc/sys/net/dccp/default/rx_ccid
[root@qemu ~]# cat /proc/sys/net/dccp/default/[tr]x_ccid
2
3
[root@qemu ~]#
Of course we also need the setsockopt for each app to tell its preferences, but
for testing or defining something other than CCID2 as the default for apps that
don't explicitely set their preference the sysctl interface is handy.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per the draft. This fixes the build when netfilter dccp components
are built and dccp isn't. Thanks to Reuben Farrelly for reporting
this.
The following changesets will introduce /proc/sys/net/dccp/defaults/
to give more flexibility to DCCP developers and testers while apps
doesn't use setsockopt to specify the desired CCID, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also fixes the layout of dccp_hdr short sequence numbers, problem
was not fatal now as we only support long (48 bits) sequence numbers.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridge netfilter code simulates the NF_IP_PRE_ROUTING hook and skips
the real hook by registering with high priority and returning NF_STOP if
skb->nf_bridge is present and the BRNF_NF_BRIDGE_PREROUTING flag is not
set. The flag is only set during the simulated hook.
Because skb->nf_bridge is only freed when the packet is destroyed, the
packet will not only skip the first invocation of NF_IP_PRE_ROUTING, but
in the case of tunnel devices on top of the bridge also all further ones.
Forwarded packets from a bridge encapsulated by a tunnel device and sent
as locally outgoing packet will also still have the incorrect bridge
information from the input path attached.
We already have nf_reset calls on all RX/TX paths of tunnel devices,
so simply reset the nf_bridge field there too. As an added bonus,
the bridge information for locally delivered packets is now also freed
when the packet is queued to a socket.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit}
does and anynways neither ccid2 nor ccid3 were using it.
2. Rename struct ccid to struct ccid_operations and introduce struct ccid
with a pointer to ccid_operations and rigth after it the rx or tx
private state.
3. Remove the pointer to the state of the half connections from struct
dccp_sock, now its derived thru ccid_priv() from the ccid pointer.
Now we also can implement the setsockopt for changing the CCID easily as
no ccid init routines can affect struct dccp_sock in any way that prevents
other CCIDs from working if a CCID switch operation is asked by apps.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides the core functionality needed for sync events
for ipsec. Derived work of Krisztian KOVACS <hidden@balabit.hu>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep a bitmask of multicast groups with subscribed listeners to let
netlink users check for listeners before generating multicast
messages.
Queries don't perform any locking, which may result in false
positives, it is guaranteed however that any new subscriptions are
visible before bind() or setsockopt() return.
Signed-off-by: Patrick McHardy <kaber@trash.net>
ACKed-by: Jamal Hadi Salim<hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid unneccessary event message generation by checking for netlink
listeners before building a message.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows to make decisions based on the revision (and address family
with a follow-up patch) at runtime.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new functions for common match/target checks (private data
size, valid hooks, valid tables and valid protocols) to get more consistent
error reporting and to avoid each module duplicating them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now CCID2 is the default, as stated in the RFC drafts, but we allow
a config where just CCID3 is built, where CCID3 becomes the default.
Signed-off-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Implementation of packetization layer path mtu discovery for TCP, based on
the internet-draft currently found at
<http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-05.txt>.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Original work by Andrea Bittau, Arnaldo Melo cleaned up and fixed several
issues on the merge process.
For now CCID2 was turned the default for all SOCK_DCCP connections, but this
will be remedied soon with the merge of the feature negotiation code.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For iterating over list of given type continuing from existing point.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For iterate over list of given type from existing point safe against removal of
list entry.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>