The pipe ->map() method uses kmap() to virtually map the pages, which
is both slow and has known scalability issues on SMP. This patch enables
atomic copying of pipe pages, by pre-faulting data and using kmap_atomic()
instead.
lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
are results from that on a UP machine with highmem (1.5GiB of RAM), running
first a UP kernel, SMP kernel, and SMP kernel patched.
Vanilla-UP:
Pipe bandwidth: 1622.28 MB/sec
Pipe bandwidth: 1610.59 MB/sec
Pipe bandwidth: 1608.30 MB/sec
Pipe latency: 7.3275 microseconds
Pipe latency: 7.2995 microseconds
Pipe latency: 7.3097 microseconds
Vanilla-SMP:
Pipe bandwidth: 1382.19 MB/sec
Pipe bandwidth: 1317.27 MB/sec
Pipe bandwidth: 1355.61 MB/sec
Pipe latency: 9.6402 microseconds
Pipe latency: 9.6696 microseconds
Pipe latency: 9.6153 microseconds
Patched-SMP:
Pipe bandwidth: 1578.70 MB/sec
Pipe bandwidth: 1579.95 MB/sec
Pipe bandwidth: 1578.63 MB/sec
Pipe latency: 9.1654 microseconds
Pipe latency: 9.2266 microseconds
Pipe latency: 9.1527 microseconds
Signed-off-by: Jens Axboe <axboe@suse.de>
The ->map() function is really expensive on highmem machines right now,
since it has to use the slower kmap() instead of kmap_atomic(). Splice
rarely needs to access the virtual address of a page, so it's a waste
of time doing it.
Introduce ->pin() to take over the responsibility of making sure the
page data is valid. ->map() is then reduced to just kmap(). That way we
can also share a most of the pipe buffer ops between pipe.c and splice.c
Signed-off-by: Jens Axboe <axboe@suse.de>
Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.
- Only allow full pages to go to the page cache.
- Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
- Remember to clear 'stolen' if add_to_page_cache() fails.
And as a cleanup on that:
- Make the bottom fall-through logic a little less convoluted. Also make
the steal path hold an extra reference to the page, so we don't have
to differentiate between stolen and non-stolen at the end.
Signed-off-by: Jens Axboe <axboe@suse.de>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TG3]: Update version and reldate
[TG3]: Fix bug in nvram write
[TG3]: Add reset_phy parameter to chip reset functions
[TG3]: Reset chip when changing MAC address
[TG3]: Add phy workaround
[TG3]: Call netif_carrier_off() during phy reset
[IPV6]: Fix race in route selection.
[XFRM]: fix incorrect xfrm_policy_afinfo_lock use
[XFRM]: fix incorrect xfrm_state_afinfo_lock use
[TCP]: Fix unlikely usage in tcp_transmit_skb()
[XFRM]: fix softirq-unsafe xfrm typemap->lock use
[IPSEC]: Fix IP ID selection
[NET]: use hlist_unhashed()
[IPV4]: inet_init() -> fs_initcall
[NETLINK]: cleanup unused macro in net/netlink/af_netlink.c
[PKT_SCHED] netem: fix loss
[X25]: fix for spinlock recurse and spinlock lockup with timer handler
1) The audit_ipc_perms() function has been split into two different
functions:
- audit_ipc_obj()
- audit_ipc_set_perm()
There's a key shift here... The audit_ipc_obj() collects the uid, gid,
mode, and SElinux context label of the current ipc object. This
audit_ipc_obj() hook is now found in several places. Most notably, it
is hooked in ipcperms(), which is called in various places around the
ipc code permforming a MAC check. Additionally there are several places
where *checkid() is used to validate that an operation is being
performed on a valid object while not necessarily having a nearby
ipcperms() call. In these locations, audit_ipc_obj() is called to
ensure that the information is captured by the audit system.
The audit_set_new_perm() function is called any time the permissions on
the ipc object changes. In this case, the NEW permissions are recorded
(and note that an audit_ipc_obj() call exists just a few lines before
each instance).
2) Support for an AUDIT_IPC_SET_PERM audit message type. This allows
for separate auxiliary audit records for normal operations on an IPC
object and permissions changes. Note that the same struct
audit_aux_data_ipcctl is used and populated, however there are separate
audit_log_format statements based on the type of the message. Finally,
the AUDIT_IPC block of code in audit_free_aux() was extended to handle
aux messages of this new type. No more mem leaks I hope ;-)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hi,
The patch below builds upon the patch sent earlier and adds subject label to
all audit events generated via the netlink interface. It also cleans up a few
other minor things.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The below patch should be applied after the inode and ipc sid patches.
This patch is a reworking of Tim's patch that has been updated to match
the inode and ipc patches since its similar.
[updated:
> Stephen Smalley also wanted to change a variable from isec to tsec in the
> user sid patch. ]
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hi,
The patch below converts IPC auditing to collect sid's and convert to context
string only if it needs to output an audit record. This patch depends on the
inode audit change patch already being applied.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Previously, we were gathering the context instead of the sid. Now in this patch,
we gather just the sid and convert to context only if an audit event is being
output.
This patch brings the performance hit from 146% down to 23%
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The following patch provides selinux interfaces that will allow the audit
system to perform filtering based on the process context (user, role, type,
sensitivity, and clearance). These interfaces will allow the selinux
module to perform efficient matches based on lower level selinux constructs,
rather than relying on context retrievals and string comparisons within
the audit module. It also allows for dominance checks on the mls portion
of the contexts that are impossible with only string comparisons.
Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The locking for the uart_port is over complicated, and can be
simplified if we introduce a flag to indicate that a port is "dead"
and will be removed.
This also helps the validator because it removes a case of non-nested
unlock ordering.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Use hlist_unhashed() rather than accessing inside data structure.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While writing to an event device allows to set repeat rate for an
individual input device there is no way to retrieve current settings
so we need to ressurect EVIOCGREP. Also ressurect EVIOCSREP so we
have a symmetrical interface.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
- Add new SA_PROBEIRQ which suppresses the new sharing-mismatch warning.
Some drivers like to use request_irq() to find an unused interrupt slot.
- Use it in i82365.c
- Kill unused SA_PROBE.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
[PATCH] Added URI of "linux kernel development process"
[PATCH] Kobject: possible cleanups
[PATCH] Fix OCFS2 warning when DEBUG_FS is not enabled
[PATCH] Kobject: fix build error
[PATCH] Frame buffer: remove cmap sysfs interface
This patch contains the following possible cleanups:
- #if 0 the following unused global function:
- subsys_remove_file()
- remove the following unused EXPORT_SYMBOL's:
- kset_find_obj
- subsystem_init
- remove the following unused EXPORT_SYMBOL_GPL:
- kobject_add_dir
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix the following warning which happens when OCFS2_FS is enabled but
DEBUG_FS isn't:
fs/ocfs2/dlmglue.c: In function `ocfs2_dlm_init_debug':
fs/ocfs2/dlmglue.c:2036: warning: passing arg 5 of `debugfs_create_file' discards qualifiers from pointer target type
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Joel Becker <Joel.Becker@oracle.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes a build error for various odd combinations of CONFIG_HOTPLUG
and CONFIG_NET.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Nigel Cunningham <ncunningham@cyclades.com>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
find_get_pages_contig() will break out if we hit a hole in the page cache.
From Andrew Morton, small modifications and documentation by me.
Signed-off-by: Jens Axboe <axboe@suse.de>
This is a workaround for the case edge-triggered irq's. Several users
seem to have broken configurations sharing edge-triggered irq's. To avoid
losing IRQ's, reshedule if more work arrives.
The changes to netdevice.h are to extract the part that puts device
back in list into separate inline.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.
This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.
Signed-off-by: Jens Axboe <axboe@suse.de>
Providing more accurate coordinates for thumb press requires additional
steps in the filtering logic:
- Ignore samples found invalid by the debouncing logic, or the ones that
have out of bound pressure value.
- Add a parameter to repeat debouncing, so that more then two consecutive
good readings are required for a valid sample.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Acked-by: Juha Yrjola <juha.yrjola@nokia.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Some (?) non-x86 architectures require 8byte alignment for u_int64_t
even when compiled for 32bit, using u_int32_t in compat_xt_counters
breaks on these architectures, use u_int64_t for everything but x86.
Reported by Andreas Schwab <schwab@suse.de>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't have to #if guard prototypes.
This also fixes a bug observed by Randy Dunlap due to a misspelled
option in the #if.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some sanity checking. truesize should be at least sizeof(struct
sk_buff) plus the current packet length. If not, then truesize is
seriously mangled and deserves a kernel log message.
Currently we'll do the check for release of stream socket buffers.
But we can add checks to more spots over time.
Incorporating ideas from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Dead code in net/sunrpc/auth_gss/auth_gss.c
NFS: remove needless check in nfs_opendir()
NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS
NFS: make 2 functions static
NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset
NFS: fix PROC_FS=n compile error
VFS: Fix another open intent Oops
RPCSEC_GSS: fix leak in krb5 code caused by superfluous kmalloc
fs/built-in.o: In function `nfs_show_stats':inode.c:(.text+0x15481a): undefined reference to `rpc_print_iostats'
net/built-in.o: In function `rpc_destroy_client': undefined reference to `rpc_free_iostats'
net/built-in.o: In function `rpc_clone_client': undefined reference to `rpc_alloc_iostats'
net/built-in.o: In function `rpc_new_client': undefined reference to `rpc_alloc_iostats'
net/built-in.o: In function `xprt_release': undefined reference to `rpc_count_iostats'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Noted by Sergei Shtylylov <sshtylyov@ru.mvista.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the IDE device on ATI SB600
Signed-off-by: Felix Kuehling <fkuehlin@ati.com>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.
We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.
prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Somehow in the midst of dotting i's and crossing t's during
the merge up to rc1 we wound up keeping __put_task_struct_cb
when it should have been killed as it no longer has any users.
Sorry I probably should have caught this while it was
still in the -mm tree.
Having the old code there gets confusing when reading
through the code and trying to understand what is
happening.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (679 commits)
commit 7676f83aeb
Author: James Bottomley <James.Bottomley@steeleye.com>
Date: Fri Apr 14 09:47:59 2006 -0500
[SCSI] scsi_transport_sas: don't scan a non-existent end device
Any end device that can't support any of the scanning protocols
shouldn't be scanned, so set its id to -1 to prevent
scsi_scan_target() being called for it.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
commit 3c0c25b97c
Author: Moore, Eric <Eric.Moore@lsil.com>
Date: Thu Apr 13 16:08:17 2006 -0600
[SCSI] mptfusion - fix panic in mptsas_slave_configure
Driver panic when RAID logical volume was present when driver
loaded, or when a RAID logical volume was created on the fly.
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (169 commits)
commit 78a596b449
Author: Adrian Bunk <bunk@stusta.de>
Date: Fri Mar 31 01:38:12 2006 -0800
[PATCH] remove kernel/power/pm.c:pm_unregister()
Since the last user is removed in -mm, we can now remove this long deprecated
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 21440d3133
Author: David Brownell <david-b@pacbell.net>
Date: Sat Apr 1 10:21:52 2006 -0800
[PATCH] dma doc updates
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (158 commits)
commit 4f705ae3e9
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date: Mon Apr 3 17:09:22 2006 -0700
[PATCH] DMI: move dmi_scan.c from arch/i386 to drivers/firmware/
dmi_scan.c is arch-independent and is used by i386, x86_64, and ia64.
Currently all three arches compile it from arch/i386, which means that ia64
and x86_64 depend on things in arch/i386 that they wouldn't otherwise care
about.
This is simply "mv arch/i386/kernel/dmi_scan.c drivers/firmware/" (removing
trailing whitespace) and the associated Makefile changes. All three
architectures already set CONFIG_DMI in their top-level Kconfig files.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andrey Panin <pazke@orbita1.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
...
Since the last user is removed in -mm, we can now remove this long deprecated
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sparse warns about casting to a __bitwise type. However, it's correct
to do when defining the enum for pci_bus_flags_t, so add a __force to
quiet the warnings. This will fix getting
include/linux/pci.h💯26: warning: cast to restricted type
from sparse all over the build.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The naming of the constant defined for PCI ID 1022:7450 does not seem
to match the information at http://pciids.sourceforge.net/:
http://pci-ids.ucw.cz/iii/?i=1022
There 1022:7450 is listed as "AMD-8131 PCI-X Bridge" while 1022:7451
is listed as "AMD-8131 PCI-X IOAPIC". Yet, the current definition for
0x7450 is PCI_DEVICE_ID_AMD_8131_APIC. It seems to me like that name
should map to 0x7451, while a name like PCI_DEVICE_ID_AMD_8131_BRIDGE
should map to 0x7450.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Print more diagnostic info to help identify the source of power management
suspend failures.
Example:
usb_hcd_pci_suspend(): pci_set_power_state+0x0/0x1af() returns -22
pci_device_suspend(): usb_hcd_pci_suspend+0x0/0x11b() returns -22
suspend_device(): pci_device_suspend+0x0/0x34() returns -22
Work-in-progress. It needs lots more suspend_report_result() calls sprinkled
everywhere.
Cc: Patrick Mochel <mochel@digitalimplant.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[BLOCK] delay all uevents until partition table is scanned
Here we delay the annoucement of all block device events until the
disk's partition table is scanned and all partition devices are already
created and sysfs is populated.
We have a bunch of old bugs for removable storage handling where we
probe successfully for a filesystem on the raw disk, but at the
same time the kernel recognizes a partition table and creates partition
devices.
Currently there is no sane way to tell if partitions will show up or not
at the time the disk device is announced to userspace. With the delayed
events we can simply skip any probe for a filesystem on the raw disk when
we find already present partitions.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It works like this:
Open the file
Read all the contents.
Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
When poll returns,
close the file and go to top of loop.
or lseek to start of file and go back to the 'read'.
Events are signaled by an object manager calling
sysfs_notify(kobj, dir, attr);
If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).
This has a cost of one int per attribute, one wait_queuehead per kobject,
one int per open file.
The name "sysfs_notify" may be confused with the inotify
functionality. Maybe it would be nice to support inotify for sysfs
attributes as well?
This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move common definitions for NET2280 to <linux/usb/net2280.h>, so that I can
use them in prism54usb (it is not merged yet, but I plan to do it soon).
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] splice: add support for sys_tee()
[PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
We currently have two implementations of this obsolete ioctl, one in
the block layer and one in the scsi code. Both of them have drawbacks.
This patch kills the scsi layer version after updating the block version
with the missing bits:
- argument checking
- use scatterlist I/O
- set number of retries based on the submitted command
This is the last user of non-S/G I/O except for the gdth driver, so
getting this in ASAP and through the scsi tree would be nie to kill
the non-S/G I/O path. Jens, what do you think about adding a check
for non-S/G I/O in the midlayer?
Thanks to Or Gerlitz for testing this patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The pen down IRQ will toggle during each X,Y,Z measurement cycle.
Even though the IRQ is disabled it will be latched and delivered
when after enable_irq. Thus in the IRQ handler we must avoid
starting a new measurement cycle when such an "unwanted" IRQ happens.
Add a get_pendown_state platform function, which will probably
determine this by reading the current GPIO level of the pen IRQ pin.
Move the IRQ reenabling from the SPI RX function to the timer. After
the last power down message the pen IRQ pin is still active for a
while and get_pendown_state would report incorrectly a pen down state.
When suspending we should check the ts->pending flag instead of
ts->pendown, since the timer can be pending regardless of ts->pendown.
Also if ts->pending is set we can be sure that the timer is running,
so no need to rearm it. Similarly if ts->pending is not set we can
be sure that the IRQ is enabled (and the timer is not).
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Some touchscreens seem to oscillate heavily for a while after touching
the screen. Implement support for sampling the screen until we get two
consecutive values that are close enough.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: Juha Yrjola <juha.yrjola@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
The in-kernel Sangoma drivers are both not compiling and marked as BROKEN
since at least kernel 2.6.0.
Sangoma offers out-of-tree drivers, and David Mandelstam told me Sangoma
does no longer maintain the in-kernel drivers and prefers to provide them
as a separate installation package.
This patch therefore removes these drivers.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.
Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.
Signed-off-by: Jens Axboe <axboe@suse.de>
We need not use ->f_pos as the offset for the file input/output. If the
user passed an offset pointer in through sys_splice(), just use that and
leave ->f_pos alone.
Signed-off-by: Jens Axboe <axboe@suse.de>
In vsyscall function do_vgettimeofday(), some functions are declared as
inlined, which is a hint for gcc to compile the function inlined but it
not forced. Sometimes compiler does not compile the function as
inlined, so here inline is replaced by __always_inline prefix.
It does not happen in gcc compiler actually, but it possibly happens.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] vfs: add splice_write and splice_read to documentation
[PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
[PATCH] splice: warning fix
[PATCH] another round of fs/pipe.c cleanups
[PATCH] splice: comment styles
[PATCH] splice: add Ingo as addition copyright holder
[PATCH] splice: unlikely() optimizations
[PATCH] splice: speedups and optimizations
[PATCH] pipe.c/fifo.c code cleanups
[PATCH] get rid of the PIPE_*() macros
[PATCH] splice: speedup __generic_file_splice_read
[PATCH] splice: add direct fd <-> fd splicing support
[PATCH] splice: add optional input and output offsets
[PATCH] introduce a "kernel-internal pipe object" abstraction
[PATCH] splice: be smarter about calling do_page_cache_readahead()
[PATCH] splice: optimize the splice buffer mapping
[PATCH] splice: cleanup __generic_file_splice_read()
[PATCH] splice: only call wake_up_interruptible() when we really have to
[PATCH] splice: potential !page dereference
[PATCH] splice: mark the io page as accessed
Bugzilla Bug 6299:
A pixel size of 8 bits produces wrong logo colors in x86_64.
The driver has 2 methods for setting the color map, using the protected
mode interface provided by the video BIOS and directly writing to the VGA
registers. The former is not supported in x86_64 and the latter is enabled
only in i386.
Fix by enabling the latter method in x86_64 only if supported by the BIOS.
If both methods are unsupported, change the visual of vesafb to
STATIC_PSEUDOCOLOR.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Every caller of svc_take_page ignores its return value and assumes it
succeeded. So just WARN() instead of returning an ignored error. This would
have saved some time debugging a recent nfsd4 problem.
If there are still failure cases here, then the result is probably that we
overwrite an earlier part of the reply while xdr-encoding.
While the corrupted reply is a nasty bug, it would be worse to panic here and
create the possibility of a remote DOS; hence WARN() instead of BUG().
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An UML user reported (against 2.6.13.3/UML) he got kernel Oopses when
trying to rmmod (on a kernel with module unloading enabled) a module
compiled with module unloading disabled. As crashing is a very correct
thing to do in that case, a solution is altering the vermagic string to
include this too.
Possibly, however, the code should not crash in this case, even if the
module didn't support unloading - it should simply abort the module
removal. In this case, fixing that bug would be a better solution. I've
not investigated though.
(akpm: a bit marginal - root screwed up and shot himself in the foot).
Cc: Hayim Shaul <hayim@post.tau.ac.il>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are the last conversions of pci_set_dma_mask(),
pci_set_consistent_dma_mask() and pci_dma_supported() to use DMA_xBIT_MASK
constants from linux/dma-mapping.h
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of /proc/vmcore data structures overflow with 32bit systems having
memory more than 4G. This patch fixes those.
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Before commit 47e65328a7, next_thread() took
a const task_t. Reinstate the const qualifier, getting the next thread
never changes the current thread.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We changed the wrong symbol. It's tty_insert_flip_string_flags() which is
called from the previously-non-GPL'ed now-inlined tty_insert_flip_char().
Fix that up, and uninline tty_schedule_flip() while we're there.
Cc: Tobias Powalowski <t.powa@gmx.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lay out the structure definitions in include/linux/leds.h to be aligned as
much as possible. Also minor updates to the comments to make them more
concise.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Acked-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the scheduled unexport of panic_timeout.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ulrich suggested that the `flags' arg to sync_file_range() become unsigned.
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some string functions were safely overrideable in lib/string.c, but their
corresponding declarations in linux/string.h were not. Correct this, and
make strcspn overrideable.
Odds of someone wanting to do optimized assembly of these are small, but
for the sake of cleanliness, might as well bring them into line with the
rest of the file.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch. Its definition is sometimes configurable. Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree. But it looks a bit messy.
SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has
been changeable by config. Suitable node's number may be changed in the
future even if it is other architecture. So, I wrote configurable node's
number.
This patch set defines just default value for each arch which needs multi
nodes except ia64. But, it is easy to change to configurable if necessary.
On ia64 the number of nodes can be already configured in generic ia64 and SN2
config. But, NODES_SHIFT is defined for DIG64 and HP'S machine too. So, I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT. It
would be simpler.
See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce GFP_NOWAIT, as an alias for GFP_ATOMIC & ~__GFP_HIGH.
This also changes XFS, which is the only in-tree user of this idiom that I
could find. The XFS piece is compile-tested only.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some documentation regarding the utilisation of the flags field in
struct page. This field is overloaded for per page bits and to hold node,
zone and SPARSEMEM information. Make it clear which areas are used for
what and how many bits are in each area.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These patches are an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory().
- why the kernel needed patching
When the kernel can't allocate anonymous pages in practice, currnet
OVERCOMMIT_GUESS could return success. This implementation might be
the cause of oom kill in memory pressure situation.
If the Linux runs with page reservation features like
/proc/sys/vm/lowmem_reserve_ratio and without swap region, I think
the oom kill occurs easily.
- the overall design approach in the patch
When the OVERCOMMET_GUESS algorithm calculates number of free pages,
the reserved free pages are regarded as non-free pages.
This change helps to avoid the pitfall that the number of free pages
become less than the number which the kernel tries to keep free.
- testing results
I tested the patches using my test kernel module.
If the patches aren't applied to the kernel, __vm_enough_memory()
returns success in the situation but autual page allocation is
failed.
On the other hand, if the patches are applied to the kernel, memory
allocation failure is avoided since __vm_enough_memory() returns
failure in the situation.
I checked that on i386 SMP 16GB memory machine. I haven't tested on
nommu environment currently.
This patch adds totalreserve_pages for __vm_enough_memory().
Calculate_totalreserve_pages() checks maximum lowmem_reserve pages and
pages_high in each zone. Finally, the function stores the sum of each
zone to totalreserve_pages.
The totalreserve_pages is calculated when the VM is initilized.
And the variable is updated when /proc/sys/vm/lowmem_reserve_raito
or /proc/sys/vm/min_free_kbytes are changed.
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reshape_position is a 64bit field that was not 64bit aligned. So swap with
new_level.
NOTE: this is a user-visible change. However:
- The bad code has not appeared in a released kernel
- This code is still marked 'experimental'
- This only affects version-1 superblock, which are not in wide use
- These field are only used (rather than simply reported) by user-space
tools in extemely rare circumstances : after a reshape crashes in the
first second of the reshape process.
So I believe that, at this stage, the change is safe. Especially if people
heed the 'help' message on use mdadm-2.4.1.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Andrew Morton <akpm@osdl.org>
net/socket.c:148: warning: initialization from incompatible pointer type
extern declarations in .c files! Bad boy.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(), it is only
meant to be used for sendfile() emulation.
Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.
Signed-off-by: Jens Axboe <axboe@suse.de>
Oleg Nesterov spotted two interesting bugs with the current de_thread
code. The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process. Caused by
two processes exiting when only one was created.
The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice. Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.
The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.
To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.
Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process. I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids. This should also be
slightly cheaper then the existing thread_group_leader macro.
I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Overriding the whole EH code is a per-transport, not per-host thing.
Move ->eh_strategy_handler to the transport class, same as
->eh_timed_out.
Downside is that scsi_host_alloc can't check for the total lack of EH
anymore, but the transition period from old EH where we needed it is
long gone already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Rohit found an obscure bug causing buddy list corruption.
page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
to determine whether or not a free page's buddy is itself free and in the
buddy lists.
Each of the conjuncts may be true at different times due to unrelated
conditions, so the non-atomic page_is_buddy test may find each conjunct to
be true even if they were not both true at the same time (ie. the page was
not on the buddy lists).
Signed-off-by: Martin Bligh <mbligh@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add optional input and output offsets to sys_splice(), for seekable file
descriptors:
asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
int fd_out, loff_t __user *off_out,
size_t len, unsigned int flags);
semantics are straightforward: f_pos will be updated with the offset
provided by user-space, before the splice transfer is about to begin.
Providing a NULL offset pointer means the existing f_pos will be used
(and updated in situ). Providing an offset for a pipe results in
-ESPIPE. Providing an invalid offset pointer results in -EFAULT.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
separate out the 'internal pipe object' abstraction, and make it
usable to splice. This cleans up and fixes several aspects of the
internal splice APIs and the pipe code:
- pipes: the allocation and freeing of pipe_inode_info is now more symmetric
and more streamlined with existing kernel practices.
- splice: small micro-optimization: less pointer dereferencing in splice
methods
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Update XFS for the ->splice_read/->splice_write changes.
Signed-off-by: Jens Axboe <axboe@suse.de>
Add checksum operation which takes care of verifying the checksum and
dealing with HW checksum errors and avoids multiple checksum
operations by setting ip_summed to CHECKSUM_UNNECESSARY after
successful verification.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the queue rerouter intrastructure to a generic usable
infrastructure for address family specific operations as a base for
some cleanups.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototypes of NAT callbacks to ip_conntrack_h323.h. Because the
use of typedefs as arguments, some header files need to be moved as
well.
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
This is due to the HPET timer not being initialized with the correct
setting (still using PIT count).
If HZ changes, this drift can become even more pronounced.
HPET patch initializes tick_nsec with correct tick_nsec settings for
HPET timer.
Vojtech comments:
"It's not entirely correct (it assumes the HPET ticks totally
exactly), but it's significantly better than assuming the PIT error
there."
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The node setup code would try to allocate the node metadata in the node
itself, but that fails if there is no memory in there.
This can happen with memory hotplug when the hotplug area defines an so
far empty node.
Now use bootmem to try to allocate the mem_map in other nodes.
And if it fails don't panic, but just ignore the node.
To make this work I added a new __alloc_bootmem_nopanic function that
does what its name implies.
TBD should try to use nearby nodes here. Currently we just use any.
It's hard to do it better because bootmem doesn't have proper fallback
lists yet.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Memory hotadd doesn't need SPARSEMEM, but can be handled by just preallocating
mem_maps. This only needs some untangling of ifdefs to enable the necessary
code even without SPARSEMEM.
Originally from Keith Mannthey, hacked by AK.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Originally from Nick Piggin, just adapted to the newer branch.
You can't check PageLRU without holding zone->lru_lock. The page
release code can get away with it only because the page refcount is 0 at
that point. Also, you can't reliably remove pages from the LRU unless
the refcount is 0. Ever.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Jens Axboe <axboe@suse.de>
By cleaning up the writeback logic (killing write_one_page() and the manual
set_page_dirty()), we can get rid of ->stolen inside the pipe_buffer and
just keep it local in pipe_to_file().
This also adds dirty page balancing logic and O_SYNC handling.
Signed-off-by: Jens Axboe <axboe@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (48 commits)
Documentation: fix minor kernel-doc warnings
BUG_ON() Conversion in drivers/net/
BUG_ON() Conversion in drivers/s390/net/lcs.c
BUG_ON() Conversion in mm/slab.c
BUG_ON() Conversion in mm/highmem.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/ptrace.c
BUG_ON() Conversion in ipc/shm.c
BUG_ON() Conversion in fs/freevxfs/
BUG_ON() Conversion in fs/udf/
BUG_ON() Conversion in fs/sysv/
BUG_ON() Conversion in fs/inode.c
BUG_ON() Conversion in fs/fcntl.c
BUG_ON() Conversion in fs/dquot.c
BUG_ON() Conversion in md/raid10.c
BUG_ON() Conversion in md/raid6main.c
BUG_ON() Conversion in md/raid5.c
Fix minor documentation typo
BFP->BPF in Documentation/networking/tuntap.txt
...
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>