Remove the numbered SW_* entries from the input system and assign names
to the existing users.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
The raw read/write access to NAND (without ECC) has been changed in the
NAND rework. Expose the new way - setting the file mode via ioctl - to
userspace. Also allow to read out the ecc statistics information so userspace
tools can see that bitflips happened and whether errors where correctable
or not. Also expose the number of bad blocks for the partition, so nandwrite
can check if the data fits into the parition before writing to it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Hopefully the last iteration on this!
The handling of out of band data on NAND was accompanied by tons of fruitless
discussions and halfarsed patches to make it work for a particular
problem. Sufficiently annoyed by I all those "I know it better" mails and the
resonable amount of discarded "it solves my problem" patches, I finally decided
to go for the big rework. After removing the _ecc variants of mtd read/write
functions the solution to satisfy the various requirements was to refactor the
read/write _oob functions in mtd.
The major change is that read/write_oob now takes a pointer to an operation
descriptor structure "struct mtd_oob_ops".instead of having a function with at
least seven arguments.
read/write_oob which should probably renamed to a more descriptive name, can do
the following tasks:
- read/write out of band data
- read/write data content and out of band data
- read/write raw data content and out of band data (ecc disabled)
struct mtd_oob_ops has a mode field, which determines the oob handling mode.
Aside of the MTD_OOB_RAW mode, which is intended to be especially for
diagnostic purposes and some internal functions e.g. bad block table creation,
the other two modes are for mtd clients:
MTD_OOB_PLACE puts/gets the given oob data exactly to/from the place which is
described by the ooboffs and ooblen fields of the mtd_oob_ops strcuture. It's
up to the caller to make sure that the byte positions are not used by the ECC
placement algorithms.
MTD_OOB_AUTO puts/gets the given oob data automaticaly to/from the places in
the out of band area which are described by the oobfree tuples in the ecclayout
data structre which is associated to the devicee.
The decision whether data plus oob or oob only handling is done depends on the
setting of the datbuf member of the data structure. When datbuf == NULL then
the internal read/write_oob functions are selected, otherwise the read/write
data routines are invoked.
Tested on a few platforms with all variants. Please be aware of possible
regressions for your particular device / application scenario
Disclaimer: Any whining will be ignored from those who just contributed "hot
air blurb" and never sat down to tackle the underlying problem of the mess in
the NAND driver grown over time and the big chunk of work to fix up the
existing users. The problem was not the holiness of the existing MTD
interfaces. The problems was the lack of time to go for the big overhaul. It's
easy to add more mess to the existing one, but it takes alot of effort to go
for a real solution.
Improvements and bugfixes are welcome!
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Most of those macros are unused and the used ones just obfuscate
the code. Remove them and fixup all users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The nand_oobinfo structure is not fitting the newer error correction
demands anymore. Replace it by struct nand_ecclayout and fixup the users
all over the place. Keep the nand_oobinfo based ioctl for user space
compability reasons.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The info structure for out of band data was copied into
the mtd structure. Make it a pointer and remove the ability
to set it from userspace. The position of ecc bytes is
defined by the hardware and should not be changed by software.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The platform structure was lacking an oobinfo field.
The NDFC driver had some remains from another tree.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
These IDs are also used by the drivers/ide/pci changes submitted by
VIA.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch adds new device ids for MCP61 and MCP65 chips.
Signed-Off-By: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Modularize the write function and reorganaize the internal buffer
management. Remove obsolete chip options and fixup all affected
users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Split the core of the read function out and implement
seperate handling functions for software and hardware
ECC.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add read/write function pointers to struct nand_ecc_ctrl to
prepare the modulaization of nand_read/write functions. The
current implementation handles every type of ecc mode
software/hardware and all kinds of strange ecc placement
schemes in one switch/if construct. Thats too complex to
maintain and too inflexible to expand. Modularization will
also shorten the code pathes of the read/write functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
FLASH - especially NAND FLASH - will become less reliable
and bit flips more likely. Add an ECC statistics struct
to struct mtd_info to keep track of this.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The nand driver has a superflous read ready / command
delay in the read functions. This was added to handle
chips which have an automatic read forward. Newer
chips do not have this functionality anymore. Add this
option to avoid the delay / I/O operation. Mark all
large page chips with the new option flag.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ATA_FLAG_IRQ_MASK was added when I did the original data transfer with
IRQ masked bits for PIO. It has since been replaced by ->pio_data_xfer
methods so should be removed so nobody uses it by mistake thinking it
still works.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
We need to pass the device in order to do per device checks such as
32bit I/O enables. With the changes to include dev->ap we now don't have
to add parameters however just clean them up. Also add data_xfer methods
to the existing drivers except ata_piix (which is in the other block of
patches). If you reject the piix one just add a data_xfer to it...
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch revives pci_find_ext_capability (has been disabled a couple month
ago since it was not used anywhere. See http://lkml.org/lkml/2006/1/20/247).
It will now be used by the myri10ge driver.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>
drivers/pci/pci.c | 3 +--
include/linux/pci.h | 2 ++
2 files changed, 3 insertions(+), 2 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The previous change of the command / hardware control allows to
remove the write_byte/word functions completely, as their only
user were nand_command and nand_command_lp.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The hwcontrol function enforced a step by step state machine
for any kind of hardware chip access. Let the hardware driver
know which control bits are set and inform it about a change
of the control lines. Let the hardware driver write out the
command and address bytes directly. This gives a peformance
advantage for address bus controlled chips and simplifies the
quirks in the hardware drivers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
These flags are needed by userspace - move them outside __KERNEL__
(Pointed out by dwmw2)
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
MTD clients are agnostic of FLASH which needs ECC suppport.
Remove the functions and fixup the callers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
First step of modularizing ECC support.
- Move ECC related functionality into a seperate embedded data structure
- Get rid of the hardware dependend constants to simplify new ECC models
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The NAND driver used a mix of unsigned char, u_char amd uint8_t
data types. Consolidate to uint8_t usage
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Replace the chip lock by a the controller lock. For simple drivers a
dummy controller structure is created by the scan code.
This simplifies the locking algorithm in nand_get/release_chip().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
At least two flashes exists that have the concept of a minimum write unit,
similar to NAND pages, but no other NAND characteristics. Therefore, rename
the minimum write unit to "writesize" for all flashes, including NAND.
Signed-off-by: Joern Engel <joern@wh.fh-wedel.de>
This patch enables agpgart on a Via "PT880 Ultra" based motherboard
(Asus P4V800D-X). The PCI ID of the PT880 Ultra is 0x0308 instead of
0x0258 of the PT880.
The patched via-agp passes testgart.
Signed-off-by: Magnus Kessler <Magnus.Kessler@gmx.net>
Signed-off-by: Dave Jones <davej@redhat.com>
Andy added code to buddy allocator which does not require the zone's
endpoints to be aligned to MAX_ORDER. An issue is that the buddy allocator
requires the node_mem_map's endpoints to be MAX_ORDER aligned. Otherwise
__page_find_buddy could compute a buddy not in node_mem_map for partial
MAX_ORDER regions at zone's endpoints. page_is_buddy will detect that
these pages at endpoints are not PG_buddy (they were zeroed out by bootmem
allocator and not part of zone). Of course the negative here is we could
waste a little memory but the positive is eliminating all the old checks
for zone boundary conditions.
SPARSEMEM won't encounter this issue because of MAX_ORDER size constraint
when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't need the logic
either because the holes and endpoints are handled differently. This
leaves checking alloc_remap and other arches which privately allocate for
node_mem_map.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- remove the following global function that is both unused and
unimplemented:
- register_firmware()
- make the following needlessly global function static:
- firmware_class_uevent()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This driver supports the SPI controller on the MPC83xx SoC devices from
Freescale. Note, this driver supports only the simple shift register SPI
controller and not the descriptor based CPM or QUICCEngine SPI controller.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the HSM error_mask mapping.
Changes:
- Better mapping in ac_err_mask()
- In HSM_ST_FIRST ans HSM_ST state, check ATA_ERR|ATA_DF and map it to AC_ERR_DEV instead of AC_ERR_HSM.
- In HSM_ST_FIRST and HSM_ST state, map DRQ=1 ERR=1 to AC_ERR_HSM.
- For PIO data in and DRQ=1 ERR=1, add check after the junk data block is read.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Patch from Pavel Pisa
There has been problems that for some paths that clock are not stopped
during new command programming and initiation. Result is issuing
of incorrect command to the card. Some other problems are cleaned too.
Noisy report of known ERRATUM #4 has been suppressed.
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Device node major/minor numbers are just stored in the payload of a single
data node. Just extend that to 4 bytes and use new_encode_dev() for it.
We only use the 4-byte format if we _need_ to, if !old_valid_dev(foo).
This preserves backwards compatibility with older code as much as
possible. If we do make devices with major or minor numbers above 255, and
then mount the file system with the old code, it'll just read the first
two bytes and get the numbers wrong. If it comes to garbage-collect it,
it'll then write back those wrong numbers. But that's about the best we
can expect.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We have to pack at least the jint16_t structure, because otherwise it'll
be four bytes in size. Thankfully, we can do that and _not_ pack the
actual node structures, and the compiler still doesn't emit stupid code.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We need to be able to have a "SPI bus 0" matching chip numbering; but
that number was wrongly used to flag dynamic allocation of a bus number.
This patch resolves that issue; now negative numbers trigger dynamic alloc.
It also updates the how-to-write-a-controller-driver overview to mention
this stuff.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add spi_device hook for LSB-first word encoding, and update all the
(in-tree) controller drivers to reject such devices. Eventually,
some controller drivers will be updated to support lsb-first encodings
on the wire; no current drivers need this.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Renamed bitbang_transfer_setup to follow convention of other exported symbols
from spi-bitbang. Exported spi_bitbang_setup_transfer to allow users of
spi-bitbang to use the function in their own setup_transfer.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This removes superfluous whitespace in the <linux/spi/spi.h> header.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some protocols (like one for some bitmap displays) require different clock
speed or word size settings for each transfer in an SPI message. This adds
those parameters to struct spi_transfer. They are to be used when they are
nonzero; otherwise the defaults from spi_device are to be used.
The patch also adds a setup_transfer callback to spi_bitbang, uses it for
messages that use those overrides, and implements it so that the pure
bitbanging code can help resolve any questions about how it should work.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
can_share_swap_page() is used to check if the page has the last reference.
This avoids allocating a new page for COW if it's the last page.
However, if CONFIG_SWAP is not set, can_share_swap_page() is defined as 0,
thus always causes a copy for the last COW page. The below simple patch
fixes it.
Signed-off-by: Hua Zhong <hzhong@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
slab_is_available() indicates slab based allocators are available for use.
SPARSEMEM code needs to know this as it can be called at various times
during the boot process.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Even since a previous patch:
Fix race between CONFIG_DEBUG_SLABALLOC and modules
Sun, 27 Jun 2004 17:55:19 +0000 (17:55 +0000)
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=92b3db26d31cf21b70e3c1eadc56c179506d8fbe
The function symbol_put_addr() will deadlock the kernel.
symbol_put_addr() would acquire modlist_lock, then while holding the lock call
two functions kernel_text_address() and module_text_address() which also try
to acquire the same lock. This deadlocks the kernel of course.
This patch changes symbol_put_addr() to not acquire the modlist_lock, it
doesn't need it since it never looks at the module list directly. Also, it
now uses core_kernel_text() instead of kernel_text_address(). The latter has
an additional check for addr inside a module, but we don't need to do that
since we call module_text_address() (the same function kernel_text_address
uses) ourselves.
Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Cc: Zwane Mwaikambo <zwane@fsmlabs.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Johannes Stezenbach <js@linuxtv.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With "Paul E. McKenney" <paulmck@us.ibm.com>
Introduce rcu_needs_cpu() interface. This can be used to tell if there
will be a new rcu batch on a cpu soon by looking at the curlist pointer.
This can be used to avoid to enter a tickless idle state where the cpu
would miss that a new batch is ready when rcu_start_batch would be called
on a different cpu.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that all NCQ related stuff are in place, implement NCQ device
configuration and bump ATA_MAX_QUEUE to 32 thus activating NCQ
support.
Original implementation is from Jens Axboe.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Add ap->qc_active and ap->sactive, mask of all active qcs and libata's
view of the SActive register, respectively. Also, implement
ata_qc_complete_multiple() which takes new qc_active mask and complete
multiple qcs according to the mask.
These will be used to track NCQ commands and complete them. The
distinction between ap->qc_active and ap->sactive is also useful for
later PM implementation.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Rename ap->qactive to ap->qc_allocated. This is to accomodate
addition of ap->qc_active, mask of active qcs.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement new EH. The exported interface is ata_do_eh() which is to
be called from ->error_handler and performs the following steps to
recover the failed port.
ata_eh_autopsy() : analyze SError/TF, determine the cause of failure
and required recovery actions and record it in
ap->eh_context
ata_eh_report() : report the failure to user
ata_eh_recover() : perform recovery actions described in ap->eh_context
ata_eh_finish() : finish failed qcs
LLDDs can customize error handling by modifying eh_context before
calling ata_do_eh() or, if necessary, doing so inbetween each major
steps by calling each step explicitly.
Signed-off-by: Tejun Heo <htejun@gmail.com>
struct ata_eh_info serves as the communication channel between
execution path and EH. Execution path describes detected error
condition in ap->eh_info and EH recovers the port using it. To avoid
missing error conditions detected during EH, EH makes its own copy of
eh_info and clears it on entry allowing error info to accumulate
during EH.
Most EH states including EH's copy of eh_info are stored in
ap->eh_context (struct ata_eh_context) which is owned by EH and thus
doesn't require any synchronization to access and alter. This
standardized context makes it easy to integrate various parts of EH
and extend EH to handle multiple links (for PM).
Signed-off-by: Tejun Heo <htejun@gmail.com>
This patch implements ata_ering and uses it to define dev->ering.
ata_ering is a ring buffer which records libata errors - whether a
command was for normar IO request, err_mask and timestamp. Errors are
recorded per-device in dev->ering. This will be used by EH to
determine recovery actions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Update ata_scsi_error() for new EH. ata_scsi_error() is responsible
for claiming timed out qcs and invoking ->error_handler in safe and
synchronized manner. As the state of the controller is unknown if a
qc has timed out, the port is frozen in such cases.
Note that ata_scsi_timed_out() isn't used for new EH. This is because
a timed out qc cannot be claimed by EH without freezing the port and
freezing the port in ata_scsi_timed_out() results in unnecessary
abortion of other active qcs. ata_scsi_timed_out() can be removed
once all drivers are converted to new EH.
While at it, add 'TODO: kill' comments to old EH functions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Freezing is performed atomic w.r.t. host_set->lock and once frozen
LLDD is not allowed to access the port or any qc on it. Also, libata
makes sure that no new qc gets issued to a frozen port.
A frozen port is thawed after a reset operation completes
successfully, so reset methods must do its job while the port is
frozen. During initialization all ports get frozen before requesting
IRQ, so reset methods are always invoked on a frozen port.
Optional ->freeze and ->thaw operations notify LLDD that the port is
being frozen and thawed, respectively. LLDD can disable/enable
hardware interrupt in these callbacks if the controller's IRQ mask can
be changed dynamically. If the controller doesn't allow such
operation, LLDD can check for frozen state in the interrupt handler
and ack/clear interrupts unconditionally while frozen.
Signed-off-by: Tejun Heo <htejun@gmail.com>
ata_port_schedule_eh() directly schedules EH for @ap without
associated qc. Once EH scheduled, no further qc is allowed and EH
kicks in as soon as all currently active qc's are drained.
ata_port_abort() schedules all currently active commands for EH by
qc_completing them with ATA_QCFLAG_FAILED set. If ata_port_abort()
doesn't find any qc to abort, it directly schedule EH using
ata_port_schedule_eh().
These two functions provide ways to invoke EH for conditions which
aren't directly related to any specfic qc.
Signed-off-by: Tejun Heo <htejun@gmail.com>
There are several ways a qc can get schedule for EH in new EH. This
patch implements one of them - completing a qc with ATA_QCFLAG_FAILED
set or with non-zero qc->err_mask. ALL such qc's are examined by EH.
New EH schedules a qc for EH from completion iff ->error_handler is
implemented, qc is marked as failed or qc->err_mask is non-zero and
the command is not an internal command (internal cmd is handled via
->post_internal_cmd). The EH scheduling itself is performed by asking
SCSI midlayer to schedule EH for the specified scmd.
For drivers implementing old-EH, nothing changes. As this change
makes ata_qc_complete() rather large, it's not inlined anymore and
__ata_qc_complete() is exported to other parts of libata for later
use.
Signed-off-by: Tejun Heo <htejun@gmail.com>
New EH framework has clear distinction about who owns a qc. Every qc
starts owned by normal execution path - PIO, interrupt or whatever.
When an exception condition occurs which affects the qc, the qc gets
scheduled for EH. Note that some events (say, link lost and regained,
command timeout) may schedule qc's which are not directly related but
could have been affected for EH too. Scheduling for EH is atomic
w.r.t. ap->host_set->lock and once schedule for EH, normal execution
path is not allowed to access the qc in whatever way. (PIO
synchronization acts a bit different and will be dealt with later)
This patch make ata_qc_from_tag() check whether a qc is active and
owned by normal path before returning it. If conditions don't match,
NULL is returned and thus access to the qc is denied.
__ata_qc_from_tag() is the original ata_qc_from_tag() and is used by
libata core/EH layers to access inactive/failed qc's.
This change is applied only if the associated LLDD implements new EH
as indicated by non-NULL ->error_handler
Signed-off-by: Tejun Heo <htejun@gmail.com>
New EH may issue internal commands to recover from error while failed
qc's are still hanging around. To allow such usage, reserve tag
ATA_MAX_QUEUE-1 for internal command. This also makes it easy to tell
whether a qc is for internal command or not. ata_tag_internal() test
implements this test.
To avoid breaking existing drivers, ata_exec_internal() uses
ATA_TAG_INTERNAL only for drivers which implement ->error_handler.
For drivers using old EH, tag 0 is used. Note that this makes
ata_tag_internal() test valid only when ->error_handler is
implemented. This is okay as drivers on old EH should not and does
not have any reason to use ata_tag_internal().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Add ATA_FLAG_EH_{PENDING|FROZEN}, ATA_ATA_QCFLAG_{FAILED|SENSE_VALID}
and ops->freeze, thaw, error_handler, post_internal_cmd() for new EH.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement ata_{port|dev}_printk() which prefixes the message with
proper identification string. This change is necessary for later PM
support because devices and links should be identified differently
depending on how they are attached.
This also helps unifying device id strings. Currently, there are two
forms in use (P is the port number D device number) - 'ataP(D):', and
'ataP: dev D '. These macros also make it harder to forget proper ID
string (e.g. printing only port number when a device is in question).
Debug message handling can be integrated into these printk macros by
passing debug type and level via @lv.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Add dev->ap which points back to the port the device belongs to. This
makes it unnecessary to pass @ap for silly reasons (e.g. printks).
Also, this change is necessary to accomodate later PM support which
will introduce ATA link inbetween port and device.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement ata_scr_{valid|read|write|write_flush}() and
ata_port_{online|offline}(). These functions replace
scr_{read|write}() and sata_dev_present().
Major difference between between the new SCR functions and the old
ones is that the new ones have a way to signal error to the caller.
This makes handling SCR-available and SCR-unavailable cases in the
same path easier. Also, it eases later PM implementation where SCR
access can fail due to various reasons.
ata_port_{online|offline}() functions return 1 only when they are
affirmitive of the condition. e.g. if SCR is unaccessible or
presence cannot be determined for other reasons, these functions
return 0. So, ata_port_online() != !ata_port_offline(). This
distinction is useful in many exception handling cases.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Add qc->result_tf and ATA_QCFLAG_RESULT_TF. This moves the
responsibility of loading result TF from post-compltion path to qc
execution path. qc->result_tf is loaded if explicitly requested or
the qc failsa. This allows more efficient completion implementation
and correct handling of result TF for controllers which don't have
global TF representation such as sil3124/32.
Signed-off-by: Tejun Heo <htejun@gmail.com>
It's not a very good idea to allocate memory during EH. Use
statically allocated buffer for dev->id[] and add 512byte buffer
ap->sector_buf. This buffer is owned by EH (or probing) and to be
used as temporary buffer for various purposes (IDENTIFY, NCQ log page
10h, PM GSCR block).
Signed-off-by: Tejun Heo <htejun@gmail.com>
Rename ata_down_sata_spd_limit() and friends to sata_down_spd_limit()
and likewise for simplicity & consistency.
Signed-off-by: Tejun Heo <htejun@gmail.com>
If we use __attribute__((packed)), GCC will _also_ assume that the
structures aren't sensibly aligned, and it'll emit code to cope with
that instead of straight word load/save. This can be _very_ suboptimal
on architectures like ARM.
Ideally, we want an attribute which just tells GCC not to do any
padding, without the alignment side-effects. In the absense of that,
we'll just drop the 'packed' attribute and hope that everything stays as
it was (which to be fair is fairly much what we expect). And add some
paranoia checks in the initialisation code, which should be optimised
away completely in the normal case.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This attached patches provide xattr support including POSIX-ACL and
SELinux support on JFFS2 (version.5).
There are some significant differences from previous version posted
at last December.
The biggest change is addition of EBS(Erase Block Summary) support.
Currently, both kernel and usermode utility (sumtool) can recognize
xattr nodes which have JFFS2_NODETYPE_XATTR/_XREF nodetype.
In addition, some bugs are fixed.
- A potential race condition was fixed.
- Unexpected fail when updating a xattr by same name/value pair was fixed.
- A bug when removing xattr name/value pair was fixed.
The fundamental structures (such as using two new nodetypes and exclusion
mechanism by rwsem) are unchanged. But most of implementation were reviewed
and updated if necessary.
Espacially, we had to change several internal implementations related to
load_xattr_datum() to avoid a potential race condition.
[1/2] xattr_on_jffs2.kernel.version-5.patch
[2/2] xattr_on_jffs2.utils.version-5.patch
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
V4L1 API is depreciated and should be removed soon from kernel. This patch
adds two new options, one to disable V4L1 drivers, and another to disable
V4L1 compat module. This way, it would be easy to check what still depends
on V4L1 stuff, allowing also to test if app works fine with V4L2 only support.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
One Block of the NAND Flash Array memory is reserved as
a One-Time Programmable Block memory area.
Also, 1st Block of NAND Flash Array can be used as OTP.
The OTP block can be read, programmed and locked using the same
operations as any other NAND Flash Array memory block.
OTP block cannot be erased.
OTP block is fully-guaranteed to be a valid block.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
* master.kernel.org:/home/rmk/linux-2.6-serial:
[SERIAL] 8250: add locking to console write function
[SERIAL] Remove unconditional enable of TX irq for console
[SERIAL] 8250: set divisor register correctly for AMD Alchemy SoC uart
[SERIAL] AMD Alchemy UART: claim memory range
[SERIAL] Clean up serial locking when obtaining a reference to a port
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NET_SCHED]: HFSC: fix thinko in hfsc_adjust_levels()
[IPV6]: skb leakage in inet6_csk_xmit
[BRIDGE]: Do sysfs registration inside rtnl.
[NET]: Do sysfs registration as part of register_netdevice.
[TG3]: Fix possible NULL deref in tg3_run_loopback().
[NET] linkwatch: Handle jiffies wrap-around
[IRDA]: Switching to a workqueue for the SIR work
[IRDA]: smsc-ircc: Minimal hotplug support.
[IRDA]: Removing unused EXPORT_SYMBOLs
[IRDA]: New maintainer.
[NET]: Make netdev_chain a raw notifier.
[IPV4]: ip_options_fragment() has no effect on fragmentation
[NET]: Add missing operstates documentation.
The last step of netdevice registration was being done by a delayed
call, but because it was delayed, it was impossible to return any error
code if the class_device registration failed.
Side effects:
* one state in registration process is unnecessary.
* register_netdevice can sleep inside class_device registration/hotplug
* code in netdev_run_todo only does unregistration so it is simpler.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This closes a couple holes in our attribute aliasing avoidance scheme:
- The current kernel fails mmaps of some /dev/mem MMIO regions because
they don't appear in the EFI memory map. This keeps X from working
on the Intel Tiger box.
- The current kernel allows UC mmap of the 0-1MB region of
/sys/.../legacy_mem even when the chipset doesn't support UC
access. This causes an MCA when starting X on HP rx7620 and rx8620
boxes in the default configuration.
There's more detail in the Documentation/ia64/aliasing.txt file this
adds, but the general idea is that if a region might be covered by
a granule-sized kernel identity mapping, any access via /dev/mem or
mmap must use the same attribute as the identity mapping.
Otherwise, we fall back to using an attribute that is supported
according to the EFI memory map, or to using UC if the EFI memory
map doesn't mention the region.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is a backout of earlier patch.
The whole rescheduling hack was a bad idea. It doesn't really solve
the problem and it makes the code more complicated for no good reason.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
After dwmw2 let me know it ought to be done, I rewrote the physmap map
driver to be a platform driver. I know zilch about the driver model,
so I probably botched it in some way, but I've done some tests on an
ixp23xx board which uses physmap, and it all seems to work.
In order to not break existing physmap users, I've added some compat
code that will instantiate a platform device iff CONFIG_MTD_PHYSMAP_LEN
is defined and != 0. Also, I've changed the default value for
CONFIG_MTD_PHYSMAP_LEN to zero, so that people who inadvertently
compile in physmap (or new, platform-style, users of physmap) don't get
burned.
This works pretty well -- the new physmap driver is a drop-in replacement
for the old one, and works on said ixp23xx board without any code changes
needed. (This should hold as long as users don't touch 'physmap_map'
directly.)
Once all physmap users have been converted to instantiate their own
platform devices, the compat code can go. (Or we decide that we can
change all the in-tree users at the same time, and never merge the
compat code.)
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Atomically create attributes when class device is added. This avoids
the race between registering class_device (which generates hotplug
event), and the creation of attribute groups.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the support of attribute groups in class_device's to allow
groups to be created as part of the registration process. This allows
network device's to avoid race between registration and creating
groups.
Note that unlike attributes that are a property of the class object,
the groups are a property of the class_device object. This is done
because there are different types of network devices (wireless for
example).
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[PATCH] powerpc: Use the ibm,pa-features property if available
powerpc: Fix incorrect might_sleep in __get_user/__put_user on kernel addresses
[PATCH] ppc32 CPM_UART: fixes and improvements
[PATCH] ppc32 CPM_UART: Fixed break send on SCC
[PATCH] powerpc/kprobes: fix singlestep out-of-line
[PATCH] powerpc/pseries: avoid crash in PCI code if mem system not up
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick says that the current construct isn't safe. This goes back to the
original, but sets PIPE_BUF_FLAG_LRU on user pages as well as they all
seem to be on the LRU in the first place.
Signed-off-by: Jens Axboe <axboe@suse.de>
A number of small issues are fixed, and added the header file, missed from the
original series. With this, driver should be pretty stable as tested among
both platform-device-driven and "old way" boards. Also added missing GPL
statement , and updated year field on existing ones to reflect
code update.
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The CSD contains a "read2write factor" which determines the multiplier to
be applied to the read timeout to obtain the write timeout. We were
ignoring this parameter, resulting in the possibility for writes being
timed out too early.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Currently we rely on the PIPE_BUF_FLAG_LRU flag being set correctly
to know whether we need to fiddle with page LRU state after stealing it,
however for some origins we just don't know if the page is on the LRU
list or not.
So remove PIPE_BUF_FLAG_LRU and do this check/add manually in pipe_to_file()
instead.
Signed-off-by: Jens Axboe <axboe@suse.de>
* 'audit.b10' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] Audit Filter Performance
[PATCH] Rework of IPC auditing
[PATCH] More user space subject labels
[PATCH] Reworked patch for labels on user space messages
[PATCH] change lspp ipc auditing
[PATCH] audit inode patch
[PATCH] support for context based audit filtering, part 2
[PATCH] support for context based audit filtering
[PATCH] no need to wank with task_lock() and pinning task down in audit_syscall_exit()
[PATCH] drop task argument of audit_syscall_{entry,exit}
[PATCH] drop gfp_mask in audit_log_exit()
[PATCH] move call of audit_free() into do_exit()
[PATCH] sockaddr patch
[PATCH] deal with deadlocks in audit_free()
When iptables userspace adds an ipt_standard_target, it calculates the size
of the entire entry as:
sizeof(struct ipt_entry) + XT_ALIGN(sizeof(struct ipt_standard_target))
ipt_standard_target looks like this:
struct xt_standard_target
{
struct xt_entry_target target;
int verdict;
};
xt_entry_target contains a pointer, so when compiled for 64 bit the
structure gets an extra 4 byte of padding at the end. On 32 bit
architectures where iptables aligns to 8 byte it will also have 4
byte padding at the end because it is only 36 bytes large.
The compat_ipt_standard_fn in the kernel adjusts the offsets by
sizeof(struct ipt_standard_target) - sizeof(struct compat_ipt_standard_target),
which will always result in 4, even if the structure from userspace
was already padded to a multiple of 8. On x86 this works out by
accident because userspace only aligns to 4, on all other
architectures this is broken and causes incorrect adjustments to
the size and following offsets.
Thanks to Linus for lots of debugging help and testing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] vmsplice: allow user to pass in gift pages
[PATCH] pipe: enable atomic copying of pipe data to/from user space
[PATCH] splice: call handle_ra_miss() on failure to lookup page
[PATCH] Add ->splice_read/splice_write to def_blk_fops
[PATCH] pipe: introduce ->pin() buffer operation
[PATCH] splice: fix bugs in pipe_to_file()
[PATCH] splice: fix bugs with stealing regular pipe pages
If SPLICE_F_GIFT is set, the user is basically giving this pages away to
the kernel. That means we can steal them for eg page cache uses instead
of copying it.
The data must be properly page aligned and also a multiple of the page size
in length.
Signed-off-by: Jens Axboe <axboe@suse.de>
The pipe ->map() method uses kmap() to virtually map the pages, which
is both slow and has known scalability issues on SMP. This patch enables
atomic copying of pipe pages, by pre-faulting data and using kmap_atomic()
instead.
lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
are results from that on a UP machine with highmem (1.5GiB of RAM), running
first a UP kernel, SMP kernel, and SMP kernel patched.
Vanilla-UP:
Pipe bandwidth: 1622.28 MB/sec
Pipe bandwidth: 1610.59 MB/sec
Pipe bandwidth: 1608.30 MB/sec
Pipe latency: 7.3275 microseconds
Pipe latency: 7.2995 microseconds
Pipe latency: 7.3097 microseconds
Vanilla-SMP:
Pipe bandwidth: 1382.19 MB/sec
Pipe bandwidth: 1317.27 MB/sec
Pipe bandwidth: 1355.61 MB/sec
Pipe latency: 9.6402 microseconds
Pipe latency: 9.6696 microseconds
Pipe latency: 9.6153 microseconds
Patched-SMP:
Pipe bandwidth: 1578.70 MB/sec
Pipe bandwidth: 1579.95 MB/sec
Pipe bandwidth: 1578.63 MB/sec
Pipe latency: 9.1654 microseconds
Pipe latency: 9.2266 microseconds
Pipe latency: 9.1527 microseconds
Signed-off-by: Jens Axboe <axboe@suse.de>
The ->map() function is really expensive on highmem machines right now,
since it has to use the slower kmap() instead of kmap_atomic(). Splice
rarely needs to access the virtual address of a page, so it's a waste
of time doing it.
Introduce ->pin() to take over the responsibility of making sure the
page data is valid. ->map() is then reduced to just kmap(). That way we
can also share a most of the pipe buffer ops between pipe.c and splice.c
Signed-off-by: Jens Axboe <axboe@suse.de>
Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.
- Only allow full pages to go to the page cache.
- Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
- Remember to clear 'stolen' if add_to_page_cache() fails.
And as a cleanup on that:
- Make the bottom fall-through logic a little less convoluted. Also make
the steal path hold an extra reference to the page, so we don't have
to differentiate between stolen and non-stolen at the end.
Signed-off-by: Jens Axboe <axboe@suse.de>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TG3]: Update version and reldate
[TG3]: Fix bug in nvram write
[TG3]: Add reset_phy parameter to chip reset functions
[TG3]: Reset chip when changing MAC address
[TG3]: Add phy workaround
[TG3]: Call netif_carrier_off() during phy reset
[IPV6]: Fix race in route selection.
[XFRM]: fix incorrect xfrm_policy_afinfo_lock use
[XFRM]: fix incorrect xfrm_state_afinfo_lock use
[TCP]: Fix unlikely usage in tcp_transmit_skb()
[XFRM]: fix softirq-unsafe xfrm typemap->lock use
[IPSEC]: Fix IP ID selection
[NET]: use hlist_unhashed()
[IPV4]: inet_init() -> fs_initcall
[NETLINK]: cleanup unused macro in net/netlink/af_netlink.c
[PKT_SCHED] netem: fix loss
[X25]: fix for spinlock recurse and spinlock lockup with timer handler
1) The audit_ipc_perms() function has been split into two different
functions:
- audit_ipc_obj()
- audit_ipc_set_perm()
There's a key shift here... The audit_ipc_obj() collects the uid, gid,
mode, and SElinux context label of the current ipc object. This
audit_ipc_obj() hook is now found in several places. Most notably, it
is hooked in ipcperms(), which is called in various places around the
ipc code permforming a MAC check. Additionally there are several places
where *checkid() is used to validate that an operation is being
performed on a valid object while not necessarily having a nearby
ipcperms() call. In these locations, audit_ipc_obj() is called to
ensure that the information is captured by the audit system.
The audit_set_new_perm() function is called any time the permissions on
the ipc object changes. In this case, the NEW permissions are recorded
(and note that an audit_ipc_obj() call exists just a few lines before
each instance).
2) Support for an AUDIT_IPC_SET_PERM audit message type. This allows
for separate auxiliary audit records for normal operations on an IPC
object and permissions changes. Note that the same struct
audit_aux_data_ipcctl is used and populated, however there are separate
audit_log_format statements based on the type of the message. Finally,
the AUDIT_IPC block of code in audit_free_aux() was extended to handle
aux messages of this new type. No more mem leaks I hope ;-)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hi,
The patch below builds upon the patch sent earlier and adds subject label to
all audit events generated via the netlink interface. It also cleans up a few
other minor things.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The below patch should be applied after the inode and ipc sid patches.
This patch is a reworking of Tim's patch that has been updated to match
the inode and ipc patches since its similar.
[updated:
> Stephen Smalley also wanted to change a variable from isec to tsec in the
> user sid patch. ]
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hi,
The patch below converts IPC auditing to collect sid's and convert to context
string only if it needs to output an audit record. This patch depends on the
inode audit change patch already being applied.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Previously, we were gathering the context instead of the sid. Now in this patch,
we gather just the sid and convert to context only if an audit event is being
output.
This patch brings the performance hit from 146% down to 23%
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The following patch provides selinux interfaces that will allow the audit
system to perform filtering based on the process context (user, role, type,
sensitivity, and clearance). These interfaces will allow the selinux
module to perform efficient matches based on lower level selinux constructs,
rather than relying on context retrievals and string comparisons within
the audit module. It also allows for dominance checks on the mls portion
of the contexts that are impossible with only string comparisons.
Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The locking for the uart_port is over complicated, and can be
simplified if we introduce a flag to indicate that a port is "dead"
and will be removed.
This also helps the validator because it removes a case of non-nested
unlock ordering.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Use hlist_unhashed() rather than accessing inside data structure.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While writing to an event device allows to set repeat rate for an
individual input device there is no way to retrieve current settings
so we need to ressurect EVIOCGREP. Also ressurect EVIOCSREP so we
have a symmetrical interface.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
They shouldn't be using 'u32' et al in structures which are used for
communication with userspace. Switch to the proper types (__u32 etc).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It uses kernel_ulong_t but can't be wrapped in __KERNEL__ because it's
used from scripts/mod/file2alias.c -- but we _can_ hide it inside
header manually too (and it doesn't generally exist for userspace).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
- Add new SA_PROBEIRQ which suppresses the new sharing-mismatch warning.
Some drivers like to use request_irq() to find an unused interrupt slot.
- Use it in i82365.c
- Kill unused SA_PROBE.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
[PATCH] Added URI of "linux kernel development process"
[PATCH] Kobject: possible cleanups
[PATCH] Fix OCFS2 warning when DEBUG_FS is not enabled
[PATCH] Kobject: fix build error
[PATCH] Frame buffer: remove cmap sysfs interface
This patch contains the following possible cleanups:
- #if 0 the following unused global function:
- subsys_remove_file()
- remove the following unused EXPORT_SYMBOL's:
- kset_find_obj
- subsystem_init
- remove the following unused EXPORT_SYMBOL_GPL:
- kobject_add_dir
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix the following warning which happens when OCFS2_FS is enabled but
DEBUG_FS isn't:
fs/ocfs2/dlmglue.c: In function `ocfs2_dlm_init_debug':
fs/ocfs2/dlmglue.c:2036: warning: passing arg 5 of `debugfs_create_file' discards qualifiers from pointer target type
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Joel Becker <Joel.Becker@oracle.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes a build error for various odd combinations of CONFIG_HOTPLUG
and CONFIG_NET.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Nigel Cunningham <ncunningham@cyclades.com>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
find_get_pages_contig() will break out if we hit a hole in the page cache.
From Andrew Morton, small modifications and documentation by me.
Signed-off-by: Jens Axboe <axboe@suse.de>
There was a whole load of crap exposed which should have been inside the
existing #ifdef __KERNEL__ part. Also hide struct sched_param for now,
since glibc has its own and doesn't like being given ours (yet).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Don't include <linux/sched.h> outside __KERNEL__, and split the EM_xxx
definitions out of elf.h into elf-em.h so that audit.h can include just
that and not pollute the namespace any further than it needs to.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This is a workaround for the case edge-triggered irq's. Several users
seem to have broken configurations sharing edge-triggered irq's. To avoid
losing IRQ's, reshedule if more work arrives.
The changes to netdevice.h are to extract the part that puts device
back in list into separate inline.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.
This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.
Signed-off-by: Jens Axboe <axboe@suse.de>
Providing more accurate coordinates for thumb press requires additional
steps in the filtering logic:
- Ignore samples found invalid by the debouncing logic, or the ones that
have out of bound pressure value.
- Add a parameter to repeat debouncing, so that more then two consecutive
good readings are required for a valid sample.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Acked-by: Juha Yrjola <juha.yrjola@nokia.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Move some inclusion of private header files and the definition of
RPC_DEBUG inside the existing #ifdef __KERNEL__
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
For now, just make sure all inclusion of private header files is done
within #ifdef __KERNEL__. There'll be more to clean up later.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It only really needs to define a few constants and include <asm/mman.h>
when it's used by userspace. Move the rest within #ifdef __KERNEL__
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It was unconditionally including a whole bunch of headers which aren't
user-visible, and also exposing a lot of private internal stuff of its
own. Also fix some legacy character set to UTF-8 while we're at it.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It doesn't need to include i2c.h, because a forward declaration of
struct i2c_adapter is perfectly sufficient. And it can be inside
#ifdef __KERNEL__ along with the kernel-internal structure definition.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Some (?) non-x86 architectures require 8byte alignment for u_int64_t
even when compiled for 32bit, using u_int32_t in compat_xt_counters
breaks on these architectures, use u_int64_t for everything but x86.
Reported by Andreas Schwab <schwab@suse.de>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also switch it to use the same method of using off-tree nodes as
everyone else now does -- set them to point to themselves.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Seems like a strange requirement, but allegedly it was necessary for
struct address_space on CRIS, because it otherwise ended up being only
byte-aligned. It's harmless enough, and easier to just do it than to
prove it isn't necessary... although I really ought to dig out my etrax
board and test it some time.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We only used a single bit for colour information, so having a whole
machine word of space allocated for it was a bit wasteful. Instead,
store it in the lowest bit of the 'parent' pointer, since that was
always going to be aligned anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This is in preparation for merging those fields into a single
'unsigned long', because using a whole machine-word for a single bit
of colour information is wasteful.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We don't have to #if guard prototypes.
This also fixes a bug observed by Randy Dunlap due to a misspelled
option in the #if.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some sanity checking. truesize should be at least sizeof(struct
sk_buff) plus the current packet length. If not, then truesize is
seriously mangled and deserves a kernel log message.
Currently we'll do the check for release of stream socket buffers.
But we can add checks to more spots over time.
Incorporating ideas from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Dead code in net/sunrpc/auth_gss/auth_gss.c
NFS: remove needless check in nfs_opendir()
NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS
NFS: make 2 functions static
NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset
NFS: fix PROC_FS=n compile error
VFS: Fix another open intent Oops
RPCSEC_GSS: fix leak in krb5 code caused by superfluous kmalloc
fs/built-in.o: In function `nfs_show_stats':inode.c:(.text+0x15481a): undefined reference to `rpc_print_iostats'
net/built-in.o: In function `rpc_destroy_client': undefined reference to `rpc_free_iostats'
net/built-in.o: In function `rpc_clone_client': undefined reference to `rpc_alloc_iostats'
net/built-in.o: In function `rpc_new_client': undefined reference to `rpc_alloc_iostats'
net/built-in.o: In function `xprt_release': undefined reference to `rpc_count_iostats'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Noted by Sergei Shtylylov <sshtylyov@ru.mvista.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the IDE device on ATI SB600
Signed-off-by: Felix Kuehling <fkuehlin@ati.com>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.
We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.
prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Somehow in the midst of dotting i's and crossing t's during
the merge up to rc1 we wound up keeping __put_task_struct_cb
when it should have been killed as it no longer has any users.
Sorry I probably should have caught this while it was
still in the -mm tree.
Having the old code there gets confusing when reading
through the code and trying to understand what is
happening.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (679 commits)
commit 7676f83aeb
Author: James Bottomley <James.Bottomley@steeleye.com>
Date: Fri Apr 14 09:47:59 2006 -0500
[SCSI] scsi_transport_sas: don't scan a non-existent end device
Any end device that can't support any of the scanning protocols
shouldn't be scanned, so set its id to -1 to prevent
scsi_scan_target() being called for it.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
commit 3c0c25b97c
Author: Moore, Eric <Eric.Moore@lsil.com>
Date: Thu Apr 13 16:08:17 2006 -0600
[SCSI] mptfusion - fix panic in mptsas_slave_configure
Driver panic when RAID logical volume was present when driver
loaded, or when a RAID logical volume was created on the fly.
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (169 commits)
commit 78a596b449
Author: Adrian Bunk <bunk@stusta.de>
Date: Fri Mar 31 01:38:12 2006 -0800
[PATCH] remove kernel/power/pm.c:pm_unregister()
Since the last user is removed in -mm, we can now remove this long deprecated
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 21440d3133
Author: David Brownell <david-b@pacbell.net>
Date: Sat Apr 1 10:21:52 2006 -0800
[PATCH] dma doc updates
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (158 commits)
commit 4f705ae3e9
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date: Mon Apr 3 17:09:22 2006 -0700
[PATCH] DMI: move dmi_scan.c from arch/i386 to drivers/firmware/
dmi_scan.c is arch-independent and is used by i386, x86_64, and ia64.
Currently all three arches compile it from arch/i386, which means that ia64
and x86_64 depend on things in arch/i386 that they wouldn't otherwise care
about.
This is simply "mv arch/i386/kernel/dmi_scan.c drivers/firmware/" (removing
trailing whitespace) and the associated Makefile changes. All three
architectures already set CONFIG_DMI in their top-level Kconfig files.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andrey Panin <pazke@orbita1.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
...
Since the last user is removed in -mm, we can now remove this long deprecated
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sparse warns about casting to a __bitwise type. However, it's correct
to do when defining the enum for pci_bus_flags_t, so add a __force to
quiet the warnings. This will fix getting
include/linux/pci.h💯26: warning: cast to restricted type
from sparse all over the build.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The naming of the constant defined for PCI ID 1022:7450 does not seem
to match the information at http://pciids.sourceforge.net/:
http://pci-ids.ucw.cz/iii/?i=1022
There 1022:7450 is listed as "AMD-8131 PCI-X Bridge" while 1022:7451
is listed as "AMD-8131 PCI-X IOAPIC". Yet, the current definition for
0x7450 is PCI_DEVICE_ID_AMD_8131_APIC. It seems to me like that name
should map to 0x7451, while a name like PCI_DEVICE_ID_AMD_8131_BRIDGE
should map to 0x7450.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Print more diagnostic info to help identify the source of power management
suspend failures.
Example:
usb_hcd_pci_suspend(): pci_set_power_state+0x0/0x1af() returns -22
pci_device_suspend(): usb_hcd_pci_suspend+0x0/0x11b() returns -22
suspend_device(): pci_device_suspend+0x0/0x34() returns -22
Work-in-progress. It needs lots more suspend_report_result() calls sprinkled
everywhere.
Cc: Patrick Mochel <mochel@digitalimplant.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[BLOCK] delay all uevents until partition table is scanned
Here we delay the annoucement of all block device events until the
disk's partition table is scanned and all partition devices are already
created and sysfs is populated.
We have a bunch of old bugs for removable storage handling where we
probe successfully for a filesystem on the raw disk, but at the
same time the kernel recognizes a partition table and creates partition
devices.
Currently there is no sane way to tell if partitions will show up or not
at the time the disk device is announced to userspace. With the delayed
events we can simply skip any probe for a filesystem on the raw disk when
we find already present partitions.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It works like this:
Open the file
Read all the contents.
Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
When poll returns,
close the file and go to top of loop.
or lseek to start of file and go back to the 'read'.
Events are signaled by an object manager calling
sysfs_notify(kobj, dir, attr);
If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).
This has a cost of one int per attribute, one wait_queuehead per kobject,
one int per open file.
The name "sysfs_notify" may be confused with the inotify
functionality. Maybe it would be nice to support inotify for sysfs
attributes as well?
This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move common definitions for NET2280 to <linux/usb/net2280.h>, so that I can
use them in prism54usb (it is not merged yet, but I plan to do it soon).
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] splice: add support for sys_tee()
[PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
We currently have two implementations of this obsolete ioctl, one in
the block layer and one in the scsi code. Both of them have drawbacks.
This patch kills the scsi layer version after updating the block version
with the missing bits:
- argument checking
- use scatterlist I/O
- set number of retries based on the submitted command
This is the last user of non-S/G I/O except for the gdth driver, so
getting this in ASAP and through the scsi tree would be nie to kill
the non-S/G I/O path. Jens, what do you think about adding a check
for non-S/G I/O in the midlayer?
Thanks to Or Gerlitz for testing this patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The pen down IRQ will toggle during each X,Y,Z measurement cycle.
Even though the IRQ is disabled it will be latched and delivered
when after enable_irq. Thus in the IRQ handler we must avoid
starting a new measurement cycle when such an "unwanted" IRQ happens.
Add a get_pendown_state platform function, which will probably
determine this by reading the current GPIO level of the pen IRQ pin.
Move the IRQ reenabling from the SPI RX function to the timer. After
the last power down message the pen IRQ pin is still active for a
while and get_pendown_state would report incorrectly a pen down state.
When suspending we should check the ts->pending flag instead of
ts->pendown, since the timer can be pending regardless of ts->pendown.
Also if ts->pending is set we can be sure that the timer is running,
so no need to rearm it. Similarly if ts->pending is not set we can
be sure that the IRQ is enabled (and the timer is not).
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Some touchscreens seem to oscillate heavily for a while after touching
the screen. Implement support for sampling the screen until we get two
consecutive values that are close enough.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: Juha Yrjola <juha.yrjola@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
The in-kernel Sangoma drivers are both not compiling and marked as BROKEN
since at least kernel 2.6.0.
Sangoma offers out-of-tree drivers, and David Mandelstam told me Sangoma
does no longer maintain the in-kernel drivers and prefers to provide them
as a separate installation package.
This patch therefore removes these drivers.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As waiting for some register bits to change seems to be a common
operation shared by some controllers, implement helper function
ata_wait_register(). This function also takes care of register write
flushing.
Note that the condition is inverted, the wait is over when the masked
value does NOT match @val. As we're waiting for bits to change, this
test is more powerful and allows the function to be used in more
places.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
@verbose was added to ata_reset_fn_t because AHCI complained during
probing if no device was attached to the port. However, muting
failure message isn't the correct approach. Reset methods are
responsible for detecting no device condition and finishing
successfully. Now that AHCI softreset is fixed, kill @verbose.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.
Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.
Signed-off-by: Jens Axboe <axboe@suse.de>
We need not use ->f_pos as the offset for the file input/output. If the
user passed an offset pointer in through sys_splice(), just use that and
leave ->f_pos alone.
Signed-off-by: Jens Axboe <axboe@suse.de>
In vsyscall function do_vgettimeofday(), some functions are declared as
inlined, which is a hint for gcc to compile the function inlined but it
not forced. Sometimes compiler does not compile the function as
inlined, so here inline is replaced by __always_inline prefix.
It does not happen in gcc compiler actually, but it possibly happens.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] vfs: add splice_write and splice_read to documentation
[PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
[PATCH] splice: warning fix
[PATCH] another round of fs/pipe.c cleanups
[PATCH] splice: comment styles
[PATCH] splice: add Ingo as addition copyright holder
[PATCH] splice: unlikely() optimizations
[PATCH] splice: speedups and optimizations
[PATCH] pipe.c/fifo.c code cleanups
[PATCH] get rid of the PIPE_*() macros
[PATCH] splice: speedup __generic_file_splice_read
[PATCH] splice: add direct fd <-> fd splicing support
[PATCH] splice: add optional input and output offsets
[PATCH] introduce a "kernel-internal pipe object" abstraction
[PATCH] splice: be smarter about calling do_page_cache_readahead()
[PATCH] splice: optimize the splice buffer mapping
[PATCH] splice: cleanup __generic_file_splice_read()
[PATCH] splice: only call wake_up_interruptible() when we really have to
[PATCH] splice: potential !page dereference
[PATCH] splice: mark the io page as accessed
Bugzilla Bug 6299:
A pixel size of 8 bits produces wrong logo colors in x86_64.
The driver has 2 methods for setting the color map, using the protected
mode interface provided by the video BIOS and directly writing to the VGA
registers. The former is not supported in x86_64 and the latter is enabled
only in i386.
Fix by enabling the latter method in x86_64 only if supported by the BIOS.
If both methods are unsupported, change the visual of vesafb to
STATIC_PSEUDOCOLOR.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Every caller of svc_take_page ignores its return value and assumes it
succeeded. So just WARN() instead of returning an ignored error. This would
have saved some time debugging a recent nfsd4 problem.
If there are still failure cases here, then the result is probably that we
overwrite an earlier part of the reply while xdr-encoding.
While the corrupted reply is a nasty bug, it would be worse to panic here and
create the possibility of a remote DOS; hence WARN() instead of BUG().
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An UML user reported (against 2.6.13.3/UML) he got kernel Oopses when
trying to rmmod (on a kernel with module unloading enabled) a module
compiled with module unloading disabled. As crashing is a very correct
thing to do in that case, a solution is altering the vermagic string to
include this too.
Possibly, however, the code should not crash in this case, even if the
module didn't support unloading - it should simply abort the module
removal. In this case, fixing that bug would be a better solution. I've
not investigated though.
(akpm: a bit marginal - root screwed up and shot himself in the foot).
Cc: Hayim Shaul <hayim@post.tau.ac.il>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are the last conversions of pci_set_dma_mask(),
pci_set_consistent_dma_mask() and pci_dma_supported() to use DMA_xBIT_MASK
constants from linux/dma-mapping.h
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of /proc/vmcore data structures overflow with 32bit systems having
memory more than 4G. This patch fixes those.
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Before commit 47e65328a7, next_thread() took
a const task_t. Reinstate the const qualifier, getting the next thread
never changes the current thread.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We changed the wrong symbol. It's tty_insert_flip_string_flags() which is
called from the previously-non-GPL'ed now-inlined tty_insert_flip_char().
Fix that up, and uninline tty_schedule_flip() while we're there.
Cc: Tobias Powalowski <t.powa@gmx.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lay out the structure definitions in include/linux/leds.h to be aligned as
much as possible. Also minor updates to the comments to make them more
concise.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Acked-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the scheduled unexport of panic_timeout.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ulrich suggested that the `flags' arg to sync_file_range() become unsigned.
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some string functions were safely overrideable in lib/string.c, but their
corresponding declarations in linux/string.h were not. Correct this, and
make strcspn overrideable.
Odds of someone wanting to do optimized assembly of these are small, but
for the sake of cleanliness, might as well bring them into line with the
rest of the file.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch. Its definition is sometimes configurable. Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree. But it looks a bit messy.
SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has
been changeable by config. Suitable node's number may be changed in the
future even if it is other architecture. So, I wrote configurable node's
number.
This patch set defines just default value for each arch which needs multi
nodes except ia64. But, it is easy to change to configurable if necessary.
On ia64 the number of nodes can be already configured in generic ia64 and SN2
config. But, NODES_SHIFT is defined for DIG64 and HP'S machine too. So, I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT. It
would be simpler.
See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce GFP_NOWAIT, as an alias for GFP_ATOMIC & ~__GFP_HIGH.
This also changes XFS, which is the only in-tree user of this idiom that I
could find. The XFS piece is compile-tested only.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some documentation regarding the utilisation of the flags field in
struct page. This field is overloaded for per page bits and to hold node,
zone and SPARSEMEM information. Make it clear which areas are used for
what and how many bits are in each area.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These patches are an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory().
- why the kernel needed patching
When the kernel can't allocate anonymous pages in practice, currnet
OVERCOMMIT_GUESS could return success. This implementation might be
the cause of oom kill in memory pressure situation.
If the Linux runs with page reservation features like
/proc/sys/vm/lowmem_reserve_ratio and without swap region, I think
the oom kill occurs easily.
- the overall design approach in the patch
When the OVERCOMMET_GUESS algorithm calculates number of free pages,
the reserved free pages are regarded as non-free pages.
This change helps to avoid the pitfall that the number of free pages
become less than the number which the kernel tries to keep free.
- testing results
I tested the patches using my test kernel module.
If the patches aren't applied to the kernel, __vm_enough_memory()
returns success in the situation but autual page allocation is
failed.
On the other hand, if the patches are applied to the kernel, memory
allocation failure is avoided since __vm_enough_memory() returns
failure in the situation.
I checked that on i386 SMP 16GB memory machine. I haven't tested on
nommu environment currently.
This patch adds totalreserve_pages for __vm_enough_memory().
Calculate_totalreserve_pages() checks maximum lowmem_reserve pages and
pages_high in each zone. Finally, the function stores the sum of each
zone to totalreserve_pages.
The totalreserve_pages is calculated when the VM is initilized.
And the variable is updated when /proc/sys/vm/lowmem_reserve_raito
or /proc/sys/vm/min_free_kbytes are changed.
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reshape_position is a 64bit field that was not 64bit aligned. So swap with
new_level.
NOTE: this is a user-visible change. However:
- The bad code has not appeared in a released kernel
- This code is still marked 'experimental'
- This only affects version-1 superblock, which are not in wide use
- These field are only used (rather than simply reported) by user-space
tools in extemely rare circumstances : after a reshape crashes in the
first second of the reshape process.
So I believe that, at this stage, the change is safe. Especially if people
heed the 'help' message on use mdadm-2.4.1.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Andrew Morton <akpm@osdl.org>
net/socket.c:148: warning: initialization from incompatible pointer type
extern declarations in .c files! Bad boy.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(), it is only
meant to be used for sendfile() emulation.
Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.
Signed-off-by: Jens Axboe <axboe@suse.de>
Oleg Nesterov spotted two interesting bugs with the current de_thread
code. The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process. Caused by
two processes exiting when only one was created.
The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice. Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.
The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.
To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.
Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process. I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids. This should also be
slightly cheaper then the existing thread_group_leader macro.
I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Overriding the whole EH code is a per-transport, not per-host thing.
Move ->eh_strategy_handler to the transport class, same as
->eh_timed_out.
Downside is that scsi_host_alloc can't check for the total lack of EH
anymore, but the transition period from old EH where we needed it is
long gone already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Rohit found an obscure bug causing buddy list corruption.
page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
to determine whether or not a free page's buddy is itself free and in the
buddy lists.
Each of the conjuncts may be true at different times due to unrelated
conditions, so the non-atomic page_is_buddy test may find each conjunct to
be true even if they were not both true at the same time (ie. the page was
not on the buddy lists).
Signed-off-by: Martin Bligh <mbligh@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add optional input and output offsets to sys_splice(), for seekable file
descriptors:
asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
int fd_out, loff_t __user *off_out,
size_t len, unsigned int flags);
semantics are straightforward: f_pos will be updated with the offset
provided by user-space, before the splice transfer is about to begin.
Providing a NULL offset pointer means the existing f_pos will be used
(and updated in situ). Providing an offset for a pipe results in
-ESPIPE. Providing an invalid offset pointer results in -EFAULT.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
separate out the 'internal pipe object' abstraction, and make it
usable to splice. This cleans up and fixes several aspects of the
internal splice APIs and the pipe code:
- pipes: the allocation and freeing of pipe_inode_info is now more symmetric
and more streamlined with existing kernel practices.
- splice: small micro-optimization: less pointer dereferencing in splice
methods
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Update XFS for the ->splice_read/->splice_write changes.
Signed-off-by: Jens Axboe <axboe@suse.de>
Add checksum operation which takes care of verifying the checksum and
dealing with HW checksum errors and avoids multiple checksum
operations by setting ip_summed to CHECKSUM_UNNECESSARY after
successful verification.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the queue rerouter intrastructure to a generic usable
infrastructure for address family specific operations as a base for
some cleanups.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototypes of NAT callbacks to ip_conntrack_h323.h. Because the
use of typedefs as arguments, some header files need to be moved as
well.
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
This is due to the HPET timer not being initialized with the correct
setting (still using PIT count).
If HZ changes, this drift can become even more pronounced.
HPET patch initializes tick_nsec with correct tick_nsec settings for
HPET timer.
Vojtech comments:
"It's not entirely correct (it assumes the HPET ticks totally
exactly), but it's significantly better than assuming the PIT error
there."
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The node setup code would try to allocate the node metadata in the node
itself, but that fails if there is no memory in there.
This can happen with memory hotplug when the hotplug area defines an so
far empty node.
Now use bootmem to try to allocate the mem_map in other nodes.
And if it fails don't panic, but just ignore the node.
To make this work I added a new __alloc_bootmem_nopanic function that
does what its name implies.
TBD should try to use nearby nodes here. Currently we just use any.
It's hard to do it better because bootmem doesn't have proper fallback
lists yet.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Memory hotadd doesn't need SPARSEMEM, but can be handled by just preallocating
mem_maps. This only needs some untangling of ifdefs to enable the necessary
code even without SPARSEMEM.
Originally from Keith Mannthey, hacked by AK.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert the ATAPI_ENABLE_DMADIR compile time option needed
by some SATA-PATA bridge to runtime module parameter.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Originally from Nick Piggin, just adapted to the newer branch.
You can't check PageLRU without holding zone->lru_lock. The page
release code can get away with it only because the page refcount is 0 at
that point. Also, you can't reliably remove pages from the LRU unless
the refcount is 0. Ever.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Jens Axboe <axboe@suse.de>
By cleaning up the writeback logic (killing write_one_page() and the manual
set_page_dirty()), we can get rid of ->stolen inside the pipe_buffer and
just keep it local in pipe_to_file().
This also adds dirty page balancing logic and O_SYNC handling.
Signed-off-by: Jens Axboe <axboe@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (48 commits)
Documentation: fix minor kernel-doc warnings
BUG_ON() Conversion in drivers/net/
BUG_ON() Conversion in drivers/s390/net/lcs.c
BUG_ON() Conversion in mm/slab.c
BUG_ON() Conversion in mm/highmem.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/ptrace.c
BUG_ON() Conversion in ipc/shm.c
BUG_ON() Conversion in fs/freevxfs/
BUG_ON() Conversion in fs/udf/
BUG_ON() Conversion in fs/sysv/
BUG_ON() Conversion in fs/inode.c
BUG_ON() Conversion in fs/fcntl.c
BUG_ON() Conversion in fs/dquot.c
BUG_ON() Conversion in md/raid10.c
BUG_ON() Conversion in md/raid6main.c
BUG_ON() Conversion in md/raid5.c
Fix minor documentation typo
BFP->BPF in Documentation/networking/tuntap.txt
...
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A lot of EH codes are about to be added to libata. Separate out
libata-eh.c. ata_scsi_timed_out(), ata_scsi_error(),
ata_qc_timeout(), ata_eng_timeout(), ata_eh_qc_complete() and
ata_eh_qc_retry() are moved. No code is changed by this patch.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add a new qc flag ATA_QCFLAG_IO. This flag gets set for normal IO
commands originating from SCSI midlayer. This information will be
used by EH to determine transfer speed reconfiguration.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_dev_configure() should not clear dynamic device flags determined
elsewhere. Lower eight bits are reserved for feature flags, define
ATA_DFLAG_CFG_MASK and clear only those bits before configuring
device. Without this patch, ATA_DFLAG_PIO gets turned off during
revalidation making PIO mode unuseable.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Rename ATA_FLAG_PORT_DISABLED to ATA_FLAG_DISABLED for consistency.
(ATA_FLAG_* are always about ports).
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* Reorder ATA_DFLAG_* such that feature flags determined by
ata_dev_configure() are on lower bits. Reserve lower eight bits
for this purpose and allocate dynamic flags from bit 8.
* Reorder ATA_FLAG_* such that feature flags determined during driver
initiailization are on bits 0:15, dynamic flags on 16:23 and LLDD
specific flags on 24:31.
* Kill trailing white space and lower-case an one line comment for
consistency.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Improve ata_bus_probe() such that configuration failures are handled
better. Each device is given ATA_PROBE_MAX_TRIES chances, but any
non-transient error (revalidation failure with -ENODEV, configuration
failure with -EINVAL...) disables the device directly. Any IO error
results in SATA PHY speed down and ata_set_mode() failure lowers
transfer mode. The last try always puts a device into PIO-0.
After each failure, the whole port is reset to make sure that the
controller and all the devices are in a known and stable state. The
reset also applies SATA SPD configuration if necessary.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ap->sata_spd_limit contrains SATA PHY speed of the port. It is
initialized to the configured value prior to probing thus preserving
BIOS configured value. hardreset is responsible for applying SPD
limit and sata_std_hardreset() is updated to do that. SATA SPD limit
will be used to enhance failure handling during probing and later by
EH.
This patch also normalizes some comments around affected code.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
For the time being we cannot use ata_dev_present() as it was renamed
to ata_dev_enabled() but we still need presence test. Implement
negation of the test. Conveniently, the negated result is needed in
more places. This is suggested by Jeff Garzik.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch updates the comments to match the actual code.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The sliced VBI defines added in videodev2.h are removed since requires
more discussion.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
- Add KEY_BRL_* input keys and K_BRL_* keycodes;
- Add emulation of how braille keyboards usually combine braille dots
to the console keyboard driver;
- Add handling of unicode U+28xy diacritics.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Goramo finally got PCI subsystem ID for their PCI200SYN card. The
attached patch adds support for it - cards with old EEPROM data
will emit a warning with URL for update tool.
Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch renames ata_dev_present() to ata_dev_enabled() and adds
ata_dev_disabled(). This is to discern the state where a device is
present but disabled from not-present state. This disctinction is
necessary when configuring transfer mode because device selection
timing must not be violated even if a device fails to configure.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch extends current iptables compatibility layer in order to get
32bit iptables to work on 64bit kernel. Current layer is insufficient due
to alignment checks both in kernel and user space tools.
Patch is for current net-2.6.17 with addition of move of ipt_entry_{match|
target} definitions to xt_entry_{match|target}.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unifies ipt_multiport and ip6t_multiport to xt_multiport.
As a result, this addes support for inversion and port range match
to IPv6 packets.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unifies ipt_esp and ip6t_esp to xt_esp. Please note that now
a user program needs to specify IPPROTO_ESP as protocol to use esp match
with IPv6. This means that ip6tables requires '-p esp' like iptables.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was grepping through the code and some `grep ganularity -R .` didn't
catch what I thought. Then looking closer I saw the term "granuality"
used in only four places (in comments) and granularity in many more
places describing the same idea. Some other facts:
dictionary.com does not know such a word
define:granuality on google is not found (and pages for granuality are
mostly related to patches to the kernel)
it has not been discussed as a term on LKML, AFAICS (=Can Search)
To be consistent, I think granularity should be used everywhere.
Signed-off-by: Kalin KOZHUHAROV <kalin@thinrope.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
As announced, lookup_hash() can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The monochrome->color expansion routine that handles bitmaps which have
(widths % 8) != 0 (slow_imageblit) produces corrupt characters in big-endian.
This is caused by a bogus bit test in slow_imageblit().
Fix.
This patch may deserve to go to the stable tree. The code has already been
well tested in little-endian machines. It's only in big-endian where there is
uncertainty and Herbert confirmed that this is the correct way to go.
It should not introduce regressions.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Backlight class attributes are currently easy to implement incorrectly.
Moving certain handling into the backlight core prevents this whilst at the
same time makes the drivers simpler and consistent. The following changes are
included:
The brightness attribute only sets and reads the brightness variable in the
backlight_properties structure.
The power attribute only sets and reads the power variable in the
backlight_properties structure.
Any framebuffer blanking events change a variable fb_blank in the
backlight_properties structure.
The backlight driver has only two functions to implement. One function is
called when any of the above properties change (to update the backlight
brightness), the second is called to return the current backlight brightness
value. A new attribute "actual_brightness" is added to return this brightness
as determined by the driver having combined all the above factors (and any
driver/device specific factors).
Additionally, the backlight core takes care of checking the maximum brightness
is not exceeded and of turning off the backlight before device removal.
The corgi backlight driver is updated to reflect these changes.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is very common to hash a dentry and then to call lookup. If we take fs
specific hash functions into account the full hash logic can get ugly.
Further full_name_hash as an inline function is almost 100 bytes on x86 so
having a non-inline choice in some cases can measurably decrease code size.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Simplifies the code, reduces the need for 4 pid hash tables, and makes the
code more capable.
In the discussions I had with Oleg it was felt that to a large extent the
cleanup itself justified the work. With struct pid being dynamically
allocated meant we could create the hash table entry when the pid was
allocated and free the hash table entry when the pid was freed. Instead of
playing with the hash lists when ever a process would attach or detach to a
process.
For myself the fact that it gave what my previous task_ref patch gave for free
with simpler code was a big win. The problem is that if you hold a reference
to struct task_struct you lock in 10K of low memory. If you do that in a user
controllable way like /proc does, with an unprivileged but hostile user space
application with typical resource limits of 1000 fds and 100 processes I can
trigger the OOM killer by consuming all of low memory with task structs, on a
machine wight 1GB of low memory.
If I instead hold a reference to struct pid which holds a pointer to my
task_struct, I don't suffer from that problem because struct pid is 2 orders
of magnitude smaller. In fact struct pid is small enough that most other
kernel data structures dwarf it, so simply limiting the number of referring
data structures is enough to prevent exhaustion of low memory.
This splits the current struct pid into two structures, struct pid and struct
pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
struct pid_link is the per process linkage into the hash tables and lives in
struct task_struct. struct pid is given an indepedent lifetime, and holds
pointers to each of the pid types.
The independent life of struct pid simplifies attach_pid, and detach_pid,
because we are always manipulating the list of pids and not the hash table.
In addition in giving struct pid an indpendent life it makes the concept much
more powerful.
Kernel data structures can now embed a struct pid * instead of a pid_t and
not suffer from pid wrap around problems or from keeping unnecessarily
large amounts of memory allocated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A big problem with rcu protected data structures that are also reference
counted is that you must jump through several hoops to increase the reference
count. I think someone finally implemented atomic_inc_not_zero(&count) to
automate the common case. Unfortunately this means you must special case the
rcu access case.
When data structures are only visible via rcu in a manner that is not
determined by the reference count on the object (i.e. tasks are visible until
their zombies are reaped) there is a much simpler technique we can employ.
Simply delaying the decrement of the reference count until the rcu interval is
over.
What that means is that the proc code that looks up a task and later
wants to sleep can now do:
rcu_read_lock();
task = find_task_by_pid(some_pid);
if (task) {
get_task_struct(task);
}
rcu_read_unlock();
The effect on the rest of the kernel is that put_task_struct becomes cheaper
and immediate, and in the case where the task has been reaped it frees the
task immediate instead of unnecessarily waiting an until the rcu interval is
over.
Cleanup of task_struct does not happen when its reference count drops to
zero, instead cleanup happens when release_task is called. Tasks can only
be looked up via rcu before release_task is called. All rcu protected
members of task_struct are freed by release_task.
Therefore we can move call_rcu from put_task_struct into release_task. And
we can modify release_task to not immediately release the reference count
but instead have it call put_task_struct from the function it gives to
call_rcu.
The end result:
- get_task_struct is safe in an rcu context where we have just looked
up the task.
- put_task_struct() simplifies into its old pre rcu self.
This reorganization also makes put_task_struct uncallable from modules as
it is not exported but it does not appear to be called from any modules so
this should not be an issue, and is trivially fixed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This just got nuked in mainline. Bring it back because Eric's patches use it.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To increase the strength of SCHED_BATCH as a scheduling hint we can
activate batch tasks on the expired array since by definition they are
latency insensitive tasks.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The activated flag in task_struct is used to track different sleep types and
its usage is somewhat obfuscated. Convert the variable to an enum with more
descriptive names without altering the function.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, count_active_tasks() calls both nr_running() &
nr_interruptible(). Each of these functions does a "for_each_cpu" & reads
values from the runqueue of each cpu. Although this is not a lot of
instructions, each runqueue may be located on different node. Depending on
the architecture, a unique TLB entry may be required to access each
runqueue.
Since there may be more runqueues than cpu TLB entries, a scan of all
runqueues can trash the TLB. Each memory reference incurs a TLB miss &
refill.
In addition, the runqueue cacheline that contains nr_running &
nr_uninterruptible may be evicted from the cache between the two passes.
This causes unnecessary cache misses.
Combining nr_running() & nr_interruptible() into a single function
substantially reduces the TLB & cache misses on large systems. This should
have no measureable effect on smaller systems.
On a 128p IA64 system running a memory stress workload, the new function
reduced the overhead of calc_load() from 605 usec/call to 324 usec/call.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The removal of the data field in the hrtimer structure enforces the
embedding of the timer into another data structure. nanosleep now uses a
private implementation of the most common used timer callback function
(simple task wakeup).
In order to avoid the reimplentation of such functionality all over the
place a generic hrtimer_sleeper functionality is created.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an LED trigger for IDE disk activity to the ide-disk driver.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for LED triggers to the LED subsystem. "Triggers" are events
which change the state of an LED. Two kinds of trigger are available, simple
ones which can be added to exising code with minimum disruption and complex
ones for implementing new or more complex functionality.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the foundations of a new LEDs subsystem. This patch adds a class which
presents LED devices within sysfs and allows their brightness to be
controlled.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add TIOCL_GETKMSGREDIRECT needed by the userland suspend tool to get the
current value of kmsg_redirect from the kernel so that it can save it and
restore it after resume.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:
- It's more flexible. Things which would require two or three syscalls with
fadvise() can be done in a single syscall.
- Using fadvise() in this manner is something not covered by POSIX.
The patch wires up the syscall for x86.
The sycall is implemented in the new fs/sync.c. The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.
Documentation for the syscall is in fs/sync.c.
A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
NFS_DATA_SYNC which is hopefully the more common."
Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested. This is trivial to fix: add a new flag bit, set
wbc->nonblocking. But I'm not sure that we want to expose implementation
details down to that level.
Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).
Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines. It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
requests fail...
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Domsch noticed a startup race with the IPMI kernel thread, it was
possible (though extraordinarly unlikely) that a message could come in
before the upper layer was ready to handle it. This patch splits the
startup processing of an IPMI interface into two parts, one to get ready
and one to actually start the processes to receive messages from the
interface.
[akpm@osdl.org: cleanups]
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make baby-simple the code for /proc/devices. Based on the proven design
for /proc/interrupts.
This also fixes the early-termination regression 2.6.16 introduced, as
demonstrated by:
# dd if=/proc/devices bs=1
Character devices:
1 mem
27+0 records in
27+0 records out
This should also work (but is untested) when /proc/devices >4096 bytes,
which I believe is what the original 2.6.16 rewrite fixed.
[akpm@osdl.org: cleanups, simplifications]
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit a4a6198b80:
[PATCH] tvec_bases too large for per-cpu data
introduced "struct tvec_t_base_s boot_tvec_bases" which is visible at
compile time. This means we can kill __init_timer_base and move
timer_base_s's content into tvec_t_base_s.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
find_trylock_page() is an odd interface in that it doesn't take a reference
like the others. Now that XFS no longer uses it, and its last remaining
caller actually wants an elevated refcount, opencode that callsite and
schedule find_trylock_page() for removal.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- chips/sharp.c: make two needlessly global functions static
- move some declarations to a header file where they belong to
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previously we added NET_IP_ALIGN so an architecture can override the
padding done to align headers. The next step is to allow the skb
headroom to be overridden.
We currently always reserve 16 bytes to grow into, meaning all DMAs
start 16 bytes into a cacheline. On ppc64 we really want DMA writes to
start on a cacheline boundary, so we increase that headroom to one
cacheline.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>