optimization
The gcc manual says:
|`-fdelete-null-pointer-checks'
| Use global dataflow analysis to identify and eliminate useless
| checks for null pointers. The compiler assumes that dereferencing
| a null pointer would have halted the program. If a pointer is
| checked after it has already been dereferenced, it cannot be null.
| Enabled at levels `-O2', `-O3', `-Os'.
Now the problem can be seen with this test case:
#include <linux/prefetch.h>
extern void bar(char *x);
void foo(char *x)
{
prefetch(x);
if (x)
bar(x);
}
Because the constraint to the inline asm used in the prefetch() macro is
a memory operand, gcc assumes that the asm code does dereference the
pointer and the delete-null-pointer-checks optimization kicks in.
Inspection of generated assembly for the above example shows that bar()
is indeed called unconditionally without any test on the value of x.
Of course in the prefetch case there is no real dereference and it
cannot be assumed that a null pointer would have been caught at that
point. This causes kernel oopses with constructs like
hlist_for_each_entry() where the list's 'next' content is prefetched
before the pointer is tested against NULL, and only when gcc feels like
applying this optimization which doesn't happen all the time with more
complex code.
It appears that the way to prevent delete-null-pointer-checks
optimization to occur in this case is to make prefetch() into a static
inline function instead of a macro. At least this is what is done on
x86_64 where a similar inline asm memory operand is used (I presume they
would have seen the same problem if it didn't work) and resulting code
for the above example confirms that.
An alternative would consist of replacing the memory operand by a
register operand containing the pointer, and use the addressing mode
explicitly in the asm template. But that would be less optimal than an
offsettable memory reference.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
L_PTE_ASID is not really required to be stored in every PTE, since we
can identify it via the address passed to set_pte_at(). So, create
set_pte_ext() which takes the address of the PTE to set, the Linux
PTE value, and the additional CPU PTE bits which aren't encoded in
the Linux PTE value.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the implicit addition of a virtual address
to the UDC registers. This should have been done
by ioremap() in the driver, not by a static map.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Don't set HWCAP_VFP in the processor support file; not only does it
depend on the processor features, but it also depends on the support
code being present. Therefore, only set it if the support code
detects that we have a VFP coprocessor attached.
Also, move the VFP handling of the coprocessor access register into
the VFP support code.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (76 commits)
[ARM] 4002/1: S3C24XX: leave parent IRQs unmasked
[ARM] 4001/1: S3C24XX: shorten reboot time
[ARM] 3983/2: remove unused argument to __bug()
[ARM] 4000/1: Osiris: add third serial port in
[ARM] 3999/1: RX3715: suspend to RAM support
[ARM] 3998/1: VR1000: LED platform devices
[ARM] 3995/1: iop13xx: add iop13xx support
[ARM] 3968/1: iop13xx: add iop13xx_defconfig
[ARM] Update mach-types
[ARM] Allow gcc to optimise arm_add_memory a little more
[ARM] 3991/1: i.MX/MX1 high resolution time source
[ARM] 3990/1: i.MX/MX1 more precise PLL decode
[ARM] 3986/1: H1940: suspend to RAM support
[ARM] 3985/1: ixp4xx clocksource cleanup
[ARM] 3984/1: ixp4xx/nslu2: Fix disk LED numbering (take 2)
[ARM] 3994/1: ixp23xx: fix handling of pci master aborts
[ARM] 3981/1: sched_clock for PXA2xx
[ARM] 3980/1: extend the ARM Versatile sched_clock implementation from 32 to 63 bit
[ARM] 3979/1: extend the SA11x0 sched_clock implementation from 32 to 63 bit period
[ARM] 3978/1: macro to provide a 63-bit value from a 32-bit hardware counter
...
Merge:
Atmel AT91RM9200 and AT91SAM9260 changes
General ARM developments
Disconfiguous memory cleanups
64-bit/32-bit division and sched_clock extension patches
EP93xx support changes
IOP support changes
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cut down the time between requesting a reboot
and actually getting the reboot to happen by
a quarter.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
It appears that include/asm-arm/bug.h requires include/linux/stddef.h
for the definition of NULL. It seems that stddef.h was always included
indirectly in most cases, and that issue was properly fixed a while ago.
Then commit 5047f09b56 incorrectly reverted
change from commit ff10952a54 (bad dwmw2)
and the problem recently resurfaced.
Because the third argument to __bug() is never used anyway, RMK suggested
getting rid of it entirely instead of readding #include <linux/stddef.h>
which this patch does.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The iop348 processor integrates an Xscale (XSC3 512KB L2 Cache) core with a
Serial Attached SCSI (SAS) controller, multi-ported DDR2 memory
controller, 3 Application Direct Memory Access (DMA) controllers, a 133Mhz
PCI-X interface, a x8 PCI-Express interface, and other peripherals to form
a system-on-a-chip RAID subsystem engine.
The iop342 processor replaces the SAS controller with a second Xscale core
for dual core embedded applications.
The iop341 processor is the single core version of iop342.
This patch supports the two Intel customer reference platforms iq81340mc
for external storage and iq81340sc for direct attach (HBA) development.
The developer's manual is available here:
ftp://download.intel.com/design/iio/docs/31503701.pdf
Changelog:
* removed virtual addresses from resource definitions
* cleaned up some unnecessary #include's
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Make the contents of the userspace asm/setup.h header consistent on all
architectures:
- export setup.h to userspace on all architectures
- export only COMMAND_LINE_SIZE to userspace
- frv: move COMMAND_LINE_SIZE from param.h
- i386: remove duplicate COMMAND_LINE_SIZE from param.h
- arm:
- export ATAGs to userspace
- change u8/u16/u32 to __u8/__u16/__u32
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dma_is_consistent() is ill-designed in that it does not have a struct
device pointer argument which makes proper support for systems that consist
of a mix of coherent and non-coherent DMA devices hard. Change
dma_is_consistent to take a struct device pointer as first argument and fix
the sole caller to pass it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last thing we agreed on was to remove the macros entirely for 2.6.19,
on all architectures. Unfortunately, I think nobody actually _did_ that,
so they are still there.
[akpm@osdl.org: x86_64 fix]
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Schafer <gschafer@zip.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enhanced resolution for time measurement functions.
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add support to suspend and resume, using the
H1940's bootloader
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch fixes an error in the numbering of the disk LEDs on the
Linksys NSLU2. The error crept in because the physical location
of the LEDs has the Disk 2 LED *above* the Disk 1 LED.
Thanks to Gordon Farquharson for reporting this.
Signed-off-by: Rod Whitby <rod@whitby.id.au>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This is done in a completely lockless fashion. Bits 0 to 31 of the count
are provided by the hardware while bits 32 to 62 are stored in memory.
The top bit in memory is used to synchronize with the hardware count
half-period. When the top bit of both counters (hardware and in memory)
differ then the memory is updated with a new value, incrementing it when
the hardware counter wraps around. Because a word store in memory is
atomic then the incremented value will always be in synch with the top
bit indicating to any potential concurrent reader if the value in memory
is up to date or not wrt the needed increment. And any race in updating
the value in memory is harmless as the same value would be stored more
than once.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On ARM all divisions have to be performed "manually". For 64-bit
divisions that may take more than a hundred cycles in many cases.
With 32-bit divisions gcc already use the recyprocal of constant
divisors to perform a multiplication, but not with 64-bit divisions.
Since the kernel is increasingly relying upon 64-bit divisions it is
worth optimizing at least those cases where the divisor is a constant.
This is what this patch does using plain C code that gets optimized away
at compile time.
For example, despite the amount of added C code, do_div(x, 10000) now
produces the following assembly code (where x is assigned to r0-r1):
adr r4, .L0
ldmia r4, {r4-r5}
umull r2, r3, r4, r0
mov r2, #0
umlal r3, r2, r5, r0
umlal r3, r2, r4, r1
mov r3, #0
umlal r2, r3, r5, r1
mov r0, r2, lsr #11
orr r0, r0, r3, lsl #21
mov r1, r3, lsr #11
...
.L0:
.word 948328779
.word 879609302
which is the fastest that can be done for any value of x in that case,
many times faster than the __do_div64 code (except for the small x value
space for which the result ends up being zero or a single bit).
The fact that this code is generated inline produces a tiny increase in
.text size, but not significant compared to the needed code around each
__do_div64 call site this code is replacing.
The algorithm used has been validated on a 16-bit scale for all possible
values, and then recodified for 64-bit values. Furthermore I've been
running it with the final BUG_ON() uncommented for over two months now
with no problem.
Note that this new code is compiled with gcc versions 4.0 or later.
Earlier gcc versions proved themselves too problematic and only the
original code is used with them.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix up arch-specific work items where possible to use the new work_struct and
delayed_work structs.
Three places that enqueue bits of their stack and then return have been marked
with #error as this is not permitted.
Signed-Off-By: David Howells <dhowells@redhat.com>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (82 commits)
[PATCH] pata_ali: small fixes
[PATCH] pata_via: VIA 8251 bridged systems are now out and about
[PATCH] trivial piix: swap bogus dot for comma space
[PATCH] sata_promise: PHYMODE4 fixup
[PATCH] libata: always use polling IDENTIFY
[libata] pata_cs5535: fix build
[PATCH] ahci: do not powerdown during initialization
[PATCH] libata: prepare ata_sg_clean() for invocation from EH
[PATCH] libata: separate out rw ATA taskfile building into ata_build_rw_tf()
[PATCH] libata: implement ata_exec_internal_sg()
[PATCH] libata: make sure IRQ is cleared after ata_bmdma_freeze()
[PATCH] libata: move BMDMA host status recording from EH to interrupt handler
[PATCH] libata: make sure sdev doesn't go away while rescanning
[PATCH] libata: don't request sense if the port is frozen
[PATCH] libata: fix READ CAPACITY simulation
[PATCH] libata: implement ATA_FLAG_SETXFER_POLLING and use it in pata_via, take #2
[PATCH] libata: set IRQF_SHARED for legacy PCI IDE IRQs
[PATCH] libata: remove unused HSM_ST_UNKNOWN
[PATCH] libata: kill unnecessary sht->max_sectors initializations
[PATCH] libata: add missing sht->slave_destroy
...
Removed the infinite loop in our arch_reset().
After calling arch_reset(), the kernel waits for 1 second before
printing a "reboot failed" message and then waits for ever itself.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The USB Device port registers are already defined in
drivers/usb/gadget/at91_udc.h. This file can therefore just be removed.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Replace the 'is_b' variable with 'slot_b' in at91_mmc_data.
Also add the new 'chipselect' variable for CF/PCMCIA and 'bus_width_16'
variable for NAND.
This (and previous patches) will unfortunately break the current MMC,
USB Gadget and PCMCIA drivers. Updates and fixes for those drivers will
be submitted to the various subsystem maintainers.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
XScale cores either have a DSP coprocessor (which contains a single
40 bit accumulator register), or an iWMMXt coprocessor (which contains
eight 64 bit registers.)
Because of the small amount of state in the DSP coprocessor, access to
the DSP coprocessor (CP0) is always enabled, and DSP context switching
is done unconditionally on every task switch. Access to the iWMMXt
coprocessor (CP0/CP1) is enabled only when an iWMMXt instruction is
first issued, and iWMMXt context switching is done lazily.
CONFIG_IWMMXT is supposed to mean 'the cpu we will be running on will
have iWMMXt support', but boards are supposed to select this config
symbol by hand, and at least one pxa27x board doesn't get this right,
so on that board, proc-xscale.S will incorrectly assume that we have a
DSP coprocessor, enable CP0 on boot, and we will then only save the
first iWMMXt register (wR0) on context switches, which is Bad.
This patch redefines CONFIG_IWMMXT as 'the cpu we will be running on
might have iWMMXt support, and we will enable iWMMXt context switching
if it does.' This means that with this patch, running a CONFIG_IWMMXT=n
kernel on an iWMMXt-capable CPU will no longer potentially corrupt iWMMXt
state over context switches, and running a CONFIG_IWMMXT=y kernel on a
non-iWMMXt capable CPU will still do DSP context save/restore.
These changes should make iWMMXt work on PXA3xx, and as a side effect,
enable proper acc0 save/restore on non-iWMMXt capable xsc3 cores such
as IOP13xx and IXP23xx (which will not have CONFIG_CPU_XSCALE defined),
as well as setting and using HWCAP_IWMMXT properly.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* sanitize prototypes, annotate
* kill csum_partial_copy
* usual ntohs->shift, this time in assembler part
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge L_PTE_COHERENT with L_PTE_SHARED and free up a L_PTE_* bit.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add arch specific dev_archdata to struct device
Adds an arch specific struct dev_arch to struct device. This enables
architecture to add specific fields to every device in the system, like
DMA operation pointers, NUMA node ID, firmware specific data, etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@suse.de>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Create include/asm-arm/arch-ixp4xx/udc.h and
add platfrom device ixp4xx_udc_device into
arch/arm/mach-ixp4xx/common.c.
This allows us to use pxa2xx-udc on
the ixp4xx platfrom. Both pxa2xx and
ixp4xx use the same device controller.
Signed-off-by:Milan Svoboda <msvoboda@ra.rockwell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch includes a number of small changes for integrating the
AT91SAM9261 and AT91SAM0260 support.
* Can only select support for one AT91 processor at a time.
* Remove most of the remaining static memory mapping for the
AT91RM9200.
* Reserve 1Mb of memory below the IO for mapping the internal SRAM
and any custom board-specific devices (ie, FPGA).
* The SAM9260 has more serial ports, so increase the maximum to 7.
* Define the standard chipselect addresses, and define other
addresses relative to those.
* CLOCK_TICK_RATE is different on the SAM926x's.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add support for the timer on the Atmel AT91SAM9261 and AT91SAM9260
processors.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Most architectures have fairly simple discontiguous memory - a
simple set of successive regions each containing some memory.
These can be described simply as a log2 of their maximum size,
along with the base address of the first region and the number
of regions.
The base address is already described by PHYS_PFN_OFFSET, and
the number of regions via the MAX_NUMNODES and the number of
online nodes.
If we then supply the log2 of their maximum size, all the other
discontigmem macros can move into generic code.
There is one exception: lh7a40x seems to have a more complicated
setup; this is left alone.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds definitions for the new peripherals integrated in the
AT91SAM9260 and AT91SAM9261 processors: ECC, LCD, RSTC, RTT, SHDWC,
WDT, MATRIX, SDRAMC, SMC.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch updates the drivers (and other files) which include the
hardware headers. This fixes the breakage introduced in patches 3950/1
and 3951/1 (those patches were getting big).
The AVR32 architecture uses the same serial driver and had its own copy
of at91rm9200_pdc.h. Renamed it to at91_pdc.h
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Most of the AT91RM9200 user peripherals are also integrated into the
Atmel SAM9 range of processors. This patch renames the headers from
at91rm9200_xx.h to at91_xx.h to indicate they're not
at91rm9200-specific.
The new SAM9-specific registers and register bits have also been
defined.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The AT91RM9200 system header file (at91rm9200_sys.h) has been split into
separate header files for each peripheral. This was necessary since
some of the system peripherals are also used on AT91SAM9260 and
AT91SAM9261.
The new SAM9-specific register bits have also been defined.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds the initial support for the newer Atmel AT91SAM9261 and
AT91SAM9260 processors. The code is based on, and makes use of, the
existing AT91RM9200 support.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch replaces the arch_identify() in system.h with a set of
cpu_is_XXX() macro's. This allows for compile-time checking of the
target AT91 processor.
Original patch from David Brownell.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The external interrupt sources are different on the various AT91
processors. This patch introduces the global 'at91_extern_irq' variable
that contains a bitset of the available external interrupt sources.
The processor reset mechanism also differs on the various AT91
processors. This patch also adds a global 'at91_arch_reset' callback
(from system.h) into the processor-specific code to perform the reset.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove ARM local cache of 4 struct thread_info.
Can cause oops under certain circumstances.
Russell indicated the original optimization was
required on older kernels to avoid thread starvation
on memory fragmentation, but may no longer be
required. I've updated the patch to 19rc4 and
ensured no <config.h> dain-bramage slipped in this
time (sorry about that).
Original description follows:
I was given some test results which pointed to an
Oops in alloc_thread_info (happened 2x), and after
looking at the code, I see that ARM has its own
local cache of 4 struct thread_info. There wasn't
any clear (to me) synchronization between the
alloc_thread_info and the free_thread_info.
I looked over the other arch, and they all simply
allocate them on an as needed basis, so I simplified
the ARM to do the same, based on the other arch
(e.g. PPC) and the folks doing the testing have
indicated that this fixed the oops.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The "translated" io macros were never really used. Remove them.
Preserve the L7200 inb() and friends by defining the __io()
macro, so that the generic versions can be used instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Since the last definitions of this macros have been removed, we
can remove the warnings in asm/io.h.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
These functions have been deprecated for quite some time, and in
fact are no longer used. They just add to clutter. Remove them.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix warnings and errors in arch/arm/mm for nommu build.
Remove commented out function prototype in pgtable-nommu.h
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* We don't need this header anymore - there is no data we need to share this way. FB driver gets this data through a resources structure. MCU Driver api will go to a jornada720_mcu.h file.
Signed-off-by: Filip Zyzniewski <filip.zyzniewski@tefnet.pl>
Signed-off-by: Kristoffer Ericson <Kristoffer_e1@hotmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch updates a bit definition name to align with the PXA27x
spec.EORINTR(End-Of-Receive Intr) bit in DCSR register (DMA Channel
Control/Status Register)
Signed-off-by: Stanley Cai <stanley.w.cai@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch updates several bit definitions name in UDCISR1 register.
Signed-off-by: Stanley Cai <stanley.w.cai@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move HWCAP_* definitions into asm/elf.h, where they should belong.
Since userspace wants to get at these definitions by including
asm/procinfo.h, include asm/elf.h from this file if __KERNEL__
is not defined, and issue a warning suggesting to fix the program
up to use asm/elf.h instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
These files want to provide/access ELF hwcap information, so should
be including asm/elf.h rather than asm/procinfo.h
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>