pointed out by pageexec@freemail.hu:
> what happens here is that gcc treats the argument area as owned by the
> callee, not the caller and is allowed to do certain tricks. for ssp it
> will make a copy of the struct passed by value into the local variable
> area and pass *its* address down, and it won't copy it back into the
> original instance stored in the argument area.
>
> so once sys_execve returns, the pt_regs passed by value hasn't at all
> changed and its default content will cause a nice double fault (FWIW,
> this part took me the longest to debug, being down with cold didn't
> help it either ;).
To fix this we pass in pt_regs by pointer.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
based on a report from Arne Georg Gleditsch about user-space apps
misbehaving after toggling /proc/sys/kernel/vsyscall64, a review
of the code revealed that the "NOP patching" done there is
fundamentally unsafe for a number of reasons:
1) the patching code runs without synchronizing other CPUs
2) it inserts NOPs even if there is no clock source which provides vread
3) when the clock source changes to one without vread we run in
exactly the same problem as in #2
4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
the syscall is not patched out
as a result it is possible to break user-space via this patching.
The only safe thing for now is to remove the patching.
This code was broken since v2.6.21.
Reported-by: Arne Georg Gleditsch <arne.gleditsch@dolphinics.no>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The KERNEL_TEXT_SIZE constant was mis-named, as we not only map the kernel
text but data, bss and init sections as well.
That name led me on the wrong path with the KERNEL_TEXT_SIZE regression,
because i knew how big of _text_ my images have and i knew about the 40 MB
"text" limit so i wrongly thought to be on the safe side of the 40 MB limit
with my 29 MB of text, while the total image size was slightly above 40 MB.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
recently the 64-bit allyesconfig bzImage kernel started spontaneously
rebooting during early bootup.
after a few fun hours spent with early init debugging, it turns out
that we've got this rather annoying limit on the size of the kernel
image:
#define KERNEL_TEXT_SIZE (40*1024*1024)
which limit my vmlinux just happened to pass:
text data bss dec hex filename
29703744 4222751 8646224 42572719 2899baf vmlinux
40 MB is 42572719 bytes, so my vmlinux was just 1.5% above this limit :-/
So it happily crashed right in head_64.S, which - as we all know - is
the most debuggable code in the whole architecture ;-)
So increase the limit to allow an up to 128MB kernel image to be mapped.
(should anyone be that crazy or lazy)
We have a full 4K of pagetable (level2_kernel_pgt) allocated for these
mappings already, so there's no RAM overhead and the limit was rather
pointless and arbitrary.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
notsc is ignored in 32-bit kernels if CONFIG_X86_TSC is on.. which is
bad, fix it.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix mtrr kernel-doc warning:
Warning(linux-2.6.24-git12//arch/x86/kernel/cpu/mtrr/main.c:677): No description found for parameter 'end_pfn'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The proper way to terminate the e820 chain is with %ebx == 0 on the
last legitimate memory block. However, several BIOSes don't do that
and instead return error (CF = 1) when trying to read off the end of
the list. For this error return, %eax doesn't necessarily return the
SMAP signature -- correctly so, since %ah should contain an error code
in this case.
To deal with some particularly broken BIOSes, we clear the entire e820
chain if the SMAP signature is missing in the middle, indicating a
plain insane e820 implementation. However, we need to make the test
for CF = 1 before the SMAP check.
This fixes at least one HP laptop (nc6400) for which none of the
memory-probing methods (e820, e801, 88) functioned fully according to
spec.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add comments describing the various NOP sequences.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
P6_NOPs are definitely not supported on some VIA CPUs, and possibly
(unverified) on AMD K7s. It is also the only thing that prevents a
686 kernel from running on Transmeta TM3x00/5x00 (Crusoe) series.
The performance benefit over generic NOPs is very small, so when
building for generic consumption, avoid using them.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The P6 family of NOPs are only available on family >= 6 or above, so
enforce that in the boot code.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We have been promoting Transmeta TM3x00/TM5x00 chips to i686-class
based on the notion that they contain all the user-space visible
features of an i686-class chip. However, this is not actually true:
they lack the EA-taking long NOPs (0F 1F /0). Since this is a
userspace-visible incompatibility, downgrade these CPUs to the
manufacturer-defined i586 level.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use PF_MEMALLOC to prevent recursive calls in the DBEUG_PAGEALLOC
case. This makes the code simpler and more robust against allocation
failures.
This fixes the following fallback to non-mmconfig:
http://lkml.org/lkml/2008/2/20/551http://bugzilla.kernel.org/show_bug.cgi?id=10083
Also, for DEBUG_PAGEALLOC=n reduce the pool size to one page.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Hi all,
Beginning from commits close to v2.6.25-rc2, running lguest always oopses
the host kernel. Oops is at [1].
Bisection led to the following commit:
commit 37cc8d7f96
x86/early_ioremap: don't assume we're using swapper_pg_dir
At the early stages of boot, before the kernel pagetable has been
fully initialized, a Xen kernel will still be running off the
Xen-provided pagetables rather than swapper_pg_dir[]. Therefore,
readback cr3 to determine the base of the pagetable rather than
assuming swapper_pg_dir[].
static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
{
- pgd_t *pgd = &swapper_pg_dir[pgd_index(addr)];
+ /* Don't assume we're using swapper_pg_dir at this point */
+ pgd_t *base = __va(read_cr3());
+ pgd_t *pgd = &base[pgd_index(addr)];
pud_t *pud = pud_offset(pgd, addr);
pmd_t *pmd = pmd_offset(pud, addr);
Trying to analyze the problem, it seems on the guest side of lguest,
%cr3 has a different value from &swapper_pg-dir (which
is AFAIK fine on a pravirt guest):
Putting some debugging messages in early_ioremap_pmd:
/* Appears 3 times */
[ 0.000000] ***************************
[ 0.000000] __va(%cr3) = c0000000, &swapper_pg_dir = c02cc000
[ 0.000000] ***************************
After 8 hours of debugging and staring on lguest code, I noticed something
strange in paravirt_ops->set_pmd hypercall invocation:
static void lguest_set_pmd(pmd_t *pmdp, pmd_t pmdval)
{
*pmdp = pmdval;
lazy_hcall(LHCALL_SET_PMD, __pa(pmdp)&PAGE_MASK,
(__pa(pmdp)&(PAGE_SIZE-1))/4, 0);
}
The first hcall parameter is global pgdir which looks fine. The second
parameter is the pmd index in the pgdir which is suspectful.
AFAIK, calculating the index of pmd does not need a divisoin over four.
Removing the division made lguest work fine again . Patch is at [2].
I am not sure why the division over four existed in the first place. It
seems bogus, maybe the Xen patch just made the problem appear ?
[2]: The patch:
[PATCH] lguest: fix pgdir pmd index cacluation
Remove an error in index calculation which leads to removing
a not existing shadow page table (leading to a Null dereference).
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Added a declaration to asm-x86/lguest.h and moved the extern arrays there
as well. As an alternative to including asm/lguest.h directly, an
include could be put in linux/lguest.h
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: "rusty@rustcorp.com.au" <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add the Intel ICH10 SMBus Controller DeviceID's and updates
Tolapai support.
Signed-off-by: Jason Gaston <jason.d.gaston@intel.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Don't require platform code to be #ifdeffed according to whether
I2C is enabled or not ... if it's not enabled, let GCC compile out
all I2C device declarations. (Issue noted on an NSLU2 build that
didn't configure I2C.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
When probing i2c-pca-isa writes to legacy ioports, which crashes the kernel
if there is no device at that port.
This patch adds a check_legacy_ioport call, so probe fails gracefully
and thus prevents the oops.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Commit 8b798c4d16 broke
alchemy build, fix it. Pointed out by Adrian Bunk.
Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
While working on the PCA9564-platform driver, I sometimes had a glimpse at the
pxa-driver. I found some suspicious places, and this patch contains my
suggestions. Note: They are not tested, due to no hardware.
[JD: Some more fixes.]
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Mike Rapoport <mike@compulab.co.il>
Tested-by: Eric Miao <ymiao3@marvell.com>
Each call to i2c_get_adapter() must be followed by a call to
i2c_put_adapter() to release the grabbed reference. Otherwise the
reference count grows forever and the adapter can never be
unregistered.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Vladimir Ananiev <vovan888@gmail.com>
Acked-by: Tony Lindgren <tony@atomide.com>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata-core: fix kernel-doc warning
sata_fsl: fix build with ATA_VERBOSE_DEBUG
[libata] ahci: AMD SB700/SB800 SATA support 64bit DMA
libata-pmp: clear hob for pmp register accesses
libata: automatically use DMADIR if drive/bridge requires it
power_state: get rid of write-only variable in SATA
pata_atiixp: Use 255 sector limit
Fix libata-core kernel-doc warning:
Warning(linux-2.6.25-rc2-git6//drivers/ata/libata-core.c:168): No description found for parameter 'ap'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch fixes build and few warnings when ATA_VERBOSE_DEBUG
is defined:
CC drivers/ata/sata_fsl.o
drivers/ata/sata_fsl.c: In function ‘sata_fsl_fill_sg’:
drivers/ata/sata_fsl.c:338: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘void *’
drivers/ata/sata_fsl.c:338: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘struct prde *’
drivers/ata/sata_fsl.c: In function ‘sata_fsl_qc_issue’:
drivers/ata/sata_fsl.c:459: error: ‘csr_base’ undeclared (first use in this function)
drivers/ata/sata_fsl.c:459: error: (Each undeclared identifier is reported only once
drivers/ata/sata_fsl.c:459: error: for each function it appears in.)
drivers/ata/sata_fsl.c: In function ‘sata_fsl_freeze’:
drivers/ata/sata_fsl.c:525: error: ‘csr_base’ undeclared (first use in this function)
make[2]: *** [drivers/ata/sata_fsl.o] Error 1
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
SB700 SATA controller can support 64 bit DMA, the previous commit
badc234157 was added with
careless reference to SB600, which should be modified by this patch.
Signed-off-by: Shane Huang <shane.huang@amd.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
>> Mark Lord wrote:
>>> Tejun, I've added PMP to sata_mv, and am now trying to get it
>>> to work with a Marvell PM attached.
>>>
>>> And the behaviour I see is very bizarre.
>>>
>>> After hard+soft resets, the PM signature is found,
>>> and libata interrogates the PM registers.
>>>
>>> It successfully reads register 0, and then register 1.
>>> But all subsequent registers read out (incorrectly) as zeros.
...
This behavior has been confirmed by Marvell with a SATA analyzer.
The Marvell port-multiplier apparently likes to see clean HOB
information when accessing PMP registers.
Since sata_mv uses PIO shadow register access, this doesn't happen
automatically, as it might in a more purely FIS-based driver (eg. ahci).
One way to fix this is to flag these commands with ATA_TFLAG_LBA48,
forcing libata to write out the HOB fields with known (zero) values.
Signed-off-by: Saeed Bishara <saeed@marvell.com>
Acked-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Back in 2.6.17-rc2, a libata module parameter was added for atapi_dmadir.
That's nice, but most SATA devices which need it will tell us about it
in their IDENTIFY PACKET response, as bit-15 of word-62 of the
returned data (as per ATA7, ATA8 specifications).
So for those which specify it, we should automatically use the DMADIR bit.
Otherwise, disc writing will fail by default on many SATA-ATAPI drives.
This patch adds ATA_DFLAG_DMADIR and make ata_dev_configure() set it
if atapi_dmadir is set or identify data indicates DMADIR is necessary.
atapi_xlat() is converted to check ATA_DFLAG_DMADIR before setting
DMADIR.
Original patch is from Mark Lord.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
power_state is scheduled for removal, and libata uses it in write-only
mode. Remove it.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
[NETFILTER]: fix ebtable targets return
[IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
[NET]: Restore sanity wrt. print_mac().
[NEIGH]: Fix race between neighbor lookup and table's hash_rnd update.
[RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK
tg3: ethtool phys_id default
[BNX2]: Update version to 1.7.4.
[BNX2]: Disable parallel detect on an HP blade.
[BNX2]: More 5706S link down workaround.
ssb: Fix support for PCI devices behind a SSB->PCI bridge
zd1211rw: fix sparse warnings
rtl818x: fix sparse warnings
ssb: Fix pcicore cardbus mode
ssb: Make the GPIO API reentrancy safe
ssb: Fix the GPIO API
ssb: Fix watchdog access for devices without a chipcommon
ssb: Fix serial console on new bcm47xx devices
ath5k: Fix build warnings on some 64-bit platforms.
WDEV, ath5k, don't return int from bool function
WDEV: ath5k, fix lock imbalance
...
This fixes the following compile error caused by commit
3a2d5b7001 ("PM: Introduce
PM_EVENT_HIBERNATE callback state")
CC [M] drivers/usb/host/u132-hcd.o
drivers/usb/host/u132-hcd.c: In function ‘u132_suspend’:
drivers/usb/host/u132-hcd.c:3224: error: expected expression before ‘int’
drivers/usb/host/u132-hcd.c:3225: error: ‘ports’ undeclared (first use in this function)
...
Signed-off-by: Mirco Tischler <mt-ml@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function ebt_do_table doesn't take NF_DROP as a verdict from the targets.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.
Thanks Patrick for noticing this.
[ The way this works is, when the device is actually registered,
the generic code noticed the '%' in the name and invokes
dev_alloc_name() to fully resolve the name. -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
MAC_FMT had only one user and we tried to get rid of
that, but this created more problems than it solved.
As a result, this reverts three commits:
235365f3aa ("net/8021q/vlan_dev.c: Use
print_mac."), fea5fa875e ("[NET]: Remove
MAC_FMT"), and 8f789c4844 ("[NET]:
Elminate spurious print_mac() calls.")
Signed-off-by: David S. Miller <davem@davemloft.net>
The neigh_hash_grow() may update the tbl->hash_rnd value, which
is used in all tbl->hash callbacks to calculate the hashval.
Two lookup routines may race with this, since they call the
->hash callback without the tbl->lock held. Since the hash_rnd
is changed with this lock write-locked moving the calls to ->hash
under this lock read-locked closes this gap.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTM_NEWLINK allows for already existing links to be modified. For this
purpose do_setlink() is called which expects address attributes with a
payload length of at least dev->addr_len. This patch adds the necessary
validation for the RTM_NEWLINK case.
The address length for links to be created is not checked for now as the
actual attribute length is used when copying the address to the netdevice
structure. It might make sense to report an error if less than addr_len
bytes are provided but enforcing this might break drivers trying to be
smart with not transmitting all zero addresses.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When asked to blink LEDs the tg3 driver behaves when using:
ethtool -p ethX
The default value for data is zero, and other drivers interpret this
as blink forever (or at least a really long time). The tg3 driver
interprets this as blink once. All drivers should have the same
behaviour.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of some board issues, we need to disable parallel detect on
an HP blade. Without this patch, the link state can become stuck
when it goes into parallel detect mode.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patches to workaround the 5706S on an HP blade were not
sufficient. The link state still does not change properly in some
cases. This patch adds polling to make it completely reliable.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleg Nesterov and others have pointed out that on some architectures,
the traditional sequence of
set_current_state(TASK_INTERRUPTIBLE);
if (CONDITION)
return;
schedule();
is racy wrt another CPU doing
CONDITION = 1;
wake_up_process(p);
because while set_current_state() has a memory barrier separating
setting of the TASK_INTERRUPTIBLE state from reading of the CONDITION
variable, there is no such memory barrier on the wakeup side.
Now, wake_up_process() does actually take a spinlock before it reads and
sets the task state on the waking side, and on x86 (and many other
architectures) that spinlock is in fact equivalent to a memory barrier,
but that is not generally guaranteed. The write that sets CONDITION
could move into the critical region protected by the runqueue spinlock.
However, adding a smp_wmb() to before the spinlock should now order the
writing of CONDITION wrt the lock itself, which in turn is ordered wrt
the accesses within the spinlock (which includes the reading of the old
state).
This should thus close the race (which probably has never been seen in
practice, but since smp_wmb() is a no-op on x86, it's not like this will
make anything worse either on the most common architecture where the
spinlock already gave the required protection).
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(sorry for being offtpoic, but while experts are here...)
A "typical" implementation of atomic_add_unless() can return 0 immediately
after the first atomic_read() (before doing cmpxchg). In that case it doesn't
provide any barrier semantics. See include/asm-ia64/atomic.h as an example.
We should either change the implementation, or fix the docs.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cgroup requires the subsystem to return negative error code on error in the
create method.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>