Commit Graph

356 Commits

Author SHA1 Message Date
Cliff Wickman
1df57c0c21 [IA64] enable dumps to capture second page of kernel stack
In SLES10 (2.6.16) crash dumping (in my experience, LKCD) is unable to
capture the second page of the 2-page task/stack allocation.
This is particularly troublesome for dump analysis, as the stack traceback
cannot be done.
  (A similar convention is probably needed throughout the kernel to make
   kernel multi-page allocations detectable for dumping)

Multi-page kernel allocations are represented by the single page structure
associated with the first page of the allocation.  The page structures
associated with the other pages are unintialized.

If the dumper is selecting only kernel pages it has no way to identify
any but the first page of the allocation.

The fix is to make the task/stack allocation a compound page.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-27 14:31:16 -07:00
Jack Steiner
f0fe253c47 [IA64-SGI] - Fix discover of nearest cpu node to IO node
Fix a bug that causes discovery of the nearest node/cpu to
a TIO (IO node) to fail.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-27 14:28:37 -07:00
Christoph Lameter
e5ecc192df [IA64] Setup an IA64 specific reclaim distance
RECLAIM_DISTANCE is checked on bootup against the SLIT table distances.
Zone reclaim is important for system that have higher latencies but not for
systems that have multiple nodes on one motherboard and therefore low latencies.

We found that on motherboard latencies are typically 1 to 1.4 of local memory
access speed whereas multinode systems which benefit from zone reclaim have
usually more than 1.5 times the latency of a local access.

Set the reclaim distance for IA64 to 1.5 times.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-21 10:57:40 -07:00
Satoru Takeuchi
a72391e42f [IA64] eliminate compile time warnings
This patch removes following compile time warnings:

drivers/pci/pci-sysfs.c: In function `pci_read_legacy_io':
drivers/pci/pci-sysfs.c:257: warning: implicit declaration of function `ia64_pci_legacy_read'
drivers/pci/pci-sysfs.c: In function `pci_write_legacy_io':
drivers/pci/pci-sysfs.c:280: warning: implicit declaration of function `ia64_pci_legacy_write'

It also fixes wrong definition of ia64_pci_legacy_write (type of `bus' is not
`pci_dev', but `pci_bus').

Signed-Off-By: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-20 17:06:54 -07:00
Russ Anderson
86db2f4239 [IA64-SGI] SN SAL call to inject memory errors
The SGI Altix SAL provides an interface for modifying
the ECC on memory to create memory errors.  The SAL call
can be used to inject memory errors for testing MCA recovery
code.

Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-20 17:05:43 -07:00
Jack Steiner
0d9adec525 [IA64] - Fix MAX_PXM_DOMAINS for systems with > 256 nodes
Correctly size the PXM-related arrays for systems that have more than
256 nodes.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-20 10:16:11 -07:00
Russ Anderson
308a878210 [IA64] Remove unused variable in sn_sal.h
cnodeid was being set but not used.  The dead code was
left over from a previous version that grabbed a per node lock.

Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-20 10:14:56 -07:00
Jens Axboe
70524490ee [PATCH] splice: add support for sys_tee()
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.

Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 15:51:17 +02:00
Linus Torvalds
b3967dc566 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Prefetch mmap_sem in ia64_do_page_fault()
  [IA64] Failure to resume after INIT in user space
  [IA64] Pass more data to the MCA/INIT notify_die hooks
  [IA64] always map VGA framebuffer UC, even if it supports WB
  [IA64] fix bug in ia64 __mutex_fastpath_trylock
  [IA64] for_each_possible_cpu: ia64
  [IA64] update HP CSR space discovery via ACPI
  [IA64] Wire up new syscalls {set,get}_robust_list
  [IA64] 'msg' may be used uninitialized in xpc_initiate_allocate()
  [IA64] Wire up new syscall sync_file_range()
2006-04-11 06:40:17 -07:00
Yasunori Goto
c80d79d746 [PATCH] Configurable NODES_SHIFT
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch.  Its definition is sometimes configurable.  Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree.  But it looks a bit messy.

SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has
been changeable by config.  Suitable node's number may be changed in the
future even if it is other architecture.  So, I wrote configurable node's
number.

This patch set defines just default value for each arch which needs multi
nodes except ia64.  But, it is easy to change to configurable if necessary.

On ia64 the number of nodes can be already configured in generic ia64 and SN2
config.  But, NODES_SHIFT is defined for DIG64 and HP'S machine too.  So, I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT.  It
would be simpler.

See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:39 -07:00
Keith Owens
958b166c00 [IA64] Pass more data to the MCA/INIT notify_die hooks
The MCA/INIT handlers maintain important state in the SAL to OS (sos)
area and in the monarch_cpu flag.  Kernel debuggers (such as KDB) need
this data, and may need to adjust the monarch_cpu field so make the
data available to the notify_die hooks.  Define two more events for
calling the functions on the notify_die chain.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-07 22:51:51 -07:00
Bjorn Helgaas
2db8d99ffd [IA64] always map VGA framebuffer UC, even if it supports WB
EFI on some machines, e.g., Intel Tiger, reports that the VGA framebuffer
supports WB access.  ioremap() prefers WB when possible, so it can work
when mapping main memory.

But it doesn't make sense to map a framebuffer WB, because the driver
doesn't flush explicitly, so updates won't make it to the device
immediately.

This is due to Zou Nan hai <nanhai.zou@intel.com>.

More extensive fix that adds a "size" argument coming soon.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-07 22:47:12 -07:00
Chen, Kenneth W
cfab9d0e1d [IA64] fix bug in ia64 __mutex_fastpath_trylock
The parenthesis around "likely" used in ia64 __mutex_fastpath_trylock
is incorrect, and it leads to broken mutex_trylock.  Here is the
patch that fixed the bug.  I removed the likely altogether because
there is no branch and gcc does a reasonable job at predicating the
return value.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-07 22:39:49 -07:00
Bjorn Helgaas
03fbaca36a [IA64] update HP CSR space discovery via ACPI
Get rid of the manual search of _CRS, in favor of
acpi_get_vendor_resource() which is now provided by the ACPI CA.  And fall
back to searching for a consumer-only address space descriptor if no
vendor-defined resource is found.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-06 14:42:38 -07:00
Tony Luck
b8cd2af862 [IA64] Wire up new syscalls {set,get}_robust_list
Join the dots to enable Ingo's robut futex syscalls.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-06 14:20:16 -07:00
Tony Luck
d905b00b3b [IA64] Wire up new syscall sync_file_range()
Also reserve syscall numbers for {set,get}_robust_list

Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-04-04 14:08:11 -07:00
Tony Luck
2ab9391dea [IA64] Avoid "u64 foo : 32;" for gcc3 vs. gcc4 compatibility
gcc3 thinks that a 32-bit field of a u64 type is itself a u64, so
should be printed with "%ld".  gcc4 thinks it needs just "%d".
Make both versions happy by avoiding this construct.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-31 10:28:29 -08:00
Zhang, Yanmin
f19180056e [IA64] Export cpu cache info by sysfs
The patch exports 8 attributes of cpu cache info under
/sys/devices/system/cpu/cpuX/cache/indexX:
1) level
2) type
3) coherency_line_size
4) ways_of_associativity
5) size
6) shared_cpu_map
7) attributes
8) number_of_sets: number_of_sets=size/ways_of_associativity/coherency_line_size.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-30 17:14:12 -08:00
Linus Torvalds
d1127e40e8 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] ioremap() should prefer WB over UC
  [IA64] Add __mca_table to the DISCARD list in gate.lds
  [IA64] Move __mca_table out of the __init section
  [IA64] simplify some condition checks in iosapic_check_gsi_range
  [IA64] correct some messages and fixes some minor things
  [IA64-SGI] fix for-loop in sn_hwperf_geoid_to_cnode()
  [IA64-SGI] sn_hwperf use of num_online_cpus()
  [IA64] optimize flush_tlb_range on large numa box
  [IA64] lazy_mmu_prot_update needs to be aware of huge pages
2006-03-30 12:38:18 -08:00
Jens Axboe
5274f052e7 [PATCH] Introduce sys_splice() system call
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).

From the splice.c comments:

   "splice": joining two ropes together by interweaving their strands.

   This is the "extended pipe" functionality, where a pipe is used as
   an arbitrary in-memory buffer. Think of a pipe as a small kernel
   buffer that you can use to transfer data from one end to the other.

   The traditional unix read/write is extended with a "splice()" operation
   that transfers data buffers to or from a pipe buffer.

   Named by Larry McVoy, original implementation from Linus, extended by
   Jens to support splicing to files and fixing the initial implementation
   bugs.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-30 12:28:18 -08:00
Jes Sorensen
3283a67d86 [IA64] Add __mca_table to the DISCARD list in gate.lds
Add __mca_table to the DISCARD list for the gate.lds linker script to
avoid broken linker references when linking the final vmlinux file.

Also add comment to include/asm-ia64/asmmacros.h to avoid anyone else
hitting this problem in the future.

Credits to James Bottomley <James.Bottomley@SteelEye.com> for spotting
the DISCARD list in gate.lds.S

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-30 09:04:19 -08:00
Alan Stern
e041c68341 [PATCH] Notifier chain update: API changes
The kernel's implementation of notifier chains is unsafe.  There is no
protection against entries being added to or removed from a chain while the
chain is in use.  The issues were discussed in this thread:

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

	"Blocking" chains are always called from a process context
	and the callout routines are allowed to sleep;

	"Atomic" chains can be called from an atomic context and
	the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API.  Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name).  New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain.  The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed.  For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections.  (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with.  For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem.  Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain.  (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization.  Instead we use RCU.  The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications.  None
of them use the raw API, so for the moment it is only a placeholder.

  ATOMIC CHAINS
  -------------
arch/i386/kernel/traps.c:		i386die_chain
arch/ia64/kernel/traps.c:		ia64die_chain
arch/powerpc/kernel/traps.c:		powerpc_die_chain
arch/sparc64/kernel/traps.c:		sparc64die_chain
arch/x86_64/kernel/traps.c:		die_chain
drivers/char/ipmi/ipmi_si_intf.c:	xaction_notifier_list
kernel/panic.c:				panic_notifier_list
kernel/profile.c:			task_free_notifier
net/bluetooth/hci_core.c:		hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_expect_chain
net/ipv6/addrconf.c:			inet6addr_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_expect_chain
net/netlink/af_netlink.c:		netlink_chain

  BLOCKING CHAINS
  ---------------
arch/powerpc/platforms/pseries/reconfig.c:	pSeries_reconfig_chain
arch/s390/kernel/process.c:		idle_chain
arch/x86_64/kernel/process.c		idle_notifier
drivers/base/memory.c:			memory_chain
drivers/cpufreq/cpufreq.c		cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c		cpufreq_transition_notifier_list
drivers/macintosh/adb.c:		adb_client_list
drivers/macintosh/via-pmu.c		sleep_notifier_list
drivers/macintosh/via-pmu68k.c		sleep_notifier_list
drivers/macintosh/windfarm_core.c	wf_client_list
drivers/usb/core/notify.c		usb_notifier_list
drivers/video/fbmem.c			fb_notifier_list
kernel/cpu.c				cpu_chain
kernel/module.c				module_notify_list
kernel/profile.c			munmap_notifier
kernel/profile.c			task_exit_notifier
kernel/sys.c				reboot_notifier_list
net/core/dev.c				netdev_chain
net/decnet/dn_dev.c:			dnaddr_chain
net/ipv4/devinet.c:			inetaddr_chain

It's possible that some of these classifications are wrong.  If they are,
please let us know or submit a patch to fix them.  Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Ingo Molnar
66e863acd7 [PATCH] ia64: add ptr_to_compat()
Add ptr_to_compat() to ia64 - needed by the robust-futex code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
KAMEZAWA Hiroyuki
0ecd702bcb [PATCH] unify pfn_to_page: ia64 pfn_to_page
ia64 has special config CONFIG_VIRTUAL_MEM_MAP.
CONFIG_DISCONTIGMEM=y && CONFIG_VIRTUAL_MEM_MAP!=y is bug ?

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:47 -08:00
Akinobu Mita
2875aef8bd [PATCH] bitops: ia64: use generic bitops
- remove generic_fls64()
- remove find_{next,first}{,_zero}_bit()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
- remove sched_find_first_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
67b0ad574b [PATCH] bitops: use non atomic operations for minix_*_bit() and ext2_*_bit()
Bitmap functions for the minix filesystem and the ext2 filesystem except
ext2_set_bit_atomic() and ext2_clear_bit_atomic() do not require the atomic
guarantees.

But these are defined by using atomic bit operations on several architectures.
 (cris, frv, h8300, ia64, m32r, m68k, m68knommu, mips, s390, sh, sh64, sparc,
sparc64, v850, and xtensa)

This patch switches to non atomic bit operation.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Bjorn Helgaas
b2c99e3c70 [PATCH] EFI: keep physical table addresses in efi structure
Almost all users of the table addresses from the EFI system table want
physical addresses.  So rather than doing the pa->va->pa conversion, just keep
physical addresses in struct efi.

This fixes a DMI bug: the efi structure contained the physical SMBIOS address
on x86 but the virtual address on ia64, so dmi_scan_machine() used ioremap()
on a virtual address on ia64.

This is essentially the same as an earlier patch by Matt Tolentino:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112130292316281&w=2
except that this changes all table addresses, not just ACPI addresses.

Matt's original patch was backed out because it caused MCAs on HP sx1000
systems.  That problem is resolved by the ioremap() attribute checking added
for ia64.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Bjorn Helgaas
e9b0a07121 [PATCH] ia64: ioremap: check EFI for valid memory attributes
Check the EFI memory map so we can use the correct memory attributes for
ioremap().  Previously, we always used uncacheable access, which blows up on
some machines for regular system memory.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Bjorn Helgaas
136939a2b5 [PATCH] EFI, /dev/mem: simplify efi_mem_attribute_range()
Pass the size, not a pointer to the size, to efi_mem_attribute_range().

This function validates memory regions for the /dev/mem read/write/mmap paths.
The pointer allows arches to reduce the size of the range, but I think that's
unnecessary complexity.  Simplifying it will let me use
efi_mem_attribute_range() to improve the ia64 ioremap() implementation.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Matt Domsch
3ed3bce846 [PATCH] ia64: use i386 dmi_scan.c
Enable DMI table parsing on ia64.

Andi Kleen has a patch in his x86_64 tree which enables the use of i386
dmi_scan.c on x86_64.  dmi_scan.c functions are being used by the
drivers/char/ipmi/ipmi_si_intf.c driver for autodetecting the ports or
memory spaces where the IPMI controllers may be found.

This patch adds equivalent changes for ia64 as to what is in the x86_64
tree.  In addition, I reworked the DMI detection, such that on EFI-capable
systems, it uses the efi.smbios pointer to find the table, rather than
brute-force searching from 0xF0000.  On non-EFI systems, it continues the
brute-force search.

My test system, an Intel S870BN4 'Tiger4', aka Dell PowerEdge 7250, with
latest BIOS, does not list the IPMI controller in the ACPI namespace, nor
does it have an ACPI SPMI table.  Also note, currently shipping Dell x8xx
EM64T servers don't have these either, so DMI is the only method for
obtaining the address of the IPMI controller.

Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Linus Torvalds
7d14f145f8 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] New IA64 core/thread detection patch
  [IA64] Increase max node count on SN platforms
  [IA64] Increase max node count on SN platforms
  [IA64] Increase max node count on SN platforms
  [IA64] Increase max node count on SN platforms
  [IA64] Tollhouse HP: IA64 arch changes
  [IA64] cleanup dig_irq_init
  [IA64] MCA recovery: kernel context recovery table
  IA64: Use early_parm to handle mvec_name and nomca
  [IA64] move patchlist and machvec into init section
  [IA64] add init declaration - nolwsys
  [IA64] add init declaration - gate page functions
  [IA64] add init declaration to memory initialization functions
  [IA64] add init declaration to cpu initialization functions
  [IA64] add __init declaration to mca functions
  [IA64] Ignore disabled Local SAPIC Affinity Structure in SRAT
  [IA64] sn_check_intr: use ia64_get_irr()
  [IA64] fix ia64 is_hugepage_only_range
2006-03-25 08:49:25 -08:00
Davide Libenzi
f348d70a32 [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications
Implement the half-closed devices notifiation, by adding a new POLLRDHUP
(and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Since the
existing POLLHUP handling, that does not report correctly half-closed
devices, was feared to be changed, this implementation leaves the current
POLLHUP reporting unchanged and simply add a new bit that is set in the few
places where it makes sense.  The same thing was discussed and conceptually
agreed quite some time ago:

http://lkml.org/lkml/2003/7/12/116

Since this new event bit is added to the existing Linux poll infrastruture,
even the existing poll/select system calls will be able to use it.  As far
as the existing POLLHUP handling, the patch leaves it as is.  The
pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
archs and sets the bit in the six relevant files.  The other attached diff
is the simple change required to sys/epoll.h to add the EPOLLRDHUP
definition.

There is "a stupid program" to test POLLRDHUP delivery here:

 http://www.xmailserver.org/pollrdhup-test.c

It tests poll(2), but since the delivery is same epoll(2) will work equally.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
Fenghua Yu
4129a953ad [IA64] New IA64 core/thread detection patch
IPF SDM 2.2 changes definition of PAL_LOGICAL_TO_PHYSICAL to add
proc_number=-1 to get core/thread mapping info on the running processer.

Based on this change, we had better to update existing core/thread
detection in IA64 kernel correspondingly. The attached patch implements
this change. It simplifies detection code and eliminates potential race
condition. It also runs a bit faster and has better scalability especially
when cores and threads number grows up in one package.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-24 13:15:23 -08:00
Jack Steiner
a9de983514 [IA64] Increase max node count on SN platforms
Node number are kept in the cpu_to_node_map which is
currently defined as u8. Change to u16 to accomodate
larger node numbers.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-24 13:14:41 -08:00
Jack Steiner
3ad5ef8b9d [IA64] Increase max node count on SN platforms
Add support in IA64 acpi for platforms that support more than
256 nodes. Currently, ACPI is limited to 256 nodes because the
proximity domain number is 8-bits.

Long term, we expect to use ACPI3.0 to support >256 nodes.
This patch is an interim solution that works with platforms
that pass the  high order bits of the proximity domain in
"reserved" fields of the ACPI tables. This code is enabled
ONLY on SN platforms.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-24 13:14:21 -08:00
Jack Steiner
b354a83888 [IA64] Increase max node count on SN platforms
Add a configuration option to allow the maximum
number of nodes to be configurable for GENERIC or SN
kernels.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-24 13:14:03 -08:00
Prarit Bhargava
f90aa8c4fe [IA64] Tollhouse HP: IA64 arch changes
arch/ia64/sn and include/asm-ia64/sn changes required to support Tollhouse
system PCI hotplug, fixes the ia64_sn_sysctl_ioboard_get call, and introduces
the PRF_HOTPLUG_SUPPORT feature bit.

Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-24 13:13:06 -08:00
Chen, Kenneth W
b17ea91a43 [IA64] cleanup dig_irq_init
dig_irq_init is equivalent to machvec_noop, no need to define
another empty function.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-24 13:12:46 -08:00
Russ Anderson
d2a28ad9fa [IA64] MCA recovery: kernel context recovery table
Memory errors encountered by user applications may surface
when the CPU is running in kernel context.  The current code
will not attempt recovery if the MCA surfaces in kernel
context (privilage mode 0).  This patch adds a check for cases
where the user initiated the load that surfaces in kernel
interrupt code.

An example is a user process lauching a load from memory
and the data in memory had bad ECC.  Before the bad data
gets to the CPU register, and interrupt comes in.  The
code jumps to the IVT interrupt entry point and begins
execution in kernel context.  The process of saving the
user registers (SAVE_REST) causes the bad data to be loaded
into a CPU register, triggering the MCA.  The MCA surfaces in
kernel context, even though the load was initiated from
user context.

As suggested by David and Tony, this patch uses an exception
table like approach, puting the tagged recovery addresses in
a searchable table.  One difference from the exception table
is that MCAs do not surface in precise places (such as with
a TLB miss), so instead of tagging specific instructions,
address ranges are registers.  A single macro is used to do
the tagging, with the input parameter being the label
of the starting address and the macro being the ending
address.  This limits clutter in the code.

This patch only tags one spot, the interrupt ivt entry.
Testing showed that spot to be a "heavy hitter" with
MCAs surfacing while saving user registers.  Other spots
can be added as needed by adding a single macro.

Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-24 09:49:52 -08:00
Jan Beulich
ab7efcc97e [PATCH] abstract type/size specification for assembly
Provide abstraction for generating type and size information of assembly
routines and data, while permitting architectures to override these
defaults.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: "Russell King" <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "Andi Kleen" <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:25 -08:00
Alexey Dobriyan
53b3531bbb [PATCH] s/;;/;/g
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:24 -08:00
Adrian Bunk
cdb0452789 [PATCH] kill include/linux/platform.h, default_idle() cleanup
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.

This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
  - parisc
  - sparc64
- make the needlessly global function static:
  - arm
  - h8300
  - m68k
  - m68knommu
  - s390
  - v850
  - x86_64
- add a prototype in asm/system.h:
  - cris
  - i386
  - ia64

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:21 -08:00
Nick Piggin
0b2fcfdb8b [PATCH] atomic: add_unless cmpxchg optimise
Without branch hints, the very unlikely chance of the loop repeating due to
cmpxchg failure is unrolled with gcc-4 that I have tested.

Improve this for architectures with a native cas/cmpxchg.  llsc archs
should try to implement this natively.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:17 -08:00
Kyle McMartin
804f1594cc [PATCH] Move read_mostly definition to asm/cache.h
Seems like needless clutter having a bunch of #if defined(CONFIG_$ARCH) in
include/linux/cache.h.  Move the per architecture section definition to
asm/cache.h, and keep the if-not-defined dummy case in linux/cache.h to
catch architectures which don't implement the section.

Verified that symbols still go in .data.read_mostly on parisc,
and the compile doesn't break.

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:10 -08:00
Chen, Kenneth W
244fd54540 [IA64] add init declaration to cpu initialization functions
Add init declaration to cpu initialization functions.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-22 16:04:37 -08:00
Chen, Kenneth W
2332c9ae79 [IA64] fix ia64 is_hugepage_only_range
fix is_hugepage_only_range() definition to be "overlaps"
instead of "within architectural restricted hugetlb address
range".  Simplify the ia64 specific code that used to use
is_hugepage_only_range() to just check which region the
address is in.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-22 14:35:08 -08:00
David Gibson
42b88befd6 [PATCH] hugepage: is_aligned_hugepage_range() cleanup
Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.

Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range().  On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.

In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).

This patch cleans up by abolishing is_aligned_hugepage_range().  Instead
prepare_hugepage_range() is defined directly.  Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage.  ia64 and powerpc define custom versions.  The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned.  The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.

No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:04 -08:00
David Gibson
3915bcf38f [PATCH] hugepage: Move hugetlb_free_pgd_range() prototype to hugetlb.h
The optional hugepage callback, hugetlb_free_pgd_range() is presently
implemented non-trivially only on ia64 (but I plan to add one for powerpc
shortly).  It has its own prototype for the function in asm-ia64/pgtable.h.
 However, since the function is called from generic code, it make sense for
its prototype to be in the generic hugetlb.h header file, as the protypes
other arch callbacks already are (prepare_hugepage_range(),
set_huge_pte_at(), etc.).  This patch makes it so.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:04 -08:00
David Gibson
9da61aef0f [PATCH] hugepage: Fix hugepage logic in free_pgtables()
free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
of the normal free_pgd_range() on hugepage VMAs.  However, the test it uses
to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
range at the start of the vma.  is_hugepage_only_range() will return true
if the given range has any intersection with a hugepage address region, and
in this case the given region need not be hugepage aligned.  So, for
example, this test can return true if called on, say, a 4k VMA immediately
preceding a (nicely aligned) hugepage VMA.

At present we get away with this because the powerpc version of
hugetlb_free_pgd_range() is just a call to free_pgd_range().  On ia64 (the
only other arch with a non-trivial is_hugepage_only_range()) we get away
with it for a different reason; the hugepage area is not contiguous with
the rest of the user address space, and VMAs are not permitted in between,
so the test can't return a false positive there.

Nonetheless this should be fixed.  We do that in the patch below by
replacing the is_hugepage_only_range() test with an explicit test of the
VMA using is_vm_hugetlb_page().

This in turn changes behaviour for platforms where is_hugepage_only_range()
returns false always (everything except powerpc and ia64).  We address this
by ensuring that hugetlb_free_pgd_range() is defined to be identical to
free_pgd_range() (instead of a no-op) on everything except ia64.  Even so,
it will prevent some otherwise possible coalescing of calls down to
free_pgd_range().  Since this only happens for hugepage VMAs, removing this
small optimization seems unlikely to cause any trouble.

This patch causes no regressions on the libhugetlbfs testsuite - ppc64
POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
Zhang, Yanmin
8f860591ff [PATCH] Enable mprotect on huge pages
2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
mprotect.

From: David Gibson <david@gibson.dropbear.id.au>

  Remove a test from the mprotect() path which checks that the mprotect()ed
  range on a hugepage VMA is hugepage aligned (yes, really, the sense of
  is_aligned_hugepage_range() is the opposite of what you'd guess :-/).

  In fact, we don't need this test.  If the given addresses match the
  beginning/end of a hugepage VMA they must already be suitably aligned.  If
  they don't, then mprotect_fixup() will attempt to split the VMA.  The very
  first test in split_vma() will check for a badly aligned address on a
  hugepage VMA and return -EINVAL if necessary.

From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>

  On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE.  The
  identify of hugetlb pte is lost when changing page protection via mprotect.
  A page fault occurs later will trigger a bug check in huge_pte_alloc().

  The fix is to always make new pte a hugetlb pte and also to clean up
  legacy code where _PAGE_PRESENT is forced on in the pre-faulting day.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00