2008-01-13 22:51:16 -05:00
|
|
|
config ARCH
|
|
|
|
string
|
|
|
|
option env="ARCH"
|
|
|
|
|
|
|
|
config KERNELVERSION
|
|
|
|
string
|
|
|
|
option env="KERNELVERSION"
|
|
|
|
|
2006-06-09 01:12:45 -04:00
|
|
|
config DEFCONFIG_LIST
|
|
|
|
string
|
[PATCH] uml: use DEFCONFIG_LIST to avoid reading host's config
This should make sure that, for UML, the host's configuration files are not
considered, which avoids various pains for the user. Our dependencies are such
that the obtained Kconfig will be valid and will lead to successful
compilation - however, they cannot prevent a user from disabling any boot
device, and if an option is not set in the .config that is read (say
/boot/config-XXX), with make menuconfig ARCH=um, it is not set. This always
disables UBD and all console I/O channels, which leads to non-working UML
kernels, so this bothers users - especially now, since it will happen on
almost every machine (/boot/config-`uname -r` exists on almost every machine).
It can be worked around with make defconfig ARCH=um, but that is non-obvious and
can be avoided, so please _do_ merge this patch.
Given the existence of options, it could be interesting to implement
(additionally) "option required" - with it, Kconfig will refuse reading a
.config file (from wherever it comes) if the given option is not set. With
this, one could mark with it the option characteristic of the given
architecture (it was an old proposal of Roman Zippel, when I pointed out our
problem):
config UML
option required
default y
However this should be further discussed:
*) for x86, it must support constructs like:
==arch/i386/Kconfig==
config 64BIT
option required
default n
where Kconfig must require that CONFIG_64BIT is disabled or not present in the
read .config.
*) do we want to do such checks only for the starting defconfig or also for
.config? Which leads to:
*) I may want to port an x86_64 .config to x86 and vice versa, or even between
more different archs. Should that be allowed, and to what extent (may the user
force skipping the check for a .config, or is only a warning given by
default)?
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: <kbuild-devel@lists.sourceforge.net>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 02:28:23 -04:00
|
|
|
depends on !UML
|
2006-06-09 01:12:45 -04:00
|
|
|
option defconfig_list
|
|
|
|
default "/lib/modules/$UNAME_RELEASE/.config"
|
|
|
|
default "/etc/kernel-config"
|
|
|
|
default "/boot/config-$UNAME_RELEASE"
|
2008-05-25 17:03:18 -04:00
|
|
|
default "$ARCH_DEFCONFIG"
|
2006-06-09 01:12:45 -04:00
|
|
|
default "arch/$ARCH/defconfig"
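The lookup that DEFCONFIG_LIST describes can be sketched in Python. This is only an illustration, assuming that Kconfig tries the listed defaults in order and uses the first file it can open; the function name `pick_defconfig` and the `exists` parameter are hypothetical and not part of Kconfig. For UML the whole list is skipped because of the `depends on !UML` line.

```python
import os
from string import Template

# Candidate list copied from the "default" lines of DEFCONFIG_LIST above.
DEFCONFIG_CANDIDATES = [
    "/lib/modules/$UNAME_RELEASE/.config",
    "/etc/kernel-config",
    "/boot/config-$UNAME_RELEASE",
    "$ARCH_DEFCONFIG",
    "arch/$ARCH/defconfig",
]


def pick_defconfig(candidates, env, exists=os.path.isfile):
    """Return the first candidate path for which exists() is true, else None.

    `exists` is a hypothetical hook so the sketch can be tried without
    touching the real filesystem.
    """
    for pattern in candidates:
        # Expand $NAME references; unknown names are left untouched.
        path = Template(pattern).safe_substitute(env)
        if "$" in path:
            continue  # a variable was unset, so there is nothing to open
        if exists(path):
            return path
    return None
```

With `UNAME_RELEASE` and `ARCH` set and only the host's /boot/config file present, the host's file wins over the architecture defconfig, which is the behaviour the commit message complains about for UML.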
|
|
|
|
|
2009-06-17 19:28:03 -04:00
|
|
|
config CONSTRUCTORS
|
|
|
|
bool
|
|
|
|
depends on !UML
|
|
|
|
default y
|
|
|
|
|
2007-07-31 03:39:23 -04:00
|
|
|
menu "General setup"
|
2005-04-16 18:20:36 -04:00
|
|
|
|
|
|
|
config EXPERIMENTAL
|
|
|
|
bool "Prompt for development and/or incomplete code/drivers"
|
|
|
|
---help---
|
|
|
|
Some of the various things that Linux supports (such as network
|
|
|
|
drivers, file systems, network protocols, etc.) can be in a state
|
|
|
|
of development where the functionality, stability, or the level of
|
|
|
|
testing is not yet high enough for general use. This is usually
|
|
|
|
known as the "alpha-test" phase among developers. If a feature is
|
|
|
|
currently in alpha-test, then the developers usually discourage
|
|
|
|
uninformed widespread use of this feature by the general public to
|
|
|
|
avoid "Why doesn't this work?" type mail messages. However, active
|
|
|
|
testing and use of these systems is welcomed. Just be aware that it
|
|
|
|
may not meet the normal level of reliability or it may fail to work
|
|
|
|
in some special cases. Detailed bug reports from people familiar
|
|
|
|
with the kernel internals are usually welcomed by the developers
|
|
|
|
(before submitting bug reports, please read the documents
|
|
|
|
<file:README>, <file:MAINTAINERS>, <file:REPORTING-BUGS>,
|
|
|
|
<file:Documentation/BUG-HUNTING>, and
|
|
|
|
<file:Documentation/oops-tracing.txt> in the kernel source).
|
|
|
|
|
|
|
|
This option will also make obsoleted drivers available. These are
|
|
|
|
drivers that have been replaced by something else, and/or are
|
|
|
|
scheduled to be removed in a future kernel release.
|
|
|
|
|
|
|
|
Unless you intend to help test and develop a feature or driver that
|
|
|
|
falls into this category, or you have a situation that requires
|
|
|
|
using these features, you should probably say N here, which will
|
|
|
|
cause the configurator to present you with fewer choices. If
|
|
|
|
you say Y here, you will be offered the choice of using features or
|
|
|
|
drivers that are currently considered to be in the alpha-test phase.
|
|
|
|
|
|
|
|
config BROKEN
|
|
|
|
bool
|
|
|
|
|
|
|
|
config BROKEN_ON_SMP
|
|
|
|
bool
|
|
|
|
depends on BROKEN || !SMP
|
|
|
|
default y
|
|
|
|
|
|
|
|
config LOCK_KERNEL
|
|
|
|
bool
|
|
|
|
depends on SMP || PREEMPT
|
|
|
|
default y
|
|
|
|
|
|
|
|
config INIT_ENV_ARG_LIMIT
|
|
|
|
int
|
2006-06-30 04:55:51 -04:00
|
|
|
default 32 if !UML
|
|
|
|
default 128 if UML
|
2005-04-16 18:20:36 -04:00
|
|
|
help
|
2005-10-30 18:01:46 -05:00
|
|
|
Maximum of each of the number of arguments and environment
|
|
|
|
variables passed to init from the kernel command line.
|
2005-04-16 18:20:36 -04:00
|
|
|
|
|
|
|
|
|
|
|
config LOCALVERSION
|
|
|
|
string "Local version - append to kernel release"
|
|
|
|
help
|
|
|
|
Append an extra string to the end of your kernel version.
|
|
|
|
This will show up when you type uname, for example.
|
|
|
|
The string you set here will be appended after the contents of
|
|
|
|
any files with a filename matching localversion* in your
|
|
|
|
object and source tree, in that order. Your total string can
|
|
|
|
be a maximum of 64 characters.
|
|
|
|
|
2005-07-31 04:57:49 -04:00
|
|
|
config LOCALVERSION_AUTO
|
|
|
|
bool "Automatically append version information to the version string"
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
This will try to automatically determine if the current tree is a
|
2007-05-01 17:08:11 -04:00
|
|
|
release tree by looking for git tags that belong to the current
|
|
|
|
top of tree revision.
|
2005-07-31 04:57:49 -04:00
|
|
|
|
|
|
|
A string of the format -gxxxxxxxx will be added to the localversion
|
2007-05-01 17:08:11 -04:00
|
|
|
if a git-based tree is found. The string generated by this will be
|
2005-07-31 04:57:49 -04:00
|
|
|
appended after any matching localversion* files, and after the value
|
2007-05-01 17:08:11 -04:00
|
|
|
set in CONFIG_LOCALVERSION.
|
2005-07-31 04:57:49 -04:00
|
|
|
|
2007-05-01 17:08:11 -04:00
|
|
|
(The actual string used here is the first eight characters produced
|
|
|
|
by running the command:
|
|
|
|
|
|
|
|
$ git rev-parse --verify HEAD
|
|
|
|
|
|
|
|
which is done within the script "scripts/setlocalversion".)
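The ordering described in the LOCALVERSION and LOCALVERSION_AUTO help texts can be sketched as follows. This is a simplified illustration, not the logic of scripts/setlocalversion: the function name `local_version` and its parameters are hypothetical, and `head_sha=None` stands for the case where no git suffix is added.

```python
def local_version(file_parts, config_localversion, head_sha=None):
    """Assemble the local version suffix in the order the help texts give.

    file_parts          -- contents of the localversion* files, already in
                           object-tree-then-source-tree order
    config_localversion -- the value of CONFIG_LOCALVERSION
    head_sha            -- output of `git rev-parse --verify HEAD`, or None
    """
    suffix = "".join(file_parts) + config_localversion
    if head_sha:
        # -gxxxxxxxx: "g" followed by the first eight characters of the hash
        suffix += "-g" + head_sha[:8]
    return suffix
```

For example, one localversion file containing `-rc1`, `CONFIG_LOCALVERSION="-custom"` and a HEAD hash starting with `01234567` would give `-rc1-custom-g01234567`.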
|
2005-07-31 04:57:49 -04:00
|
|
|
|
2009-01-04 18:41:25 -05:00
|
|
|
config HAVE_KERNEL_GZIP
|
|
|
|
bool
|
|
|
|
|
|
|
|
config HAVE_KERNEL_BZIP2
|
|
|
|
bool
|
|
|
|
|
|
|
|
config HAVE_KERNEL_LZMA
|
|
|
|
bool
|
|
|
|
|
2009-01-04 16:46:17 -05:00
|
|
|
choice
|
2009-01-04 18:41:25 -05:00
|
|
|
prompt "Kernel compression mode"
|
|
|
|
default KERNEL_GZIP
|
|
|
|
depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA
|
|
|
|
help
|
2009-01-04 16:46:17 -05:00
|
|
|
The Linux kernel is a kind of self-extracting executable.
|
|
|
|
Several compression algorithms are available, which differ
|
|
|
|
in efficiency, compression and decompression speed.
|
|
|
|
Compression speed is only relevant when building a kernel.
|
|
|
|
Decompression speed is relevant at each boot.
|
|
|
|
|
|
|
|
If you have any problems with bzip2 or lzma compressed
|
|
|
|
kernels, mail me (Alain Knaff) <alain@knaff.lu>. (An older
|
|
|
|
version of this functionality (bzip2 only), for 2.4, was
|
|
|
|
supplied by Christian Ludwig)
|
|
|
|
|
|
|
|
High compression options are mostly useful for users who
|
|
|
|
are low on disk space (embedded systems), but for whom ram
|
|
|
|
size matters less.
|
|
|
|
|
|
|
|
If in doubt, select 'gzip'
|
|
|
|
|
|
|
|
config KERNEL_GZIP
|
2009-01-04 18:41:25 -05:00
|
|
|
bool "Gzip"
|
|
|
|
depends on HAVE_KERNEL_GZIP
|
|
|
|
help
|
|
|
|
The old and tried gzip compression. Its compression ratio is
|
|
|
|
the poorest among the 3 choices; however its speed (both
|
|
|
|
compression and decompression) is the fastest.
|
2009-01-04 16:46:17 -05:00
|
|
|
|
|
|
|
config KERNEL_BZIP2
|
|
|
|
bool "Bzip2"
|
2009-01-04 18:41:25 -05:00
|
|
|
depends on HAVE_KERNEL_BZIP2
|
2009-01-04 16:46:17 -05:00
|
|
|
help
|
|
|
|
Its compression ratio and speed are intermediate.
|
2009-01-04 18:41:25 -05:00
|
|
|
Decompression speed is slowest among the three. The kernel
|
|
|
|
size is about 10% smaller with bzip2, in comparison to gzip.
|
|
|
|
Bzip2 uses a large amount of memory. For modern kernels you
|
|
|
|
will need at least 8MB of RAM for booting.
|
2009-01-04 16:46:17 -05:00
|
|
|
|
|
|
|
config KERNEL_LZMA
|
2009-01-04 18:41:25 -05:00
|
|
|
bool "LZMA"
|
|
|
|
depends on HAVE_KERNEL_LZMA
|
|
|
|
help
|
|
|
|
The most recent compression algorithm.
|
|
|
|
Its ratio is best, decompression speed is between the other
|
|
|
|
two. Compression is slowest. The kernel size is about 33%
|
|
|
|
smaller with LZMA in comparison to gzip.
|
2009-01-04 16:46:17 -05:00
|
|
|
|
|
|
|
endchoice
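To get a feel for the trade-off between the three choices, the same payload can be pushed through Python's standard-library codecs. This is only a rough sketch: the stdlib modules stand in for the tools the kernel build actually invokes, the helper name `compressed_sizes` is hypothetical, and sizes measured on arbitrary data say nothing definite about a real kernel image.

```python
import bz2
import gzip
import lzma


def compressed_sizes(payload: bytes) -> dict:
    """Return the compressed size in bytes of `payload` for each algorithm."""
    return {
        "gzip": len(gzip.compress(payload, compresslevel=9)),
        "bzip2": len(bz2.compress(payload, compresslevel=9)),
        # FORMAT_ALONE is the legacy .lzma container rather than .xz.
        "lzma": len(lzma.compress(payload, format=lzma.FORMAT_ALONE)),
    }
```

Calling `compressed_sizes(open("vmlinux.bin", "rb").read())` on an uncompressed image of your own (the file name here is just an example) gives three numbers to compare against the percentages quoted in the help texts.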
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config SWAP
|
|
|
|
bool "Support for paging of anonymous memory (swap)"
|
[PATCH] BLOCK: Make it possible to disable the block layer [try #6]
Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.
This patch does the following:
(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.
(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:
(*) Block I/O tracing.
(*) Disk partition code.
(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.
(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.
(*) MTD blockdev handling and FTL.
(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.
(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.
(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.
(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.
(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:
(*) Default blockdev file operations (to give error ENODEV on opening).
(*) Makes some /proc changes:
(*) /proc/devices does not list any blockdevs.
(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.
(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.
(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).
(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 14:45:40 -04:00
|
|
|
depends on MMU && BLOCK
|
2005-04-16 18:20:36 -04:00
|
|
|
default y
|
|
|
|
help
|
|
|
|
This option allows you to choose whether you want to have support
|
2006-01-14 20:40:08 -05:00
|
|
|
for so-called swap devices or swap files in your kernel that are
|
2005-04-16 18:20:36 -04:00
|
|
|
used to provide more virtual memory than the actual RAM present
|
|
|
|
in your computer. If unsure say Y.
|
|
|
|
|
|
|
|
config SYSVIPC
|
|
|
|
bool "System V IPC"
|
|
|
|
---help---
|
|
|
|
Inter Process Communication is a suite of library functions and
|
|
|
|
system calls which let processes (running programs) synchronize and
|
|
|
|
exchange information. It is generally considered to be a good thing,
|
|
|
|
and some programs won't run unless you say Y here. In particular, if
|
|
|
|
you want to run the DOS emulator dosemu under Linux (read the
|
|
|
|
DOSEMU-HOWTO, available from <http://www.tldp.org/docs.html#howto>),
|
|
|
|
you'll need to say Y here.
|
|
|
|
|
|
|
|
You can find documentation about IPC with "info ipc" and also in
|
|
|
|
section 6.4 of the Linux Programmer's Guide, available from
|
|
|
|
<http://www.tldp.org/guides.html>.
|
|
|
|
|
2007-02-14 03:34:06 -05:00
|
|
|
config SYSVIPC_SYSCTL
|
|
|
|
bool
|
|
|
|
depends on SYSVIPC
|
|
|
|
depends on SYSCTL
|
|
|
|
default y
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config POSIX_MQUEUE
|
|
|
|
bool "POSIX Message Queues"
|
|
|
|
depends on NET && EXPERIMENTAL
|
|
|
|
---help---
|
|
|
|
POSIX variant of message queues is a part of IPC. In POSIX message
|
|
|
|
queues every message has a priority that determines the order in
|
|
|
|
which processes receive it. If you want to compile and run
|
|
|
|
programs written e.g. for Solaris with use of its POSIX message
|
2007-05-09 01:25:13 -04:00
|
|
|
queues (functions mq_*) say Y here.
|
2005-04-16 18:20:36 -04:00
|
|
|
|
|
|
|
POSIX message queues are visible as a filesystem called 'mqueue'
|
|
|
|
and can be mounted somewhere if you want to do filesystem
|
|
|
|
operations on message queues.
|
|
|
|
|
|
|
|
If unsure, say Y.
|
|
|
|
|
2009-04-06 22:01:11 -04:00
|
|
|
config POSIX_MQUEUE_SYSCTL
|
|
|
|
bool
|
|
|
|
depends on POSIX_MQUEUE
|
|
|
|
depends on SYSCTL
|
|
|
|
default y
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config BSD_PROCESS_ACCT
|
|
|
|
bool "BSD Process Accounting"
|
|
|
|
help
|
|
|
|
If you say Y here, a user level program will be able to instruct the
|
|
|
|
kernel (via a special system call) to write process accounting
|
|
|
|
information to a file: whenever a process exits, information about
|
|
|
|
that process will be appended to the file by the kernel. The
|
|
|
|
information includes things such as creation time, owning user,
|
|
|
|
command name, memory usage, controlling terminal etc. (the complete
|
|
|
|
list is in the struct acct in <file:include/linux/acct.h>). It is
|
|
|
|
up to the user level program to do useful things with this
|
|
|
|
information. This is generally a good idea, so say Y.
|
|
|
|
|
|
|
|
config BSD_PROCESS_ACCT_V3
|
|
|
|
bool "BSD Process Accounting version 3 file format"
|
|
|
|
depends on BSD_PROCESS_ACCT
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
If you say Y here, the process accounting information is written
|
|
|
|
in a new file format that also logs the process IDs of each
|
|
|
|
process and its parent. Note that this file format is incompatible
|
|
|
|
with previous v0/v1/v2 file formats, so you will need updated tools
|
|
|
|
for processing it. A preliminary version of these tools is available
|
2008-06-18 04:45:13 -04:00
|
|
|
at <http://www.gnu.org/software/acct/>.
|
2005-04-16 18:20:36 -04:00
|
|
|
|
2006-07-14 03:24:40 -04:00
|
|
|
config TASKSTATS
|
|
|
|
bool "Export task/process statistics through netlink (EXPERIMENTAL)"
|
|
|
|
depends on NET
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
Export selected statistics for tasks/processes through the
|
|
|
|
generic netlink interface. Unlike BSD process accounting, the
|
|
|
|
statistics are available during the lifetime of tasks/processes as
|
|
|
|
responses to commands. Like BSD accounting, they are sent to user
|
|
|
|
space on task exit.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2006-07-14 03:24:36 -04:00
|
|
|
config TASK_DELAY_ACCT
|
|
|
|
bool "Enable per-task delay accounting (EXPERIMENTAL)"
|
2006-07-14 03:24:41 -04:00
|
|
|
depends on TASKSTATS
|
2006-07-14 03:24:36 -04:00
|
|
|
help
|
|
|
|
Collect information on time spent by a task waiting for system
|
|
|
|
resources like cpu, synchronous block I/O completion and swapping
|
|
|
|
in pages. Such statistics can help in setting a task's priorities
|
|
|
|
relative to other tasks for cpu, io, rss limits etc.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2007-02-10 04:46:44 -05:00
|
|
|
config TASK_XACCT
|
|
|
|
bool "Enable extended accounting over taskstats (EXPERIMENTAL)"
|
|
|
|
depends on TASKSTATS
|
|
|
|
help
|
|
|
|
Collect extended task accounting data and send the data
|
|
|
|
to userland for processing over the taskstats interface.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
|
|
|
config TASK_IO_ACCOUNTING
|
|
|
|
bool "Enable per-task storage I/O accounting (EXPERIMENTAL)"
|
|
|
|
depends on TASK_XACCT
|
|
|
|
help
|
|
|
|
Collect information on the number of bytes of storage I/O which this
|
|
|
|
task has caused.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config AUDIT
|
|
|
|
bool "Auditing support"
|
2005-05-11 05:52:45 -04:00
|
|
|
depends on NET
|
2005-04-16 18:20:36 -04:00
|
|
|
help
|
|
|
|
Enable auditing infrastructure that can be used with another
|
|
|
|
kernel subsystem, such as SELinux (which requires this for
|
|
|
|
logging of avc messages output). Does not do system-call
|
|
|
|
auditing without CONFIG_AUDITSYSCALL.
|
|
|
|
|
|
|
|
config AUDITSYSCALL
|
|
|
|
bool "Enable system-call auditing support"
|
2009-10-16 03:21:37 -04:00
|
|
|
depends on AUDIT && (X86 || PPC || S390 || IA64 || UML || SPARC64 || SUPERH)
|
2005-04-16 18:20:36 -04:00
|
|
|
default y if SECURITY_SELINUX
|
|
|
|
help
|
|
|
|
Enable low-overhead system-call auditing infrastructure that
|
|
|
|
can be used independently or with another kernel subsystem,
|
[PATCH] audit: path-based rules
In this implementation, audit registers inotify watches on the parent
directories of paths specified in audit rules. When audit's inotify
event handler is called, it updates any affected rules based on the
filesystem event. If the parent directory is renamed, removed, or its
filesystem is unmounted, audit removes all rules referencing that
inotify watch.
To keep things simple, this implementation limits location-based
auditing to the directory entries in an existing directory. Given
a path-based rule for /foo/bar/passwd, the following table applies:
passwd modified -- audit event logged
passwd replaced -- audit event logged, rules list updated
bar renamed -- rule removed
foo renamed -- untracked, meaning that the rule now applies to
the new location
Audit users typically want to have many rules referencing filesystem
objects, which can significantly impact filtering performance. This
patch also adds an inode-number-based rule hash to mitigate this
situation.
The patch is relative to the audit git tree:
http://kernel.org/git/?p=linux/kernel/git/viro/audit-current.git;a=summary
and uses the inotify kernel API:
http://lkml.org/lkml/2006/6/1/145
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-04-07 16:55:56 -04:00
|
|
|
such as SELinux. To use audit's filesystem watch feature, please
|
|
|
|
ensure that INOTIFY is configured.
|
2005-04-16 18:20:36 -04:00
|
|
|
|
[PATCH] audit: watching subtrees
New kind of audit rule predicates: "object is visible in given subtree".
The part that can be sanely implemented, that is. Limitations:
* if you have hardlink from outside of tree, you'd better watch
it too (or just watch the object itself, obviously)
* if you mount something under a watched tree, tell audit
that new chunk should be added to watched subtrees
* if you umount something in a watched tree and it's still mounted
elsewhere, you will get matches on events happening there. New command
tells audit to recalculate the trees, trimming such sources of false
positives.
Note that it's _not_ about path - if something mounted in several places
(multiple mount, bindings, different namespaces, etc.), the match does
_not_ depend on which one we are using for access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2007-07-22 08:04:18 -04:00
|
|
|
config AUDIT_TREE
|
|
|
|
def_bool y
|
2009-05-21 17:02:01 -04:00
|
|
|
depends on AUDITSYSCALL
|
|
|
|
select INOTIFY
|
2007-07-22 08:04:18 -04:00
|
|
|
|
2009-01-15 15:28:29 -05:00
|
|
|
menu "RCU Subsystem"
|
|
|
|
|
|
|
|
choice
|
|
|
|
prompt "RCU Implementation"
|
2009-04-03 00:06:25 -04:00
|
|
|
default TREE_RCU
|
2009-01-15 15:28:29 -05:00
|
|
|
|
|
|
|
config TREE_RCU
|
|
|
|
bool "Tree-based hierarchical RCU"
|
|
|
|
help
|
|
|
|
This option selects the RCU implementation that is
|
|
|
|
designed for very large SMP systems with hundreds or
|
2009-06-23 20:12:47 -04:00
|
|
|
thousands of CPUs. It also scales down nicely to
|
|
|
|
smaller systems.
|
2009-01-15 15:28:29 -05:00
|
|
|
|
rcu: Merge preemptable-RCU functionality into hierarchical RCU
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-22 16:56:52 -04:00
|
|
|
config TREE_PREEMPT_RCU
|
|
|
|
bool "Preemptable tree-based hierarchical RCU"
|
|
|
|
depends on PREEMPT
|
|
|
|
help
|
|
|
|
This option selects the RCU implementation that is
|
|
|
|
designed for very large SMP systems with hundreds or
|
|
|
|
thousands of CPUs, but for which real-time response
|
2009-09-13 12:15:08 -04:00
|
|
|
is also required. It also scales down nicely to
|
|
|
|
smaller systems.
|
2009-08-22 16:56:52 -04:00
|
|
|
|
2009-01-15 15:28:29 -05:00
|
|
|
endchoice
|
|
|
|
|
|
|
|
config RCU_TRACE
|
|
|
|
bool "Enable tracing for RCU"
|
2009-08-22 16:56:53 -04:00
|
|
|
depends on TREE_RCU || TREE_PREEMPT_RCU
|
2009-01-15 15:28:29 -05:00
|
|
|
help
|
|
|
|
This option provides tracing in RCU which presents stats
|
|
|
|
in debugfs for debugging the RCU implementation.
|
|
|
|
|
|
|
|
Say Y here if you want to enable RCU tracing.
|
|
|
|
Say N if you are unsure.
|
|
|
|
|
|
|
|
config RCU_FANOUT
|
|
|
|
int "Tree-based hierarchical RCU fanout value"
|
|
|
|
range 2 64 if 64BIT
|
|
|
|
range 2 32 if !64BIT
|
2009-08-22 16:56:52 -04:00
|
|
|
depends on TREE_RCU || TREE_PREEMPT_RCU
|
2009-01-15 15:28:29 -05:00
|
|
|
default 64 if 64BIT
|
|
|
|
default 32 if !64BIT
|
|
|
|
help
|
|
|
|
This option controls the fanout of hierarchical implementations
|
|
|
|
of RCU, allowing RCU to work efficiently on machines with
|
|
|
|
large numbers of CPUs. This value must be at least the cube
|
|
|
|
root of NR_CPUS, which allows NR_CPUS up to 32,768 for 32-bit
|
|
|
|
systems and up to 262,144 for 64-bit systems.
|
|
|
|
|
|
|
|
Select a specific number if testing RCU itself.
|
|
|
|
Take the default if unsure.
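The NR_CPUS limits quoted in the RCU_FANOUT help text follow from the cube-root rule. A short sketch of the arithmetic, assuming the rule means a hierarchy of at most three levels (the helper names are hypothetical):

```python
def max_cpus(fanout: int, levels: int = 3) -> int:
    """Largest NR_CPUS for which `fanout` is at least the levels-th root."""
    return fanout ** levels


def min_fanout(nr_cpus: int, levels: int = 3) -> int:
    """Smallest fanout in the allowed range (2 and up) covering nr_cpus."""
    fanout = 2
    while fanout ** levels < nr_cpus:
        fanout += 1
    return fanout


print(max_cpus(32))      # 32768, the 32-bit limit in the help text
print(max_cpus(64))      # 262144, the 64-bit limit in the help text
print(min_fanout(4096))  # 16
```

Integer arithmetic is used on purpose: a floating-point cube root can land just below an exact integer and round the wrong way.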
|
|
|
|
|
|
|
|
config RCU_FANOUT_EXACT
|
|
|
|
bool "Disable tree-based hierarchical RCU auto-balancing"
|
2009-08-22 16:56:52 -04:00
|
|
|
depends on TREE_RCU || TREE_PREEMPT_RCU
|
2009-01-15 15:28:29 -05:00
|
|
|
default n
|
|
|
|
help
|
|
|
|
This option forces use of the exact RCU_FANOUT value specified,
|
|
|
|
regardless of imbalances in the hierarchy. This is useful for
|
|
|
|
testing RCU itself, and might one day be useful on systems with
|
|
|
|
strong NUMA behavior.
|
|
|
|
|
|
|
|
Without RCU_FANOUT_EXACT, the code will balance the hierarchy.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
|
|
|
config TREE_RCU_TRACE
|
2009-08-22 16:56:52 -04:00
|
|
|
def_bool RCU_TRACE && ( TREE_RCU || TREE_PREEMPT_RCU )
|
2009-01-15 15:28:29 -05:00
|
|
|
select DEBUG_FS
|
|
|
|
help
|
2009-08-22 16:56:52 -04:00
|
|
|
This option provides tracing for the TREE_RCU and
|
|
|
|
TREE_PREEMPT_RCU implementations, permitting Makefile to
|
|
|
|
trivially select kernel/rcutree_trace.c.
|
2009-01-15 15:28:29 -05:00
|
|
|
|
|
|
|
endmenu # "RCU Subsystem"
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config IKCONFIG
|
2006-10-01 02:27:25 -04:00
|
|
|
tristate "Kernel .config support"
|
2005-04-16 18:20:36 -04:00
|
|
|
---help---
|
|
|
|
This option enables the complete Linux kernel ".config" file
|
|
|
|
contents to be saved in the kernel. It provides documentation
|
|
|
|
of which kernel options are used in a running kernel or in an
|
|
|
|
on-disk kernel. This information can be extracted from the kernel
|
|
|
|
image file with the script scripts/extract-ikconfig and used as
|
|
|
|
input to rebuild the current kernel or to build another kernel.
|
|
|
|
It can also be extracted from a running kernel by reading
|
|
|
|
/proc/config.gz if enabled (below).
|
|
|
|
|
|
|
|
config IKCONFIG_PROC
|
|
|
|
bool "Enable access to .config through /proc/config.gz"
|
|
|
|
depends on IKCONFIG && PROC_FS
|
|
|
|
---help---
|
|
|
|
This option enables access to the kernel configuration file
|
|
|
|
through /proc/config.gz.
|
|
|
|
|
2007-05-08 03:31:15 -04:00
|
|
|
config LOG_BUF_SHIFT
|
|
|
|
int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
|
|
|
|
range 12 21
|
2008-04-29 03:58:58 -04:00
|
|
|
default 17
|
2007-05-08 03:31:15 -04:00
|
|
|
help
|
|
|
|
Select kernel log buffer size as a power of 2.
|
2008-04-29 03:58:58 -04:00
|
|
|
Examples:
|
|
|
|
17 => 128 KB
|
|
|
|
16 => 64 KB
|
|
|
|
15 => 32 KB
|
|
|
|
14 => 16 KB
|
2007-05-08 03:31:15 -04:00
|
|
|
13 => 8 KB
|
|
|
|
12 => 4 KB
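
The shift-to-size table above is a plain power of two. A minimal sketch of that arithmetic follows; the helper name `log_buf_bytes` and the enum constants are hypothetical and are not taken from kernel code. Only the bounds 12 and 21 come from the `range` line of this entry.

```c
#include <stddef.h>

/* Bounds copied from the "range 12 21" line of LOG_BUF_SHIFT. */
enum { LOG_BUF_SHIFT_MIN = 12, LOG_BUF_SHIFT_MAX = 21 };

/*
 * Hypothetical helper: size in bytes of a log buffer configured with
 * the given shift, or 0 if the shift is outside the allowed range.
 */
size_t log_buf_bytes(int shift)
{
    if (shift < LOG_BUF_SHIFT_MIN || shift > LOG_BUF_SHIFT_MAX)
        return 0;
    return (size_t)1 << shift;
}
```

With the default of 17 this yields 131072 bytes (128 KB), matching the first row of the table.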
|
|
|
|
|
2008-05-05 17:19:50 -04:00
|
|
|
#
|
|
|
|
# Architectures with an unreliable sched_clock() should select this:
|
|
|
|
#
|
|
|
|
config HAVE_UNSTABLE_SCHED_CLOCK
|
|
|
|
bool
|
|
|
|
|
2008-02-13 09:45:40 -05:00
|
|
|
config GROUP_SCHED
|
|
|
|
bool "Group CPU scheduler"
|
2008-05-03 20:42:34 -04:00
|
|
|
depends on EXPERIMENTAL
|
|
|
|
default n
|
2007-10-15 11:00:07 -04:00
|
|
|
help
|
2007-10-15 11:00:12 -04:00
|
|
|
This feature lets the CPU scheduler recognize task groups and control CPU
|
2007-10-15 11:00:09 -04:00
|
|
|
bandwidth allocation to such task groups.
|
2009-01-07 21:07:30 -05:00
|
|
|
In order to create a group from an arbitrary set of processes, use
|
|
|
|
CONFIG_CGROUPS. (See Control Group support.)
|
2007-10-15 11:00:07 -04:00
|
|
|
|
2008-02-13 09:45:40 -05:00
|
|
|
config FAIR_GROUP_SCHED
|
|
|
|
bool "Group scheduling for SCHED_OTHER"
|
|
|
|
depends on GROUP_SCHED
|
2008-05-03 20:42:34 -04:00
|
|
|
default GROUP_SCHED
|
2008-02-13 09:45:40 -05:00
|
|
|
|
|
|
|
config RT_GROUP_SCHED
|
|
|
|
bool "Group scheduling for SCHED_RR/FIFO"
|
|
|
|
depends on EXPERIMENTAL
|
|
|
|
depends on GROUP_SCHED
|
|
|
|
default n
|
2008-04-19 13:45:01 -04:00
|
|
|
help
|
|
|
|
This feature lets you explicitly allocate real CPU bandwidth
|
|
|
|
to users or control groups (depending on the "Basis for grouping tasks"
|
|
|
|
setting below). If enabled, it will also make it impossible to
|
|
|
|
schedule realtime tasks for non-root users until you allocate
|
|
|
|
realtime bandwidth for them.
|
2008-11-12 19:23:55 -05:00
|
|
|
See Documentation/scheduler/sched-rt-group.txt for more information.
|
2008-02-13 09:45:40 -05:00
|
|
|
|
2007-10-15 11:00:09 -04:00
|
|
|
choice
|
2008-02-13 09:45:40 -05:00
|
|
|
depends on GROUP_SCHED
|
2007-10-15 11:00:09 -04:00
|
|
|
prompt "Basis for grouping tasks"
|
2008-02-13 09:45:40 -05:00
|
|
|
default USER_SCHED
|
2007-10-15 11:00:09 -04:00
|
|
|
|
2008-02-13 09:45:40 -05:00
|
|
|
config USER_SCHED
|
2007-10-15 11:00:12 -04:00
|
|
|
bool "user id"
|
|
|
|
help
|
|
|
|
This option will choose userid as the basis for grouping
|
|
|
|
tasks, thus providing equal CPU bandwidth to each user.
|
2007-10-15 11:00:09 -04:00
|
|
|
|
2008-02-13 09:45:40 -05:00
|
|
|
config CGROUP_SCHED
|
2007-10-19 02:41:03 -04:00
|
|
|
bool "Control groups"
|
|
|
|
depends on CGROUPS
|
|
|
|
help
|
|
|
|
This option allows you to create arbitrary task groups
|
|
|
|
using the "cgroup" pseudo filesystem and control
|
|
|
|
the cpu bandwidth allocated to each such task group.
|
2009-01-15 16:50:59 -05:00
|
|
|
Refer to Documentation/cgroups/cgroups.txt for more
|
|
|
|
information on "cgroup" pseudo filesystem.
|
2007-10-19 02:41:03 -04:00
|
|
|
|
2007-10-15 11:00:09 -04:00
|
|
|
endchoice
|
|
|
|
|
2009-01-15 16:50:58 -05:00
|
|
|
menuconfig CGROUPS
|
|
|
|
boolean "Control Group support"
|
2009-01-07 21:07:30 -05:00
|
|
|
help
|
2009-01-15 16:50:58 -05:00
|
|
|
This option adds support for grouping sets of processes together, for
|
2009-01-07 21:07:30 -05:00
|
|
|
use with process control subsystems such as Cpusets, CFS, memory
|
|
|
|
controls or device isolation.
|
|
|
|
See
|
|
|
|
- Documentation/scheduler/sched-design-CFS.txt (CFS)
|
2009-01-15 16:50:59 -05:00
|
|
|
- Documentation/cgroups/ (features for grouping, isolation
|
|
|
|
and resource control)
|
2009-01-07 21:07:30 -05:00
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2009-01-15 16:50:58 -05:00
|
|
|
if CGROUPS
|
|
|
|
|
2009-01-07 21:07:30 -05:00
|
|
|
config CGROUP_DEBUG
|
|
|
|
bool "Example debug cgroup subsystem"
|
|
|
|
depends on CGROUPS
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
This option enables a simple cgroup subsystem that
|
|
|
|
exports useful debugging information about the cgroups
|
2009-01-15 16:50:58 -05:00
|
|
|
framework.
|
2009-01-07 21:07:30 -05:00
|
|
|
|
2009-01-15 16:50:58 -05:00
|
|
|
Say N if unsure.
|
2009-01-07 21:07:30 -05:00
|
|
|
|
|
|
|
config CGROUP_NS
|
2009-01-15 16:50:58 -05:00
|
|
|
bool "Namespace cgroup subsystem"
|
|
|
|
depends on CGROUPS
|
|
|
|
help
|
|
|
|
Provides a simple namespace cgroup subsystem to
|
|
|
|
provide hierarchical naming of sets of namespaces,
|
|
|
|
for instance virtual servers and checkpoint/restart
|
|
|
|
jobs.
|
2009-01-07 21:07:30 -05:00
|
|
|
|
|
|
|
config CGROUP_FREEZER
|
2009-01-15 16:50:58 -05:00
|
|
|
bool "Freezer cgroup subsystem"
|
|
|
|
depends on CGROUPS
|
|
|
|
help
|
|
|
|
Provides a way to freeze and unfreeze all tasks in a
|
2009-01-07 21:07:30 -05:00
|
|
|
cgroup.
|
|
|
|
|
|
|
|
config CGROUP_DEVICE
|
|
|
|
bool "Device controller for cgroups"
|
|
|
|
depends on CGROUPS && EXPERIMENTAL
|
|
|
|
help
|
|
|
|
Provides a cgroup implementing whitelists for devices which
|
|
|
|
a process in the cgroup can mknod or open.
|
|
|
|
|
|
|
|
config CPUSETS
|
|
|
|
bool "Cpuset support"
|
2009-04-02 19:57:55 -04:00
|
|
|
depends on CGROUPS
|
2009-01-07 21:07:30 -05:00
|
|
|
help
|
|
|
|
This option will let you create and manage CPUSETs which
|
|
|
|
allow dynamically partitioning a system into sets of CPUs and
|
|
|
|
Memory Nodes and assigning tasks to run only within those sets.
|
|
|
|
This is primarily useful on large SMP or NUMA systems.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2009-01-15 16:50:58 -05:00
|
|
|
config PROC_PID_CPUSET
|
|
|
|
bool "Include legacy /proc/<pid>/cpuset file"
|
|
|
|
depends on CPUSETS
|
|
|
|
default y
|
|
|
|
|
2007-12-02 14:04:49 -05:00
|
|
|
config CGROUP_CPUACCT
|
|
|
|
bool "Simple CPU accounting cgroup subsystem"
|
|
|
|
depends on CGROUPS
|
|
|
|
help
|
|
|
|
Provides a simple Resource Controller for monitoring the
|
2009-01-15 16:50:58 -05:00
|
|
|
total CPU consumed by the tasks in a cgroup.
|
2007-12-02 14:04:49 -05:00
|
|
|
|
2008-02-07 03:13:49 -05:00
|
|
|
config RESOURCE_COUNTERS
|
|
|
|
bool "Resource counters"
|
|
|
|
help
|
|
|
|
This option enables controller independent resource accounting
|
2009-01-15 16:50:58 -05:00
|
|
|
infrastructure that works with cgroups.
|
2008-02-07 03:13:49 -05:00
|
|
|
depends on CGROUPS
|
|
|
|
|
2008-03-04 17:28:39 -05:00
|
|
|
config CGROUP_MEM_RES_CTLR
|
|
|
|
bool "Memory Resource Controller for Control Groups"
|
|
|
|
depends on CGROUPS && RESOURCE_COUNTERS
|
cgroups: add an owner to the mm_struct
Remove the mem_cgroup member from mm_struct and add an owner instead.
This approach was suggested by Paul Menage. The advantage of this approach
is that, once the mm->owner is known, using the subsystem id, the cgroup
can be determined. It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.
A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.
This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes. The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.
I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.
This patch was tested on a powerpc box, it was compiled with both the
MM_OWNER config turned on and off.
After the thread group leader exits, it is moved to init_css_state by
cgroup_exit(), so all future charges from running threads are
redirected to the init_css_set's subsystem.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>,
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 04:00:16 -04:00
|
|
|
select MM_OWNER
|
2008-03-04 17:28:39 -05:00
|
|
|
help
|
2008-10-29 17:01:06 -04:00
|
|
|
Provides a memory resource controller that manages both anonymous
|
2009-02-04 04:12:08 -05:00
|
|
|
memory and page cache. (See Documentation/cgroups/memory.txt)
|
2008-03-04 17:28:39 -05:00
|
|
|
|
|
|
|
Note that setting this option increases fixed memory overhead
|
2008-10-29 17:01:06 -04:00
|
|
|
associated with each page of memory in the system. As a result,
|
|
|
|
20 (40) bytes per PAGE_SIZE of memory on a 32-bit (64-bit) system are
|
|
|
|
occupied by the memory usage tracking struct. The total amount is printed
|
|
|
|
at boot.
|
2008-03-04 17:28:39 -05:00
|
|
|
|
|
|
|
Only enable when you're ok with these trade offs and really
|
2008-10-29 17:01:06 -04:00
|
|
|
sure you need the memory resource controller. Even when you enable
|
|
|
|
this, you can pass "cgroup_disable=memory" as a boot option to
|
|
|
|
disable the memory resource controller and avoid the overhead
|
2009-01-07 21:07:35 -05:00
|
|
|
(and lose the benefits of the memory resource controller).
|
2008-03-04 17:28:39 -05:00
|
|
|
|
2008-04-29 04:00:16 -04:00
|
|
|
This config option also selects MM_OWNER config option, which
|
|
|
|
could in turn add some fork/exit overhead.
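
The per-page overhead quoted in this help text can be turned into a rough estimate for a given machine. The sketch below assumes only the 20-byte (32-bit) and 40-byte (64-bit) figures stated above; the function name `memcg_tracking_bytes` and its parameters are hypothetical and do not correspond to any kernel interface.

```c
#include <stdint.h>

/*
 * Hypothetical estimate of the boot-time tracking overhead described in
 * the CGROUP_MEM_RES_CTLR help text: 20 bytes per page on a 32-bit
 * system, 40 bytes per page on a 64-bit system.
 */
uint64_t memcg_tracking_bytes(uint64_t ram_bytes, uint64_t page_size,
                              int is_64bit)
{
    uint64_t per_page = is_64bit ? 40 : 20;

    if (page_size == 0)
        return 0;               /* avoid dividing by zero */
    return (ram_bytes / page_size) * per_page;
}
```

For example, 4 GB of RAM with 4096-byte pages on a 64-bit system comes to 1048576 pages times 40 bytes, or 40 MB; the figure the kernel prints at boot remains the authoritative one.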
|
|
|
|
|
2009-01-07 21:07:57 -05:00
|
|
|
config CGROUP_MEM_RES_CTLR_SWAP
|
|
|
|
bool "Memory Resource Controller Swap Extension(EXPERIMENTAL)"
|
|
|
|
depends on CGROUP_MEM_RES_CTLR && SWAP && EXPERIMENTAL
|
|
|
|
help
|
|
|
|
Add swap management feature to memory resource controller. When you
|
|
|
|
enable this, you can limit mem+swap usage per cgroup. In other words,
|
|
|
|
when you disable this, the memory resource controller ignores swap
|
|
|
|
usage, so a process can exhaust all of the swap. This extension
|
|
|
|
is useful when you want to avoid swap exhaustion, but it
|
|
|
|
adds overhead and consumes memory to record the accounting information.
|
|
|
|
If you use a 32-bit system or a system with little memory, please
|
|
|
|
be careful about enabling this. When the memory resource controller
|
|
|
|
is disabled by boot option, this will be automatically disabled and
|
|
|
|
there will be no overhead from this. Even when you set this config=y,
|
|
|
|
if boot option "noswapaccount" is set, swap will not be accounted.
|
2009-04-02 19:57:47 -04:00
|
|
|
Currently, swap_cgroup uses 2 bytes per swap entry. With a swap page
|
|
|
|
size of 4096 bytes, that is 512 KB per 1 GB of swap.
|
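
The swap accounting figure in the help text above (2 bytes per swap entry) can be checked with a short calculation. This is a sketch under that single stated assumption; `swap_cgroup_bytes` is a hypothetical name, not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical estimate of swap_cgroup memory use: one 2-byte record
 * per swap page, as stated in the CGROUP_MEM_RES_CTLR_SWAP help text.
 */
uint64_t swap_cgroup_bytes(uint64_t swap_bytes, uint64_t page_size)
{
    if (page_size == 0)
        return 0;               /* avoid dividing by zero */
    return (swap_bytes / page_size) * 2;
}
```

1 GB of swap with 4096-byte pages is 262144 entries, so the records take 524288 bytes, which is the 512 KB quoted in the help text.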
2009-01-07 21:07:57 -05:00
|
|
|
|
2009-01-15 16:50:58 -05:00
|
|
|
endif # CGROUPS
|
2009-01-07 21:07:57 -05:00
|
|
|
|
2009-01-15 16:50:58 -05:00
|
|
|
config MM_OWNER
|
|
|
|
bool
|
2009-01-07 21:07:30 -05:00
|
|
|
|
2006-09-14 05:23:28 -04:00
|
|
|
config SYSFS_DEPRECATED
|
2008-03-04 08:54:47 -05:00
|
|
|
bool
|
|
|
|
|
|
|
|
config SYSFS_DEPRECATED_V2
|
2009-04-16 13:56:37 -04:00
|
|
|
bool "remove sysfs features which may confuse old userspace tools"
|
2007-12-31 13:05:34 -05:00
|
|
|
depends on SYSFS
|
2009-04-16 13:56:37 -04:00
|
|
|
default n
|
2008-03-04 08:54:47 -05:00
|
|
|
select SYSFS_DEPRECATED
|
2006-09-14 05:23:28 -04:00
|
|
|
help
|
2008-11-01 09:03:00 -04:00
|
|
|
This option switches the layout of sysfs to the deprecated
|
2009-04-16 13:56:37 -04:00
|
|
|
version. Do not use it on recent distributions.
|
2008-11-01 09:03:00 -04:00
|
|
|
|
|
|
|
The current sysfs layout features a unified device tree at
|
|
|
|
/sys/devices/, which is able to express a hierarchy between
|
|
|
|
class devices. If the deprecated option is set to Y, the
|
|
|
|
unified device tree is split into a bus device tree at
|
|
|
|
/sys/devices/ and several individual class device trees at
|
|
|
|
/sys/class/. The class and bus devices will be connected by
|
|
|
|
"<subsystem>:<name>" and the "device" links. The "block"
|
|
|
|
class devices will not show up in /sys/class/block/. Some
|
|
|
|
subsystems will suppress the creation of some devices which
|
|
|
|
depend on the unified device tree.
|
|
|
|
|
|
|
|
This option is not a pure compatibility option that can
|
|
|
|
be safely enabled on newer distributions. It will change the
|
|
|
|
layout of sysfs to the non-extensible deprecated version,
|
|
|
|
and disable some features, which can not be exported without
|
|
|
|
confusing older userspace tools. Since 2007/2008 all major
|
|
|
|
distributions do not enable this option, and ship no tools which
|
|
|
|
depend on the deprecated layout or this option.
|
|
|
|
|
|
|
|
If you are using a new kernel on an older distribution, or use
|
|
|
|
older userspace tools, you might need to say Y here. Do not say Y,
|
|
|
|
if the original kernel, that came with your distribution, has
|
|
|
|
this option set to N.
|
2006-09-14 05:23:28 -04:00
|
|
|
|
2006-03-23 13:56:55 -05:00
|
|
|
config RELAY
|
|
|
|
bool "Kernel->user space relay support (formerly relayfs)"
|
|
|
|
help
|
|
|
|
This option enables relay interface support in
|
|
|
|
certain file systems (such as debugfs).
|
|
|
|
It is designed to provide an efficient mechanism for tools and
|
|
|
|
facilities to relay large amounts of data from kernel space to
|
|
|
|
user space.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2008-02-08 07:18:19 -05:00
|
|
|
config NAMESPACES
|
|
|
|
bool "Namespaces support" if EMBEDDED
|
|
|
|
default !EMBEDDED
|
|
|
|
help
|
|
|
|
Provides a way to make tasks work with different objects using
|
|
|
|
the same id. For example, the same IPC id may refer to different objects,
|
|
|
|
or the same user id or pid may refer to different tasks, when used in
|
|
|
|
different namespaces.
|
|
|
|
|
2008-02-08 07:18:21 -05:00
|
|
|
config UTS_NS
|
|
|
|
bool "UTS namespace"
|
|
|
|
depends on NAMESPACES
|
|
|
|
help
|
|
|
|
In this namespace tasks see different info provided with the
|
|
|
|
uname() system call
|
|
|
|
|
2008-02-08 07:18:22 -05:00
|
|
|
config IPC_NS
|
|
|
|
bool "IPC namespace"
|
2009-04-06 22:01:08 -04:00
|
|
|
depends on NAMESPACES && (SYSVIPC || POSIX_MQUEUE)
|
2008-02-08 07:18:22 -05:00
|
|
|
help
|
|
|
|
In this namespace tasks work with IPC ids which correspond to
|
2009-04-06 22:01:08 -04:00
|
|
|
different IPC objects in different namespaces.
|
2008-02-08 07:18:22 -05:00
|
|
|
|
2008-02-08 07:18:23 -05:00
|
|
|
config USER_NS
|
|
|
|
bool "User namespace (EXPERIMENTAL)"
|
|
|
|
depends on NAMESPACES && EXPERIMENTAL
|
|
|
|
help
|
|
|
|
This allows containers, i.e. vservers, to use user namespaces
|
|
|
|
to provide different user info for different servers.
|
|
|
|
If unsure, say N.
|
|
|
|
|
2008-02-08 07:18:24 -05:00
|
|
|
config PID_NS
|
|
|
|
bool "PID Namespaces (EXPERIMENTAL)"
|
|
|
|
default n
|
|
|
|
depends on NAMESPACES && EXPERIMENTAL
|
|
|
|
help
|
2008-07-06 08:48:02 -04:00
|
|
|
Support process id namespaces. This allows having multiple
|
2009-01-26 05:12:25 -05:00
|
|
|
processes with the same pid as long as they are in different
|
2008-02-08 07:18:24 -05:00
|
|
|
pid namespaces. This is a building block of containers.
|
|
|
|
|
|
|
|
Unless you want to work with an experimental feature
|
|
|
|
say N here.
|
|
|
|
|
2009-01-26 15:25:55 -05:00
|
|
|
config NET_NS
|
|
|
|
bool "Network namespace"
|
|
|
|
default n
|
|
|
|
depends on NAMESPACES && EXPERIMENTAL && NET
|
|
|
|
help
|
|
|
|
Allow user space to create what appear to be multiple instances
|
|
|
|
of the network stack.
|
|
|
|
|
2007-03-06 04:42:17 -05:00
|
|
|
config BLK_DEV_INITRD
|
|
|
|
bool "Initial RAM filesystem and RAM disk (initramfs/initrd) support"
|
|
|
|
depends on BROKEN || !FRV
|
|
|
|
help
|
|
|
|
The initial RAM filesystem is a ramfs which is loaded by the
|
|
|
|
boot loader (loadlin or lilo) and that is mounted as root
|
|
|
|
before the normal boot procedure. It is typically used to
|
|
|
|
load modules needed to mount the "real" root file system,
|
|
|
|
etc. See <file:Documentation/initrd.txt> for details.
|
|
|
|
|
|
|
|
If RAM disk support (BLK_DEV_RAM) is also included, this
|
|
|
|
also enables initial RAM disk (initrd) support and adds
|
|
|
|
15 Kbytes (more on some other architectures) to the kernel size.
|
|
|
|
|
|
|
|
If unsure say Y.
|
|
|
|
|
2007-02-10 04:44:43 -05:00
|
|
|
if BLK_DEV_INITRD
|
|
|
|
|
2005-08-10 14:44:50 -04:00
|
|
|
source "usr/Kconfig"
|
|
|
|
|
2007-02-10 04:44:43 -05:00
|
|
|
endif
|
|
|
|
|
Move size optimization option outside of EMBEDDED menu, mark it EXPERIMENTAL
Also, disable on sparc64 - a number of people report breakage. Probably
a compiler bug, but it's quite possible that it tickles some latent
kernel problem too.
It still defaults to 'y' everywhere else (when enabled through
EXPERIMENTAL), and Dave Jones points out that Fedora (and RHEL4) has
been building with size optimizations for a long time on x86, x86-64,
ia64, s390, s390x, ppc32 and ppc64. So it is really only moderately
experimental, but the sparc64 breakage certainly shows that it can
trigger "issues".
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-14 21:52:21 -05:00
|
|
|
config CC_OPTIMIZE_FOR_SIZE
|
2008-04-27 19:39:43 -04:00
|
|
|
bool "Optimize for size"
|
2005-12-14 21:52:21 -05:00
|
|
|
default y
|
|
|
|
help
|
|
|
|
Enabling this option will pass "-Os" instead of "-O2" to gcc
|
|
|
|
resulting in a smaller kernel.
|
|
|
|
|
2008-07-15 18:31:16 -04:00
|
|
|
If unsure, say Y.
|
2005-12-14 21:52:21 -05:00
|
|
|
|
2006-10-01 02:28:13 -04:00
|
|
|
config SYSCTL
|
|
|
|
bool
|
|
|
|
|
2009-03-10 15:55:46 -04:00
|
|
|
config ANON_INODES
|
|
|
|
bool
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
menuconfig EMBEDDED
|
|
|
|
bool "Configure standard kernel features (for small systems)"
|
|
|
|
help
|
|
|
|
This option allows certain base kernel options and settings
|
|
|
|
to be disabled or tweaked. This is for specialized
|
|
|
|
environments which can tolerate a "non-standard" kernel.
|
|
|
|
Only use this if you really know what you are doing.
|
|
|
|
|
2006-09-16 15:15:53 -04:00
|
|
|
config UID16
|
|
|
|
bool "Enable 16-bit UID system calls" if EMBEDDED
|
2008-04-26 06:17:12 -04:00
|
|
|
depends on ARM || BLACKFIN || CRIS || FRV || H8300 || X86_32 || M68K || (S390 && !64BIT) || SUPERH || SPARC32 || (SPARC64 && COMPAT) || UML || (X86_64 && IA32_EMULATION)
|
2006-09-16 15:15:53 -04:00
|
|
|
default y
|
|
|
|
help
|
|
|
|
This enables the legacy 16-bit UID syscall wrappers.
|
|
|
|
|
2006-09-27 04:51:04 -04:00
|
|
|
config SYSCTL_SYSCALL
|
2006-10-01 02:28:13 -04:00
|
|
|
bool "Sysctl syscall support" if EMBEDDED
|
2006-11-08 20:44:51 -05:00
|
|
|
default y
|
2006-09-27 04:51:04 -04:00
|
|
|
select SYSCTL
|
2006-09-16 15:15:53 -04:00
|
|
|
---help---
|
2006-11-08 20:44:51 -05:00
|
|
|
sys_sysctl uses binary paths that have been found challenging
|
|
|
|
to properly maintain and use. The interface in /proc/sys
|
|
|
|
using paths with ascii names is now the primary path to this
|
|
|
|
information.
|
2006-09-27 04:51:04 -04:00
|
|
|
|
2006-11-08 20:44:51 -05:00
|
|
|
Almost nothing uses the binary sysctl interface, so if you are
|
|
|
|
trying to save some space it is probably safe to disable this,
|
|
|
|
making your kernel marginally smaller.
|
2006-09-27 04:51:04 -04:00
|
|
|
|
2006-11-08 20:44:51 -05:00
|
|
|
If unsure say Y here.
|
2006-09-16 15:15:53 -04:00
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config KALLSYMS
|
2006-12-12 13:25:11 -05:00
|
|
|
bool "Load all symbols for debugging/ksymoops" if EMBEDDED
|
2005-04-16 18:20:36 -04:00
|
|
|
default y
|
|
|
|
help
|
|
|
|
Say Y here to let the kernel print out symbolic crash information and
|
|
|
|
symbolic stack backtraces. This increases the size of the kernel
|
|
|
|
somewhat, as all symbols have to be loaded into the kernel image.
|
|
|
|
|
|
|
|
config KALLSYMS_ALL
|
|
|
|
bool "Include all symbols in kallsyms"
|
|
|
|
depends on DEBUG_KERNEL && KALLSYMS
|
|
|
|
help
|
|
|
|
Normally kallsyms only contains the symbols of functions, for nicer
|
|
|
|
OOPS messages. Some debuggers can use kallsyms for other
|
2005-07-19 23:43:05 -04:00
|
|
|
symbols too: say Y here to include all symbols, if you need them
|
|
|
|
and you don't care about adding 300k to the size of your kernel.
|
2005-04-16 18:20:36 -04:00
|
|
|
|
|
|
|
Say N.
|
|
|
|
|
|
|
|
config KALLSYMS_EXTRA_PASS
|
|
|
|
bool "Do an extra kallsyms pass"
|
|
|
|
depends on KALLSYMS
|
|
|
|
help
|
|
|
|
If kallsyms is not working correctly, the build will fail with
|
|
|
|
inconsistent kallsyms data. If that occurs, log a bug report and
|
|
|
|
turn on KALLSYMS_EXTRA_PASS which should result in a stable build.
|
|
|
|
Always say N here unless you find a bug in kallsyms, which must be
|
|
|
|
reported. KALLSYMS_EXTRA_PASS is only a temporary workaround while
|
|
|
|
you wait for kallsyms to be fixed.
|
|
|
|
|
2005-05-01 11:59:02 -04:00
|
|
|
|
2005-11-16 14:27:07 -05:00
|
|
|
config HOTPLUG
|
|
|
|
bool "Support for hot-pluggable devices" if EMBEDDED
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
This option is provided for the case where no hotplug or uevent
|
|
|
|
capabilities are wanted by the kernel. You should only consider
|
|
|
|
disabling this option for embedded systems that do not use modules, a
|
|
|
|
dynamic /dev tree, or dynamic device discovery. Just say Y.
|
|
|
|
|
2005-05-01 11:59:02 -04:00
|
|
|
config PRINTK
|
|
|
|
default y
|
|
|
|
bool "Enable support for printk" if EMBEDDED
|
|
|
|
help
|
|
|
|
This option enables normal printk support. Removing it
|
|
|
|
eliminates most of the message strings from the kernel image
|
|
|
|
and makes the kernel more or less silent. As this makes it
|
|
|
|
very difficult to diagnose system problems, saying N here is
|
|
|
|
strongly discouraged.
|
|
|
|
|
2005-05-01 11:59:01 -04:00
|
|
|
config BUG
|
|
|
|
bool "BUG() support" if EMBEDDED
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
Disabling this option eliminates support for BUG and WARN, reducing
|
|
|
|
the size of your kernel image and potentially quietly ignoring
|
|
|
|
numerous fatal conditions. You should only consider disabling this
|
|
|
|
option for embedded systems with no facilities for reporting errors.
|
|
|
|
Just say Y.
|
|
|
|
|
2006-01-08 04:05:25 -05:00
|
|
|
config ELF_CORE
|
|
|
|
default y
|
|
|
|
bool "Enable ELF core dumps" if EMBEDDED
|
|
|
|
help
|
|
|
|
Enable support for generating core dumps. Disabling saves about 4k.
|
|
|
|
|
2008-05-07 06:39:56 -04:00
|
|
|
config PCSPKR_PLATFORM
|
|
|
|
bool "Enable PC-Speaker support" if EMBEDDED
|
|
|
|
depends on ALPHA || X86 || MIPS || PPC_PREP || PPC_CHRP || PPC_PSERIES
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
This option allows you to disable the internal PC-Speaker
|
|
|
|
support, saving some memory.
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config BASE_FULL
|
|
|
|
default y
|
|
|
|
bool "Enable full-sized data structures for core" if EMBEDDED
|
|
|
|
help
|
|
|
|
Disabling this option reduces the size of miscellaneous core
|
|
|
|
kernel data structures. This saves memory on small machines,
|
|
|
|
but may reduce performance.
|
|
|
|
|
|
|
|
config FUTEX
|
|
|
|
bool "Enable futex support" if EMBEDDED
|
|
|
|
default y
|
2006-06-27 05:54:53 -04:00
|
|
|
select RT_MUTEXES
|
2005-04-16 18:20:36 -04:00
|
|
|
help
|
|
|
|
Disabling this option will cause the kernel to be built without
|
|
|
|
support for "fast userspace mutexes". The resulting kernel may not
|
|
|
|
run glibc-based applications correctly.
|
|
|
|
|
|
|
|
config EPOLL
|
|
|
|
bool "Enable eventpoll support" if EMBEDDED
|
|
|
|
default y
|
2007-07-31 03:39:10 -04:00
|
|
|
select ANON_INODES
|
2005-04-16 18:20:36 -04:00
|
|
|
help
|
|
|
|
Disabling this option will cause the kernel to be built without
|
|
|
|
support for the epoll family of system calls.
|
|
|
|
|
signal/timer/event: signalfd core
This patch series implements the new signalfd() system call.
I took part of the original Linus code (and you know how badly it can be
broken :), and I added even more breakage ;) Signals are fetched from the same
signal queue used by the process, so signalfd will compete with standard
kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
seems to be working fine on my Dual Opteron machine. I made a quick test
program for it:
http://www.xmailserver.org/signafd-test.c
The signalfd() system call implements signal delivery into a file descriptor
receiver. The signalfd file descriptor is created with the following API:
int signalfd(int ufd, const sigset_t *mask, size_t masksize);
The "ufd" parameter allows changing an existing signalfd sigmask without going
through a close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
signalfd file.
The "mask" parameter specifies the signal mask of signals that we are interested
in. The "masksize" parameter is the size of "mask".
The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
will return POLLIN when signals are available to be dequeued. As a direct
consequence of supporting the Linux poll subsystem, the signalfd fd can be
used together with epoll(2) too.
The read(2) system call will return a "struct signalfd_siginfo" structure in
the userspace supplied buffer. The return value is the number of bytes copied
in the supplied buffer, or -1 in case of error. The read(2) call can also
return 0, in case the sighand structure to which the signalfd was attached,
has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
return -EAGAIN in case no signal is available.
If the size of the buffer passed to read(2) is lower than sizeof(struct
signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
return -ERESTARTSYS in case a signal hits the process. The format of struct
signalfd_siginfo is shown below; the valid fields depend on the (->code &
__SI_MASK) value, in the same way as for a struct siginfo:
struct signalfd_siginfo {
__u32 signo; /* si_signo */
__s32 err; /* si_errno */
__s32 code; /* si_code */
__u32 pid; /* si_pid */
__u32 uid; /* si_uid */
__s32 fd; /* si_fd */
__u32 tid; /* si_tid */
__u32 band; /* si_band */
__u32 overrun; /* si_overrun */
__u32 trapno; /* si_trapno */
__s32 status; /* si_status */
__s32 svint; /* si_int */
__u64 svptr; /* si_ptr */
__u64 utime; /* si_utime */
__u64 stime; /* si_stime */
__u64 addr; /* si_addr */
};
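A minimal userspace sketch of the flow described above, under stated assumptions. It uses the present-day glibc wrapper from <sys/signalfd.h>, whose third argument is a flags word rather than the "masksize" shown in this patch, and whose structure fields carry an ssi_ prefix (ssi_signo rather than signo). The helper name read_signal_via_fd is hypothetical and is not part of this patch.

```c
#define _GNU_SOURCE /* expose POSIX/Linux declarations under strict -std= modes */
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: block `signo`, send it to this process, and fetch it
 * back through a signalfd. Returns the signal number read, or -1 on failure.
 * The signal is left blocked on return.
 */
int read_signal_via_fd(int signo)
{
    sigset_t mask;
    struct signalfd_siginfo info;
    ssize_t n;
    int sfd;

    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* Block the signal so normal delivery does not compete with the fd. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;

    /* Passing -1 as the first argument asks for a brand new signalfd. */
    sfd = signalfd(-1, &mask, 0);
    if (sfd == -1)
        return -1;

    /* The signal stays pending because it is blocked. */
    if (kill(getpid(), signo) == -1) {
        close(sfd);
        return -1;
    }

    /* read(2) dequeues one pending signal into the supplied buffer. */
    n = read(sfd, &info, sizeof(info));
    close(sfd);
    if (n != (ssize_t)sizeof(info))
        return -1;

    return (int)info.ssi_signo;
}
```

Calling read_signal_via_fd(SIGUSR1) from a single-threaded program should hand back SIGUSR1 without any signal handler running, which is the point of blocking the signal first.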
[akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 01:23:13 -04:00
|
|
|
config SIGNALFD
|
|
|
|
bool "Enable signalfd() system call" if EMBEDDED
|
2007-07-31 03:39:10 -04:00
|
|
|
select ANON_INODES
|
2007-05-11 01:23:13 -04:00
|
|
|
default y
|
|
|
|
help
|
|
|
|
Enable the signalfd() system call, which allows receiving signals
|
|
|
|
on a file descriptor.
|
|
|
|
|
|
|
|
If unsure, say Y.
|
|
|
|
|
signal/timer/event: timerfd core
This patch introduces a new system call for timer events delivered through
file descriptors. This allows timer events to be used with standard POSIX
poll(2), select(2) and read(2). As a consequence of supporting the Linux
f_op->poll subsystem, they can be used with epoll(2) too.
The system call is defined as:
int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);
The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
without going through the close/open cycle (same as signalfd). If "ufd" is -1,
a new file descriptor will be created, otherwise the existing "ufd" will be
re-programmed.
The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
specified in the "utmr->it_value" parameter is the expiry time for the timer.
If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
otherwise it's a relative time.
If the time specified in "utmr->it_interval" is not zero (zero meaning
tv_sec == 0 and tv_nsec == 0), it is the period at which subsequent ticks are
generated.
The "utmr->it_interval" should be set to zero if only one tick is requested.
Setting the "utmr->it_value" to zero will disable the timer, or will create a
timerfd without the timer enabled.
The function returns the new (or same, in case "ufd" is a valid timerfd
descriptor) file, or -1 in case of error.
As stated before, the timerfd file descriptor supports poll(2), select(2) and
epoll(2). When a timer event has occurred on the timerfd, a POLLIN mask will be
returned.
The read(2) call can be used, and it will return a u32 variable holding the
number of "ticks" that happened on the interface since the last call to
read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN will
be returned if no ticks happened.
A quick test program shows timerfd working correctly on my amd64 box:
http://www.xmailserver.org/timerfd-test.c
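For readers who want to try a one-shot timer, here is a minimal sketch under stated assumptions. It does not use the single timerfd() call defined in this patch; it uses the timerfd_create() and timerfd_settime() pair that current glibc exposes in <sys/timerfd.h>, where read(2) yields an 8-byte tick count rather than the u32 described above. The helper name wait_one_shot_ms is hypothetical.

```c
#define _GNU_SOURCE /* expose POSIX/Linux declarations under strict -std= modes */
#include <stdint.h>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: arm a one-shot relative timer and block until it
 * fires. Returns the number of ticks read, or -1 on failure.
 */
long wait_one_shot_ms(long delay_ms)
{
    struct itimerspec its = {0};
    uint64_t ticks = 0;
    ssize_t n;
    int tfd;

    /* A zero it_value would leave the timer disarmed and read(2) would block. */
    if (delay_ms <= 0)
        return -1;

    tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (tfd == -1)
        return -1;

    /* Relative expiry time; it_interval stays zero, so only one tick occurs. */
    its.it_value.tv_sec = delay_ms / 1000;
    its.it_value.tv_nsec = (delay_ms % 1000) * 1000000L;

    if (timerfd_settime(tfd, 0, &its, NULL) == -1) {
        close(tfd);
        return -1;
    }

    /* Blocks until the timer has expired at least once. */
    n = read(tfd, &ticks, sizeof(ticks));
    close(tfd);

    return n == (ssize_t)sizeof(ticks) ? (long)ticks : -1;
}
```

Because the descriptor is an ordinary fd, the blocking read(2) here could be replaced by poll(2) or epoll(2) waiting for POLLIN, as the text above describes.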
[akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 01:23:16 -04:00
|
|
|
config TIMERFD
|
|
|
|
bool "Enable timerfd() system call" if EMBEDDED
|
2007-07-31 03:39:10 -04:00
|
|
|
select ANON_INODES
|
2007-05-11 01:23:16 -04:00
|
|
|
default y
|
|
|
|
help
|
|
|
|
Enable the timerfd() system call, which allows receiving timer
|
|
|
|
events on a file descriptor.
|
|
|
|
|
|
|
|
If unsure, say Y.
|
|
|
|
|
signal/timer/event: eventfd core
This is a very simple and light file descriptor, that can be used as event
wait/dispatch by userspace (both wait and dispatch) and by the kernel
(dispatch only). It can be used instead of pipe(2) in all cases where those
would simply be used to signal events. Their kernel overhead is much lower
than pipes, and they do not consume two fds. When used in the kernel, it can
offer an fd-bridge to enable, for example, functionalities like KAIO or
syslets/threadlets to signal to an fd the completion of certain operations.
More generally, an eventfd can be used by the kernel to signal readiness,
in a POSIX poll/select way, of interfaces that would otherwise be incompatible
with it. The API is:
int eventfd(unsigned int count);
The eventfd API accepts an initial "count" parameter, and returns an eventfd
fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).
The POLLIN flag is raised when the internal counter is greater than zero.
The POLLOUT flag is raised when at least a value of "1" can be written to the
internal counter.
The POLLERR flag is raised when an overflow in the counter value is detected.
The write(2) operation can never overflow the counter, since it blocks (unless
O_NONBLOCK is set, in which case -EAGAIN is returned).
But the eventfd_signal() function can do it, since it's supposed to not sleep
during its operation.
The read(2) function reads the __u64 counter value, and resets the internal
value to zero. If the value read is equal to (__u64) -1, an overflow happened
on the internal counter (due to 2^64 eventfd_signal() posts that have never
been retired - unlikely, but possible).
The write(2) call writes an __u64 count value, and adds it to the current
counter. The eventfd fd supports O_NONBLOCK also.
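The userspace counter semantics above can be sketched as follows, under stated assumptions. The sketch uses the current glibc wrapper from <sys/eventfd.h>, which takes an extra flags argument that the eventfd(count) call in this patch does not have. The helper name post_and_drain is hypothetical.

```c
#define _GNU_SOURCE /* expose POSIX/Linux declarations under strict -std= modes */
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: post two values to a fresh eventfd and drain the
 * counter with a single read(2). Returns the total read, or -1 on failure.
 */
long long post_and_drain(unsigned int a, unsigned int b)
{
    uint64_t first = a, second = b, total = 0;
    int efd;
    int ok;

    /* A zero total would leave the counter at zero and make read(2) block. */
    if (a == 0 && b == 0)
        return -1;

    /* Initial count of zero, no flags. */
    efd = eventfd(0, 0);
    if (efd == -1)
        return -1;

    /* Each write(2) adds its 8-byte value to the counter; read(2) resets it. */
    ok = write(efd, &first, sizeof(first)) == (ssize_t)sizeof(first) &&
         write(efd, &second, sizeof(second)) == (ssize_t)sizeof(second) &&
         read(efd, &total, sizeof(total)) == (ssize_t)sizeof(total);

    close(efd);
    return ok ? (long long)total : -1;
}
```

The two writes are coalesced into one counter value, which is why a single read(2) is enough to drain them; a pipe used for the same purpose would need two fds and one byte per event.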
On the kernel side, we have:
struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);
The eventfd_fget() should be called to get a struct file* from an eventfd fd
(this is an fget() + check of f_op being an eventfd fops pointer).
The kernel can then call eventfd_signal() every time it wants to post an event
to userspace. The eventfd_signal() function can be called from any context.
An eventfd() simple test and bench is available here:
http://www.xmailserver.org/eventfd-bench.c
This is the eventfd-based version of pipetest-4 (pipe(2) based):
http://www.xmailserver.org/pipetest-4.c
Not that performance matters much in the eventfd case, but eventfd-bench
shows almost double the performance of pipetest-4.
[akpm@linux-foundation.org: fix i386 build]
[akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 01:23:19 -04:00
|
|
|
config EVENTFD
|
|
|
|
bool "Enable eventfd() system call" if EMBEDDED
|
2007-07-31 03:39:10 -04:00
|
|
|
select ANON_INODES
|
2007-05-11 01:23:19 -04:00
|
|
|
default y
|
|
|
|
help
|
|
|
|
Enable the eventfd() system call, which allows receiving both
|
|
|
|
kernel notifications (e.g. KAIO) and userspace notifications.
|
|
|
|
|
|
|
|
If unsure, say Y.
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config SHMEM
|
|
|
|
bool "Use full shmem filesystem" if EMBEDDED
|
|
|
|
default y
|
|
|
|
depends on MMU
|
|
|
|
help
|
|
|
|
The shmem is an internal filesystem used to manage shared memory.
|
|
|
|
It is backed by swap and manages resource limits. It is also exported
|
|
|
|
to userspace as tmpfs if TMPFS is enabled. Disabling this
|
|
|
|
option replaces shmem and tmpfs with the much simpler ramfs code,
|
|
|
|
which may be appropriate on small systems without swap.
|
|
|
|
|
2008-10-16 01:05:12 -04:00
|
|
|
config AIO
|
|
|
|
bool "Enable AIO support" if EMBEDDED
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
This option enables POSIX asynchronous I/O which may be used
|
|
|
|
by some high performance threaded applications. Disabling
|
|
|
|
this option saves about 7k.
|
|
|
|
|
perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has outgrown its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in all, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks go to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 06:02:48 -04:00
|
|
|
config HAVE_PERF_EVENTS
|
2008-12-04 14:12:29 -05:00
|
|
|
bool
|
2009-06-12 13:17:43 -04:00
|
|
|
help
|
|
|
|
See tools/perf/design.txt for details.
|
2008-12-04 14:12:29 -05:00
|
|
|
|
2009-09-21 10:08:49 -04:00
|
|
|
config PERF_USE_VMALLOC
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
See tools/perf/design.txt for details.
|
|
|
|
|
2009-09-21 06:20:38 -04:00
|
|
|
menu "Kernel Performance Events And Counters"
|
2008-12-04 14:12:29 -05:00
|
|
|
|
2009-09-21 06:02:48 -04:00
|
|
|
config PERF_EVENTS
|
2009-09-21 06:20:38 -04:00
|
|
|
bool "Kernel performance events and counters"
|
|
|
|
default y if (PROFILING || PERF_COUNTERS)
|
2009-09-21 06:02:48 -04:00
|
|
|
depends on HAVE_PERF_EVENTS
|
2008-12-08 13:38:33 -05:00
|
|
|
select ANON_INODES
|
2008-12-04 14:12:29 -05:00
|
|
|
help
|
2009-09-21 06:20:38 -04:00
|
|
|
Enable kernel support for various performance events provided
|
|
|
|
by software and hardware.
|
2008-12-04 14:12:29 -05:00
|
|
|
|
2009-09-21 06:20:38 -04:00
|
|
|
Software events are supported either built-in or via the
|
|
|
|
use of generic tracepoints.
|
2008-12-04 14:12:29 -05:00
|
|
|
|
2009-09-21 06:20:38 -04:00
|
|
|
Most modern CPUs support performance events via performance
|
|
|
|
counter registers. These registers count the number of certain
|
2008-12-04 14:12:29 -05:00
|
|
|
types of hw events, such as instructions executed, cache misses
|
|
|
|
suffered, or branches mis-predicted - without slowing down the
|
|
|
|
kernel or applications. These registers can also trigger interrupts
|
|
|
|
when a threshold number of events have passed - and can thus be
|
|
|
|
used to profile the code that runs on that CPU.
|
|
|
|
|
2009-09-21 06:20:38 -04:00
|
|
|
The Linux Performance Event subsystem provides an abstraction of
|
|
|
|
these software and hardware event capabilities, available via a
|
|
|
|
system call and used by the "perf" utility in tools/perf/. It
|
2008-12-04 14:12:29 -05:00
|
|
|
provides per task and per CPU counters, and it provides event
|
|
|
|
capabilities on top of those.
|
|
|
|
|
|
|
|
Say Y if unsure.
|
|
|
|
|
2009-03-19 15:26:17 -04:00
|
|
|
config EVENT_PROFILE
|
2009-07-29 04:50:09 -04:00
|
|
|
bool "Tracepoint profiling sources"
|
2009-09-21 06:02:48 -04:00
|
|
|
depends on PERF_EVENTS && EVENT_TRACING
|
2009-03-19 15:26:17 -04:00
|
|
|
default y
|
2009-07-29 04:50:09 -04:00
|
|
|
help
|
2009-09-21 06:20:38 -04:00
|
|
|
Allow the use of tracepoints as software performance events.
|
2009-07-29 04:50:09 -04:00
|
|
|
|
2009-09-21 06:20:38 -04:00
|
|
|
When this is enabled, you can create perf events based on
|
2009-07-29 04:50:09 -04:00
|
|
|
tracepoints using PERF_TYPE_TRACEPOINT and the tracepoint ID
|
|
|
|
found in debugfs://tracing/events/*/*/id. (The -e/--events
|
|
|
|
option to the perf tool can parse and interpret symbolic
|
|
|
|
tracepoints, in the subsystem:tracepoint_name format.)
|
2009-03-19 15:26:17 -04:00
|
|
|
|
2009-09-21 06:20:38 -04:00
|
|
|
config PERF_COUNTERS
|
|
|
|
bool "Kernel performance counters (old config option)"
|
|
|
|
depends on HAVE_PERF_EVENTS
|
|
|
|
help
|
|
|
|
This config has been obsoleted by the PERF_EVENTS
|
|
|
|
config option - please see that one for details.
|
|
|
|
|
|
|
|
It has no effect on the kernel whether you enable
|
|
|
|
it or not; it is a compatibility placeholder.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2009-09-21 10:08:49 -04:00
|
|
|
config DEBUG_PERF_USE_VMALLOC
|
|
|
|
default n
|
|
|
|
bool "Debug: use vmalloc to back perf mmap() buffers"
|
|
|
|
depends on PERF_EVENTS && DEBUG_KERNEL
|
|
|
|
select PERF_USE_VMALLOC
|
|
|
|
help
|
|
|
|
Use vmalloc memory to back perf mmap() buffers.
|
|
|
|
|
|
|
|
Mostly useful for debugging the vmalloc code on platforms
|
|
|
|
that don't require it.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2008-12-04 14:12:29 -05:00
|
|
|
endmenu
|
|
|
|
|
2006-06-30 04:55:45 -04:00
|
|
|
config VM_EVENT_COUNTERS
|
|
|
|
default y
|
|
|
|
bool "Enable VM event counters for /proc/vmstat" if EMBEDDED
|
|
|
|
help
|
2006-12-22 04:06:10 -05:00
|
|
|
VM event counters are needed for event counts to be shown.
|
|
|
|
This option allows the disabling of the VM event counters
|
|
|
|
on EMBEDDED systems. /proc/vmstat will only show page counts
|
|
|
|
if VM event counters are disabled.
|
2006-06-30 04:55:45 -04:00
|
|
|
|
2008-08-19 04:28:24 -04:00
|
|
|
config PCI_QUIRKS
|
|
|
|
default y
|
2008-10-22 02:53:25 -04:00
|
|
|
bool "Enable PCI quirk workarounds" if EMBEDDED
|
|
|
|
depends on PCI
|
2008-08-19 04:28:24 -04:00
|
|
|
help
|
|
|
|
This enables workarounds for various PCI chipset
|
|
|
|
bugs/quirks. Disable this only if your target machine is
|
|
|
|
unaffected by PCI quirks.
|
|
|
|
|
2007-05-09 05:32:44 -04:00
|
|
|
config SLUB_DEBUG
|
|
|
|
default y
|
|
|
|
bool "Enable SLUB debugging support" if EMBEDDED
|
2008-04-29 19:16:06 -04:00
|
|
|
depends on SLUB && SYSFS
|
2007-05-09 05:32:44 -04:00
|
|
|
help
|
|
|
|
SLUB has extensive debug support features. Disabling these can
|
|
|
|
result in significant savings in code size. This also disables
|
|
|
|
SLUB sysfs support. /sys/slab will not exist and there will be
|
|
|
|
no support for cache validation etc.
|
|
|
|
|
2009-03-10 15:55:46 -04:00
|
|
|
config COMPAT_BRK
|
|
|
|
bool "Disable heap randomization"
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
Randomizing heap placement makes heap exploits harder, but it
|
|
|
|
also breaks ancient binaries (including anything libc5 based).
|
|
|
|
This option changes the bootup default to heap randomization
|
2009-01-26 05:12:25 -05:00
|
|
|
disabled, and can be overridden at runtime by setting
|
2009-03-10 15:55:46 -04:00
|
|
|
/proc/sys/kernel/randomize_va_space to 2.
|
|
|
|
|
|
|
|
On non-ancient distros (post-2000 ones) N is usually a safe choice.
|
|
|
|
|
2007-05-06 17:49:36 -04:00
|
|
|
choice
|
|
|
|
prompt "Choose SLAB allocator"
|
2007-07-17 07:03:32 -04:00
|
|
|
default SLUB
|
2007-05-06 17:49:36 -04:00
|
|
|
help
|
|
|
|
This option allows you to select a slab allocator.
|
|
|
|
|
|
|
|
config SLAB
|
|
|
|
bool "SLAB"
|
|
|
|
help
|
|
|
|
The regular slab allocator that is established and known to work
|
2007-05-09 05:32:47 -04:00
|
|
|
well in all environments. It organizes cache hot objects in
|
2008-11-05 17:18:19 -05:00
|
|
|
per cpu and per node queues.
|
2007-05-06 17:49:36 -04:00
|
|
|
|
|
|
|
config SLUB
|
|
|
|
bool "SLUB (Unqueued Allocator)"
|
|
|
|
help
|
|
|
|
SLUB is a slab allocator that minimizes cache line usage
|
|
|
|
instead of managing queues of cached objects (SLAB approach).
|
|
|
|
Per cpu caching is realized using slabs of objects instead
|
|
|
|
of queues of objects. SLUB can use memory efficiently
|
2008-11-05 17:18:19 -05:00
|
|
|
and has enhanced diagnostics. SLUB is the default choice for
|
|
|
|
a slab allocator.
|
2007-05-06 17:49:36 -04:00
|
|
|
|
|
|
|
config SLOB
|
2007-07-16 02:38:24 -04:00
|
|
|
depends on EMBEDDED
|
2007-05-06 17:49:36 -04:00
|
|
|
bool "SLOB (Simple Allocator)"
|
|
|
|
help
|
2008-02-05 01:29:38 -05:00
|
|
|
SLOB replaces the stock allocator with a drastically simpler
|
|
|
|
allocator. SLOB is generally more space efficient but
|
|
|
|
does not perform as well on large systems.
|
2007-05-06 17:49:36 -04:00
|
|
|
|
|
|
|
endchoice
|
|
|
|
|
2008-02-02 15:10:36 -05:00
|
|
|
config PROFILING
|
|
|
|
bool "Profiling support (EXPERIMENTAL)"
|
|
|
|
help
|
|
|
|
Say Y here to enable the extended profiling support mechanisms used
|
|
|
|
by profilers such as OProfile.
|
|
|
|
|
2008-07-23 08:15:22 -04:00
|
|
|
#
|
|
|
|
# Place an empty function call at each tracepoint site. Can be
|
|
|
|
# dynamically changed for a probe function.
|
|
|
|
#
|
tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired by the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adding a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, whose
prototypes are described in a tracepoint declaration placed in a header
file."
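The "on"/"off" behaviour described in the quoted documentation can be illustrated with a toy userspace model. This is only a sketch in Python under stated assumptions: the class Tracepoint, its methods and the probe record_switch are hypothetical names, not the kernel API, and the model ignores RCU and preemption entirely.

```python
class Tracepoint:
    """Toy model of a tracepoint: a named hook with zero or more probes."""

    def __init__(self, name):
        self.name = name
        # Immutable tuple, replaced wholesale on every update so that a
        # caller iterating over the old tuple is never disturbed.
        self._probes = ()

    @property
    def enabled(self):
        # "on" means at least one probe is connected.
        return bool(self._probes)

    def register_probe(self, probe):
        self._probes = self._probes + (probe,)

    def unregister_probe(self, probe):
        self._probes = tuple(p for p in self._probes if p is not probe)

    def __call__(self, *args):
        probes = self._probes
        if not probes:
            # "off": the only cost is this check.
            return
        for probe in probes:
            # "on": each probe runs in the caller's context, then
            # execution continues after the tracepoint site.
            probe(*args)


seen = []


def record_switch(prev_task, next_task):
    # Hypothetical probe: just remembers its arguments.
    seen.append((prev_task, next_task))


sched_switch = Tracepoint("sched_switch")  # the name is only a label here
sched_switch("idle", "bash")               # off: nothing recorded
sched_switch.register_probe(record_switch)
sched_switch("bash", "make")               # on: probe is called
sched_switch.unregister_probe(record_switch)
sched_switch("make", "idle")               # off again
print(seen)
# → [('bash', 'make')]
```

Only the call made while the probe was registered is recorded, matching the description that an "off" tracepoint has no effect beyond the branch check.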
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure no type mismatch happens due to
connection of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch. This
removes the memory read, which makes the i-cache impact smaller:
replacing the memory read with a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64. It
also saves the d-cache hit.
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was compiled with CONFIG_MARKERS enabled.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below are the results. As you can see, no major performance regression
was found in any case. Even as the number of processes increases, the
difference between the marker-enabled kernel and the marker-disabled
kernel doesn't increase. Moreover, as the number of CPUs increases, the
difference doesn't increase either.
Curiously, the marker-enabled kernel is better than the marker-disabled
kernel in more than half of the cases, although I guess it comes from
the difference in memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------
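The two "diff" columns appear to be the absolute time difference and that difference relative to the run without markers. The short Python sketch below reproduces the first row of the 2-CPU table under that assumption; the function name marker_overhead is hypothetical, and other rows can differ in the last digit, presumably because the published averages are rounded.

```python
def marker_overhead(without_s, with_s):
    """Return (diff in seconds, diff in percent of the marker-less run)."""
    diff = with_s - without_s
    return diff, 100.0 * diff / without_s


# First row of the 2-CPU table: 50 processes, 4.811 s vs 4.872 s.
diff, pct = marker_overhead(4.811, 4.872)
print(f"{diff:+.3f} s, {pct:+.2f} %")
# → +0.061 s, +1.27 %
```

A positive value means the marker-enabled kernel was slower for that row, and a negative value means it was faster.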
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 12:16:16 -04:00
|
|
|
config TRACEPOINTS
|
2008-07-23 08:15:22 -04:00
|
|
|
bool
|
2008-07-18 12:16:16 -04:00
|
|
|
|
2008-02-02 15:10:33 -05:00
|
|
|
source "arch/Kconfig"
|
|
|
|
|
2009-04-03 11:42:35 -04:00
|
|
|
config SLOW_WORK
|
|
|
|
default n
|
2009-04-06 10:47:25 -04:00
|
|
|
bool
|
2009-04-03 11:42:35 -04:00
|
|
|
help
|
|
|
|
The slow work thread pool provides a number of dynamically allocated
|
|
|
|
threads that can be used by the kernel to perform operations that
|
|
|
|
take a relatively long time.
|
|
|
|
|
|
|
|
An example of this would be CacheFiles doing a path lookup followed
|
|
|
|
by a series of mkdirs and a create call, all of which have to touch
|
|
|
|
disk.
|
|
|
|
|
2009-04-06 10:47:25 -04:00
|
|
|
See Documentation/slow-work.txt.
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
endmenu # General setup
|
|
|
|
|
2008-06-29 06:18:46 -04:00
|
|
|
config HAVE_GENERIC_DMA_COHERENT
|
|
|
|
bool
|
|
|
|
default n
|
|
|
|
|
2008-01-02 16:04:48 -05:00
|
|
|
config SLABINFO
|
|
|
|
bool
|
|
|
|
depends on PROC_FS
|
2008-04-14 11:53:02 -04:00
|
|
|
depends on SLAB || SLUB_DEBUG
|
2008-01-02 16:04:48 -05:00
|
|
|
default y
|
|
|
|
|
2006-09-16 15:15:53 -04:00
|
|
|
config RT_MUTEXES
|
|
|
|
boolean
|
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config BASE_SMALL
|
|
|
|
int
|
|
|
|
default 0 if BASE_FULL
|
|
|
|
default 1 if !BASE_FULL
|
|
|
|
|
2007-07-16 02:39:29 -04:00
|
|
|
menuconfig MODULES
|
2005-04-16 18:20:36 -04:00
|
|
|
bool "Enable loadable module support"
|
|
|
|
help
|
|
|
|
Kernel modules are small pieces of compiled code which can
|
|
|
|
be inserted in the running kernel, rather than being
|
|
|
|
permanently built into the kernel. You use the "modprobe"
|
|
|
|
tool to add (and sometimes remove) them. If you say Y here,
|
|
|
|
many parts of the kernel can be built as modules (by
|
|
|
|
answering M instead of Y where indicated): this is most
|
|
|
|
useful for infrequently used options which are not required
|
|
|
|
for booting. For more information, see the man pages for
|
|
|
|
modprobe, lsmod, modinfo, insmod and rmmod.
|
|
|
|
|
|
|
|
If you say Y here, you will need to run "make
|
|
|
|
modules_install" to put the modules under /lib/modules/
|
|
|
|
where modprobe can find them (you may need to be root to do
|
|
|
|
this).
|
|
|
|
|
|
|
|
If unsure, say Y.
|
|
|
|
|
2008-08-04 13:31:32 -04:00
|
|
|
if MODULES
|
|
|
|
|
2008-05-04 20:04:16 -04:00
|
|
|
config MODULE_FORCE_LOAD
|
|
|
|
bool "Forced module loading"
|
|
|
|
default n
|
|
|
|
help
|
2008-05-09 02:25:28 -04:00
|
|
|
Allow loading of modules without version information (i.e. modprobe
|
|
|
|
--force). Forced module loading sets the 'F' (forced) taint flag and
|
|
|
|
is usually a really bad idea.
|
2008-05-04 20:04:16 -04:00
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config MODULE_UNLOAD
|
|
|
|
bool "Module unloading"
|
|
|
|
help
|
|
|
|
Without this option you will not be able to unload any
|
|
|
|
modules (note that some modules may not be unloadable
|
2008-07-22 20:24:26 -04:00
|
|
|
anyway), which makes your kernel smaller, faster
|
|
|
|
and simpler. If unsure, say Y.
|
2005-04-16 18:20:36 -04:00
|
|
|
|
|
|
|
config MODULE_FORCE_UNLOAD
|
|
|
|
bool "Forced module unloading"
|
|
|
|
depends on MODULE_UNLOAD && EXPERIMENTAL
|
|
|
|
help
|
|
|
|
This option allows you to force a module to unload, even if the
|
|
|
|
kernel believes it is unsafe: the kernel will remove the module
|
|
|
|
without waiting for anyone to stop using it (using the -f option to
|
|
|
|
rmmod). This is mainly for kernel developers and desperate users.
|
|
|
|
If unsure, say N.
|
|
|
|
|
|
|
|
config MODVERSIONS
|
2005-12-26 17:04:02 -05:00
|
|
|
bool "Module versioning support"
|
2005-04-16 18:20:36 -04:00
|
|
|
help
|
|
|
|
Usually, you have to use modules compiled with your kernel.
|
|
|
|
Saying Y here makes it sometimes possible to use modules
|
|
|
|
compiled for different kernels, by adding enough information
|
|
|
|
to the modules to (hopefully) spot any changes which would
|
|
|
|
make them incompatible with the kernel you are running. If
|
|
|
|
unsure, say N.
|
|
|
|
|
|
|
|
config MODULE_SRCVERSION_ALL
|
|
|
|
bool "Source checksum for all modules"
|
|
|
|
help
|
|
|
|
Modules which contain a MODULE_VERSION get an extra "srcversion"
|
|
|
|
field inserted into their modinfo section, which contains a
|
|
|
|
sum of the source files which made it. This helps maintainers
|
|
|
|
see exactly which source was used to build a module (since
|
|
|
|
others sometimes change the module source without updating
|
|
|
|
the version). With this option, such a "srcversion" field
|
|
|
|
will be created for all modules. If unsure, say N.
|
|
|
|
|
2008-08-04 13:31:32 -04:00
|
|
|
endif # MODULES
|
|
|
|
|
2008-12-13 05:49:41 -05:00
|
|
|
config INIT_ALL_POSSIBLE
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
Back when each arch used to define their own cpu_online_map and
|
|
|
|
cpu_possible_map, some of them chose to initialize cpu_possible_map
|
|
|
|
with all 1s, and others with all 0s. When they were centralised,
|
|
|
|
it was better to provide this option than to break all the archs
|
2009-01-26 05:12:25 -05:00
|
|
|
and have several arch maintainers pursuing me down dark alleys.
|
2008-12-13 05:49:41 -05:00
|
|
|
|
2005-04-16 18:20:36 -04:00
|
|
|
config STOP_MACHINE
|
|
|
|
bool
|
|
|
|
default y
|
|
|
|
depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
|
|
|
|
help
|
|
|
|
Need stop_machine() primitive.
|
2005-11-04 02:43:35 -05:00
|
|
|
|
|
|
|
source "block/Kconfig"
|
2007-10-17 02:27:31 -04:00
|
|
|
|
|
|
|
config PREEMPT_NOTIFIERS
|
|
|
|
bool
|
2008-01-25 15:08:24 -05:00
|
|
|
|