android_kernel_xiaomi_sm8350/kernel
Oleg Nesterov 762a24beed pid namespaces: rework forget_original_parent()
A pid namespace is a "view" of a particular set of tasks on the system.  Pid
namespaces work in a similar way to filesystem namespaces.  A file (or a
process) can be accessed in multiple namespaces, but it may have a different
name in each.  In a filesystem, this name might be /etc/passwd in one
namespace, but /chroot/etc/passwd in another.

A process may have pid 1234 in one namespace, but be pid 1 in another.  This
allows new pid namespaces to use basically arbitrary pids without having to
worry about what pids exist in other namespaces.  This is essential for
checkpoint/restart, where a restarted process's pid might otherwise collide
with the pid of an existing process on the system.
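
The per-namespace numbering described above can be sketched in userspace C.
This is only an illustrative model, not the kernel's data structures: the
names pidns_model, pid_entry, task_model and task_pid_in() are hypothetical,
and the fixed-size array of four entries is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are not kernel API. */
struct pidns_model {
	const char *name;
};

struct pid_entry {
	const struct pidns_model *ns;	/* namespace this number is valid in */
	int nr;				/* pid value seen inside that namespace */
};

struct task_model {
	size_t nr_entries;		/* number of used slots in entries[] */
	struct pid_entry entries[4];	/* one pid per namespace the task is in */
};

/* Return the task's pid as seen from ns, or 0 if it has no pid there. */
static int task_pid_in(const struct task_model *t, const struct pidns_model *ns)
{
	assert(t != NULL && ns != NULL);
	for (size_t i = 0; i < t->nr_entries; i++)
		if (t->entries[i].ns == ns)
			return t->entries[i].nr;
	return 0;
}
```

With a task registered as { &host, 1234 } and { &container, 1 },
task_pid_in() returns 1234 for host and 1 for container, so the two
namespaces never need to agree on a number.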

In this particular implementation, pid namespaces have a parent-child
relationship, just like processes.  A process in a pid namespace may see all
of the processes in the same namespace, as well as all of the processes in all
of the namespaces which are children of its namespace.  Processes may not,
however, see others which are in their parent's namespace, but not in their
own.  The same goes for sibling namespaces.
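
The visibility rule above can be modelled as an ancestor walk.  This is a
minimal sketch, assuming that "children" extends to all descendant
namespaces; ns_model and ns_can_see() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the namespace hierarchy. */
struct ns_model {
	const struct ns_model *parent;	/* NULL for the initial namespace */
};

/*
 * A task in `viewer` can see tasks in `target` if `viewer` is `target`
 * itself or one of its ancestors.  Parents and siblings of `viewer` are
 * never reached by the walk, so they stay hidden.
 */
static bool ns_can_see(const struct ns_model *viewer,
		       const struct ns_model *target)
{
	assert(viewer != NULL);
	for (const struct ns_model *ns = target; ns != NULL; ns = ns->parent)
		if (ns == viewer)
			return true;
	return false;
}
```

Walking upward from the target keeps the check linear in the nesting depth
and needs no list of children in each namespace.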

The known issue to be solved in the near future is signal handling at the
namespace boundary.  That is, currently the namespace's init is treated like
an ordinary task that can be killed from within a namespace.  Ideally, signal
handling by the namespace's init should have two sides: when signaling the
init from its own namespace, the init should look like a real init task, i.e.
receive only those signals that it explicitly wants to; when signaling the
init from one of the parent namespaces, init should look like an ordinary
task, i.e. receive any signal, only taking the general permissions into
account.
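
The intended two-sided behaviour can be written down as a small decision
function.  This is a sketch of the goal stated above, not of any existing
kernel code: the type and function names are hypothetical, and applying the
permission check to same-namespace senders as well is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sender_origin {
	SENDER_SAME_NS,		/* sender lives in the init's own namespace */
	SENDER_ANCESTOR_NS	/* sender lives in a parent namespace */
};

/* Hypothetical description of one signal aimed at a namespace's init. */
struct init_signal_query {
	enum sender_origin origin;
	bool init_has_handler;	/* init explicitly handles this signal */
	bool sender_permitted;	/* general permission checks passed */
};

static bool ns_init_should_receive(const struct init_signal_query *q)
{
	assert(q != NULL);
	if (!q->sender_permitted)
		return false;
	/* From inside: behave like a real init, accept only wanted signals. */
	if (q->origin == SENDER_SAME_NS)
		return q->init_has_handler;
	/* From a parent namespace: behave like an ordinary task. */
	return true;
}
```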

The pid namespace was developed by Pavel Emelyanov and Sukadev Bhattiprolu and
we eventually came to almost the same implementation, which differed in some
details.  This set is based on Pavel's patches, but it includes comments and
patches from Sukadev.

Many thanks to Oleg, who reviewed the patches, pointed out many BUGs and gave
valuable advice on how to make this set cleaner.

This patch:

We have to call exit_task_namespaces() only after the exiting task has
reparented all its children and is sure that no other threads will reparent
theirs to it.  Why this is needed is explained in the appropriate patch.  This
one only reworks forget_original_parent() so that, after calling it, a task
cannot be or become the parent of any other task.

We check PF_EXITING instead of ->exit_state while choosing the new parent.
Note that tasklist_lock acts as a barrier: everyone who takes tasklist_lock
after us (when forget_original_parent() drops it) must see PF_EXITING.
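
The parent selection described above can be modelled in userspace as a walk
over the exiting task's thread group.  This is a simplified sketch, not the
code from this patch: thread_model, pick_new_parent() and MODEL_PF_EXITING
are hypothetical names, locking is left out entirely, and falling back to a
namespace reaper when every thread is exiting is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PF_EXITING 0x4u	/* illustrative flag bit for this model */

struct thread_model {
	unsigned int flags;
	struct thread_model *next_thread; /* circular list of the group */
};

/*
 * Pick the first thread after `father` that is not exiting.  If the walk
 * comes back to `father`, every thread in the group is exiting, so the
 * children go to the namespace's reaper instead.
 */
static struct thread_model *pick_new_parent(struct thread_model *father,
					    struct thread_model *ns_reaper)
{
	assert(father != NULL && ns_reaper != NULL);
	for (struct thread_model *t = father->next_thread; t != father;
	     t = t->next_thread) {
		assert(t != NULL);
		if (!(t->flags & MODEL_PF_EXITING))
			return t;
	}
	return ns_reaper;
}
```

Testing the flag rather than a later exit state means a thread that has only
just started exiting is already skipped, so it can never be handed children
after it has finished reparenting its own.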

The other changes are just cleanups that move some code from exit_notify()
to forget_original_parent().  It is a bit silly to declare ptrace_dead in
exit_notify(), take tasklist_lock, pass ptrace_dead to
forget_original_parent(), unlock-lock-unlock tasklist_lock, and then use
ptrace_dead.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:38 -07:00
irq Compile handle_percpu_irq even for uniprocessor kernels 2007-10-17 08:43:00 -07:00
power Hibernation: Enter platform hibernation state in a consistent way 2007-10-18 14:37:20 -07:00
time kernel/time/clocksource.c: Use list_for_each_entry instead of list_for_each 2007-10-19 11:53:38 -07:00
.gitignore
acct.c whitespace fixes: process accounting 2007-10-18 14:37:24 -07:00
audit.c whitespace fixes: system auditing 2007-10-18 14:37:25 -07:00
audit.h
auditfilter.c whitespace fixes: audit filtering 2007-10-18 14:37:24 -07:00
auditsc.c whitespace fixes: syscall auditing 2007-10-18 14:37:25 -07:00
capability.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
cgroup.c Add cgroupstats 2007-10-19 11:53:36 -07:00
compat.c Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt 2007-10-18 15:12:41 -07:00
configs.c
cpu_acct.c Task Control Groups: example CPU accounting subsystem 2007-10-19 11:53:36 -07:00
cpu.c cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with CPU_UP_PREPARE 2007-10-18 14:37:21 -07:00
cpuset.c Task Control Groups: make cpusets a client of cgroups 2007-10-19 11:53:36 -07:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c pid namespaces: rework forget_original_parent() 2007-10-19 11:53:38 -07:00
extable.c
fork.c Make access to task's nsproxy lighter 2007-10-19 11:53:37 -07:00
futex_compat.c
futex.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
hrtimer.c hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier 2007-10-18 22:54:18 +02:00
itimer.c whitespace fixes: interval timers 2007-10-18 14:37:26 -07:00
kallsyms.c
Kconfig.hz
Kconfig.preempt Move PREEMPT_NOTIFIERS into an always-included Kconfig 2007-10-17 08:42:55 -07:00
kexec.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
kfifo.c
kmod.c
kprobes.c kprobes: support kretprobe blacklist 2007-10-16 09:43:10 -07:00
ksysfs.c add-vmcore: cleanup the coding style according to Andrew's comments 2007-10-17 08:42:54 -07:00
kthread.c
latency.c
lockdep_internals.h
lockdep_proc.c
lockdep.c workqueue: debug flushing deadlocks with lockdep 2007-10-19 11:53:38 -07:00
Makefile cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
module.c whitespace fixes: module loading 2007-10-18 14:37:25 -07:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c Add kernel/notifier.c 2007-10-19 11:53:34 -07:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c Make access to task's nsproxy lighter 2007-10-19 11:53:37 -07:00
panic.c whitespace fixes: panic handling 2007-10-18 14:37:25 -07:00
params.c param_sysfs_builtin memchr argument fix 2007-10-18 14:37:21 -07:00
pid.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
posix-cpu-timers.c
posix-timers.c hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier 2007-10-18 22:54:18 +02:00
printk.c serial: turn serial console suspend a boot rather than compile time option 2007-10-18 14:37:19 -07:00
profile.c make kernel/profile.c:time_hook static 2007-10-17 08:42:55 -07:00
ptrace.c m32r: convert to generic sys_ptrace 2007-10-16 09:43:04 -07:00
rcupdate.c Clean up duplicate includes in kernel/ 2007-10-17 08:42:48 -07:00
rcutorture.c Make rcutorture RNG use temporal entropy 2007-10-17 08:42:53 -07:00
relay.c whitespace fixes: relayfs 2007-10-18 14:37:24 -07:00
resource.c memory unplug: memory hotplug cleanup 2007-10-16 09:43:01 -07:00
rtmutex_common.h
rtmutex-debug.c kernel/rtmutex-debug.c: cleanups 2007-10-17 08:42:50 -07:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c
sched_debug.c sched: reduce schedstat variable overhead a bit 2007-10-18 21:32:56 +02:00
sched_fair.c sched: fix new task startup crash 2007-10-17 16:55:11 +02:00
sched_idletask.c
sched_rt.c
sched_stats.h sched: reduce schedstat variable overhead a bit 2007-10-18 21:32:56 +02:00
sched.c Task Control Groups: example CPU accounting subsystem 2007-10-19 11:53:36 -07:00
seccomp.c
signal.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
softirq.c
softlockup.c softlockup: add a /proc tuning parameter 2007-10-17 08:42:47 -07:00
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys_ni.c kernel/sys_ni.c: add dummy sys_ni_syscall() prototype 2007-10-17 08:42:55 -07:00
sys.c pid namespaces: round up the API 2007-10-19 11:53:37 -07:00
sysctl_check.c V3 file capabilities: alter behavior of cap_setpcap 2007-10-18 14:37:24 -07:00
sysctl.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
taskstats.c Add cgroupstats 2007-10-19 11:53:36 -07:00
time.c whitespace fixes: time syscalls 2007-10-18 14:37:24 -07:00
timer.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c
user_namespace.c
user.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched 2007-10-17 09:11:18 -07:00
utsname_sysctl.c
utsname.c
wait.c
workqueue.c workqueue: debug flushing deadlocks with lockdep 2007-10-19 11:53:38 -07:00