Historically we've seen cases where permissions are requested for classes
where they do not exist. In particular we have seen CIFS forget to set
i_mode to indicate it is a directory so when we later check something like
remove_name we have problems since it wasn't defined in tclass file. This
used to result in a avc which included the permission 0x2000 or something.
Currently the kernel will deny the operations (good thing) but will not
print ANY information (bad thing). First the auditdeny field is no
extended to include unknown permissions. After that is fixed the logic in
avc_dump_query to output this information isn't right since it will remove
the permission from the av and print the phrase "<NULL>". This takes us
back to the behavior before the classmap rewrite.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
For SELinux to do better filtering in userspace we send the name of the
module along with the AVC denial when a program is denied module_request.
Example output:
type=SYSCALL msg=audit(11/03/2009 10:59:43.510:9) : arch=x86_64 syscall=write success=yes exit=2 a0=3 a1=7fc28c0d56c0 a2=2 a3=7fffca0d7440 items=0 ppid=1727 pid=1729 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.nfsd exe=/usr/sbin/rpc.nfsd subj=system_u:system_r:nfsd_t:s0 key=(null)
type=AVC msg=audit(11/03/2009 10:59:43.510:9) : avc: denied { module_request } for pid=1729 comm=rpc.nfsd kmod="net-pf-10" scontext=system_u:system_r:nfsd_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=system
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
The SELinux dynamic class work in c6d3aaa4e3
creates a number of dynamic header files and scripts. Add .gitignore files
so git doesn't complain about these.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Ensure that we release the policy read lock on all exit paths from
security_compute_av.
Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Drop remapping of netlink classes and bypass of permission checking
based on netlink message type for policy version < 18. This removes
compatibility code introduced when the original single netlink
security class used for all netlink sockets was split into
finer-grained netlink classes based on netlink protocol and when
permission checking was added based on netlink message type in Linux
2.6.8. The only known distribution that shipped with SELinux and
policy < 18 was Fedora Core 2, which was EOL'd on 2005-04-11.
Given that the remapping code was never updated to address the
addition of newer netlink classes, that the corresponding userland
support was dropped in 2005, and that the assumptions made by the
remapping code about the fixed ordering among netlink classes in the
policy may be violated in the future due to the dynamic class/perm
discovery support, we should drop this compatibility code now.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Add a simple utility (scripts/selinux/genheaders) and invoke it to
generate the kernel-private class and permission indices in flask.h
and av_permissions.h automatically during the kernel build from the
security class mapping definitions in classmap.h. Adding new kernel
classes and permissions can then be done just by adding them to classmap.h.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Modify SELinux to dynamically discover class and permission values
upon policy load, based on the dynamic object class/perm discovery
logic from libselinux. A mapping is created between kernel-private
class and permission indices used outside the security server and the
policy values used within the security server.
The mappings are only applied upon kernel-internal computations;
similar mappings for the private indices of userspace object managers
is handled on a per-object manager basis by the userspace AVC. The
interfaces for compute_av and transition_sid are split for kernel
vs. userspace; the userspace functions are distinguished by a _user
suffix.
The kernel-private class indices are no longer tied to the policy
values and thus do not need to skip indices for userspace classes;
thus the kernel class index values are compressed. The flask.h
definitions were regenerated by deleting the userspace classes from
refpolicy's definitions and then regenerating the headers. Going
forward, we can just maintain the flask.h, av_permissions.h, and
classmap.h definitions separately from policy as they are no longer
tied to the policy values. The next patch introduces a utility to
automate generation of flask.h and av_permissions.h from the
classmap.h definitions.
The older kernel class and permission string tables are removed and
replaced by a single security class mapping table that is walked at
policy load to generate the mapping. The old kernel class validation
logic is completely replaced by the mapping logic.
The handle unknown logic is reworked. reject_unknown=1 is handled
when the mappings are computed at policy load time, similar to the old
handling by the class validation logic. allow_unknown=1 is handled
when computing and mapping decisions - if the permission was not able
to be mapped (i.e. undefined, mapped to zero), then it is
automatically added to the allowed vector. If the class was not able
to be mapped (i.e. undefined, mapped to zero), then all permissions
are allowed for it if allow_unknown=1.
avc_audit leverages the new security class mapping table to lookup the
class and permission names from the kernel-private indices.
The mdp program is updated to use the new table when generating the
class definitions and allow rules for a minimal boot policy for the
kernel. It should be noted that this policy will not include any
userspace classes, nor will its policy index values for the kernel
classes correspond with the ones in refpolicy (they will instead match
the kernel-private indices).
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
This patch resets the security_ops to the secondary_ops before it flushes
the avc. It's still possible that a task on another processor could have
already passed the security_ops dereference and be executing an selinux hook
function which would add a new avc entry. That entry would still not be
freed. This should however help to reduce the number of needless avcs the
kernel has when selinux is disabled at run time. There is no wasted
memory if selinux is disabled on the command line or not compiled.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Ratan Nalumasu reported that in a process with many threads doing
unnecessary wakeups. Every waiting thread in the process wakes up to loop
through the children and see that the only ones it cares about are still
not ready.
Now that we have struct wait_opts we can change do_wait/__wake_up_parent
to use filtered wakeups.
We can make child_wait_callback() more clever later, right now it only
checks eligible_child().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ratan Nalumasu <rnalumasu@gmail.com>
Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before SELinux is disabled at boot it can create AVC entries. This patch
will flush those entries before disabling SELinux.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Move the avc_cache flushing into it's own function so it can be reused when
disabling SELinux.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
__validate_process_creds should check if selinux is actually enabled before
running tests on the selinux portion of the credentials struct.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch adds a setxattr handler to the file, directory, and symlink
inode_operations structures for sysfs. The patch uses hooks introduced in the
previous patch to handle the getting and setting of security information for
the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the
sysfs_dirent structure has been replaced by a structure which contains the
iattr, secdata and secdata length to allow the changes to persist in the event
that the inode representing the sysfs_dirent is evicted. Because sysfs only
stores this information when a change is made all the optional data is moved
into one dynamically allocated field.
This patch addresses an issue where SELinux was denying virtd access to the PCI
configuration entries in sysfs. The lack of setxattr handlers for sysfs
required that a single label be assigned to all entries in sysfs. Granting virtd
access to every entry in sysfs is not an acceptable solution so fine grained
labeling of sysfs is required such that individual entries can be labeled
appropriately.
[sds: Fixed compile-time warnings, coding style, and setting of inode security init flags.]
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
This patch introduces three new hooks. The inode_getsecctx hook is used to get
all relevant information from an LSM about an inode. The inode_setsecctx is
used to set both the in-core and on-disk state for the inode based on a context
derived from inode_getsecctx.The final hook inode_notifysecctx will notify the
LSM of a change for the in-core state of the inode in question. These hooks are
for use in the labeled NFS code and addresses concerns of how to set security
on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's
explanation of the reason for these hooks is pasted below.
Quote Stephen Smalley
inode_setsecctx: Change the security context of an inode. Updates the
in core security context managed by the security module and invokes the
fs code as needed (via __vfs_setxattr_noperm) to update any backing
xattrs that represent the context. Example usage: NFS server invokes
this hook to change the security context in its incore inode and on the
backing file system to a value provided by the client on a SETATTR
operation.
inode_notifysecctx: Notify the security module of what the security
context of an inode should be. Initializes the incore security context
managed by the security module for this inode. Example usage: NFS
client invokes this hook to initialize the security context in its
incore inode to the value provided by the server for the file when the
server returned the file's attributes to the client.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Add a keyctl to install a process's session keyring onto its parent. This
replaces the parent's session keyring. Because the COW credential code does
not permit one process to change another process's credentials directly, the
change is deferred until userspace next starts executing again. Normally this
will be after a wait*() syscall.
To support this, three new security hooks have been provided:
cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
the blank security creds and key_session_to_parent() - which asks the LSM if
the process may replace its parent's session keyring.
The replacement may only happen if the process has the same ownership details
as its parent, and the process has LINK permission on the session keyring, and
the session keyring is owned by the process, and the LSM permits it.
Note that this requires alteration to each architecture's notify_resume path.
This has been done for all arches barring blackfin, m68k* and xtensa, all of
which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the
replacement to be performed at the point the parent process resumes userspace
execution.
This allows the userspace AFS pioctl emulation to fully emulate newpag() and
the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
alter the parent process's PAG membership. However, since kAFS doesn't use
PAGs per se, but rather dumps the keys into the session keyring, the session
keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
the newpag flag.
This can be tested with the following program:
#include <stdio.h>
#include <stdlib.h>
#include <keyutils.h>
#define KEYCTL_SESSION_TO_PARENT 18
#define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)
int main(int argc, char **argv)
{
key_serial_t keyring, key;
long ret;
keyring = keyctl_join_session_keyring(argv[1]);
OSERROR(keyring, "keyctl_join_session_keyring");
key = add_key("user", "a", "b", 1, keyring);
OSERROR(key, "add_key");
ret = keyctl(KEYCTL_SESSION_TO_PARENT);
OSERROR(ret, "KEYCTL_SESSION_TO_PARENT");
return 0;
}
Compiled and linked with -lkeyutils, you should see something like:
[dhowells@andromeda ~]$ keyctl show
Session Keyring
-3 --alswrv 4043 4043 keyring: _ses
355907932 --alswrv 4043 -1 \_ keyring: _uid.4043
[dhowells@andromeda ~]$ /tmp/newpag
[dhowells@andromeda ~]$ keyctl show
Session Keyring
-3 --alswrv 4043 4043 keyring: _ses
1055658746 --alswrv 4043 4043 \_ user: a
[dhowells@andromeda ~]$ /tmp/newpag hello
[dhowells@andromeda ~]$ keyctl show
Session Keyring
-3 --alswrv 4043 4043 keyring: hello
340417692 --alswrv 4043 4043 \_ user: a
Where the test program creates a new session keyring, sticks a user key named
'a' into it and then installs it on its parent.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
for credential management. The additional code keeps track of the number of
pointers from task_structs to any given cred struct, and checks to see that
this number never exceeds the usage count of the cred struct (which includes
all references, not just those from task_structs).
Furthermore, if SELinux is enabled, the code also checks that the security
pointer in the cred struct is never seen to be invalid.
This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
kernel thread on seeing cred->security be a NULL pointer (it appears that the
credential struct has been previously released):
http://www.kerneloops.org/oops.php?number=252883
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Add support for the new TUN LSM hooks: security_tun_dev_create(),
security_tun_dev_post_create() and security_tun_dev_attach(). This includes
the addition of a new object class, tun_socket, which represents the socks
associated with TUN devices. The _tun_dev_create() and _tun_dev_post_create()
hooks are fairly similar to the standard socket functions but _tun_dev_attach()
is a bit special. The _tun_dev_attach() is unique because it involves a
domain attaching to an existing TUN device and its associated tun_socket
object, an operation which does not exist with standard sockets and most
closely resembles a relabel operation.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: James Morris <jmorris@namei.org>
As suggested by OGAWA Hirofumi in thread:
http://lkml.org/lkml/2009/8/7/132, we should let selinux_inode_setattr()
to match our ATTR_* rules. ATTR_FORCE should not force things like
ATTR_SIZE.
[hirofumi@mail.parknet.co.jp: tweaks]
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
Currently SELinux enforcement of controls on the ability to map low memory
is determined by the mmap_min_addr tunable. This patch causes SELinux to
ignore the tunable and instead use a seperate Kconfig option specific to how
much space the LSM should protect.
The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
permissions will always protect the amount of low memory designated by
CONFIG_LSM_MMAP_MIN_ADDR.
This allows users who need to disable the mmap_min_addr controls (usual reason
being they run WINE as a non-root user) to do so and still have SELinux
controls preventing confined domains (like a web server) from being able to
map some area of low memory.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Currently SELinux does not check CAP_SYS_RAWIO in the file_mmap hook. This
means there is no DAC check on the ability to mmap low addresses in the
memory space. This function adds the DAC check for CAP_SYS_RAWIO while
maintaining the selinux check on mmap_zero. This means that processes
which need to mmap low memory will need CAP_SYS_RAWIO and mmap_zero but will
NOT need the SELinux sys_rawio capability.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Convert avc_audit in security/selinux/avc.c to use lsm_audit.h,
for better maintainability.
- changed selinux to use common_audit_data instead of
avc_audit_data
- eliminated code in avc.c and used code from lsm_audit.h instead.
Had to add a LSM_AUDIT_NO_AUDIT to lsm_audit.h so that avc_audit
can call common_lsm_audit and do the pre and post callbacks without
doing the actual dump. This makes it so that the patched version
behaves the same way as the unpatched version.
Also added a denied field to the selinux_audit_data private space,
once again to make it so that the patched version behaves like the
unpatched.
I've tested and confirmed that AVCs look the same before and after
this patch.
Signed-off-by: Thomas Liu <tliu@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
This patch adds a new selinux hook so SELinux can arbitrate if a given
process should be allowed to trigger a request for the kernel to try to
load a module. This is a different operation than a process trying to load
a module itself, which is already protected by CAP_SYS_MODULE.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Fix memory leakage in /security/selinux/hooks.c
The buffer always needs to be freed here; we either error
out or allocate more memory.
Reported-by: iceberg <strakh@ispras.ru>
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Currently SELinux enforcement of controls on the ability to map low memory
is determined by the mmap_min_addr tunable. This patch causes SELinux to
ignore the tunable and instead use a seperate Kconfig option specific to how
much space the LSM should protect.
The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
permissions will always protect the amount of low memory designated by
CONFIG_LSM_MMAP_MIN_ADDR.
This allows users who need to disable the mmap_min_addr controls (usual reason
being they run WINE as a non-root user) to do so and still have SELinux
controls preventing confined domains (like a web server) from being able to
map some area of low memory.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Currently SELinux does not check CAP_SYS_RAWIO in the file_mmap hook. This
means there is no DAC check on the ability to mmap low addresses in the
memory space. This function adds the DAC check for CAP_SYS_RAWIO while
maintaining the selinux check on mmap_zero. This means that processes
which need to mmap low memory will need CAP_SYS_RAWIO and mmap_zero but will
NOT need the SELinux sys_rawio capability.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
- is_single_threaded(task) is not safe unless task == current,
we can't use task->signal or task->mm.
- it doesn't make sense unless task == current, the task can
fork right after the check.
Rename it to current_is_single_threaded() and kill the argument.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Convert avc_audit in security/selinux/avc.c to use lsm_audit.h,
for better maintainability and for less code duplication.
- changed selinux to use common_audit_data instead of
avc_audit_data
- eliminated code in avc.c and used code from lsm_audit.h instead.
I have tested to make sure that the avcs look the same before and
after this patch.
Signed-off-by: Thomas Liu <tliu@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Added a call to free the avc_node_cache when inside selinux_disable because
it should not waste resources allocated during avc_init if SELinux is disabled
and the cache will never be used.
Signed-off-by: Thomas Liu <tliu@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
The ->ptrace_may_access() methods are named confusingly - the real
ptrace_may_access() returns a bool, while these security checks have
a retval convention.
Rename it to ptrace_access_check, to reduce the confusion factor.
[ Impact: cleanup, no code changed ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: James Morris <jmorris@namei.org>
Restore the optimization to skip revalidation in selinux_file_permission
if nothing has changed since the dentry_open checks, accidentally removed by
389fb800. Also remove redundant test from selinux_revalidate_file_permission.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
The attached patch adds support to generate audit messages on two cases.
The first one is a case when a multi-thread process tries to switch its
performing security context using setcon(3), but new security context is
not bounded by the old one.
type=SELINUX_ERR msg=audit(1245311998.599:17): \
op=security_bounded_transition result=denied \
oldcontext=system_u:system_r:httpd_t:s0 \
newcontext=system_u:system_r:guest_webapp_t:s0
The other one is a case when security_compute_av() masked any permissions
due to the type boundary violation.
type=SELINUX_ERR msg=audit(1245312836.035:32): \
op=security_compute_av reason=bounds \
scontext=system_u:object_r:user_webapp_t:s0 \
tcontext=system_u:object_r:shadow_t:s0:c0 \
tclass=file perms=getattr,open
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
It is a cleanup patch to cut down a line within 80 columns.
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
--
security/selinux/ss/services.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
Signed-off-by: James Morris <jmorris@namei.org>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Audit trees defined 2 new netlink messages but the netlink mapping tables for
selinux permissions were not set up. This patch maps these 2 new operations
to AUDIT_WRITE.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
On Tue, 2009-05-19 at 00:05 -0400, Eamon Walsh wrote:
> Recent versions of coreutils have bumped the read buffer size from 4K to
> 32K in several of the utilities.
>
> This means that "cat /selinux/booleans/xserver_object_manager" no longer
> works, it returns "Invalid argument" on F11. getsebool works fine.
>
> sel_read_bool has a check for "count > PAGE_SIZE" that doesn't seem to
> be present in the other read functions. Maybe it could be removed?
Yes, that check is obsoleted by the conversion of those functions to
using simple_read_from_buffer(), which will reduce count if necessary to
what is available in the buffer.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
The selinuxfs superblock magic is used inside the IMA code, but is being
defined in two places and could someday get out of sync. This patch moves the
declaration into magic.h so it is only done once.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
The CRED patch incorrectly converted the SELinux send_sigiotask hook to
use the current task SID rather than the target task SID in its
permission check, yielding the wrong permission check. This fixes the
hook function. Detected by the ltp selinux testsuite and confirmed to
correct the test failure.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
We shouldn't worry about the tracer if current is ptraced, exec() must not
succeed if the tracer has no rights to trace this task after cred changing.
But we should notify ->real_parent which is, well, real parent.
Also, we don't need _irq to take tasklist, and we don't need parent's
->siglock to wake_up_interruptible(real_parent->signal->wait_chldexit).
Since we hold tasklist, real_parent->signal must be stable. Otherwise
spin_lock(siglock) is not safe too and can't help anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Don't flush inherited SIGKILL during execve() in SELinux's post cred commit
hook. This isn't really a security problem: if the SIGKILL came before the
credentials were changed, then we were right to receive it at the time, and
should honour it; if it came after the creds were changed, then we definitely
should honour it; and in any case, all that will happen is that the process
will be scrapped before it ever returns to userspace.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
We are still calling secondary_ops->sysctl even though the capabilities
module does not define a sysctl operation.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
This patch enables applications to handle permissive domain correctly.
Since the v2.6.26 kernel, SELinux has supported an idea of permissive
domain which allows certain processes to work as if permissive mode,
even if the global setting is enforcing mode.
However, we don't have an application program interface to inform
what domains are permissive one, and what domains are not.
It means applications focuses on SELinux (XACE/SELinux, SE-PostgreSQL
and so on) cannot handle permissive domain correctly.
This patch add the sixth field (flags) on the reply of the /selinux/access
interface which is used to make an access control decision from userspace.
If the first bit of the flags field is positive, it means the required
access control decision is on permissive domain, so application should
allow any required actions, as the kernel doing.
This patch also has a side benefit. The av_decision.flags is set at
context_struct_compute_av(). It enables to check required permissions
without read_lock(&policy_rwlock).
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Eric Paris <eparis@redhat.com>
--
security/selinux/avc.c | 2 +-
security/selinux/include/security.h | 4 +++-
security/selinux/selinuxfs.c | 4 ++--
security/selinux/ss/services.c | 30 +++++-------------------------
4 files changed, 11 insertions(+), 29 deletions(-)
Signed-off-by: James Morris <jmorris@namei.org>
The SELinux "compat_net" is marked as deprecated, the time has come to
finally remove it from the kernel. Further code simplifications are
likely in the future, but this patch was intended to be a simple,
straight-up removal of the compat_net code.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
The current NetLabel/SELinux behavior for incoming TCP connections works but
only through a series of happy coincidences that rely on the limited nature of
standard CIPSO (only able to convey MLS attributes) and the write equality
imposed by the SELinux MLS constraints. The problem is that network sockets
created as the result of an incoming TCP connection were not on-the-wire
labeled based on the security attributes of the parent socket but rather based
on the wire label of the remote peer. The issue had to do with how IP options
were managed as part of the network stack and where the LSM hooks were in
relation to the code which set the IP options on these newly created child
sockets. While NetLabel/SELinux did correctly set the socket's on-the-wire
label it was promptly cleared by the network stack and reset based on the IP
options of the remote peer.
This patch, in conjunction with a prior patch that adjusted the LSM hook
locations, works to set the correct on-the-wire label format for new incoming
connections through the security_inet_conn_request() hook. Besides the
correct behavior there are many advantages to this change, the most significant
is that all of the NetLabel socket labeling code in SELinux now lives in hooks
which can return error codes to the core stack which allows us to finally get
ride of the selinux_netlbl_inode_permission() logic which greatly simplfies
the NetLabel/SELinux glue code. In the process of developing this patch I
also ran into a small handful of AF_INET6 cleanliness issues that have been
fixed which should make the code safer and easier to extend in the future.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
Drop the printk message when an inode is found without an associated
dentry. This should only happen when userspace can't be accessing those
inodes and those labels will get set correctly on the next d_instantiate.
Thus there is no reason to send this message.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
New selinux permission to separate the ability to turn on tty auditing from
the ability to set audit rules.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
When I did open permissions I didn't think any sockets would have an open.
Turns out AF_UNIX sockets can have an open when they are bound to the
filesystem namespace. This patch adds a new SOCK_FILE__OPEN permission.
It's safe to add this as the open perms are already predicated on
capabilities and capabilities means we have unknown perm handling so
systems should be as backwards compatible as the policy wants them to
be.
https://bugzilla.redhat.com/show_bug.cgi?id=475224
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Rick McNeal from LSI identified a panic in selinux_netlbl_inode_permission()
caused by a certain sequence of SUNRPC operations. The problem appears to be
due to the lack of NULL pointer checking in the function; this patch adds the
pointer checks so the function will exit safely in the cases where the socket
is not completely initialized.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
At some point we (okay, I) managed to break the ability for users to use the
setsockopt() syscall to set IPv4 options when NetLabel was not active on the
socket in question. The problem was noticed by someone trying to use the
"-R" (record route) option of ping:
# ping -R 10.0.0.1
ping: record route: No message of desired type
The solution is relatively simple, we catch the unlabeled socket case and
clear the error code, allowing the operation to succeed. Please note that we
still deny users the ability to override IPv4 options on socket's which have
NetLabel labeling active; this is done to ensure the labeling remains intact.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
We do not need O(1) access to the tail of the avc cache lists and so we are
wasting lots of space using struct list_head instead of struct hlist_head.
This patch converts the avc cache to use hlists in which there is a single
pointer from the head which saves us about 4k of global memory.
Resulted in about a 1.5% decrease in time spent in avc_has_perm_noaudit based
on oprofile sampling of tbench. Although likely within the noise....
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
The code making use of struct avc_cache was not easy to read thanks to liberal
use of &avc_cache.{slots_lock,slots}[hvalue] throughout. This patch simply
creates local pointers and uses those instead of the long global names.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
It appears there was an intention to have the security server only decide
certain permissions and leave other for later as some sort of a portential
performance win. We are currently always deciding all 32 bits of
permissions and this is a useless couple of branches and wasted space.
This patch completely drops the av.decided concept.
This in a 17% reduction in the time spent in avc_has_perm_noaudit
based on oprofile sampling of a tbench benchmark.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
we are often needlessly jumping through hoops when it comes to avd
entries in avc_has_perm_noaudit and we have extra initialization and memcpy
which are just wasting performance. Try to clean the function up a bit.
This patch resulted in a 13% drop in time spent in avc_has_perm_noaudit in my
oprofile sampling of a tbench benchmark.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Currently SELinux code has an atomic which was intended to track how many
times an avc entry was used and to evict entries when they haven't been
used recently. Instead we never let this atomic get above 1 and evict when
it is first checked for eviction since it hits zero. This is a total waste
of time so I'm completely dropping ae.used.
This change resulted in about a 3% faster avc_has_perm_noaudit when running
oprofile against a tbench benchmark.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed by: Paul Moore <paul.moore@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
The avc update node callbacks do not check the seqno of the caller with the
seqno of the node found. It is possible that a policy change could happen
(although almost impossibly unlikely) in which a permissive or
permissive_domain decision is not valid for the entry found. Simply pass
and check that the seqno of the caller and the seqno of the node found
match.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
When a context is pulled in from disk we don't know that it is null
terminated. This patch forecebly null terminates contexts when we pull
them from disk.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Currently when an inode is read into the kernel with an invalid label
string (can often happen with removable media) we output a string like:
SELinux: inode_doinit_with_dentry: context_to_sid([SOME INVALID LABEL])
returned -22 dor dev=[blah] ino=[blah]
Which is all but incomprehensible to all but a couple of us. Instead, on
EINVAL only, I plan to output a much more user friendly string and I plan to
ratelimit the printk since many of these could be generated very rapidly.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
For cleanliness and efficiency remove all calls to secondary-> and instead
call capabilities code directly. capabilities are the only module that
selinux stacks with and so the code should not indicate that other stacking
might be possible.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Remove SELinux hooks which do nothing except defer to the capabilites
hooks (or in one case, replicates the function).
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Remove secondary ops call to shm_shmat, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to unix_stream_connect, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to task_kill, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to task_setrlimit, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove unused cred_commit hook from SELinux. This
currently calls into the capabilities hook, which is a noop.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to task_create, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to file_mprotect, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to inode_setattr, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to inode_permission, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to inode_follow_link, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to inode_mknod, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to inode_unlink, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to inode_link, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to sb_umount, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to sb_mount, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to bprm_committed_creds, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove secondary ops call to bprm_committing_creds, which is
a noop in capabilities.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove unused bprm_check_security hook from SELinux. This
currently calls into the capabilities hook, which is a noop.
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Context mounts and genfs labeled file systems behave differently with respect to
setting file system labels. This patch brings genfs labeled file systems in line
with context mounts in that setxattr calls to them should return EOPNOTSUPP and
fscreate calls will be ignored.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@macbook.localdomain>
There is no easy way to tell if a file system supports SELinux security labeling.
Because of this a new flag is being added to the super block security structure
to indicate that the particular super block supports labeling. This flag is set
for file systems using the xattr, task, and transition labeling methods unless
that behavior is overridden by context mounts.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@macbook.localdomain>
The super block security structure currently has three fields for what are
essentially flags. The flags field is used for mount options while two other
char fields are used for initialization and proc flags. These latter two fields are
essentially bit fields since the only used values are 0 and 1. These fields
have been collapsed into the flags field and new bit masks have been added for
them. The code is also fixed to work with these new flags.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@macbook.localdomain>
Fix a regression in cap_capable() due to:
commit 3b11a1dece
Author: David Howells <dhowells@redhat.com>
Date: Fri Nov 14 10:39:26 2008 +1100
CRED: Differentiate objective and effective subjective credentials on a task
The problem is that the above patch allows a process to have two sets of
credentials, and for the most part uses the subjective credentials when
accessing current's creds.
There is, however, one exception: cap_capable(), and thus capable(), uses the
real/objective credentials of the target task, whether or not it is the current
task.
Ordinarily this doesn't matter, since usually the two cred pointers in current
point to the same set of creds. However, sys_faccessat() makes use of this
facility to override the credentials of the calling process to make its test,
without affecting the creds as seen from other processes.
One of the things sys_faccessat() does is to make an adjustment to the
effective capabilities mask, which cap_capable(), as it stands, then ignores.
The affected capability check is in generic_permission():
if (!(mask & MAY_EXEC) || execute_ok(inode))
if (capable(CAP_DAC_OVERRIDE))
return 0;
This change passes the set of credentials to be tested down into the commoncap
and SELinux code. The security functions called by capable() and
has_capability() select the appropriate set of credentials from the process
being checked.
This can be tested by compiling the following program from the XFS testsuite:
/*
* t_access_root.c - trivial test program to show permission bug.
*
* Written by Michael Kerrisk - copyright ownership not pursued.
* Sourced from: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/6030.html
*/
#include <limits.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>
#define UID 500
#define GID 100
#define PERM 0
#define TESTPATH "/tmp/t_access"
static void
errExit(char *msg)
{
perror(msg);
exit(EXIT_FAILURE);
} /* errExit */
static void
accessTest(char *file, int mask, char *mstr)
{
printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask));
} /* accessTest */
int
main(int argc, char *argv[])
{
int fd, perm, uid, gid;
char *testpath;
char cmd[PATH_MAX + 20];
testpath = (argc > 1) ? argv[1] : TESTPATH;
perm = (argc > 2) ? strtoul(argv[2], NULL, 8) : PERM;
uid = (argc > 3) ? atoi(argv[3]) : UID;
gid = (argc > 4) ? atoi(argv[4]) : GID;
unlink(testpath);
fd = open(testpath, O_RDWR | O_CREAT, 0);
if (fd == -1) errExit("open");
if (fchown(fd, uid, gid) == -1) errExit("fchown");
if (fchmod(fd, perm) == -1) errExit("fchmod");
close(fd);
snprintf(cmd, sizeof(cmd), "ls -l %s", testpath);
system(cmd);
if (seteuid(uid) == -1) errExit("seteuid");
accessTest(testpath, 0, "0");
accessTest(testpath, R_OK, "R_OK");
accessTest(testpath, W_OK, "W_OK");
accessTest(testpath, X_OK, "X_OK");
accessTest(testpath, R_OK | W_OK, "R_OK | W_OK");
accessTest(testpath, R_OK | X_OK, "R_OK | X_OK");
accessTest(testpath, W_OK | X_OK, "W_OK | X_OK");
accessTest(testpath, R_OK | W_OK | X_OK, "R_OK | W_OK | X_OK");
exit(EXIT_SUCCESS);
} /* main */
This can be run against an Ext3 filesystem as well as against an XFS
filesystem. If successful, it will show:
[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
---------- 1 dhowells dhowells 0 2008-12-31 03:00 /tmp/xxx
access(/tmp/xxx, 0) returns 0
access(/tmp/xxx, R_OK) returns 0
access(/tmp/xxx, W_OK) returns 0
access(/tmp/xxx, X_OK) returns -1
access(/tmp/xxx, R_OK | W_OK) returns 0
access(/tmp/xxx, R_OK | X_OK) returns -1
access(/tmp/xxx, W_OK | X_OK) returns -1
access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1
If unsuccessful, it will show:
[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
---------- 1 dhowells dhowells 0 2008-12-31 02:56 /tmp/xxx
access(/tmp/xxx, 0) returns 0
access(/tmp/xxx, R_OK) returns -1
access(/tmp/xxx, W_OK) returns -1
access(/tmp/xxx, X_OK) returns -1
access(/tmp/xxx, R_OK | W_OK) returns -1
access(/tmp/xxx, R_OK | X_OK) returns -1
access(/tmp/xxx, W_OK | X_OK) returns -1
access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1
I've also tested the fix with the SELinux and syscalls LTP testsuites.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
inotify: fix type errors in interfaces
fix breakage in reiserfs_new_inode()
fix the treatment of jfs special inodes
vfs: remove duplicate code in get_fs_type()
add a vfs_fsync helper
sys_execve and sys_uselib do not call into fsnotify
zero i_uid/i_gid on inode allocation
inode->i_op is never NULL
ntfs: don't NULL i_op
isofs check for NULL ->i_op in root directory is dead code
affs: do not zero ->i_op
kill suid bit only for regular files
vfs: lseek(fd, 0, SEEK_CUR) race condition
... and don't bother in callers. Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.
i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
I started playing with pahole today and decided to put it against the
selinux structures. Found we could save a little bit of space on x86_64
(and no harm on i686) just reorganizing some structs.
Object size changes:
av_inherit: 24 -> 16
selinux_class_perm: 48 -> 40
context: 80 -> 72
Admittedly there aren't many of av_inherit or selinux_class_perm's in
the kernel (33 and 1 respectively) But the change to the size of struct
context reverberate out a bit. I can get some hard number if they are
needed, but I don't see why they would be. We do change which cacheline
context->len and context->str would be on, but I don't see that as a
problem since we are clearly going to have to load both if the context
is to be of any value. I've run with the patch and don't seem to be
having any problems.
An example of what's going on using struct av_inherit would be:
form: to:
struct av_inherit { struct av_inherit {
u16 tclass; const char **common_pts;
const char **common_pts; u32 common_base;
u32 common_base; u16 tclass;
};
(notice all I did was move u16 tclass to the end of the struct instead
of the beginning)
Memory layout before the change:
struct av_inherit {
u16 tclass; /* 2 */
/* 6 bytes hole */
const char** common_pts; /* 8 */
u32 common_base; /* 4 */
/* 4 byes padding */
/* size: 24, cachelines: 1 */
/* sum members: 14, holes: 1, sum holes: 6 */
/* padding: 4 */
};
Memory layout after the change:
struct av_inherit {
const char ** common_pts; /* 8 */
u32 common_base; /* 4 */
u16 tclass; /* 2 */
/* 2 bytes padding */
/* size: 16, cachelines: 1 */
/* sum members: 14, holes: 0, sum holes: 0 */
/* padding: 2 */
};
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Fix a regression in cap_capable() due to:
commit 5ff7711e635b32f0a1e558227d030c7e45b4a465
Author: David Howells <dhowells@redhat.com>
Date: Wed Dec 31 02:52:28 2008 +0000
CRED: Differentiate objective and effective subjective credentials on a task
The problem is that the above patch allows a process to have two sets of
credentials, and for the most part uses the subjective credentials when
accessing current's creds.
There is, however, one exception: cap_capable(), and thus capable(), uses the
real/objective credentials of the target task, whether or not it is the current
task.
Ordinarily this doesn't matter, since usually the two cred pointers in current
point to the same set of creds. However, sys_faccessat() makes use of this
facility to override the credentials of the calling process to make its test,
without affecting the creds as seen from other processes.
One of the things sys_faccessat() does is to make an adjustment to the
effective capabilities mask, which cap_capable(), as it stands, then ignores.
The affected capability check is in generic_permission():
if (!(mask & MAY_EXEC) || execute_ok(inode))
if (capable(CAP_DAC_OVERRIDE))
return 0;
This change splits capable() from has_capability() down into the commoncap and
SELinux code. The capable() security op now only deals with the current
process, and uses the current process's subjective creds. A new security op -
task_capable() - is introduced that can check any task's objective creds.
strictly the capable() security op is superfluous with the presence of the
task_capable() op, however it should be faster to call the capable() op since
two fewer arguments need be passed down through the various layers.
This can be tested by compiling the following program from the XFS testsuite:
/*
* t_access_root.c - trivial test program to show permission bug.
*
* Written by Michael Kerrisk - copyright ownership not pursued.
* Sourced from: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/6030.html
*/
#include <limits.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>
#define UID 500
#define GID 100
#define PERM 0
#define TESTPATH "/tmp/t_access"
static void
errExit(char *msg)
{
perror(msg);
exit(EXIT_FAILURE);
} /* errExit */
static void
accessTest(char *file, int mask, char *mstr)
{
printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask));
} /* accessTest */
int
main(int argc, char *argv[])
{
int fd, perm, uid, gid;
char *testpath;
char cmd[PATH_MAX + 20];
testpath = (argc > 1) ? argv[1] : TESTPATH;
perm = (argc > 2) ? strtoul(argv[2], NULL, 8) : PERM;
uid = (argc > 3) ? atoi(argv[3]) : UID;
gid = (argc > 4) ? atoi(argv[4]) : GID;
unlink(testpath);
fd = open(testpath, O_RDWR | O_CREAT, 0);
if (fd == -1) errExit("open");
if (fchown(fd, uid, gid) == -1) errExit("fchown");
if (fchmod(fd, perm) == -1) errExit("fchmod");
close(fd);
snprintf(cmd, sizeof(cmd), "ls -l %s", testpath);
system(cmd);
if (seteuid(uid) == -1) errExit("seteuid");
accessTest(testpath, 0, "0");
accessTest(testpath, R_OK, "R_OK");
accessTest(testpath, W_OK, "W_OK");
accessTest(testpath, X_OK, "X_OK");
accessTest(testpath, R_OK | W_OK, "R_OK | W_OK");
accessTest(testpath, R_OK | X_OK, "R_OK | X_OK");
accessTest(testpath, W_OK | X_OK, "W_OK | X_OK");
accessTest(testpath, R_OK | W_OK | X_OK, "R_OK | W_OK | X_OK");
exit(EXIT_SUCCESS);
} /* main */
This can be run against an Ext3 filesystem as well as against an XFS
filesystem. If successful, it will show:
[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
---------- 1 dhowells dhowells 0 2008-12-31 03:00 /tmp/xxx
access(/tmp/xxx, 0) returns 0
access(/tmp/xxx, R_OK) returns 0
access(/tmp/xxx, W_OK) returns 0
access(/tmp/xxx, X_OK) returns -1
access(/tmp/xxx, R_OK | W_OK) returns 0
access(/tmp/xxx, R_OK | X_OK) returns -1
access(/tmp/xxx, W_OK | X_OK) returns -1
access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1
If unsuccessful, it will show:
[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
---------- 1 dhowells dhowells 0 2008-12-31 02:56 /tmp/xxx
access(/tmp/xxx, 0) returns 0
access(/tmp/xxx, R_OK) returns -1
access(/tmp/xxx, W_OK) returns -1
access(/tmp/xxx, X_OK) returns -1
access(/tmp/xxx, R_OK | W_OK) returns -1
access(/tmp/xxx, R_OK | X_OK) returns -1
access(/tmp/xxx, W_OK | X_OK) returns -1
access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1
I've also tested the fix with the SELinux and syscalls LTP testsuites.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Don't store the field->op in the messy (and very inconvenient for e.g.
audit_comparator()) form; translate to dense set of values and do full
validation of userland-submitted value while we are at it.
->audit_init_rule() and ->audit_match_rule() get new values now; in-tree
instances updated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Impact: cleanup
In future, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators
and other comparisons.
This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: James Morris <jmorris@namei.org>
Cc: Eric Biederman <ebiederm@xmission.com>
This patch is the first step towards removing the old "compat_net" code from
the kernel. Secmark, the "compat_net" replacement was first introduced in
2.6.18 (September 2006) and the major Linux distributions with SELinux support
have transitioned to Secmark so it is time to start deprecating the "compat_net"
mechanism. Testing a patched version of 2.6.28-rc6 with the initial release of
Fedora Core 5 did not show any problems when running in enforcing mode.
This patch adds an entry to the feature-removal-schedule.txt file and removes
the SECURITY_SELINUX_ENABLE_SECMARK_DEFAULT configuration option, forcing
Secmark on by default although it can still be disabled at runtime. The patch
also makes the Secmark permission checks "dynamic" in the sense that they are
only executed when Secmark is configured; this should help prevent problems
with older distributions that have not yet migrated to Secmark.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
Don't bother checking permissions when the kernel performs an
internal mount, as this should always be allowed.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>