Commit Graph

549 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
7b80f11563 ANDROID: GKI: hack up fs/sysfs/file.c to prevent GENKSYMS change
In commit 5f42436428 ("sysfs: Add sysfs_emit and sysfs_emit_at to
format sysfs output") in 5.4.103, a #include <linux/mm.h> addition is
added to the fs/sysfs/file.c file, before sysfs.h is included in the
build.  This ends up causing a previously "UNKNOWN" symbol type
definition of struct vm_operations_struct to become "known" which
propagates to a CRC change for some sysfs symbol definitions.

To prevent all this mess, add a simple #ifndef to prevent the mm.h file
from being included when running the symbol tool, and all is fine.

Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibcc22bedc4f39d2ef49a52ebd7962a8dba5b0583
2021-03-08 10:09:14 +01:00
Joe Perches
5f42436428 sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output
commit 2efc459d06f1630001e3984854848a5647086232 upstream.

Output defects can exist in sysfs content using sprintf and snprintf.

sprintf does not know the PAGE_SIZE maximum of the temporary buffer
used for outputting sysfs content and it's possible to overrun the
PAGE_SIZE buffer length.

Add a generic sysfs_emit function that knows that the size of the
temporary buffer and ensures that no overrun is done.

Add a generic sysfs_emit_at function that can be used in multiple
call situations that also ensures that no overrun is done.

Validate the output buffer argument to be page aligned.
Validate the offset len argument to be within the PAGE_SIZE buf.

Signed-off-by: Joe Perches <joe@perches.com>
Link: https://lore.kernel.org/r/884235202216d464d61ee975f7465332c86f76b2.1600285923.git.joe@perches.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07 12:20:48 +01:00
Linus Torvalds
933a90bf4f Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount updates from Al Viro:
 "The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...
2019-07-19 10:42:02 -07:00
Jiri Olsa
aac1f7f95f sysfs: Add sysfs_update_groups function
Adding sysfs_update_groups function to update
multiple groups.

  sysfs_update_groups - given a directory kobject, create a bunch of attribute groups
  @kobj:      The kobject to update the group on
  @groups:    The attribute groups to update, NULL terminated

This function update a bunch of attribute groups.  If an error occurs when
updating a group, all previously updated groups will be removed together
with already existing (not updated) attributes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-2-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-03 11:58:20 +02:00
Al Viro
f7a9945184 no need to protect against put_user_ns(NULL)
it's a no-op

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-25 17:59:54 -04:00
Thomas Gleixner
ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Linus Torvalds
924973508b Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount infrastructure fix from Al Viro:
 "Fixup for sysfs braino.

  Capabilities checks for sysfs mount do include those on netns, but
  only if CONFIG_NET_NS is enabled. Sorry, should've caught that
  earlier..."

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix sysfs_init_fs_context() in !CONFIG_NET_NS case
2019-03-16 10:31:02 -07:00
Al Viro
ab81dabda1 fix sysfs_init_fs_context() in !CONFIG_NET_NS case
Permission checks on current's netns should be done only when
netns are enabled.

Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Fixes: 23bf1b6be9
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-03-16 09:45:42 -04:00
Linus Torvalds
7b47a9e7c8 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount infrastructure updates from Al Viro:
 "The rest of core infrastructure; no new syscalls in that pile, but the
  old parts are switched to new infrastructure. At that point
  conversions of individual filesystems can happen independently; some
  are done here (afs, cgroup, procfs, etc.), there's also a large series
  outside of that pile dealing with NFS (quite a bit of option-parsing
  stuff is getting used there - it's one of the most convoluted
  filesystems in terms of mount-related logics), but NFS bits are the
  next cycle fodder.

  It got seriously simplified since the last cycle; documentation is
  probably the weakest bit at the moment - I considered dropping the
  commit introducing Documentation/filesystems/mount_api.txt (cutting
  the size increase by quarter ;-), but decided that it would be better
  to fix it up after -rc1 instead.

  That pile allows to do followup work in independent branches, which
  should make life much easier for the next cycle. fs/super.c size
  increase is unpleasant; there's a followup series that allows to
  shrink it considerably, but I decided to leave that until the next
  cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
  afs: Use fs_context to pass parameters over automount
  afs: Add fs_context support
  vfs: Add some logging to the core users of the fs_context log
  vfs: Implement logging through fs_context
  vfs: Provide documentation for new mount API
  vfs: Remove kern_mount_data()
  hugetlbfs: Convert to fs_context
  cpuset: Use fs_context
  kernfs, sysfs, cgroup, intel_rdt: Support fs_context
  cgroup: store a reference to cgroup_ns into cgroup_fs_context
  cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
  cgroup_do_mount(): massage calling conventions
  cgroup: stash cgroup_root reference into cgroup_fs_context
  cgroup2: switch to option-by-option parsing
  cgroup1: switch to option-by-option parsing
  cgroup: take options parsing into ->parse_monolithic()
  cgroup: fold cgroup1_mount() into cgroup1_get_tree()
  cgroup: start switching to fs_context
  ipc: Convert mqueue fs to fs_context
  proc: Add fs_context support to procfs
  ...
2019-03-12 14:08:19 -07:00
David Howells
23bf1b6be9 kernfs, sysfs, cgroup, intel_rdt: Support fs_context
Make kernfs support superblock creation/mount/remount with fs_context.

This requires that sysfs, cgroup and intel_rdt, which are built on kernfs,
be made to support fs_context also.

Notes:

 (1) A kernfs_fs_context struct is created to wrap fs_context and the
     kernfs mount parameters are moved in here (or are in fs_context).

 (2) kernfs_mount{,_ns}() are made into kernfs_get_tree().  The extra
     namespace tag parameter is passed in the context if desired

 (3) kernfs_free_fs_context() is provided as a destructor for the
     kernfs_fs_context struct, but for the moment it does nothing except
     get called in the right places.

 (4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to
     pass, but possibly this should be done anyway in case someone wants to
     add a parameter in future.

 (5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and
     the cgroup v1 and v2 mount parameters are all moved there.

 (6) cgroup1 parameter parsing error messages are now handled by invalf(),
     which allows userspace to collect them directly.

 (7) cgroup1 parameter cleanup is now done in the context destructor rather
     than in the mount/get_tree and remount functions.

Weirdies:

 (*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held,
     but then uses the resulting pointer after dropping the locks.  I'm
     told this is okay and needs commenting.

 (*) The cgroup refcount web.  This really needs documenting.

 (*) cgroup2 only has one root?

Add a suggestion from Thomas Gleixner in which the RDT enablement code is
placed into its own function.

[folded a leak fix from Andrey Vagin]

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Tejun Heo <tj@kernel.org>
cc: Li Zefan <lizefan@huawei.com>
cc: Johannes Weiner <hannes@cmpxchg.org>
cc: cgroups@vger.kernel.org
cc: fenghua.yu@intel.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28 03:29:34 -05:00
Ondrej Mosnacek
5b2f2bd62e sysfs: remove unused include of kernfs-internal.h
This include is not needed (fs/sysfs/file.c builds just fine without
it). Remove it.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-08 12:57:31 +01:00
Stephen Martin
4bd4e92cfe sysfs: fix blank line coding style warning
Fixed a coding style issue.

Signed-off-by: Stephen Martin <lockwood@opperline.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-18 16:45:17 +01:00
Greg Kroah-Hartman
de96e9fea7 sysfs: convert BUG_ON to WARN_ON
It's rude to crash the system just because the developer did something
wrong, as it prevents them from usually even seeing what went wrong.

So convert the few BUG_ON() calls that have snuck into the sysfs code
over the years to WARN_ON() to make it more "friendly".  All of these
are able to be recovered from, so it makes no sense to crash.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-07 08:53:32 +01:00
Jani Nikula
9ee4685c9a sysfs: constify sysfs create/remove files harder
Let the passed in array be const (and thus placed in rodata) instead of
a mutable array of const pointers.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181004143750.30880-1-jani.nikula@intel.com
2018-12-03 18:18:19 +02:00
Linus Torvalds
a18d783fed Driver core patches for 4.19-rc1
Here are all of the driver core and related patches for 4.19-rc1.
 
 Nothing huge here, just a number of small cleanups and the ability to
 now stop the deferred probing after init happens.
 
 All of these have been in linux-next for a while with only a merge issue
 reported.  That merge issue is in fs/sysfs/group.c and Stephen has
 posted the diff of what it should be to resolve this.  I'll follow up
 with that diff to this pull request.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW3g86Q8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynyXQCePaZSW8wft4b7nLN8RdZ98ATBru0Ani10lrJa
 HQeQJRNbWU1AZ0ym7695
 =tOaH
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here are all of the driver core and related patches for 4.19-rc1.

  Nothing huge here, just a number of small cleanups and the ability to
  now stop the deferred probing after init happens.

  All of these have been in linux-next for a while with only a merge
  issue reported"

* tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits)
  base: core: Remove WARN_ON from link dependencies check
  drivers/base: stop new probing during shutdown
  drivers: core: Remove glue dirs from sysfs earlier
  driver core: remove unnecessary function extern declare
  sysfs.h: fix non-kernel-doc comment
  PM / Domains: Stop deferring probe at the end of initcall
  iommu: Remove IOMMU_OF_DECLARE
  iommu: Stop deferring probe at end of initcalls
  pinctrl: Support stopping deferred probe after initcalls
  dt-bindings: pinctrl: add a 'pinctrl-use-default' property
  driver core: allow stopping deferred probe after init
  driver core: add a debugfs entry to show deferred devices
  sysfs: Fix internal_create_group() for named group updates
  base: fix order of OF initialization
  linux/device.h: fix kernel-doc notation warning
  Documentation: update firmware loader fallback reference
  kobject: Replace strncpy with memcpy
  drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number
  kernfs: Replace strncpy with memcpy
  device: Add #define dev_fmt similar to #define pr_fmt
  ...
2018-08-18 11:44:53 -07:00
Linus Torvalds
72f02ba66b SCSI misc on 20180815
This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx,
 hisi_sas, smartpqi, megaraid_sas, arcmsr.  In addition, with the
 continuing absence of Nic we have target updates for tcmu and target
 core (all with reviews and acks).  The biggest observable change is
 going to be that we're (again) trying to switch to mulitqueue as the
 default (a user can still override the setting on the kernel command
 line).  Other major core stuff is the removal of the remaining
 Microchannel drivers, an update of the internal timers and some
 reworks of completion and result handling.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCW3R3niYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishauRAP4yfBKK
 dbxF81c/Bxi/Stk16FWkOOrjs4CizwmnMcpM5wD/UmM9o6ebDzaYpZgA8wIl7X/N
 o/JckEZZpIp+5NySZNc=
 =ggLB
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx,
  hisi_sas, smartpqi, megaraid_sas, arcmsr.

  In addition, with the continuing absence of Nic we have target updates
  for tcmu and target core (all with reviews and acks).

  The biggest observable change is going to be that we're (again) trying
  to switch to mulitqueue as the default (a user can still override the
  setting on the kernel command line).

  Other major core stuff is the removal of the remaining Microchannel
  drivers, an update of the internal timers and some reworks of
  completion and result handling"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
  scsi: core: use blk_mq_run_hw_queues in scsi_kick_queue
  scsi: ufs: remove unnecessary query(DM) UPIU trace
  scsi: qla2xxx: Fix issue reported by static checker for qla2x00_els_dcmd2_sp_done()
  scsi: aacraid: Spelling fix in comment
  scsi: mpt3sas: Fix calltrace observed while running IO & reset
  scsi: aic94xx: fix an error code in aic94xx_init()
  scsi: st: remove redundant pointer STbuffer
  scsi: qla2xxx: Update driver version to 10.00.00.08-k
  scsi: qla2xxx: Migrate NVME N2N handling into state machine
  scsi: qla2xxx: Save frame payload size from ICB
  scsi: qla2xxx: Fix stalled relogin
  scsi: qla2xxx: Fix race between switch cmd completion and timeout
  scsi: qla2xxx: Fix Management Server NPort handle reservation logic
  scsi: qla2xxx: Flush mailbox commands on chip reset
  scsi: qla2xxx: Fix unintended Logout
  scsi: qla2xxx: Fix session state stuck in Get Port DB
  scsi: qla2xxx: Fix redundant fc_rport registration
  scsi: qla2xxx: Silent erroneous message
  scsi: qla2xxx: Prevent sysfs access when chip is down
  scsi: qla2xxx: Add longer window for chip reset
  ...
2018-08-15 22:06:26 -07:00
Bart Van Assche
2afc9166f7 scsi: sysfs: Introduce sysfs_{un,}break_active_protection()
Introduce these two functions and export them such that the next patch
can add calls to these functions from the SCSI core.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-02 15:53:15 -04:00
Tyler Hicks
d175339027 sysfs: Fix regression when adding a file to an existing group
Commit 5f81880d52 ("sysfs, kobject: allow creating kobject belonging
to arbitrary users") incorrectly changed the argument passed as the
parent parameter when calling sysfs_add_file_mode_ns(). This caused some
sysfs attribute files to not be added correctly to certain groups.

Fixes: 5f81880d52 ("sysfs, kobject: allow creating kobject belonging to arbitrary users")
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29 13:11:28 -07:00
Dmitry Torokhov
5f81880d52 sysfs, kobject: allow creating kobject belonging to arbitrary users
Normally kobjects and their sysfs representation belong to global root,
however it is not necessarily the case for objects in separate namespaces.
For example, objects in separate network namespace logically belong to the
container's root and not global root.

This change lays groundwork for allowing network namespace objects
ownership to be transferred to container's root user by defining
get_ownership() callback in ktype structure and using it in sysfs code to
retrieve desired uid/gid when creating sysfs objects for given kobject.

Co-Developed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 23:44:35 -07:00
Dmitry Torokhov
488dee96bb kernfs: allow creating kernfs objects with arbitrary uid/gid
This change allows creating kernfs files and directories with arbitrary
uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending
kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments.
The "simple" kernfs_create_file() and kernfs_create_dir() are left alone
and always create objects belonging to the global root.

When creating symlinks ownership (uid/gid) is taken from the target kernfs
object.

Co-Developed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 23:44:35 -07:00
Rajat Jain
c855cf2759 sysfs: Fix internal_create_group() for named group updates
There are a couple of problems with named group updates in the code
today:

* sysfs_update_group() will always fail for a named group, because
  internal_create_group() will try to create a new sysfs directory
  unconditionally, which will ofcourse fail with -EEXIST.

* We can leak the kernfs_node for grp->name if some one tries to:
  - rename a group (change grp->name), or
  - update a named group, to an unnamed group

It appears that the whole purpose of sysfs_update_group() was to
allow changing the permissions or visibility of attributes and not
the names. So make it clear in the comments, and allow it to update
an existing named group.

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-07 17:54:46 +02:00
Al Viro
7b745a4e40 unfuck sysfs_mount()
new_sb is left uninitialized in case of early failures in kernfs_mount_ns(),
and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb
is not a solution - IS_ERR(root) is true in some cases when new_sb is true.

Make sure new_sb is initialized (and matches the reality) in all cases and
fix the condition for dropping kobj reference - we want it done precisely
in those situations where the reference has not been transferred into a new
super_block instance.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:30:09 -04:00
Grygorii Strashko
2399ac42e7 sysfs: symlink: export sysfs_create_link_nowarn()
The sysfs_create_link_nowarn() is going to be used in phylib framework in
subsequent patch which can be built as module. Hence, export
sysfs_create_link_nowarn() to avoid build errors.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Fixes: a399546049 ("net: phy: Relax error checking on sysfs_create_link()")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-19 21:14:26 -04:00
Linus Torvalds
ab486bc9a5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:

 - Add a console_msg_format command line option:

     The value "default" keeps the old "[time stamp] text\n" format. The
     value "syslog" allows to see the syslog-like "<log
     level>[timestamp] text" format.

     This feature was requested by people doing regression tests, for
     example, 0day robot. They want to have both filtered and full logs
     at hands.

 - Reduce the risk of softlockup:

     Pass the console owner in a busy loop.

     This is a new approach to the old problem. It was first proposed by
     Steven Rostedt on Kernel Summit 2017. It marks a context in which
     the console_lock owner calls console drivers and could not sleep.
     On the other side, printk() callers could detect this state and use
     a busy wait instead of a simple console_trylock(). Finally, the
     console_lock owner checks if there is a busy waiter at the end of
     the special context and eventually passes the console_lock to the
     waiter.

     The hand-off works surprisingly well and helps in many situations.
     Well, there is still a possibility of the softlockup, for example,
     when the flood of messages stops and the last owner still has too
     much to flush.

     There is increasing number of people having problems with
     printk-related softlockups. We might eventually need to get better
     solution. Anyway, this looks like a good start and promising
     direction.

 - Do not allow to schedule in console_unlock() called from printk():

     This reverts an older controversial commit. The reschedule helped
     to avoid softlockups. But it also slowed down the console output.
     This patch is obsoleted by the new console waiter logic described
     above. In fact, the reschedule made the hand-off less effective.

 - Deprecate "%pf" and "%pF" format specifier:

     It was needed on ia64, ppc64 and parisc64 to dereference function
     descriptors and show the real function address. It is done
     transparently by "%ps" and "pS" format specifier now.

     Sergey Senozhatsky found that all the function descriptors were in
     a special elf section and could be easily detected.

 - Remove printk_symbol() API:

     It has been obsoleted by "%pS" format specifier, and this change
     helped to remove few continuous lines and a less intuitive old API.

 - Remove redundant memsets:

     Sergey removed unnecessary memset when processing printk.devkmsg
     command line option.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
  printk: drop redundant devkmsg_log_str memsets
  printk: Never set console_may_schedule in console_trylock()
  printk: Hide console waiter logic into helpers
  printk: Add console owner and waiter logic to load balance console writes
  kallsyms: remove print_symbol() function
  checkpatch: add pF/pf deprecation warning
  symbol lookup: introduce dereference_symbol_descriptor()
  parisc64: Add .opd based function descriptor dereference
  powerpc64: Add .opd based function descriptor dereference
  ia64: Add .opd based function descriptor dereference
  sections: split dereference_function_descriptor()
  openrisc: Fix conflicting types for _exext and _stext
  lib: do not use print_symbol()
  irq debug: do not use print_symbol()
  sysfs: do not use print_symbol()
  drivers: do not use print_symbol()
  x86: do not use print_symbol()
  unicore32: do not use print_symbol()
  sh: do not use print_symbol()
  mn10300: do not use print_symbol()
  ...
2018-02-01 13:36:15 -08:00
Greg Kroah-Hartman
78fae52cf4 sysfs: remove DEBUG defines
It isn't needed at all in these files, dynamic debug is the best way to
enable this type of thing, if you really want it.  As it is, these
defines were not doing anything at all.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 10:19:23 +01:00
Greg Kroah-Hartman
619daeeeb8 sysfs: use SPDX identifiers
Move the license "mark" of the sysfs files to be in SPDX form, instead
of the custom text that it currently is in.  This is in a quest to get
rid of the 700+ different ways we say "GPLv2" in the kernel tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 10:19:10 +01:00
Greg Kroah-Hartman
5d54f948aa sysfs: turn WARN() into pr_warn()
It's not good to crash the machine if panic_on_warn() is set just
because someone made a stupid mistake of trying to create a sysfs file
with the same name of an existing one.  This makes the automated testing
tools a lot harder to find the real bugs in the kernel.

So just print a warning out and dump the stack to get the attention of
the developer that they did something foolish.  Then keep on trucking,
as this should not be a fatal error at all.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-22 16:11:12 +01:00
Sergey Senozhatsky
9e6d35ff0a sysfs: do not use print_symbol()
print_symbol() is a very old API that has been obsoleted by %pS format
specifier in a normal printk() call.

Replace print_symbol() with a direct printk("%pS") call.

Link: http://lkml.kernel.org/r/20171211125025.2270-11-sergey.senozhatsky@gmail.com
To: Andrew Morton <akpm@linux-foundation.org>
To: Russell King <linux@armlinux.org.uk>
To: Catalin Marinas <catalin.marinas@arm.com>
To: Mark Salter <msalter@redhat.com>
To: Tony Luck <tony.luck@intel.com>
To: David Howells <dhowells@redhat.com>
To: Yoshinori Sato <ysato@users.sourceforge.jp>
To: Guan Xuetao <gxt@mprc.pku.edu.cn>
To: Borislav Petkov <bp@alien8.de>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Thomas Gleixner <tglx@linutronix.de>
To: Peter Zijlstra <peterz@infradead.org>
To: Vineet Gupta <vgupta@synopsys.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-sh@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-snps-arc@lists.infradead.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
[pmladek@suse.com: updated commit message]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-01-05 15:23:59 +01:00
Vasyl Gomonovych
90b3d2f6c0 sysfs: Use PTR_ERR_OR_ZERO()
Fix ptr_ret.cocci warnings:
fs/sysfs/group.c:409:8-14: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-18 16:47:27 +01:00
Linus Torvalds
1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
NeilBrown
c8a139d001 sysfs: be careful of error returns from ops->show()
ops->show() can return a negative error code.
Commit 65da3484d9 ("sysfs: correctly handle short reads on PREALLOC attrs.")
(in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
would look like large numbers.
As a result, if an error is returned, sysfs_kf_read() will return the
value of 'count', typically 4096.

Commit 17d0774f80 ("sysfs: correctly handle read offset on PREALLOC attrs")
(in v4.8) extended this error to use the unsigned large 'len' as a size for
memmove().
Consequently, if ->show returns an error, then the first read() on the
sysfs file will return 4096 and could return uninitialized memory to
user-space.
If the application performs a subsequent read, this will trigger a memmove()
with extremely large count, and is likely to crash the machine is bizarre ways.

This bug can currently only be triggered by reading from an md
sysfs attribute declared with __ATTR_PREALLOC() during the
brief period between when mddev_put() deletes an mddev from
the ->all_mddevs list, and when mddev_delayed_delete() - which is
scheduled on a workqueue - completes.
Before this, an error won't be returned by the ->show()
After this, the ->show() won't be called.

I can reproduce it reliably only by putting delay like
	usleep_range(500000,700000);
early in mddev_delayed_delete(). Then after creating an
md device md0 run
  echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state

The bug can be triggered without the usleep.

Fixes: 65da3484d9 ("sysfs: correctly handle short reads on PREALLOC attrs.")
Fixes: 17d0774f80 ("sysfs: correctly handle read offset on PREALLOC attrs")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 17:33:32 +02:00
Linus Torvalds
f34d3606f7 Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - tracepoints for basic cgroup management operations added

 - kernfs and cgroup path formatting functions updated to behave in the
   style of strlcpy()

 - non-critical bug fixes

* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
  cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent()
  cpuset: fix error handling regression in proc_cpuset_show()
  cgroup: add tracepoints for basic operations
  cgroup: make cgroup_path() and friends behave in the style of strlcpy()
  kernfs: remove kernfs_path_len()
  kernfs: make kernfs_path*() behave in the style of strlcpy()
  kernfs: add dummy implementation of kernfs_path_from_node()
2016-10-14 12:18:50 -07:00
Johannes Thumshirn
78618d395b sysfs print name of undiscoverable attribute group
Print the name of an undiscoverable attribute group and not the
pointer's address.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-27 12:24:29 +02:00
Konstantin Khlebnikov
17d0774f80 sysfs: correctly handle read offset on PREALLOC attrs
Attributes declared with __ATTR_PREALLOC use sysfs_kf_read() which returns
zero bytes for non-zero offset. This breaks script checkarray in mdadm tool
in debian where /bin/sh is 'dash' because its builtin 'read' reads only one
byte at a time. Script gets 'i' instead of 'idle' when reads current action
from /sys/block/$dev/md/sync_action and as a result does nothing.

This patch adds trivial implementation of partial read: generate whole
string and move required part into buffer head.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 4ef67a8c95 ("sysfs/kernfs: make read requests on pre-alloc files use the buffer.")
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950
Cc: Stable <stable@vger.kernel.org> # v3.19+
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31 15:14:44 +02:00
Tejun Heo
3abb1d90f5 kernfs: make kernfs_path*() behave in the style of strlcpy()
kernfs_path*() functions always return the length of the full path but
the path content is undefined if the length is larger than the
provided buffer.  This makes its behavior different from strlcpy() and
requires error handling in all its users even when they don't care
about truncation.  In addition, the implementation can actully be
simplified by making it behave properly in strlcpy() style.

* Update kernfs_path_from_node_locked() to always fill up the buffer
  with path.  If the buffer is not large enough, the output is
  truncated and terminated.

* kernfs_path() no longer needs error handling.  Make it a simple
  inline wrapper around kernfs_path_from_node().

* sysfs_warn_dup()'s use of kernfs_path() doesn't need error handling.
  Updated accordingly.

* cgroup_path()'s use of kernfs_path() updated to retain the old
  behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2016-08-10 11:23:44 -04:00
Eric W. Biederman
29a517c232 kernfs: The cgroup filesystem also benefits from SB_I_NOEXEC
The cgroup filesystem is in the same boat as sysfs.  No one ever
permits executables of any kind on the cgroup filesystem, and there is
no reasonable future case to support executables in the future.

Therefore move the setting of SB_I_NOEXEC which makes the code proof
against future mistakes of accidentally creating executables from
sysfs to kernfs itself.  Making the code simpler and covering the
sysfs, cgroup, and cgroup2 filesystems.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:41:56 -05:00
Eric W. Biederman
8654df4e2a mnt: Refactor fs_fully_visible into mount_too_revealing
Replace the call of fs_fully_visible in do_new_mount from before the
new superblock is allocated with a call of mount_too_revealing after
the superblock is allocated.   This winds up being a much better location
for maintainability of the code.

The first change this enables is the replacement of FS_USERNS_VISIBLE
with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
to sb_iflags on the superblock.

Unfortunately mount_too_revealing fundamentally needs to touch
mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate
times.  If the mnt_flags did not need to be touched the code
could be easily moved into the filesystem specific mount code.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:41:46 -05:00
Linus Torvalds
63f4f7e8df platform/chrome: Branch for v4.4
Here's the branch of chrome platform changes for v4.4. Some have been queued
 up for the full 4.3 release cycle since I forgot to send them in for that
 round (rebased early on to deal with fixes conflicts).
 
 Most of these enable EC communication stuff -- Pixel 2015 support, enabling
 building for ARM64 platforms, and a few fixes for memory leaks.
 
 There's also a patch in here to allow reading/writing the verified boot
 context, which depends on a sysfs patch acked by Greg.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWQWj/AAoJEIwa5zzehBx3figQAJuZK+2PDMeOe3hg38WEHpAA
 NVaSkZss2VLN8SQCJ8BqB/xoM2lm3GSef2R705m7s20/FsPP7rcJZoeeNkWUK+rn
 Dc01uJpVwGVtBIjst0FkxS+3xBeUrX6QLQyAnxQV4SIOg4t1EtZvgxK/4wnE726H
 BV0bTsUHKo0+FSrtI/QoHFkSFbUBansQ7JuN00YOfuwJPdhKgpInbJPqVkmRAYqI
 ylFFiTUR946SNDMUvmDnd37I2FrIzxslNTBu/oKl0DKI7DPmN3ITvX2JqnnuxktA
 ru7TWbny+oa2TQAwynYxBq9hpMFHMVVbtDPXE/7MisfL4+KUaSE0e95b4ONfG7yZ
 oxzf6S1RxzhA0Eo/bFZ0i29DZ4ogR8ql65RNolWJg13XACrPVyGrTWW5ZFe6xbI9
 HS13oBOHXIllSDfq9QjMRtNHAv268/oTiF8LXLPrxSCYllGZnKL7g9Ae9dFqOdXx
 mnoqYtVPsaErek4ktqOZIMRY5mXpX2JyR642PjGbK71IjRKmGvvRqpee+iCpF7ew
 OR80tAq56/R3NQqp5pgReshglOwQsLdu99spSc+6anjdeSb85ujM2jwXGjLNHwDi
 9XpyIAXox5dy3CX8LqgqryIKp/vqRd4q1vugKXLL1dBMSAm/QltjqPWxB2fCRCve
 IqNEhz4EI6uhJVeqDSNo
 =dv1O
 -----END PGP SIGNATURE-----

Merge tag 'chrome-platform-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform

Pull chrome platform updates from Olof Johansson:
 "Here's the branch of chrome platform changes for v4.4.  Some have been
  queued up for the full 4.3 release cycle since I forgot to send them
  in for that round (rebased early on to deal with fixes conflicts).

  Most of these enable EC communication stuff -- Pixel 2015 support,
  enabling building for ARM64 platforms, and a few fixes for memory
  leaks.

  There's also a patch in here to allow reading/writing the verified
  boot context, which depends on a sysfs patch acked by Greg"

* tag 'chrome-platform-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
  platform/chrome: Fix i2c-designware adapter name
  platform/chrome: Support reading/writing the vboot context
  sysfs: Support is_visible() on binary attributes
  platform/chrome: cros_ec: Fix possible leak in led_rgb_store()
  platform/chrome: cros_ec: Fix leak in sequence_store()
  platform/chrome: Enable Chrome platforms on 64-bit ARM
  platform/chrome: cros_ec_dev - Add a platform device ID table
  platform/chrome: cros_ec_lpc - Add support for Google Pixel 2
  platform/chrome: cros_ec_lpc - Use existing function to check EC result
  platform/chrome: Make depends on MFD_CROS_EC instead CROS_EC_PROTO
  Revert "platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM"
2015-11-13 21:53:18 -08:00
Linus Torvalds
1873499e13 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem update from James Morris:
 "This is mostly maintenance updates across the subsystem, with a
  notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
  maintainer of that"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
  apparmor: clarify CRYPTO dependency
  selinux: Use a kmem_cache for allocation struct file_security_struct
  selinux: ioctl_has_perm should be static
  selinux: use sprintf return value
  selinux: use kstrdup() in security_get_bools()
  selinux: use kmemdup in security_sid_to_context_core()
  selinux: remove pointless cast in selinux_inode_setsecurity()
  selinux: introduce security_context_str_to_sid
  selinux: do not check open perm on ftruncate call
  selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
  KEYS: Merge the type-specific data with the payload data
  KEYS: Provide a script to extract a module signature
  KEYS: Provide a script to extract the sys cert list from a vmlinux file
  keys: Be more consistent in selection of union members used
  certs: add .gitignore to stop git nagging about x509_certificate_list
  KEYS: use kvfree() in add_key
  Smack: limited capability for changing process label
  TPM: remove unnecessary little endian conversion
  vTPM: support little endian guests
  char: Drop owner assignment from i2c_driver
  ...
2015-11-05 15:32:38 -08:00
Jarkko Sakkinen
37c1c04cca sysfs: added __compat_only_sysfs_link_entry_to_kobj()
Added a new function __compat_only_sysfs_link_group_to_kobj() that adds
a symlink from attribute or group to a kobject. This needed for
maintaining backwards compatibility with PPI attributes in the TPM
driver.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2015-10-19 01:01:19 +02:00
Emilio López
7f5028cf61 sysfs: Support is_visible() on binary attributes
According to the sysfs header file:

    "The returned value will replace static permissions defined in
     struct attribute or struct bin_attribute."

but this isn't the case, as is_visible is only called on struct attribute
only. This patch introduces a new is_bin_visible() function to implement
the same functionality for binary attributes, and updates documentation
accordingly.

Note that to keep functionality and code similar to that of normal
attributes, the mode is now checked as well to ensure it contains only
read/write permissions or SYSFS_PREALLOC.

Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Emilio López <emilio.lopez@collabora.co.uk>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2015-10-07 15:05:31 -07:00
NeilBrown
65da3484d9 sysfs: correctly handle short reads on PREALLOC attrs.
attributes declared with __ATTR_PREALLOC use sysfs_kf_read()
which ignores the 'count' arg.
So a 1-byte read request can return more bytes than that.

This is seen with the 'dash' shell when 'read' is used on
some 'md' sysfs attributes.

So only return the 'min' of count and the attribute length.

Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04 19:42:22 +01:00
Eric W. Biederman
90f8572b0f vfs: Commit to never having exectuables on proc and sysfs.
Today proc and sysfs do not contain any executable files.  Several
applications today mount proc or sysfs without noexec and nosuid and
then depend on there being no exectuables files on proc or sysfs.
Having any executable files show on proc or sysfs would cause
a user space visible regression, and most likely security problems.

Therefore commit to never allowing executables on proc and sysfs by
adding a new flag to mark them as filesystems without executables and
enforce that flag.

Test the flag where MNT_NOEXEC is tested today, so that the only user
visible effect will be that exectuables will be treated as if the
execute bit is cleared.

The filesystems proc and sysfs do not currently incoporate any
executable files so this does not result in any user visible effects.

This makes it unnecessary to vet changes to proc and sysfs tightly for
adding exectuable files or changes to chattr that would modify
existing files, as no matter what the individual file say they will
not be treated as exectuable files by the vfs.

Not having to vet changes to closely is important as without this we
are only one proc_create call (or another goof up in the
implementation of notify_change) from having problematic executables
on proc.  Those mistakes are all too easy to make and would create
a situation where there are security issues or the assumptions of
some program having to be broken (and cause userspace regressions).

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-10 10:39:25 -05:00
Linus Torvalds
0cbee99269 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace updates from Eric Biederman:
 "Long ago and far away when user namespaces where young it was realized
  that allowing fresh mounts of proc and sysfs with only user namespace
  permissions could violate the basic rule that only root gets to decide
  if proc or sysfs should be mounted at all.

  Some hacks were put in place to reduce the worst of the damage could
  be done, and the common sense rule was adopted that fresh mounts of
  proc and sysfs should allow no more than bind mounts of proc and
  sysfs.  Unfortunately that rule has not been fully enforced.

  There are two kinds of gaps in that enforcement.  Only filesystems
  mounted on empty directories of proc and sysfs should be ignored but
  the test for empty directories was insufficient.  So in my tree
  directories on proc, sysctl and sysfs that will always be empty are
  created specially.  Every other technique is imperfect as an ordinary
  directory can have entries added even after a readdir returns and
  shows that the directory is empty.  Special creation of directories
  for mount points makes the code in the kernel a smidge clearer about
  it's purpose.  I asked container developers from the various container
  projects to help test this and no holes were found in the set of mount
  points on proc and sysfs that are created specially.

  This set of changes also starts enforcing the mount flags of fresh
  mounts of proc and sysfs are consistent with the existing mount of
  proc and sysfs.  I expected this to be the boring part of the work but
  unfortunately unprivileged userspace winds up mounting fresh copies of
  proc and sysfs with noexec and nosuid clear when root set those flags
  on the previous mount of proc and sysfs.  So for now only the atime,
  read-only and nodev attributes which userspace happens to keep
  consistent are enforced.  Dealing with the noexec and nosuid
  attributes remains for another time.

  This set of changes also addresses an issue with how open file
  descriptors from /proc/<pid>/ns/* are displayed.  Recently readlink of
  /proc/<pid>/fd has been triggering a WARN_ON that has not been
  meaningful since it was added (as all of the code in the kernel was
  converted) and is not now actively wrong.

  There is also a short list of issues that have not been fixed yet that
  I will mention briefly.

  It is possible to rename a directory from below to above a bind mount.
  At which point any directory pointers below the renamed directory can
  be walked up to the root directory of the filesystem.  With user
  namespaces enabled a bind mount of the bind mount can be created
  allowing the user to pick a directory whose children they can rename
  to outside of the bind mount.  This is challenging to fix and doubly
  so because all obvious solutions must touch code that is in the
  performance part of pathname resolution.

  As mentioned above there is also a question of how to ensure that
  developers by accident or with purpose do not introduce exectuable
  files on sysfs and proc and in doing so introduce security regressions
  in the current userspace that will not be immediately obvious and as
  such are likely to require breaking userspace in painful ways once
  they are recognized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  vfs: Remove incorrect debugging WARN in prepend_path
  mnt: Update fs_fully_visible to test for permanently empty directories
  sysfs: Create mountpoints with sysfs_create_mount_point
  sysfs: Add support for permanently empty directories to serve as mount points.
  kernfs: Add support for always empty directories.
  proc: Allow creating permanently empty directories that serve as mount points
  sysctl: Allow creating permanently empty directories that serve as mountpoints.
  fs: Add helper functions for permanently empty directories.
  vfs: Ignore unlocked mounts in fs_fully_visible
  mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
  mnt: Refactor the logic for mounting sysfs and proc in a user namespace
2015-07-03 15:20:57 -07:00
Eric W. Biederman
87d2846fcf sysfs: Add support for permanently empty directories to serve as mount points.
Add two functions sysfs_create_mount_point and
sysfs_remove_mount_point that hang a permanently empty directory off
of a kobject or remove a permanently emptpy directory hanging from a
kobject.  Export these new functions so modular filesystems can use
them.

Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-01 10:36:45 -05:00
Vladimir Zapolskiy
eaa5cd9263 fs: sysfs: don't pass count == 0 to bin file readers
If count == 0 bytes are requested by a reader, sysfs_kf_bin_read()
deliberately returns 0 without passing a potentially harmful value to
some externally defined underlying battr->read() function.

However in case of (pos == size && count) the next clause always sets
count to 0 and this value is handed over to battr->read().

The change intends to make obsolete (and remove later) a redundant
sanity check in battr->read(), if it is present, or add more
protection to struct bin_attribute users, who does not care about
input arguments.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-01 10:17:17 +09:00
Antonio Ospite
ed1dc8a894 sysfs: disambiguate between "error code" and "failure" in comments
The sentence "Returns 0 on success or error" might be misinterpreted as
"the function will always returns 0", make it less ambiguous.

Also, use the word "failure" as the contrary of "success".

Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24 12:31:33 -07:00
Eric W. Biederman
1b852bceb0 mnt: Refactor the logic for mounting sysfs and proc in a user namespace
Fresh mounts of proc and sysfs are a very special case that works very
much like a bind mount.  Unfortunately the current structure can not
preserve the MNT_LOCK... mount flags.  Therefore refactor the logic
into a form that can be modified to preserve those lock bits.

Add a new filesystem flag FS_USERNS_VISIBLE that requires some mount
of the filesystem be fully visible in the current mount namespace,
before the filesystem may be mounted.

Move the logic for calling fs_fully_visible from proc and sysfs into
fs/namespace.c where it has greater access to mount namespace state.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-05-13 21:44:11 -05:00
Vivien Didelot
d8bf8c92e8 sysfs: Only accept read/write permissions for file attributes
For sysfs file attributes, only read and write permissions make sense.
Mask provided attribute permissions accordingly and send a warning
to the console if invalid permission bits are set.

This patch is originally from Guenter [1] and includes the fixup
explained in the thread, that is printing permissions in octal format
and limiting the scope of attributes to SYSFS_PREALLOC | 0664.

[1] https://lkml.org/lkml/2015/1/19/599

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 13:27:57 +01:00
Guenter Roeck
da4759c73b sysfs: Use only return value from is_visible for the file mode
Up to now, is_visible can only be used to either remove visibility
of a file entirely or to add permissions, but not to reduce permissions.
This makes it impossible, for example, to use DEVICE_ATTR_RW to define
file attributes and reduce permissions to read-only.

This behavior is undesirable and unnecessarily complicates code which
needs to reduce permissions; instead of just returning the desired
permissions, it has to ensure that the permissions in the attribute
variable declaration only reflect the minimal permissions ever needed.

Change semantics of is_visible to only use the permissions returned
from it instead of oring the returned value with the hard-coded
permissions.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 13:27:57 +01:00