android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Eric W. Biederman	86a71dbd3e	[PATCH] sysctl: hide the sysctl proc inodes from selinux Since the security checks are applied on each read and write of a sysctl file, just like they are applied when calling sys_sysctl, they are redundant on the standard VFS constructs. Since it is difficult to compute the security labels on the standard VFS constructs we just mark the sysctl inodes in proc private so selinux won't even bother with them. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	3fbfa98112	[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables It isn't needed anymore, all of the users are gone, and all of the ctl_table initializers have been converted to use explicit names of the fields they are initializing. [akpm@osdl.org: NTFS fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	77b14db502	[PATCH] sysctl: reimplement the sysctl proc support With this change the sysctl inodes can be cached and nothing needs to be done when removing a sysctl table. For a cost of 2K code we will save about 4K of static tables (when we remove de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or about half that on a 32bit arch. The speed feels about the same, even though we can now cache the sysctl dentries :( We get the core advantage that we don't need to have a 1 to 1 mapping between ctl table entries and proc files. Making it possible to have /proc/sys vary depending on the namespace you are in. The currently merged namespaces don't have an issue here but the network namespace under /proc/sys/net needs to have different directories depending on which network adapters are visible. By simply being a cache different directories being visible depending on who you are is trivial to implement. [akpm@osdl.org: fix uninitialised var] [akpm@osdl.org: fix ARM build] [bunk@stusta.de: make things static] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	0b4d414714	[PATCH] sysctl: remove insert_at_head from register_sysctl The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	2abc26fc6b	[PATCH] sysctl: create sys/fs/binfmt_misc as an ordinary sysctl entry binfmt_misc has a mount point in the middle of the sysctl and that mount point is created as a proc_generic directory. Doing it that way gets in the way of cleaning up the sysctl proc support as it continues the existence of a horrible hack. So instead simply create the directory as an ordinary sysctl directory. At least that removes the magic special case. [akpm@osdl.org: warning fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	0e03036c97	[PATCH] sysctl: register the ocfs2 sysctl numbers ocfs2 was did not have the binary number it uses under CTL_FS registered in sysctl.h. Register it to avoid future conflicts, and change the name of the definition to be in line with the rest of the sysctl numbers. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:58 -08:00
Eric W. Biederman	4ed075e93b	[PATCH] sysctl: C99 convert ctl_tables in NTFS and remove sys_sysctl support Putting ntfs-debug under FS_NRINODE was not a kosher thing to do so don't give it any binary number. [akpm@osdl.org: build fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:58 -08:00
Eric W. Biederman	fd6065b4fd	[PATCH] sysctl: C99 convert coda ctl_tables and remove binary sysctls Will converting the coda sysctl initializers I discovered that it is yet another user of sysctl that was stomping CTL_KERN. So off with it's sys_sysctl support since it wasn't done in a supportable way. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:58 -08:00
Tim Schmielau	cd354f1ae7	[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
NeilBrown	af6a4e280e	[PATCH] knfsd: add some new fsid types Add support for using a filesystem UUID to identify and export point in the filehandle. For NFSv2, this UUID is xor-ed down to 4 or 8 bytes so that it doesn't take up too much room. For NFSv3+, we use the full 16 bytes, and possibly also a 64bit inode number for exports beneath the root of a filesystem. When generating an fsid to return in 'stat' information, use the UUID (hashed down to size) if it is available and a small 'fsid' was not specifically provided. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:53 -08:00
NeilBrown	982aedfd09	[PATCH] knfsd: tidy up choice of filesystem-identifier when creating a filehandle If we are using the same version/fsid as a current filehandle, then there is no need to verify the the numbers are valid for this export, and they must be (we used them to find this export). This allows us to simplify the fsid selection code. Also change "ref_fh_version" and "ref_fh_fsid_type" to "version" and "fsid_type", as the important thing isn't that they are the version/type of the reference filehandle, but they are the chosen type for the new filehandle. And tidy up some indenting. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:53 -08:00
NeilBrown	8971a1016b	[PATCH] knfsd: fix return value for writes to some files in 'nfsd' filesystem Most files in the 'nfsd' filesystem are transactional. When you write, a reply is generated that can be read back only on the same 'file'. If the reply has zero length, the 'write' will incorrectly return a value of '0' instead of the length that was written. This causes 'rpc.nfsd' to give an annoying warning. This patch fixes the test. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:53 -08:00
Trond Myklebust	ac98695d6c	Merge branch 'master' of /home/trondmy/kernel/linux-2.6/	2007-02-13 22:02:32 -08:00
Linus Torvalds	9468482bd4	Merge master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] on reconnect to Samba - reset the unix capabilities [CIFS] Allow update of EOF on remote extend of file [CIFS] POSIX CIFS Extensions (continued) - POSIX Open [CIFS] Additional POSIX CIFS Extensions infolevels	2007-02-13 21:15:42 -08:00
Steve French	8af1897158	[CIFS] on reconnect to Samba - reset the unix capabilities After temporary server or network failure and reconneciton, we were not resending the unix capabilities via SetFSInfo - which confused Samba posix byte range locking code. Discovered by jra Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-02-14 04:42:51 +00:00
Linus Torvalds	552ce544ed	Revert "[PATCH] Fix d_path for lazy unmounts" This reverts commit `eb3dfb0cb1`. It causes some strange Gnome problem with dbus-daemon getting stuck, so we'll revert it until that problem is understood. Reported by both walt and Greg KH, who both independently git-bisected the problem to this commit. Andreas is looking at it. Reported-by: walt <wa1ter@myrealbox.com> Reported-by: Greg KH <greg@kroah.com> Acked-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-13 12:08:18 -08:00
Andi Kleen	9fbbd4dd17	[PATCH] x86: Don't require the vDSO for handling a.out signals and in other strange binfmts. vDSO is not necessarily mapped there. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:26 +01:00
Trond Myklebust	d9bc125caf	Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ Conflicts: net/sunrpc/auth_gss/gss_krb5_crypto.c net/sunrpc/auth_gss/gss_spkm3_token.c net/sunrpc/clnt.c Merge with mainline and fix conflicts.	2007-02-12 22:43:25 -08:00
Chuck Lever	43d78ef2ba	NFS: disconnect before retrying NFSv4 requests over TCP RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request twice on the same connection unless it is the NULL procedure. Section 3.1.1 suggests that the client should disconnect and reconnect if it wants to retry a request. Implement this by adding an rpc_clnt flag that an ULP can use to specify that the underlying transport should be disconnected on a major timeout. The NFSv4 client asserts this new flag, and requests no retries after a minor retransmit timeout. Note that disconnecting on a retransmit is in general not safe to do if the RPC client does not reuse the TCP port number when reconnecting. See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-12 22:40:45 -08:00
Trond Myklebust	a301b77771	NFS: Don't use ClearPageUptodate() when writeback fails ClearPageUptodate() will just cause races here. What we really want to do is to invalidate the page cache. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-12 22:40:38 -08:00
Trond Myklebust	b0c4fddca2	NFS: Cleanup - avoid rereading 'jiffies' more than once in the same routine Micro-optimisations for nfs_fhget() and nfs_wcc_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-12 22:40:30 -08:00
Trond Myklebust	3e7d950a52	NFS: Fix a wraparound issue with nfsi->cache_change_attribute Fix wraparound issue with nfsi->cache_change_attribute. If it is found to lie in the future, then update it to lie in the past. Patch based on a suggestion by Neil Brown. ..and minor micro-optimisation: avoid reading 'jiffies' more than once in nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-12 22:40:22 -08:00
Josef 'Jeff' Sipek	ee9b6d61a2	[PATCH] Mark struct super_operations const This patch is inspired by Arjan's "Patch series to mark struct file_operations and struct inode_operations const". Compile tested with gcc & sparse. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:47 -08:00
Arjan van de Ven	c5ef1c42c5	[PATCH] mark struct inode_operations const 3 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Arjan van de Ven	92e1d5be91	[PATCH] mark struct inode_operations const 2 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Arjan van de Ven	754661f143	[PATCH] mark struct inode_operations const 1 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Arjan van de Ven	00977a59b9	[PATCH] mark struct file_operations const 6 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:45 -08:00
Evgeniy Dushistov	54fb996ac1	[PATCH] ufs2 write: block allocation update Patch adds ability to work with 64bit metadata, this made by replacing work with 32bit pointers by inline functions. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:40 -08:00
Evgeniy Dushistov	3313e29267	[PATCH] ufs2 write: inodes write This patch adds into write inode path function to write UFS2 inode, and modifys allocate inode path to allocate and init additional inode chunks. Also some cleanups: - remove not used parameters in some functions - remove i_gen field from ufs_inode_info structure, there is i_generation in inode structure with same purposes. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:40 -08:00
Evgeniy Dushistov	cbcae39fa1	[PATCH] ufs2 write: mount as rw These series of patches add UFS2 write-support. UFS2 - is default file system for recent versions of FreeBSD. The main differences from UFS1 from write support point of view are: 1)Not all inodes are allocated during formatation of disk. 2)All meta-data(pointer to data blocks) are 64bit(in UFS1 they are 32bit). So patch series consist of 1)make possible mount UFS2 in read-write mode 2)code to write ufs2 inodes and code to initialize inodes chunks. 3)work with 64bit meta-data I made simple testing like create/deleting/writing/reading/truncating, also I ran fsx-linux and untar and build kernel on UFS1 and UFS2, after that FreeBSD fsck do not find any errors in fs. This patch makes possible to mount ufs2 "rw", and updates UFS2 documentation: remove note about bug(it fixed by reallocate blocks on the fly patch) and add me in the list of people who want receive bug reports. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:40 -08:00
Michael Halcrow	0a9ac38246	[PATCH] eCryptfs: add flush_dcache_page() calls Call flush_dcache_page() after modifying a pagecache by hand. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:37 -08:00
Michael Halcrow	e2bd99ec5c	[PATCH] eCryptfs: open-code flag checking and manipulation Open-code flag checking and manipulation. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Trevor Highland <tshighla@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:37 -08:00
Michael Halcrow	9d8b8ce556	[PATCH] eCryptfs: convert kmap() to kmap_atomic() Replace kmap() with kmap_atomic(). Reduce the amount of time that mappings are held. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Trevor Highland <tshighla@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:37 -08:00
Michael Halcrow	70456600f4	[PATCH] eCryptfs: convert f_op->write() to vfs_write() sys_write() takes a local copy of f_pos and writes that back into the struct file. It does this so that two concurrent write() callers don't make a mess of f_pos, and of the file contents. ecryptfs should be calling vfs_write(). That way we also get the fsnotify notifications, which ecryptfs presently appears to have subverted. Convert direct calls to f_op->write() into calls to vfs_write(). Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:37 -08:00
Michael Halcrow	e77a56ddce	[PATCH] eCryptfs: Encrypted passthrough Provide an option to provide a view of the encrypted files such that the metadata is always in the header of the files, regardless of whether the metadata is actually in the header or in the extended attribute. This mode of operation is useful for applications like incremental backup utilities that do not preserve the extended attributes when directly accessing the lower files. With this option enabled, the files under the eCryptfs mount point will be read-only. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Michael Halcrow	dd2a3b7ad9	[PATCH] eCryptfs: Generalize metadata read/write Generalize the metadata reading and writing mechanisms, with two targets for now: metadata in file header and metadata in the user.ecryptfs xattr of the lower file. [akpm@osdl.org: printk warning fix] [bunk@stusta.de: make some needlessly global code static] Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Michael Halcrow	17398957aa	[PATCH] eCryptfs: xattr flags and mount options This patch set introduces the ability to store cryptographic metadata into an lower file extended attribute rather than the lower file header region. This patch set implements two new mount options: ecryptfs_xattr_metadata - When set, newly created files will have their cryptographic metadata stored in the extended attribute region of the file rather than the header. When storing the data in the file header, there is a minimum of 8KB reserved for the header information for each file, making each file at least 12KB in size. This can take up a lot of extra disk space if the user creates a lot of small files. By storing the data in the extended attribute, each file will only occupy at least of 4KB of space. As the eCryptfs metadata set becomes larger with new features such as multi-key associations, most popular filesystems will not be able to store all of the information in the xattr region in some cases due to space constraints. However, the majority of users will only ever associate one key per file, so most users will be okay with storing their data in the xattr region. This option should be used with caution. I want to emphasize that the xattr must be maintained under all circumstances, or the file will be rendered permanently unrecoverable. The last thing I want is for a user to forget to set an xattr flag in a backup utility, only to later discover that their backups are worthless. ecryptfs_encrypted_view - When set, this option causes eCryptfs to present applications a view of encrypted files as if the cryptographic metadata were stored in the file header, whether the metadata is actually stored in the header or in the extended attributes. No matter what eCryptfs winds up doing in the lower filesystem, I want to preserve a baseline format compatibility for the encrypted files. As of right now, the metadata may be in the file header or in an xattr. There is no reason why the metadata could not be put in a separate file in future versions. Without the compatibility mode, backup utilities would have to know to back up the metadata file along with the files. The semantics of eCryptfs have always been that the lower files are self-contained units of encrypted data, and the only additional information required to decrypt any given eCryptfs file is the key. That is what has always been emphasized about eCryptfs lower files, and that is what users expect. Providing the encrypted view option will provide a way to userspace applications wherein they can always get to the same old familiar eCryptfs encrypted files, regardless of what eCryptfs winds up doing with the metadata behind the scenes. This patch: Add extended attribute support to version bit vector, flags to indicate when xattr or encrypted view modes are enabled, and support for the new mount options. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Michael Halcrow	dddfa461fc	[PATCH] eCryptfs: Public key; packet management Public key support code. This reads and writes packets in the header that contain public key encrypted file keys. It calls the messaging code in the previous patch to send and receive encryption and decryption request packets from the userspace daemon. [akpm@osdl.org: cleab fix] Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Michael Halcrow	88b4a07e66	[PATCH] eCryptfs: Public key transport mechanism This is the transport code for public key functionality in eCryptfs. It manages encryption/decryption request queues with a transport mechanism. Currently, netlink is the only implemented transport. Each inode has a unique File Encryption Key (FEK). Under passphrase, a File Encryption Key Encryption Key (FEKEK) is generated from a salt/passphrase combo on mount. This FEKEK encrypts each FEK and writes it into the header of each file using the packet format specified in RFC 2440. This is all symmetric key encryption, so it can all be done via the kernel crypto API. These new patches introduce public key encryption of the FEK. There is no asymmetric key encryption support in the kernel crypto API, so eCryptfs pushes the FEK encryption and decryption out to a userspace daemon. After considering our requirements and determining the complexity of using various transport mechanisms, we settled on netlink for this communication. eCryptfs stores authentication tokens into the kernel keyring. These tokens correlate with individual keys. For passphrase mode of operation, the authentication token contains the symmetric FEKEK. For public key, the authentication token contains a PKI type and an opaque data blob managed by individual PKI modules in userspace. Each user who opens a file under an eCryptfs partition mounted in public key mode must be running a daemon. That daemon has the user's credentials and has access to all of the keys to which the user should have access. The daemon, when started, initializes the pluggable PKI modules available on the system and registers itself with the eCryptfs kernel module. Userspace utilities register public key authentication tokens into the user session keyring. These authentication tokens correlate key signatures with PKI modules and PKI blobs. The PKI blobs contain PKI-specific information necessary for the PKI module to carry out asymmetric key encryption and decryption. When the eCryptfs module parses the header of an existing file and finds a Tag 1 (Public Key) packet (see RFC 2440), it reads in the public key identifier (signature). The asymmetrically encrypted FEK is in the Tag 1 packet; eCryptfs puts together a decrypt request packet containing the signature and the encrypted FEK, then it passes it to the daemon registered for the current->euid via a netlink unicast to the PID of the daemon, which was registered at the time the daemon was started by the user. The daemon actually just makes calls to libecryptfs, which implements request packet parsing and manages PKI modules. libecryptfs grabs the public key authentication token for the given signature from the user session keyring. This auth tok tells libecryptfs which PKI module should receive the request. libecryptfs then makes a decrypt() call to the PKI module, and it passes along the PKI block from the auth tok. The PKI uses the blob to figure out how it should decrypt the data passed to it; it performs the decryption and passes the decrypted data back to libecryptfs. libecryptfs then puts together a reply packet with the decrypted FEK and passes that back to the eCryptfs module. The eCryptfs module manages these request callouts to userspace code via message context structs. The module maintains an array of message context structs and places the elements of the array on two lists: a free and an allocated list. When eCryptfs wants to make a request, it moves a msg ctx from the free list to the allocated list, sets its state to pending, and fires off the message to the user's registered daemon. When eCryptfs receives a netlink message (via the callback), it correlates the msg ctx struct in the alloc list with the data in the message itself. The msg->index contains the offset of the array of msg ctx structs. It verifies that the registered daemon PID is the same as the PID of the process that sent the message. It also validates a sequence number between the received packet and the msg ctx. Then, it copies the contents of the message (the reply packet) into the msg ctx struct, sets the state in the msg ctx to done, and wakes up the process that was sleeping while waiting for the reply. The sleeping process was whatever was performing the sys_open(). This process originally called ecryptfs_send_message(); it is now in ecryptfs_wait_for_response(). When it wakes up and sees that the msg ctx state was set to done, it returns a pointer to the message contents (the reply packet) and returns. If all went well, this packet contains the decrypted FEK, which is then copied into the crypt_stat struct, and life continues as normal. The case for creation of a new file is very similar, only instead of a decrypt request, eCryptfs sends out an encrypt request. > - We have a great clod of key mangement code in-kernel. Why is that > not suitable (or growable) for public key management? eCryptfs uses Howells' keyring to store persistent key data and PKI state information. It defers public key cryptographic transformations to userspace code. The userspace data manipulation request really is orthogonal to key management in and of itself. What eCryptfs basically needs is a secure way to communicate with a particular daemon for a particular task doing a syscall, based on the UID. Nothing running under another UID should be able to access that channel of communication. > - Is it appropriate that new infrastructure for public key > management be private to a particular fs? The messaging.c file contains a lot of code that, perhaps, could be extracted into a separate kernel service. In essence, this would be a sort of request/reply mechanism that would involve a userspace daemon. I am not aware of anything that does quite what eCryptfs does, so I was not aware of any existing tools to do just what we wanted. > What happens if one of these daemons exits without sending a quit > message? There is a stale uid<->pid association in the hash table for that user. When the user registers a new daemon, eCryptfs cleans up the old association and generates a new one. See ecryptfs_process_helo(). > - _why_ does it use netlink? Netlink provides the transport mechanism that would minimize the complexity of the implementation, given that we can have multiple daemons (one per user). I explored the possibility of using relayfs, but that would involve having to introduce control channels and a protocol for creating and tearing down channels for the daemons. We do not have to worry about any of that with netlink. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Adrian Bunk	b5d5dfbd59	[PATCH] include/linux/nfsd/const.h: remove NFS_SUPER_MAGIC NFS_SUPER_MAGIC is already defined in include/linux/magic.h Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Chuck Lever	27459f0940	[PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses Expand the rq_addr field to allow it to contain larger addresses. Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then everywhere the 'sockaddr_in' was referenced, we use instead an accessor function (svc_addr_in) which safely casts the _storage to _in. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Chuck Lever	ad06e4bd62	[PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing There are loads of places where the RPC server assumes that the rq_addr fields contains an IPv4 address. Top among these are error and debugging messages that display the server's IP address. Let's refactor the address printing into a separate function that's smart enough to figure out the difference between IPv4 and IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
Chuck Lever	482fb94e1b	[PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper Sometimes we need to create an RPC service but not register it with the local portmapper. NFSv4 delegation callback, for example. Change the svc_makesock() API to allow optionally creating temporary or permanent sockets, optionally registering with the local portmapper, and make it return the ephemeral port of the new socket. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
Eric W. Biederman	41487c65bf	[PATCH] pid: replace do/while_each_task_pid with do/while_each_pid_task There isn't any real advantage to this change except that it allows the old functions to be removed. Which is easier on maintenance and puts the code in a more uniform style. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	ab521dc0f8	[PATCH] tty: update the tty layer to work with struct pid Of kernel subsystems that work with pids the tty layer is probably the largest consumer. But it has the nice virtue that the assiation with a session only lasts until the session leader exits. Which means that no reference counting is required. So using struct pid winds up being a simple optimization to avoid hash table lookups. In the long term the use of pid_nr also ensures that when we have multiple pid spaces mixed everything will work correctly. Signed-off-by: Eric W. Biederman <eric@maxwell.lnxi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Andries Brouwer	939b00df03	[PATCH] Minix V3 support This morning I needed to read a Minix V3 filesystem, but unfortunately my 2.6.19 did not support that, and neither did the downloaded 2.6.20rc4. Fortunately, google told me that Daniel Aragones had already done the work, patch found at http://www.terra.es/personal2/danarag/ Unfortunaly, looking at the patch was painful to my eyes, so I polished it a bit before applying. The resulting kernel boots, and reads the filesystem it needed to read. Signed-off-by: Daniel Aragones <danarag@gmail.com> Signed-off-by: Andries Brouwer <aeb@cwi.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:31 -08:00
Eric Dumazet	163da958ba	[PATCH] FS: speed up rw_verify_area() oprofile hunting showed a stall in rw_verify_area(), because of triple indirection and potential cache misses. (file->f_path.dentry->d_inode->i_flock) By moving initialization of 'struct inode' pointer before the pos/count sanity tests, we allow the compiler and processor to perform two loads by anticipation, reducing stall, without prefetch() hints. Even x86 arch has enough registers to not use temporary variables and not increase text size. I validated this patch running a bench and studied oprofile changes, and absolute perf of the test program. Results of my epoll_pipe_bench (source available on request) on a Pentium-M 1.6 GHz machine Before : # ./epoll_pipe_bench -l 30 -t 20 Avg: 436089 evts/sec read_count=8843037 write_count=8843040 21.218390 samples per call (best value out of 10 runs) After : # ./epoll_pipe_bench -l 30 -t 20 Avg: 470980 evts/sec read_count=9549871 write_count=9549894 21.216694 samples per call (best value out of 10 runs) oprofile CPU_CLK_UNHALTED events gave a reduction from 5.3401 % to 2.5851 % for the rw_verify_area() function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:29 -08:00
Tomasz Kvarsin	3991d3bd15	[PATCH] warning fix: unsigned->signed While compiling my code with -Wconversion using gcc-trunk, I always get a bunch of warrning from headers, here is fix for them: __getblk is alawys called with unsigned argument, but it takes signed, the same story with __bread,__breadahead and so on. Signed-off-by: Tomasz Kvarsin Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:29 -08:00
Ahmed S. Darwish	79a81aef76	[PATCH] reiserfs: Use ARRAY_SIZE macro when appropriate Use ARRAY_SIZE macro already defined in kernel.h Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:29 -08:00
Nick Piggin	f9e4acf3be	[PATCH] inotify: read return val fix Fix for inotify read bug (bugzilla.kernel.org #6999) Problem Description: When reading from an inotify device with an insufficient sized buffer, read(2) will return 0 with no errno set. This is because of an logically incorrect action from the user program thus should return an more logical value. My suggestion is return -EINVAL as for bind(2). This patch is based on the proposal from Ryan <wolf0403@hotmail.com>, and feedback from John McCutchan <john@johnmccutchan.com>. Return -EINVAL if we have not passed in enough buffer space to read a single inotify event, rather than 0 which indicates that there is nothing to read. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: "John McCutchan" <john@johnmccutchan.com> Cc: Ryan <wolf0403@hotmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
Christoph Hellwig	d003fb70fd	[PATCH] remove sb->s_files and file_list_lock usage in dquot.c Iterate over sb->s_inodes instead of sb->s_files in add_dquot_ref. This reduces list search and lock hold time aswell as getting rid of one of the few uses of file_list_lock which Ingo identified as a scalability problem. Previously we called dq_op->initialize for every inode handing of a writeable file that wasn't initialized before. Now we're calling it for every inode that has a non-zero i_writecount, aka a writeable file descriptor refering to it. Thanks a lot to Jan Kara for running this patch through his quota test harness. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
Christoph Hellwig	fb58b7316a	[PATCH] move remove_dquot_ref to dqout.c Remove_dquot_ref can move to dqout.c instead of beeing in inode.c under #ifdef CONFIG_QUOTA. Also clean the resulting code up a tiny little bit by testing sb->dq_op earlier - it's constant over a filesystems lifetime. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
Andreas Gruenbacher	eb3dfb0cb1	[PATCH] Fix d_path for lazy unmounts Here is a bugfix to d_path. First, when d_path() hits a lazily unmounted mount point, it tries to prepend the name of the lazily unmounted dentry to the path name. It gets this wrong, and also overwrites the slash that separates the name from the following pathname component. This is demonstrated by the attached test case, which prints "getcwd returned d_path-bugsubdir" with the bug. The correct result would be "getcwd returned d_path-bug/subdir". It could be argued that the name of the root dentry should not be part of the result of d_path in the first place. On the other hand, what the unconnected namespace was once reachable as may provide some useful hints to users, and so that seems okay. Second, it isn't always possible to tell from the __d_path result whether the specified root and rootmnt (i.e., the chroot) was reached: lazy unmounts of bind mounts will produce a path that does start with a non-slash so we can tell from that, but other lazy unmounts will produce a path that starts with a slash, just like "ordinary" paths. The attached patch cleans up __d_path() to fix the bug with overlapping pathname components. It also adds a @fail_deleted argument, which allows to get rid of some of the mess in sys_getcwd(). Grabbing the dcache_lock can then also be moved into __d_path(). The patch also makes sure that paths will only start with a slash for paths which are connected to the root and rootmnt. The @fail_deleted argument could be added to d_path() as well: this would allow callers to recognize deleted files, without having to resort to the ambiguous check for the " (deleted)" string at the end of the pathnames. This is not currently done, but it might be worthwhile. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Cc: Neil Brown <neilb@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
Robert P. J. Day	5c3bd438cc	[PATCH] NTFS: rename incorrect check of NTFS_DEBUG with just DEBUG Replace the incorrect debugging check of "#ifdef NTFS_DEBUG" with just "#ifdef DEBUG". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
Andrew Morton	215122e111	[PATCH] register_chrdev_region() don't hand out the LOCAL/EXPERIMENTAL majors As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic chardev major allocation can hand out majors which LANANA has defined as being for local/experimental use. Cc: Torben Mathiasen <device@lanana.org> Cc: Greg KH <greg@kroah.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tomas Klas <tomas.klas@mepatek.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
David Chinner	6ab8eb1cff	[PATCH] Make XFS use BH_Unwritten and BH_Delay correctly Don't hide buffer_unwritten behind buffer_delay() and remove the hack that clears unexpected buffer_unwritten() states now that it can't happen. Signed-off-by: Dave Chinner <dgc@sgi.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Timothy Shimmin <tes@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
David Chinner	33a266dda9	[PATCH] Make BH_Unwritten a first class bufferhead flag V2 Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a bufferhead. Recently, I found the long standing mmap/unwritten extent conversion bug, and it was to do with partial page invalidation not clearing the unwritten flag from bufferheads attached to the page but beyond EOF. See here for a full explaination: http://oss.sgi.com/archives/xfs/2006-12/msg00196.html The solution I have checked into the XFS dev tree involves duplicating code from block_invalidatepage to clear the unwritten flag from the bufferhead(s), and then calling block_invalidatepage() to do the rest. Christoph suggested that this would be better solved by pushing the unwritten flag into the common buffer head flags and just adding the call to discard_buffer(): http://oss.sgi.com/archives/xfs/2006-12/msg00239.html The following patch makes BH_Unwritten a first class citizen. Signed-off-by: Dave Chinner <dgc@sgi.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
Linus Torvalds	958b7f37ee	Merge git://oss.sgi.com:8090/xfs/xfs-2.6 * git://oss.sgi.com:8090/xfs/xfs-2.6: (33 commits) [XFS] Don't use kmap in xfs_iozero. [XFS] Remove a bunch of unused functions from XFS. [XFS] Remove unused arguments from the XFS_BTREE_*_ADDR macros. [XFS] Remove unused header files for MAC and CAP checking functionality. [XFS] Make freeze code a little cleaner. [XFS] Remove unused argument to xfs_bmap_finish [XFS] Clean up use of VFS attr flags [XFS] Remove useless memory barrier [XFS] XFS sysctl cleanups [XFS] Fix assertion in xfs_attr_shortform_remove(). [XFS] Fix callers of xfs_iozero() to zero the correct range. [XFS] Ensure a frozen filesystem has a clean log before writing the dummy [XFS] Fix sub-block zeroing for buffered writes into unwritten extents. [XFS] Re-initialize the per-cpu superblock counters after recovery. [XFS] Fix block reservation changes for non-SMP systems. [XFS] Fix block reservation mechanism. [XFS] Make growfs work for amounts greater than 2TB [XFS] Fix inode log item use-after-free on forced shutdown [XFS] Fix attr2 corruption with btree data extents [XFS] Workaround log space issue by increasing XFS_TRANS_PUSH_AIL_RESTARTS ...	2007-02-11 11:53:39 -08:00
Linus Torvalds	c827ba4cb4	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Update defconfig. [SPARC64]: Add PCI MSI support on Niagara. [SPARC64] IRQ: Use irq_desc->chip_data instead of irq_desc->handler_data [SPARC64]: Add obppath sysfs attribute for SBUS and PCI devices. [PARTITION]: Add whole_disk attribute.	2007-02-11 11:37:45 -08:00
Alexey Dobriyan	4b98d11b40	[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct They are fat: 4x8 bytes in task_struct. They are uncoditionally updated in every fork, read, write and sendfile. They are used only if you have some "extended acct fields feature". And please, please, please, read(2) knows about bytes, not characters, why it is called "rchar"? Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:07 -08:00
Dmitriy Monakhov	3e4fdaf8ae	[PATCH] jbd layer function called instead of fs specific one jbd function called instead of fs specific one. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:06 -08:00
Robert P. J. Day	730c385bc5	[PATCH] Remove unused kernel config option ZISOFS_FS Remove the kernel config option ZISOFS_FS, since it appears that the actual option is simply ZISOFS. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:06 -08:00
Nick Piggin	72ed3d0358	[PATCH] buffer: memorder fix unlock_buffer(), like unlock_page(), must not clear the lock without ensuring that the critical section is closed. Mingming later sent the same patch, saying: We are running SDET benchmark and saw double free issue for ext3 extended attributes block, which complains the same xattr block already being freed (in ext3_xattr_release_block()). The problem could also been triggered by multiple threads loop untar/rm a kernel tree. The race is caused by missing a memory barrier at unlock_buffer() before the lock bit being cleared, resulting in possible concurrent h_refcounter update. That causes a reference counter leak, then later leads to the double free that we have seen. Inside unlock_buffer(), there is a memory barrier is placed after the lock bit is being cleared, however, there is no memory barrier before the bit is cleared. On some arch the h_refcount update instruction and the clear bit instruction could be reordered, thus leave the critical section re-entered. The race is like this: For example, if the h_refcount is initialized as 1, cpu 0: cpu1 -------------------------------------- ----------------------------------- lock_buffer() /* test_and_set_bit / clear_buffer_locked(bh); lock_buffer() / test_and_set_bit / h_refcount = h_refcount+1; / = 2/ h_refcount = h_refcount + 1; /= 2 */ clear_buffer_locked(bh); .... ...... We lost a h_refcount here. We need a memory barrier before the buffer head lock bit being cleared to force the order of the two writes. Please apply. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:15:24 -08:00
Robert P. J. Day	82ddcb0405	[PATCH] extend the set of "__attribute__" shortcut macros Extend the set of "__attribute__" shortcut macros, and remove identical (and now superfluous) definitions from a couple of source files. based on a page at robert love's blog: http://rlove.org/log/2005102601 extend the set of shortcut macros defined in compiler-gcc.h with the following: #define __packed __attribute__((packed)) #define __weak __attribute__((weak)) #define __naked __attribute__((naked)) #define __noreturn __attribute__((noreturn)) #define __pure __attribute__((pure)) #define __aligned(x) __attribute__((aligned(x))) #define __printf(a,b) __attribute__((format(printf,a,b))) Once these are in place, it's up to subsystem maintainers to decide if they want to take advantage of them. there is already a strong precedent for using shortcuts like this in the source tree. The ones that might give people pause are "__aligned" and "__printf", but shortcuts for both of those are already in use, and in some ways very confusingly. note the two very different definitions for a macro named "ALIGNED": drivers/net/sgiseeq.c:#define ALIGNED(x) ((((unsigned long)(x)) + 0xf) & ~(0xf)) drivers/scsi/ultrastor.c:#define ALIGNED(x) __attribute__((aligned(x))) also: include/acpi/platform/acgcc.h: #define ACPI_PRINTF_LIKE(c) __attribute__ ((__format__ (__printf__, c, c+1))) Given the precedent, then, it seems logical to at least standardize on a consistent set of these macros. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:35 -08:00
Eric Sandeen	731b9a5498	[PATCH] remove ext[34]_inc_count and _dec_count - Naming is confusing, ext3_inc_count manipulates i_nlink not i_count - handle argument passed in is not used - ext3 and ext4 already call inc_nlink and dec_nlink directly in other places Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Eric Sandeen	2988a7740d	[PATCH] return ENOENT from ext3_link when racing with unlink Return -ENOENT from ext[34]_link if we've raced with unlink and i_nlink is 0. Doing otherwise has the potential to corrupt the orphan inode list, because we'd wind up with an inode with a non-zero link count on the list, and it will never get properly cleaned up & removed from the orphan list before it is freed. [akpm@osdl.org: build fix] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Hugh Dickins	2e7842b887	[PATCH] fix umask when noACL kernel meets extN tuned for ACLs Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a kernel built without ACL support, then umask was ignored when creating inodes - though root or user has umask 022, touch creates files as 0666, and mkdir creates directories as 0777. This appears to have worked right until 2.6.11, when a fix to the default mode on symlinks (always 0777) assumed VFS applies umask: which it does, unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in s_flags according to s_mount_opt set according to def_mount_opts. We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test); but other filesystems only set MS_POSIXACL when ACLs are configured. We could fix this at another level; but it seems most robust to avoid setting the s_mount_opt flag in the first place (at the expense of more ifdefs). Likewise don't set the XATTR_USER flag when built without XATTR support. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: <linux-ext4@vger.kernel.org> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Alexey Dobriyan	9bbf81e483	[PATCH] seq_file conversion: coda Compile-tested. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Eric Sandeen	ead6596b9e	[PATCH] ext4: refuse ro to rw remount of fs with orphan inodes In the rare case where we have skipped orphan inode processing due to a readonly block device, and the block device subsequently changes back to read-write, disallow a remount,rw transition of the filesystem when we have an unprocessed orphan inodes as this would corrupt the list. Ideally we should process the orphan inode list during the remount, but that's trickier, and this plugs the hole for now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Eric Sandeen	ea9a05a133	[PATCH] ext3: refuse ro to rw remount of fs with orphan inodes In the rare case where we have skipped orphan inode processing due to a readonly block device, and the block device subsequently changes back to read-write, disallow a remount,rw transition of the filesystem when we have an unprocessed orphan inodes as this would corrupt the list. Ideally we should process the orphan inode list during the remount, but that's trickier, and this plugs the hole for now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Andrew Morton	100bb9349e	[PATCH] proc_misc warning fix fs/proc/proc_misc.c: In function 'proc_misc_init': fs/proc/proc_misc.c:764: warning: unused variable 'entry' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Olaf Hering	a470e18f53	[PATCH] msdos partitions: fix logic error in AIX detection Correct the AIX magic check to let 'echo > /dev/sdb' actually work. Signed-off-by: Olaf Hering <olh@suse.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Olaf Hering	4419d1ac7d	[PATCH] relax check for AIX in msdos partition table The patch to identify AIX disks and ignore them has caused at least one machine to fail to find the root partition on 2.6.19. The patch is: http://lkml.org/lkml/2006/7/31/117 The problem is some disk formatters do not blow away the first 4 bytes of the disk. If the disk we are installing to used to have AIX on it, then the first 4 bytes will still have IBMA in EBCDIC. The install in question was debian etch. Im not sure what the best fix is, perhaps the AIX detection code could check more than the first 4 bytes. The whole partition info for primary partitions is in this block: dd if=/dev/sdb count=$(( 4 * 16 )) bs=1 skip=$(( 0x1be )) All other data do not matter, beside the 0x55aa marker at the end of the first block. Signed-off-by: Olaf Hering <olh@suse.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Andrew Morton	fc0ecff698	[PATCH] remove invalidate_inode_pages() Convert all calls to invalidate_inode_pages() into open-coded calls to invalidate_mapping_pages(). Leave the invalidate_inode_pages() wrapper in place for now, marked as deprecated. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Eric Sandeen	d8adb9cef7	[PATCH] ext2: skip pages past number of blocks in ext2_find_entry This one was pointed out on the MOKB site: http://kernelfun.blogspot.com/2006/11/mokb-09-11-2006-linux-26x-ext2checkpage.html If a directory's i_size is corrupted, ext2_find_entry() will keep processing pages until the i_size is reached, even if there are no more blocks associated with the directory inode. This patch puts in some minimal sanity-checking so that we don't keep checking pages (and issuing errors) if we know there can be no more data to read, based on the block count of the directory inode. This is somewhat similar in approach to the ext3 patch I sent earlier this year. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:28 -08:00
Robert P. J. Day	c376222960	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:27 -08:00
Jan Blunck	4a3b0a490d	[PATCH] igrab() should check for I_CLEAR When igrab() is calling __iget() on an inode it should check if clear_inode() has been called on the inode already. Otherwise there is a race window between clear_inode() and destroy_inode() where igrab() calls __iget() which leads to already free inodes on the inode lists. Signed-off-by: Vandana Rungta <vandana@novell.com> Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:26 -08:00
Eric Dumazet	37756ced1f	[PATCH] avoid one conditional branch in touch_atime() I added IS_NOATIME(inode) macro definition in include/linux/fs.h, true if the inode superblock is marked readonly or noatime. This new macro is then used in touch_atime() instead of separatly testing MS_RDONLY and MS_NOATIME Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:25 -08:00
Ken Chen	4662629631	[PATCH] convert ramfs to use __set_page_dirty_no_writeback As pointed out by Hugh, ramfs would also benefit from using the new set_page_dirty aop method for memory backed file systems. Signed-off-by: Ken Chen <kenchen@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Christoph Lameter	65e458d43d	[PATCH] Drop get_zone_counts() Values are available via ZVC sums. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:18 -08:00
Robert P. J. Day	e10a4437cb	[PATCH] Remove final references to deprecated "MAP_ANON" page protection flag Remove the last vestiges of the long-deprecated "MAP_ANON" page protection flag: use "MAP_ANONYMOUS" instead. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:17 -08:00
Fabio Massimo Di Nitto	d18d7682c1	[PARTITION]: Add whole_disk attribute. Some partitioning systems create special partitions that span the entire disk. One example are Sun partitions, and this whole-disk partition exists to tell the firmware the extent of the entire device so it can load the boot block and do other things. Such partitions should not be treated as normal partitions, because all the other partitions overlap this whole-disk one. So we'd see multiple instances of the same UUID etc. which we do not want. udev and friends can thus search for this 'whole_disk' attribute and use it to decide to ignore the partition. Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:50:00 -08:00
David Chinner	e7ff6aed87	[XFS] Don't use kmap in xfs_iozero. kmap() is inefficient and does not scale well. kmap_atomic() is a better choice. Use the generic wrapper function instead of open coding the kmap-memset-dcache flush-kunmap stuff. SGI-PV: 960904 SGI-Modid: xfs-linux-melb:xfs-kern:28041a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:46 +11:00
Eric Sandeen	6be145bfb1	[XFS] Remove a bunch of unused functions from XFS. Patch provided by Eric Sandeen (sandeen@sandeen.net). SGI-PV: 960897 SGI-Modid: xfs-linux-melb:xfs-kern:28038a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:40 +11:00
Eric Sandeen	2c36ddeda7	[XFS] Remove unused arguments from the XFS_BTREE__ADDR macros. It makes it incrementally clearer to read the code when the top of a macro spaghetti-pile only receives the 3 arguments it uses, rather than 2 extra ones which are not used. Also when you start pulling this thread out of the sweater (i.e. remove unused args from XFS_BTREE__ADDR), a couple other third arms etc fall off too. If they're not used in the macro, then they sometimes don't need to be passed to the function calling the macro either, etc.... Patch provided by Eric Sandeen (sandeen@sandeen.net). SGI-PV: 960197 SGI-Modid: xfs-linux-melb:xfs-kern:28037a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:33 +11:00
Eric Sandeen	7bc5306d74	[XFS] Remove unused header files for MAC and CAP checking functionality. xfs_mac.h and xfs_cap.h provide definitions and macros that aren't used anywhere in XFS at all. They are left-overs from "to be implement at some point in the future" functionality that Irix XFS has. If this functionality ever goes into Linux, it will be provided at a different layer, most likely through the security hooks in the kernel so we will never need this functionality in XFS. Patch provided by Eric Sandeen (sandeen@sandeen.net). SGI-PV: 960895 SGI-Modid: xfs-linux-melb:xfs-kern:28036a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:28 +11:00
David Chinner	3c0dc77b42	[XFS] Make freeze code a little cleaner. Fixes a few small issues (mostly cosmetic) that were picked up during the review cycle for the last set of freeze path changes. SGI-PV: 959267 SGI-Modid: xfs-linux-melb:xfs-kern:28035a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:22 +11:00
Eric Sandeen	f7c99b6fc7	[XFS] Remove unused argument to xfs_bmap_finish The firstblock argument to xfs_bmap_finish is not used by that function. Remove it and cleanup the code a bit. Patch provided by Eric Sandeen. SGI-PV: 960196 SGI-Modid: xfs-linux-melb:xfs-kern:28034a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:16 +11:00
Eric Sandeen	39058a0e12	[XFS] Clean up use of VFS attr flags Use the the generic VFS attr flags where appropriate instead of open coding them to the same values. Patch provided by Eric Sandeen. SGI-PV: 960868 SGI-Modid: xfs-linux-melb:xfs-kern:28033a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:10 +11:00
Ralf Baechle	4cf3b52080	[XFS] Remove useless memory barrier wake_up's implementation does an implicit memory barrier so the explicit memory barrier is not needed in vfs_sync_worker. Patch provided by Ralf Baechle. SGI-PV: 960867 SGI-Modid: xfs-linux-melb:xfs-kern:28032a Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:04 +11:00
Eric W. Biederman	3a68cbfe02	[XFS] XFS sysctl cleanups Removes unneeded sysctl insert at head behaviour. Cleans up sysctl definitions to use C99 initialisers. Patch provided by Eric W. Biederman. SGI-PV: 960192 SGI-Modid: xfs-linux-melb:xfs-kern:28031a Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:59 +11:00
Lachlan McIlroy	c167b77d5e	[XFS] Fix assertion in xfs_attr_shortform_remove(). SGI-PV: 960791 SGI-Modid: xfs-linux-melb:xfs-kern:28021a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:53 +11:00
Lachlan McIlroy	6816016137	[XFS] Fix callers of xfs_iozero() to zero the correct range. The problem is the two callers of xfs_iozero() are rounding out the range to be zeroed to the end of a fsb and in some cases this extends past the new eof. The call to commit_write() in xfs_iozero() will cause the Linux inode's file size to be set too high. SGI-PV: 960788 SGI-Modid: xfs-linux-melb:xfs-kern:28013a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:47 +11:00
David Chinner	2823945fda	[XFS] Ensure a frozen filesystem has a clean log before writing the dummy record. The current Linux XFS freeze code is a mess. We flush the metadata buffers out while we are still allowing new transactions to start and then fail to flush the dirty buffers back out before writing the unmount and dummy records to the log. This leads to problems when the frozen filesystem is used for snapshots - we do log recovery on a readonly image and often it appears that the log image in the snapshot is not correct. Hence we end up with hangs, oops and mount failures when trying to mount a snapshot image that has been created when the filesystem has not been correctly frozen. To fix this, we need to move th metadata flush to after we wait for all current transactions to complete in teh second stage of the freeze. This means that when we write the final log records, the log should be clean and recovery should never occur on a snapshot image created from a frozen filesystem. SGI-PV: 959267 SGI-Modid: xfs-linux-melb:xfs-kern:28010a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:40 +11:00
David Chinner	549054afad	[XFS] Fix sub-block zeroing for buffered writes into unwritten extents. When writing less than a filesystem block of data into an unwritten extent via buffered I/O, __xfs_get_blocks fails to set the buffer new flag. As a result, the generic code will not zero either edge of the block resulting in garbage being written to disk either side of the real data. Set the buffer new state on bufferd writes to unwritten extents to ensure that zeroing occurs. SGI-PV: 960328 SGI-Modid: xfs-linux-melb:xfs-kern:28000a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:35 +11:00
Lachlan McIlroy	5478eead85	[XFS] Re-initialize the per-cpu superblock counters after recovery. After filesystem recovery the superblock is re-read to bring in any changes. If the per-cpu superblock counters are not re-initialized from the superblock then the next time the per-cpu counters are disabled they might overwrite the global counter with a bogus value. SGI-PV: 957348 SGI-Modid: xfs-linux-melb:xfs-kern:27999a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:29 +11:00
Kevin Jamieson	c97be73605	[XFS] Fix block reservation changes for non-SMP systems. SGI-PV: 956323 SGI-Modid: xfs-linux-melb:xfs-kern:27940a Signed-off-by: Kevin Jamieson <kjamieson@bycast.com> Signed-off-by: David Chatterton <chatz@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:23 +11:00
David Chinner	dbcabad19a	[XFS] Fix block reservation mechanism. The block reservation mechanism has been broken since the per-cpu superblock counters were introduced. Make the block reservation code work with the per-cpu counters by syncing the counters, snapshotting the amount of available space and then doing a modifcation of the counter state according to the result. Continue in a loop until we either have no space available or we reserve some space. SGI-PV: 956323 SGI-Modid: xfs-linux-melb:xfs-kern:27895a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:17 +11:00
David Chinner	20f4ebf2bf	[XFS] Make growfs work for amounts greater than 2TB The free block modification code has a 32bit interface, limiting the size the filesystem can be grown even on 64 bit machines. On 32 bit machines, there are other 32bit variables in transaction structures and interfaces that need to be expanded to allow this to work. SGI-PV: 959978 SGI-Modid: xfs-linux-melb:xfs-kern:27894a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:10 +11:00
David Chinner	f74eaf59b3	[XFS] Fix inode log item use-after-free on forced shutdown SGI-PV: 959388 SGI-Modid: xfs-linux-melb:xfs-kern:27805a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:04 +11:00
Barry Naujok	e5889e90dd	[XFS] Fix attr2 corruption with btree data extents SGI-PV: 958747 SGI-Modid: xfs-linux-melb:xfs-kern:27792a Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Russell Cattelan <cattelan@thebarn.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:58 +11:00
Vlad Apostolov	7666ab5fb3	[XFS] Workaround log space issue by increasing XFS_TRANS_PUSH_AIL_RESTARTS SGI-PV: 959264 SGI-Modid: xfs-linux-melb:xfs-kern:27750a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: David Chatterton <chatz@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:52 +11:00
Lachlan McIlroy	5180602e6f	[XFS] remove unused filp from ioctl functions SGI-PV: 959140 SGI-Modid: xfs-linux-melb:xfs-kern:27712a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:46 +11:00
Lachlan McIlroy	a3227fb996	[XFS] mraccessf & mrupdatef are supposed to be the "flags" versions of the functions, but they a) ignore the flags parameter completely, and b) are never called directly, only via the flag-less defines anyway So, drop the #define indirection, and rename mraccessf to mraccess, etc. SGI-PV: 959138 SGI-Modid: xfs-linux-melb:xfs-kern:27711a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:40 +11:00
Lachlan McIlroy	1f9b3b64d4	[XFS] remove unused xflags parameter from sync routines SGI-PV: 959137 SGI-Modid: xfs-linux-melb:xfs-kern:27710a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:33 +11:00
Lachlan McIlroy	1c91ad3aed	[XFS] fix sparse warning in xfs_da_btree.c SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:27702a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:27 +11:00
Lachlan McIlroy	e5eb7f202b	[XFS] use struct kvec in struct uio SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:27701a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:21 +11:00
David Chinner	03135cf726	[XFS] Fix UP build breakage due to undefined m_icsb_mutex. SGI-PV: 952227 SGI-Modid: xfs-linux-melb:xfs-kern:27692a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:15 +11:00
David Chinner	20b642858b	[XFS] Reduction global superblock lock contention near ENOSPC. The existing per-cpu superblock counter code uses the global superblock spin lock when we approach ENOSPC for global synchronisation. On larger machines than this code was originally tested on this can still get catastrophic spinlock contention due increasing rebalance frequency near ENOSPC. By introducing a sleeping lock that is used to serialise balances and modifications near ENOSPC we prevent contention from needlessly from wasting the CPU time of potentially hundreds of CPUs. To reduce the number of balances occuring, we separate the need rebalance case from the slow allocate case. Now, a counter running dry will trigger a rebalance during which counters are disabled. Any thread that sees a disabled counter enters a different path where it waits on the new mutex. When it gets the new mutex, it checks if the counter is disabled. If the counter is disabled, then we _know_ that we have to use the global counter and lock and it is safe to do so immediately. Otherwise, we drop the mutex and go back to trying the per-cpu counters which we know were re-enabled. SGI-PV: 952227 SGI-Modid: xfs-linux-melb:xfs-kern:27612a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:09 +11:00
Eric Sandeen	804195b63a	[XFS] Get rid of old 5.3/6.1 v1 log items. Cleanup patch sent in by Eric Sandeen. SGI-PV: 958736 SGI-Modid: xfs-linux-melb:xfs-kern:27596a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:35:02 +11:00
David Chinner	7989cb8ef5	[XFS] Keep stack usage down for 4k stacks by using noinline. gcc-4.1 and more recent aggressively inline static functions which increases XFS stack usage by ~15% in critical paths. Prevent this from occurring by adding noinline to the STATIC definition. Also uninline some functions that are too large to be inlined and were causing problems with CONFIG_FORCED_INLINING=y. Finally, clean up all the different users of inline, __inline and __inline__ and put them under one STATIC_INLINE macro. For debug kernels the STATIC_INLINE macro uninlines those functions. SGI-PV: 957159 SGI-Modid: xfs-linux-melb:xfs-kern:27585a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: David Chatterton <chatz@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:34:56 +11:00
David Chinner	5e6a07dfe4	[XFS] Current usage of buftarg flags is incorrect. The {test,set,clear}_bit() operations take a bit index for the bit to operate on. The XBT_* flags are defined as bit fields which is incorrect, not to mention the way the bit fields are enumerated is broken too. This was only working by chance. Fix the definitions of the flags and make the code using them use the {test,set,clear}_bit() operations correctly. SGI-PV: 958639 SGI-Modid: xfs-linux-melb:xfs-kern:27565a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:34:49 +11:00
Lachlan McIlroy	dc74eaad8c	[XFS] Prevent buffer overrun in cmn_err(). The message buffer used by cmn_err() is only 256 bytes and some CXFS messages were exceeding this length. Since we were using vsprintf() and not checking for buffer overruns we were clobbering memory beyond the buffer. The size of the buffer has been increased to 1024 bytes so we can capture these larger messages and we are now using vsnprintf() to prevent overrunning the buffer size. SGI-PV: 958599 SGI-Modid: xfs-linux-melb:xfs-kern:27561a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Geoffrey Wehrman <gwehrman@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:34:38 +11:00
David Chinner	585e6d8856	[XFS] Fix a synchronous buftarg flush deadlock when freezing. At the last stage of a freeze, we flush the buftarg synchronously over and over again until it succeeds twice without skipping any buffers. The delwri list flush skips pinned buffers, but tries to flush all others. It removes the buffers from the delwri list, then tries to lock them one at a time as it traverses the list to issue the I/O. It holds them locked until we issue all of the I/O and then unlocks them once we've waited for it to complete. The problem is that during a freeze, the filesystem may still be doing stuff - like flushing delalloc data buffers - in the background and hence we can be trying to lock buffers that were on the delwri list at the same time. Hence we can get ABBA deadlocks between threads doing allocation and the buftarg flush (freeze) thread. Fix it by skipping locked (and pinned) buffers as we traverse the delwri buffer list. SGI-PV: 957195 SGI-Modid: xfs-linux-melb:xfs-kern:27535a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:32:29 +11:00
David Chinner	dac61f521b	[XFS] Make quiet mounts quiet The XFS quiet mount logic was inverted making quiet mounts noisy and vice versa. Fix it. SGI-PV: 958469 SGI-Modid: xfs-linux-melb:xfs-kern:27520a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:27:56 +11:00
Dave Kleikamp	c9e3ad6021	JFS: Get rid of "may be used uninitialized" warnings Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2007-02-09 15:36:47 -06:00
Greg Ungerer	72613e5f44	[PATCH] uclinux: correctly remap bin_fmtflat exe allocated mem regions remap() the region we get from mmap() to mark the fact that we are using all of the available slack space. Any slack space is used to form a simple brk region, and potentially more stack space than requested at load time. Any searches of the vma chain may well fail looking for stack (and especially arg) addresses if the remaping is not done. The simplest example is /proc/<pid>/cmdline, since the args are pretty much always at the top of the data/bss/stack region. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 10:45:33 -08:00
Adrian Bunk	835d90c421	[PATCH] v9fs_vfs_mkdir(): fix a double free Fix a double free of "dfid" introduced by commit `da977b2c7e` and spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 09:25:47 -08:00
Ken Chen	6649a38632	[PATCH] hugetlb: preserve hugetlb pte dirty state __unmap_hugepage_range() is buggy that it does not preserve dirty state of huge_pte when unmapping hugepage range. It causes data corruption in the event of dop_caches being used by sys admin. For example, an application creates a hugetlb file, modify pages, then unmap it. While leaving the hugetlb file alive, comes along sys admin doing a "echo 3 > /proc/sys/vm/drop_caches". drop_pagecache_sb() will happily free all pages that aren't marked dirty if there are no active mapping. Later when application remaps the hugetlb file back and all data are gone, triggering catastrophic flip over on application. Not only that, the internal resv_huge_pages count will also get all messed up. Fix it up by marking page dirty appropriately. Signed-off-by: Ken Chen <kenchen@google.com> Cc: "Nish Aravamudan" <nish.aravamudan@gmail.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: <stable@kernel.org> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 09:25:46 -08:00
Evgeniy Dushistov	f336953bfd	[PATCH] ufs: restore back support of openstep This is a fix of regression, which triggered by ~2.6.16. Patch with name ufs-directory-and-page-cache-from-blocks-to-pages.patch: in additional to conversation from block to page cache mechanism added new checks of directory integrity, one of them that directory entry do not across directory chunks. But some kinds of UFS: OpenStep UFS and Apple UFS (looks like these are the same filesystems) have different directory chunk size, then common UFSes(BSD and Solaris UFS). So this patch adds ability to works with variable size of directory chunks, and set it for ufstype=openstep to right size. Tested on darwin ufs. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 09:25:46 -08:00
Al Viro	58addbffdd	[PATCH] dlm: use kern_recvmsg() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 09:14:06 -08:00
Artem Bityutskiy	a7a6ace140	[JFFS2] Use MTD_OOB_AUTO to automatically place cleanmarker on NAND Nowadays MTD supports an MTD_OOB_AUTO option which allows users to access free bytes in NAND's OOB as a contiguous buffer, although it may be highly discontinuous. This patch teaches JFFS2 to use this nice feature instead of the old MTD_OOB_PLACE option. This for example caused problems with OneNAND. Now JFFS2 does not care how are the free bytes situated. This may change position of the clean marker on some flashes, but this is not a problem. JFFS2 will just re-erase the empty eraseblocks and write the new (correct) clean marker. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-02-09 15:34:08 +00:00
Dmitry Adamushko	cfa72397cf	JFFS2: memory leak in jffs2_do_mount_fs() If jffs2_sum_init() fails, c->blocks is not freed neither in jffs2_do_mount_fs() nor in jffs2_do_fill_super(). Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-02-09 15:00:21 +00:00
David S. Miller	9783e1df7a	Merge branch 'HEAD' of master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6 Conflicts: crypto/Kconfig	2007-02-08 15:25:18 -08:00
Linus Torvalds	5986a2ec35	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mfasheh/ocfs2: (22 commits) configfs: Zero terminate data in configfs attribute writes. [PATCH] ocfs2 heartbeat: clean up bio submission code ocfs2: introduce sc->sc_send_lock to protect outbound outbound messages [PATCH] ocfs2: drop INET from Kconfig, not needed ocfs2_dlm: Add timeout to dlm join domain ocfs2_dlm: Silence some messages during join domain ocfs2_dlm: disallow a domain join if node maps mismatch ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres ocfs2: Binds listener to the configured ip address ocfs2_dlm: Calling post handler function in assert master handler ocfs2: Added post handler callable function in o2net message handler ocfs2_dlm: Cookies in locks not being printed correctly in error messages ocfs2_dlm: Silence a failed convert ocfs2_dlm: wake up sleepers on the lockres waitqueue ocfs2_dlm: Dlm dispatch was stopping too early ocfs2_dlm: Drop inflight refmap even if no locks found on the lockres ocfs2_dlm: Flush dlm workqueue before starting to migrate ocfs2_dlm: Fix migrate lockres handler queue scanning ocfs2_dlm: Make dlmunlock() wait for migration to complete ocfs2_dlm: Fixes race between migrate and dirty ...	2007-02-08 10:37:22 -08:00
Steve French	7ba526316a	[CIFS] Allow update of EOF on remote extend of file Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-02-08 18:14:13 +00:00
Steve French	595dcfecf6	[CIFS] POSIX CIFS Extensions (continued) - POSIX Open Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-02-08 18:11:42 +00:00
Joel Becker	ff05d1c464	configfs: Zero terminate data in configfs attribute writes. Attributes in configfs are text files. As such, most handlers expect to be able to call functions like simple_strtoul() without checking the bounds of the buffer. Change the call to zero terminate the buffer before calling the client's ->store() method. This does reduce the attribute size from PAGE_SIZE to PAGE_SIZE-1. Also, change get_zeroed_page() to alloc_page(), as we are handling the termination. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:17:08 -08:00
Philipp Reisner	b559292e06	[PATCH] ocfs2 heartbeat: clean up bio submission code As was already pointed out Mathieu Avila on Thu, 07 Sep 2006 03:15:25 -0700 that OCFS2 is expecting bio_add_page() to add pages to BIOs in an easily predictable manner. That is not true, especially for devices with own merge_bvec_fn(). Therefore OCFS2's heartbeat code is very likely to fail on such devices. Move the bio_put() call into the bio's bi_end_io() function. This makes the whole idea of trying to predict the behaviour of bio_add_page() unnecessary. Removed compute_max_sectors() and o2hb_compute_request_limits(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:15:58 -08:00
Zhen Wei	925037bcba	ocfs2: introduce sc->sc_send_lock to protect outbound outbound messages When there is a lot of multithreaded I/O usage, two threads can collide while sending out a message to the other nodes. This is due to the lack of locking between threads while sending out the messages. When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects itself by acquiring a lock at invocation by calling lock_sock(). tcp_sendmsg() then loops over the buffers in the iovec, allocating associated sk_buff's and cache pages for use in the actual send. As it does so, it pushes the data out to tcp for actual transmission. However, if one of those allocation fails (because a large number of large sends is being processed, for example), it must wait for memory to become available. It does so by jumping to wait_for_sndbuf or wait_for_memory, both of which eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory() contains a code path that calls sk_wait_event(). Finally, sk_wait_event() contains the call to release_sock(). The following patch adds a lock to the socket container in order to properly serialize outbound requests. From: Zhen Wei <zwei@novell.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:15:11 -08:00
Randy Dunlap	f71aa8a55a	[PATCH] ocfs2: drop INET from Kconfig, not needed OCFS2: drop 'depends on INET' since local mounts are now allowed. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:14:27 -08:00
Sunil Mushran	0dd82141b2	ocfs2_dlm: Add timeout to dlm join domain Currently the ocfs2 dlm has no timeout during dlm join domain. While this is not a problem in normal operation, this does become an issue if, say, the other node is refusing to let the node join the domain because of a stuck recovery. This patch adds a 90 sec timeout. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:10:39 -08:00
Sunil Mushran	e4968476a9	ocfs2_dlm: Silence some messages during join domain These messages can easily be activated using the mlog infrastructure and don't need to be enabled by default. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:10:14 -08:00
Srinivas Eeda	1faf289454	ocfs2_dlm: disallow a domain join if node maps mismatch There is a small window where a joining node may not see the node(s) that just died but are still part of the domain. To fix this, we must disallow join requests if the joining node has a different node map. A new field node_map is added to dlm_query_join_request to send the current nodes nodemap along with join request. On the receiving end the nodes that are part of the cluster verifies if this new node sees all the nodes that are still part of the cluster. They disallow the join if the maps mismatch. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:09:14 -08:00
Sunil Mushran	f3f854648d	ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres Eventhough the set refmap bit message is sent before the clear refmap message, currently there is no guarentee that the set message will be handled before the clear. This patch prevents the clear refmap to be processed while the node is sending assert master messages to other nodes. (The set refmap message is sent as a response to the assert master request). Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:08:13 -08:00
Sunil Mushran	ab81afd30b	ocfs2: Binds listener to the configured ip address This patch binds the o2net listener to the configured ip address instead of INADDR_ANY for security. Fixes oss.oracle.com bugzilla#814. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:07:49 -08:00
Kurt Hackel	3b8118cffa	ocfs2_dlm: Calling post handler function in assert master handler This patch prevents the dlm from sending the clear refmap message before the set refmap. We use the newly created post function handler routine to accomplish the task. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:07:24 -08:00
Kurt Hackel	d74c9803a9	ocfs2: Added post handler callable function in o2net message handler Currently o2net allows one handler function per message type. This patch adds the ability to call another function to be called after the handler has returned the message to the other node. Handlers are now given the option of returning a context (in the form of a void **) which will be passed back into the post message handler function. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:06:56 -08:00
Kurt Hackel	74aa25856c	ocfs2_dlm: Cookies in locks not being printed correctly in error messages The dlm encodes the node number and a sequence number in the lock cookie. It also stores the cookie in the lockres in the big endian format to avoid swapping 8 bytes on each lock request. The bug here was that it was assuming the cookie to be in the cpu format when decoding it for printing the error message. This patch swaps the bytes before the print. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:06:24 -08:00
Kurt Hackel	90aaaf1c23	ocfs2_dlm: Silence a failed convert When the lockres is in migrate or recovery state, all convert requests are denied with the appropriate error status that is handled on the requester node. This patch silences the erroneous error message printed on the master node. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:05:48 -08:00
Kurt Hackel	a6fa36402a	ocfs2_dlm: wake up sleepers on the lockres waitqueue The dlm was not waking up threads waiting on the lockres wait queue, waiting for the lockres to be no longer be in the DLM_LOCK_RES_IN_PROGRESS and the DLM_LOCK_RES_MIGRATING states. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:05:19 -08:00
Kurt Hackel	28b72d9c92	ocfs2_dlm: Dlm dispatch was stopping too early dlm_dispatch_work was not processing the queued up tasks at the first sign of the node leaving the domain leading to not only incompleted tasks but also a mismatch in the dlm refcnt. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:04:27 -08:00
Kurt Hackel	50635f15b3	ocfs2_dlm: Drop inflight refmap even if no locks found on the lockres Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:03:42 -08:00
Kurt Hackel	1cd04dbe33	ocfs2_dlm: Flush dlm workqueue before starting to migrate This is to prevent the condition in which a previously queued up assert master asserts after we start the migration. Now migration ensures the workqueue is flushed before proceeding with migrating the lock to another node. This condition is typically encountered during parallel umounts. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:03:02 -08:00
Kurt Hackel	e17e75ecb8	ocfs2_dlm: Fix migrate lockres handler queue scanning The migrate lockres handler was only searching for its lock on migrated lockres on the expected queue. This could be problematic as the new master could have also issued a convert request during the migration and thus moved the lock to the convert queue. We now search for the lock on all three queues. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:02:40 -08:00
Kurt Hackel	71ac106243	ocfs2_dlm: Make dlmunlock() wait for migration to complete dlmunlock() was not waiting for migration to complete before releasing locks on locally mastered locks. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:01:49 -08:00
Kurt Hackel	ddc09c8dda	ocfs2_dlm: Fixes race between migrate and dirty dlmthread was removing lockres' from the dirty list and resetting the dirty flag before shuffling the list. This patch retains the dirty state flag until the lists are shuffled. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 12:00:57 -08:00
Adrian Bunk	faf0ec9f13	[PATCH] fs/ocfs2/dlm/: make functions static This patch makes some needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 11:53:31 -08:00
Kurt Hackel	ba2bf21851	ocfs2_dlm: fix cluster-wide refcounting of lock resources This was previously broken and migration of some locks had to be temporarily disabled. We use a new (and backward-incompatible) set of network messages to account for all references to a lock resources held across the cluster. once these are all freed, the master node may then free the lock resource memory once its local references are dropped. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-02-07 11:53:07 -08:00
Eric W. Biederman	b592fcfe7f	sysfs: Shadow directory support The problem. When implementing a network namespace I need to be able to have multiple network devices with the same name. Currently this is a problem for /sys/class/net/*. What I want is a separate /sys/class/net directory in sysfs for each network namespace, and I want to name each of them /sys/class/net. I looked and the VFS actually allows that. All that is needed is for /sys/class/net to implement a follow link method to redirect lookups to the real directory you want. Implementing a follow link method that is sensitive to the current network namespace turns out to be 3 lines of code so it looks like a clean approach. Modifying sysfs so it doesn't get in my was is a bit trickier. I am calling the concept of multiple directories all at the same path in the filesystem shadow directories. With the directory entry really at that location the shadow master. The following patch modifies sysfs so it can handle a directory structure slightly different from the kobject tree so I can implement the shadow directories for handling /sys/class/net/. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 10:37:14 -08:00

1 2 3 4 5 ...

5047 Commits