android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Josef Bacik	ff782e0a13	Btrfs: optimize fsync for the single writer case This patch optimizes the tree logging stuff so it doesn't always wait 1 jiffie for new people to join the logging transaction if there is only ever 1 writer. This helps a little bit with latency where we have something like RPM where it will fdatasync every file it writes, and so waiting the 1 jiffie for every fdatasync really starts to add up. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-08 15:30:04 -04:00
Josef Bacik	e3ccfa9897	Btrfs: async delalloc flushing under space pressure This patch moves the delalloc flushing that occurs when we are under space pressure off to a async thread pool. This helps since we only free up metadata space when we actually insert the extent item, which means it takes quite a while for space to be free'ed up if we wait on all ordered extents. However, if space is freed up due to inline extents being inserted, we can wake people who are waiting up early, and they can finish their work. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-08 15:21:23 -04:00
Josef Bacik	32c00aff71	Btrfs: release delalloc reservations on extent item insertion This patch fixes an issue with the delalloc metadata space reservation code. The problem is we used to free the reservation as soon as we allocated the delalloc region. The problem with this is if we are not inserting an inline extent, we don't actually insert the extent item until after the ordered extent is written out. This patch does 3 things, 1) It moves the reservation clearing stuff into the ordered code, so when we remove the ordered extent we remove the reservation. 2) It adds a EXTENT_DO_ACCOUNTING flag that gets passed when we clear delalloc bits in the cases where we want to clear the metadata reservation when we clear the delalloc extent, in the case that we do an inline extent or we invalidate the page. 3) It adds another waitqueue to the space info so that when we start a fs wide delalloc flush, anybody else who also hits that area will simply wait for the flush to finish and then try to make their allocation. This has been tested thoroughly to make sure we did not regress on performance. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-08 15:21:10 -04:00
Chris Mason	a3429ab70b	Btrfs: delay clearing EXTENT_DELALLOC for compressed extents When compression is on, the cow_file_range code is farmed off to worker threads. This allows us to do significant CPU work in parallel on SMP machines. But it is a delicate balance around when we clear flags and how. In the past we cleared the delalloc flag immediately, which was safe because the pages stayed locked. But this is causing problems with the newest ENOSPC code, and with the recent extent state cleanups we can now clear the delalloc bit at the same time the uncompressed code does. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-08 15:11:50 -04:00
Chris Mason	a791e35e12	Btrfs: cleanup extent_clear_unlock_delalloc flags extent_clear_unlock_delalloc has a growing set of ugly parameters that is very difficult to read and maintain. This switches to a flag field and well named flag defines. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-08 15:11:49 -04:00
Alex Elder	e09d39968b	Merge branch 'master' into for-linus	2009-10-08 13:53:44 -05:00
Christoph Hellwig	d0800703fe	xfs: stop calling filemap_fdatawait inside ->fsync Now that the VFS actually waits for the data I/O to complete before calling into ->fsync we can stop doing it ourselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-10-08 12:02:48 -05:00
Eric Sandeen	8e69ce1471	fix readahead calculations in xfs_dir2_leaf_getdents() This is for bug #850, http://oss.sgi.com/bugzilla/show_bug.cgi?id=850 XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31 The above only showed up on a CONFIG_XFS_DEBUG=y kernel, because xfs_bmapi() ASSERTs that it has been asked for at least one map, and it was getting 0. The root cause is that our guesstimated "bufsize" from xfs_file_readdir was fairly small, and the bufsize -= length; in the loop was going negative - except bufsize is a size_t, so it was wrapping to a very large number. Then when we did ra_want = howmany(bufsize + mp->m_dirblksize, mp->m_sb.sb_blocksize) - 1; with that very large number, the (int) ra_want was coming out negative, and a subsequent compare: if (1 + ra_want > map_blocks ... was coming out -true- (negative int compare w/ uint) and we went back to xfs_bmapi() for more, even though we did not need more, and asked for 0 maps, and hit the ASSERT. We have kind of a type mess here, but just keeping bufsize from going negative is probably sufficient to avoid the problem. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-10-08 12:02:12 -05:00
Dave Chinner	dce5065a57	xfs: make sure xfs_sync_fsdata covers the log We want to always cover the log after writing out the superblock, and in case of a synchronous writeout make sure we actually wait for the log to be covered. That way a filesystem that has been sync()ed can be considered clean by log recovery. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-10-08 12:01:49 -05:00
Dave Chinner	932640e8ad	xfs: mark inodes dirty before issuing I/O To make sure they get properly waited on in sync when I/O is in flight and we latter need to update the inode size. Requires a new helper to check if an ioend structure is beyond the current EOF. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-10-08 12:01:26 -05:00
Christoph Hellwig	69961a26b8	xfs: cleanup ->sync_fs Sort out ->sync_fs to not perform a superblock writeback for the wait = 0 case as that is just an optional first pass and the superblock will be written back properly in the next call with wait = 1. Instead perform an opportunistic quota writeback to have less work later. Also remove the freeze special case as we do a proper wait = 1 call in the freeze code anyway. Also rename the function to xfs_fs_sync_fs to match the normal naming convention, update comments and avoid calling into the laptop_mode logic on an error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-10-08 12:01:03 -05:00
Dave Chinner	c90b07e8dd	xfs: fix xfs_quiesce_data We need to do a synchronous xfs_sync_fsdata to make sure the superblock actually is on disk when we return. Also remove SYNC_BDFLUSH flag to xfs_sync_inodes because that particular flag is never checked. Move xfs_filestream_flush call later to only release inodes after they have been written out. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-10-08 12:00:36 -05:00
Christoph Hellwig	f9581b1443	xfs: implement ->dirty_inode to fix timestamp handling This is picking up on Felix's repost of Dave's patch to implement a .dirty_inode method. We really need this notification because the VFS keeps writing directly into the inode structure instead of going through methods to update this state. In addition to the long-known atime issue we now also have a caller in VM code that updates c/mtime that way for shared writeable mmaps. And I found another one that no one has noticed in practice in the FIFO code. So implement ->dirty_inode to set i_update_core whenever the inode gets externally dirtied, and switch the c/mtime handling to the same scheme we already use for atime (always picking up the value from the Linux inode). Note that this patch also removes the xfs_synchronize_atime call in xfs_reclaim it was superflous as we already synchronize the time when writing the inode via the log (xfs_inode_item_format) or the normal buffers (xfs_iflush_int). In addition also remove the I_CLEAR check before copying the Linux timestamps - now that we always have the Linux inode available we can always use the timestamps in it. Also switch to just using file_update_time for regular reads/writes - that will get us all optimization done to it for free and make sure we notice early when it breaks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-10-08 12:00:03 -05:00
Mimi Zohar	36520be8e3	ima: ecryptfs fix imbalance message The unencrypted files are being measured. Update the counters to get rid of the ecryptfs imbalance message. (http://bugzilla.redhat.com/519737) Reported-by: Sachin Garg Cc: Eric Paris <eparis@redhat.com> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: James Morris <jmorris@namei.org> Cc: David Safford <safford@watson.ibm.com> Cc: stable@kernel.org Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2009-10-08 11:31:38 -05:00
Tyler Hicks	ed1f21857e	eCryptfs: Remove Kconfig NET dependency and select MD5 eCryptfs no longer uses a netlink interface to communicate with ecryptfsd, so NET is not a valid dependency anymore. MD5 is required and must be built for eCryptfs to be of any use. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2009-10-08 11:31:36 -05:00
Randy Dunlap	664fc5a4e7	ecryptfs: depends on CRYPTO ecryptfs uses crypto APIs so it should depend on CRYPTO. Otherwise many build errors occur. [63 lines not pasted] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: ecryptfs-devel@lists.launchpad.net Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2009-10-08 11:21:12 -05:00
Trond Myklebust	3050141bae	NFSv4: Kill nfs4_renewd_prepare_shutdown() The NFSv4 renew daemon is shared between all active super blocks that refer to a particular NFS server, so it is wrong to be shutting it down in nfs4_kill_super every time a super block is destroyed. This patch therefore kills nfs4_renewd_prepare_shutdown altogether, and leaves it up to nfs4_shutdown_client() to also shut down the renew daemon by means of the existing call to nfs4_kill_renewd(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-08 11:50:55 -04:00
Wu Fengguang	253fb02d62	pagemap: export KPF_HWPOISON This flag indicates a hardware detected memory corruption on the page. Any future access of the page data may bring down the machine. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-08 07:36:39 -07:00
Jaswinder Singh Rajput	4055e97318	fs: includecheck fix: proc, kcore.c fix the following 'make includecheck' warning: fs/proc/kcore.c: linux/mm.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-08 07:36:38 -07:00
Trond Myklebust	517be09def	NFSv4: Fix the referral mount code Fix a typo which causes try_location() to use the wrong length argument when calling nfs_parse_server_name(). This again, causes the initialisation of the mount's sockaddr structure to fail. Also ensure that if nfs4_pathname_string() returns an error, then we pass that error back up the stack instead of ENOENT. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-06 15:42:20 -04:00
Ben Hutchings	f4373bf9e6	nfs: Avoid overrun when copying client IP address string As seen in <http://bugs.debian.org/549002>, nfs4_init_client() can overrun the source string when copying the client IP address from nfs_parsed_mount_data::client_address to nfs_client::cl_ipaddr. Since these are both treated as null-terminated strings elsewhere, the copy should be done with strlcpy() not memcpy(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-06 15:42:18 -04:00
Trond Myklebust	bcd2ea17da	NFS: Fix port initialisation in nfs_remount() The recent changeset `53a0b9c4c9` (NFS: Replace nfs_parse_ip_address() with rpc_pton()) broke nfs_remount, since the call to rpc_pton() will zero out the port number in data->nfs_server.address. This is actually due to a bug in nfs_remount: it should be looking at the port number in nfs_server.port instead... This fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=14276 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-06 15:41:22 -04:00
Trond Myklebust	f5855fecda	NFS: Fix port and mountport display in /proc/self/mountinfo Currently, the port and mount port will both display as 65535 if you do not specify a port number. That would be wrong... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-06 15:40:37 -04:00
Trond Myklebust	c5811dbdd2	NFS: Fix a default mount regression... With the recent spate of changes, the nfs protocol version will now default to 2 instead of 3, while the mount protocol version defaults to 3. The following patch should ensure the defaults are consistent with the previous defaults of vers=3,proto=tcp,mountvers=3,mountproto=tcp. This fixes the bug http://bugzilla.kernel.org/show_bug.cgi?id=14259 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-06 15:40:15 -04:00
Steve French	8347a5cdd1	[CIFS] Fixing to avoid invalid kfree() in cifs_get_tcp_session() trivial bug in fs/cifs/connect.c . The bug is caused by fail of extract_hostname() when mounting cifs file system. This is the situation when I noticed this bug. % sudo mount -t cifs //192.168.10.208 mountpoint -o options... Then my kernel says, [ 1461.807776] ------------[ cut here ]------------ [ 1461.807781] kernel BUG at mm/slab.c:521! [ 1461.807784] invalid opcode: 0000 [#2] PREEMPT SMP [ 1461.807790] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:09:02.0/resource [ 1461.807793] CPU 0 [ 1461.807796] Modules linked in: nls_iso8859_1 usbhid sbp2 uhci_hcd ehci_hcd i2c_i801 ohci1394 ieee1394 psmouse serio_raw pcspkr sky2 usbcore evdev [ 1461.807816] Pid: 3446, comm: mount Tainted: G D 2.6.32-rc2-vanilla [ 1461.807820] RIP: 0010:[<ffffffff810b888e>] [<ffffffff810b888e>] kfree+0x63/0x156 [ 1461.807829] RSP: 0018:ffff8800b4f7fbb8 EFLAGS: 00010046 [ 1461.807832] RAX: ffffea00033fff98 RBX: ffff8800afbae7e2 RCX: 0000000000000000 [ 1461.807836] RDX: ffffea0000000000 RSI: 000000000000005c RDI: ffffffffffffffea [ 1461.807839] RBP: ffff8800b4f7fbf8 R08: 0000000000000001 R09: 0000000000000000 [ 1461.807842] R10: 0000000000000000 R11: ffff8800b4f7fbf8 R12: 00000000ffffffea [ 1461.807845] R13: ffff8800afb23000 R14: ffff8800b4f87bc0 R15: ffffffffffffffea [ 1461.807849] FS: 00007f52b6f187c0(0000) GS:ffff880007600000(0000) knlGS:0000000000000000 [ 1461.807852] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1461.807855] CR2: 0000000000613000 CR3: 00000000af8f9000 CR4: 00000000000006f0 [ 1461.807858] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1461.807861] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1461.807865] Process mount (pid: 3446, threadinfo ffff8800b4f7e000, task ffff8800950e4380) [ 1461.807867] Stack: [ 1461.807869] 0000000000000202 0000000000000282 ffff8800b4f7fbf8 ffff8800afbae7e2 [ 1461.807876] <0> 00000000ffffffea ffff8800afb23000 ffff8800b4f87bc0 ffff8800b4f7fc28 [ 1461.807884] <0> ffff8800b4f7fcd8 ffffffff81159f6d ffffffff81147bc2 ffffffff816bfb48 [ 1461.807892] Call Trace: [ 1461.807899] [<ffffffff81159f6d>] cifs_get_tcp_session+0x440/0x44b [ 1461.807904] [<ffffffff81147bc2>] ? find_nls+0x1c/0xe9 [ 1461.807909] [<ffffffff8115b889>] cifs_mount+0x16bc/0x2167 [ 1461.807917] [<ffffffff814455bd>] ? _spin_unlock+0x30/0x4b [ 1461.807923] [<ffffffff81150da9>] cifs_get_sb+0xa5/0x1a8 [ 1461.807928] [<ffffffff810c1b94>] vfs_kern_mount+0x56/0xc9 [ 1461.807933] [<ffffffff810c1c64>] do_kern_mount+0x47/0xe7 [ 1461.807938] [<ffffffff810d8632>] do_mount+0x712/0x775 [ 1461.807943] [<ffffffff810d671f>] ? copy_mount_options+0xcf/0x132 [ 1461.807948] [<ffffffff810d8714>] sys_mount+0x7f/0xbf [ 1461.807953] [<ffffffff8144509a>] ? lockdep_sys_exit_thunk+0x35/0x67 [ 1461.807960] [<ffffffff81011cc2>] system_call_fastpath+0x16/0x1b [ 1461.807963] Code: 00 00 00 00 ea ff ff 48 c1 e8 0c 48 6b c0 68 48 01 d0 66 83 38 00 79 04 48 8b 40 10 66 83 38 00 79 04 48 8b 40 10 80 38 00 78 04 <0f> 0b eb fe 4c 8b 70 58 4c 89 ff 41 8b 76 4c e8 b8 49 fb ff e8 [ 1461.808022] RIP [<ffffffff810b888e>] kfree+0x63/0x156 [ 1461.808027] RSP <ffff8800b4f7fbb8> [ 1461.808031] ---[ end trace ffe26fcdc72c0ce4 ]--- The reason of this bug is that the error handling code of cifs_get_tcp_session() calls kfree() when corresponding kmalloc() failed. (The kmalloc() is called by extract_hostname().) Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> CC: Stable <stable@kernel.org> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-10-06 18:31:29 +00:00
Nikanth Karthikesan	316d315bff	block: Seperate read and write statistics of in_flight requests v2 Commit `a9327cac44` added seperate read and write statistics of in_flight requests. And exported the number of read and write requests in progress seperately through sysfs. But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange output from "iostat -kx 2". Global values for service time and utilization were garbage. For interval values, utilization was always 100%, and service time is higher than normal. So this was reverted by commit `0f78ab9899` The problem was in part_round_stats_single(), I missed the following: if (now == part->stamp) return; - if (part->in_flight) { + if (part_in_flight(part)) { __part_stat_add(cpu, part, time_in_queue, part_in_flight(part) * (now - part->stamp)); __part_stat_add(cpu, part, io_ticks, (now - part->stamp)); With this chunk included, the reported regression gets fixed. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> -- Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-06 20:16:55 +02:00
Josef Bacik	1cdda9b81a	Btrfs: fix possible softlockup in the allocator Like the cluster allocating stuff, we can lockup the box with the normal allocation path. This happens when we 1) Start to cache a block group that is severely fragmented, but has a decent amount of free space. 2) Start to commit a transaction 3) Have the commit try and empty out some of the delalloc inodes with extents that are relatively large. The inodes will not be able to make the allocations because they will ask for allocations larger than a contiguous area in the free space cache. So we will wait for more progress to be made on the block group, but since we're in a commit the caching kthread won't make any more progress and it already has enough free space that wait_block_group_cache_progress will just return. So, if we wait and fail to make the allocation the next time around, just loop and go to the next block group. This keeps us from getting stuck in a softlockup. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-06 10:04:28 -04:00
Chris Mason	61d92c328c	Btrfs: fix deadlock on async thread startup The btrfs async worker threads are used for a wide variety of things, including processing bio end_io functions. This means that when the endio threads aren't running, the rest of the FS isn't able to do the final processing required to clear PageWriteback. The endio threads also try to exit as they become idle and start more as the work piles up. The problem is that starting more threads means kthreadd may need to allocate ram, and that allocation may wait until the global number of writeback pages on the system is below a certain limit. The result of that throttling is that end IO threads wait on kthreadd, who is waiting on IO to end, which will never happen. This commit fixes the deadlock by handing off thread startup to a dedicated thread. It also fixes a bug where the on-demand thread creation was creating far too many threads because it didn't take into account threads being started by other procs. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-05 09:44:45 -04:00
Alexey Dobriyan	a99bbaf5ee	headers: remove sched.h from poll.h Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-04 15:05:10 -07:00
Linus Torvalds	58e57fbd1c	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits) Revert "Seperate read and write statistics of in_flight requests" cfq-iosched: don't delay async queue if it hasn't dispatched at all block: Topology ioctls cfq-iosched: use assigned slice sync value, not default cfq-iosched: rename 'desktop' sysfs entry to 'low_latency' cfq-iosched: implement slower async initiate and queue ramp up cfq-iosched: delay async IO dispatch, if sync IO was just done cfq-iosched: add a knob for desktop interactiveness Add a tracepoint for block request remapping block: allow large discard requests block: use normal I/O path for discard requests swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL fs/bio.c: move EXPORT* macros to line after function Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs cciss: fix build when !PROC_FS block: Do not clamp max_hw_sectors for stacking devices block: Set max_sectors correctly for stacking devices cciss: cciss_host_attr_groups should be const cciss: Dynamically allocate the drive_info_struct for each logical drive. cciss: Add usage_count attribute to each logical drive in /sys ...	2009-10-04 12:39:14 -07:00
Jens Axboe	0f78ab9899	Revert "Seperate read and write statistics of in_flight requests" This reverts commit `a9327cac44`. Corrado Zoccolo <czoccolo@gmail.com> reports: "with 2.6.32-rc1 I started getting the following strange output from "iostat -kx 2": Linux 2.6.31bisect (et2) 04/10/2009 _i686_ (2 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 10,70 0,00 3,16 15,75 0,00 70,38 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 18,22 0,00 0,67 0,01 14,77 0,02 43,94 0,01 10,53 39043915,03 2629219,87 sdb 60,89 9,68 50,79 3,04 1724,43 50,52 65,95 0,70 13,06 488437,47 2629219,87 avg-cpu: %user %nice %system %iowait %steal %idle 2,72 0,00 0,74 0,00 0,00 96,53 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 avg-cpu: %user %nice %system %iowait %steal %idle 6,68 0,00 0,99 0,00 0,00 92,33 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 avg-cpu: %user %nice %system %iowait %steal %idle 4,40 0,00 0,73 1,47 0,00 93,40 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 4,00 0,00 3,00 0,00 28,00 18,67 0,06 19,50 333,33 100,00 Global values for service time and utilization are garbage. For interval values, utilization is always 100%, and service time is higher than normal. I bisected it down to: [`a9327cac44`] Seperate read and write statistics of in_flight requests and verified that reverting just that commit indeed solves the issue on 2.6.32-rc1." So until this is debugged, revert the bad commit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-04 21:04:38 +02:00
Linus Torvalds	9117703fab	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: [PATCH] ext4: retry failed direct IO allocations ext4: Fix build warning in ext4_dirty_inode() ext4: drop ext4dev compat ext4: fix a BUG_ON crash by checking that page has buffers attached to it	2009-10-03 11:24:19 -07:00
Eric Sandeen	fbbf694566	[PATCH] ext4: retry failed direct IO allocations On a 256M filesystem, doing this in a loop: xfs_io -F -f -d -c 'pwrite 0 64m' test rm -f test eventually leads to ENOSPC. (the xfs_io command does a 64m direct IO write to the file "test") As with other block allocation callers, it looks like we need to potentially retry the allocations on the initial ENOSPC. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-10-02 21:20:55 -04:00
Curt Wohlgemuth	74072d0a63	ext4: Fix build warning in ext4_dirty_inode() This fixes the following warning: fs/ext4/inode.c: In function 'ext4_dirty_inode': fs/ext4/inode.c:5615: warning: unused variable 'current_handle' We remove the jbd_debug() statement which does use current_handle, as it's not terribly important in the grand scheme of things. Thanks to Stephen Rothwell for pointing this out. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-10-02 21:08:32 -04:00
Linus Torvalds	0efe5e32c8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix data space leak fix Btrfs: remove duplicates of filemap_ helpers Btrfs: take i_mutex before generic_write_checks Btrfs: fix arguments to btrfs_wait_on_page_writeback_range Btrfs: fix deadlock with free space handling and user transactions Btrfs: fix error cases for ioctl transactions Btrfs: Use CONFIG_BTRFS_POSIX_ACL to enable ACL code Btrfs: introduce missing kfree Btrfs: Fix setting umask when POSIX ACLs are not enabled Btrfs: proper -ENOSPC handling	2009-10-01 20:23:15 -07:00
Christoph Hellwig	80e50be422	afs: remove cache.h It's just a wrapper for <linux/fscache.h>, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-01 16:11:16 -07:00
Alexey Dobriyan	828c09509b	const: constify remaining file_operations [akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-01 16:11:11 -07:00
Chris Mason	9c2693c924	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable into for-linus	2009-10-01 17:24:44 -04:00
Josef Bacik	fbf1908744	Btrfs: fix data space leak fix There is a problem where page_mkwrite can be called on a dirtied page that already has a delalloc range associated with it. The fix is to clear any delalloc bits for the range we are dirtying so the space accounting gets handled properly. This is the same thing we do in the normal write case, so we are consistent across the board. With this patch we no longer leak reserved space. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-01 17:10:23 -04:00
H Hartley Sweeten	a112a71d45	fs/bio.c: move EXPORT* macros to line after function As mentioned in Documentation/CodingStyle, move EXPORT* macro's to the line immediately after the closing function brace line. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-01 21:15:46 +02:00
Christoph Hellwig	8aa38c31b7	Btrfs: remove duplicates of filemap_ helpers Use filemap_fdatawrite_range and filemap_fdatawait_range instead of local copies of the functions. For filemap_fdatawait_range that also means replacing the awkward old wait_on_page_writeback_range calling convention with the regular filemap byte offsets. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-01 12:58:30 -04:00
Chris Mason	25472b880c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable into for-linus	2009-10-01 12:58:13 -04:00
Chris Mason	ab93dbecfb	Btrfs: take i_mutex before generic_write_checks btrfs_file_write was incorrectly calling generic_write_checks without taking i_mutex. This lead to problems with racing around i_size when doing O_APPEND writes. The fix here is to move i_mutex higher. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-01 12:29:10 -04:00
Christoph Hellwig	35d62a942d	Btrfs: fix arguments to btrfs_wait_on_page_writeback_range wait_on_page_writeback_range/btrfs_wait_on_page_writeback_range takes a pagecache offset, not a byte offset into the file. Shift the arguments around to wait for the correct range Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-01 10:27:01 -04:00
Eric Sandeen	f0e2dfa7f3	ext4: drop ext4dev compat Kconfig & super.c promised it'd be gone by 2.6.31, so it's about time to drop it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-10-01 02:21:07 -04:00
Theodore Ts'o	1f94533d9c	ext4: fix a BUG_ON crash by checking that page has buffers attached to it In ext4_num_dirty_pages() we were calling page_buffers() before checking to see if the page actually had pages attached to it; this would cause a BUG check crash in the inline function page_buffers(). Thanks to Markus Trippelsdorf for reporting this bug. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-30 22:57:41 -04:00
David Teigland	6861f35078	dlm: fix socket fd translation The code to set up sctp sockets was not using the sockfd_lookup() and sockfd_put() routines to translate an fd to a socket. The direct fget and fput calls were resulting in error messages from alloc_fd(). Also clean up two log messages and remove a third, related to setting up sctp associations. Signed-off-by: David Teigland <teigland@redhat.com>	2009-09-30 12:19:44 -05:00
David Teigland	04bedd79a7	dlm: fix lowcomms_connect_node for sctp The recently added dlm_lowcomms_connect_node() from `391fbdc5d5` does not work when using SCTP instead of TCP. The sctp connection code has nothing to do without data to send. Check for no data in the sctp connection code and do nothing instead of triggering a BUG. Also have connect_node() do nothing when the protocol is sctp. Signed-off-by: David Teigland <teigland@redhat.com>	2009-09-30 12:19:44 -05:00
Linus Torvalds	9abf47f11b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix missing initialization of i_dir_start_lookup member nilfs2: fix missing zero-fill initialization of btree node cache	2009-09-30 09:42:24 -07:00
Linus Torvalds	9f44fdc518	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix time encoding with extra epoch bits ext4: Add a stub for mpage_da_data in the trace header jbd2: Use tracepoints for history file ext4: Use tracepoints for mb_history trace file ext4, jbd2: Drop unneeded printks at mount and unmount time ext4: Handle nested ext4_journal_start/stop calls without a journal ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode ext4: Avoid updating the inode table bh twice in no journal mode ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first ext4: async direct IO for holes and fallocate support ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O ext4: Split uninitialized extents for direct I/O ext4: release reserved quota when block reservation for delalloc retry ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks ext4: Fix hueristic which avoids group preallocation for closed files ext4: Use ext4_msg() for ext4_da_writepage() errors ext4: Update documentation about quota mount options	2009-09-30 09:32:30 -07:00
Linus Torvalds	4c8f1cb266	Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6: fat: Check s_dirt in fat_sync_fs() vfat: change the default from shortname=lower to shortname=mixed fat/nls: Fix handling of utf8 invalid char	2009-09-30 09:31:14 -07:00
Theodore Ts'o	c1fccc0696	ext4: Fix time encoding with extra epoch bits "Looking at ext4.h, I think the setting of extra time fields forgets to mask the epoch bits so the epoch part overwrites nsec part. The second change is only for coherency (2 -> EXT4_EPOCH_BITS)." Thanks to Damien Guibouret for pointing out this problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-30 01:13:55 -04:00
Theodore Ts'o	bf6993276f	jbd2: Use tracepoints for history file The /proc/fs/jbd2/<dev>/history was maintained manually; by using tracepoints, we can get all of the existing functionality of the /proc file plus extra capabilities thanks to the ftrace infrastructure. We save memory as a bonus. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-30 00:32:06 -04:00
Theodore Ts'o	296c355cd6	ext4: Use tracepoints for mb_history trace file The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a number of problems: it required a largish amount of memory to be allocated for each ext4 filesystem, and the s_mb_history_lock introduced a CPU contention problem. By ripping out the mb_history code and replacing it with ftrace tracepoints, and we get more functionality: timestamps, event filtering, the ability to correlate mballoc history with other ext4 tracepoints, etc. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-30 00:32:42 -04:00
Sage Weil	dd7e0b7b02	Btrfs: fix deadlock with free space handling and user transactions If an ioctl-initiated transaction is open, we can't force a commit during the free space checks in order to free up pinned extents or else we deadlock. Just ENOSPC instead. A more satisfying solution that reserves space for the entire user transaction up front is forthcoming... Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-29 19:50:07 -04:00
Sage Weil	1ab86aedbc	Btrfs: fix error cases for ioctl transactions Fix leak of vfsmount write reference and open_ioctl_trans reference on ENOMEM. Clean up the error paths while we're at it. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-29 18:38:44 -04:00
Theodore Ts'o	90576c0b9a	ext4, jbd2: Drop unneeded printks at mount and unmount time There are a number of kernel printk's which are printed when an ext4 filesystem is mounted and unmounted. Disable them to economize space in the system logs. In addition, disabling the mballoc stats by default saves a number of unneeded atomic operations for every block allocation or deallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 15:51:30 -04:00
Chris Ball	3baf0bed0a	Btrfs: Use CONFIG_BTRFS_POSIX_ACL to enable ACL code We've already defined CONFIG_BTRFS_POSIX_ACL in Kconfig, but we're currently not using it and are testing CONFIG_FS_POSIX_ACL instead. CONFIG_FS_POSIX_ACL states "Never use this symbol for ifdefs". Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-29 13:51:05 -04:00
Julia Lawall	fd2696f399	Btrfs: introduce missing kfree Error handling code following a kzalloc should free the allocated data. The semantic match that finds the problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; statement S; expression E; identifier f,f1,l; position p1,p2; expression ptr != NULL; @@ x@p1 = \(kmalloc\\|kzalloc\\|kcalloc\)(...); ... if (x == NULL) S <... when != x when != if (...) { <+...x...+> } ( x->f1 = E \| (x->f1 == NULL \|\| ...) \| f(...,x->f1,...) ) ...> ( return \(0\\|<+...x...+>\\|ptr\); \| return@p2 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; @@ print " file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-29 13:51:04 -04:00
Chris Ball	49cf6f4529	Btrfs: Fix setting umask when POSIX ACLs are not enabled We currently set sb->s_flags \|= MS_POSIXACL unconditionally, which is incorrect -- it tells the VFS that it shouldn't set umask because we will, yet we don't set it ourselves if we aren't using POSIX ACLs, so the umask ends up ignored. Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-29 13:51:04 -04:00
Curt Wohlgemuth	d3d1faf6a7	ext4: Handle nested ext4_journal_start/stop calls without a journal This patch fixes a problem with handling nested calls to ext4_journal_start/ext4_journal_stop, when there is no journal present. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 11:01:03 -04:00
Curt Wohlgemuth	f3dc272fd5	ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode This patch a problem that ext4_dirty_inode() was not calling ext4_mark_inode_dirty() if the current_handle is not valid, which it is the case in no journal mode. It also removes a test for non-matching transaction which can never happen. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 16:06:01 -04:00
Frank Mayhar	830156c79b	ext4: Avoid updating the inode table bh twice in no journal mode This is a cleanup of commit `91ac6f4`. Since ext4_mark_inode_dirty() has already called ext4_mark_iloc_dirty(), which in turn calls ext4_do_update_inode(), it's not necessary to have ext4_write_inode() call ext4_do_update_inode() in no journal mode. Indeed, it would be duplicated work. Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 10:07:47 -04:00
Ryusuke Konishi	3cc811bffd	nilfs2: fix missing initialization of i_dir_start_lookup member The i_dir_start_lookup field in nilfs_inode_info objects should be cleared when the objects are allocated, but the the initialization was missing in case of reading from disk. This adds the initialization. Since the variable just gives a start page on directory lookups, the bug was nonfatal until now. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-29 20:32:13 +09:00
Ryusuke Konishi	1f28fcd925	nilfs2: fix missing zero-fill initialization of btree node cache This will fix file system corruption which infrequently happens after mount. The problem was reported from users with the title "[NILFS users] Fail to mount NILFS." (Message-ID: <200908211918.34720.yuri@itinteg.net>), and so forth. I've also experienced the corruption multiple times on kernel 2.6.30 and 2.6.31. The problem turned out to be caused due to discordance between mapping->nrpages of a btree node cache and the actual number of pages hung on the cache; if the mapping->nrpages becomes zero even as it has pages, truncate_inode_pages() returns without doing anything. Usually this is harmless except it may cause page leak, but garbage collection fairly infrequently sees a stale page remained in the btree node cache of DAT (i.e. disk address translation file of nilfs), and induces the corruption. I identified a missing initialization in btree node caches was the root cause. This corrects the bug. I've tested this for kernel 2.6.30 and 2.6.31. Reported-by: Yuri Chislov <yuri@itinteg.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>	2009-09-29 20:12:56 +09:00
Josef Bacik	9ed74f2dba	Btrfs: proper -ENOSPC handling At the start of a transaction we do a btrfs_reserve_metadata_space() and specify how many items we plan on modifying. Then once we've done our modifications and such, just call btrfs_unreserve_metadata_space() for the same number of items we reserved. For keeping track of metadata needed for data I've had to add an extent_io op for when we merge extents. This lets us track space properly when we are doing sequential writes, so we don't end up reserving way more metadata space than what we need. The only place where the metadata space accounting is not done is in the relocation code. This is because Yan is going to be reworking that code in the near future, so running btrfs-vol -b could still possibly result in a ENOSPC related panic. This patch also turns off the metadata_ratio stuff in order to allow users to more efficiently use their disk space. This patch makes it so we track how much metadata we need for an inode's delayed allocation extents by tracking how many extents are currently waiting for allocation. It introduces two new callbacks for the extent_io tree's, merge_extent_hook and split_extent_hook. These help us keep track of when we merge delalloc extents together and split them up. Reservations are handled prior to any actually dirty'ing occurs, and then we unreserve after we dirty. btrfs_unreserve_metadata_for_delalloc() will make the appropriate unreservations as needed based on the number of reservations we currently have and the number of extents we currently have. Doing the reservation outside of doing any of the actual dirty'ing lets us do things like filemap_flush() the inode to try and force delalloc to happen, or as a last resort actually start allocation on all delalloc inodes in the fs. This has survived dbench, fs_mark and an fsx torture test. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-28 16:29:42 -04:00
Theodore Ts'o	f3ce8064b3	ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first Move the check to make sure the original and donor inodes are different earlier, to avoid a potential deadlock by trying to lock the same inode twice. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-28 15:58:29 -04:00
Mingming Cao	8d5d02e6b1	ext4: async direct IO for holes and fallocate support For async direct IO that covers holes or fallocate, the end_io callback function now queued the convertion work on workqueue but don't flush the work rightaway as it might take too long to afford. But when fsync is called after all the data is completed, user expects the metadata also being updated before fsync returns. Thus we need to flush the conversion work when fsync() is called. This patch keep track of a listed of completed async direct io that has a work queued on workqueue. When fsync() is called, it will go through the list and do the conversion. Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2009-09-28 15:48:29 -04:00
Mingming Cao	4c0425ff68	ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O Currently the DIO VFS code passes create = 0 when writing to the middle of file. It does this to avoid block allocation for holes, so as not to expose stale data out when there is a parallel buffered read (which does not hold the i_mutex lock). Direct I/O writes into holes falls back to buffered IO for this reason. Since preallocated extents are treated as holes when doing a get_block() look up (buffer is not mapped), direct IO over fallocate also falls back to buffered IO. Thus ext4 actually silently falls back to buffered IO in above two cases, which is undesirable. To fix this, this patch creates unitialized extents when a direct I/O write into holes in sparse files, and registering an end_io callback which converts the uninitialized extent to an initialized extent after the I/O is completed. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-28 15:48:41 -04:00
Mingming Cao	0031462b5b	ext4: Split uninitialized extents for direct I/O When writing into an unitialized extent via direct I/O, and the direct I/O doesn't exactly cover the unitialized extent, split the extent into uninitialized and initialized extents before submitting the I/O. This avoids needing to deal with an ENOSPC error in the end_io callback that gets used for direct I/O. When the IO is complete, the written extent will be marked as initialized. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-28 15:49:08 -04:00
Mingming Cao	9f0ccfd8e0	ext4: release reserved quota when block reservation for delalloc retry ext4_da_reserve_space() can reserve quota blocks multiple times if ext4_claim_free_blocks() fail and we retry the allocation. We should release the quota reservation before restarting. Bug found by Jan Kara. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-28 15:49:52 -04:00
Theodore Ts'o	55138e0bc2	ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks Work around problems in the writeback code to force out writebacks in larger chunks than just 4mb, which is just too small. This also works around limitations in the ext4 block allocator, which can't allocate more than 2048 blocks at a time. So we need to defeat the round-robin characteristics of the writeback code and try to write out as many blocks in one inode before allowing the writeback code to move on to another inode. We add a a new per-filesystem tunable, max_writeback_mb_bump, which caps this to a default of 128mb per inode. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 13:31:31 -04:00
Theodore Ts'o	7178057730	ext4: Fix hueristic which avoids group preallocation for closed files The hueristic was designed to avoid using locality group preallocation when writing the last segment of a closed file. Fix it by move setting size to the maximum of size and isize until after we check whether size == isize. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-28 00:06:20 -04:00
Alexey Dobriyan	f0f37e2f77	const: mark struct vm_struct_operations * mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-27 11:39:25 -07:00
Theodore Ts'o	1693918e0b	ext4: Use ext4_msg() for ext4_da_writepage() errors This allows the user to see what filesystem was involved with a particular ext4_da_writepage() error. Also, use KERN_CRIT which is more appropriate than KERN_EMERG. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-26 17:43:59 -04:00
Linus Torvalds	bfebb14063	Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block * 'writeback' of git://git.kernel.dk/linux-2.6-block: writeback: pass in super_block to bdi_start_writeback()	2009-09-26 10:11:13 -07:00
Linus Torvalds	07e2e6ba27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix locking and list handling code in cifs_open and its helper [CIFS] Remove build warning cifs: fix problems with last two commits [CIFS] Fix build break when keys support turned off cifs: eliminate cifs_init_private cifs: convert oplock breaks to use slow_work facility (try #4) cifs: have cifsFileInfo hold an extra inode reference cifs: take read lock on GlobalSMBSes_lock in is_valid_oplock_break cifs: remove cifsInodeInfo.oplockPending flag cifs: fix oplock request handling in posix codepath [CIFS] Re-enable Lanman security	2009-09-26 10:10:35 -07:00
Jens Axboe	a72bfd4dea	writeback: pass in super_block to bdi_start_writeback() Sometimes we only want to write pages from a specific super_block, so allow that to be passed in. This fixes a problem with commit `56a131dcf7` causing writeback on all super_blocks on a bdi, where we only really want to sync a specific sb from writeback_inodes_sb(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-26 00:10:40 +02:00
Jeff Layton	3321b791b2	cifs: fix locking and list handling code in cifs_open and its helper The patch to remove cifs_init_private introduced a locking imbalance. It didn't remove the leftover list addition code and the unlocking in that function. cifs_new_fileinfo does the list addition now, so there should be no need to do it outside of that function. pCifsInode will never be NULL, so we don't need to check for that. This patch also gets rid of the ugly locking and unlocking across function calls. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-09-25 17:59:31 +00:00
Linus Torvalds	6d7f18f6ea	Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block * 'writeback' of git://git.kernel.dk/linux-2.6-block: writeback: writeback_inodes_sb() should use bdi_start_writeback() writeback: don't delay inodes redirtied by a fast dirtier writeback: make the super_block pinning more efficient writeback: don't resort for a single super_block in move_expired_inodes() writeback: move inodes from one super_block together writeback: get rid to incorrect references to pdflush in comments writeback: improve readability of the wb_writeback() continue/break logic writeback: cleanup writeback_single_inode() writeback: kupdate writeback shall not stop when more io is possible writeback: stop background writeback when below background threshold writeback: balance_dirty_pages() shall write more than dirtied pages fs: Fix busyloop in wb_writeback()	2009-09-25 09:27:30 -07:00
Jens Axboe	56a131dcf7	writeback: writeback_inodes_sb() should use bdi_start_writeback() Pointless to iterate other devices looking for a super, when we have a bdi mapping. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:26 +02:00
Wu Fengguang	b3af9468ae	writeback: don't delay inodes redirtied by a fast dirtier Debug traces show that in per-bdi writeback, the inode under writeback almost always get redirtied by a busy dirtier. We used to call redirty_tail() in this case, which could delay inode for up to 30s. This is unacceptable because it now happens so frequently for plain cp/dd, that the accumulated delays could make writeback of big files very slow. So let's distinguish between data redirty and metadata only redirty. The first one is caused by a busy dirtier, while the latter one could happen in XFS, NFS, etc. when they are doing delalloc or updating isize. The inode being busy dirtied will now be requeued for next io, while the inode being redirtied by fs will continue to be delayed to avoid repeated IO. CC: Jan Kara <jack@suse.cz> CC: Theodore Ts'o <tytso@mit.edu> CC: Dave Chinner <david@fromorbit.com> CC: Chris Mason <chris.mason@oracle.com> CC: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:26 +02:00
Jens Axboe	9ecc2738ac	writeback: make the super_block pinning more efficient Currently we pin the inode->i_sb for every single inode. This increases cache traffic on sb->s_umount sem. Lets instead cache the inode sb pin state and keep the super_block pinned for as long as keep writing out inodes from the same super_block. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:26 +02:00
Jens Axboe	cf137307cd	writeback: don't resort for a single super_block in move_expired_inodes() If we only moved inodes from a single super_block to the temporary list, there's no point in doing a resort for multiple super_blocks. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:26 +02:00
Shaohua Li	5c03449d34	writeback: move inodes from one super_block together __mark_inode_dirty adds inode to wb dirty list in random order. If a disk has several partitions, writeback might keep spindle moving between partitions. To reduce the move, better write big chunk of one partition and then move to another. Inodes from one fs usually are in one partion, so idealy move indoes from one fs together should reduce spindle move. This patch tries to address this. Before per-bdi writeback is added, the behavior is write indoes from one fs first and then another, so the patch restores previous behavior. The loop in the patch is a bit ugly, should we add a dirty list for each superblock in bdi_writeback? Test in a two partition disk with attached fio script shows about 3% ~ 6% improvement. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:25 +02:00
Jens Axboe	5b0830cb90	writeback: get rid to incorrect references to pdflush in comments Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:25 +02:00
Jens Axboe	71fd05a887	writeback: improve readability of the wb_writeback() continue/break logic And throw some comments in there, too. Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:25 +02:00
Wu Fengguang	ae1b7f7d4b	writeback: cleanup writeback_single_inode() Make the if-else straight in writeback_single_inode(). No behavior change. Cc: Jan Kara <jack@suse.cz> Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:25 +02:00
Wu Fengguang	7fbdea3232	writeback: kupdate writeback shall not stop when more io is possible Fix the kupdate case, which disregards wbc.more_io and stop writeback prematurely even when there are more inodes to be synced. wbc.more_io should always be respected. Also remove the pages_skipped check. It will set when some page(s) of some inode(s) cannot be written for now. Such inodes will be delayed for a while. This variable has nothing to do with whether there are other writeable inodes. CC: Jan Kara <jack@suse.cz> CC: Dave Chinner <david@fromorbit.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:25 +02:00
Wu Fengguang	d3ddec7635	writeback: stop background writeback when below background threshold Treat bdi_start_writeback(0) as a special request to do background write, and stop such work when we are below the background dirty threshold. Also simplify the (nr_pages <= 0) checks. Since we already pass in nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't need to worry about it being decreased to zero. Reported-by: Richard Kennedy <richard@rsk.demon.co.uk> CC: Jan Kara <jack@suse.cz> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:24 +02:00
Jan Kara	a5989bdc98	fs: Fix busyloop in wb_writeback() If all inodes are under writeback (e.g. in case when there's only one inode with dirty pages), wb_writeback() with WB_SYNC_NONE work basically degrades to busylooping until I_SYNC flags of the inode is cleared. Fix the problem by waiting on I_SYNC flags of an inode on b_more_io list in case we failed to write anything. Tested-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-25 18:08:24 +02:00
Steve French	15dd478107	[CIFS] Remove build warning Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-09-25 02:24:45 +00:00
Jeff Layton	5d2c0e2259	cifs: fix problems with last two commits Fix problems with commits: `086f68bd97` `3bc303c254` Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-09-25 02:12:33 +00:00
Steve French	0f59e61c1f	[CIFS] Fix build break when keys support turned off Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-09-25 00:33:37 +00:00
Andrew Morton	c44972f178	procfs: disable per-task stack usage on NOMMU It needs walk_page_range(). Reported-by: Michal Simek <monstr@monstr.eu> Tested-by: Michal Simek <monstr@monstr.eu> Cc: Stefani Seibold <stefani@seibold.net> Cc: David Howells <dhowells@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-24 17:11:24 -07:00
Linus Torvalds	b9b9df62e7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: eCryptfs: Prevent lower dentry from going negative during unlink eCryptfs: Propagate vfs_read and vfs_write return codes eCryptfs: Validate global auth tok keys eCryptfs: Filename encryption only supports password auth tokens eCryptfs: Check for O_RDONLY lower inodes when opening lower files eCryptfs: Handle unrecognized tag 3 cipher codes ecryptfs: improved dependency checking and reporting eCryptfs: Fix lockdep-reported AB-BA mutex issue ecryptfs: Remove unneeded locking that triggers lockdep false positives	2009-09-24 17:10:17 -07:00
Jeff Layton	086f68bd97	cifs: eliminate cifs_init_private ...it does the same thing as cifs_fill_fileinfo, but doesn't handle the flist ordering correctly. Also rename cifs_fill_fileinfo to a more descriptive name and have it take an open flags arg instead of just a write_only flag. That makes the logic in the callers a little simpler. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-09-24 19:35:18 +00:00
Al Viro	36dd2fdb37	nfs[23] tcp breakage in mount with binary options We forget to set nfs_server.protocol in tcp case when old-style binary options are passed to mount. The thing remains zero and never validated afterwards. As the result, we hit BUG in fs/nfs/client.c:588. Breakage has been introduced in NFS: Add nfs_alloc_parsed_mount_data merged yesterday... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-09-24 14:58:42 -04:00
Jeff Layton	3bc303c254	cifs: convert oplock breaks to use slow_work facility (try #4 ) This is the fourth respin of the patch to convert oplock breaks to use the slow_work facility. A customer of ours was testing a backport of one of the earlier patchsets, and hit a "Busy inodes after umount..." problem. An oplock break job had raced with a umount, and the superblock got torn down and its memory reused. When the oplock break job tried to dereference the inode->i_sb, the kernel oopsed. This patchset has the oplock break job hold an inode and vfsmount reference until the oplock break completes. With this, there should be no need to take a tcon reference (the vfsmount implicitly holds one already). Currently, when an oplock break comes in there's a chance that the oplock break job won't occur if the allocation of the oplock_q_entry fails. There are also some rather nasty races in the allocation and handling these structs. Rather than allocating oplock queue entries when an oplock break comes in, add a few extra fields to the cifsFileInfo struct. Get rid of the dedicated cifs_oplock_thread as well and queue the oplock break job to the slow_work thread pool. This approach also has the advantage that the oplock break jobs can potentially run in parallel rather than be serialized like they are today. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-09-24 18:33:18 +00:00
Linus Torvalds	7ca263cdf8	Merge branch 'cputime' of git://git390.marist.edu/pub/scm/linux-2.6 * 'cputime' of git://git390.marist.edu/pub/scm/linux-2.6: [PATCH] Fix idle time field in /proc/uptime	2009-09-24 09:04:24 -07:00

1 2 3 4 5 ...

15679 Commits