android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Tejun Heo	a70c691376	sparc64: implement page mapping percpu first chunk allocator Implement page mapping percpu first chunk allocator as a fallback to the embedding allocator. The next patch will make the embedding allocator check distances between units to determine whether it fits within the vmalloc area so that this fallback can be used on such cases. sparc64 currently has relatively small vmalloc area which makes it impossible to create any dynamic chunks on certain configurations leading to percpu allocation failures. This and the next patch should allow those configurations to keep working until proper solution is found. While at it, mark pcpu_cpu_distance() with __init. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net>	2009-09-29 09:17:57 +09:00
Tejun Heo	bcb2107fdb	sparc64: use embedding percpu first chunk allocator sparc64 currently allocates a large page for each cpu and partially remap them into vmalloc area much like what lpage first chunk allocator did. As a 4M page is used for each cpu, this results in very large unit size and also adds TLB pressure due to the double mapping of pages in the first chunk. This patch converts sparc64 to use the embedding percpu first chunk allocator which now knows how to handle NUMA configurations. This simplifies the code a lot, doesn't incur any extra TLB pressure and results in better utilization of address space. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net>	2009-08-14 15:00:53 +09:00
Tejun Heo	fb435d5233	percpu: add pcpu_unit_offsets[] Currently units are mapped sequentially into address space. This patch adds pcpu_unit_offsets[] which allows units to be mapped to arbitrary offsets from the chunk base address. This is necessary to allow sparse embedding which might would need to allocate address ranges and memory areas which aren't aligned to unit size but allocation atom size (page or large page size). This also simplifies things a bit by removing the need to calculate offset from unit number. With this change, there's no need for the arch code to know pcpu_unit_size. Update pcpu_setup_first_chunk() and first chunk allocators to return regular 0 or -errno return code instead of unit size or -errno. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David S. Miller <davem@davemloft.net>	2009-08-14 15:00:51 +09:00
Tejun Heo	fd1e8a1fe2	percpu: introduce pcpu_alloc_info and pcpu_group_info Till now, non-linear cpu->unit map was expressed using an integer array which maps each cpu to a unit and used only by lpage allocator. Although how many units have been placed in a single contiguos area (group) is known while building unit_map, the information is lost when the result is recorded into the unit_map array. For lpage allocator, as all allocations are done by lpages and whether two adjacent lpages are in the same group or not is irrelevant, this didn't cause any problem. Non-linear cpu->unit mapping will be used for sparse embedding and this grouping information is necessary for that. This patch introduces pcpu_alloc_info which contains all the information necessary for initializing percpu allocator. pcpu_alloc_info contains array of pcpu_group_info which describes how units are grouped and mapped to cpus. pcpu_group_info also has base_offset field to specify its offset from the chunk's base address. pcpu_build_alloc_info() initializes this field as if all groups are allocated back-to-back as is currently done but this will be used to sparsely place groups. pcpu_alloc_info is a rather complex data structure which contains a flexible array which in turn points to nested cpu_map arrays. * pcpu_alloc_alloc_info() and pcpu_free_alloc_info() are provided to help dealing with pcpu_alloc_info. * pcpu_lpage_build_unit_map() is updated to build pcpu_alloc_info, generalized and renamed to pcpu_build_alloc_info(). @cpu_distance_fn may be NULL indicating that all cpus are of LOCAL_DISTANCE. * pcpul_lpage_dump_cfg() is updated to process pcpu_alloc_info, generalized and renamed to pcpu_dump_alloc_info(). It now also prints which group each alloc unit belongs to. * pcpu_setup_first_chunk() now takes pcpu_alloc_info instead of the separate parameters. All first chunk allocators are updated to use pcpu_build_alloc_info() to build alloc_info and call pcpu_setup_first_chunk() with it. This has the side effect of packing units for sparse possible cpus. ie. if cpus 0, 2 and 4 are possible, they'll be assigned unit 0, 1 and 2 instead of 0, 2 and 4. * x86 setup_pcpu_lpage() is updated to deal with alloc_info. * sparc64 setup_per_cpu_areas() is updated to build alloc_info. Although the changes made by this patch are pretty pervasive, it doesn't cause any behavior difference other than packing of sparse cpus. It mostly changes how information is passed among initialization functions and makes room for more flexibility. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>	2009-08-14 15:00:51 +09:00
Tejun Heo	384be2b18a	Merge branch 'percpu-for-linus' into percpu-for-next Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 14:45:31 +09:00
Tejun Heo	74d46d6b2d	percpu, sparc64: fix sparse possible cpu map handling percpu code has been assuming num_possible_cpus() == nr_cpu_ids which is incorrect if cpu_possible_map contains holes. This causes percpu code to access beyond allocated memories and vmalloc areas. On a sparc64 machine with cpus 0 and 2 (u60), this triggers the following warning or fails boot. WARNING: at /devel/tj/os/work/mm/vmalloc.c:106 vmap_page_range_noflush+0x1f0/0x240() Modules linked in: Call Trace: [00000000004b17d0] vmap_page_range_noflush+0x1f0/0x240 [00000000004b1840] map_vm_area+0x20/0x60 [00000000004b1950] __vmalloc_area_node+0xd0/0x160 [0000000000593434] deflate_init+0x14/0xe0 [0000000000583b94] __crypto_alloc_tfm+0xd4/0x1e0 [00000000005844f0] crypto_alloc_base+0x50/0xa0 [000000000058b898] alg_test_comp+0x18/0x80 [000000000058dad4] alg_test+0x54/0x180 [000000000058af00] cryptomgr_test+0x40/0x60 [0000000000473098] kthread+0x58/0x80 [000000000042b590] kernel_thread+0x30/0x60 [0000000000472fd0] kthreadd+0xf0/0x160 ---[ end trace 429b268a213317ba ]--- This patch fixes generic percpu functions and sparc64 setup_per_cpu_areas() so that they handle sparse cpu_possible_map properly. Please note that on x86, cpu_possible_map() doesn't contain holes and thus num_possible_cpus() == nr_cpu_ids and this patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu>	2009-08-14 13:20:53 +09:00
Tejun Heo	2f39e637ea	percpu: allow non-linear / sparse cpu -> unit mapping Currently cpu and unit are always identity mapped. To allow more efficient large page support on NUMA and lazy allocation for possible but offline cpus, cpu -> unit mapping needs to be non-linear and/or sparse. This can be easily implemented by adding a cpu -> unit mapping array and using it whenever looking up the matching unit for a cpu. The only unusal conversion is in pcpu_chunk_addr_search(). The passed in address is unit0 based and unit0 might not be in use so it needs to be converted to address of an in-use unit. This is easily done by adding the unit offset for the current processor. [ Impact: allows non-linear/sparse cpu -> unit mapping, no visible change yet ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>	2009-07-04 08:11:00 +09:00
Tejun Heo	ce3141a277	percpu: drop pcpu_chunk->page[] percpu core doesn't need to tack all the allocated pages. It needs to know whether certain pages are populated and a way to reverse map address to page when freeing. This patch drops pcpu_chunk->page[] and use populated bitmap and vmalloc_to_page() lookup instead. Using vmalloc_to_page() exclusively is also possible but complicates first chunk handling, inflates cache footprint and prevents non-standard memory allocation for percpu memory. pcpu_chunk->page[] was used to track each page's allocation and allowed asymmetric population which happens during failure path; however, with single bitmap for all units, this is no longer possible. Bite the bullet and rewrite (de)populate functions so that things are done in clearly separated steps such that asymmetric population doesn't happen. This makes the (de)population process much more modular and will also ease implementing non-standard memory usage in the future (e.g. large pages). This makes @get_page_fn parameter to pcpu_setup_first_chunk() unnecessary. The parameter is dropped and all first chunk helpers are updated accordingly. Please note that despite the volume most changes to first chunk helpers are symbol renames for variables which don't need to be referenced outside of the helper anymore. This change reduces memory usage and cache footprint of pcpu_chunk. Now only #unit_pages bits are necessary per chunk. [ Impact: reduced memory usage and cache footprint for bookkeeping ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>	2009-07-04 08:11:00 +09:00
Tejun Heo	38a6be5254	percpu: simplify pcpu_setup_first_chunk() Now that all first chunk allocator helpers allocate and map the first chunk themselves, there's no need to have optional default alloc/map in pcpu_setup_first_chunk(). Drop @populate_pte_fn and only leave @dyn_size optional and make all other params mandatory. This makes it much easier to follow what pcpu_setup_first_chunk() is doing and what actual differences tweaking each parameter results in. [ Impact: drop unused code path ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-07-04 08:10:59 +09:00
Stephen Rothwell	6ac5c61082	sparc: replace uses of CPU_MASK_ALL_PTR CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so: #define CPU_MASK_ALL (cpumask_t) { { ... } } Taking the address of such a temporary is questionable at best, unfortunately `321a8e9d` (cpumask: add CPU_MASK_ALL_PTR macro) added CPU_MASK_ALL_PTR: #define CPU_MASK_ALL_PTR (&CPU_MASK_ALL) Which formalizes this practice. One day gcc could bite us over this usage (though we seem to have gotten away with it so far). [Description by Rusty Russell] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-16 04:56:55 -07:00
Hong H. Pham	280ff97494	sparc64: fix and optimize irq distribution irq_choose_cpu() should compare the affinity mask against cpu_online_map rather than CPU_MASK_ALL, since irq_select_affinity() sets the interrupt's affinity mask to cpu_online_map "and" CPU_MASK_ALL (which ends up being just cpu_online_map). The mask comparison in irq_choose_cpu() will always fail since the two masks are not the same. So the CPU chosen is the first CPU in the intersection of cpu_online_map and CPU_MASK_ALL, which is always CPU0. That means all interrupts are reassigned to CPU0... Distributing interrupts to CPUs in a linearly increasing round robin fashion is not optimal for the UltraSPARC T1/T2. Also, the irq_rover in irq_choose_cpu() causes an interrupt to be assigned to a different processor each time the interrupt is allocated and released. This may lead to an unbalanced distribution over time. A static mapping of interrupts to processors is done to optimize and balance interrupt distribution. For the T1/T2, interrupts are spread to different cores first, and then to strands within a core. The following is some benchmarks showing the effects of interrupt distribution on a T2. The test was done with iperf using a pair of T5220 boxes, each with a 10GBe NIU (XAUI) connected back to back. TCP \| Stock Linear RR IRQ Optimized IRQ Streams \| 2.6.30-rc5 Distribution Distribution \| GBits/sec GBits/sec GBits/sec --------+----------------------------------------- 1 0.839 0.862 0.868 8 1.16 4.96 5.88 16 1.15 6.40 8.04 100 1.09 7.28 8.68 Signed-off-by: Hong H. Pham <hong.pham@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-16 04:56:28 -07:00
David S. Miller	4fd78a5f1e	sparc64: Use new dynamic per-cpu allocator. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-16 04:56:27 -07:00
David S. Miller	0c243ad81f	sparc64: Only allocate per-cpu areas for possible cpus. This gets us real close to the generic implementation of setup_per_cpu_areas() except: 1) We store the per-cpu offset into the trap_block[], whereas the generic code has it's own static array. 2) We have to initialize the %g5 register to hold the boot cpu's per-cpu area offset. 3) The OBP/MDESC cpu info scan is performed at the end. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-16 04:56:25 -07:00
David S. Miller	73fffc037e	sparc64: Get rid of real_setup_per_cpu_areas(). Now that we defer the cpu_data() initializations to the end of per-cpu setup, we can get rid of this local hack we had to setup the per-cpu areas eary. This is a necessary step in order to support HAVE_DYNAMIC_PER_CPU_AREA since the per-cpu setup must run when page structs are available. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-16 04:56:23 -07:00
David S. Miller	b696fdc259	sparc64: Defer cpu_data() setup until end of per-cpu data initialization. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-16 04:56:22 -07:00
David S. Miller	5a5488d3bb	sparc64: Store per-cpu offset in trap_block[] Surprisingly this actually makes LOAD_PER_CPU_BASE() a little more efficient. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-16 04:56:11 -07:00
David S. Miller	557fe0e884	sparc64: Reclaim trap_block[]->hdesc This really isn't necessary at all, a local variable suits the job just fine. This frees up 8 bytes in the trap_block[] that we can use later to store the per-cpu base addresses. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-16 04:56:08 -07:00
David S. Miller	8e255baa44	sparc64: Fix smp_callin() locking. Interrupts must be disabled when taking the IPI lock. Caught by lockdep. Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-14 17:08:56 -07:00
David S. Miller	ed223129a3	Merge branch 'master' of ssh://master.kernel.org/home/ftp/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask-for-sparc Conflicts: arch/sparc/kernel/smp_64.c	2009-03-29 15:44:22 -07:00
David S. Miller	f9384d41c0	sparc64: Fix MM refcount check in smp_flush_tlb_pending(). As explained by Benjamin Herrenschmidt: > CPU 0 is running the context, task->mm == task->active_mm == your > context. The CPU is in userspace happily churning things. > > CPU 1 used to run it, not anymore, it's now running fancyfsd which > is a kernel thread, but current->active_mm still points to that > same context. > > Because there's only one "real" user, mm_users is 1 (but mm_count is > elevated, it's just that the presence on CPU 1 as active_mm has no > effect on mm_count(). > > At this point, fancyfsd decides to invalidate a mapping currently mapped > by that context, for example because a networked file has changed > remotely or something like that, using unmap_mapping_ranges(). > > So CPU 1 goes into the zapping code, which eventually ends up calling > flush_tlb_pending(). Your test will succeed, as current->active_mm is > indeed the target mm for the flush, and mm_users is indeed 1. So you > will -not- send an IPI to the other CPU, and CPU 0 will continue happily > accessing the pages that should have been unmapped. To fix this problem, check ->mm instead of ->active_mm, and this means: > So if you test current->mm, you effectively account for mm_users == 1, > so the only way the mm can be active on another processor is as a lazy > mm for a kernel thread. So your test should work properly as long > as you don't have a HW that will do speculative TLB reloads into the > TLB on that other CPU (and even if you do, you flush-on-switch-in should > get rid of any crap here). And therefore we should be OK. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-27 01:09:17 -07:00
Rusty Russell	81f1adf012	cpumask: use mm_cpumask() wrapper: sparc Makes code futureproof against the impending change to mm->cpu_vm_mask. It's also a chance to use the new cpumask_ ops which take a pointer (the older ones are deprecated, but there's no hurry for arch code). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-16 14:40:39 +10:30
Rusty Russell	f46df02a57	cpumask: arch_send_call_function_ipi_mask: sparc We're weaning the core code off handing cpumask's around on-stack. This introduces arch_send_call_function_ipi_mask(), and by defining it, the old arch_send_call_function_ipi is defined by the core code. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-16 14:40:22 +10:30
Rusty Russell	fd8e18e9f4	cpumask: Use smp_call_function_many(): sparc64 Impact: Use new API Change smp_call_function_mask() callers to smp_call_function_many(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>	2009-03-16 14:40:22 +10:30
Sam Ravnborg	9018113649	sparc64: Use unsigned long long for u64. Andrew Morton wrote: People keep on doing printk("%llu", some_u64); testing it only on x86_64 and this generates a warning storm on powerpc, sparc64, etc. Because they use `long', not `long long'. Quite a few 64-bit architectures are using `long' for their s64/u64 types. We should convert them all to `long long'. Update types.h so we use unsigned long long for u64 and fix all warnings in sparc64 code. Tested with an allnoconfig, defconfig and allmodconfig builds. This patch introduces additional warnings in several drivers. These will be dealt with in separate patches. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-06 13:19:28 -08:00
Linus Torvalds	b840d79631	Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually	2009-01-02 11:44:09 -08:00
Rusty Russell	8e757281de	sparc: replace for_each_cpu_mask_nr with for_each_cpu Simple replacement, now the _nr is redundant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-08 01:10:08 -08:00
Sam Ravnborg	a88b5ba8bd	sparc,sparc64: unify kernel/ o Move all files from sparc64/kernel/ to sparc/kernel - rename as appropriate o Update sparc/Makefile to the changes o Update sparc/kernel/Makefile to include the sparc64 files NOTE: This commit changes link order on sparc64! Link order had to change for either of sparc32 and sparc64. And assuming sparc64 see more testing than sparc32 change link order on sparc64 where issues will be caught faster. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-04 09:17:21 -08:00

27 Commits