We only need to write an invalid tag every 16 bytes,
so taking advantage of this can save many instructions
compared to the simple memset() call we make now.
A prefetching implementation is implemented for sun4u
and a block-init store version if implemented for Niagara.
The next trick is to be able to perform an init and
a copy_tsb() in parallel when growing a TSB table.
Signed-off-by: David S. Miller <davem@davemloft.net>
The context allocation scheme we use depends upon there being a 1<-->1
mapping from cpu to physical TLB for correctness. Chips like Niagara
break this assumption.
So what we do is notify all cpus with a cross call when the context
version number changes, and if necessary this makes them allocate
a valid context for the address space they are running at the time.
Stress tested with make -j1024, make -j2048, and make -j4096 kernel
builds on a 32-strand, 8 core, T2000 with 16GB of ram.
Signed-off-by: David S. Miller <davem@davemloft.net>
This way we don't need to lock the TSB into the TLB.
The trick is that every TSB load/store is registered into
a special instruction patch section. The default uses
virtual addresses, and the patch instructions use physical
address load/stores.
We can't do this on all chips because only cheetah+ and later
have the physical variant of the atomic quad load.
Signed-off-by: David S. Miller <davem@davemloft.net>
As the RSS grows, grow the TSB in order to reduce the likelyhood
of hash collisions and thus poor hit rates in the TSB.
This definitely needs some serious tuning.
Signed-off-by: David S. Miller <davem@davemloft.net>
This also cleans up tsb_context_switch(). The assembler
routine is now __tsb_context_switch() and the former is
an inline function that picks out the bits from the mm_struct
and passes it into the assembler code as arguments.
setup_tsb_parms() computes the locked TLB entry to map the
TSB. Later when we support using the physical address quad
load instructions of Cheetah+ and later, we'll simply use
the physical address for the TSB register value and set
the map virtual and PTE both to zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
We now use the TSB hardware assist features of the UltraSPARC
MMUs.
SMP is currently knowingly broken, we need to find another place
to store the per-cpu base pointers. We hid them away in the TSB
base register, and that obviously will not work any more :-)
Another known broken case is non-8KB base page size.
Also noticed that flush_tlb_all() is not referenced anywhere, only
the internal __flush_tlb_all() (local cpu only) is used by the
sparc64 port, so we can get rid of flush_tlb_all().
The kernel gets it's own 8KB TSB (swapper_tsb) and each address space
gets it's own private 8K TSB. Later we can add code to dynamically
increase the size of per-process TSB as the RSS grows. An 8KB TSB is
good enough for up to about a 4MB RSS, after which the TSB starts to
incur many capacity and conflict misses.
We even accumulate OBP translations into the kernel TSB.
Another area for refinement is large page size support. We could use
a secondary address space TSB to handle those.
Signed-off-by: David S. Miller <davem@davemloft.net>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!