135 lines
4.5 KiB
Plaintext
135 lines
4.5 KiB
Plaintext
|
=====================================
|
||
|
FUJITSU FR-V KERNEL ATOMIC OPERATIONS
|
||
|
=====================================
|
||
|
|
||
|
On the FR-V CPUs, there is only one atomic Read-Modify-Write operation: the SWAP/SWAPI
|
||
|
instruction. Unfortunately, this alone can't be used to implement the following operations:
|
||
|
|
||
|
(*) Atomic add to memory
|
||
|
|
||
|
(*) Atomic subtract from memory
|
||
|
|
||
|
(*) Atomic bit modification (set, clear or invert)
|
||
|
|
||
|
(*) Atomic compare and exchange
|
||
|
|
||
|
On such CPUs, the standard way of emulating such operations in uniprocessor mode is to disable
|
||
|
interrupts, but on the FR-V CPUs, modifying the PSR takes a lot of clock cycles, and it has to be
|
||
|
done twice. This means the CPU runs for a relatively long time with interrupts disabled,
|
||
|
potentially having a great effect on interrupt latency.
|
||
|
|
||
|
|
||
|
=============
|
||
|
NEW ALGORITHM
|
||
|
=============
|
||
|
|
||
|
To get around this, the following algorithm has been implemented. It operates in a way similar to
|
||
|
the LL/SC instruction pairs supported on a number of platforms.
|
||
|
|
||
|
(*) The CCCR.CC3 register is reserved within the kernel to act as an atomic modify abort flag.
|
||
|
|
||
|
(*) In the exception prologues run on kernel->kernel entry, CCCR.CC3 is set to 0 (Undefined
|
||
|
state).
|
||
|
|
||
|
(*) All atomic operations can then be broken down into the following algorithm:
|
||
|
|
||
|
(1) Set ICC3.Z to true and set CC3 to True (ORCC/CKEQ/ORCR).
|
||
|
|
||
|
(2) Load the value currently in the memory to be modified into a register.
|
||
|
|
||
|
(3) Make changes to the value.
|
||
|
|
||
|
(4) If CC3 is still True, simultaneously and atomically (by VLIW packing):
|
||
|
|
||
|
(a) Store the modified value back to memory.
|
||
|
|
||
|
(b) Set ICC3.Z to false (CORCC on GR29 is sufficient for this - GR29 holds the current
|
||
|
task pointer in the kernel, and so is guaranteed to be non-zero).
|
||
|
|
||
|
(5) If ICC3.Z is still true, go back to step (1).
|
||
|
|
||
|
This works in a non-SMP environment because any interrupt or other exception that happens between
|
||
|
steps (1) and (4) will set CC3 to the Undefined, thus aborting the store in (4a), and causing the
|
||
|
condition in ICC3 to remain with the Z flag set, thus causing step (5) to loop back to step (1).
|
||
|
|
||
|
|
||
|
This algorithm suffers from two problems:
|
||
|
|
||
|
(1) The condition CCCR.CC3 is cleared unconditionally by an exception, irrespective of whether or
|
||
|
not any changes were made to the target memory location during that exception.
|
||
|
|
||
|
(2) The branch from step (5) back to step (1) may have to happen more than once until the store
|
||
|
manages to take place. In theory, this loop could cycle forever because there are too many
|
||
|
interrupts coming in, but it's unlikely.
|
||
|
|
||
|
|
||
|
=======
|
||
|
EXAMPLE
|
||
|
=======
|
||
|
|
||
|
Taking an example from include/asm-frv/atomic.h:
|
||
|
|
||
|
static inline int atomic_add_return(int i, atomic_t *v)
|
||
|
{
|
||
|
unsigned long val;
|
||
|
|
||
|
asm("0: \n"
|
||
|
|
||
|
It starts by setting ICC3.Z to true for later use, and also transforming that into CC3 being in the
|
||
|
True state.
|
||
|
|
||
|
" orcc gr0,gr0,gr0,icc3 \n" <-- (1)
|
||
|
" ckeq icc3,cc7 \n" <-- (1)
|
||
|
|
||
|
Then it does the load. Note that the final phase of step (1) is done at the same time as the
|
||
|
load. The VLIW packing ensures they are done simultaneously. The ".p" on the load must not be
|
||
|
removed without swapping the order of these two instructions.
|
||
|
|
||
|
" ld.p %M0,%1 \n" <-- (2)
|
||
|
" orcr cc7,cc7,cc3 \n" <-- (1)
|
||
|
|
||
|
Then the proposed modification is generated. Note that the old value can be retained if required
|
||
|
(such as in test_and_set_bit()).
|
||
|
|
||
|
" add%I2 %1,%2,%1 \n" <-- (3)
|
||
|
|
||
|
Then it attempts to store the value back, contingent on no exception having cleared CC3 since it
|
||
|
was set to True.
|
||
|
|
||
|
" cst.p %1,%M0 ,cc3,#1 \n" <-- (4a)
|
||
|
|
||
|
It simultaneously records the success or failure of the store in ICC3.Z.
|
||
|
|
||
|
" corcc gr29,gr29,gr0 ,cc3,#1 \n" <-- (4b)
|
||
|
|
||
|
Such that the branch can then be taken if the operation was aborted.
|
||
|
|
||
|
" beq icc3,#0,0b \n" <-- (5)
|
||
|
: "+U"(v->counter), "=&r"(val)
|
||
|
: "NPr"(i)
|
||
|
: "memory", "cc7", "cc3", "icc3"
|
||
|
);
|
||
|
|
||
|
return val;
|
||
|
}
|
||
|
|
||
|
|
||
|
=============
|
||
|
CONFIGURATION
|
||
|
=============
|
||
|
|
||
|
The atomic ops implementation can be made inline or out-of-line by changing the
|
||
|
CONFIG_FRV_OUTOFLINE_ATOMIC_OPS configuration variable. Making it out-of-line has a number of
|
||
|
advantages:
|
||
|
|
||
|
- The resulting kernel image may be smaller
|
||
|
- Debugging is easier as atomic ops can just be stepped over and they can be breakpointed
|
||
|
|
||
|
Keeping it inline also has a number of advantages:
|
||
|
|
||
|
- The resulting kernel may be Faster
|
||
|
- no out-of-line function calls need to be made
|
||
|
- the compiler doesn't have half its registers clobbered by making a call
|
||
|
|
||
|
The out-of-line implementations live in arch/frv/lib/atomic-ops.S.
|