target-arm: Implement WFE as a yield operation

Implement WFE to yield our timeslice to the next CPU.
This avoids slowdowns in multicore configurations caused
by one core busy-waiting on a spinlock which can't possibly
be unlocked until the other core has an opportunity to run.
This speeds up my test case A15 dual-core boot by a factor
of three (though it is still four or five times slower than
a single-core boot).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1393339545-22111-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Tested-by: Rob Herring <rob.herring@linaro.org>
diff --git a/target-arm/helper.h b/target-arm/helper.h
index 276f3a9..8923f8a 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -50,6 +50,7 @@
                    i32, i32, i32, i32)
 DEF_HELPER_2(exception, void, env, i32)
 DEF_HELPER_1(wfi, void, env)
+DEF_HELPER_1(wfe, void, env)
 
 DEF_HELPER_3(cpsr_write, void, env, i32, i32)
 DEF_HELPER_1(cpsr_read, i32, env)