By Guruprasad Vadhiraj Putty
Cortex-R and Cortex-M series are targeted at different requirements and different applications. It is important to know the parameters and features that separate them, as there could be applications where both of them fit. This paper is aimed at such a scenario and helps Designers with the selection. The final objective is to help Designers and Developers understand the Architectures of ARM.
This paper is the continuation of my earlier paper, in which Cortex-M and the Classical series are compared. The link can be found in section 5. This paper compares Cortex-R4 and Cortex-M3 (M4 has additional DSP over M3). It does not compare the debug modules, and Power management is discussed only very briefly as it is application specific. In this paper Cortex-R refers to Cortex-R4 and Cortex-M refers to Cortex-M3.
3 Architecture Blocks
|Architecture blocks/components ||Cortex-R ||Cortex-M|
|Architecture ||ARMv7-R and ARMv7-debug ||ARMv7-M|
|Load Store unit ||Yes ||Yes|
|Data processing unit ||Yes ||Yes|
|Prefetch unit ||**Yes ||Yes|
|MPU ||Yes ||Yes|
|Instruction Cache ||Yes ||No|
|Data Cache ||Yes ||No|
|ATCM ||*Yes ||No|
|BTCM ||*Yes ||No|
|Co-processor ||Yes ||No|
|Bus Interface ||AXI ||AHB Lite|
|Performance Monitor Unit ||Yes ||No|
|Floating Point Unit (FPU) ||Yes (available in Cortex-R4F) ||Yes (available in Cortex-M4F)|
|Interrupt Controller ||Connected through port ||Closely coupled to the core|
|Pipeline stages ||8 ||3|
|Integration test register ||Yes ||No|
*ITCM in the Classical series is renamed ATCM, and DTCM is renamed BTCM, in Cortex-R.
**Cortex-R has an additional four-entry return stack. On a procedure call, the return address is pushed onto this hardware stack; on return, the address is popped from the stack and the prefetch unit (PFU) uses it for the return.
    BL   routine        ; return address is pushed onto the return stack by the PFU
    ...                 ; execution continues here after the call
routine:
    PUSH {lr}           ; save the return address on the software stack
    ADD  r0, r1, r2
    POP  {pc}           ; return address is popped from the return stack by the PFU
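The behaviour of the return stack described above can be sketched as a small software model. This is only an illustration in C, not the hardware implementation: the names `return_stack_t`, `rs_push` and `rs_pop` are hypothetical, and the policy of discarding the oldest entry when all four entries are in use is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define RETURN_STACK_ENTRIES 4u   /* four entries, as stated in the text */

typedef struct {
    uint32_t entry[RETURN_STACK_ENTRIES];
    unsigned count;               /* number of valid entries */
} return_stack_t;

void rs_init(return_stack_t *rs)
{
    rs->count = 0;
}

/* Procedure call (e.g. BL): remember the return address.
 * Assumption: when the stack is full, the oldest entry is dropped. */
void rs_push(return_stack_t *rs, uint32_t return_addr)
{
    if (rs->count == RETURN_STACK_ENTRIES) {
        for (unsigned i = 1; i < RETURN_STACK_ENTRIES; ++i) {
            rs->entry[i - 1] = rs->entry[i];
        }
        rs->count--;
    }
    rs->entry[rs->count++] = return_addr;
}

/* Procedure return (e.g. POP {pc}): fetch the most recent return address.
 * Returns false when the stack is empty, i.e. no address is available. */
bool rs_pop(return_stack_t *rs, uint32_t *return_addr)
{
    if (rs->count == 0) {
        return false;
    }
    *return_addr = rs->entry[--rs->count];
    return true;
}
```

With this model, nested calls up to four deep get their return addresses back in last-in, first-out order; a fifth nested call pushes out the outermost address.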
Integration test registers: These are used for testing the signals during integration of the core with other IPs.
Performance Monitor Unit: This is the module that makes Cortex-R suitable for Real Time Applications, as it helps in profiling. A few important events it can count are:
a. number of cache read/write operations
b. number of cycles the LSU is busy
c. number of cycles FIQ and IRQ are disabled.
Interrupt latency in Cortex-R is higher than in Cortex-M for the following reasons:
| ||Cortex-R ||Cortex-M|
|Tail chaining of interrupts ||No ||Yes|
|Handling of late-arriving interrupts ||No ||Yes|
|IRQ entry and exit ||Need to use SRS (Save Return State), RFE (Return From Exception) and CPS (Change Processor State) instructions ||Automatic state stored on IRQ entry and restored on exit|
|Vector read ||It follows the Classical ARM series ||Storing the current state and branching to the vector are done at the same time|
|Abandoning LDM and STM on assertion of interrupt ||**Yes ||Yes|
|Interrupt controller ||Vectored Interrupt Controller is external to the core ||Nested Vectored Interrupt Controller is close to the core|
** There is a catch here. Though the core abandons the instruction and jumps to the ISR to reduce latency, it executes the abandoned instruction again after returning from the ISR.
If you do not want LDM and STM instructions to be abandoned, the DILSM bit (bit 22) in the Auxiliary Control Register needs to be set; setting it disables low interrupt latency for load/store multiples. Low interrupt latency is enabled by default.
My observation is that critical and sensitive parameters like latency should not be left to the discretion of the programmer.
Also, an instruction that accesses Strongly Ordered or Device memory is never abandoned once it has started accessing the memory.
In Cortex-M, by contrast, the interrupted instruction resumes from where it left off (other than the divide instruction).
Cortex-R provides a VE bit in register 1 of CP15. Setting this bit enables the VIC port, and the CPU can then branch directly to the ISR address without first branching to the vector address at 0x18 (it performs a handshake with the VIC).
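As a rough illustration of how the two control bits mentioned above are handled in software, the C sketch below only computes the new register values. The helper names are hypothetical. The DILSM position (bit 22) is taken from the text; the VE position (bit 24 of the System Control Register) is an assumption based on the ARMv7-R register layout and should be checked against the Cortex-R4 Technical Reference Manual. On a real target, the values would be read and written back with MRC/MCR accesses to CP15 in a privileged mode, which is not shown here.

```c
#include <stdint.h>

#define ACTLR_DILSM_BIT 22u  /* bit position as given in the text */
#define SCTLR_VE_BIT    24u  /* assumption: VE bit position in the SCTLR */

/* Return 'value' with the given bit set to 1. */
uint32_t reg_set_bit(uint32_t value, unsigned bit)
{
    return value | (UINT32_C(1) << bit);
}

/* Return 'value' with the given bit cleared to 0. */
uint32_t reg_clear_bit(uint32_t value, unsigned bit)
{
    return value & ~(UINT32_C(1) << bit);
}

/* Return 1 if the given bit is set in 'value', otherwise 0. */
int reg_test_bit(uint32_t value, unsigned bit)
{
    return (int)((value >> bit) & 1u);
}
```

Keeping the computation separate from the CP15 access makes a read-modify-write sequence easy to review: read the register, apply `reg_set_bit` or `reg_clear_bit`, write the result back.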
5 Programmer's Model, Exception Model and Fault Handling
Cortex-R has the same programmer's model and exception model as the Classical series of ARM, with a small change in exception processing: Cortex-R makes use of SRS, CPS and RFE for exception entry and exit. Cortex-M follows a different model.
The technical paper at the link below gives the comparison between Cortex-M and the Classical ARM series.
The only thing common to Cortex-R and Cortex-M is the precise and imprecise abort exception. Cortex-R, however, provides the option of masking aborts in the Program Status Register (the mask is set automatically when control is in the abort handler and must be cleared if another abort is to be handled). This prevents another abort from being taken while you are in the abort handler. This option is not provided in Cortex-M.
The reason could be the difficulty of fitting an abort mask bit into a register, and also the number of pipeline stages.
- Precise abort: Points to the instruction where abort occurred
- Imprecise abort: Does not point to the instruction where exception occurred because of the pipeline and use of write buffer.
Cortex-M4 has the option of making all data aborts precise by setting bit 1 in the Auxiliary Control Register.
Instruction aborts are always precise.
In Cortex-R there are fault status registers (these specify whether the abort is precise or imprecise) and fault address registers (these hold the address associated with the abort) for data and for instruction. There are another two registers, the Auxiliary Data and Auxiliary Instruction Fault Status registers, that specify the interface fault.
In Cortex-M there are two fault address registers, one for bus-related errors and the other for memory-attribute errors, and a single fault status register.
Note: Though both the Classical series and Cortex-R have a coprocessor, the register parameters and bit fields are different.
6 Instruction Set Architecture
Cortex-R supports both ARM and Thumb instructions, whereas Cortex-M makes use of Thumb only.
When you are compiling for ARM mode (Cortex-R), the suffix .W has no effect.
E.g.: ADD.W R0,R1,R2
In ARM mode the suffix is ignored. If the same line is compiled for Thumb, the instruction is encoded as 32 bit.
Similarly, ADD.N R0,R1,R2 throws an error when compiled in ARM mode but compiles in Thumb mode (as it specifies a narrow, 16-bit encoding).
If neither suffix is specified, the encoding depends on the compiler.
Note: From ARMv6T2 onwards, the Thumb instruction set was extended to accommodate 32-bit instructions and is called Thumb-2.
Appendix A contains a brief comparison of instructions.
Can code written for Cortex-R be ported on Cortex-M?
Application-level instructions of Cortex-R (instructions that do not access a module or register unique to the R series) can be used on Cortex-M.
To accomplish this, the following needs to be verified (this is a generic discussion; memory maps and architecture changes such as status registers are assumed to have been taken care of):
- 32-bit instructions need to be suffixed with .W as shown above
- Instructions used should be available in the M series (see Appendix A)
- ARM instructions that do not have a Thumb equivalent, like RSC, should not be used
7 Power Management
7.1 Power modes in Cortex-R
- Standby mode
The device is powered on but most of the clocks to the blocks are off.
- Dormant mode
Caches and Tightly Coupled Memory are on. The rest of the processor is off.
7.2 Power modes in Cortex-M
- Sleep
Clock signal is gated
- Deep sleep
Clock Controller is gated
Sleep and Standby modes are identical except for a small difference: Sleep mode can be entered when a WFI (Wait For Interrupt) or WFE (Wait For Event) instruction is executed, whereas Standby mode can be entered only when a WFI instruction is executed.
| ||Cortex-R ||Cortex-M|
|Instruction Set State ||ARM, Thumb, Thumb-2 ||Thumb-2|
|Data format (bits) ||8, 16, 32, 64 ||8, 16, 32|
|*Operating modes ||Same as Classical Series ||Different model|
|MPU regions ||12 ||8|
|Selection of Instruction Endianness ||Yes (IE bit in the SCTLR register). This is to support legacy code ||No|
|Selection of Data Endianness ||Yes (need to use the SETEND instruction, which sets the Data Endianness bit in the CPSR) ||Yes (BIGEND external pin)|
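To make the effect of the data endianness selection concrete, the sketch below shows how the same four bytes in memory are interpreted as a 32-bit word under each setting. It is a plain C illustration that runs on any host; the function names `load_le32` and `load_be32` are hypothetical and are not part of any ARM library.

```c
#include <stdint.h>

/* Interpret four consecutive bytes as a little-endian 32-bit word:
 * the byte at the lowest address is the least significant byte. */
uint32_t load_le32(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Interpret the same bytes as a big-endian 32-bit word:
 * the byte at the lowest address is the most significant byte. */
uint32_t load_be32(const uint8_t bytes[4])
{
    return ((uint32_t)bytes[0] << 24)
         | ((uint32_t)bytes[1] << 16)
         | ((uint32_t)bytes[2] << 8)
         | (uint32_t)bytes[3];
}
```

For the byte sequence 0x78, 0x56, 0x34, 0x12, the little-endian view gives 0x12345678 and the big-endian view gives 0x78563412.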
The exception model and programmer's model of Cortex-R are the same as those of the Classical series of ARM. As a result, most chip manufacturers are opting for the Classical series or the Cortex-A/M series (cost is also a factor). The number of chips in the market today with Cortex-R is a pointer to this. It is not on the road map of leading manufacturers. Even where it is used, it is always in dual-core chips.
Cortex-M4 was introduced with DSP and was projected as a low-cost replacement for R4. Going forward, there is every possibility that the R series might get merged with the M series.
- ARMv7-M Architecture Reference Manual
- ARMv7-R Architecture Reference Manual
- Cortex-R4 Technical Reference Manual
- Cortex-M3 Technical Reference Manual
The author has 9 years of experience in low-level drivers for ARM, IP verification and IP validation, and has worked on analyzing the performance of ARM.
A few important instructions are mentioned here. Differences in load and store instructions and other instructions are kept outside the purview of this paper. Descriptions are given only for Saturation and Miscellaneous instructions.
Saturation Instruction and Load Exclusive Instruction
|Instruction ||Description ||Cortex-R ||Cortex-M|
|LDREXD/STREXD ||Load or store exclusive doubleword ||Yes ||No|
|SWP/SWPB ||Swap word/byte ||Yes ||No|
|SIMD ||Single Instruction Multiple Data ||Yes ||No|
|SSAT ||Signed Saturate ||Yes ||Yes|
|USAT ||Unsigned Saturate ||Yes ||Yes|
|SSAT16 ||Signed Saturate16 ||Yes ||No|
|USAT16 ||Unsigned Saturate16 ||Yes ||No|
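The arithmetic performed by the saturation instructions in the table can be sketched as a reference model in C. This is only an illustration of the results: it ignores the optional shift of the operand and the Q flag that the real instructions update, and the function names are hypothetical. The parameter `n` is the saturation width in bits.

```c
#include <stdint.h>

/* Model of SSAT: clamp x to the signed n-bit range.
 * Precondition: 1 <= n <= 32. */
int32_t ssat_model(int32_t x, unsigned n)
{
    int64_t max = (INT64_C(1) << (n - 1)) - 1;
    int64_t min = -(INT64_C(1) << (n - 1));
    if (x > max) return (int32_t)max;
    if (x < min) return (int32_t)min;
    return x;
}

/* Model of USAT: clamp signed x to the unsigned n-bit range.
 * Precondition: 0 <= n <= 31. */
uint32_t usat_model(int32_t x, unsigned n)
{
    int64_t max = (INT64_C(1) << n) - 1;
    if (x < 0) return 0;
    if (x > max) return (uint32_t)max;
    return (uint32_t)x;
}

/* Model of SSAT16: saturate both signed halfwords of x independently.
 * Precondition: 1 <= n <= 16. */
uint32_t ssat16_model(uint32_t x, unsigned n)
{
    int32_t lo = (int32_t)(x & 0xFFFFu);
    int32_t hi = (int32_t)(x >> 16);
    if (lo & 0x8000) lo -= 0x10000;   /* sign-extend the low halfword */
    if (hi & 0x8000) hi -= 0x10000;   /* sign-extend the high halfword */
    uint32_t rlo = (uint32_t)ssat_model(lo, n) & 0xFFFFu;
    uint32_t rhi = (uint32_t)ssat_model(hi, n) & 0xFFFFu;
    return (rhi << 16) | rlo;
}
```

On a core without SSAT16 or USAT16, a model like `ssat16_model` also indicates the amount of work needed to emulate the packed operation with the scalar SSAT or USAT instructions.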
Packing and Unpacking Instruction
The common Packing and Unpacking instructions are listed below.
There are many other Packing and Unpacking instructions found in Cortex-R that are not available in Cortex-M. As the list is large, only the common instructions are listed.
As most of the instructions are common, only the instructions that are found in Cortex-R alone are listed.
|SEL ||Select Bytes using GE flags (the GE flags are in the Application Program Status Register and are updated when a SIMD instruction is executed)|
|USAD8 ||Unsigned Sum of Absolute Differences|
|USADA8 ||Unsigned Sum of Absolute Differences and Accumulate|
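What these three instructions compute can be sketched as a reference model in C. This is an illustration of the results only, with hypothetical function names. In `sel_model` the four GE flags are passed in as the low four bits of `ge`, whereas on the processor they are read from the status register.

```c
#include <stdint.h>

/* Model of USAD8: sum of the absolute differences of the four
 * unsigned bytes in a and b. */
uint32_t usad8_model(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < 4; ++i) {
        uint32_t x = (a >> (8 * i)) & 0xFFu;
        uint32_t y = (b >> (8 * i)) & 0xFFu;
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}

/* Model of USADA8: USAD8 result added to an accumulator. */
uint32_t usada8_model(uint32_t a, uint32_t b, uint32_t acc)
{
    return acc + usad8_model(a, b);
}

/* Model of SEL: for each byte lane i, take the byte from a when
 * GE[i] is set, otherwise take the byte from b. */
uint32_t sel_model(uint32_t a, uint32_t b, unsigned ge)
{
    uint32_t result = 0;
    for (unsigned i = 0; i < 4; ++i) {
        uint32_t lane_mask = UINT32_C(0xFF) << (8 * i);
        result |= ((ge >> i) & 1u) ? (a & lane_mask) : (b & lane_mask);
    }
    return result;
}
```

For example, `usad8_model(0x01020304, 0x04030201)` sums the per-byte differences 3, 1, 1 and 3, giving 8.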
There is also the Reverse Subtract with Carry instruction (RSC), which is not available in the M series (because there is no Thumb equivalent).
Multiply Instructions that are common
There are many signed multiply instructions that are not available in Cortex-M (these are part of the DSP instruction set). These DSP instructions are available in Cortex-M4.