By Sangho Seo, Gyongsu Lee (Newracom Inc.)
To meet the increasing demand for low-power, flexible Internet of Things (IoT) connectivity, this paper presents a highly flexible, software (SW) instruction-based architecture for Medium Access Controller (MAC) design. Focusing on SW implementation, the MAC architecture is generalized to support any physical-layer (PHY) protocol, such as Wi-Fi, Zigbee, Bluetooth (BT), or other standards. To examine its feasibility, we applied the design to the IEEE 802.11 MAC, which imposes nearly the strictest MAC constraints among the IoT candidates. Our evaluation shows that a 160MHz operating frequency with an 8-bit MPI (MAC and PHY Interface) width is sufficient for our instruction-based Lower MAC (LMAC) design to meet both protocol latency and hardware (HW) latency constraints, such as the under-run/over-run constraints of a 433Mbps Very High Throughput (VHT) PHY. When more than 42 MPDUs (MAC Protocol Data Units) of 1500 bytes are aggregated, a throughput of around 280Mbps is achievable at a 433Mbps PHY rate. Although this 64% MAC/PHY efficiency is not the highest compared to other Wi-Fi-specific high-end solutions, the proposed MAC operates with configurable 8-bit instructions generated by the main processor (CPU) and thus enables highly flexible MAC protocol implementation. Sharing system memory with the main CPU eliminates local buffers, resulting in around 20K equivalent NAND gates for the whole LMAC HW logic.
Driven by recent trends in IoT and Internet of Everything (IoE) applications, wireless connectivity technology is evolving partly from a focus on throughput toward quality and low-power features, and diversified technologies such as Bluetooth Low Energy (BLE) and IEEE 802.11ax reflect these trends. Although Zigbee has traditionally dominated low-power sensor networks, Bluetooth and Wi-Fi, including IEEE 802.11ah/af, are also beginning to compete for these blue-ocean IoT markets. In line with these trends, chip vendors are trying to integrate as many connectivity protocols as possible, even cellular, into a combo-featured connectivity System on a Chip (SoC) that is small and low power yet high performance. Because these connectivity technologies have similar sub-modules but rather different system-level schemes in the PHY and radio frequency (RF) layers, sharing hardware (HW) on a common platform is a big challenge. Likewise, the MAC protocols of each connectivity technology appear to have completely different features and timing requirements. However, controlling the lower-level PHY and RF and communicating with the upper layer are common roles for every protocol, and if these operations could be abstracted and implemented in SW, all the connectivity systems could share the same hardware with different SW code. This approach has many benefits for chip designers, because the SW-based MAC can be developed in parallel with the hardware SoC implementation, enabling faster implementation. In addition, a multi-standard MAC scheme requires a high level of flexibility and reconfigurability to meet emerging challenges such as varying application demands, user patterns, co-operation, and even protocol changes.
This paper presents how a customized 8-bit micro-processor can implement MAC operation that co-processes sufficiently with the main CPU while meeting the timing requirements for controlling various PHYs and RF front ends. Because the IEEE 802.11ac VHT MAC is the most challenging of the introduced connectivity protocols to implement in SW, we evaluated our MAC design with the VHT MAC protocol and derived several constraints, none of which are critical to our programmable, common-platform concept. By instantiating multiple hardware IPs of the proposed small LMAC, a small, low-power, combo-featured connectivity SoC can be built. Section 2 introduces related works as references for the fully hardwired and fully SW design approaches. The detailed implementation architecture and simulation data are described in Section 3, and the implementation results are described in Section 4. Section 5 concludes this paper.
2. Related works
The authors in  described an IEEE 802.11e MAC hardware architecture comprising a CPU and other peripherals. Their design runs the CPU for the Upper MAC (UMAC) at 200MHz and the LMAC hardware at 50MHz. The buffer queues of the transmission and reception streams are updated by DMA over the shared bus, and all frame manipulation is processed on the hardware side. The Protocol Manager (PM) block in their design is the key logic managing the LMAC operations, and this implementation is optimal for 802.11e. However, the local buffer memory makes the hardware unnecessarily large; moreover, this hardwired LMAC may not be applicable to other protocols such as BT and Zigbee, and it is hard to upgrade to later Wi-Fi protocols such as the 802.11n/ac LMAC without changing the hardware architecture. Because of the hardwired LMAC structure, two important features, flexibility and scalability, are insufficiently supported, so such enhancements may not be easy for the architecture proposed in .
On the other hand, the authors in  introduced a fully reconfigurable MAC design. Figure 1 below shows their common platform, which can be used to design various MAC protocols.
Figure 1: MAC common platform.
They designed a decomposable MAC framework that enables high-level code reuse, and identified a list of common MAC components that serve as basic building blocks for a wide range of MAC protocols. This reusable common-platform design shares some similarities with this paper's initial motivation. However, as the authors mention, the results in Fig. 1 are not product-level developments but prototyping common platforms. When targeting an IoT chipset, the MAC needs a simple architecture that enables lower power and smaller area. The authors in  introduce an x86 Linux-based multicore prototyping platform that is not suitable for a commercial connectivity chipset. Also, the full functionalities of both UMAC and LMAC are designed together without regard to the target applications. As described in Section 1, our design goal for a flexible MAC architecture is to implement multiple MAC protocols together in the same combo-chip, which is why we advocate a small LMAC that can be instantiated for each PHY protocol. The UMAC, in turn, needs high performance to control all those connectivity features from one CPU. This separation allows the combo-chip to be implemented in a smaller size at lower power. Section 3 describes the detailed concepts of this common platform architecture.
3. Low power but highly flexible common MAC architecture
The basic calculation or information unit of various MAC protocols is 8 bits wide. Furthermore, because MAC headers and MPDUs are 1-byte or 4-byte aligned, our basic LMAC design starts from 1-byte-based operation for a simple, common platform. Another constraint on the LMAC architecture is instruction-driven, reconfigurable operation, with the instructions stored in the main system memory. If the LMAC operates by reading instructions and data from system memory, the main CPU can control LMAC operation without the complexity of multi-processor coordination. Moreover, the LMAC can directly access data structures generated by the main CPU, such as linked lists, without additional procedure calls. Therefore, the LMAC hardware can be regarded as a fully passive co-processor unit that borrows the main CPU's system memory. It can also be regarded as a programmable hardware state machine whose states and corresponding operations are controlled by CPU-configured instructions. Figure 2 illustrates the basic concept of the proposed common MAC platform architecture. Every LMAC hardware instance is identical, but each reads a different set of instructions from main memory and then controls its baseband (BB) to transmit data, or informs the UMAC CPU so that it can receive data from the BB. Thus, small LMAC hardware can handle tasks such as DMA operation, exception handling, response frame generation, and real-time PHY control. The main UMAC CPU communicates with the host interface and schedules the transmission and reception jobs of each connectivity protocol.
Figure 2: Common platform Design concept with Flexible LMAC and UMAC architecture.
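The shared-memory, instruction-driven operation described above can be illustrated with a toy interpreter. This is a minimal sketch, not the authors' LMAC: the opcode values, the two-byte little-endian address operands, the single accumulator, and the class name `ToyLMAC` are all hypothetical. It only shows how identical LMAC instances can behave differently because the CPU writes different instruction streams into the memory they share.

```python
# Hypothetical 8-bit opcodes; operands follow the opcode as extra bytes.
LOAD_BYTE = 0x01   # LOAD_BYTE lo hi : acc <- mem[addr]
ADD = 0x02         # ADD imm         : acc <- (acc + imm) mod 256
STORE_BYTE = 0x03  # STORE_BYTE lo hi: mem[addr] <- acc
JUMP = 0x04        # JUMP lo hi      : pc <- addr
HALT = 0xFF


class ToyLMAC:
    """Passive co-processor model: fetches and executes a CPU-written
    program directly from memory shared with the CPU (no local buffer)."""

    def __init__(self, memory: bytearray, entry: int) -> None:
        self.mem = memory  # shared system memory
        self.pc = entry
        self.acc = 0

    def _fetch(self) -> int:
        value = self.mem[self.pc]
        self.pc += 1
        return value

    def _addr(self) -> int:
        lo = self._fetch()
        return lo | (self._fetch() << 8)

    def run(self, max_steps: int = 1000) -> None:
        for _ in range(max_steps):
            op = self._fetch()
            if op == LOAD_BYTE:
                self.acc = self.mem[self._addr()]
            elif op == ADD:
                self.acc = (self.acc + self._fetch()) & 0xFF
            elif op == STORE_BYTE:
                self.mem[self._addr()] = self.acc
            elif op == JUMP:
                self.pc = self._addr()
            elif op == HALT:
                return
            else:
                raise ValueError(f"unknown opcode 0x{op:02x}")
        raise RuntimeError("step limit reached")


# The CPU places two different programs and one data byte in shared memory.
memory = bytearray(256)
memory[0x80] = 41
prog_a = [LOAD_BYTE, 0x80, 0x00, ADD, 1, STORE_BYTE, 0x81, 0x00, HALT]
prog_b = [LOAD_BYTE, 0x80, 0x00, ADD, 200, STORE_BYTE, 0x82, 0x00, HALT]
memory[0x00:len(prog_a)] = bytes(prog_a)
memory[0x20:0x20 + len(prog_b)] = bytes(prog_b)

# Identical LMAC instances, different behavior, driven only by their programs.
ToyLMAC(memory, entry=0x00).run()
ToyLMAC(memory, entry=0x20).run()
print(memory[0x81], memory[0x82])  # 42 241
```

Because the programs live in ordinary memory, changing LMAC behavior in this model means rewriting bytes, not changing the `ToyLMAC` class, which mirrors the "same hardware, different SW" idea.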
To specify the LMAC design, we defined applicable instructions by first breaking down the LMAC operations into common, data-pumping, and special instructions. For the common instructions, basic arithmetic logic unit (ALU) and branch instructions such as add/sub, and/or, load/store, and jump/call are designed. Pumping instructions handle 8-bit data transactions between the MAC and the PHY interface, or 32-bit data transactions between the MAC and the system bus interface. Finally, special instructions are defined for latency-critical 802.11 MAC operations, such as CRC and Aggregated MPDU (AMPDU) delimiter manipulation. The CRC is generalized to work with any CRC polynomial, various events from the PHY are checked by corresponding special instructions, and instructions that generate CPU interrupts are also defined. Table 1 below shows the representative instructions defined and implemented in the proposed LMAC design.
Table 1: Instructions designed in the proposed architecture
| Type | Instructions |
|---|---|
| Common | load_bit, load_byte, load_word, jump/call, add/sub, and/or/xor |
| Pumping | trnx_tx, trnx_rx, trnx_byte_tx, trnx_byte_rx, delimiter generation, delimiter check |
| Special | crc, event_wait, event_gen, event_chk, IRQ generation, wait tsf / load tsf |
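The generalized CRC and the delimiter generation/check instructions can be modeled together in software. This is a sketch, not the LMAC's hardware logic: `crc` is a bitwise, polynomial-agnostic CRC in the usual (width, poly, init, refin, refout, xorout) parameter model, and the CRC-32 preset is the standard set also used for the IEEE 802.11 FCS. The delimiter follows the general shape of an 802.11 A-MPDU delimiter (EOF flag, MPDU length, CRC-8, signature byte 0x4E, subframes padded to 4-byte boundaries), but the bit positions in the first two bytes and the CRC-8 parameters are simplifying assumptions, so it is not bit-exact to the standard. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def reflect(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc(data: bytes, width: int, poly: int, init: int,
        refin: bool, refout: bool, xorout: int) -> int:
    """Bitwise CRC for any polynomial (width, poly, init, refin, refout, xorout)."""
    top, mask = 1 << (width - 1), (1 << width) - 1
    reg = init & mask
    for byte in data:
        if refin:
            byte = reflect(byte, 8)
        for i in range(7, -1, -1):  # feed bits MSB-first
            feedback = bool(reg & top) ^ bool((byte >> i) & 1)
            reg = (reg << 1) & mask
            if feedback:
                reg ^= poly
    if refout:
        reg = reflect(reg, width)
    return (reg ^ xorout) & mask


# Standard catalog parameter sets; the 802.11 FCS uses CRC-32.
CRC32_IEEE = dict(width=32, poly=0x04C11DB7, init=0xFFFFFFFF,
                  refin=True, refout=True, xorout=0xFFFFFFFF)
CRC16_CCITT_FALSE = dict(width=16, poly=0x1021, init=0xFFFF,
                         refin=False, refout=False, xorout=0x0000)
# Assumed delimiter CRC-8: x^8 + x^2 + x + 1, all-ones init, complemented output.
CRC8_DELIM = dict(width=8, poly=0x07, init=0xFF,
                  refin=False, refout=False, xorout=0xFF)

SIGNATURE = 0x4E  # delimiter signature byte (ASCII 'N')


def make_delimiter(mpdu_len: int, eof: bool = False) -> bytes:
    """Simplified layout: bit 0 EOF, bit 1 reserved, bits 2-15 length (little-endian)."""
    if not 0 <= mpdu_len < (1 << 14):
        raise ValueError("MPDU length must fit in 14 bits")
    head = (int(eof) | (mpdu_len << 2)).to_bytes(2, "little")
    return head + bytes([crc(head, **CRC8_DELIM), SIGNATURE])


def check_delimiter(delim: bytes) -> Optional[Tuple[bool, int]]:
    """Return (eof, mpdu_len) for a valid delimiter, otherwise None."""
    if len(delim) != 4 or delim[3] != SIGNATURE:
        return None
    if crc(delim[:2], **CRC8_DELIM) != delim[2]:
        return None
    field = int.from_bytes(delim[:2], "little")
    return bool(field & 1), field >> 2


def build_ampdu(mpdus: List[bytes]) -> bytes:
    out = bytearray()
    for i, mpdu in enumerate(mpdus):
        out += make_delimiter(len(mpdu))
        out += mpdu
        if i != len(mpdus) - 1:
            out += bytes(-len(mpdu) % 4)  # pad subframe to a 4-byte boundary
    return bytes(out)


def parse_ampdu(ampdu: bytes) -> List[bytes]:
    mpdus, pos = [], 0
    while pos + 4 <= len(ampdu):
        info = check_delimiter(ampdu[pos:pos + 4])
        pos += 4
        if info is None or info[1] == 0:
            continue  # bad or zero-length delimiter: try the next 4-byte slot
        length = info[1]
        mpdus.append(ampdu[pos:pos + length])
        pos += length + (-length % 4)
    return mpdus


print(hex(crc(b"123456789", **CRC32_IEEE)))  # 0xcbf43926
print(len(build_ampdu([bytes(1500)] * 42)))  # 63168
```

With 42 aggregated 1500-byte MPDUs, as in the throughput scenario of this paper, each subframe is a 4-byte delimiter plus 1500 bytes with no padding, giving 42 x 1504 = 63168 bytes. Skipping forward in 4-byte steps after a failed delimiter check is one simple way to resynchronize, shown here only as an illustration.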
Because LMAC operation is SW-based, the hardware accelerators are kept to a minimum and specified to be commonly reusable. Figure 3 below shows the detailed architecture of the proposed MAC design with the CPU's UMAC, the shared bus, the interrupt controller (INTC), and the LMAC hardware. The LMAC includes timing-critical accelerators for counters and CRC. Approximately 20K equivalent NAND gates are used to implement the LMAC hardware without any performance or latency timing violations.
Figure 3: Proposed Flexible MAC architecture
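The counter accelerator can be pictured as a countdown register, loaded in LMAC clock cycles, that an event-wait instruction polls alongside PHY events. The sketch below is a behavioral model under that assumption, not RTL; the names `CountdownCounter`, `event_wait`, and `ns_to_cycles` are hypothetical. Converting time to cycles with integer arithmetic avoids floating-point rounding: at 160MHz, a 16us interval (the 802.11 SIFS in the 5GHz band) is 2560 cycles, and the 13.7us FW overhead reported in this section corresponds to 2192 cycles.

```python
from typing import Optional, Tuple


def ns_to_cycles(ns: int, clock_hz: int) -> int:
    """Convert a duration to LMAC clock cycles, rounding up (integer math only)."""
    return -(-ns * clock_hz // 1_000_000_000)


class CountdownCounter:
    """Behavioral model of a hardware counter decremented once per LMAC clock."""

    def __init__(self) -> None:
        self.value = 0

    def load(self, cycles: int) -> None:
        self.value = cycles

    def tick(self) -> None:
        if self.value > 0:
            self.value -= 1


def event_wait(event_cycle: Optional[int], timeout_cycles: int) -> Tuple[str, int]:
    """Poll for a PHY event until the counter expires.

    `event_cycle` is the cycle at which the simulated event arrives, or None.
    Returns ("event", t) or ("timeout", t), where t is the cycle of return.
    """
    counter = CountdownCounter()
    counter.load(timeout_cycles)
    t = 0
    while True:
        if event_cycle is not None and t >= event_cycle:
            return "event", t
        if counter.value == 0:
            return "timeout", t
        counter.tick()
        t += 1


timeout = ns_to_cycles(16_000, 160_000_000)  # 16 us at 160 MHz
print(timeout)                    # 2560
print(event_wait(1000, timeout))  # ('event', 1000)
print(event_wait(None, timeout))  # ('timeout', 2560)
```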
Basic timing was measured to evaluate the throughput constraint. The evaluation assumes 1500-byte video traffic, a 3.008ms transmission opportunity (TXOP), and an ideal host interface. Because the proposed MAC design should also remain stable at higher throughput, we evaluated it up to the 802.11ac VHT PHY rate of 866Mbps, the highest rate among the targeted IoT candidates. Based on this assumption, we extracted the design constraints our MAC must meet to support the VHT MAC. An LMAC implemented under these constraints yields a MAC IP that supports the VHT PHY at the cost of slightly more power than lower-rate MAC IPs. We then evaluated the firmware (FW) and LMAC latency for our transmission scenario. The FW overhead, which includes the UMAC's queue manipulation and TX vector setting, is 13.7us. Because FW overhead differs for every connectivity standard, we left sufficient margin for nearly the highest VHT PHY rate. As a result, the analysis shows that around 64% MAC/PHY efficiency can be achieved with our LMAC design. Although this is not high compared to other high-end connectivity chipsets, this level of efficiency is acceptable for our main goal of low-power, small-size IoT applications.
From a latency perspective, the LMAC includes a bus master block for the AHB shared bus to communicate with the main system memory. This master performs burst transfers over a 32-bit interface. For the MAC-PHY Interface (MPI), we selected an 8-bit width for single-word access. This configuration gives the 8-bit MPI flexible data manipulation and the 32-bit bus interface higher communication bandwidth. Figure 4 shows the profiled UMAC FW overhead and LMAC data-manipulation latency for the transmission scenario described above.
Figure 4: Timing latency margin
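Two pieces of arithmetic from the evaluation above can be sketched: the byte pumping between 32-bit bus words and the 8-bit MPI, and the MAC/PHY efficiency figure. The little-endian byte order and the function names are assumptions for illustration. The efficiency uses the paper's 280Mbps throughput and 433Mbps PHY rate, which give about 64.7%, consistent with the "around 64%" quoted.

```python
from typing import List


def mpi_bytes_to_words(data: bytes) -> List[int]:
    """Pack a byte stream into 32-bit bus words (little-endian, zero-padded)."""
    padded = data + bytes(-len(data) % 4)
    return [int.from_bytes(padded[i:i + 4], "little")
            for i in range(0, len(padded), 4)]


def words_to_mpi_bytes(words: List[int], nbytes: int) -> bytes:
    """Unpack 32-bit bus words into the first `nbytes` bytes sent over an 8-bit MPI."""
    out = bytearray()
    for word in words:
        out += (word & 0xFFFFFFFF).to_bytes(4, "little")
    return bytes(out[:nbytes])


def mac_phy_efficiency(mac_mbps: float, phy_mbps: float) -> float:
    """Fraction of the PHY rate delivered as MAC throughput."""
    return mac_mbps / phy_mbps


mpdu = bytes(1500)
# 375 bus words vs 1500 MPI transfers for one 1500-byte MPDU
print(len(mpi_bytes_to_words(mpdu)), "bus words vs", len(mpdu), "MPI transfers")
print(f"{mac_phy_efficiency(280, 433):.1%}")  # 64.7%
```

The 4:1 ratio between MPI transfers and bus words is why, in this model, the 32-bit burst side has bandwidth to spare while the 8-bit MPI side sets the per-byte timing budget.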
We applied these profiling results to further estimate the constraints for PHY throughput up to 866Mbps, which corresponds to a 2x2 802.11ac multiple-input multiple-output (MIMO) configuration. Detailed evaluation results are described in Section 4.
4. Evaluation results
Figure 5: Evaluation result for LMAC's Communication margin
Figure 5 presents the LMAC's communication margin at 80/160/320MHz operating frequencies in an environment of up to a 2x2 VHT PHY with the LMAC's 8-bit MAC-PHY interface. An under-run indicates a communication failure, that is, no margin remains for bus contention, which is an unacceptable situation. The graph shows a contention margin of almost 30% when the LMAC operates with a 160MHz clock.
At 80MHz, no contention margin remains from the 433Mbps PHY rate onward, which means an under-run will occur at PHY rates above 433Mbps with an 80MHz operating frequency. If the PHY throughput reaches 866Mbps, under-runs occur at both 80MHz and 160MHz. Although SW-based MAC design for a 2x2 866Mbps VHT PHY is rather challenging, Figure 5 suggests it becomes feasible when the LMAC's operating frequency is increased to 320MHz.
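A rough way to read Figure 5 is to count how many LMAC clock cycles are available per 8-bit MPI transfer before the PHY under-runs. The sketch below computes only that budget; it does not model instruction costs or bus contention, so it does not reproduce the margins in the figure. The rates 433.3Mbps and 866.7Mbps are assumed here as the paper's 433/866Mbps; they are the nominal 1x1 and 2x2 80MHz VHT MCS 9 rates with short guard interval.

```python
def cycles_per_mpi_transfer(clock_hz: float, phy_rate_bps: float,
                            mpi_width_bits: int = 8) -> float:
    """LMAC clock cycles available per MPI transfer if the PHY is never to under-run."""
    transfers_per_second = phy_rate_bps / mpi_width_bits
    return clock_hz / transfers_per_second


PHY_RATES = {"1x1 VHT": 433.3e6, "2x2 VHT": 866.7e6}  # assumed nominal rates
CLOCKS = (80e6, 160e6, 320e6)

for name, rate in PHY_RATES.items():
    for clock in CLOCKS:
        budget = cycles_per_mpi_transfer(clock, rate)
        print(f"{name} @ {clock / 1e6:.0f} MHz: {budget:.2f} cycles per byte")
```

Doubling both the PHY rate and the clock leaves the budget essentially unchanged (about 2.95 cycles per byte for 433.3Mbps at 160MHz and for 866.7Mbps at 320MHz), which is consistent with the trend that 2x2 operation needs roughly twice the LMAC clock.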
This paper presents a SW-instruction-based, highly flexible MAC design that is reusable across many MAC protocols, including Wi-Fi, Zigbee, and BT. Our evaluation shows that a 160MHz operating frequency with an 8-bit MPI width is sufficient for our instruction-based LMAC (Lower MAC) design to meet the latency and under-run/over-run constraints of a 1x1 VHT PHY. As a common MAC platform, the designed LMAC has a very small hardware size and is controlled by the main CPU while acting as a co-processor. Although 8-bit instructions, an 8-bit PHY data width, no internal instruction pipelining, and a single-word-size bus interface may look like a low-performance, inefficient co-processor design, this simple hardware logic eliminates the uncertainties that arise from multi-core operation, deep pipelining, or caching. Our design goal is the simplest yet very stable LMAC architecture, applicable to the MAC of any wireless connectivity chipset.
The MAC operates with configurable 8-bit instructions generated by the main CPU, enabling highly flexible MAC protocol implementation. Sharing system memory with the main CPU eliminates local buffers, resulting in around 20K equivalent NAND gates for the whole LMAC hardware logic.
M. L. Huang, J. Lee, H. Setiawan, H. Ochi, and S.-C. Park, "A High Throughput Medium Access Control Implementation Based on IEEE 802.11e Standard," IEICE Trans. Communications, Vol. E93-B, No. 4, April 2010.
X. Zhang, Shanxi, P.R., "Reconfigurable Medium Access Control Protocols for Wireless Networks."
S. Blionas, K. Masselos, C. Dre, C. Drosos, F. Ieromnimon, D. Metafas, T. Pagonis, A. Pnevmatikakis, A. Tatsaki, T. Trimis, and A. Votzalidis, "Prototyping of a 5 GHz WLAN Reconfigurable System-in-Chip," IEICE Trans. Inf. & Syst., Vol. E86-D, No. 5, pp. 891-900, May 2003.