Why Embedded Software Development Still Matters: Optimizing a Computer Vision Application on the ARM Cortex A8

D. Kastaniotis, N. Fragoulis (IRIDA LABS)

Introduction

This article compares two programming approaches, compiler auto-vectorization and hand-vectorization, for developing effective solutions for computer vision applications. The algorithm we tackle is RGB to HSV color conversion. The processor is an ARM Cortex A8 core with NEON technology.

We conclude that while compilers have improved through the use of techniques such as auto-vectorization, truly optimized solutions require intense embedded software development and solid understanding of the underlying algorithmic trade-offs in a specific application.

The ARM processor

The ARM Cortex A8 core is an advanced processor core with a deep pipeline that supports clock frequencies as high as 1 GHz. Its features include a symmetric, superscalar pipeline with full dual-issue capability, an advanced branch prediction unit with >95% accuracy, and an integrated Level 2 cache for optimal performance in high-performance systems. In addition, for media processing applications, the A8 supports NEON technology, whose features include a 128-bit SIMD data engine, 2x the performance of v6 SIMD, aligned and unaligned data access, efficient access to packed arrays, support for both integer and floating-point operations, and a large dual 128-bit/64-bit register file.

The tool-chain for the ARM Cortex A8 also offers an auto-vectorization capability. This is a very useful feature for porting generic code (e.g. OpenCV) and can lead to significant performance improvements over compilation without vectorization.

The RGB to HSV Color Conversion Challenge

The HSV (hue, saturation and value) color space is one of the most common cylindrical-coordinate representations of points in an RGB color model. This representation rearranges the geometry of RGB in an attempt to be more intuitive and perceptually relevant than the Cartesian (cube) representation (Fig 1). Developed in the 1970s for computer graphics applications, HSV is used today in image analysis and computer vision, as well as in color pickers and in image editing software.

The HSV color model describes colors similarly to the way the human eye tends to perceive color. Thus, while RGB defines color in terms of a combination of primary colors, HSV describes color using more familiar comparisons such as color, vibrancy and brightness.

Figure 1: RGB cube and HSV cone

The HSV color space is used in many computer vision applications because working with the Hue component separates color from saturation and lighting. As an example, consider real-time face detection and tracking, in which the Hue component (in combination with a tracking algorithm such as CAMShift) is used to describe skin tones. This approach is more robust to skin color variations between people, as well as to other color variations that are often inevitable due to changes in scenery, lighting conditions, camera sensors, etc.

The HSV color space is a result of RGB cube manipulation. Other color spaces, such as YCbCr, can be obtained from RGB using linear transformations that require only multiplications and additions. The HSV color space, by contrast, is obtained through a non-linear transformation of the RGB cube.

In an RGB to HSV conversion, the V and S components are obtained using the maximum of the RGB components and the difference between their maximum and minimum values, respectively (Table 1). The associated operations can be carried out efficiently on RISC processors by using special single instruction, multiple data (SIMD) instructions. That is because such cores typically allow multiple operations to be executed in one cycle, in vector form, and in sets of 4, 8, 16 or 32, depending on the data-path specifications.

The H component is computed as a linear combination of the RGB component values, but which combination applies depends on which component is the maximum. The program code therefore requires several branches, and this makes the overall color conversion process a challenging task. The task becomes even more challenging in real-time embedded video processing systems, where processing power is limited and power consumption and timing requirements are tight.
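To make the branching concrete, the following is a minimal scalar sketch of a conventional textbook RGB to HSV conversion in C. It is not the pseudo-code of Table 1 and not the authors' implementation. The `hsv_t` type and the `rgb_to_hsv` name are hypothetical, and the output ranges (H in degrees, S and V normalized to [0, 1]) are assumptions chosen for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output type: H in degrees [0, 360), S and V in [0, 1]. */
typedef struct { float h, s, v; } hsv_t;

/* Conventional scalar RGB -> HSV conversion for one 8-bit pixel. */
void rgb_to_hsv(uint8_t r8, uint8_t g8, uint8_t b8, hsv_t *out)
{
    assert(out != NULL);

    float r = r8 / 255.0f, g = g8 / 255.0f, b = b8 / 255.0f;

    /* V and S need only the maximum and the max-min difference. */
    float max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    float min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    float delta = max - min;

    out->v = max;
    out->s = (max > 0.0f) ? delta / max : 0.0f;   /* division #1 */

    /* H: the formula depends on which component is the maximum. */
    if (delta == 0.0f) {
        out->h = 0.0f;                             /* gray: hue undefined, use 0 */
    } else if (max == r) {
        out->h = 60.0f * (g - b) / delta;          /* division #2 */
    } else if (max == g) {
        out->h = 60.0f * (b - r) / delta + 120.0f;
    } else {
        out->h = 60.0f * (r - g) / delta + 240.0f;
    }
    if (out->h < 0.0f) {
        out->h += 360.0f;                          /* wrap negative hues */
    }
}
```

The chain of data-dependent `if`/`else` tests on the maximum component is what makes H awkward to vectorize, while the max/min computations for V and S have no such dependence.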

The RGB to HSV color conversion algorithm also involves arithmetic divisions. An arithmetic division usually requires floating-point arithmetic, which significantly increases computational complexity. One can avoid divisions by using look-up tables; however, this is only effective in circumstances where the number of possible divisions is limited. In our analysis, we assume a more realistic scenario in which divisions are performed using intrinsic functions provided by the ARM instruction set.
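As an illustration of the look-up table alternative mentioned above (not the approach taken in this article), the sketch below replaces the division in the saturation computation with a multiplication by a precomputed fixed-point reciprocal. It assumes 8-bit components and an 8-bit saturation scaled to 0..255, so only 255 distinct divisors exist. The names `recip_lut`, `init_recip_lut` and `saturation_lut` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_SHIFT 16

/* recip_lut[m] holds ceil(255 * 2^16 / m) for m = 1..255. */
static uint32_t recip_lut[256];

/* Fill the table once at start-up. */
void init_recip_lut(void)
{
    recip_lut[0] = 0;  /* max == 0 is black; saturation is taken as 0 */
    for (uint32_t m = 1; m < 256; ++m) {
        /* Rounding the reciprocal up keeps the result equal to the
           truncating integer division for all 8-bit inputs. */
        recip_lut[m] = ((255u << RECIP_SHIFT) + m - 1u) / m;
    }
}

/* S = 255 * (max - min) / max, computed without a division. */
uint8_t saturation_lut(uint8_t max, uint8_t min)
{
    assert(min <= max);
    uint32_t delta = (uint32_t)max - min;
    return (uint8_t)((delta * recip_lut[max]) >> RECIP_SHIFT);
}
```

The table costs 1 KB here. For wider data types or for divisors that are not bounded, the table grows quickly, which is the limitation noted in the text.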

Table 1: Pseudo-code for the RGB to HSV conversion algorithm

The Results: Auto-vectorized vs. Hand-vectorized

The pseudo code in Table 1 was ported to an ARM Cortex A8 core using two different approaches:

"Auto-vectorization", in which the code is written in plain ANSI C and the ARM compiler automatically produces the vectorized port. This approach requires no knowledge of the processor's intrinsic commands, and the port is relatively quick. It is usually followed by designers who are not experienced in embedded system software design, or in cases where high speed and low power are not critical design goals.

"Hand-vectorization", in which embedded software is developed for the processor core at the machine language level. This software can be optimized to explicitly exploit a processor's architectural characteristics by using special intrinsic commands. In addition, because the developer has more control over the processing, important know-how related to the underlying algorithm can be effectively utilized. Developing hand-vectorized code is significantly more time consuming than writing C code and leaving the optimization to the compiler.
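The difference between the two approaches can be sketched on the simplest part of the conversion: computing V (the maximum) and the max-min difference for interleaved 8-bit RGB pixels. The first function is plain C of the kind a compiler may auto-vectorize. The second uses NEON intrinsics from `arm_neon.h` (`vld3q_u8`, `vmaxq_u8`, `vminq_u8`, `vsubq_u8`, `vst1q_u8`) to process 16 pixels per iteration. This is an illustrative sketch, not Irida Labs' code; the function names are hypothetical, and a scalar fallback is assumed so that the file also builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Plain C version: a candidate for compiler auto-vectorization. */
void value_delta_scalar(const uint8_t *rgb, uint8_t *v, uint8_t *delta, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint8_t r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
        uint8_t mx = r > g ? r : g;
        uint8_t mn = r < g ? r : g;
        if (b > mx) mx = b;
        if (b < mn) mn = b;
        v[i] = mx;
        delta[i] = (uint8_t)(mx - mn);
    }
}

/* Hand-vectorized version: 16 pixels per iteration when NEON is available. */
void value_delta_simd(const uint8_t *rgb, uint8_t *v, uint8_t *delta, size_t n)
{
    assert(rgb != NULL && v != NULL && delta != NULL);
    size_t i = 0;
#if HAVE_NEON
    for (; i + 16 <= n; i += 16) {
        /* De-interleave 16 RGB pixels into three 16-lane registers. */
        uint8x16x3_t px = vld3q_u8(rgb + 3 * i);
        uint8x16_t mx = vmaxq_u8(vmaxq_u8(px.val[0], px.val[1]), px.val[2]);
        uint8x16_t mn = vminq_u8(vminq_u8(px.val[0], px.val[1]), px.val[2]);
        vst1q_u8(v + i, mx);
        vst1q_u8(delta + i, vsubq_u8(mx, mn));
    }
#endif
    /* Remaining pixels (or all of them without NEON) use the scalar path. */
    value_delta_scalar(rgb + 3 * i, v + i, delta + i, n - i);
}
```

Note that the de-interleaving load and the lane-wise max/min replace all per-pixel comparisons in the inner loop; the H computation, with its data-dependent branches, needs considerably more algorithm-specific work to vectorize by hand.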

We utilized these two approaches in the RGB to HSV color conversion problem. The results are shown in Table 2.

Table 2: Performance results

As shown in Table 2, a baseline in which the code is built using a typical compiler serves as the reference point.

Enabling the auto-vectorization feature available in the ARM compiler produces a speed improvement of 22% over the baseline.

In order to get a significant improvement in the processing speed, hand-vectorization needs to be employed. Irida Labs is able to produce hand-vectorized code that leverages deep expertise in computer vision as well as in embedded code development for the ARM Cortex A8. Irida's hand-vectorization achieves as much as 84% improvement in performance compared with the baseline.

It is apparent from the above that auto-vectorization offers a considerable improvement in performance over generic compilers. However, the real power of the processor is unveiled only when its architectural features are truly exploited, and this is only possible when hand-vectorization is employed. Un-optimized implementations of computer vision systems that use generic software (e.g. OpenCV, which is optimized for Intel architecture) or functions that are not optimized at all (e.g. C/C++ functions generated with MATLAB Coder) are typically not able to process high-resolution video in real time.

Conclusions

This article compared two programming approaches. The results clearly show the dramatic improvement in the performance of an RGB to HSV algorithm when hand-vectorization is employed.

We conclude that in time-sensitive and power-challenged applications like Computer Vision, even though compilers are becoming faster and more sophisticated, a truly optimized solution requires expertise in scientific computer vision and embedded software development.

Irida Labs specializes in Embedded Computer Vision software. Its engineering team consists of computer vision scientists and embedded system engineers and is able to design and implement computer vision solutions for a variety of platforms using the state of the art in both fields to deliver high quality and exceptional performance.

About Irida Labs

Founded in late 2007, IRIDA Labs is a privately-held company with headquarters in Patras, Greece and worldwide sales support, backed by VC and private investors. IRIDA Labs is a leading, platform-independent technology provider of software and silicon IPs for Embedded Video Processing.

The company possesses significant knowledge in analysis, modeling, design and development of high-fidelity reference components and systems in Video Processing and Computer Vision, using state of the art FPGA and DSP technologies. Its product portfolio includes embedded video software and silicon IP for high throughput applications such as video stabilization with rolling shutter correction, face detection and recognition, low-light image/video enhancement, and car plate detection and recognition, addressing various markets ranging from consumer electronics to mobile appliances and automotive applications.