
A facial tracking system design example examines the issues and tradeoffs of using FPGA technology versus other alternatives in a power-constrained environment.

The world of military embedded computers is facing challenges on multiple fronts. Compute demands continue to rise as software applications grow more complex. Energy consumption targets (power integrated over time) continue to tighten. Data volumes are growing, whether in the capacity of local information storage or in the movement of data across local and remote networks. Finally, the number, type and performance of I/O interfaces are all increasing.

System-Level Approach

For military, aerospace and industrial OEMs, meeting all of these demands requires a system-level solution. It is not enough to design hardware or software as standalone entities; the solution provider must balance and optimize all of these variables to meet specific product requirements. The surveillance and military UAV markets have a set of requirements that highlights the issues facing embedded computers.

Remote sensing devices operating on battery power, or on energy supplied by an engine with a limited fuel supply, require careful power management to maximize both operating time and data processing. Consider facial tracking, a common function within the general category of video processing and object recognition. A simplified facial tracking system consists of a camera, an embedded computer with video frame storage, and software to process and display the resulting images (Figure 1).

Basic operation consists of capturing frames and temporarily storing them in the embedded computer's memory. Various image processing operations are required to prepare each image for analysis; typical preprocessing steps include color conversion and filtering. Figure 2 depicts a typical facial tracking system: an input image is presented to the camera, which captures a frame of data and transmits the video data to the embedded computer. The embedded computer in turn preprocesses the data, performing color conversion and filtering, and passes the resulting image to the facial detection/tracking stage. If the facial tracking algorithm detects one or more faces, it identifies each by drawing a yellow box around it. The resulting boxes are merged with the raw input image, and the final image is displayed on a local monitor.
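The preprocessing stage described above can be sketched in a few lines of NumPy. The BT.601 luminance weights and the 3x3 box filter below are illustrative choices standing in for the "color conversion and filtering" steps; the article does not specify which conversion or filter its system uses.

```python
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Convert an RGB frame to grayscale and apply a 3x3 box blur."""
    # Color conversion: ITU-R BT.601 luminance weights (an assumed choice)
    gray = frame[..., 0] * 0.299 + frame[..., 1] * 0.587 + frame[..., 2] * 0.114
    # Filtering: a simple 3x3 box blur to suppress sensor noise (also assumed)
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    blurred = sum(
        padded[dy:dy + h, dx:dx + w]
        for dy in range(3) for dx in range(3)
    ) / 9.0
    return blurred

# Example: one 640x480 RGB frame of random pixel values
frame = np.random.randint(0, 256, (480, 640, 3)).astype(np.float64)
out = preprocess(frame)
```

In a real pipeline these per-pixel operations are exactly the kind of regular, streaming arithmetic that maps well to hardware, a point the partitioning discussion below returns to.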

Performance and Cost

To explore the performance and cost attributes of implementing a facial tracking system, we must first understand how it operates. The facial detection algorithm takes the preprocessed video frames and performs a series of image processing steps using a technique called Histogram of Oriented Gradients, or HOG. HOG divides an image into cells and examines each cell for intensity gradients: areas where the image is transitioning from light to dark or vice versa.

The larger the change from light to dark, the "stronger" the gradient. The algorithm examines a fixed number of orientations (0, 45, 90, 135, 180, 225, 270 and 315 degrees) and calculates the gradient at each. A histogram of gradient strength and angle is then computed and compared against a database for the object of interest. Figure 3 shows a picture and its associated HOG; notice the distinct gradients around the eyes and nose. These features make HOG well suited to facial recognition.
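The cell-and-orientation scheme just described can be sketched as follows: compute a gradient at each pixel, bin its angle into one of the eight 45-degree orientations, and accumulate gradient strength per bin within each cell. This is a minimal NumPy sketch of the general HOG idea, not the article's specific implementation; the 8-pixel cell size is an assumption.

```python
import numpy as np

def hog_cell_histograms(gray: np.ndarray, cell: int = 8, bins: int = 8) -> np.ndarray:
    """Per-cell histograms of gradient strength, binned by orientation."""
    # Central-difference gradients along rows (gy) and columns (gx)
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)                        # gradient "strength"
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0  # orientation in [0, 360)
    # Eight bins at 45-degree spacing: 0/45/90/135/180/225/270/315
    bin_idx = (ang // (360 // bins)).astype(int) % bins
    h, w = gray.shape
    hist = np.zeros((h // cell, w // cell, bins))
    for cy in range(h // cell):
        for cx in range(w // cell):
            sl = (slice(cy * cell, (cy + 1) * cell),
                  slice(cx * cell, (cx + 1) * cell))
            for b in range(bins):
                # Accumulate gradient strength falling into each orientation bin
                hist[cy, cx, b] = mag[sl][bin_idx[sl] == b].sum()
    return hist
```

The resulting per-cell histogram array is what gets scored and compared against a face template in the later pipeline stages.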

A closer examination of the HOG image processing pipeline is shown in Figure 4a. It consists of creating a grayscale image, then calculating the gradients at the basic orientations. Each cell is then given a score, which is normalized across blocks of image data. The final score is compared to a template for faces to determine whether a face match occurs; wherever one does, an outline box is drawn around the region. One strategy would be to execute the entire pipeline in software running on a multi-core processor, with all data stored in the embedded controller's local memory. Depending on the frame rates and image sizes, performing even this simple function at 30 frames per second (FPS) requires an Intel Core i3 processor running at 3 GHz.
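The block-normalization step in the middle of that pipeline can be sketched briefly: cell histograms are grouped into overlapping blocks and L2-normalized so that scores are robust to local lighting changes. The 2x2-cell block size and L2 norm here are common HOG conventions assumed for illustration; the article does not specify them.

```python
import numpy as np

def block_normalize(cell_hist: np.ndarray, block: int = 2,
                    eps: float = 1e-6) -> np.ndarray:
    """L2-normalize cell histograms over overlapping block x block windows."""
    ch, cw, bins = cell_hist.shape
    out = np.zeros((ch - block + 1, cw - block + 1, block * block * bins))
    for y in range(ch - block + 1):
        for x in range(cw - block + 1):
            # Flatten the block's histograms and normalize to unit length
            v = cell_hist[y:y + block, x:x + block].ravel()
            out[y, x] = v / np.sqrt((v ** 2).sum() + eps ** 2)
    return out
```

The nested windowing and data-dependent indexing in this step is one reason the article's partitioning (below) keeps binning and normalization on the general-purpose CPU rather than in the FPGA fabric.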

Execution in FPGA or GPU?

A different strategy is to examine the image processing functions and determine which operations might be better suited for execution in an FPGA, a GPU or other targeted hardware. Since cameras often have proprietary high-speed video interfaces, either an FPGA or an ASIC is required to capture the video data; in our example we assume an FPGA is used to connect to the camera and capture video. The camera sensor capture, gradient image analysis and strong-gradient identification can be pipelined directly in the FPGA without requiring any core CPU processing (Figure 4b). The binning and block normalization are better suited to a general-purpose CPU. Finally, the maximum-score and template comparison steps can again be accelerated in the FPGA. This combination of FPGA and CPU enables the facial tracking to be performed on an Atom Bay Trail-class SoC at 30 FPS.

The combination of a lower-performance SoC with a targeted FPGA provides equivalent performance at a significant cost savings. There are four key parameters a system designer will need to optimize. These might include, but are not limited to: (1) a general-purpose CPU benchmark such as CPUMark, normalized to cost (CPUMarks/$); (2) normalized cost; (3) device TDP power; and (4) a specific performance metric, in this case video frames per second.
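The four figures of merit above reduce to simple ratios once candidate parts are characterized. The sketch below compares two hypothetical candidates; all numeric values are illustrative placeholders, not measured benchmark data for the processors named in this article.

```python
def perf_per_dollar(cpumark: float, price: float) -> float:
    """CPUMark points per dollar: one of the four figures of merit."""
    return cpumark / price

# Hypothetical illustrative values (CPUMark score, unit price $, TDP W, FPS);
# not measured data for any specific device.
candidates = {
    "multi-core CPU alone": (8000.0, 300.0, 65.0, 30.0),
    "low-power SoC + FPGA": (2000.0, 150.0, 10.0, 30.0),
}
for name, (mark, price, tdp, fps) in candidates.items():
    print(f"{name}: {perf_per_dollar(mark, price):.1f} CPUMarks/$, "
          f"{tdp:.0f} W TDP, {fps:.0f} FPS")
```

With equal frame rates, the comparison turns on cost and TDP, which is where the SoC-plus-FPGA partitioning earns its keep in power-constrained platforms.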

This Part 1 article has explored the performance and cost benefits of optimal functional partitioning. Part 2, to be published in a later issue of COTS Journal, will examine the issues of energy consumption, data storage/movement, communications and I/O.

IXI Technology
Yorba Linda, CA
(714) 221-5000