Ron Wilson, Altera
Papers at this year’s Embedded Vision Summit suggested the vast range of ways that embedded systems can employ focused light as an input, and the even vaster range of algorithms and hardware implementations they require to render that input useful. Applications range from simple, static machine vision to classification and interpretation of real-time, multi-camera video. And hardware can range from microcontrollers to purpose-built supercomputers and arrays of neural-network emulators.
Perhaps surprisingly, most of the systems across this wide spectrum of requirements and implementations can be described as segments of a single processing pipeline. The simplest systems implement only the earliest stages of the pipeline. More demanding systems implement deeper stages, until we reach the point of machine intelligence, where all the stages of the pipeline are present, and may be coded into one giant neural-network model.
The individual stages are easy to describe, if not easy to implement. In the first stage, a usually simple algorithm extracts features from the image. Features, in this context, are easily detectable patterns, such as edges, corners, or wavelets, that tend to be stable attributes of the object on which they occur, even when externalities like position or lighting change. The line and arc segments in printed characters or the patterns of light and dark regions in a human face are examples. Depending on what kind of features you are trying to extract, the best tool may be, for example, a convolution kernel scanning across the image, or a gradient-based matrix analysis looking for points at which the color or intensity of pixels changes dramatically.
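To make the convolution-kernel option concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not a method taken from the Summit papers: the Sobel kernels, the function names `convolve2d` and `edge_strength`, and the synthetic test image are all choices made for this example. The sketch slides a 3x3 kernel across a grayscale image and combines the horizontal and vertical responses into an edge-strength map.

```python
import numpy as np

# Sobel kernels: an assumed, commonly used choice for edge features.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def convolve2d(image, kernel):
    """Slide a small kernel across a 2-D grayscale image.

    Only positions where the kernel fits entirely inside the image are
    computed, so the output is smaller than the input.
    """
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for y in range(out_h):
        for x in range(out_w):
            window = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(window * flipped)
    return out


def edge_strength(image):
    """Return the gradient magnitude at each valid pixel position."""
    gx = convolve2d(image, SOBEL_X)  # responds to vertical edges
    gy = convolve2d(image, SOBEL_Y)  # responds to horizontal edges
    return np.hypot(gx, gy)


# Synthetic test image (hypothetical data): dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = edge_strength(image)
print(edges)
```

On this test image the response is nonzero only in the two output columns whose windows straddle the dark-to-bright boundary, which is the sense in which an edge is a feature that stays put when overall brightness shifts. A production system would more likely use an optimized library routine or dedicated hardware than explicit Python loops.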