Developers of embedded software face unique challenges, including running systems reliably for extended periods, with limited memory and processing power. To build cost-effective systems, embedded programmers must squeeze out performance and find errors before they reach the field. From finding the needles in the haystack to understanding where CPU cycles are truly used, this article distills experience with thousands of real-world systems into 10 simple rules for building better real-time software. (For a longer, more detailed version of this article, go to www.esconline.com:8080/electronicaUSA/esc_proc/sessions/468.htm.)
Know your tools
Good mechanics have many tools; you can't fix a car with just a hammer.
Like good mechanics, good programmers need to be proficient with a variety of tools. Each has a place, each has power, and each has a Heisenberg effect: observing the system changes its behavior to some degree. Tools promise great insight into system operation. Tools designed for embedded systems let you see, live and as it happens, what your program is doing, what resources it's using, and how it interacts with the external world. The insight they provide is truly powerful; you can often quickly spot problems, issues or inefficiencies that would take days to discover by other means.
Here's a short list of the most popular types of tools.
- ICE and JTAG
- Source-level debuggers
- Data monitors
- Operating system (OS) monitors
- Performance profilers
- Memory testers
- Execution tracers
- Coverage testers
Find memory problems early
Memory problems are insidious. There are three main types: leaks, fragmentation and corruption. The best way to combat them is to find them early.
Memory leaks are the best-known "hidden" programming problem. A memory leak arises when a program allocates memory and never releases it, so its memory use grows over time. Eventually, the system runs out of memory and fails. The problem is often hidden: when the system finally runs out of memory, the code that fails often has nothing to do with the leak.
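One low-cost way to catch leaks early is to route allocations through counting wrappers and watch the totals. The sketch below is illustrative only: dbg_malloc(), dbg_free() and the counter accessors are hypothetical names, and a real project would more likely use its OS's or toolchain's heap-debugging support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical debug wrappers: allocate through these so the number of
 * outstanding blocks and bytes can be inspected at any time. */
static size_t live_blocks = 0;
static size_t live_bytes = 0;

/* Each block carries a small header recording its size so that
 * dbg_free() can update the byte count. The union keeps the user
 * pointer suitably aligned for any type. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

void *dbg_malloc(size_t size) {
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->size = size;
    live_blocks++;
    live_bytes += size;
    return h + 1;  /* user memory starts just after the header */
}

void dbg_free(void *p) {
    if (p == NULL) return;
    block_header *h = (block_header *)p - 1;
    /* Freeing more than was allocated means a double free or a bad pointer. */
    assert(live_blocks > 0 && live_bytes >= h->size);
    live_blocks--;
    live_bytes -= h->size;
    free(h);
}

size_t dbg_live_blocks(void) { return live_blocks; }
size_t dbg_live_bytes(void) { return live_bytes; }
```

Run the same operation many times and sample the counters after each pass. If the live-block count keeps climbing instead of settling at a steady value, you have found a leak long before the system runs out of memory.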
Embedded systems face an even sneakier memory challenge: fragmentation. As memory is allocated and freed, most allocators carve large blocks of memory into variable-sized smaller blocks. Because the blocks still in use end up scattered throughout memory, the free space between them is broken into many small pieces from which new requests must be carved. This process is called fragmentation. A severely fragmented system may fail to find a single free 64KB block even with megabytes of free memory in total. Even paged systems, which suffer less, can become slow or waste memory over time through inefficient use of blocks.
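A common way embedded code sidesteps fragmentation, offered here as one option rather than as the authors' prescription, is a pool of fixed-size blocks. Because every block is the same size, any freed block satisfies any later request, so free space can never be split into unusable fragments. The block size, block count and function names below are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool parameters, chosen only for illustration. */
#define BLOCK_SIZE 64
#define NUM_BLOCKS 16

typedef union block {
    union block *next;              /* free-list link while the block is free */
    unsigned char data[BLOCK_SIZE]; /* user data while the block is allocated */
    max_align_t align;              /* keep blocks aligned for any type */
} block;

static block pool[NUM_BLOCKS];
static block *free_list = NULL;

/* Thread every block onto the free list. */
void pool_init(void) {
    free_list = NULL;
    for (int i = NUM_BLOCKS - 1; i >= 0; i--) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* O(1) allocation; returns NULL when the pool is exhausted. */
void *pool_alloc(void) {
    block *b = free_list;
    if (b != NULL) free_list = b->next;
    return b;
}

/* O(1) release; the block goes back on the front of the free list. */
void pool_free(void *p) {
    block *b = p;
    /* Debug check: the pointer must have come from this pool. */
    assert(b >= pool && b < pool + NUM_BLOCKS);
    b->next = free_list;
    free_list = b;
}
```

The trade-off is wasted space when requests are smaller than a block, so systems often keep several pools with different block sizes. This sketch is also not thread-safe; a real pool shared between tasks or interrupts would need a lock or interrupt masking.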
Any code written in a language that supports pointers can corrupt memory. Corruption can occur in many ways: writing off the end of an array, writing to freed memory, pointer-arithmetic errors, dangling pointers, overflowing a task stack and other mistakes. In practice, we find that most corruption is caused by saved state that isn't cleaned up when the original memory is freed. The classic example is a block of memory that is allocated, handed to a library or the operating system as a buffer, and then freed and forgotten while the library still holds a pointer to it. Corruption is then waiting to happen at some "random" time in the future, when the library writes into memory that now belongs to someone else. The corrupted system will then fail (or not) at some other "random" time. Corrupted systems are completely unpredictable, and the errors can be hard to find.
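Corruption is much easier to track down if it is detected close to the moment it happens. A minimal sketch of one standard technique, guard words (sometimes called canaries), follows; the struct layout, the GUARD_PATTERN value and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DATA_SIZE     32
#define GUARD_PATTERN 0xDEADBEEFu   /* arbitrary, easy-to-spot value */

/* A buffer with a known pattern placed directly after it. A write that
 * runs off the end of data[] changes the pattern. */
typedef struct {
    unsigned char data[DATA_SIZE];
    uint32_t guard;
} guarded_buf;

static_assert(offsetof(guarded_buf, guard) == DATA_SIZE,
              "guard word must sit immediately after data[]");

void guarded_init(guarded_buf *b) {
    memset(b->data, 0, sizeof b->data);
    b->guard = GUARD_PATTERN;
}

/* Nonzero while the guard word is intact. */
int guarded_ok(const guarded_buf *b) {
    return b->guard == GUARD_PATTERN;
}

/* Debug-build check: stop at the first detected overrun. */
void guarded_check(const guarded_buf *b) {
    assert(guarded_ok(b) && "buffer overrun detected");
}
```

A guard word catches only writes that run past the end of the buffer, and only when something checks it; heap-debugging tools apply the same idea to every allocated block automatically.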
Optimize through understanding
Real-time is more about reliability than it is about speed. That said, efficient code is critical for many embedded systems. Knowing how to make your code "zing" is a fundamental skill that every embedded programmer must master.
Performance problems can even masquerade as other problems. If the CPU doesn't respond to external events, or queues overflow, or packets are dropped, or hardware isn't serviced, your application may fail. And you may never suspect a performance problem. Know how your CPU is executing your code. It's the only path to efficiency.
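Knowing how the CPU spends its time starts with measurement. A minimal instrumentation sketch follows, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC); on a bare-metal target you would substitute a hardware cycle counter or free-running timer. The section_stats type and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Accumulated timing for one instrumented code section. */
typedef struct {
    uint64_t total_ns;
    uint64_t max_ns;
    uint32_t calls;
} section_stats;

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Call at the start of the section; pass the result to section_end(). */
uint64_t section_begin(void) { return now_ns(); }

/* Record one pass through the section: total, worst case and count. */
void section_end(section_stats *s, uint64_t start_ns) {
    assert(s != NULL);
    uint64_t elapsed = now_ns() - start_ns;
    s->total_ns += elapsed;
    if (elapsed > s->max_ns) s->max_ns = elapsed;
    s->calls++;
}
```

Wrap a suspect section with section_begin() and section_end(), then compare the average (total_ns / calls) with max_ns. For real-time work the worst case usually matters more than the average, because a single slow pass is enough to miss an external event or overflow a queue.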
No needles in haystack
Finding a needle in the haystack is a good metaphor for much of debugging. So how do you find needles? Start by not dropping them in the haystack. All developers know that nagging feeling that they're cutting a corner and creating a bug for the future. Stop! Listen to your inner voice! Follow your good coding and design guidelines, check your assumptions, rethink your algorithms. If nothing else, put an easily found tag in a comment marking the code as suspect.
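Checking assumptions works best when the assumptions are written down as code that fails loudly. The sketch below shows a compile-time check (C11 static_assert), run-time assert() calls, and a searchable tag on a line the author is unsure of. RING_SIZE, the two functions and the SUSPECT: tag are hypothetical conventions, not a standard.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical buffer size; must be a power of two */

/* Compile-time check: the build fails if someone changes RING_SIZE
 * to a value the masking below cannot handle. */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "RING_SIZE must be a power of two");

size_t ring_index(size_t i) {
    /* Equivalent to i % RING_SIZE only because of the check above. */
    return i & (RING_SIZE - 1);
}

int average(const int *samples, size_t n) {
    /* Run-time checks on what callers are assumed to guarantee. */
    assert(samples != NULL);
    assert(n > 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) sum += samples[i];
    /* SUSPECT: assumes the sum fits in a long (32 bits on many targets). */
    return (int)(sum / (long)n);
}
```

A grep for the tag later turns "I had a bad feeling about this" into a short list of places to look first.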
Isolate the problem
If you do end up with a needle, divide the haystack. The first step in debugging a complex, team-developed application is often isolating the problem. The trick is to do this efficiently, choosing each test so that it rules out as much of the remaining system as possible.
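When the haystack is an ordered sequence, such as a series of builds, checkpoints or input files where the early ones work and the later ones fail, halving it at each step finds the first bad item in about log2(n) tests. This is the idea behind git bisect. The sketch assumes that ordering property holds; first_bad(), the probe callback and the demo probe are hypothetical stand-ins for "build and run this version and see whether the bug appears."

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the bug is present at the given index. */
typedef bool (*probe_fn)(int index);

/* Precondition: items [0, k) are good and [k, n) are bad for some k,
 * and the last item (n - 1) is known to be bad. Returns k. */
int first_bad(int n, probe_fn is_bad) {
    assert(n > 0);
    int lo = 0, hi = n - 1;   /* the answer always lies in [lo, hi] */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (is_bad(mid))
            hi = mid;         /* bug already present: look earlier */
        else
            lo = mid + 1;     /* still good: look later */
    }
    return lo;
}

/* Demo probe: pretend the bug arrived at demo_bug_version and count
 * how many probes the search needs. */
static int demo_bug_version = 0;
static int demo_probes = 0;

static bool demo_is_bad(int index) {
    demo_probes++;
    return index >= demo_bug_version;
}
```

With 100 candidates, the search needs at most 7 probes.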
Know where you've been
Hansel and Gretel were smart to drop a trail of bread crumbs. A backward-traceable record is a great way to make sure you can understand future problems. Hansel and Gretel's only real mistake (besides trusting the witch) was not using a more permanent means of recording their progress. Don't repeat their mistake.
When you get your application or module working in any significant capacity, checkpoint it. Later, when it stops working even though nothing has changed, you will have a baseline against which to check your assumptions.
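A permanent trail of bread crumbs inside the running program can be as simple as a ring buffer of recent events. The following sketch, with hypothetical names, a hypothetical entry layout and a depth of 64, always holds the most recent events, so after a failure you can read back what led up to it. It assumes a single logging context; logging from interrupts or several tasks would need the update made atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 64   /* power of two so the index can wrap with a mask */

static_assert((TRACE_DEPTH & (TRACE_DEPTH - 1)) == 0,
              "TRACE_DEPTH must be a power of two");

typedef struct {
    uint32_t timestamp;
    uint16_t event_id;
    uint16_t arg;
} trace_entry;

static trace_entry trace_buf[TRACE_DEPTH];
static uint32_t trace_count = 0;   /* total events ever logged */

/* A few stores per event: cheap enough to leave enabled in the field. */
void trace_log(uint32_t timestamp, uint16_t event_id, uint16_t arg) {
    trace_entry *e = &trace_buf[trace_count & (TRACE_DEPTH - 1)];
    e->timestamp = timestamp;
    e->event_id = event_id;
    e->arg = arg;
    trace_count++;
}

/* Return the k-th most recent event (k = 0 is the newest), or NULL if
 * that event has been overwritten or was never logged. */
const trace_entry *trace_recent(uint32_t k) {
    uint32_t held = trace_count < TRACE_DEPTH ? trace_count : TRACE_DEPTH;
    if (k >= held) return NULL;
    return &trace_buf[(trace_count - 1 - k) & (TRACE_DEPTH - 1)];
}
```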
Are your tests complete?
How thorough are your tests? How do you know that? Coverage testing can tell you; it shows you exactly what code has been executed. Coverage tools verify the completeness of your test suite. Experienced coverage testers will tell you that most test suites exercise only 20 percent to 40 percent of the total code. The remainder of the code can include bugs waiting to happen.
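Coverage tools work by instrumenting the code so that each statement or branch records that it ran. The hand-instrumented sketch below illustrates the idea only; real tools, such as gcc's --coverage option together with gcov, insert equivalent counters automatically. The cover() helper, the point names and classify() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* One flag per instrumented decision point. */
enum { COV_NEGATIVE, COV_ZERO, COV_SMALL, COV_LARGE, COV_POINTS };
static bool cov_hit[COV_POINTS];

static void cover(int id) {
    assert(id >= 0 && id < COV_POINTS);
    cov_hit[id] = true;
}

/* Code under test, instrumented by hand. */
const char *classify(int x) {
    if (x < 0)   { cover(COV_NEGATIVE); return "negative"; }
    if (x == 0)  { cover(COV_ZERO);     return "zero"; }
    if (x < 100) { cover(COV_SMALL);    return "small"; }
    cover(COV_LARGE);
    return "large";
}

/* Percentage of instrumented points the tests have reached so far. */
int coverage_percent(void) {
    int hit = 0;
    for (int i = 0; i < COV_POINTS; i++) hit += cov_hit[i];
    return hit * 100 / COV_POINTS;
}
```

A test suite that only ever calls classify() with small positive values reports 25 percent coverage here no matter how many times it runs, which is exactly the kind of gap a coverage report exposes.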
Coverage testing should be part of every quality assurance process. How many revisions and rewrites has your code gone through over the years and releases? Has your test suite grown with the changes? Or do the tests only exercise the features that existed in version 1.0? Coverage testing may keep your test group busy, but a solid coverage report will make your product manager sleep well.
Pursue quality to save time
Our experience indicates that more than 80 percent of development time is spent:
- Debugging your own code
- Debugging the integration of your code with other code
- Debugging the overall system
Worse, it costs 10 to 200 times more to fix a bug at the end of the cycle than at the beginning. The cost of a small bug that makes it to the field can be astronomical.
Understand, then make it work
By definition, real-time systems interact with a dynamic world. Unfortunately, traditional debuggers can't reveal dynamic behavior; debuggers deal only with static values of stopped programs.
Questions such as:
- How noisy is my sensor?
- How fast is the queue growing?
- When did the valve close?
simply cannot be answered by any tool that stops execution.
These questions deal directly with the dynamic behavior of the real-world system. They can only be answered with a dynamic tool that analyzes your program while it's running. The same is true of most embedded systems. Many critical real-time systems cannot be stopped at all. For example, stopping a network system to discover how long a work queue is getting may cause it to immediately overflow. Placing a breakpoint in the control software for a moving robotic arm could be dangerous. Other operations take place over time. For instance, a wafer stepper executes many steps of a process in order. You may need to watch the behavior of the system during the entire sequence to evaluate its operation.
Real-time monitors answer these dynamic questions. They don't stop the system; they collect live data and display it in real time. Different monitors suit different problems; the most common are OS monitors, data monitors and profilers.
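A data monitor can answer a question like "How noisy is my sensor?" without stopping the system by keeping running statistics. The sketch below uses Welford's online algorithm, which updates the mean and variance in constant time and memory per sample. The struct and function names are hypothetical, and the variance serves as the noise measure (its square root is the standard deviation).

```c
#include <assert.h>

typedef struct {
    unsigned long n;   /* samples seen so far */
    double mean;
    double m2;         /* sum of squared deviations from the running mean */
    double min, max;
} running_stats;

void stats_init(running_stats *s) {
    s->n = 0;
    s->mean = s->m2 = s->min = s->max = 0.0;
}

/* O(1) update per sample: no need to store past readings. */
void stats_add(running_stats *s, double x) {
    s->n++;
    double delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
    if (s->n == 1 || x < s->min) s->min = x;
    if (s->n == 1 || x > s->max) s->max = x;
}

/* Sample variance; needs at least two samples. */
double stats_variance(const running_stats *s) {
    assert(s->n >= 2);
    return s->m2 / (double)(s->n - 1);
}
```

A monitor task can read the mean, variance, minimum and maximum at any moment while the sensor keeps feeding samples in.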
Harness the beginner's mind
Most debugging is the process of learning more and more about your application. Marathon, unbroken debugging sessions that examine every detail are sometimes the only way to gather sufficient information and enough focus to identify and tackle a bug.
But there is a trade-off. During the focused marathon, your mind closes to possibilities. Assumptions become unquestioned facts. You begin to know things that just aren't true. You stop reconsidering possibilities you rejected, even those you rejected for unsound reasons.
So, when you're really stuck, when you've looked at the problem in every way you can, when every tool in your toolbox has failed you, take time off to think. Go ride a bike. Take the afternoon off and come back in the evening to give yourself a new perspective. Explain the problem to a friend, a manager, a dog or the wall. The external challenge to your assumptions is often the catalyst that breaks the barrier down.
Debugging is an art. But like most art, success is a combination of talent, experience and mastery of the tools of the trade. The secrets of debugging are not mysterious. Know your tools. Avoid the common, dangerous mistakes if you can, then check to make sure you've succeeded. Be proactive. Save data to help you compare the past with the present, even though nothing has changed. Keep your eyes and your mind open.
And accept the responsibility of excellence. No system should ship without:
- Testing for memory leaks and corruption
- Scanning for performance bottlenecks
- Collecting a variable trace of key sequences
- Ensuring sufficient CPU bandwidth margin
- Evaluating test coverage effectiveness
- Tracing OS resource usage
- Recording execution sequences (semaphores, process interactions) of key activities
- Checking for error returns from common APIs
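The last item on the list is the easiest to automate. A minimal sketch, assuming an API that returns 0 on success and a nonzero code on failure: the hypothetical CHECK macro evaluates the call once, then logs and counts any failure rather than letting it pass silently. sensor_read() is a hypothetical stand-in for a real driver or OS call.

```c
#include <assert.h>
#include <stdio.h>

static unsigned error_count = 0;

/* Evaluate the call once; on a nonzero return, log where and what failed. */
#define CHECK(call)                                               \
    do {                                                          \
        int rc_ = (call);                                         \
        if (rc_ != 0) {                                           \
            error_count++;                                        \
            fprintf(stderr, "%s:%d: %s failed (rc=%d)\n",         \
                    __FILE__, __LINE__, #call, rc_);              \
        }                                                         \
    } while (0)

/* Stand-in API: channels 0-3 are valid. */
static int sensor_read(int channel, int *value) {
    assert(value != NULL);
    if (channel < 0 || channel > 3) return -1;  /* invalid channel */
    *value = 42 + channel;
    return 0;
}
```

In production code you would usually also propagate the error or take a recovery action; logging is the minimum.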
Stan Schneider is CEO and Lori Fraleigh is senior software engineer at Real-Time Innovations Inc. (Sunnyvale, Calif.).
See related chart