Embedded Device Robustness Checklist (Standard 7 SP1)
7/8/2014
Microsoft Corporation
June 2010
Summary
This technical article provides a basic checklist for embedded device robustness.
Applies To
Windows Embedded Standard 7
Introduction
Robustness Audit
Are unnecessary services turned off?
Is your device serviceable?
Is your device secure?
Can your device recover from a failure?
Are you deploying stable applications and drivers?
What user accounts are used on your device?
Is your device protected with EWF, FBWF, or HORM?
Conclusion
Introduction
According to a U.S. Department of Defense report, too often devices lack an appropriate level of robustness testing: "Many Department of Defense (DoD) programs engage in what has been called 'happy-path testing' (that is, testing that is only meant to show that the system meets its functional requirements). While testing to ensure that requirements are met is necessary, often tests aimed at ensuring that the system handles errors and failures appropriately are neglected" (Cohen, J., et al., Robustness Testing of Software-Intensive Systems: Explanation and Guide, April 2005).
The same concern about inadequate robustness testing applies to embedded devices, regardless of the platform.
This article is a basic checklist for Windows Embedded Standard 7 device developers to help ensure their devices will be robust in the field. It's common for embedded devices to remain in the field for years, in some cases for a decade or more. With such an extended lifespan, a device might experience a few surprises as the technology around it changes. A device is considered robust if it can cope well with unforeseen circumstances.
This checklist covers more than just basic software-development concepts, such as security and serviceability. However, this list is not intended to be comprehensive. Instead, it's based on lessons learned from devices used in the field today. It also takes into consideration one-off important events in the life of a device, such as whether it can handle a sudden loss of power. If you don't know the answer to how your device will handle some of the concepts or questions in this checklist, you should consider testing your device specifically to determine how it will perform.
Most of the tests described in this article can be performed without special software.
Robustness Audit
Are unnecessary services turned off?
- If you are building a headless device that includes the Themes service, have you considered turning off that service?
- If your device is not expected to use multimedia-centric services, yet it includes the Windows Audio service, have you considered turning off that service?
When you build a device image using Image Configuration Editor or Image Builder Wizard, some of the packages added to the device run-time will always contain services. Some of those packages might have been added to satisfy dependencies of other features, or for third-party applications. Yet those packages might also include services that are not needed to satisfy any dependency and are therefore unnecessary.
When a service is not required for your particular device, we recommend that you turn it off, provided you can do so without affecting the design scenarios for your device. Turning off unnecessary services can improve run-time performance and reduce the attack surface of your device.
When you decide to turn off a service, it's highly recommended that you determine if there are any other services dependent upon it. It's also recommended that you test your user scenarios with the service turned off to ensure you do not inadvertently interfere with your device's functionality.
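The dependency check described above can be sketched as a small graph walk. The following Python example is only an illustration: the service names and the dependency map are hypothetical, and on a real device you would have to gather that data yourself (for example, from the DEPENDENCIES field that sc qc servicename prints).

```python
from collections import deque


def find_dependents(depends_on, target):
    """Return every service that directly or indirectly depends on `target`.

    `depends_on` maps a service name to the list of services it requires.
    """
    # Invert the map so we can look up "who depends on this service?"
    dependents_of = {}
    for service, requirements in depends_on.items():
        for required in requirements:
            dependents_of.setdefault(required, set()).add(service)

    # Breadth-first walk outward from the service we want to turn off.
    found = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in dependents_of.get(current, ()):
            if dependent not in found:
                found.add(dependent)
                queue.append(dependent)
    return sorted(found)


# Hypothetical dependency data, for illustration only.
EXAMPLE_DEPENDS_ON = {
    "ServiceA": [],
    "ServiceB": ["ServiceA"],
    "ServiceC": ["ServiceB"],
    "ServiceD": [],
}
```

In this made-up data set, turning off ServiceA would also affect ServiceB and ServiceC, while ServiceD has no dependents and is a safer candidate. A result like this only narrows down what to test; it does not replace running your user scenarios with the service turned off.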
You can use one of the following methods to view running services, and to turn off unnecessary services.
Using the Services Management Console Snap-In (services.msc)
The Services snap-in lists the name of each service, a description of the service if one has been provided by the service's author, the status of the service, and the startup type. It also allows you to change the state and startup type of any service on your device.
To launch the services snap-in:
- On the Start menu on your device, type services.msc in the search box, and press ENTER to launch the Services snap-in.
To display more information on a particular service:
- In the Services window, right-click on the service name, and select Properties.
To stop a running service:
- In the Services window, right-click on the service name, and select Stop.
To disable a service so it no longer starts:
- In the Services window, right-click on the service name, and select Properties.
- In the Properties dialog box for the service, select the General tab.
- In the Startup type list, select Disabled, then click OK.
Using Tools from the Command Prompt
If you do not have the Services Management Console snap-in on your device, you can use a combination of two command-prompt tools. In the examples below, servicename is the name of the service you want to affect.
To list the names of all running services:
- At a command prompt, type net start.
To stop a running service:
- At a command prompt, type net stop servicename.
To disable a service so it no longer starts:
- At a command prompt, type sc config servicename start= disabled. The space after start= is required.
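If you have several services to turn off, you might script the two commands above rather than typing them for each service. The following Python sketch is one possible approach, not a supported tool: the function names and the service name in the usage note are hypothetical, and it assumes Python is available on the machine where you run it. It defaults to a dry run that only shows the commands, so you can review them before anything changes.

```python
import subprocess


def build_disable_commands(service_names):
    """Build the net stop and sc config commands for each service name."""
    commands = []
    for name in service_names:
        # Stop the service if it is currently running.
        commands.append(["net", "stop", name])
        # sc expects the option ("start=") and its value as separate tokens.
        commands.append(["sc", "config", name, "start=", "disabled"])
    return commands


def apply_commands(commands, dry_run=True):
    """Run each command, or only describe it when dry_run is True.

    Returns the command text for a dry run, or the exit code of each
    command when it is actually executed.
    """
    results = []
    for command in commands:
        if dry_run:
            # Display form only; names containing spaces are not quoted here.
            results.append(" ".join(command))
        else:
            completed = subprocess.run(command, capture_output=True, text=True)
            results.append(completed.returncode)
    return results
```

For example, apply_commands(build_disable_commands(["ExampleSvc"])) returns the two command lines for a hypothetical service named ExampleSvc without running them. Pass dry_run=False only from an elevated command prompt, and only after you have confirmed that no required service depends on the ones you are disabling.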
Is your device serviceable?
- What is your servicing plan for routine updates to your device's operating system?
- What is your plan to apply a service pack to your device in the field?
- Have you considered the need to update your device's third-party applications and drivers in the field?
Over the lifetime of your device, you will almost certainly need to deploy updates to it while it's in the field. Requiring a technician to visit each device to install updates is much more costly than setting up your device to be remotely serviceable. A lack of service planning has caused a lot of pain for some OEMs, who realized only when their field devices needed servicing that they had no cost-effective service plan in place. The most common fallback plan has been to send technicians to every device in the field.
Many device builders rely on Windows Update to keep the device up to date with security fixes. However, you still need to ensure that you have a method of deploying hotfixes, application and driver updates, and any other updates that can't be distributed by Windows Update.
Another concern is service packs. After several years in the field, most likely there will also be service packs available for your devices and you'll need to decide whether you want to deploy them. Because service packs are significant in both size and scope, it's not an update scenario to take lightly. The benefits are often significant, but there is potential downtime while your devices are updating, and each device might require one or more reboots before everything is fully updated. Because of the risks associated with deploying service packs to embedded devices in particular, update scenarios involving service packs should be thoroughly tested in your lab before being done in the field.
Is your device secure?
- Will your device be physically secure? Will the public have general access to your device?
- Have you considered using security features in the operating system, such as BitLocker, to protect data on your device?
- If your device is plugged into a network, do you have antivirus software installed?
- Are you locking down your device through Group Policy settings?
No device developer wants to see a picture of their public-facing device on the internet showing how some creative user was able to gain access to the system and deface it. It can happen, but there are some things you can do to reduce the risk. For example, you might consider these security tips: disable all of your hotkeys, require users of all public-facing devices to be logged in with a standard user account to limit access to your system, and consider replacing the Explorer shell with your own custom application.
If your device has external interfaces such as USB ports or SD slots, you might consider disabling those through the BIOS, through Group Policy, or by physically disconnecting them from the system board. If the ports must be accessible for certain scenarios (such as installing local updates by plugging in a UFD, for instance), then you might consider limiting user access through Group Policy settings.
If sensitive data is stored on your device, consider using BitLocker, a whole-disk encryption system. For more information, see this Microsoft website.
Implementing AppLocker might also help you ensure that only authorized scripts, applications, and libraries are used on your device. For more information, see this Microsoft website.
Deploying antivirus software to your device is also a wise decision if your device is plugged into a network. Do not rely on write filters (EWF and FBWF) to act as a "free" antivirus solution. It's still possible for a device to become infected while the media itself is protected. In that state, your device could still be made to act as a zombie and infect other devices or computers on the network. Also, if you commit changes to disk from the overlay while the device is in an infected state, it will be permanently infected until you clean it. At that point, the cleaning you will need to do will involve reformatting your device and deploying a fresh image to it.
Can your device recover from a failure?
- Have you tested your device in Safe Mode?
- Can your device sustain a sudden loss of power? What does the subsequent boot experience look like when recovering from this?
- Are there logs or crash dump files you need to collect from your device when it recovers? If your device is write-protected, are those files prevented from persisting?
- Can the applications on your device gracefully handle the loss of a network connection? What about the loss of connectivity while a transaction is being performed across the network?
Knowing the answers to these questions will not only help you improve the ability of your device to recover from unexpected problems but can also help you diagnose catastrophic failures when they do happen.
Many devices exist with no means of determining why they fail in the field. If a technician is sent to service a device and cannot reproduce the problem, the technician may have to return to that same device repeatedly as it continues to exhibit the same symptoms. Implementing crash dumps can be a great start in determining whether a driver or application is destabilizing the device, and in pinpointing the source of the trouble.
If you are using a remote client management suite such as System Center Configuration Manager (SCCM) or Altiris, it may also help you recover the device and diagnose failures in the field.
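For the checklist question about losing a network connection, one common pattern is to retry a failed operation with an increasing delay instead of letting the application fail on the first error. The following Python sketch illustrates the idea under stated assumptions: the function name, the default attempt count, and the delay values are hypothetical choices, and the operation you pass in stands for whatever network call your application makes.

```python
import time


def call_with_retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying on connection errors with growing delays.

    The delay doubles after each failed attempt (0.5 s, 1 s, 2 s, ...).
    The last error is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Retrying is only safe when repeating the operation cannot apply the same change twice. If connectivity is lost partway through a transaction, your application still needs a way to determine whether the transaction completed on the remote side before it sends it again.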
Are you deploying stable applications and drivers?
- Are you using Windows Hardware Quality Labs (WHQL) certified drivers?
- Will your custom applications handle malformed input? Have your custom applications been tested with Application Verifier?
- Have your custom drivers also been tested with Driver Verifier? Have the WHQL tests been run against your custom drivers?
If you're developing custom drivers, we recommend you use the tools Microsoft provides for driver developers on this Microsoft website. These include Driver Verifier, Static Driver Verifier, and PREfast. If you require WHQL certification for your custom drivers, you should start with the information provided on the Windows Logo Program site.
If you're developing custom applications, once again Microsoft provides development tools to help you. Application Verifier, for instance, is a run-time verification tool that helps you find programming errors; this tool is available in Visual Studio. For more information, see this Microsoft website.
Finally, it's important that you consider performing endurance testing for your device. After prolonged use, a device's applications and drivers can develop memory leaks and other problems. Endurance testing will help you understand where potential problems might arise over time.
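During an endurance run, one simple way to spot a slow memory leak is to record memory use at intervals and look at the trend. The following Python sketch fits a straight line through such samples and reports the growth rate. It is an illustration only: the function name and sample values are hypothetical, and it assumes you collect the (hours elapsed, megabytes in use) readings yourself with whatever monitoring tool you already use.

```python
def growth_per_hour(samples):
    """Estimate memory growth in MB per hour from (hours, megabytes) samples.

    Uses an ordinary least-squares line fit; the slope is the growth rate.
    """
    count = len(samples)
    if count < 2:
        raise ValueError("at least two samples are required")

    mean_hours = sum(hours for hours, _ in samples) / count
    mean_mb = sum(mb for _, mb in samples) / count

    numerator = sum((hours - mean_hours) * (mb - mean_mb) for hours, mb in samples)
    denominator = sum((hours - mean_hours) ** 2 for hours, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator
```

A rate that stays clearly above zero over a long run suggests that an application or driver is not releasing memory, and is worth investigating before the device ships. A short run can be misleading, because normal caching and warm-up also increase memory use at first.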
What user accounts are used on your device?
- The default setting for various user accounts on your device might be satisfactory, but are there unique permissions you can apply that further restrict those accounts?
- Have you disabled the Guest account?
- Have you disabled any unneeded account groups?
It's not unusual to find field devices where a given user of the device is using Administrator privileges in order to work around some application limitation, or to enable a user scenario that is only available to the Administrator account. This can pose a significant problem for the security of your device's software and data, and can also defeat the security built into your operating system.
Is your device protected with EWF, FBWF, or HORM?
- Have you considered the RAM requirements for normal and extreme use of your device?
Write filters can redirect disk changes to an overlay, which can help maintain the system in its desired state over an extended period of time. However, if your applications routinely write to disk, they will consume your RAM overlay. Consider using FBWF to allow routinely updated files, such as logs, to write through to the disk and reduce overlay consumption.
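To reason about the RAM question above, a rough budget calculation can help. The following Python sketch estimates how long a device can run before its overlay fills. It is a deliberately simplified, worst-case model: the function name, parameters, and figures are hypothetical, and it assumes every write consumes new overlay space at a steady rate, which real workloads will not match exactly.

```python
def hours_until_overlay_full(overlay_mb, write_mb_per_hour,
                             write_through_mb_per_hour=0.0,
                             reserve_fraction=0.1):
    """Estimate the hours of uptime before a RAM overlay is exhausted.

    overlay_mb                 -- RAM set aside for the overlay
    write_mb_per_hour          -- total data your applications write per hour
    write_through_mb_per_hour  -- portion that bypasses the overlay
                                  (for example, log files excluded with FBWF)
    reserve_fraction           -- share of the overlay kept as a safety margin
    """
    usable_mb = overlay_mb * (1.0 - reserve_fraction)
    overlay_rate = write_mb_per_hour - write_through_mb_per_hour
    if overlay_rate <= 0:
        # Nothing accumulates in the overlay under this model.
        return float("inf")
    return usable_mb / overlay_rate
```

For example, with a 512 MB overlay, no reserve, and 4 MB per hour landing in the overlay, this model gives 128 hours between restarts. Measure your own write rates under both normal and extreme use before relying on any such estimate.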
Conclusion
Although this checklist is not comprehensive, a careful review of your device's robustness will help you avoid troublesome and costly issues in the field.
Additional Information
Windows Hardware Development Central - Tools for Testing Drivers