The inspiration for this article came from recent discussions with traditional military and aerospace personnel and, to some extent, with commercial companies interested in flying commercially available hardware in space. Its purpose is to shed some light on the reasoning behind the selection of hardware for space applications, with emphasis on commercial off-the-shelf (COTS) parts. Although the basic question these companies asked centered on whether a military-grade single board computer (SBC) could be used for space missions, it was asked in different ways.

George Romaniuk, Director, Space Product Management, Aitech Defense Systems

What would it take to make my SBC space qualified?

Let’s start with some clarification of commonly used phrases like “Space Grade” or “Space Qualified” hardware. Unfortunately, these phrases are often used universally, without regard for the issues and considerations associated with applying hardware to a specific space mission (Figure 1).

Why are we talking about “Space Grade” parts to begin with? Sending hardware to space was, and still is, very expensive, so the objective is to maximize the mission’s success and lifetime. To achieve this, we need to look at the reliability of the parts we would like to use for the mission. (Note that we do not address the effects of radiation here; they are covered later in the article.)

The traditional approach to obtaining Electrical, Electronic and Electromechanical (EEE) parts with the desired reliability was based on parts that were well designed both electrically and mechanically and manufactured using the same process and materials in quality-controlled production lots. These parts were then subjected to a screening process intended to identify and remove the few parts that exhibited infant mortality failures.

An important aspect of the screening process is the electrical test performed at three temperature points (minimum, ambient, maximum), since COTS parts are not typically tested at all three. A part is declared failed when, after exposure to temperature cycling and dynamic burn-in, its electrical parameters fall outside the range published in the data sheet.

The electrical measurements are also evaluated for parameter “drift,” that is, changes that occurred while the parts were subjected to screening. The drift has to be limited; otherwise, one may extrapolate it and argue that the part will be out of specification within the expected lifetime. Parts with excessive drift are declared failed.
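The extrapolation argument can be illustrated with a minimal sketch. It assumes, purely for illustration, that drift continues linearly at the rate observed during screening; real parameter drift is often not linear, and the function name, the regulator example and the limits below are hypothetical rather than taken from any screening standard.

```python
def hours_until_out_of_spec(before, after, screening_hours, lower_limit, upper_limit):
    """Linearly extrapolate the drift seen during screening.

    Returns the estimated operating hours until the parameter leaves the
    data sheet range, 0.0 if it is already outside that range, or None if
    no drift was observed. Linear drift is an assumption for illustration.
    """
    if not lower_limit <= after <= upper_limit:
        return 0.0
    drift_per_hour = (after - before) / screening_hours
    if drift_per_hour == 0:
        return None
    limit = upper_limit if drift_per_hour > 0 else lower_limit
    return (limit - after) / drift_per_hour


# Hypothetical regulator output: 3.300 V before and 3.306 V after 240 hours
# of burn-in, with data sheet limits of 3.20 V to 3.40 V.
remaining = hours_until_out_of_spec(3.300, 3.306, 240, 3.20, 3.40)
mission_hours = 5 * 365 * 24  # five-year mission
drift_is_excessive = remaining is not None and remaining < mission_hours
```

In this made-up case the part is still within specification after screening, yet the extrapolated drift reaches the upper limit well before the end of a five-year mission, so the part would be treated as failing.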

There is a limit on the number of parts from the same lot that may fail the screening. If this limit is exceeded, the entire lot is discarded, because this points to something having clearly gone wrong during the manufacturing process. The screening is not intended to find the few good parts, but to verify lot integrity and remove a few bad parts (typically we should see a 1% to 2% failure rate).
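The lot-level logic described above might be sketched as follows. The 5% limit used in the example is an assumed placeholder, not a value from this article or from any standard, and `disposition_lot` is a hypothetical helper name.

```python
def disposition_lot(passed, max_fail_fraction):
    """Decide what to do with a screened lot.

    passed: list of booleans, one per part (True = part passed screening).
    max_fail_fraction: allowed fraction of failures; the real limit comes
    from the applicable screening requirements, not from this sketch.
    Returns (lot_accepted, indices_of_usable_parts).
    """
    if not passed:
        raise ValueError("lot is empty")
    failures = passed.count(False)
    if failures / len(passed) > max_fail_fraction:
        return False, []  # too many failures: discard the whole lot
    return True, [i for i, ok in enumerate(passed) if ok]


# Hypothetical lots of 100 parts screened against an assumed 5% limit.
good_lot = [True] * 98 + [False] * 2
bad_lot = [True] * 90 + [False] * 10
accepted, usable = disposition_lot(good_lot, 0.05)
bad_accepted, bad_usable = disposition_lot(bad_lot, 0.05)
```

Note that the rejected lot returns no usable parts at all, even though 90 of its parts passed individually: the point of the limit is lot integrity, not harvesting survivors.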

It’s often asked whether the board can be screened instead of the parts. However, it is impossible to perform parametric measurements of individual parts at the board level and calculate their drift, so testing at the board level can’t be compared to testing at the component level. Although board-level testing may be an acceptable approach for certain missions willing to accept high risk, several critical parameters will be missed with board-level screening.

Is the screening sufficient to use the part for my space mission?

Suppose the parts are determined to be free from infant mortality failures. We still don’t know whether they will meet the useful life expectations for a space mission, so we turn to the qualification process: randomly selecting a small subset of parts and subjecting them to life tests. All parts must pass the life test. If one or more failures are encountered, the entire lot is discarded.

NASA published a document titled “EEE-INST-002: Instructions for EEE Parts Selection, Screening, Qualification, and Derating” that classifies parts into three levels, based on their reliability:

  • Level 1 for missions 5 years or longer
  • Level 2 for missions 1 to 5 years
  • Level 3 (high risk parts) for programs less than 1 year to 2 years.

This classification requires additional screening of military parts manufactured according to MIL-STD-883 to meet even the requirements of Level 3. Some companies refer to Level 1, 2 or 3 parts alike as “Space Grade,” but it is evident that these parts have different reliability bounds as well as different preferred areas of application. It would be very risky to use Level 3 parts for an 18-year mission.

When radiation effects are added to this mix of complexities, it’s clear why parts should be evaluated for each space mission rather than indiscriminately called Space Grade parts and used for every mission.

Can we talk about COTS for space?

Sure. COTS is truly amazing as far as computational performance and functionality are concerned, and these parts are typically small and inexpensive. The reliability of COTS parts is getting better and better, mainly driven by automotive applications, so why aren’t they used in space? The big problem with using COTS EEE parts in space is that manufacturers don’t characterize the parts’ sensitivity to radiation effects.

First, let’s take a look at the big picture: electronic system design for commercial, military and space applications. Table 1 shows the “dimensions” of our activity.

The design activities for COTS missions are practically independent of one another, but this is not the case for space system design, where radiation effects and their mitigation form a common thread through all of the design activities. This common thread requires good teamwork and agility in order to accommodate feedback from other team members on the mitigation of radiation effects.

A typical example is an increase in the wall thickness of the system chassis/enclosure to accommodate EEE part(s) with a lower total ionizing dose (TID) tolerance. This, in itself, may cause an excessive increase in mass, so a more detailed radiation analysis will be required to provide new guidance for parts placement, which may conflict with the optimal electrical or thermal placement.
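To get a feel for the mass penalty, here is a rough back-of-the-envelope sketch. The enclosure dimensions and the extra thickness are invented for illustration, the thin-wall formula ignores corners, edges and cutouts, and it says nothing about how much dose reduction the extra material actually buys; only the density of aluminum (about 2.70 g/cm³) is a standard figure.

```python
ALUMINUM_DENSITY_G_PER_CM3 = 2.70


def added_shield_mass_g(length_cm, width_cm, height_cm, extra_wall_cm,
                        density_g_per_cm3=ALUMINUM_DENSITY_G_PER_CM3):
    """Approximate mass added by thickening all six walls of a box.

    Thin-wall approximation: added volume = outer surface area times the
    extra thickness, ignoring corners and edges.
    """
    surface_area_cm2 = 2 * (length_cm * width_cm
                            + length_cm * height_cm
                            + width_cm * height_cm)
    return surface_area_cm2 * extra_wall_cm * density_g_per_cm3


# Hypothetical 20 cm x 15 cm x 10 cm enclosure with 2 mm of extra aluminum.
extra_mass_g = added_shield_mass_g(20, 15, 10, 0.2)
```

For this hypothetical box the added mass comes to roughly 0.7 kg, which is why spot shielding or a revised parts placement is often examined before thickening the whole chassis.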

What are radiation effects?

There are three broad classes of radiation effects to consider in evaluating the applicability of COTS for a space mission.

1. Total Ionizing Dose (TID) is related to damage of the device resulting from long-term exposure to protons, electrons and heavy ions. Some devices, like bipolar transistors, are very sensitive to ionizing radiation applied at a low dose rate (as is typically the case for most space missions). Shielding made of aluminum or other metals protects the EEE parts from TID to a large degree. All EEE parts suffer from TID, but the smaller the device geometry, the higher the TID tolerance that should be expected, except for flash storage devices.

2. Displacement Damage (DD) is related to the disruption in the crystal lattice structure of the semiconductor device. It is a non-ionizing damage. The DD mainly affects bipolar transistors, solar cells, LEDs, laser diodes and optocouplers.

3. Single Event Effects (SEEs) are related to direct or indirect ionization of a sensitive area of the semiconductor circuit. There is a long list of specific SEEs, but the most common to note are:
   a. Single Event Latch-up (SEL), causing high current to flow through the device
   b. Single Event Upset (SEU), causing a change of state in a flip-flop or memory cell
   c. Single Event Transient (SET), occurring in both analog and digital circuits

The typical design process with COTS EEE parts should start with three parallel activities: evaluating the parts’ reliability, checking for the presence of forbidden substances within a part and, last but not least, radiation testing of candidate parts. Radiation testing should characterize the part for sensitivity to TID, protons and heavy ions.

What do we expect from the proton test?

Proton testing offers an easy way to look into device degradation as a function of TID and to evaluate SEE in a limited range of ionizing energies. There are very good NASA and JPL guidelines for testing EEE parts with protons [1], [2]. Access to a proton beam is less cumbersome than access to a heavy ion beam, so proton testing is used as a first step in parts evaluation for space missions.

We should expect some parts to fail the radiation test, so it is wise to bring a few similar parts to the test and select the best one. The proton test may demonstrate several surprising results, including:

  • your favorite switching power supply fails destructively after a few seconds of exposure to high-energy protons
  • a newer microcontroller with a rich set of peripherals loses functionality after less than 400 rads
  • an older microprocessor passes 100 krads with minimal degradation.

For an EEE part that survived the test without destruction or significant degradation, the vital statistics are the sensitivities (cross-sections), calculated as the number of observed SEEs divided by the fluence of protons (typically 1E10, or 10 billion, per square centimeter). The cross-section calculated from this experiment used to be fairly representative of the part sensitivity, but with smaller geometries, we’re hitting lower and lower percentages of transistors with 1E10 protons/cm². The detailed analysis of this underlying phenomenology is presented in [4].
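The cross-section arithmetic can be sketched in a few lines. The event count below is invented, and the zero-event upper bound is an addition not discussed in the article: it assumes the events follow Poisson statistics, in which case the largest mean count still consistent with observing nothing at a given confidence level is -ln(1 - confidence).

```python
import math


def see_cross_section_cm2(events, fluence_per_cm2):
    """Cross-section = observed events / particle fluence, in cm^2."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence_per_cm2


def zero_event_upper_bound_cm2(fluence_per_cm2, confidence=0.95):
    """Upper bound on the cross-section when no events were observed.

    Assumes Poisson statistics (an assumption of this sketch).
    """
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return -math.log(1.0 - confidence) / fluence_per_cm2


# Hypothetical run: 25 upsets observed at a fluence of 1E10 protons/cm^2.
sigma = see_cross_section_cm2(25, 1e10)
# A run with no observed events still only bounds the cross-section.
sigma_limit = zero_event_upper_bound_cm2(1e10)
```

The second function is a reminder that a clean run at 1E10 protons/cm² does not prove a cross-section of zero; it only limits it to about 3E-10 cm² at 95% confidence under the stated assumption.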

What do we do with proton testing results?

Hopefully, after the proton test, we have parts that didn’t suffer from SEL and survived the expected TID within a reasonable margin. Testing with protons gives us visibility into the device sensitivity in a narrow range of the Linear Energy Transfer (LET) spectrum; LET is the amount of energy an ionizing particle transfers to the material it traverses per unit distance.

The LET encountered in testing with 200 MeV protons does not exceed 15 MeV·cm²/mg, but the mission may need characterization to an LET of 35 MeV·cm²/mg (the typical value for LEO missions) or higher. Such characterization requires testing with heavy ions, which is more complicated than proton testing, mainly because of the very short range of the ions in silicon.

The proton test will most likely uncover Single Event Functional Interrupts (SEFIs) in the candidate parts and, with well-designed test boards, ways of mitigating them (reset, power cycle) can be determined. Electrical designers use this information to design the circuits and, in the reliability analysis, to predict system availability or upset rates.
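As a rough illustration of how a SEFI rate and a recovery method feed into an availability figure, the sketch below uses the textbook steady-state relation availability = MTBF / (MTBF + MTTR), treating the mean time between SEFIs as the MTBF and the recovery time as the MTTR. The rate and recovery times are invented, and a real analysis would derive the on-orbit rate from the measured cross-sections and the mission environment.

```python
SECONDS_PER_DAY = 86400.0


def availability(sefi_per_day, recovery_seconds):
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    if sefi_per_day < 0 or recovery_seconds < 0:
        raise ValueError("rate and recovery time must be non-negative")
    if sefi_per_day == 0:
        return 1.0
    mtbf_seconds = SECONDS_PER_DAY / sefi_per_day
    return mtbf_seconds / (mtbf_seconds + recovery_seconds)


# Hypothetical part with one SEFI every 10 days on average.
with_reset = availability(0.1, 5.0)          # recovered by a 5 s reset
with_power_cycle = availability(0.1, 30.0)   # recovered by a 30 s power cycle
```

Comparing the two results shows why the recovery mode found during testing matters: the same SEFI rate costs six times more downtime if only a power cycle clears it.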

The SEU sensitivity allows us to calculate upset rates for the mission and to establish the proper mitigation, such as memory scrubbing, ECC (Error Checking and Correction), TMR (Triple Modular Redundancy), etc.
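Two of these mitigations, TMR and scrubbing, can be illustrated together with a minimal software sketch. In flight hardware the voter is normally implemented in logic rather than in Python, and the function names and the stored value here are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority vote across three copies of a word."""
    return (a & b) | (a & c) | (b & c)


def scrub(copies):
    """Rewrite all three copies with the voted value and return it."""
    voted = tmr_vote(*copies)
    copies[:] = [voted, voted, voted]
    return voted


stored = 0b10110010
# Simulate an SEU: one of the three copies has a single flipped bit.
copies = [stored, stored ^ 0b00001000, stored]
recovered = scrub(copies)
```

The vote masks the upset as long as only one copy is wrong in any given bit position; periodic scrubbing rewrites the corrected value so that upsets do not accumulate in more than one copy between passes.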

Are these radiation tested parts good for my space flight?

The radiation test and analysis will determine the TID tolerance, establish whether the part is free from SEL in the LET range specified for the mission, and establish the modes and frequencies of SEE and SEFI. The information about SEE and SEFI will be used by the electrical and software design teams to find means of mitigation. Some SEFI modes may prevent the part from being designed in; the same applies to SEE, but to a lesser extent, particularly to SETs on the outputs of voltage regulators.

The part has to be approved by the reliability team for its expected failure rates. This assessment is based on the available qualification data from the manufacturer (not all parts have this information available) and on the construction analysis (also known as Destructive Physical Analysis).

All these tests and evaluations take some time. Once you are satisfied with the results and ready to purchase parts for screening, you need to recognize and address some other design issues, like making sure the radiation tested parts are the same as the parts procured for screening. Manufacturers tend to perform die shrinks, which change the radiation performance of parts. Some manufacturers fabricate the part in a few locations around the world, with each location using a slightly different process. As a result, parts from multiple locations are packaged in one facility and marked the same way but will most likely have different radiation performance.

During the screening process, parts are subjected to electrical tests. The data sheet for simpler parts may not even show an equivalent electrical circuit, only a block diagram, yet the part typically has a lot more functionality than what is depicted (Figure 2). For example, some memory ICs have built-in ECC not mentioned in the data sheet, or redundant circuits inserted to improve the yield. These circuits may mask the degradation of the device during the screening. It is good to inform the component manufacturer of your intentions and ask for suggestions; you may learn a lot from them.

When would you use COTS for space missions?

As we have shown, getting COTS EEE parts to Level 1 or 2 is quite expensive, time consuming and still risky (candidates may fail). The best use of COTS in space is when components offer a performance level or functionality not available from the existing portfolio of high-reliability, radiation-characterized parts. No matter how the question is asked, the proper use of COTS requires a critical look at the mission itself and at how the parts are expected to perform.

1. Proton Single Event Effects (SEE) Guideline, Kenneth LaBel, 2009
2. Proton Test Guideline Development – Lessons Learned, NEPP, 2002
3. Guideline for Ground Radiation Testing of Microprocessors in the Space Radiation Environment, Farokh Irom, JPL, 2008
4. Proton Testing: Opportunities, Pitfalls and Puzzles, Ray Ladbury, NASA Goddard Space Flight Center

Aitech
Chatsworth, CA
(888) 248-3248
www.rugged.com