1. HAB Software Woes
John Graham-Cumming
September 2012
Or “My capsule didn’t crash but my software did”
2. Background
> 30 years of programming experience
One HAB flight
◦ GAGA-1
http://blog.jgc.org/2011/04/gaga-1-flight.html
https://github.com/jgrahamc/gaga
3. Where’s your flight’s complexity?
Example: GAGA-1
◦ One balloon, parachute, polystyrene box
◦ Many metres of cord attached with knots
◦ An off-the-shelf camera
◦ 2,836 lines of code
◦ Common to see defect rates of 2 to 4 per KLOC (thousand lines of code)
◦ So GAGA-1 likely has 5 to 10 errors in it
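The estimate is plain arithmetic. A minimal sketch using only the figures on the slide (the function name is illustrative); the result, roughly 5.7 to 11.3, is in line with the rounded "5 to 10" above:

```python
def expected_defects(lines_of_code, defects_per_kloc):
    """Estimate a defect count from a defect rate per thousand lines of code."""
    return lines_of_code / 1000 * defects_per_kloc

GAGA1_LINES = 2836  # line count quoted on the slide

low = expected_defects(GAGA1_LINES, 2)
high = expected_defects(GAGA1_LINES, 4)
print(f"Expected defects: {low:.1f} to {high:.1f}")
```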
4. Real Stuff Seen on HAB flights
Complete computer crash
Altitude going negative
Latitude and longitude garbled
Cutdown triggered in back of car
Long periods of no transmission
Not setting the GPS up before launch
Not turning the camera on
Running out of camera disk space
Altitude jumping around rhythmically
5. The Curse and Joy of Determinism
Computers do what you tell them to
◦ Precisely what you tell them to
◦ Not what you think you told them to do
A Curse
◦ Will do things you don’t expect
◦ Will process bogus input without complaint
The Joy
◦ Easy to test that it does what’s expected
6. HAB Is A Harsh Environment
Cold
Vibration
Stuff breaks in flight
Software needs to be able to cope with failing hardware
Very important to think about failure modes
YOUR CODE IS ON ITS OWN OUT THERE
7. Deadly Sins
The “It works!” Fallacy
The Last Minute Change
Being Far Too Clever
Overlooking Odd Behaviour
Copying Other People’s Code
Assuming Finding A Bug Solves The Problem
8. The “It works!” Fallacy
If you’re an inexperienced (and sometimes experienced) programmer…
◦ You hack some code together
◦ It works once
◦ You assume it will always work
Only solution to this is
◦ Testing
◦ Paranoia
9. The Last Minute Change
Never, ever change anything in code at the last minute, no matter how simple.
Example: HABE 1
◦ Complete camera failure
◦ Maximum integer size in uBASIC on CHDK is 999,999
◦ Last minute change of integer from 600,000 to 1,000,000 caused total failure
10. Being Far Too Clever
Example: GAGA-1
◦ Entered the wrong value of 2 * pi in code to do GPS position conversion from radians to degrees
◦ Caught before flight because I verified the location of my own back garden
◦ Note to self: 2 * pi != 6.2818.
https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/gps.cpp#L113
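A minimal sketch of the same idea in Python, not the GAGA-1 code: take pi from the library instead of typing the digits, and check the converted position against a place whose coordinates you already know. The function names, the tolerance and the coordinates are all illustrative assumptions.

```python
import math

def radians_to_degrees(rad):
    """Convert radians to degrees using the library constant, not typed-in digits."""
    return rad * 360.0 / (2.0 * math.pi)

def check_known_location(lat_rad, lon_rad, known_lat_deg, known_lon_deg, tol_deg=0.01):
    """Return True if the converted position is within tol_deg of a known point."""
    lat = radians_to_degrees(lat_rad)
    lon = radians_to_degrees(lon_rad)
    return abs(lat - known_lat_deg) <= tol_deg and abs(lon - known_lon_deg) <= tol_deg

# Illustrative values only: a point at 51.5 N, 0.1 W expressed in radians.
lat_rad = math.radians(51.5)
lon_rad = math.radians(-0.1)

ok = check_known_location(lat_rad, lon_rad, 51.5, -0.1)

# The same latitude converted with the mistyped constant from the slide.
bad_lat = lat_rad * 360.0 / 6.2818
print(ok, bad_lat - 51.5)
```

With the mistyped constant the latitude comes out wrong by roughly a hundredth of a degree, which a check against a known location is tight enough to catch.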
11. Overlooking Odd Behaviour
Example: GAGA-1
◦ In tests RTTY output was fine some of the time, garbled at other times
◦ Turned out to be interrupts from the GPS messing up the RTTY timing
◦ Solution: disable GPS serial interface while sending RTTY string
ALWAYS BE HONEST WITH YOURSELF ABOUT YOUR CODE
EXPECT THE SPANISH INQUISITION!
https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/tsip.cpp#L229
12. Copying Other People’s Code
Don’t do this: you have no idea what you are copying or who they copied it from
Better practice is to look at other people’s code and…
◦ Write your own version
◦ That you understand
◦ That you are able to test
Example: GAGA-1
◦ Read lots of people’s RTTY code, wrote my own
https://github.com/jgrahamc/gaga/blob/master/gaga-
13. APRS Tracker using copied code
If the altitude in metres contained an 8 or a 9, the altitude reported would be wrong
http://sharon.esrac.ele.tue.nl/users/pe1rxq/aprstracker/aprstracker.html
14. Assuming Finding The Bug Solves The Problem
Just because you’ve found A bug doesn’t mean it was THE bug
Lots of research in computer science shows bugs tend to cluster
Example: CLOUD1, CLOUD2
◦ Three bugs in printing latitude, longitude and altitude
◦ One fixed on CLOUD1, …
15. “The One Thing I Didn’t Test”
http://ukhas.org.uk/guides:common_coding_errors_payload_testing
17. You might never be a great programmer…
… but you can be a paranoid tester!
18. Good Things To Do
No infinite loops
Self-Checking
Unexpected Error Handling
Handle Exceptions
Simulation
Simplify, Simplify, Simplify
Unit Test
Write Log Files
19. No Infinite Loops
Never sit in a loop waiting forever
Example: ATLAS 3
  while (1) {
    // Make sure data is available to read
    if (Serial.available()) {
      b = Serial.read();
      if (bytePos == 8) {
        navmode = b;
        return true;
      }
      bytePos++;
    }
    // Timeout if no valid response in 3 seconds
    if (millis() - startTime > 3000) {
      navmode = 0;
      return false;
    }
  }
}
https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L
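The same pattern sketched in Python, as an illustration rather than a port of the ATLAS code: every wait is bounded by a deadline, and the caller gets an explicit failure value. `read_byte` is a hypothetical callable that returns the next byte, or None when nothing is available yet.

```python
import time

def read_byte_at(read_byte, position, timeout_s=3.0, clock=time.monotonic):
    """Return the byte at index `position` of a response, or None on timeout."""
    deadline = clock() + timeout_s
    index = 0
    while clock() < deadline:
        b = read_byte()
        if b is None:
            time.sleep(0.001)  # nothing to read yet; the deadline still bounds the wait
            continue
        if index == position:
            return b
        index += 1
    return None  # gave up: the caller must handle the failure

# A device that answers, and one that stays silent.
responses = iter([0xB5, 0x62, 0x06])
answered = read_byte_at(lambda: next(responses, None), 2, timeout_s=0.1)
silent = read_byte_at(lambda: None, 2, timeout_s=0.1)
print(answered, silent)
```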
20. Self-Checking
-- Now enter a self-check of the manual mode settings
log( "Self-check started" )
assert_prop( 49, -32764, "Not in manual mode" )
assert_prop( 5, 0, "AF Assist Beam should be Off" )
assert_prop( 6, 0, "Focus Mode should be Normal" )
assert_prop( 8, 0, "AiAF Mode should be On" )
assert_prop( 21, 0, "Auto Rotate should be Off" )
assert_prop( 29, 0, "Bracket Mode should be None" )
assert_prop( 57, 0, "Picture Mode should be Superfine" )
assert_prop( 66, 0, "Date Stamp should be Off" )
assert_prop( 95, 0, "Digital Zoom should be None" )
assert_prop( 102, 0, "Drive Mode should be Single" )
assert_prop( 133, 0, "Manual Focus Mode should be Off" )
assert_prop( 143, 2, "Flash Mode should be Off" )
assert_prop( 149, 100, "ISO Mode should be 100" )
assert_prop( 218, 0, "Picture Size should be L" )
assert_prop( 268, 0, "White Balance Mode should be Auto" )
assert_gt( get_time("Y"), 2009, "Unexpected year" )
assert_gt( get_time("h"), 6, "Hour appears too early" )
assert_lt( get_time("h"), 20, "Hour appears too late" )
assert_gt( get_vbatt(), 3000, "Batteries seem low" )
assert_gt( get_jpg_count(), ns, "Insufficient card space" )
https://github.com/jgrahamc/gaga/blob/master/gaga-1/camera/gaga-1.lua#L96
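A rough Python sketch of the same self-check idea, assuming the readings have already been collected into a dictionary. The reading names and the card-space threshold are hypothetical; the battery and hour limits mirror the ones in the Lua above.

```python
def run_self_check(readings, checks):
    """Return a list of failure messages; an empty list means every check passed.

    `readings` maps a name to a measured value; `checks` is a list of
    (name, predicate, message) tuples.
    """
    failures = []
    for name, predicate, message in checks:
        if name not in readings:
            failures.append(f"{name}: no reading available")
        elif not predicate(readings[name]):
            failures.append(f"{name}: {message} (got {readings[name]})")
    return failures

CHECKS = [
    ("battery_mv", lambda v: v > 3000, "Batteries seem low"),
    ("hour", lambda h: 6 < h < 20, "Hour outside expected range"),
    ("free_shots", lambda n: n > 1000, "Insufficient card space"),  # threshold is illustrative
]

readings = {"battery_mv": 2900, "hour": 10}  # free_shots deliberately missing
failures = run_self_check(readings, CHECKS)
for line in failures:
    print(line)
```

A missing reading is treated as a failure instead of being skipped, so a sensor that never answered cannot pass the check by accident.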
21. Self-Checking
Example: ATLAS 3
Makes sure uBlox GPS will work at high altitude; fixes it if not
if ((count % 10) == 0) {
  digitalWrite(6, LOW);
  checkNAV();
  delay(1000);
  if (navmode != 6) {
    setupGPS();
    delay(1000);
  }
  checkNAV();
  delay(1000);
  digitalWrite(6, HIGH);
}
https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L3
22. Unexpected Error Handling
def temperature():
    t = at.cmd( 'AT#TEMPMON=1' )
    # Command returns something like:
    #
    # #TEMPMEAS: 0,28
    #
    # OK
    #
    # So split on whitespace first to isolate the temperature 0,28
    # and then split on comma to get the temperature
    w = t.split()
    if len(w) < 2:
        logger.log( "Temperature read returned %s" % t )
        return -1000
    m = w[1].split(',')
    if len(m) != 2:
        logger.log( "Temperature read returned %s" % t )
        return -1000
    else:
        return int(m[1])
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/util.py
23. Handle Exceptions
If your language can generate exceptions then you’d better handle them!
Example: GAGA-1
◦ Recovery computer used Python
◦ Exception could have killed it
◦ Global exception handler
except:
    logger.log( "Caught exception in main loop: %s" %
                sys.exc_info()[1] )
Bonus: What’s wrong with that code?
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/gaga-1.py#L144
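One possible shape for a main-loop handler, sketched in Python under my own assumptions and not taken from the GAGA-1 code: log the full traceback, count the failure, and carry on with the next iteration. `step` and `log` are hypothetical callables supplied by the caller.

```python
import traceback

def run_main_loop(step, log, iterations):
    """Call step() repeatedly; log any exception and keep going."""
    failures = 0
    for _ in range(iterations):
        try:
            step()
        except Exception:
            # Log the full traceback so the failure can be diagnosed after the flight.
            log("Caught exception in main loop:\n" + traceback.format_exc())
            failures += 1
    return failures

# A step that fails on its second call only.
calls = []
def flaky_step():
    calls.append(1)
    if len(calls) == 2:
        raise ValueError("bad sensor reading")

messages = []
failures = run_main_loop(flaky_step, messages.append, iterations=3)
print(failures, len(calls))
```

Catching `Exception` instead of using a bare `except:` still lets `KeyboardInterrupt` and `SystemExit` through, which is usually what you want on the bench.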
24. Simulation
Simulate a flight
Example: the UKHAS wiki has an example of using a PC as a fake GPS
http://www.ukhas.org.uk/guides:common_coding_errors_payload_testing
Example: GAGA-1
◦ To test the embedded Telit module, wrote modules that faked the entire Telit Python interface.
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/GPS.py
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/MDM.py
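A minimal sketch of the fake-module idea. The `get_position()` interface and the flight script are invented for illustration and are not the Telit API: the fake replays a scripted flight, including a lost fix, so the flight logic can be run on a desk.

```python
class FakeGPS:
    """Stand-in for a GPS module that replays a scripted list of fixes.

    Each fix is (latitude_deg, longitude_deg, altitude_m), or None for "no fix".
    """
    def __init__(self, fixes):
        self._fixes = list(fixes)
        self._index = 0

    def get_position(self):
        if self._index >= len(self._fixes):
            return None  # script exhausted: behave like a lost fix
        fix = self._fixes[self._index]
        self._index += 1
        return fix

def max_altitude(gps, samples):
    """Example logic under test: highest altitude seen, ignoring lost fixes."""
    best = None
    for _ in range(samples):
        fix = gps.get_position()
        if fix is None:
            continue
        if best is None or fix[2] > best:
            best = fix[2]
    return best

# Scripted ascent, a dropout near the top, then descent.
script = [(52.0, 0.0, 100), (52.0, 0.1, 15000), None, (52.1, 0.2, 30000), (52.2, 0.3, 500)]
peak = max_altitude(FakeGPS(script), samples=10)
print(peak)
```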
25. Simplify, Simplify, Simplify
Make your code as simple as possible
Never have ‘duplicated’ or ‘copy and paste’ code
Break it up into small functions that you understand
Make sure you understand the limitations of the functions you call
26. Unit Test
Break your program up into small, separate functions
Write tests that call each function and make sure it does what you expect.
Lots of ways to do this
◦ Use something like cpptest
◦ ArduinoUnit
◦ Write your own test program
27. Unit Test Example
In the bad APRS program, turn the metres-to-feet code into a separate function: int m_to_f(int m)
assertEquals(m_to_f(1000),3300)
assertEquals(m_to_f(2000),6600)
assertEquals(m_to_f(3000),9900)
assertEquals(m_to_f(4000),13200)
assertEquals(m_to_f(5000),16500)
assertEquals(m_to_f(6000),19800)
assertEquals(m_to_f(7000),23100)
assertEquals(m_to_f(8000),26400)
assertEquals(m_to_f(9000),29700)
assertEquals(m_to_f(10000),33000)
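The same checks sketched in Python. The expected values above imply a conversion factor of 3.3, so this `m_to_f` assumes integer arithmetic with that factor; it is not the APRS tracker's code. A second check sweeps every altitude, so inputs containing an 8 or a 9 are exercised too.

```python
def m_to_f(m):
    """Convert metres to feet with integer arithmetic (assumed factor of 3.3)."""
    return m * 33 // 10

def check_thousands():
    """The ten cases from the slide: 1000 m to 10000 m in steps of 1000 m."""
    for thousands in range(1, 11):
        metres = thousands * 1000
        assert m_to_f(metres) == thousands * 3300, metres

def check_monotonic(limit=40000):
    """Feet must never decrease as metres increase, whatever digits appear."""
    previous = m_to_f(0)
    for metres in range(1, limit + 1):
        current = m_to_f(metres)
        assert current >= previous, metres
        previous = current

check_thousands()
check_monotonic()
print("all conversion checks passed")
```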
28. Write Log Files
Write detailed log files to non-volatile memory for post-flight debugging
Data sent via RTTY or APRS is limited
Log exceptions and errors in detail
Make sure you have a timestamp
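A small sketch using Python's standard `logging` module, assuming the flight computer runs Python and has a writable file system. The log path, logger name and message contents are illustrative.

```python
import logging
import os
import tempfile

def make_flight_logger(path):
    """Create a logger that appends timestamped lines to the file at `path`."""
    logger = logging.getLogger("flight")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler

# A temporary directory stands in for the non-volatile storage on the payload.
log_path = os.path.join(tempfile.mkdtemp(), "flight.log")
logger, handler = make_flight_logger(log_path)

logger.info("GPS fix: lat=%.4f lon=%.4f alt=%d", 52.0, 0.1, 15000)
try:
    int("not a number")
except ValueError:
    logger.exception("Failed to parse altitude")  # records the traceback too

# Close the file so everything is on disk before reading it back.
logger.removeHandler(handler)
handler.close()
with open(log_path) as f:
    log_text = f.read()
print(log_text)
```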
29. Perform system testing
Test your entire system before flight
◦ Put your tracker in the garden
◦ Get a GPS lock
◦ Listen to the RTTY on your radio
◦ Look at the decoded RTTY on your computer
◦ Test uploaded data on the tracker*
◦ *I didn’t do that step; on the day people had to fix the tracker for me.