Configuration Testing on a Fleet of Unmanned Ground Vehicles
Up until now, most of our discussion at UpGuard has been on configuration testing in an enterprise context.
And why not? These systems are complex, touched by many people, and can be critical to the day-to-day running of a large company. This post, however, is about changing out of our suits and ties and putting on a lab coat to take a closer look at the toys academics currently play with: robots. Robotics systems are becoming more mainstream and are being put into more and more industrial and public-facing situations (military, mining, space exploration, deep sea exploration, Google's driverless car), and this trend will continue for the foreseeable future. The US military has its attack drones. Australia is a world leader in autonomous mining (yes, those big trucks with wheels the size of a human are being driven around mining sites autonomously). NASA is sending more and more awesome rovers to Mars, and plans on exploring the moons of Jupiter and Saturn with similar probes.
I, for one, welcome our new robotic overlords.
However, robotics systems will likely be just as complex as enterprise systems, or even more so, in the coming years.
Even ignoring the materials science, mechanical design, electrical hardware, and circuitry that go into a robotic system, and focusing only on the networking and software components, robots are complex machines. They will continue to grow in complexity as they are made to function in more specific environments and adhere to strict safety regulations.
When a system goes down in the enterprise world it can cost companies hundreds of thousands or even millions of dollars, maybe a few customers, and a bit of a bruised ego. When robotic systems are unleashed into an industrial context and misbehave, they can cause real physical destruction of objects, or even cost lives.
Industrial robotics engineers are going to have to adhere to very high standards in the coming years. Unlike with most current software, any minor mishap with a robotics system could snowball into public distrust and set the young industry back a number of years. Imagine the public reaction to the Google car autonomously, but accidentally, running over and killing a person, and the damage that would do to the public image of driverless cars everywhere. Compare that to a human accidentally running over a person and killing them...
Public-facing robotics systems are going to have to be designed, built, and maintained to the highest standard, and one of the most overlooked parts of this quality comes down to knowing the state of your software environment and being able to validate it at will.
Let's step into the lab and have a look at a robotics platform that is pretty simple relative to the industrial, public-facing robots out there:
The team of engineers and researchers in the Mechatronics group at UNSW use a fleet of UGVs (Unmanned Ground Vehicles) for teaching and research purposes.
The robots are designed to autonomously explore a large urban area and generate a 3D virtualized representation of that area back on a base station computer.
Looking under the hood, each robot has a number of sensors that enable it to sense the world around it. These include webcams, laser range finders, an IMU (Inertial Measurement Unit), wheel turn and speed readers on the onboard DMC (Digital Motion Controller), and an Xbox Kinect. In addition, the robot has a number of actuators that allow it to move and interact with its surroundings. These cover the drive and steering motors for the wheels, as well as other motors that give some of the sensors a greater field of view. A pretty standard rig for ground vehicle applications.
Here's where we get to the complexity in the system: each sensor and each actuator has its own software module designed to interface with that piece of hardware. The system also has many other small software modules, which handle data processing and decision making. At the center of it all is a circular database system called Possum, which allows data to flow between modules and auto-replicates data between the robots and the base station computer.
Each of these software modules, on each of the robots and on the base station computer, has one or more configuration files associated with it. Each machine has a set of startup scripts that make sure the correct programs are started for a particular scenario, and that these programs are started in the correct order.
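To make the startup-order requirement concrete, here is a minimal sketch of how such a check could be written. It is not the team's actual startup tooling: the module names and the dependency map are hypothetical, chosen only to illustrate the idea that every program must start after the programs it relies on.

```python
from typing import Dict, List


def startup_order_violations(order: List[str],
                             requires: Dict[str, List[str]]) -> List[str]:
    """Return a readable list of problems with a startup sequence.

    `order` is the sequence in which programs are launched;
    `requires` maps each program to the programs it depends on.
    """
    position = {name: index for index, name in enumerate(order)}
    problems = []
    for module, dependencies in requires.items():
        if module not in position:
            problems.append(f"{module} is never started")
            continue
        for dep in dependencies:
            if dep not in position:
                problems.append(f"{module} needs {dep}, which is never started")
            elif position[dep] > position[module]:
                problems.append(f"{module} starts before its dependency {dep}")
    return problems


# Hypothetical modules for one scenario (names are assumptions).
REQUIRES = {
    "possum_db": [],
    "laser_driver": ["possum_db"],
    "imu_driver": ["possum_db"],
    "navigation": ["laser_driver", "imu_driver"],
}

if __name__ == "__main__":
    bad_order = ["laser_driver", "possum_db", "imu_driver", "navigation"]
    for problem in startup_order_violations(bad_order, REQUIRES):
        print(problem)
```

A check like this only confirms that the declared order is consistent; whether each program actually came up is a separate runtime question.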
Every software module, whether hardware-related or not, relies on certain system settings being in place, such as correct IP addresses, subnets, and gateways. Some of the older sensors still make use of COM ports, so the port assignments need to be validated between the operating system and the configuration files.
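The kind of cross-check described above can be sketched in a few lines. This is an illustration under stated assumptions, not how UpGuard or the UNSW software performs the check: the dictionary layout, the module names, and the way the "actual" values are gathered from the operating system are all hypothetical, and collecting them is platform-specific and left out here.

```python
import ipaddress
from typing import Dict, List, Set


def check_network(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Compare expected network settings with what the machine reports."""
    problems = []
    for key in ("ip", "netmask", "gateway"):
        found = actual.get(key)
        if found != expected[key]:
            problems.append(f"{key}: expected {expected[key]}, found {found}")
    # Sanity-check the expected values themselves: the gateway must be
    # reachable on the configured subnet.
    network = ipaddress.ip_network(
        f"{expected['ip']}/{expected['netmask']}", strict=False)
    if ipaddress.ip_address(expected["gateway"]) not in network:
        problems.append(f"gateway {expected['gateway']} is outside {network}")
    return problems


def check_serial_ports(configured: Dict[str, str], present: Set[str]) -> List[str]:
    """Check that each module's configured COM port exists and is not shared."""
    problems = []
    owners: Dict[str, str] = {}
    for module, port in configured.items():
        if port not in present:
            problems.append(f"{module}: {port} not present on this machine")
        if port in owners:
            problems.append(f"{module}: {port} already claimed by {owners[port]}")
        owners.setdefault(port, module)
    return problems


if __name__ == "__main__":
    # Hypothetical values for one robot.
    expected = {"ip": "192.168.1.11", "netmask": "255.255.255.0",
                "gateway": "192.168.1.1"}
    reported = {"ip": "192.168.1.11", "netmask": "255.255.255.0",
                "gateway": "192.168.2.1"}
    print(check_network(expected, reported))
    print(check_serial_ports({"laser_driver": "COM3", "dmc_driver": "COM3"},
                             {"COM3", "COM4"}))
```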
For the team of engineers who maintain this robotic system, configuration errors are the most frequent, ongoing problem. When a piece of software misbehaves, this functional error can be identified and fixed permanently. Configuration errors, however, recur because the system is regularly used in different contexts, each requiring a different configuration. A constantly changing configuration makes it difficult to know with confidence which state the system is set up for, and manually validating a configuration is a time-consuming process.
At the recent Australasian Conference on Robotics and Automation we presented work that showed the time taken to troubleshoot configuration problems on this system. In short, we created a chaos script similar to Netflix's Chaos Monkey, which was used to break a random, but critical, configuration item within the system. An experienced engineer was then asked to diagnose the problem using any resource they would normally have access to. On every second attempt the engineer was additionally allowed access to an UpGuard test package that covered the entire configuration space of the robotic system. This test was run 50 times over 111 configuration items. The time taken for the engineer to diagnose and solve the problem is shown by the graph below:
Red crosses indicate test attempts without the aid of the UpGuard test package, while blue circles indicate test attempts where the use of UpGuard was allowed. For those of you playing at home, a total of around 5.5 hours was taken over 25 attempts under normal conditions (roughly 13 minutes per attempt), compared to 45 minutes with the use of UpGuard over the same number of attempts (under 2 minutes per attempt).
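For readers who want to try the idea behind the experiment, the following is a rough sketch of a fault-injection step paired with a baseline comparison. It is not the chaos script used in the study and it does not use any UpGuard API: the flat key/value layout, the item names, and both function names are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

Config = Dict[str, str]


def break_random_item(baseline: Config, rng: random.Random) -> Tuple[Config, str]:
    """Return a copy of the baseline with one randomly chosen item corrupted."""
    key = rng.choice(sorted(baseline))  # sorted() keeps the choice reproducible per seed
    broken = dict(baseline)
    broken[key] = "BROKEN-" + baseline[key]
    return broken, key


def find_drift(baseline: Config, actual: Config) -> List[str]:
    """List every item whose actual value differs from the known-good baseline."""
    return sorted(key for key in baseline if actual.get(key) != baseline[key])


# Hypothetical configuration items; the real system had 111 of them.
BASELINE: Config = {
    "ugv1/network/ip": "192.168.1.11",
    "ugv1/laser_driver/port": "COM3",
    "ugv1/startup/scenario": "exploration",
    "base/possum_db/replicate": "true",
}

if __name__ == "__main__":
    broken, injected = break_random_item(BASELINE, random.Random())
    print("injected fault:", injected)
    print("detected drift:", find_drift(BASELINE, broken))
```

The point of the sketch is the asymmetry it exposes: injecting a fault takes one line, while finding it by hand means inspecting every item, which is why an automated comparison against a known-good state shortens diagnosis so sharply.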
Although this post has discussed how UpGuard can greatly assist in validating a robot's software configuration for troubleshooting purposes, it is also being used as part of routine pre-experiment checks, and to validate software environments of new robots that join the fleet.
 J. Guivant, S. Cossell, M. Whitty and J. Katupitiya, "Internet-based operation of autonomous robots: The role of data replication, compression, bandwidth allocation and visualization", Journal of Field Robotics, Vol. 29, No. 5, pp. 793-818, September/October 2012. (DOI: 10.1002/rob.21432)
 M. Whitty, S. Cossell, K. S. Dang, J. Guivant and J. Katupitiya, "Autonomous Navigation using a real-time 3D point cloud", Australasian Conference on Robotics and Automation, Brisbane, Australia, December 2010.
 J. Guivant, "Possum robot", http://www.possumrobot.com, 2012.
 S. Cossell, "A novel approach to automated systems engineering on a multi-agent robotics platform using enterprise grade configuration testing software", Australasian Conference on Robotics and Automation, Wellington, New Zealand, December, 2012.