S4: Simple, Secure, Survivable Systems. Human-first crisis technology design principles
Below is an excerpt of a paper I wrote in April 2020 with my colleague Dr. Paul Gardner-Stephen, from Flinders University. I keep referring to these ideas in conversation, as they seem both relevant and time-sensitive. We hope they provide a useful framework. The full paper is currently under review for publication.
Information technology has become embedded in almost every area of modern life. Modern societies are now highly dependent on the correct functioning of complex and highly interdependent technological systems. Digital tools are increasingly becoming part of traditional crisis response efforts by government and non-government organisations. While digital tools have substantial capabilities to enhance crisis response efforts, they also pose significant risks to user communities when deployed in time-sensitive, vulnerable and fragile crisis contexts, as part of an already complex system. These risks and inefficiencies have been demonstrated in the contact-tracing application debate during the response to the COVID-19 pandemic.
Technology must be intentionally designed and implemented, both to help solve the problem at hand and to support end-user communities. The principles of Simple, Secure and Survivable Systems (‘The S4 Principles’) offer a framework for technology that serves the interests of end users and maintains human dignity, especially in crisis situations. The S4 Principles are already evident in a number of technology projects, across research, design, build and deployment phases. Instead of high-risk, ad hoc, reactive digital solutions, crisis responders can pre-emptively share information, invest in, and work with, existing technology design and development experts whose work reflects the S4 Principles, producing efficient, effective solutions that enhance response capabilities both now and in future scenarios.
DESIGN PRINCIPLES FOR CRISIS TECHNOLOGY
The S4 Principles are the four properties that are vital for the long-term sustainability of critical capabilities, and therefore for the strengthening of societies against existing and emergent shocks and threats. These four properties do not stand in isolation from one another, but rather support and complement one another.
It is expected that this set of properties and justifications will continue to evolve as the program is carried out and more is learnt about creating such systems, including their respective strengths and weaknesses. Thus, the following provides a putative starting point, which will be used to seed initial activity and will be informed by the results of ongoing, collaborative research and industry engagement.
A. S1: Simple
Complexity makes every stage of capability delivery harder: design, construction, maintenance, troubleshooting and integration with other systems. Complexity also greatly increases the volume of knowledge that must be passed from generation to generation for long-term capability maintenance, and reduces the number of people who are competent to install, maintain, repair or adapt a capability.
Growing complexity effectively makes security impossible to achieve. Once a system becomes more complex than a single highly trained expert or small group of experts can comprehend, it ceases to be possible to reason efficiently about the security of that system. Once the complexity grows such that even a well-resourced team cannot fully comprehend the system, security becomes impossible to achieve.
The best that is achievable in these circumstances is a heuristic approach to security, similar to that of the human immune system, where a combination of inherent and adaptive defensive mechanisms acts to increase resistance to infection but is unable to prevent it. This is an untenable situation for critical digital infrastructure, because it is highly likely that a system, and the capability it provides, will be subverted, disabled or destroyed before any reaction can be mounted to stop the compromise.
Systems must be simple enough to avoid these problems. Alternatively, they must have a simple mode of operation that can be activated in the face of attack, so that the core capability can be sustained, even while the attack is identified and eventually defeated, at which time full capability can be restored.
In practice, this means that systems should have no more functionality (i.e., complexity) than is required to deliver the required utility (i.e., benefit to society or the user). It also requires the simplification of external interfaces, so that the attack surface can be reduced as much as possible. This requires a radical re-think of how software-embodying systems are conceived, designed, implemented and maintained.
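As a loose illustration of this kind of interface minimalism (not drawn from the paper), the Python sketch below exposes a single, narrowly defined operation with a fixed input contract; the service name, field names and limits are all hypothetical.

```python
# Hypothetical sketch only: a single-purpose crisis status board whose entire
# external interface is one operation with a fixed, narrow input contract.
# All names and limits here (StatusBoard, post_status, MAX_MESSAGE_BYTES) are
# invented for illustration and do not come from the paper.

MAX_MESSAGE_BYTES = 240                      # small, fixed bound on input size
ALLOWED_SEVERITIES = {"ok", "warning", "critical"}


class StatusBoard:
    """Records (location, severity, message) tuples. Nothing else is exposed."""

    def __init__(self):
        self._entries = []

    def post_status(self, location: str, severity: str, message: str) -> bool:
        """The only externally reachable operation."""
        # Reject anything outside the narrow contract rather than trying to
        # interpret it: a smaller interface is easier to audit and to harden.
        if severity not in ALLOWED_SEVERITIES:
            return False
        if not location or len(location) > 64:
            return False
        if len(message.encode("utf-8")) > MAX_MESSAGE_BYTES:
            return False
        self._entries.append((location, severity, message))
        return True


board = StatusBoard()
assert board.post_status("Ward 3", "warning", "Generator fuel low")
assert not board.post_status("Ward 3", "urgent!!", "unrecognised severity")
```

The point of the sketch is that everything outside the narrow contract is rejected outright, so the whole externally reachable behaviour can be read and audited in a few lines.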
There are also related sub-themes of transparency (the ability to inspect a failure to understand how to correct it), documentation (the ability for maintenance capability to be sustained between generations) and user expectation management (so that simpler systems are acceptable).
An environment where the value of simplicity is taken seriously is one where users enjoy and value the reliability of systems, even where the end result may be acceptably sub-optimal for some uncommon special cases.
Simple systems also have the advantage of being potentially much cheaper to create, precisely because of their lack of complexity.
B. S2: Secure
As the geopolitical and criminal threat environment for cyber and cyber-physical systems in crisis continues to worsen, through activities such as email hacking, phishing and geopolitical attack, security continues to grow in importance [41], [42]. Therefore, security must become a first-order objective of system design. For critical infrastructure and systems, security should take priority over functionality.
Security requires simplicity to be achieved. As described under the theme of simplicity, if a system contains millions to billions of lines of code or transistors, it is not rational to expect that even a large, well-resourced team will be able to provide any guarantees of security. In contrast, a sufficiently simple system can be audited, verified and proven to be functioning correctly.
A key requirement for security in interconnected systems is that the interfaces between the systems must be simplified and hardened as much as possible. Simple protocols and data formats should be used unless more complex ones are truly unavoidable, as many security problems are directly related to faults in protocol and format handling. Innovative approaches to addressing such common security problems are required, such as the automatic translation of protocols into hardware implementations that are much more strongly resistant to cyber-attack due to their lack of flexibility.
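To make the idea of a simple, hardened data format concrete, here is a hypothetical Python sketch of a fixed-size, fixed-layout record with no nesting, no variable-length fields and no extension mechanism; the field layout is invented for illustration and is not a format proposed in the paper.

```python
# Hypothetical sketch of a deliberately simple wire format: one fixed-size,
# fixed-layout record. The fields are invented for illustration only.
import struct

# 16-byte record: sender id (uint32), message type (uint16),
# battery percent (uint16), latitude and longitude (two float32s).
RECORD_FORMAT = "<IHHff"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)   # always 16 bytes


def parse_record(data: bytes):
    """Parse one record, rejecting anything that is not exactly the fixed shape."""
    if len(data) != RECORD_SIZE:
        raise ValueError("record must be exactly %d bytes" % RECORD_SIZE)
    sender_id, msg_type, battery, lat, lon = struct.unpack(RECORD_FORMAT, data)
    if battery > 100:
        raise ValueError("battery percentage out of range")
    return {"sender": sender_id, "type": msg_type,
            "battery": battery, "lat": lat, "lon": lon}


packet = struct.pack(RECORD_FORMAT, 42, 1, 87, -34.93, 138.60)
print(parse_record(packet))
```

Because the record has exactly one valid shape, the parser leaves very little flexibility for an attacker to exploit, which is the same property that makes inflexible hardware implementations of protocols attractive.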
Consideration must also be given to supply chain security, as recently highlighted by Google's detection of malware being installed during the supply-chain stage of the development of smart-phones [43].
C. S3: Survivable
Survivability is the ability of a system to survive a shock or attack. Weak survivability implies only that a capability can be rapidly restored after a shock or attack, while strong survivability requires that a capability continues to be available during a shock or attack, although perhaps at reduced functionality, service level or capacity. This also requires that systems be secure.
Survivable systems must be able to operate when surrounding systems fail. That is, they must be able to fall back to a stand-alone mode of operation, even if the result is sub-optimal performance. They must also be able to fall forward to full service when conditions again allow it.
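One possible shape for this fall-back and fall-forward behaviour is sketched below in Python; the messaging service, the local store and the use of OSError as the failure signal are all assumptions made for the example, not a design from the paper.

```python
# Hypothetical sketch of fall-back / fall-forward behaviour. The networked
# "full service" and the local standalone store are invented stand-ins.


class LocalStore:
    """Standalone fallback: keep messages on the device until connectivity returns."""

    def __init__(self):
        self.queue = []

    def send(self, message: str) -> None:
        self.queue.append(message)


class MessagingService:
    def __init__(self, network_send, local_store):
        self._network_send = network_send   # callable; may raise when the network is down
        self._local = local_store
        self.standalone = False             # current operating mode

    def send(self, message: str) -> None:
        if not self.standalone:
            try:
                self._network_send(message)
                return
            except OSError:
                self.standalone = True      # fall back: keep the core capability alive
        self._local.send(message)

    def try_fall_forward(self) -> None:
        """Probe the full service and drain the local queue if it has recovered."""
        try:
            while self._local.queue:
                self._network_send(self._local.queue[0])
                self._local.queue.pop(0)
            self.standalone = False
        except OSError:
            pass                            # still degraded; stay standalone
```

The design choice worth noting is that falling back is automatic and immediate, while falling forward is a deliberate probe that only declares full service restored once the local backlog has drained.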
Survivable systems must also be able to survive generational information transfer, so that the capability remains creatable, maintainable and adaptable after its creators are no longer available. Making systems simple and well-documented is the best way to achieve this goal, so that the volume of information required is as small as possible, and as accessible as possible.
D. S4: Systems of S4 Systems
Society consists of systems of systems. For this complex web of interdependent systems and the capabilities they provide to continue to support society, they must be able to interact in a manner that avoids the bootstrap problem, where interdependent systems cannot be restarted after a failure because each requires the others to already be operating.
This is directly supported by creating survivable systems that are able to provide a base level of the capabilities that are required by other systems. Similarly, systems should be designed so that they can bootstrap using only the base level of fall-back service that the capabilities they depend on can provide in standalone mode. Further, systems should avoid bootstrap dependencies wherever possible, so as to eliminate the bootstrap problem entirely.
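A minimal sketch of this bootstrap rule is given below, under the assumption that each dependency can report whether it is down, running in base-level (standalone) mode, or running at full service; the service names and status labels are invented for illustration.

```python
# Hypothetical sketch of bootstrapping against degraded dependencies.
# Service names and status values are invented for illustration.

BASE_LEVEL = "base"        # dependency is up in standalone / fall-back mode
FULL_LEVEL = "full"
DOWN = "down"


def can_bootstrap(required, dependency_status):
    """A system should start whenever each dependency offers at least base-level service."""
    return all(dependency_status.get(dep, DOWN) in (BASE_LEVEL, FULL_LEVEL)
               for dep in required)


status = {"power": BASE_LEVEL, "timesource": FULL_LEVEL, "directory": DOWN}

# A node that only needs power and a time source can start now, even though
# the wider directory service has not yet recovered.
print(can_bootstrap(["power", "timesource"], status))               # True
print(can_bootstrap(["power", "timesource", "directory"], status))  # False
```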
Systems of systems create meta-systems with emergent properties of their own, including new fragilities and weaknesses that are not inherent in their component systems. This should be avoided where possible, including by simplifying the dependency and communications graph among systems. At the micro-scale of creating individual systems and capabilities, this takes the form of minimising the number of other systems that each depends on.
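The sketch below, again with an invented example graph, shows one way such a design could be audited for emergent weaknesses: it flags circular bootstrap dependencies and reports how many other systems each capability depends on.

```python
# Hypothetical sketch: inspect a system-of-systems dependency graph for
# circular bootstrap dependencies and for heavy dependency counts.
# The example graph is invented for illustration.

DEPENDS_ON = {
    "messaging": ["power", "directory"],
    "directory": ["power", "messaging"],   # circular with messaging: bootstrap risk
    "power":     [],
    "dispatch":  ["power", "messaging", "directory", "mapping"],
    "mapping":   ["power"],
}


def find_cycles(graph):
    """Return dependency chains that loop back on themselves."""
    cycles = []

    def visit(node, path):
        if node in path:
            cycles.append(path[path.index(node):] + [node])
            return
        for dep in graph.get(node, []):
            visit(dep, path + [node])

    for start in graph:
        visit(start, [])
    return cycles


print(find_cycles(DEPENDS_ON))   # e.g. [['messaging', 'directory', 'messaging'], ...]
print({name: len(deps) for name, deps in DEPENDS_ON.items()})   # dependency counts
```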
HOW S4 SOLUTIONS CAN BE SUPPORTED BEFORE THE CRISIS HITS
Crises take many forms, vary considerably in magnitude, and trigger differing types of societal responses. The COVID-19 crisis has, despite some missteps, elicited a much greater and more rapid societal response than many disasters before it. It has prompted nation-states to contemplate the international supply-chain interdependencies that support their critical infrastructure, as they witnessed their assumptions of continuity of supply being disrupted by the particular nature of the crisis. The S4 Principles offer a framework for designing and adopting human-first digital infrastructure.
The challenge is to harness society's temporarily increased awareness and willingness to respond, in order to drive the necessary changes in practice and in approaches to problem solving, technology generation, and systems design and implementation.
You can check out Paul’s S4 Systems at MEGA65 and Serval Project.