Systems, software, hardware, networks, processes, and people all fail. Designing systems that tolerate failure, avoid cascading failures, and are easy to repair when they do fail is hard, but entirely possible. This talk covers the principles, designs, techniques, organization, and even some tools that help create failure-tolerant systems.
Session type: Infrastructure