
Improving a Distributed System Post-Incident

Duration: 33m
Level: Intermediate
Course Creator: Gremlin
Last Updated: 14-Dec-22



In this session, we will dive into a case study of how a team can recover and improve a distributed system after a major incident. Distributed systems are more prone to failure than other systems because of their complexity and scale, and incidents are a fact of life with these systems. This year, my team faced a week-long incident in our IP address management system that impacted our customers. In response, we had to reevaluate our system's performance and overhaul several key areas of our codebase, as well as improve our monitoring, testing processes, database interactions, and reliability. Viewers will learn about these improvements and how to apply them to their own systems to achieve greater reliability and performance. Viewers will also learn how to use monitoring effectively to uncover inefficiencies in their systems, how to build a testing process that properly stresses a system before it is deployed to production, and how to rally a team during a high-pressure incident.
Author Name: Gremlin
Author Description:
Gremlin is a Chaos Engineering service on a mission to help build a more reliable internet. Its solutions turn failure into resilience by offering engineers a fully hosted SaaS platform for safely experimenting on complex systems, so they can identify weaknesses before those weaknesses impact customers and cause revenue loss. Founded by CEO Kolton Andrus and CTO Matthew Fornaciari in 2016, the company has since raised $26.8 million in funding from Redpoint Ventures, Index Ventures, and Amplify Partners. Existi…
