Identifying Hidden Dependencies
You don’t need to write automation or deploy on Kubernetes to benefit from resilience engineering! Learn how Honeycomb improved the reliability of our ZooKeeper, Kafka, and stateful storage systems by terminating nodes on purpose. We’ll discuss the initial manual experiments we ran, the bugs we uncovered in our automatic replacement tools, and the steps we took to move toward running the experiments continuously. Today, no node at Honeycomb lives longer than 12 months, and we automatically recycle nodes every week.
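As a rough illustration of what a recurring recycling run might decide, here is a minimal sketch in Python. It is not Honeycomb's tooling: the `Node` type, the `pick_nodes_to_recycle` function, the per-cluster limit, and the age threshold are all hypothetical names and values chosen for the example. It only selects candidates; actually terminating and replacing a node is left to whatever provisioning tools are in use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Node:
    """Hypothetical record for one machine in a stateful cluster."""
    name: str
    cluster: str            # e.g. "kafka" or "zookeeper" (illustrative labels)
    launched_at: datetime


def pick_nodes_to_recycle(nodes, now, max_age, per_cluster_limit=1):
    """Return over-age nodes, oldest first, capped per cluster.

    The cap is an assumption of this sketch: it keeps one run from
    removing several members of the same cluster at once.
    """
    picked = []
    taken = {}  # cluster name -> number of nodes already picked this run
    for node in sorted(nodes, key=lambda n: n.launched_at):
        if now - node.launched_at < max_age:
            continue  # still young enough, leave it alone
        if taken.get(node.cluster, 0) >= per_cluster_limit:
            continue  # this cluster already gives up a node in this run
        taken[node.cluster] = taken.get(node.cluster, 0) + 1
        picked.append(node)
    return picked


if __name__ == "__main__":
    now = datetime(2024, 1, 8, tzinfo=timezone.utc)
    fleet = [
        Node("kafka-1", "kafka", now - timedelta(days=90)),
        Node("kafka-2", "kafka", now - timedelta(days=80)),
        Node("zk-1", "zookeeper", now - timedelta(days=10)),
    ]
    for node in pick_nodes_to_recycle(fleet, now, max_age=timedelta(days=60)):
        print(node.name)
```

Running the selection by hand first and reviewing the list before terminating anything mirrors the manual-experiments-first approach described above; the same function could later be called from a scheduled job.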
Author Name: Gremlin
Author Description:
Gremlin is a Chaos Engineering service on a mission to help build a more reliable internet. Their solutions turn failure into resilience by offering engineers a fully hosted SaaS platform to safely experiment on complex systems, in order to identify weaknesses before they impact customers and cause revenue loss. Founded by CEO Kolton Andrus and CTO Matthew Fornaciari in 2016, the company has since raised $26.8 million in funding from Redpoint Ventures, Index Ventures, and Amplify Partners. Existi…