Talk about DevOps is on the rise, and technologies like Docker and Rudder, as well as continuous integration tools like Jenkins, are taking center stage. As DevOps practices become ubiquitous, it’s important to understand what a DevOps engineer does and how to optimize those processes.
Since the best advice comes in sevens, I’ve created a list of seven scenarios DevOps engineers face daily:
1. Find the root cause.
An unknown root cause is a problem faced mainly by those new to DevOps. Nothing stops working on its own; there’s always a root cause. If something suddenly changed or abruptly stopped working, then something must be affecting the environment. A DevOps engineer should always strive to resolve issues by identifying and addressing the root cause.
2. Isolate the issue.
Applications today have complex infrastructure and architecture. If an application is facing an issue, it is essential to determine all the contributing factors. There may be several issues at once, which may or may not be correlated. Multiple issues may form a chain of events, be collateral damage, or be what we call a snowball effect. Specific metrics to consider are the size of the data, the number of users, and site usage when the issue occurred. Collecting these metrics will help isolate the issues, which in turn leads to a faster resolution.
3. Prioritize tickets through context.
The ticketing system is an integral part of a healthy DevOps practice. It helps DevOps engineers quantify and track their work. When an engineer handles multiple tickets, the probability of getting priorities jumbled increases. The key to solving this problem is to understand that priorities are governed by more than just the drop-down priority list in your ticketing system. You have to consider the origin of the ticket, the business driver associated with it, and the stakeholders impacted by the issue. An urgent priority for one ticket may mean that it needs to be completed within a day or so, whereas for another ticket “urgent” could mean it needs to be dealt with within the hour. The ultimate priority and handling of a ticket should always be driven by a combination of urgency, business impact, and context.
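To make the combination of urgency, business impact, and context concrete, here is a minimal Python sketch. The `Ticket` fields, the 1 to 4 scales, and the scoring formula are all assumptions made up for illustration, not part of any real ticketing system; a real team would tune them to its own business drivers.

```python
from dataclasses import dataclass


# All field names, scales, and weights here are illustrative assumptions.
@dataclass
class Ticket:
    title: str
    urgency: int           # 1 (low) to 4 (urgent), as set in the drop-down list
    business_impact: int   # 1 (minor) to 4 (revenue or customer-facing)
    deadline_hours: float  # context: how soon stakeholders actually need it


def priority_score(ticket: Ticket) -> float:
    """Return a score where a higher value means 'handle sooner'."""
    # A tighter deadline raises the score; clamp to avoid dividing by zero.
    time_pressure = 1.0 / max(ticket.deadline_hours, 0.25)
    return ticket.urgency * ticket.business_impact * (1.0 + time_pressure)


tickets = [
    Ticket("Renew staging certificate", urgency=4, business_impact=2, deadline_hours=24),
    Ticket("Checkout page returns errors", urgency=4, business_impact=4, deadline_hours=1),
    Ticket("Update runbook wording", urgency=1, business_impact=1, deadline_hours=72),
]

# Sort so the ticket with the highest combined score comes first.
ordered = sorted(tickets, key=priority_score, reverse=True)
for t in ordered:
    print(f"{priority_score(t):6.2f}  {t.title}")
```

Both of the first two tickets are marked urgent in the drop-down, yet the one with higher business impact and a one-hour deadline sorts first.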
4. Confirm application status in real time.
The status of any process or application should be checked and conveyed in real time. It has to be derived from near-real-time data; otherwise the current state of the application or process is unclear. Make sure you communicate the status of an application properly by providing the most up-to-date information.
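One way to avoid reporting a stale status is to attach the age of the reading and refuse to vouch for anything too old. The sketch below assumes a hypothetical `report_status` helper and a 60-second freshness threshold; both are illustrative choices, not a standard.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed freshness threshold; pick whatever "near real time" means for you.
MAX_AGE = timedelta(seconds=60)


def report_status(
    last_status: str,
    observed_at: datetime,
    now: Optional[datetime] = None,
) -> str:
    """Describe a status reading, flagging it as unknown when it is stale."""
    now = now or datetime.now(timezone.utc)
    age_seconds = int((now - observed_at).total_seconds())
    if now - observed_at > MAX_AGE:
        # Too old to trust: say so instead of repeating the old status.
        return f"unknown (last reading '{last_status}' is {age_seconds}s old)"
    return f"{last_status} (as of {age_seconds}s ago)"
```

Passing `now` explicitly is only there to make the function easy to test; in normal use it defaults to the current UTC time.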
5. Check. Then recheck.
If everything looks good, check again. When all checks are looking good, check again to ensure that everything is really going well, and the monitoring system itself is not malfunctioning. No one ever said, “I wish I hadn’t checked just one more time.”
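As a rough sketch of "check, then recheck", the function below first verifies the monitoring itself and only then repeats the health check several times. The names `check` and `monitor_self_test` are hypothetical callables you would supply yourself, for example a probe of your application and a probe that confirms the monitor can still detect a known failure.

```python
import time
from typing import Callable


def confirmed_healthy(
    check: Callable[[], bool],
    monitor_self_test: Callable[[], bool],
    rechecks: int = 3,
    pause_seconds: float = 0.0,
) -> bool:
    """Return True only if the monitor works and every repeated check passes."""
    # A green dashboard means little if the monitoring system is broken.
    if not monitor_self_test():
        return False
    for attempt in range(rechecks):
        if not check():
            return False
        # Wait between attempts, but not after the last one.
        if attempt < rechecks - 1:
            time.sleep(pause_seconds)
    return True
```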
6. The end user experience trumps all.
DevOps engineers often need to schedule maintenance for a variety of reasons, e.g., to upgrade infrastructure or roll out new code to a production environment. While the goal is always to execute maintenance without any downtime, it is also important to consider the user experience when there is downtime. All the meticulous planning leading up to an “upgrade” is negated if the end-user experience is downgraded. The success of a maintenance window is measured by the user experience. If you know there will be downtime, let users know with as much notice as possible.
If a maintenance window requires a website or an application to incur downtime, the success of the operation is measured by the user experience once the site is up and running again.
When planning maintenance, with or without downtime, seasoned DevOps engineers always consider the end-user experience before, during, and after the event.
7. Zoom in and zoom out.
It is almost always advisable to take a step back and look at the big picture when considering application infrastructure. Likewise, when trying to troubleshoot issues or optimize an environment, you’ll want to zoom in to parameters at the most granular level possible. The butterfly effect is very real for DevOps: modifying small aspects of the application infrastructure can have far-reaching effects on application performance or stability.
It is highly recommended to make small changes and monitor the impact across the entire environment before moving on.
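The "small change, then monitor" loop can be sketched as follows. Everything here is an assumption for illustration: each change is a hypothetical `(name, apply, rollback)` triple, `read_metric` returns a value where lower is better (such as latency in milliseconds), and a 10% regression against the starting baseline is treated as the stop signal.

```python
from typing import Callable, List, Tuple

# A change is a name plus callables to apply it and to undo it (assumed shape).
Change = Tuple[str, Callable[[], None], Callable[[], None]]


def apply_incrementally(
    changes: List[Change],
    read_metric: Callable[[], float],
    tolerance: float = 0.10,
) -> List[str]:
    """Apply changes one at a time; roll back and stop at the first regression."""
    applied: List[str] = []
    baseline = read_metric()  # measured once, before any change
    for name, apply, rollback in changes:
        apply()
        # Lower is better: stop if the metric worsened beyond the tolerance.
        if read_metric() > baseline * (1 + tolerance):
            rollback()
            break
        applied.append(name)
    return applied
```

The function returns the names of the changes that were kept, so the first change missing from the result is the one that caused the regression.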
Hopefully this advice is useful for our DevOps industry peers. We would also love to hear your best practices, tips and tricks in the comments.