Why this matters:
A site reliability engineer (SRE) is responsible for managing company systems and ensuring their reliability. As a company grows, its systems will require upgrades to match its operational needs. SREs must know how to determine whether an existing system can accept new features, using measures like service-level indicators (SLIs) and service-level objectives (SLOs) to assess the system’s reliability.
What to listen for:
- Knowledge of the relationship between SLIs and SLOs when measuring service reliability
- Experience in successfully implementing new features in a system
- An explanation of how an optimized system helps streamline company operations
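As a rough illustration of the SLI and SLO relationship a strong candidate might describe, the sketch below compares a measured availability SLI against an SLO target and reports how much of the error budget remains. The function names, the 25% budget threshold, and the rule that new features ship only while budget remains are all assumptions made for this example, not a standard policy.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Fraction of events that met the reliability criterion (the SLI)."""
    if total_events == 0:
        return 1.0  # assumption: no traffic is treated as fully available
    return good_events / total_events


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget still unspent (1.0 = untouched, below 0 = overspent)."""
    allowed_failure = 1.0 - slo
    if allowed_failure <= 0:
        raise ValueError("SLO must be below 1.0 to leave an error budget")
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


def can_accept_new_feature(sli: float, slo: float, min_budget: float = 0.25) -> bool:
    """Hypothetical policy: ship new features only while enough budget remains."""
    return error_budget_remaining(sli, slo) >= min_budget


if __name__ == "__main__":
    sli = availability_sli(good_events=9_995, total_events=10_000)
    print(f"SLI: {sli:.4f}")
    print(f"Budget remaining: {error_budget_remaining(sli, slo=0.999):.0%}")
    print(f"Accept new feature: {can_accept_new_feature(sli, slo=0.999)}")
```

A candidate does not need to produce code like this, but should be able to explain the same reasoning: the SLI is what was measured, the SLO is the target, and the gap between them indicates how much risk the system can still absorb.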
Why this matters:
Automation is a key component of site reliability engineering, as SREs use it to develop code for systems quickly and efficiently. Because automation can remove repetitive duties and speed up company IT operations, SREs must know how to create reliable automated solutions that cause minimal production incidents.
What to listen for:
- Ability to build automated solutions through code
- Demonstrated experience in building automated system solutions
- A clear understanding of how automation benefits a company’s daily operations
Why this matters:
Although the ideal is for systems to always work as expected, it’s normal for SREs to face an occasional production incident. To ensure that these incidents do not disrupt a company’s IT operations, SREs must know how to identify the issue at hand and resolve it using their knowledge of existing systems and infrastructure.
What to listen for:
- Experience addressing production issues with effective solutions
- A willingness to further investigate issues to prevent future disruptions
- Evidence that an SRE candidate’s system implementations helped reduce incidents
Why this matters:
Some system upgrades or additions require more resources from a company to come to fruition. SREs must be able to communicate these needs to their supervisors while providing a detailed explanation of how a larger budget could help with project development in the long run.
What to listen for:
- Strong verbal communication skills
- Experience budgeting and getting approval from supervisors and other key stakeholders
- Understanding of how a given budget ensured successful project development
Why this matters:
Over time, qualified SREs gain a wealth of knowledge of the complexities of system production and development. To support other department teams that may not have the same level of understanding, the SRE must contribute to company guides so that the information is widely accessible to anyone who needs it.
What to listen for:
- Demonstrated experience in supporting team members
- A portfolio of guides that help other employees internalize key information
- An explanation of how SREs contribute their expertise for company-wide learning
Why this matters:
Because errors and delays can affect system performance in the future, SREs must know how to properly address them in a way that minimally disrupts company workflow. This question allows candidates to share their experience in identifying and mitigating system issues, and explain how their thought process led to a specific solution.
What to listen for:
- Experience in finding and resolving various types of system issues
- A clear, step-by-step explanation of how the candidate removed the bottleneck
- An ability to facilitate post-incident reviews
Why this matters:
When assessing existing systems to develop better solutions, an SRE must be able to communicate effectively with multiple departments, especially IT and operations. Each department can provide unique insight into how a system can be further improved, but it is up to the SRE to facilitate those changes while guiding others through each step of the process.
What to listen for:
- A collaborative approach to communicating with different departments
- An ability to explain complex, technical concepts in digestible ways
- Demonstrated leadership in advocating for system changes
Why this matters:
To ensure that a company’s systems and applications run seamlessly, an SRE must constantly find ways to improve the infrastructure. This can lead to multiple concurrent projects that must be tracked. To meet deadlines and implement updates in a system, the SRE must keep track of the duration of each project and the action items that need to be completed.
What to listen for:
- Strong project management skills
- An ability to prioritize tasks according to their importance
- Organizational skills
Why this matters:
Technical upgrades provide enhanced functionality, which can allow employees to optimize their workflow and maximize productivity. This question gives SRE candidates an opportunity to explain how certain technical improvements align with a company goal, and how they can continue to facilitate upgrades to help employees meet important objectives.
What to listen for:
- An ability to identify how new upgrades or features can help the company meet its goals
- Explanation of how technical upgrades help improve company operations
- Strong strategic and analytical thinking skills