You might hire a site reliability engineer to:

  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Build software and systems to manage platform infrastructure and applications
  • Improve reliability, quality, and time-to-market of the company’s suite of software solutions
  • Measure and optimize system performance, with a focus on pushing capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Provide primary operational support and engineering for multiple large-scale distributed software applications

Required skills and qualifications

  • Ability to program (structured and object-oriented) with one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
  • Experience with distributed storage technologies like NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks

Preferred skills and qualifications

  • Previous success in technical engineering
  • Coding experience beyond simple scripts