You might hire a site reliability engineer to:
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of the company’s suite of software solutions
- Measure and optimize system performance, with a focus on pushing capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large-scale distributed software applications

| Required skills and qualifications | Preferred skills and qualifications |
| --- | --- |
| Ability to program (structured and object-oriented) with one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript | Previous success in technical engineering |
| Experience with distributed storage technologies like NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN) | Coding experience beyond simple scripts |
| A proactive approach to spotting problems, areas for improvement, and performance bottlenecks | |