You might hire a site reliability engineer to:
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of the company’s suite of software solutions
- Measure and optimize system performance, with a focus on pushing capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large distributed software applications

| Required skills and qualifications | Preferred skills and qualifications |
| --- | --- |
| Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript | Previous success in technical engineering |
| Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN) | Coding experience beyond simple scripts |
| A proactive approach to spotting problems, areas for improvement, and performance bottlenecks | |