Site Reliability Engineer / Lead / Manager

A reputable bank is looking to expand its team as part of an enterprise-wide SRE implementation. As a Site Reliability Engineer, you will be responsible for defining, implementing, and running the platforms and analytics for various business units. This includes choosing the right technologies for each use case, then deploying and operating them.

Responsibilities

Use container technologies, predominantly Kubernetes
Work with application teams on observability, automated monitoring, and auto-remediation of known issues
Build systems and tools that enable engineering teams to observe their applications in production autonomously
Support services and pipelines before, during, and after launch, and continually strive to automate and improve monitoring, latency, availability, and overall system health

What do you need to be successful in this role?

Experience as an SRE supporting production systems.
Working experience developing software solutions in any programming language, preferably Java, Golang, or Python.
Working experience with common observability tools and technologies, e.g. Prometheus, Grafana, Elastic, Splunk, AppDynamics, Dynatrace
Experience with containerization tools such as Docker, Kubernetes, Helm
Experience with microservice-based architectures
Experience with any public cloud technologies is a plus (AWS/Azure/GCP)
Experience with Agile/Scrum development methodologies is a plus
Basic understanding of source version control tools, such as Git
