A reputable bank is looking to expand its team as part of an enterprise-wide SRE implementation. As a Site Reliability Engineer, you will be responsible for defining, implementing and running the platforms and analytics for various business units. This includes choosing the right technologies for each use case, then deploying and operating them.
- Use container technologies, predominantly Kubernetes
- Work with application teams on observability, automating monitoring and the auto-remediation of known issues
- Build systems and tools that let engineering teams observe their applications in production autonomously
- Support services and pipelines before, during and after launch, and continually strive to automate and improve monitoring, latency, availability and overall system health
What do you need to be successful in this role?
- Experience as an SRE supporting production systems.
- Working experience developing software solutions in any programming language, preferably Java, Golang or Python.
- Working experience with common observability tools and technologies, such as Prometheus, Grafana, Elastic, Splunk, AppDynamics or Dynatrace
- Experience with container tools such as Docker, Kubernetes and Helm
- Experience with microservice-based architectures
- Experience with any public cloud (AWS, Azure or GCP) is a plus
- Experience with Agile/Scrum development methodologies is a plus
- Basic understanding of version control tools such as Git