Mid-Level Site Reliability Engineer needed at Potentiam Limited

Job title : Mid-Level Site Reliability Engineer

Job Location : Western Cape, Cape Town

Deadline : April 13, 2025

Purpose of role 

  • Site Reliability Engineers work closely with Tech Support teams and product/platform engineering teams. They are responsible for maximising the uptime of their platforms and clients, maintaining and enhancing observability, responding to incidents, and documenting, investigating and fixing the underlying root causes of those incidents. They may also write documentation for the areas in which they specialise, to help upstream teams when future issues arise.

Duties and responsibilities 

  • Work closely with the Platform and Product engineering teams to ensure that the platform, infrastructure and services are designed and optimised for availability, latency and performance 
  • Own and configure observability tooling 
  • Create and tune alerts to ensure we have adequate warning of impending failures, and check alerts as they are raised 
  • Investigate and resolve support issues escalated from the Tech Support team 
  • Lead incident response, resolution, root cause investigation, retrospective write-ups and follow-up actions so we can take every opportunity to learn, improve and make our services more resilient
  • Identify patterns in incoming incidents and document these for further investigation
  • Collaborate with other SREs and Tech Support to improve processes and share knowledge/best practice

Skills/Experience 

  • Mid-level experience with responsibility for delivery and automation in an SRE, Platform or DevOps team
  • Knowledgeable about, and comfortable with, agile development practices and legacy platforms
  • Comes from an engineering background, and is familiar with modern programming languages, ideally Python but others will be accepted 
  • Experienced (mid-level) at scripting for automation 
  • Cloud certifications, or demonstrable knowledge
  • Is experienced in investigating and resolving technical issues, spanning performance, functionality and system interactions 
  • Is confident in proposing solutions to technical issues, and is able to communicate the pros and cons of said solutions 
  • Is capable of documenting causes of underlying issues, creating runbooks for others to follow
  • Mid-level experience (competent general usage) with a public cloud provider, e.g. GCP, AWS or Azure (ideally GCP)
  • Mid-level experience (competent general usage) of observability, both in terms of best practices and tooling implementation/use (Datadog preferable, others will be accepted) 
  • Mid-level proficiency in using Infrastructure as Code, such as Terraform or alternatives
  • Database experience and ability to understand/write SQL (MySQL/MariaDB preferable)
  • Understanding of Linux Operating Systems (Debian preferable) 
  • Understands DevSecOps culture and has experience delivering technical outcomes within it
  • Possesses strong communication and stakeholder management skills, with an ability to communicate complex technical topics to non-technical stakeholders 
  • Is comfortable providing limited on-call cover in the evenings and at weekends

How to Apply for this Offer

Interested and qualified candidates should click here to apply now