Site Reliability Engineer – Pretoria/Centurion needed at Datafin Recruitment

Job Title: Site Reliability Engineer – Pretoria/Centurion

Job Location: Centurion, Gauteng

Deadline: May 15, 2024

DUTIES:

Kubernetes CI/CD:

  • Designing, implementing, and maintaining CI/CD pipelines for Kubernetes-based applications.
  • Automating deployment processes and ensuring continuous integration and delivery of software.

Monitoring and Reporting:

  • Implementing monitoring solutions for infrastructure and applications using tools such as Prometheus, Grafana, and Kubernetes-native monitoring.
  • Generating reports on system performance, availability, and reliability.

Log Analysis:

  • Analysing logs and metrics to identify trends, anomalies, and performance issues.
  • Implementing log aggregation and analysis solutions such as the ELK Stack or Splunk.

Application Troubleshooting:

  • Investigating and resolving issues related to application performance, availability, and reliability in Kubernetes environments.
  • Collaborating with development teams to diagnose and debug complex issues.

Alerting and Escalation:

  • Setting up alerting mechanisms to proactively detect and respond to incidents.
  • Escalating critical issues to appropriate teams and stakeholders.

Linux Administration and Maintenance:

  • Managing and maintaining Linux servers, including installation, configuration, and patch management.
  • Implementing security measures and best practices for Linux-based systems.

Active Directory Administration and Maintenance:

  • Managing user accounts, groups, and permissions in Active Directory.
  • Performing routine maintenance tasks and ensuring the security of AD infrastructure.

DNS Administration and Maintenance:

  • Configuring and managing DNS servers and zones.
  • Troubleshooting DNS-related issues and ensuring DNS resolution reliability.

End-User Support:

  • Providing technical support and assistance to end-users for infrastructure-related issues.
  • Resolving hardware, software, and connectivity problems promptly.

Database Administration (PostgreSQL):

  • Managing PostgreSQL databases, including installation, configuration, and performance tuning.
  • Performing routine maintenance tasks such as backups, restores, and upgrades.

REQUIREMENTS:

  • 3+ years of experience in a Site Reliability Engineer role or similar position.
  • Proficiency in Kubernetes administration and experience with CI/CD pipelines.
  • Strong Linux administration skills, including shell scripting and troubleshooting.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, the ELK Stack, or Splunk.
  • Familiarity with Active Directory administration and DNS management.
  • Experience with PostgreSQL database administration is a plus.

ATTRIBUTES:

  • Excellent communication and problem-solving skills.
  • Ability to work effectively in a fast-paced, collaborative environment.

How to Apply for this Offer

Interested and qualified candidates should click here to apply now.
