Site Reliability Engineer – Pretoria/Centurion needed at Datafin Recruitment
Job title: Site Reliability Engineer – Pretoria/Centurion
Job location: Centurion, Gauteng
Deadline: May 15, 2024
DUTIES:
Kubernetes CI/CD:
- Designing, implementing, and maintaining CI/CD pipelines for Kubernetes-based applications.
- Automating deployment processes and ensuring continuous integration and delivery of software.
Monitoring and Reporting:
- Implementing monitoring solutions for infrastructure and applications using tools such as Prometheus, Grafana, and Kubernetes-native monitoring tools.
- Generating reports on system performance, availability, and reliability.
Log Analysis:
- Analysing logs and metrics to identify trends, anomalies, and performance issues.
- Implementing log aggregation and analysis solutions such as the ELK Stack or Splunk.
Application Troubleshooting:
- Investigating and resolving issues related to application performance, availability, and reliability in Kubernetes environments.
- Collaborating with development teams to diagnose and debug complex issues.
Alerting and Escalation:
- Setting up alerting mechanisms to proactively detect and respond to incidents.
- Escalating critical issues to appropriate teams and stakeholders.
Linux Administration and Maintenance:
- Managing and maintaining Linux servers, including installation, configuration, and patch management.
- Implementing security measures and best practices for Linux-based systems.
Active Directory Admin and Maintenance:
- Managing user accounts, groups, and permissions in Active Directory.
- Performing routine maintenance tasks and ensuring the security of AD infrastructure.
DNS Admin and Maintenance:
- Configuring and managing DNS servers and zones.
- Troubleshooting DNS-related issues and ensuring DNS resolution reliability.
End-User Support:
- Providing technical support and assistance to end-users for infrastructure-related issues.
- Resolving hardware, software, and connectivity problems promptly.
Database Administration (PostgreSQL):
- Managing PostgreSQL databases, including installation, configuration, and performance tuning.
- Performing routine maintenance tasks such as backups, restores, and upgrades.
REQUIREMENTS:
- 3+ years of experience in a Site Reliability Engineer role or similar position.
- Proficiency in Kubernetes administration and experience with CI/CD pipelines.
- Strong Linux administration skills, including shell scripting and troubleshooting.
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or Splunk.
- Familiarity with Active Directory administration and DNS management.
- Experience with PostgreSQL database administration is a plus.
ATTRIBUTES:
- Excellent communication and problem-solving skills.
- Ability to work effectively in a fast-paced, collaborative environment.