Job Title: Senior Incident Response Manager
Job Location: Gauteng
Deadline: December 01, 2024
Job Description
JOB PURPOSE
- The Senior Incident Response Manager is responsible for overseeing the coordination and execution of the organization’s incident response processes, ensuring the timely and efficient resolution of major incidents. This role involves developing strategies for handling incidents, coordinating with key stakeholders, and maintaining operational stability. The individual will be expected to lead high-impact situations, improve incident management processes, and work with cross-functional teams to resolve complex technical issues.
RESPONSIBILITIES
- Incident Management & Responsibilities
- Lead the incident response team in identifying, diagnosing, and resolving high-severity incidents in a timely manner
- Manage the entire lifecycle of incidents from detection to post-incident review, ensuring appropriate measures are taken to prevent future occurrences
- Coordinate with internal departments, third-party vendors, and external partners during major incidents
- Escalate unresolved issues to senior management and drive the resolution process
- Ensure incidents are identified promptly through proactive monitoring or incident reports
- Assign priority levels to incidents based on business impact, urgency, and severity
- Lead the incident resolution process from detection through closure, maintaining oversight and ensuring swift action
- Implement corrective actions to prevent recurrence and minimize downtime
- Deliver timely resolution of incidents within established SLAs (Service Level Agreements)
- Ensure all incidents are properly logged, tracked, and documented in the ticketing system
- Ensure communication of status updates to stakeholders throughout the incident lifecycle
- Incident Reporting & Documentation
- Ensure accurate and comprehensive documentation of all incidents, including root cause analysis, impact analysis, and post-incident reviews
- Develop detailed incident reports for management, outlining incident resolution timelines, mitigation actions, and lessons learned
- Develop incident reports that capture the full lifecycle of each incident, from detection to resolution
- Perform in-depth root cause analysis for major incidents, identifying both technical and process-related factors
- Present incident reports to senior leadership, including potential risks and suggestions for process improvements
- Establish metrics (MTTR, number of incidents, etc.) for tracking the performance of the incident response process
- Ensure timely and accurate reporting on incidents for internal and external stakeholders
- Deliver detailed reports that can be used to improve system resilience and incident management
- Maintain transparency with senior leadership on incident impacts and outcomes
- Response Coordination
- Serve as the central point of contact for all incident response activities, coordinating between various teams (e.g., IT, DevOps, Security, Vendors)
- Lead regular incident response meetings, war rooms, or conference calls to facilitate real-time problem-solving
- Ensure that the right stakeholders are involved based on the nature of the incident (e.g., business continuity, legal, communications)
- Escalate incidents to senior management when critical thresholds are met or breached
- Ensure seamless coordination across teams, minimizing delays in incident resolution
- Ensure accurate escalation paths are followed based on the severity of the incident
- Keep stakeholders informed with real-time updates, ensuring they understand the potential impact and mitigation efforts
- Emergency Response
- Serve as the primary leader during high-severity incidents or business-impacting crises
- Mobilize and direct resources rapidly to respond to incidents, minimizing downtime and impact on operations
- Ensure contingency plans are activated and business continuity protocols are followed in extreme cases
- Communicate incident status and resolution plans with clarity, particularly when operations are significantly impacted
- Ensure that response to critical incidents is swift, minimizing impact on customers and operations
- Ensure that emergency protocols are followed accurately, minimizing operational risks
- Maintain a high state of readiness within the incident response team for major incidents or crises
- Continuous Improvement
- Analyse past incidents to identify trends, root causes, and recurring issues
- Lead post-incident reviews (PIRs) and after-action meetings to gather insights and lessons learned
- Propose and implement improvements to incident response processes, tools, and methodologies based on feedback from PIRs
- Stay updated on new incident management frameworks, tools, and practices and introduce relevant innovations into the team’s workflows
- Ensure a measurable reduction in the recurrence of incidents over time
- Deliver clear, actionable recommendations from post-incident reviews to prevent similar incidents
- Keep the incident response process current with industry best practices and technological advancements
- Cross-functional Collaboration
- Work closely with IT operations, network teams, security teams, and software development to ensure cohesive incident response
- Collaborate with external vendors and service providers as needed for incident resolution
- Maintain close relationships with business leaders to align incident response priorities with overall business objectives
- Facilitate communication between technical and non-technical stakeholders, ensuring clear understanding of incidents
- Align incident response activities with business priorities to minimize disruption to core operations
- Maintain effective communication and collaboration between internal teams and external partners
- Manage stakeholder expectations and provide clear, concise updates
- Policy & Procedure Development
- Develop and maintain incident management policies, procedures, and runbooks based on industry best practices (e.g., ITIL, NIST)
- Ensure that all documentation is up to date, covering standard operating procedures for various incident types
- Work with compliance and security teams to ensure incident management practices align with regulatory and legal requirements
- Regularly review and update policies to incorporate new risks, tools, and business requirements
- Ensure that incident response documentation is clear, actionable, and followed during incidents
- Maintain compliance with regulatory requirements related to incident management
- Ensure that all team members are familiar with and adhere to incident management policies
- Process Improvement & Strategy Development
- Continuously review and improve the incident management processes, frameworks, and protocols to enhance operational efficiency
- Develop and maintain incident response plans, ensuring the organization is prepared to address critical incidents quickly and effectively
- Collaborate with service delivery and IT operations teams to ensure alignment between incident management and overall business objectives
- Team Leadership & Mentorship
- Lead and mentor a team of Incident Response Specialists, ensuring professional development and technical proficiency
- Provide guidance and training to team members, promoting best practices in incident management and technical troubleshooting
- Ensure team performance aligns with the business objectives and targets
- Maintain high levels of team morale and engagement, fostering a collaborative and accountable work environment
- Identify skills gaps within the team and facilitate upskilling initiatives
- Stakeholder Engagement
- Serve as the primary point of contact for incident escalation and resolution, communicating with internal and external stakeholders to ensure they are informed throughout the incident lifecycle
- Maintain strong working relationships with cross-functional teams including Service Delivery, IT Infrastructure, Application Support, and third-party vendors
- Technology & Tools Management
- Ensure the appropriate tools, systems, and resources are in place for effective incident detection, tracking, and resolution
- Stay updated on emerging technologies and tools relevant to incident management and recommend improvements to the organization’s incident response capabilities
Job Requirements
BEHAVIOURAL COMPETENCIES
- Tech Savvy
- Customer-focused
- Evaluating problems
- Investigating issues
- Information seeking
- Processing details and information
- Communicating information
- Showing resilience
- Adjusting to change
- Learning ability
- Teamwork
- Business knowledge and approach
- Instils Trust
- Plans and Aligns
EDUCATION
- Matric
- Bachelor’s degree in Computer Science, Information Technology, or a related field. Advanced certifications in Incident Management or IT Service Management (e.g., ITIL, CISSP) are a plus.
- Strong knowledge of Microsoft Office productivity tools
EXPERIENCE
- Minimum of 10 years of experience in a similar role
- Experience working with technology systems and processes
- Experience creating incident processes that operate within SLAs
- Extensive experience in a service management function
- Strong stakeholder management experience