
RSAWeb


SRE Tech Lead

Cape Town

Job Information

    Date Opened

    02/12/2025

    Job Type

    Full time

    Industry

    Systems Engineering

    Work Experience

    8 years

    Education Level

    Degree/B-Tech

    City

    Cape Town

    Province

    Western Cape

    Country

    South Africa

    Postal Code

    7405

Job Description

Established in 2001, RSAWEB is South Africa’s fastest-growing internet service provider (ISP). We focus on providing connectivity to home customers and a wide array of technology solutions to businesses. We are obsessed with ensuring that all our customers receive the best possible digital experience and exceptional customer service. Thousands of customers have given RSAWEB a 5-star rating, and our average rating of 4.7 out of 5 on Google makes us the best-rated ISP in South Africa. We are extremely proud of winning KFM’s Best of the Cape Awards for Best ISP in 2021 and 2022. We are also proud of being one of the fastest streaming ISPs on Netflix and a consistently top-rated ISP on MyBroadband. These accolades are not by chance: we constantly strive to improve our products, services, and solutions to enhance each customer’s experience. Having invested heavily in infrastructure, RSAWEB has built a strong presence in South Africa, with data centres in Johannesburg and Cape Town.

Our Products and Services:
  • Fibre-to-the-Home (FTTH)
  • Fibre-to-the-Business (FTTB)
  • Enterprise connectivity
  • Mobile connectivity and data management
  • Cloud infrastructure and more!

At RSAWEB, we are passionate about using our creativity to provide innovative solutions and services that allow our customers to succeed in all areas of life. We believe we are in the business of connecting customers and businesses with each other, and with a world of infinite possibility and opportunity, through technology. We carry our mission and values into every customer, every interaction, every connection, every day.

Our values:
  • We Build Trust and Ownership
  • We Honour & Respect People
  • We Cultivate Passion & Creativity
  • We Innovate Feverishly
  • We Go the Extra Mile
  • We Believe in Humility
  • We Communicate Openly & Honestly
  • We Make it Fun
  • We Teach, Grow & Learn
  • We Do More, With Less


The Site Reliability Engineer (SRE) will take end-to-end ownership of the core internal infrastructure that powers our ISP, cloud, and hosting platforms. This role operates with high autonomy. It requires someone who can design, build, maintain, and scale internal tools and services with minimal day-to-day oversight.


You will develop production-grade tooling (primarily in Go and Python), integrate internal systems with third-party APIs, expose APIs for internal consumption, and ensure the platforms we build are secure, observable, reliable, and high-performing. This is a builder role: you will own the systems you create, drive improvements across reliability and automation, and enable our network, cloud, and engineering teams to operate efficiently without compromising uptime, scalability, or security.


Key Objectives:


Infrastructure Ownership & Reliability


  • Own and operate core internal infrastructure supporting ISP, cloud, and hosting platforms.
  • Maintain, improve, and extend existing internal systems with a strong reliability and observability focus.
  • Ensure services are scalable, secure, highly available, and well-instrumented.


Tooling & Service Development


  • Design, build, and run internal tools and services, primarily in Go and Python.
  • Integrate with third-party APIs using REST/JSON and webhooks.
  • Develop and expose HTTP/JSON APIs for internal systems.
  • Select, implement, and operate open-source and self-hosted solutions where practical.


Automation & Standardisation


  • Lead automation initiatives to reduce manual effort (toil) and streamline infrastructure operations.
  • Drive standardisation of tools, documentation, and operational processes.


Operational Excellence


  • Ensure strong monitoring, alerting, and observability across systems.
  • Participate in incident response, root-cause analysis, and performance troubleshooting.
  • Maintain documentation such as runbooks, architecture notes, and troubleshooting guides.


Collaboration & Support


  • Work closely with cloud, network, and development teams to provide stable internal platforms.
  • Support the ongoing operation of services you build, including participation in an on-call rotation for critical infrastructure.


Requirements

  • Strong ownership mindset with a focus on identifying and solving root-cause issues.
  • Solid Linux system administration experience (Debian/Ubuntu preferred).
  • Experience in SRE, Infrastructure, Operations, or Platform Engineering roles.
  • Production experience building and maintaining tooling/services in Go and/or Python.
  • Ability to integrate with third-party APIs and expose internal APIs.
  • Comfortable using Git-based workflows, pull requests, and code reviews.
  • Understanding of distributed systems and high-availability design principles.
  • Intermediate knowledge of protocols such as DNS, NTP, HTTP, TLS, and TCP/IP.


Nice-to-Have Skills
(Not required on day one, but willingness to learn is essential)


Network & ISP Ecosystem


  • Understanding of routing and network technologies: BGP, OSPF, route reflectors, IP anycast, and MPLS.
  • Experience in ISP, telecoms, or high-uptime environments.


Security, Encryption & Secrets Management


  • Experience with GPG, asymmetric encryption, or SOPS.


Observability & Monitoring


Experience with:
  • Prometheus, Alertmanager
  • Loki, Grafana
  • Zabbix
  • Custom exporters and instrumentation


Linux Internals & Performance


  • Deep understanding of Linux processes, the networking stack, and filesystems.
  • Performance debugging across application, OS, and network layers.


Configuration Management & IaC


  • Experience with SaltStack (ideal), Ansible, Terraform, Helm, or GitOps workflows.


Edge & Web Services


  • Experience with reverse proxies and load balancers such as Nginx and Traefik, and with process managers such as pm2.


DNS


  • Experience with PowerDNS (Authoritative/Recursor), dnsdist, or CoreDNS.


Tools You Will Work With


You don't need to know all of these on day one, but you should be comfortable learning them:


OS & Platform

  • Debian, Ubuntu, systemd
  • Docker, k3s/Kubernetes, Helm


Networking & DNS


  • FRRouting (BGP/OSPF, route reflectors), anycast DNS
  • RADIUS
  • PowerDNS, dnsdist


Web / Edge


  • Nginx, Traefik, pm2
  • PHP, Django, Node


Configuration & IaC


  • SaltStack
  • GitHub, GitHub Actions
  • SOPS, GPG


Observability


  • Prometheus, Alertmanager
  • Loki, Grafana
  • Zabbix


Data Stores


  • PostgreSQL, MySQL/MariaDB
  • Redis, Consul


Release Engineering


  • Experience building CI/CD pipelines (GitHub Actions preferred).


Documentation


  • Ability to write internal documentation, RFCs, or technical notes.


Benefits

  • Medical Aid (Discovery)
  • Reduced Gap Cover Rates (Turnberry Premier)
  • Retirement Annuity Contribution (Allan Gray)
  • Medical Insurance (Momentum - Health4Me)
  • Discounted Internet Connectivity
  • Free Employee Wellness Programme (Lyra Wellbeing, formerly ICAS)
  • Exposure to latest industry technologies and standards
  • Lastly, a work environment that rivals the very best!

If you have not heard from us within 2 weeks of submitting your application, please consider your application unsuccessful.
