Company: Datafin Recruitment
Industry: ICT / Telecommunication
Deadline: Not specified
Job Type: Contract
Qualification: Bachelor's, National Certificate
Experience: 3 – 5 years
Province: Western Cape
City: Cape Town
Field: ICT / Computer
ENVIRONMENT:
- Join a multidisciplinary team on the world’s largest radio telescope project as a Computer Systems Engineer. This role involves developing, integrating, and maintaining computer hardware and systems to support the telescope’s technical and operational goals.
- Responsibilities include deploying, monitoring, upgrading, diagnosing, and restoring systems, applying systems engineering practices, supporting infrastructure planning, and ensuring alignment with SRE requirements.
- The engineer collaborates across teams to build secure, reliable, and scalable systems that enable both project development and sustainable operations.
RESPONSIBILITIES:
- Implement and maintain computing systems and infrastructure
- Contribute to infrastructure planning and system integration efforts
- Assist in performance tuning and reliability improvements
- Apply basic automation and scripting to improve operations
- Support containerized environments and cloud infrastructure
- Collaborate with cross-functional teams and contribute to documentation and knowledge sharing
REQUIREMENTS:
Minimum Education Required (NQF Level):
- NQF Level 6 qualification in IT, Computer Science, Software Engineering, Information Systems, Electronic Engineering, or a closely related technical discipline.
- N.Dip. with at least 5 years’ experience OR
- B.Tech/BSc (Comp Sci) with at least 3 years’ experience
Extreme Importance (Essential):
- Demonstrated ability to contribute effectively to cross-functional engineering projects and follow through on implementation plans under direction
- Hardware maintenance and support: basic skills such as changing hardware components (hard drives, memory modules, CPU, motherboard)
- Firmware and drivers diagnostics, configuration, and updates
- Health and safety, self-care within data centres, assembly workshops, and computer labs
- Tools and equipment use and management: regular cleaning, proper storage, routine maintenance, inspection, safe handling, inventory management, and asset tracking
- IT spares inventory and tracking: inventory categorization, asset tagging and labeling, maintenance of inventory system, stock management, access control, lifecycle and warranty tracking, disposal and waste management
- Computer infrastructure asset management: tracking, maintaining, and optimizing relevant hardware and software assets across their lifecycle to ensure availability, compliance, and cost-effectiveness
- IT audit and documentation: rack positions, network and server diagrams, topology maps, service and support logs
- Hands-on experience in Linux systems administration, basic automation, and performance tuning, with a willingness to deepen expertise
- Proficiency in Linux command-line usage, service configuration, and troubleshooting; learning kernel and system-level tuning practices
- Ability to manage assigned tasks within an Agile environment and collaborate effectively with teammates on sprint goals
- Effective troubleshooting skills, with a learning mindset toward root-cause analysis and improving operational resilience
High Importance (Desirable):
- Familiarity with distributed systems concepts and practical experience deploying and supporting services in scalable environments
- Working knowledge of containerization tools (Docker) and exposure to container orchestration platforms (e.g., Kubernetes) in test or staging environments
- Experience using CI/CD tools to support automated builds, tests, and deployments; able to troubleshoot basic automation pipelines
- Familiarity with DevOps workflows (e.g., IaC, basic config management), and initial exposure to observability and system reliability practices
- Knowledge and awareness of scalable storage platforms, such as Ceph, S3-compatible systems, or NFS, including deployment, tuning, and lifecycle management
- Exposure to high performance computing (HPC) environments, including schedulers (e.g., SLURM), shared filesystems, and workload optimization, with openness to ramp up
- Lifecycle and service integration capabilities: planning upgrades, dependency management, and operational runbook development
- Familiarity with Agile methodologies, such as Scrum, Kanban, or SAFe, enabling efficient collaboration across product and infrastructure teams
- Continuous improvement mindset, with a track record of learning, researching, and adopting emerging technologies in storage, compute, and observability domains
- Strong communication and collaboration skills, with the ability to interface across infrastructure, development, and stakeholder groups, translating complex systems into clear priorities
Minimum Work Experience Required:
- Experience working with server installations, monitoring, and diagnostics
- Experience with hardware upgrades and repairs
- Experience working in data centres or server rooms/environments
- Basic experience with computer networks
- Experience working with operating systems and IaaS tools
- Basic experience working with SANs and storage systems
- Demonstrated hands-on experience in infrastructure design and automation, distributed systems, observability, CI/CD, container orchestration (e.g., Kubernetes), DevOps/SRE practices, and cloud-native technologies
- Experience working in international teams or initiatives that intersect with data platforms, storage, networking, and systems engineering domains
Job Knowledge Required:
- Strong understanding of systems engineering principles, including performance optimization, fault tolerance, and resource scheduling within Linux-based environments
- Hands-on experience monitoring, diagnosing, and repairing various OEM hardware (HPE, Dell, Supermicro)
- Proficient in remote-first infrastructure management and monitoring
- Familiarity with containerized environments (Docker, Podman), orchestration platforms (Kubernetes, Helm), and container runtime architectures (e.g., CRI)
- Knowledge of infrastructure-as-code and CI/CD methodologies using tools such as GitLab CI, Ansible, and Terraform
- Working knowledge of networking fundamentals, including cabling and basic diagnostic procedures
- Experience in asset management practices: maintaining asset registers, system and architectural mapping, warranty and service tracking
- Proven experience working with service levels (SLAs) and understanding operational frameworks such as SRE, ITIL, and COBIT
- Sound knowledge of IT security principles, including change management, physical and logical access control
- Skilled in managing component and spare inventories, and tools/workspaces for system assembly
- Awareness of and adherence to Health and Safety standards and best practices
ATTRIBUTES:
- Problem Solving and Analysis: Root cause analysis, systems troubleshooting, performance bottleneck resolution
- Communication and Collaboration: Clear articulation of technical recommendations, cross-functional stakeholder engagement, feedback integration
- Planning and Delivery: Participation in Agile and Systems Engineering processes and methodologies
- Continuous Learning: Staying current with evolving technologies in containerization, cloud-native systems, observability, systems automation, and computing infrastructure (hardware, storage, memory, motherboards, processors, I/O, GPU, HBA, NICs)
- Documentation and Knowledge Sharing: Ability to produce high-quality technical documentation and share knowledge across engineering teams