Company: Datafin Recruitment

Industry: ICT / Telecommunication

Deadline: Not specified

Job Type: Contract

Qualification: Bachelors, National Certificate

Experience: 3 – 5 years

Province: Western Cape

City: Cape Town

Field: ICT / Computer

ENVIRONMENT:

Join a multidisciplinary team on the world’s largest radio telescope project as a Computer Systems Engineer. This role involves developing, integrating, and maintaining computer hardware and systems to support the telescope’s technical and operational goals.
Responsibilities include deploying, monitoring, upgrading, diagnosing, and restoring systems, applying systems engineering practices, supporting infrastructure planning, and ensuring alignment with SRE requirements.
The engineer collaborates across teams to build secure, reliable, and scalable systems that enable both project development and sustainable operations.

RESPONSIBILITIES:

Implement and maintain computing systems and infrastructure
Contribute to infrastructure planning and system integration efforts
Assist in performance tuning and reliability improvements
Apply basic automation and scripting to improve operations
Support containerized environments and cloud infrastructure
Collaborate with cross-functional teams and contribute to documentation and knowledge sharing

REQUIREMENTS:

Minimum Education Required (NQF Level):

NQF Level 6 qualification in IT, Computer Science, Software Engineering, Information Systems, Electronic Engineering, or a closely related technical discipline.
N.Dip. with at least 5 years’ experience OR
B.Tech/BSc (Comp Sci) with at least 3 years’ experience

Extreme Importance (Essential):

Demonstrated ability to contribute effectively to cross-functional engineering projects and follow through on implementation plans under direction
Hardware maintenance and support: basic skills such as changing hardware components (hard drives, memory modules, CPU, motherboard)
Firmware and driver diagnostics, configuration, and updates
Health, safety, and self-care within data centres, assembly workshops, and computer labs
Tools and equipment use and management: regular cleaning, proper storage, routine maintenance, inspection, safe handling, inventory management, and asset tracking
IT spares inventory and tracking: inventory categorization, asset tagging and labeling, maintenance of inventory system, stock management, access control, lifecycle and warranty tracking, disposal and waste management
Computer infrastructure asset management: tracking, maintaining, and optimizing relevant hardware and software assets across their lifecycle to ensure availability, compliance, and cost-effectiveness
IT audit and documentation: rack positions, network and server diagrams, topology maps, service and support logs
Hands-on experience in Linux systems administration, basic automation, and performance tuning, with a willingness to deepen expertise
Proficiency in Linux command-line usage, service configuration, and troubleshooting, with a willingness to learn kernel and system-level tuning practices
Ability to manage assigned tasks within an Agile environment and collaborate effectively with teammates on sprint goals
Effective troubleshooting skills, with a learning mindset toward root-cause analysis and improving operational resilience

High Importance (Desirable):

Familiarity with distributed systems concepts and practical experience deploying and supporting services in scalable environments
Working knowledge of containerization tools (Docker) and exposure to container orchestration platforms (e.g., Kubernetes) in test or staging environments
Experience using CI/CD tools to support automated builds, tests, and deployments; able to troubleshoot basic automation pipelines
Familiarity with DevOps workflows (e.g., IaC, basic config management), and initial exposure to observability and system reliability practices
Knowledge and awareness of scalable storage platforms, such as Ceph, S3-compatible systems, or NFS, including deployment, tuning, and lifecycle management
Exposure to high-performance computing (HPC) environments, including schedulers (e.g., SLURM), shared filesystems, and workload optimization, with an openness to ramping up
Lifecycle and service integration capabilities: planning upgrades, dependency management, and operational runbook development
Familiarity with Agile methodologies, such as Scrum, Kanban, or SAFe, enabling efficient collaboration across product and infrastructure teams
Continuous improvement mindset, with a track record of learning, researching, and adopting emerging technologies in storage, compute, and observability domains
Strong communication and collaboration skills, with the ability to interface across infrastructure, development, and stakeholder groups, translating complex systems into clear priorities

Minimum Work Experience Required:

Experience working with server installations, monitoring, and diagnostics
Experience with hardware upgrades and repairs
Experience working in data centres or server rooms/environments
Basic experience with computer networks
Experience working with operating systems and IaaS tools
Basic experience working with SANs and storage systems
Demonstrated hands-on experience in infrastructure design and automation, distributed systems, observability, CI/CD, container orchestration (e.g., Kubernetes), DevOps/SRE practices, and cloud-native technologies
Experience working in international teams or initiatives that intersect with data platforms, storage, networking, and systems engineering domains

Job Knowledge Required:

Strong understanding of systems engineering principles, including performance optimization, fault tolerance, and resource scheduling within Linux-based environments
Hands-on experience monitoring, diagnosing, and repairing various OEM hardware (HPE, Dell, Supermicro)
Proficient in remote-first infrastructure management and monitoring
Familiarity with containerized environments (Docker, Podman), orchestration platforms and tooling (Kubernetes, Helm), and container runtime architectures (e.g., CRI)
Knowledge of infrastructure-as-code and CI/CD methodologies using tools such as GitLab CI, Ansible, and Terraform
Working knowledge of networking fundamentals, including cabling and basic diagnostic procedures
Experience in asset management practices: maintaining asset registers, system and architectural mapping, warranty and service tracking
Proven experience working with service level agreements (SLAs) and understanding of operational frameworks such as SRE, ITIL, and COBIT
Sound knowledge of IT security principles, including change management, physical and logical access control
Skilled in managing component and spare inventories, and tools/workspaces for system assembly
Awareness of, and adherence to, health and safety standards and best practices

ATTRIBUTES:

Problem Solving and Analysis: Root cause analysis, systems troubleshooting, performance bottleneck resolution
Communication and Collaboration: Clear articulation of technical recommendations, cross-functional stakeholder engagement, feedback integration
Planning and Delivery: Participation in Agile and Systems Engineering processes and methodologies
Continuous Learning: Staying current with evolving technologies in containerization, cloud-native systems, observability, systems automation, and computing infrastructure (hardware, storage, memory, motherboards, processors, I/O, GPU, HBA, NICs)
Documentation and Knowledge Sharing: Ability to produce high-quality technical documentation and share knowledge across engineering teams

Computer System Engineer (CPT) (Contract) at Datafin Recruitment