Computer System Engineer (CPT) (Contract) at Datafin Recruitment

Company:

Datafin Recruitment

Industry: ICT / Telecommunication

Deadline: Not specified

Job Type: Contract

Qualification: Bachelors, National Certificate

Experience: 3 – 5 years

Province: Western Cape

City: Cape Town

Field: ICT / Computer

ENVIRONMENT:

  • Join a multidisciplinary team on the world’s largest radio telescope project as a Computer Systems Engineer. This role involves developing, integrating, and maintaining computer hardware and systems to support the telescope’s technical and operational goals.
  • Responsibilities include deploying, monitoring, upgrading, diagnosing, and restoring systems, applying systems engineering practices, supporting infrastructure planning, and ensuring alignment with SRE requirements.
  • The engineer collaborates across teams to build secure, reliable, and scalable systems that enable both project development and sustainable operations.

RESPONSIBILITIES:

  • Implement and maintain computing systems and infrastructure
  • Contribute to infrastructure planning and system integration efforts
  • Assist in performance tuning and reliability improvements
  • Apply basic automation and scripting to improve operations
  • Support containerized environments and cloud infrastructure
  • Collaborate with cross-functional teams and contribute to documentation and knowledge sharing

REQUIREMENTS:

Minimum Education Required (NQF Level):

  • NQF Level 6 qualification in IT, Computer Science, Software Engineering, Information Systems, Electronic Engineering, or a closely related technical discipline.
  • N.Dip. with at least 5 years’ experience OR
  • B.Tech/BSc (Comp Sci) with at least 3 years’ experience

Extreme Importance (Essential):

  • Demonstrated ability to contribute effectively to cross-functional engineering projects and follow through on implementation plans under direction
  • Hardware maintenance and support: basic skills such as changing hardware components (hard drives, memory modules, CPU, motherboard)
  • Firmware and driver diagnostics, configuration, and updates
  • Health and safety awareness and self-care within data centres, assembly workshops, and computer labs
  • Tools and equipment use and management: regular cleaning, proper storage, routine maintenance, inspection, safe handling, inventory management, and asset tracking
  • IT spares inventory and tracking: inventory categorization, asset tagging and labeling, maintenance of inventory system, stock management, access control, lifecycle and warranty tracking, disposal and waste management
  • Computer infrastructure asset management: tracking, maintaining, and optimizing relevant hardware and software assets across their lifecycle to ensure availability, compliance, and cost-effectiveness
  • IT audit and documentation: rack positions, network and server diagrams, topology maps, service and support logs
  • Hands-on experience in Linux systems administration, basic automation, and performance tuning, with a willingness to deepen expertise
  • Proficiency in Linux command-line usage, service configuration, and troubleshooting; learning kernel and system-level tuning practices
  • Ability to manage assigned tasks within an Agile environment and collaborate effectively with teammates on sprint goals
  • Effective troubleshooting skills, with a learning mindset toward root-cause analysis and improving operational resilience

High Importance (Desirable):

  • Familiarity with distributed systems concepts and practical experience deploying and supporting services in scalable environments
  • Working knowledge of containerization tools (Docker) and exposure to container orchestration platforms (e.g., Kubernetes) in test or staging environments
  • Experience using CI/CD tools to support automated builds, tests, and deployments; able to troubleshoot basic automation pipelines
  • Familiarity with DevOps workflows (e.g., IaC, basic config management), and initial exposure to observability and system reliability practices
  • Knowledge and awareness of scalable storage platforms, such as Ceph, S3-compatible systems, or NFS, including deployment, tuning, and lifecycle management
  • Exposure to high performance computing (HPC) environments, including schedulers (e.g., SLURM), shared filesystems, and workload optimization, with openness to ramp up
  • Lifecycle and service integration capabilities: planning upgrades, dependency management, and operational runbook development
  • Familiarity with Agile methodologies, such as Scrum, Kanban, or SAFe, enabling efficient collaboration across product and infrastructure teams
  • Continuous improvement mindset, with a track record of learning, researching, and adopting emerging technologies in storage, compute, and observability domains
  • Strong communication and collaboration skills, with the ability to interface across infrastructure, development, and stakeholder groups, translating complex systems into clear priorities

Minimum Work Experience Required:

  • Experience working with server installations, monitoring, and diagnostics
  • Experience with hardware upgrades and repairs
  • Experience working in data centres or server rooms/environments
  • Basic experience with computer networks
  • Experience working with operating systems and IaaS tools
  • Basic experience working with SANs and storage systems
  • Demonstrated hands-on experience in infrastructure design and automation, distributed systems, observability, CI/CD, container orchestration (e.g., Kubernetes), DevOps/SRE practices, and cloud-native technologies
  • Experience working in international teams or initiatives that intersect with data platforms, storage, networking, and systems engineering domains

Job Knowledge Required:

  • Strong understanding of systems engineering principles, including performance optimization, fault tolerance, and resource scheduling within Linux-based environments
  • Hands-on experience monitoring, diagnosing, and repairing various OEM hardware (HPE, Dell, Supermicro)
  • Proficient in remote-first infrastructure management and monitoring
  • Familiarity with containerized environments (Docker, Podman), orchestration platforms (Kubernetes, Helm), and container runtime architectures (e.g., CRI)
  • Knowledge of infrastructure-as-code and CI/CD methodologies using tools such as GitLab CI, Ansible, and Terraform
  • Working knowledge of networking fundamentals, including cabling and basic diagnostic procedures
  • Experience in asset management practices: maintaining asset registers, system and architectural mapping, warranty and service tracking
  • Proven experience working with service levels (SLAs) and understanding operational frameworks such as SRE, ITIL, and COBIT
  • Sound knowledge of IT security principles, including change management, physical and logical access control
  • Skilled in managing component and spare inventories, and tools/workspaces for system assembly
  • Awareness of and adherence to Health and Safety standards and best practices

ATTRIBUTES:

  • Problem Solving and Analysis: Root cause analysis, systems troubleshooting, performance bottleneck resolution
  • Communication and Collaboration: Clear articulation of technical recommendations, cross-functional stakeholder engagement, feedback integration
  • Planning and Delivery: Participation in Agile and Systems Engineering processes and methodologies
  • Continuous Learning: Staying current with evolving technologies in containerization, cloud-native systems, observability, systems automation, and computing infrastructure (hardware, storage, memory, motherboards, processors, I/O, GPU, HBA, NICs)
  • Documentation and Knowledge Sharing: Ability to produce high-quality technical documentation and share knowledge across engineering teams
