Infrastructure Manager
sifiapp
Other Engineering
Riyadh, Saudi Arabia
Job Description
ABOUT SIFI
SiFi is a corporate expense management platform designed to empower finance and accounting teams with seamless control over corporate spending. Our platform allows companies to issue cards with specific spending restrictions, ensuring that funds are used efficiently and only for approved expenses.
POSITION OVERVIEW
The Infrastructure Manager is a senior role that combines strong people and operational leadership with deep, hands-on technical expertise in infrastructure and reliability engineering. This role reports directly to the Director of Technology and is responsible for leading the Site Reliability Engineering (SRE) team, driving infrastructure strategy, and maintaining full technical authority over SiFi’s systems.
This is not a purely administrative management role: the Infrastructure Manager is expected to apply deep technical knowledge by participating in architecture decisions, contributing during critical incidents, and evaluating infrastructure solutions at an engineering level. The role is accountable for high system availability, robust security controls, full compliance with Saudi Central Bank (SAMA) regulations, and cost-optimized infrastructure operations.
Responsibilities
Team Leadership & Culture
• Lead, mentor, and develop the SRE and infrastructure team; set clear career paths, goals, and performance expectations.
• Foster a culture of ownership, continuous improvement, and technical excellence aligned with SiFi’s values: Customer Centricity, Product-First Mindset, and Data-Driven Excellence.
• Partner with Product and Engineering teams to ensure infrastructure supports secure, reliable product delivery.
Infrastructure Architecture & Engineering
• Own and actively contribute to infrastructure architecture decisions, including reviewing and designing cloud-native and hybrid solutions to support scalability, security, and performance.
• Lead the hands-on design, implementation, and maintenance of secure, scalable, and resilient cloud and hybrid infrastructures, including deployment pipelines and system hardening.
• Perform regular architecture and system health assessments to proactively identify risks, bottlenecks, and improvement opportunities; translate findings into actionable engineering work.
• Act as a technical authority in infrastructure and DevOps, providing guidance across teams and ensuring alignment with SiFi’s long-term technology strategy.
Reliability & Incident Management
• Define and enforce SLAs, SLOs, and error budgets; ensure system performance meets or exceeds targets through proactive monitoring, alerting, and automation.
• Provide hands-on support during critical incidents, including deep technical troubleshooting, root cause analysis, and system recovery — acting as the most senior technical responder when needed.
• Lead production readiness reviews, post-incident reviews, and disaster recovery drills; maintain and validate business continuity frameworks.
• Track and meet clear reliability KPIs, including system uptime targets (e.g., >99.95%).
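For context, an uptime target translates directly into a downtime (error) budget. The following is a minimal sketch of that arithmetic, assuming a 30-day measurement window; the window length and the function names are illustrative assumptions, not SiFi's actual SLO definitions.

```python
def allowed_downtime_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Downtime budget in minutes for an availability target over a window.

    The 30-day default window is an assumption made for illustration.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def budget_remaining_fraction(availability_pct: float,
                              downtime_so_far_minutes: float,
                              window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = allowed_downtime_minutes(availability_pct, window_days)
    return (budget - downtime_so_far_minutes) / budget


if __name__ == "__main__":
    # 99.95% over 30 days leaves roughly 21.6 minutes of downtime.
    print(round(allowed_downtime_minutes(99.95), 2))
    # After 5.4 minutes of downtime, about 75% of the budget remains.
    print(round(budget_remaining_fraction(99.95, 5.4), 2))
```

Because the target is stated as greater than 99.95%, the figure of about 21.6 minutes per 30 days is an upper bound on acceptable downtime rather than an allowance to spend in full.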
Security & Access Control
• Develop and enforce robust access control processes; ensure system access is authorized, reviewed regularly, and provisioned or de-provisioned in a timely manner.
• Collaborate with the Cybersecurity Department to align infrastructure security with internal policies and regulatory requirements.
Regulatory Compliance
• Ensure infrastructure operations comply with SAMA regulatory controls; serve as the primary infrastructure contact for regulators, ensuring timely reporting and transparent cooperation.
• Coordinate with Compliance, Risk, and Cybersecurity teams to close regulatory observations and audit findings promptly.
• Maintain accurate and up-to-date compliance documentation and evidence for audits and inspections.
Cost & Vendor Management
• Own and optimize the budget for infrastructure, systems, and software licenses; identify cost-saving opportunities through vendor management and efficient resource utilization.
• Manage relationships with cloud providers, infrastructure vendors, and license suppliers; ensure SLA adherence and value delivery.
Reporting & Communication
• Report to the Director of Technology and relevant stakeholders on system availability, incidents, compliance posture, and infrastructure health.
• Communicate proactively with internal functions about infrastructure initiatives, risks, and issue resolution.
Qualifications
Experience
• 8+ years of experience in infrastructure engineering, SRE, or cloud operations, including at least 3 years in a management or senior technical leadership role overseeing infrastructure or SRE teams.
• Experience with payment systems or financial services infrastructure is preferred.
Technical Skills
• Deep technical knowledge of Linux/Unix systems, networking, and infrastructure-as-code tools (e.g., Terraform, Ansible) — with the ability to contribute at an engineering level when required.
• Hands-on experience with Oracle Cloud Infrastructure (OCI) is required; familiarity with Google Cloud Platform (GCP) is a strong advantage.
• Proven expertise in designing, operating, and scaling secure, high-availability cloud-native and hybrid environments, with a strong focus on OCI services including Compute, Networking, Object Storage, IAM, and Security Zones.
• Strong working knowledge of containerization and Kubernetes (including OCI Container Engine for Kubernetes — OKE); able to make informed architecture decisions and guide the team on container orchestration, deployment strategies, and cluster management.
• Solid understanding of DevOps practices and CI/CD pipelines — including pipeline design, automation, and integration with infrastructure provisioning — at an architecture and governance level.
• Hands-on experience with observability and monitoring stacks; able to architect and enforce a comprehensive monitoring strategy covering infrastructure metrics, log management, alerting, and distributed tracing (e.g., Prometheus, Grafana, Datadog, ELK/OpenSearch, or equivalent).
• Strong knowledge of SRE principles, observability, ITSM practices, and cost management for infrastructure and licenses.
• Familiarity with Microsoft infrastructure and services, including Windows Server, Active Directory, and SQL Server, because part of the environment runs on Windows-based systems and Microsoft data platforms.
• Hands-on experience implementing and managing system access controls aligned with security frameworks.
Leadership & Compliance
• Demonstrated ability to manage, mentor, and grow a high-performing technical team while maintaining the technical credibility to guide engineering decisions and drive quality.
• Proven track record mentoring engineering teams on reliability and infrastructure best practices.
• Deep familiarity with regulatory frameworks, including SAMA, and experience cooperating with regulators and internal compliance functions.