SRE
Posted on: 4/2/2026
Kuala Lumpur, Federal Territories
Permanent
Financial Services
About the Team
The technology team is responsible for building and maintaining a scalable, reliable digital banking platform. The Site Reliability Engineering (SRE) function plays a critical role in ensuring platform stability, availability, and security while supporting rapid product development and growth.
The team works closely with engineering, security, and platform stakeholders to enable high-quality software delivery in a cloud-native environment.
About the Role
This role focuses on driving Site Reliability Engineering and DevSecOps practices across the platform. You will support the design, implementation, and operation of infrastructure and tooling that enable reliable, secure, and efficient software delivery.
You will work closely with other SREs to manage infrastructure as code, CI/CD pipelines, cloud-native services, automation frameworks, and release management processes in alignment with business and technical objectives.
Key Responsibilities
- Design and implement highly available, secure, and scalable infrastructure using infrastructure as code (IaC)
- Define and maintain service level objectives (SLOs) and service level indicators (SLIs)
- Lead incident response during critical service disruptions and coordinate cross-functional teams
- Build and maintain observability and monitoring solutions covering uptime, latency, saturation, and error rates
- Develop automation for incident response, disaster recovery, and business continuity
- Enable DevSecOps practices through CI/CD, GitOps, and self-service deployment capabilities
- Drive cost optimisation initiatives related to cloud infrastructure (FinOps)
- Mentor junior engineers and contribute to operational best practices
Requirements
- Strong experience with at least one major cloud platform (AWS, Azure, GCP, or Alibaba Cloud)
- Hands-on experience with CI/CD pipelines and GitOps workflows
- Strong expertise in infrastructure as code tools (Terraform, OpenTofu, CloudFormation, or similar)
- Experience designing and implementing observability and monitoring solutions
- Experience leading incident management and post-mortem processes
- Solid understanding of Java or another object-oriented programming language
- Strong knowledge of containerisation and orchestration technologies
- Experience with messaging systems (e.g. Kafka) is an advantage
- Familiarity with relational and non-relational databases is a plus
- Ability to balance hands-on technical work with strategic decision-making
- Strong problem-solving skills and ability to perform under pressure
- High ownership, accountability, and commitment to delivery
- Financial services experience is a plus but not mandatory