About the Team

The technology team is responsible for building and maintaining a scalable, reliable digital banking platform. The Site Reliability Engineering (SRE) function plays a critical role in ensuring platform stability, availability, and security while supporting rapid product development and growth.

The team works closely with engineering, security, and platform stakeholders to enable high-quality software delivery in a cloud-native environment.

About the Role

This role focuses on driving Site Reliability Engineering and DevSecOps practices across the platform. You will support the design, implementation, and operation of infrastructure and tooling that enable reliable, secure, and efficient software delivery.

You will work closely with other SREs to manage infrastructure as code, CI/CD pipelines, cloud-native services, automation frameworks, and release management processes in alignment with business and technical objectives.

Key Responsibilities

  • Design and implement highly available, secure, and scalable infrastructure using infrastructure as code (IaC)
  • Define and maintain service level objectives (SLOs) and service level indicators (SLIs)
  • Lead incident response during critical service disruptions and coordinate cross-functional teams
  • Build and maintain observability and monitoring solutions covering uptime, latency, saturation, and error rates
  • Develop automation for incident response, disaster recovery, and business continuity
  • Enable DevSecOps practices through CI/CD, GitOps, and self-service deployment capabilities
  • Drive cost optimisation initiatives related to cloud infrastructure (FinOps)
  • Mentor junior engineers and contribute to operational best practices

Requirements

  • Strong experience with cloud platforms (AWS, Azure, GCP, or Alibaba Cloud)
  • Hands-on experience with CI/CD pipelines and GitOps workflows
  • Strong expertise in infrastructure as code tools (Terraform, OpenTofu, CloudFormation, or similar)
  • Experience designing and implementing observability and monitoring solutions
  • Experience leading incident management and post-mortem processes
  • Solid understanding of Java or another object-oriented programming language
  • Strong knowledge of containerisation and orchestration technologies
  • Experience with messaging systems (e.g. Kafka) is an advantage
  • Familiarity with relational and non-relational databases is a plus
  • Ability to balance hands-on technical work with strategic decision-making
  • Strong problem-solving skills and ability to perform under pressure
  • High ownership, accountability, and commitment to delivery
  • Financial services experience is a plus but not mandatory
Job Reference: MY874878

SRE
Kuala Lumpur, Federal Territories | Permanent