SRE

Reference number: MY874878

Posted on: 4/2/2026

Kuala Lumpur, Federal Territories, Malaysia

Permanent

Financial Services

About the Team

The technology team is responsible for building and maintaining a scalable, reliable digital banking platform. The Site Reliability Engineering (SRE) function plays a critical role in ensuring platform stability, availability, and security while supporting rapid product development and growth.

The team works closely with engineering, security, and platform stakeholders to enable high-quality software delivery in a cloud-native environment.

About the Role

This role focuses on driving Site Reliability Engineering and DevSecOps practices across the platform. You will support the design, implementation, and operation of infrastructure and tooling that enable reliable, secure, and efficient software delivery.

You will work closely with other SREs to manage infrastructure as code, CI/CD pipelines, cloud-native services, automation frameworks, and release management processes in alignment with business and technical objectives.

Key Responsibilities

Design and implement highly available, secure, and scalable infrastructure using infrastructure as code (IaC)
Define and maintain service level objectives (SLOs) and service level indicators (SLIs)
Lead incident response during critical service disruptions and coordinate cross-functional teams
Build and maintain observability and monitoring solutions covering uptime, latency, saturation, and error rates
Develop automation for incident response, disaster recovery, and business continuity
Enable DevSecOps practices through CI/CD, GitOps, and self-service deployment capabilities
Drive cost optimisation initiatives related to cloud infrastructure (FinOps)
Mentor junior engineers and contribute to operational best practices

Requirements

Strong experience with cloud platforms (AWS, Azure, GCP, or Alibaba Cloud)
Hands-on experience with CI/CD pipelines and GitOps workflows
Strong expertise in infrastructure as code tools (Terraform, OpenTofu, CloudFormation, or similar)
Experience designing and implementing observability and monitoring solutions
Experience leading incident management and post-mortem processes
Solid understanding of Java or another object-oriented programming language
Strong knowledge of containerisation and orchestration technologies
Experience with messaging systems (e.g. Kafka) is an advantage
Familiarity with relational and non-relational databases is a plus
Ability to balance hands-on technical work with strategic decision-making
Strong problem-solving skills and ability to perform under pressure
High ownership, accountability, and commitment to delivery
Financial services experience is a plus but not mandatory