What you'll do
As an SRE at Airomeda you'll own the reliability of 47 production systems. You'll manage Kubernetes clusters across 4 active regions (fra1 · lon1 · ist1 · iad1), operate the Grafana + Loki + OpenTelemetry observability stack, and keep failover under 8 seconds.
You'll optimise GitHub Actions pipelines, run chaos experiments, and act as an infrastructure advisor when new client projects go live. The 99.95% SLA we promise clients is yours to defend.
Is this role for you?
We're looking for someone who sees infrastructure as an engineering product, not a utility. Someone who writes post-mortems to make the system stronger, not to assign blame. Someone who can tell a noisy alert from a real signal in under 60 seconds.
Responsibilities
- Manage Kubernetes clusters across fra1 · lon1 · ist1 · iad1
- Optimise CI/CD pipelines and improve their reliability
- Take part in SLA monitoring, alerting, and the on-call rotation
- Run chaos engineering experiments to stress-test infrastructure
- Contribute to infrastructure design for new client projects
- Mentor the engineering team on observability tooling
Requirements
- 4+ years of Kubernetes production operations
- Terraform IaC; multi-region infrastructure experience
- Grafana, Prometheus, Loki, OpenTelemetry stack
- CI/CD pipeline design (GitHub Actions or GitLab CI)
- Zero-downtime deployment strategies (rolling, canary, blue-green)
- Linux sysadmin and network architecture
- Written English proficiency
What we offer
- Competitive salary + annual bonus
- Fully remote, or hybrid from our Maslak office in Istanbul
- 12,000 TRY annual conference budget
- Mac or Linux workstation of choice
- Private health insurance
- On-call compensation

