Staff Site Reliability Engineer – Developer Experience | Talkdesk

Job Description

We're looking for a Site Reliability Engineer who will help elevate Developer Experience for our customers by building reliable and scalable infrastructure and internal platforms. You'll use Kubernetes, GitHub Actions, Terraform, and other tools to build automated workflows, enhance developer tooling, and support a reliable engineering experience. This role is a key part of our DevOps-first culture.


Key Responsibilities

  • Ensure high availability, performance, and scalability of business-critical services.
  • Design and implement resilient and fault-tolerant infrastructure.
  • Lead and mentor teams on incident response, postmortems, and production readiness.
  • Use Terraform and comparable tools to automate infrastructure provisioning.
  • Build internal platforms and tools to increase developer efficiency and velocity.
  • Collaborate on in-house infrastructure projects outside your own team, keeping them aligned with business goals.
  • Develop and maintain runbooks, architecture diagrams, and operational documentation.
  • Find and automate engineering processes that are repetitive or prone to human error.


Required Skills

  • 8-12+ years of experience in SRE, DevOps, or a related infrastructure discipline.
  • Proficiency in Kubernetes, Linux/Unix, and Infrastructure as Code (Terraform, Ansible, Helm).
  • Experience with scripting languages (e.g. Python, Bash) for automation.
  • Experience managing relational and NoSQL databases (e.g. PostgreSQL, MongoDB, Redis, or Elasticsearch).
  • Ability to debug and tune large-scale distributed systems.
  • Demonstrated experience managing high-severity incidents and conducting root cause analyses.
  • Developer-first mindset with a strong sense of ownership and responsibility.


Nice to Have

  • Experience with cloud providers: AWS, GCP, or Azure.
  • Familiarity with developer platforms or internal PaaS environments.
  • Experience managing cost-effective cloud infrastructure.


What We Value

  • Focus: Prioritize impactful work and smart collaboration.
  • Accountability: Own outcomes and continuously improve.
  • Speed: Move fast with purpose and agility.
  • Talkdesker Mindset: Be bold, be thoughtful, and drive change.