Site Reliability Engineer

Description:

Xebia, a global leader in digital transformation and IT consulting, is seeking a Site Reliability Engineer (SRE) or SRE Lead with deep hands-on Dynatrace experience and a passion for driving reliability, observability, automation, and cloud-native architecture at scale. This is a full-time, on-site role based in Atlanta, GA, and a high-impact opportunity to help shape reliability strategy across enterprise platforms.

In this senior technical role, the selected candidate will:

  • Architect and design highly available, scalable, and secure AWS-based systems.

  • Lead the definition and implementation of SLIs, SLOs, and error budgets for mission-critical services.

  • Enhance observability maturity using tools such as Dynatrace, Prometheus, Grafana, ELK, and OpenTelemetry.

  • Evaluate and improve current CI/CD pipelines, IaC modules, and incident remediation automation.

  • Conduct technical mentoring, provide architecture consultation, and lead production readiness reviews.

  • Champion chaos engineering, resilience practices, and blameless postmortems.

Key Responsibilities Include:

  • Reliability Strategy & Design: Define best practices and implement scalable patterns for deployment, monitoring, and readiness.

  • Automation: Eliminate toil and improve system efficiency through automated pipelines and robust architecture.

  • Leadership: Act as a subject matter expert in cloud-native SRE and mentor fellow engineers.

  • Resilience: Advocate for and implement fault-tolerant patterns across infrastructure and applications.

Required Qualifications:

  • Proven experience in SRE architecture for large-scale systems.

  • Strong knowledge of AWS infrastructure and security.

  • Hands-on experience with Kubernetes, Docker, and serverless technologies.

  • Expertise in observability and monitoring (Dynatrace, ELK, Prometheus, Grafana).

  • Strong programming skills in Python, Go, or Bash.

  • Clear communication and the ability to lead technical initiatives.

Preferred Qualifications:

  • Direct experience implementing chaos engineering methodologies and platforms.

About Xebia:
With over 20 years of global consulting experience, Xebia operates across 16 countries and delivers top-tier services in cloud, DevOps, AI, digital platforms, agile transformation, and more. The company is known for its deep technical expertise and its focus on helping global enterprises embrace digital innovation.

Organization: Xebia
Industry: IT / Telecom / Software Jobs
Occupational Category: Site Reliability Engineer
Job Location: Georgia, USA
Shift Type: Morning
Job Type: Full Time
Gender: No Preference
Career Level: Intermediate
Experience: 2 Years
Posted at: 2025-06-30 3:17 pm
Expires on: 2026-01-04