Senior Site Reliability Engineer

Description:

Manus AI, a trailblazing company in general AI agents, is seeking a Senior Site Reliability Engineer (SRE) to join its on-site team in California. The company is committed to creating intelligent systems that do more than think: they execute and deliver. Manus AI integrates expertise across engineering, research, and business domains, fostering a dynamic and forward-thinking workplace.

In this critical role, you will ensure the high availability, scalability, and robustness of Manus AI’s infrastructure. You will lead initiatives to automate operations, manage containerized environments, and maintain performance across a range of production services. This is a hands-on, full-time position requiring deep technical expertise, strong problem-solving skills, and the ability to work both collaboratively and autonomously.

Key Responsibilities:

  • Manage and maintain container clusters and open-source component clusters across business lines

  • Build and enhance infrastructure operation platforms, including infrastructure management, CI/CD, monitoring, alerting, and logging systems

  • Respond swiftly to incidents and implement efficient solutions to minimize downtime

  • Optimize system architecture and deployment strategies to ensure production service availability

  • Drive automation initiatives to enhance operational efficiency and reduce manual processes

  • Collaborate with development teams to implement infrastructure-as-code and service reliability best practices

  • Participate in a 24/7 on-call rotation for mission-critical systems

Qualifications:

  • Bachelor's degree in Computer Science or related technical field preferred

  • 5+ years of experience in systems operations or SRE roles

  • Proficiency with major public cloud platforms (AWS, Azure, GCP)

  • Strong Linux administration skills and day-to-day operational experience

  • Advanced scripting skills using Shell and Python

  • Deep understanding of internet technologies and of performance tuning for Nginx, MySQL, Redis, Kafka, Elasticsearch, and the JVM

  • Extensive hands-on experience with Kubernetes and Docker in production environments

  • Familiarity with CI/CD tools such as GitLab CI and Argo CD

  • Excellent troubleshooting and problem-solving skills under pressure

  • Strong communication and collaboration abilities, especially in remote settings

  • Self-driven with the ability to work independently while aligned with team goals

  • Working proficiency in both Chinese and English (required)

About the Company:
Manus AI builds general AI agents capable of both reasoning and execution. Its agents are designed to enhance productivity by autonomously handling tasks across work and life, allowing users to rest while the AI handles the workload. The company offers a highly collaborative environment where professionals come together to innovate at the edge of AI capabilities.

Organization: Manus AI
Industry: IT / Telecom / Software Jobs
Occupational Category: Senior Site Reliability Engineer
Job Location: California, USA
Shift Type: Morning
Job Type: Full Time
Gender: No Preference
Career Level: Experienced Professional
Experience: 5 Years
Posted at: 2025-05-25 9:52 am
Expires on: 2026-01-06