Description:
onX is seeking a Site Reliability Engineer to build and maintain the infrastructure that enables our developers to ship reliably at scale. You'll manage onX's infrastructure platform, deployment automation, and observability through infrastructure-as-code—keeping systems reliable and performant while maintaining a simple path to production for development teams. This is a great opportunity to work on infrastructure that directly impacts millions of outdoor enthusiasts. This position will report to the Principal Site Reliability Engineer.
As an onX Site Reliability Engineer, your day-to-day responsibilities would look like (Essential Job Duties):
- Deploy, monitor, and maintain highly available systems using technologies such as Terraform, CockroachDB, and GCP services including GKE (Kubernetes), Cloud SQL, Bigtable, Google Cloud Composer (Airflow), Google Cloud Storage, BigQuery, Pub/Sub, and Cloud Run.
- Maintain and extend a large, mature Terraform codebase.
- Analyze systems and make recommendations to increase performance and availability and to minimize cost.
- Automate manual systems to minimize toil wherever possible.
- Develop and maintain integrations with third-party monitoring and alerting systems, such as Google Cloud Monitoring, Prometheus, OpenTelemetry, Checkly, and Rootly.
- Drive incident response best practices for on-call engineering teams across onX. Participate in the SRE team's on-call rotation for core infrastructure.
- Collaborate on architectural decisions and direction involving our services and initiatives.
Location
onX has created a thriving distributed workforce community across several US locations. This position can be performed from an onX corporate office, “Basecamp,” or “Connection Hub.”
- Corporate Offices: onX was founded in Montana with offices in Missoula and Bozeman. If you prefer to work in an office at least part of the time, this is a great option.
- Basecamps: onX’s Basecamps are established virtual workforce communities where a sizable number of distributed team members group for work, volunteering, socializing, and adventure.
- Our current Basecamps are located within a 90-mile radius of the following: Austin, TX; Charlotte, NC; Denver, CO; Kalispell, MT; Minneapolis, MN; Portland, OR; Salt Lake City, UT; and Seattle, WA.
- Connection Hubs: onX’s Connection Hub locations are smaller, emerging communities of distributed team members.
- Our current Connection Hubs are located within a 60-mile radius of the following: Boise, ID; Charleston, SC; Dallas/Fort Worth, TX; Phoenix, AZ; Richmond, VA; Spokane, WA; and Vermont.
What You’ll Bring
- You have a B.S. or M.S. in computer science or a related field, or relevant experience
- You have at least 5 years of experience, including 3 or more supporting production systems
- You have a strong interest in, and experience with, Kubernetes, networking, and infrastructure-as-code
- You have experience with Terraform/OpenTofu
- You have exposure to at least one major cloud platform
- You evaluate technologies and solutions based on merit, stability, performance, and how readily they can be debugged
- You have practical experience with different types of datastores (SQL, NoSQL, object storage) and can explain when to use each based on data access patterns and scalability needs
- You have a strong computer science foundation
- You believe that your profession is a craft and you’re driven to improve every day
- You take strong ownership of your work and platform responsibilities