Senior Site Reliability Engineer

QGenda

Atlanta, GA

Category: Software Engineering

Job Description

Role Overview

As a Senior Site Reliability Engineer, you will work with our Infrastructure and Product Development Teams to increase the scalability, reliability, and performance of our systems and services. You will build and extend existing automation for configuration and monitoring of our AWS hosted applications. You will have the opportunity to evaluate new AWS services and tools to determine if they could be utilized in our environments.

What You Will Do

Design, implement, and manage scalable systems that ensure high availability, fault tolerance, and optimal performance. Continuously monitor and enhance system health and performance through data analysis and metrics. Develop and advocate for automation tools to eliminate repetitive manual processes and improve efficiency.

Why It Might Be a Fit

This is an excellent opportunity to have a significant impact on the stability of our systems and contribute to the evolution of our technology stack. You will have the opportunity to work with a dynamic team and contribute to the growth and success of the company.

Requirements

B.S. in Computer Science, Computer Information Systems, or Computer Engineering from a major U.S. university or equivalent industry experience
7+ years of experience as a DevOps Engineer, SRE, or Systems Engineer
Advanced proficiency with at least one scripting or programming language
Experience with Docker and container orchestration tools such as AWS ECS and EKS/Kubernetes
Hands-on experience building infrastructure and supporting applications in AWS using services such as Lambda, EC2, ECS, S3, SNS, SQS, RDS, Redshift, and ElastiCache
Strong understanding of networking and DNS
Strong experience with Terraform for infrastructure provisioning and module development, along with configuration management and infrastructure as code (IaC) practices
Firm understanding of, and experience with, Agile and Scrum SDLC processes
Experience using a distributed version control system (Git preferred) for code check-ins, branching, merging, pull requests, code review, etc.
Knowledge of CI/CD best practices and tools such as AWS CodeBuild, Jenkins and/or TeamCity
Experience using AI-assisted coding tools (e.g., Claude, GitHub Copilot) to accelerate IaC development, scripting, and operational workflows
Familiarity with AI/ML-driven approaches to observability, anomaly detection, log analysis, or incident triage
Experience designing and delivering secure, high-performance, and highly available cloud services
Experience with observability platforms (e.g., Datadog, CloudWatch, PagerDuty) for monitoring, alerting, and incident response
Awareness of cloud security best practices including IAM policies, network segmentation, and secrets management

Benefits

Fully company-paid options for medical (both in-person and virtual), dental and vision insurance
Generous paid time off (PTO) policy
Paid parental leave for birth, adoption or permanent placement
401(k) with company match
Annual Costco membership
Cell phone stipend
Commuter benefits
In-office perks
