Site Reliability Engineer, Reliability Team - USDS

TikTok
San Jose, CA
Job Description
The Site Reliability Engineering (SRE) team at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems.

Requirements

  • Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
  • Proficiency in one or more programming languages (e.g., Go, Python, Java, or C++).
  • Strong understanding of Linux system internals, networking (TCP/IP, DNS, load balancing), and distributed systems.
  • Experience managing containerized environments (e.g., Kubernetes, Docker).
  • Proven experience in a high-traffic production environment with a focus on incident response and site stability.
  • Hands-on experience with Disaster Recovery strategies, including multi-region failover and data consistency in distributed databases.
  • Familiarity with observability and monitoring tools.
  • Experience with Infrastructure as Code.