Senior Staff Site Reliability Engineer

Twitter - permanent

United States

28 Oct

Open to all applicants globally!

Site Reliability Engineering at Twitter is responsible for the performance, reliability, and scalability of Twitter services in production. We build software to automate, optimize, manage, and maintain those services, driving down technical debt, operational cost, and toil every step of the way. We are the last line of defense for the Twitter platform, the chosen few tasked with keeping the tweets flowing.

As Site Reliability Engineers on a team supporting Druid as a service at Twitter, our mission is to build powerful solutions that make data accessible to a broad set of technical and non-technical customers for slice-and-dice analytics on both historical and real-time metrics. We work at an enormous scale: terabytes of data are collected every day, and we make it searchable in seconds. Advertisers, data scientists, and engineers need that data broken down by market segments and user attributes for real-time insights.

What you will do:

  • You will build and scale our 1000+ node Druid interactive query infrastructure and help to define, architect, and build the next-generation engagement data processing architecture for advertising campaigns.
  • You will join a passionate engineering team building Druid as a multi-tenant platform service, available to any Twitter engineering team, to help them make better decisions for our customers.
  • You will work closely with product managers, data analysts, data scientists, and other engineers to build and maintain robust data products. You will build and use the latest highly scalable and performant systems to process dozens of terabytes of data a day.
  • Your efforts will reveal invaluable business and user insights, leveraging vast amounts of Twitter revenue data to fuel numerous Revenue teams including Ads Analytics, Ads Experience, Ads Data Science, Marketplace, Targeting, Prediction, and many others.
  • You will troubleshoot issues across the entire stack: hardware, software, application and network
  • You will identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services.
  • You will take part in a 24×7 on-call rotation.
  • You will represent the SRE organization in design reviews and operational readiness exercises for new and existing services.

Experience

Who You Are:

  • You have solid experience building and operating Druid clusters on-prem or in the cloud.
  • You have a solid understanding of systems and application design, including the operational trade-offs of various designs.
  • You have practical, solid knowledge of shell scripting and at least one higher-level language (Python or Java)
  • You have an expert understanding of Linux systems, services, optimization, storage subsystems, and file systems
  • You have a minimum of 5 years of experience handling services in a large-scale environment.
  • You work well with, and are able to influence, a myriad of personalities at all levels.
  • You are able to prioritize tasks and work independently.
  • You are adaptable and able to focus on the simplest, most efficient & reliable solutions.
  • You have a track record of successful practical problem solving, excellent written and social communication, and documentation skills.
  • B.S. in computer science or similar experience.

Desired

  • Ability to lead technical teams through design and implementation across an organization.
  • Experience designing fault-tolerant distributed systems
  • Experience with Hadoop or other MapReduce-based architecture
  • Experience with real-time streaming (Apache Kafka, Apache Beam, Heron, Spark Streaming)
  • Experience with coordination (Apache Zookeeper)
  • Experience with compute (Apache Mesos, GCE, GKE)
  • Proficiency with SQL (Relational, Hive, Presto, MySQL)

Salary and Perks

Competitive salary and remote working opportunity from anywhere in the US.

About Twitter

We serve the public conversation.

Twitter serves the public conversation because conversation is a force for good in the world. The opportunity to help the world connect, debate, learn, and solve problems is what draws us to careers at Twitter, and it’s what keeps us here.

What we offer:

Small but mighty teams who serve the public conversation in ways people feel across the world, every day; a flat, non-hierarchical org structure; the freedom to design your own path; space to innovate and make big contributions to Twitter’s future; respect and support of your identity and background, whatever it is; actual work-life balance, including growing opportunities for remote work and flex schedules (depending on the role).

Apply

We are committed to an inclusive and diverse Twitter. Twitter is an equal opportunity employer. We do not discriminate based on race, ethnicity, color, ancestry, national origin, religion, sex, sexual orientation, gender identity, age, disability, veteran status, genetic information, marital status or any other legally protected status.

San Francisco applicants: Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

By applying for this role, you could choose to work in the following locations:

  • Sunnyvale
  • Boulder
  • Boston
  • New York City
  • US – Remote US
  • Seattle
  • San Francisco

Engineering Hiring Process

Step 1

Once your application is received, a recruiter will reach out if your qualifications are a match for the role.

Step 2

If your background is a match, you may have 1-2 technical phone interviews or be given the chance to provide a work sample depending on the role.

Step 3

If the phone interviews go well or your work sample is strong, the final step includes interviews with 5-6 people held onsite in our office.
