- Monitor our production deployments 24x7 as part of a team
- Develop Runbooks for routine maintenance of production deployments
- Be the first point of contact for all production issues: triage the issue, execute runbooks, and escalate appropriately.
- Be able to develop tooling using Python/Shell
- Understand Kafka/ZooKeeper/Spark/HDFS well enough to reconfigure them if necessary.
- Be well versed in AWS EC2, EBS, S3, and other services, with the ability to take actions such as expanding a disk or increasing/decreasing instance capacity.
- Work with peers and junior team members in the US and India to ensure 24x7 coverage.
- Ability to program in one or more high-level languages, such as Python or Shell
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks