Your tasks:
- Monitoring and Incident Management: You will instrument our services with key metrics so that system malfunctions are detected through continuous monitoring. In the event of an incident, you will work with development teams to debug issues in Google Cloud and resolve them as quickly as possible.
- Automation and Scalability: You will improve our cloud infrastructure by identifying and automating repetitive tasks. You will also recommend Google Cloud managed services to development teams and drive their adoption across the company. In addition, you will continuously improve our deployment process so that teams can deploy to Kiwigrid’s production environment more simply and securely.
- Security and Compliance: You will help ensure a secure cloud environment for Kiwigrid by integrating tools for vulnerability monitoring and the detection of potentially malicious external activity. You will participate in establishing compliance policies and assist with audits.
- Documentation and Knowledge Sharing: You will help development teams structure their documentation so that the technology stack and request flows are easy to understand. You will also introduce SRE best practices to the development teams to promote a high level of standardization.
- Incident Response and On-Call Duty: You will lead incident response activities and participate in the regular on-call rotation.
