Job Title: Site Reliability Engineer II
Job Type: Permanent
Employment Type: Full-time
Industry: Information Technology (IT)
Salary: Negotiable
Location: Tokyo
Date Posted: 2024-05-02
Job ID: 44316

Job Description

Site Reliability Engineer (SRE)

 

■ Your Role and Responsibilities 

• Use your on-call shift to prevent incidents from happening.

• Document actions taken, so your findings turn into repeatable actions and then into automation.

• Design, build, and maintain core infrastructure pieces that support hundreds of thousands of concurrent users.

• Debug production issues across services and levels of the stack.

• Mentor interns and intermediate SREs in all areas, and other SREs in your area of deep knowledge.

• Contribute improvements to the codebase to resolve issues.

• Identify significant projects that result in substantial cost savings or revenue.

• Identify changes to the product architecture from a reliability, performance, and availability perspective, using a data-driven approach.

• Proactively plan for efficiency and capacity: set clear requirements and reduce system resource usage so that the company's assets are cheaper to run for all our customers.

• Identify parts of the system that do not scale, apply immediate palliative measures, and drive long-term resolution of the underlying problems.

• Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.

• Know a domain really well and radiate that knowledge through recorded demos, discussions in DNA meetings, or Incident Reviews.

• Run blameless root cause analyses (RCAs) on incidents and outages, aggressively looking for answers that can prevent the incident from ever happening again.

• Set an example for the team of SREs through positive and inclusive leadership and discussion of work.

• Be able to de-escalate conflicts inside the team.

 

■ Work Location

• Tokyo, Japan

 

■ Experience and Qualifications 

• 2+ years of work experience in the IT sector

• Experience as a Cloud, DevOps or Reliability engineer

• Experience working closely with engineering teams to create and improve containerized technologies

• Able to collaborate in a global team environment, actively engage subject matter experts, and follow through on commitments

• Strong problem-solving (debugging) skills: the ability to dissect, divide, and conquer platform problems and find the root cause

• Knowledge of Microsoft Azure, AWS, and/or GCP is a must (Azure preferred)

• Scripting knowledge in PowerShell and/or Python

• Version control experience (Git)

• Knowledge of container orchestration technologies (Kubernetes)

• Knowledge of container technologies (Docker)

• CI/CD knowledge

 

■ Additional Preferred Qualifications

• Knowledge of Azure Security Center, policy, and initiatives

• Knowledge of Azure Sentinel is a nice-to-have

• Experience operating in a Linux environment using the command line

• Experience using Azure DevOps or similar project management & pipeline tools

• Experience managing logging and monitoring systems; knowledge of Azure Monitoring