
Director, Service Reliability Engineering

Toronto, Ontario  - Permanent

Job Description

As the Director of Service Reliability Engineering (SRE), you will set the overall strategic and tactical direction and priorities for the SRE organization to support Online, Marketing and Retail initiatives. You will define, implement, and deliver SRE standards across the organization, and work closely with external service providers and suppliers to evaluate and adopt the most effective solutions. The primary mission is to develop enterprise-grade development operations systems that accelerate digital advancement while improving overall service levels over time and reducing the total cost of ownership (TCO) of the system.

Must Have Skills:

Define, communicate, and execute the Digital service reliability engineering roadmap
Hold an approval position on the Change Approval Board (CAB), working closely with other CAB members to ensure changes to the production environment are well planned, communicated, and executed.
Establish SRE and DevOps procedures and practices to support Digital development teams’ agile delivery
Develop and execute a transitional roadmap to migrate services from an on-premise model to a cloud-based hosting model (where applicable) to reduce TCO and improve resiliency and scalability, progressively and with minimal impact on business operations.
Be the security-in-residence for the Digital team to provide education, heighten awareness, and implement best practices to maintain and improve customers’ trust.
Own the continuous integration and continuous delivery pipeline
Establish and improve standardization of build and deployment processes
Drive monitoring coverage and improvements to alerting/communication practices for system and environment issues
Highlight issues/risks to project leads and management team
Analyze functional, technical and business requirements for projects
Maintain a supportive, positive, open and honest engineering culture
Understand the impact of changing external environmental factors (competitive, regulatory, technical, etc.)
Own the entire CI tool suite (JIRA, Jira Service Desk, Confluence, Bitbucket, TeamCity, OctopusDeploy, Package Hosts)
Collaborate with the Engineering Teams and Architects on system/server architecture and planning
Manage on-call rotation and incident escalation process.
Partner closely with the architecture and IT operations teams to design and develop highly reliable, fault-tolerant, scalable, easily maintainable systems.


Computer Science Degree, other related diploma or equivalent experience accompanied with formal computer training
Minimum of 8 years of work experience in an IT development or operations-related field, with emphasis on e-commerce preferred
Minimum of 5 years of SRE/DevOps work experience
Minimum of 3 years leading an SRE/DevOps team
Certification in ITIL v3 would be a strong plus
Experience with developing and executing on project plans.
Excellent knowledge of SRE/DevOps practices and policies
Experience with Continuous Integration (TeamCity, Jenkins, Bamboo, Pipeline, …)
Experience with scripting (PowerShell, Bash, Python, Tcl, …)
Experience with Azure/Heroku/AWS
Familiarity with .NET Framework(s)
Networking experience (HTTP, SSL, SFTP, SMTP, SNMP, …)
Familiarity with virtualization and containerization (VMware, Hyper-V, Vagrant, Docker, Kubernetes)
Experience with configuration management (Chef, Puppet, Ansible)
Experience with deployment automation (OctopusDeploy, RunDeck, …)
Experience with monitoring and alerting systems. New Relic, DataDog and PagerDuty preferred.
Experience with managing and configuring Windows, IIS and MS SQL. Experience with Linux is desirable.
Industry experience with PCI-DSS, PII, and HIPAA compliance standards and best practices is a plus
Proficiency in algorithms, data structures and production troubleshooting.
Proven experience in communicating highly technical scenarios to business stakeholders, both verbally and in writing.
Able to demonstrate a calm command-and-control demeanor during high-tension production incidents.
Detail-oriented, able to concentrate and work quickly
Ability and enthusiasm to learn new technologies
Analytical and problem-solving skills
Excellent verbal and written communication skills

Starting: ASAP