- Strategy and Planning: Develop and implement a comprehensive strategy for site reliability, encompassing scalability, performance, and reliability improvements. Align SRE objectives with overall business goals and technology roadmaps. Foster a spirit of continuous improvement within the SRE function and position it to advance organizational objectives.
- Leadership and Team Management: Provide strong leadership to the Site Reliability Engineering (SRE) team, fostering a culture of collaboration, innovation, and continuous improvement. Recruit, mentor, and develop a high-performing team of SRE professionals. Instill a can-do attitude in the team from the outset, combined with a passion for automation and engineering excellence.
- Operational Excellence: Oversee day-to-day operations of the SRE team, ensuring the reliability and availability of digital infrastructure. Establish and enforce best practices for incident response, monitoring, automation, and system reliability. Do so by incorporating tools and technologies that create a 360-degree view of SRE efficiency, including but not limited to DevOps, App Support, Monitoring, Incident Management, Observability, Network/Infra/InfoSec, and Enterprise Architecture.
- Collaboration: Collaborate with teams across our lines of business, including development, DevOps, App Support, Monitoring, Network/Infra/InfoSec, and Enterprise Architecture, to drive a unified approach to site reliability that optimizes the work of all those teams and improves time-to-market for their respective objectives. Foster strong relationships with leadership and partner delivery organizations to align SRE efforts with organizational goals.
- Monitoring and Alerting: Implement robust monitoring and alerting systems to proactively identify potential issues, analyze system performance, and facilitate quick response to incidents.
- Automation and Efficiency: Drive the development and implementation of automation solutions to streamline processes, reduce manual interventions, and enhance the overall efficiency of the product engineering and SRE teams.
- System Capacity Planning: Work closely with infrastructure and architecture teams to conduct capacity planning, ensuring that systems can handle current and future demand. Anticipate growth and scalability requirements.
- Incident Management: Establish and oversee effective SRE-focused incident response processes, ensure timely incident resolution, and conduct post-mortems to identify root causes and implement preventive measures.
- Bachelor's degree in Computer Science, Engineering, or a related field.
- A minimum of 10 years of experience, including at least 3 years in the SRE or DevOps field, with a proven track record of progressively increasing responsibilities and leadership roles.
- Demonstrated ability to think strategically and develop a vision for site reliability engineering aligned with the organization's business objectives.
- Strong leadership and people management skills, including experience leading and developing high-performing teams.
- A "can do" attitude is necessary, combined with a deep belief that everything can be automated and systems must always be functional.
- Strong experience and understanding of software engineering, scripting, build/deployment pipelines, Infrastructure as Code, and SLA/SLO/SLIs.
- Strong understanding of cloud computing platforms (Azure required, Google Cloud a plus), including lift-and-shift environments (VMs, etc.) and cloud-native setups (AKS, serverless, etc.).
- Strong understanding of and experience with automation tools and programming, scripting, and declarative languages (e.g., C#, PowerShell, Python, Bash, Terraform, JavaScript) to develop and implement automated system reliability and performance solutions.
- Strong understanding of observability, monitoring, and alerting tools (e.g., Azure Application Insights, Datadog, Splunk) and the ability to design and implement effective monitoring strategies.
- Technical leadership skills, including technical collaboration/communication, problem-solving, and project management, are needed to lead the SRE team in delivering its objectives.
- Preference may be given to candidates with relevant certifications demonstrating cloud and reliability engineering expertise.
Site Reliability Engineering Director - Newton, United States - Bright Horizons
Description
The Director of Site Reliability Engineering (SRE) will play a pivotal role in ensuring the seamless and reliable operation of consumer and customer-facing digital infrastructure across our lines of business. This leadership position involves overseeing a team of skilled SRE professionals and collaborating closely with cross-functional teams to enhance the performance, scalability, and reliability of complex systems and applications. The Director of SRE is responsible for developing and implementing strategies to optimize our technology's reliability and uptime, managing incident response, and ensuring consistent use of best practices in automation, monitoring, and incident management. This role requires a deep understanding of cloud technologies, distributed systems, DevOps, software engineering, automation and scripting, observability, and application support and monitoring, along with a proactive approach to preventing and mitigating potential issues. The Director of SRE must also foster a culture of innovation, continuous improvement, and collaboration within the team to meet the organization's evolving needs and deliver a superior digital experience to users.
What you will be doing:
What we hope you will bring to this role: