- Serve as technical lead for implementing and operating cloud-based infrastructure and platforms, including EKS and other AWS services, that support direct-to-consumer APIs. Solve the associated thundering herd problems, including load testing, scaling up, and scaling back down.
- Work closely with Video & Player Engineering and 3rd party teams to help design and implement scalability, cost visibility and observability in the platform.
- Help mentor and train less senior members of the team.
- Assist with product and technology selection, including evaluating maturity and support, and designing and implementing POCs.
- Work with the Director, Site Reliability Engineering to foster a culture of learning and continuous improvement, and help conceptualize and visualize workflows and processes.
- Perform post-incident analysis to identify root causes and potential workarounds/solutions.
- Be fluid and open to change and evolving processes and tools.
- Other duties as assigned.
- Expert with EKS, Kubernetes and AWS including IAM, auto scaling, networking and load balancing/request routing.
- Proven experience solving scalability problems, both scaling up and scaling down, including thundering herd scenarios.
- Expert with troubleshooting and root cause analysis
- Expert with at least 2 programming languages
- Strong analytical skills
- Strong communication skills, both verbal and written
- Proven experience with building deployment pipelines and enabling self-service.
- Strong teamwork and willingness to collaborate with others.
- Proven experience with training and mentoring engineers
- BS or equivalent
- AWS Solutions Architect Professional certification
Principal SRE - Los Angeles, United States - Avesta Computer Services
Description
Job Title: Principal SRE
Location: Tempe, Arizona / Los Angeles, California, United States
Type: Full-time
Job Description:
Our client stands as a beacon of innovation, crafting world-class, large-scale digital products that redefine the entertainment experience. We're on the lookout for visionary individuals to join our pioneering team, tasked with shaping the future of streaming products. Now is your chance to be part of creating and delivering extraordinary digital experiences spanning Sports and Entertainment. As a key member of our team, you'll drive innovation and significantly contribute to our mission of pioneering the next generation of streaming products. Your opportunity to create unparalleled fan experiences for iconic sports events is here. Our current advanced digital solutions, accessed by millions across web, mobile, and living room devices, signify just the start of our ambitious journey.
About The Role:
Our client is hiring a Principal SRE to build and operate the infrastructure and platforms that support live direct-to-consumer APIs for major live events such as the Super Bowl, World Cup, and World Series. The principal engineer will be the technical lead for solving thundering herd problems, including partnering with the application team to load test, scale up, and scale back down, and will help design the platform and infrastructure to meet that team's needs.
A collaborative, peacemaker mindset is a must, as is fostering a culture of learning and continuous improvement for the entire team. The principal engineer will also work with the Director, Platform Engineering to visualize workflows and refine processes and policies to keep team throughput high.
A Snapshot of Your Responsibilities:
What You Will Need:
Nice To Have, But Not a Deal Breaker: