- Responding to any critical incidents and ticket escalations.
- Following and documenting our post-incident response and postmortem processes.
- Executing planned patching or improving related automation.
- Engineering to reduce toil, tune alerts, and improve documentation.
- Engineering to re-platform or migrate layers of our infrastructure to Kubernetes ecosystems.
- Analyzing our AWS infrastructure and related applications/services for design and architectural opportunities to improve overall reliability and cost intelligence.
- Creating observability patterns so that all alerts have consistent content and configuration, keeping triage short and continuously improving overall MTTR.
- Analyzing incident data to determine the next opportunity to improve reliability.
- Influencing engineers to improve application reliability and scalability so that applications run efficiently.
- Documenting every action, if not captured as code, so your findings turn into repeatable actions and then into automation.
- Improving operational processes (such as deployments and upgrades) to make them as boring as possible.
- Proven experience triaging and debugging distributed systems on cloud infrastructure.
- Proven experience in designing and engineering CI/CD pipelines within K8S and legacy ecosystems.
- Experience in building, deploying, and running scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud native approaches.
- Experience in designing and engineering monitors, dashboards, and synthetic testing.
- Experience in managing infrastructure config at scale using multiple approaches and/or tools such as GitOps, Puppet, or Ansible.
- Good understanding of AWS cloud networking and security with hands-on experience remediating infrastructure vulnerabilities at scale.
- Comfortable with Linux system administration, with the ability to program/script using Python, Go, Java, shell, or equivalent.
- Good verbal and written communication skills.
- SRE Certified in multiple categories.
- AWS Certified in multiple categories.
- Experience with Datadog Cloud Monitoring.
- Proficiency with SQL, database administration, data pipelines, performance tuning, and schema design.
- Proficiency with multiple pipelining tools such as TeamCity, Bitbucket Pipelines, Jenkins, and GitHub Actions.
- Familiarity with open-source distributed data processing frameworks such as Hadoop, Apache Spark, Amazon Redshift, etc.
Senior Site Reliability Engineer - Denver, United States - Guidewire
Description
Guidewire is searching for a Sr. Site Reliability Engineer who is hungry for a rare chance to transform insurance with the industry's leading Analytics platform. As a member of the SRE-Analytics Team, you'll be responsible for building and evolving our SRE practice for Analytics. The Analytics team at Guidewire uses internet-scale data collection, adaptive machine learning, generative automated intelligence (Gen AI), and insurance risk modeling capabilities to help insurers and other financial institutions model evolving risks, develop new products, and make better business decisions. This role is a great opportunity for individuals motivated by learning cutting-edge technologies and their application to solve real-world business problems. Guidewire is the AWS for insurance companies that use our platforms and applications. The solutions developed by you and this team will be used by hundreds of insurance companies and impact billions of dollars in annual transactions.
Downtime and failures are inevitable, but how SREs deal with the problem is what's important. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments. Part of an SRE's responsibility is to collaborate with developers to troubleshoot and solve problems and reduce customer impact where possible. SREs will also need to go one step further after the incident to document and examine what went wrong and develop measures such as automated runbooks to handle the issue moving forward.
When on-call, you will be responsible for: