- Ensure High Availability: Implement and maintain resilient cloud architectures, monitor system performance, and proactively identify and resolve potential bottlenecks or points of failure.
- Incident Management: Play an active role in production on-call, responding swiftly to troubleshoot and resolve production issues. Minimize service disruptions and downtime through thorough triage and debugging of product or system issues. Continuously optimize the on-call process for sustainability and efficiency.
- Automation and Tooling: Develop and maintain automation scripts, tools, and processes to streamline system deployment, monitoring, and management tasks. Your contributions will be vital in efficiently scaling cloud operations.
- Performance Optimization: Optimize cloud infrastructure and applications for performance, scalability, and cost-effectiveness.
- Security and Compliance: Collaborate with security engineers to implement best practices and ensure compliance with security standards and policies.
- Monitoring and Alerting: Design and configure advanced monitoring systems to gain insights into system behavior, set up alerts, and respond proactively to potential issues. Create and maintain comprehensive dashboards and playbooks for production on-call.
- Software Development Consultation: Engage actively in the entire software development lifecycle. Participate in system design reviews and provide valuable Site Reliability Engineering (SRE) insights during launch reviews, influencing and enhancing system architecture.
- Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
- 3+ years of professional experience maintaining production systems on cloud-based services and infrastructure.
- 8+ years of software development experience in one or more programming languages, with a primary focus on leveraging and working on cloud-based services and infrastructure.
- Strong knowledge of the AWS cloud platform.
- Practical experience with containerization technologies, including Docker and Kubernetes.
- Familiarity with Python, Bash scripting, and Ansible.
- Familiarity with infrastructure as code tools like Terraform is essential.
- Solid understanding of databases, networking, security principles, and best practices.
- Proficiency in using monitoring and alerting tools to detect and respond to potential issues effectively.
- AWS certifications (such as Solutions Architect or Security).
- Experience in a regulated industry or the healthcare field.
Staff Site Reliability Engineer #3718 - Menlo Park, United States - Grail
Description
GRAIL is a healthcare company whose mission is to detect cancer early, when it can be cured. GRAIL is focused on alleviating the global burden of cancer by developing pioneering technology to detect and identify multiple deadly cancer types early.
The company is using the power of next-generation sequencing, population-scale clinical studies, and state-of-the-art computer science and data science to enhance the scientific understanding of cancer biology, and to develop its multi-cancer early detection blood test.
GRAIL is headquartered in Menlo Park, CA with locations in Washington, D.C., North Carolina, and the United Kingdom. GRAIL, LLC is a wholly-owned subsidiary of Illumina, Inc. (NASDAQ: ILMN).
For more information, please visit
GRAIL is seeking a Staff Software Engineer in our Site Reliability Engineering (SRE) team to help us improve the security and reliability of production systems that are critical for our mission to detect cancer early and save lives.
You will contribute to the architecture, design, development, and implementation of critical cloud-based infrastructure, services, and applications, and be responsible for their secure, healthy, and reliable operation.
You are someone who enjoys learning and implementing the best industry technology trends and practices. You foster and contribute to a creative and collaborative culture to deliver results. You embrace ambiguity and enjoy exploring new technologies to deliver robust, scalable solutions. This is a hybrid role and requires you to be onsite 2 days a week in Menlo Park, CA.
Responsibilities
Based on the role, colleagues may be eligible to participate in an annual bonus plan tied to company and individual performance, or an incentive plan.
In addition, GRAIL offers a progressive benefits package, including flexible time off, a 401(k) with a company match, and, alongside our medical, dental, and vision plans, carefully selected mindfulness offerings.
GRAIL is an Equal Employment Employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability or any other legally protected status.
We will reasonably accommodate all individuals with disabilities so that they can participate in the job application or interview process, perform essential job functions, and receive other benefits and privileges of employment.
Please contact us to request accommodation. GRAIL maintains a drug-free workplace.