Senior Technology Site Reliability Engineer - San Francisco
3 days ago

Job description
Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations team.
Position summary: The Senior Technology Site Reliability Engineer ("SRE") is responsible for ensuring the reliability, scalability, and performance of the firm's critical infrastructure and applications. The SRE blends software engineering with systems engineering to build and maintain automated, resilient, and observable systems that support high availability and operational excellence. In addition to being technically advanced, the SRE will have a high degree of emotional intelligence and the ability to work as part of a team toward complex, layered objectives. Specific duties and responsibilities include, but are not limited to, the following:
Position responsibilities:
- Monitor and maintain production systems to ensure high availability and performance
- Implement and manage service-level indicators (SLIs), objectives (SLOs), agreements (SLAs), and error budgets
- Participate in on-call rotations and incident response, including root cause analysis and postmortems
- Develop and maintain infrastructure as code (IaC) using Terraform
- Automate deployment, scaling, and recovery processes to reduce manual intervention
- Partner with DevOps to build and maintain CI/CD pipelines to support safe and efficient software delivery
- Implement observability solutions using metrics, logs, traces, and alerting systems (Prometheus, Grafana, Datadog, etc.)
- Proactively identify and resolve system bottlenecks and reliability risks
- Work closely with Infrastructure, DevOps, Development, and Security teams to embed reliability into the development lifecycle
- Contribute to a culture of blameless postmortems and continuous improvement
- Document operational procedures and share knowledge across teams
- All other duties as assigned or required
Skills and experience:
Required:
- After orientation at Cooley LLP, exhibit proficiency in the Microsoft Office suite, iManage and other firm applications
- Ability to work extended and/or weekend hours, as required
- Ability to travel, as required
- 6+ years of directly applicable experience (e.g., site reliability engineering or a related field)
- Proficiency in Terraform and programming languages such as Python, Go, or Java
- Deep expertise in cloud platforms, particularly AWS, and container orchestration
- Strong background in distributed systems, performance tuning, and automation
- Hands-on experience with configuration management tools such as Puppet, Chef, or Salt
Preferred:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related discipline
- Experience working with advanced ETL data workflows including technologies such as AWS EMR, Azure Synapse, Azure Data Factory, or Apache Hive/Spark/Airflow
- Experience with IaC deployment of AKS/EKS/GKE architecture
- Experience with enterprise data lake environments using technologies such as Databricks or Snowflake
Competencies:
- Expert analytical/quantitative, problem-solving, and deductive reasoning skills, with experience performing advanced troubleshooting and root cause analysis of complex technical issues
- Excellent organizational, planning, and time management skills and ability to work independently and in a team environment to manage competing priorities and meet deadlines
- Advanced verbal and written communication skills with the ability to present findings, conclusions, alternatives, and information clearly and concisely
- Experience working with all levels of business professionals, management, stakeholders, and vendors with the ability to build effective relationships through trust and diplomacy
Cooley offers a competitive compensation and excellent benefits package and is committed to fair and equitable employment practices.
EOE.
The expected annual pay range for this position with a full-time schedule is $140,000 - $205,000. Please note that the final offer amount will depend on the candidate's geographic location, applicable experience, and skill set.
We offer a full range of elective benefits including medical, health savings account (with applicable medical plan), dental, vision, health and/or dependent care flexible spending accounts, pre-tax commuter benefits, life insurance, AD&D, long-term care coverage, backup care for children and/or adults and other parental support benefits. In addition to elective benefit options, benefited employees receive firm-paid life insurance, AD&D, LTD, short term medical benefits as well as 21 days of Paid Time Off ("PTO") and 10 paid holidays each year. We provide generous parental leave and fertility benefits. New employees will attend a detailed benefit orientation to learn more about our many benefits and resources.