-
Reliability Engineer
3 weeks ago
Kindeva Drug Delivery Los Angeles, United States · Monday, May 6, 2024 · The Reliability Engineer will lead the site's Asset Reliability agenda, effectively promoting analytical problem-solving techniques and structured reliability improvement processes. · We have an immediate opening for a Reliability Engineer at Kindeva's Nort ...
-
Reliability Engineer
3 weeks ago
Kindeva Drug Delivery Company Los Angeles, United States · The Reliability Engineer will lead the site's Asset Reliability agenda, effectively promoting analytical problem-solving techniques and structured reliability improvement processes. · We have an immediate opening for a Reliability Engineer at Kindeva's Northridge, CA manufacturi ...
-
Reliability Engineer
2 weeks ago
The Mosaic Company Los Angeles, United States · Are You Our Next Reliability Engineer-Multiple Level Applicants Welcome? · Join our dynamic team at the forefront of the global digital acceleration as a Reliability Engineer I, II, III or Sr to work at our Mosaic Uncle Sam Plant. The successful candidate will find opportunities ...
-
Reliability Engineer
3 weeks ago
GRN Hudson (Global Recruiters Network) Los Angeles, United States · Reliability Engineer · Supporting manufacturing operations to achieve superior equipment efficiency with minimal downtime. The ideal candidate will play a crucial role in ensuring the reliability and performance of our industrial equipment through continuous improvement initiati ...
-
Reliability Engineer
1 week ago
Hargrove Engineers and Constructors Los Angeles, United States · Who We Are · Hargrove supplies unparalleled services in engineering, procurement, construction management, and technical services in the industrial, commercial, and government sectors. With over 2,000 Teammates across the US, we build long-term support relationships in the energ ...
-
Site Reliability Engineer
3 weeks ago
BayOne Solutions Los Angeles, United States · Position: Site Reliability Engineer · Location: Los Angeles, CA · Duration: 6+ Months · Pay Range: $85/hr - 90/hr on W2 · Site Reliability Engineer · It is an exciting time to be part of SIE's CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the inte ...
-
Site Reliability Engineer
3 weeks ago
MetroSys Inc Los Angeles, United States · Responsibilities: · Operational Oversight: · Oversee the operation of software and services within the Traffic Front End and Coordination teams. · Ensure seamless functionality, uptime, and performance. · Networking and Traffic Expertise: · Leverage your deep understanding of n ...
-
Senior Reliability Engineer
3 weeks ago
ATR International Los Angeles, United States · We are seeking a Reliability Development Engineer for a very important client. · Job Overview - Principal Duties and Responsibilities · The successful candidate will be tasked with product and package reliability test tracking; reliability database, data analysis and summarization on a ...
-
Site Reliability Engineer
3 weeks ago
Epsilon3 Los Angeles, United States · [Full Time] Site Reliability Engineer (Remote) at Epsilon3 (United States) | BEAMSTART Jobs · Site Reliability Engineer (Remote) · Epsilon3 United States · Date Posted: 27 Jun, 2023 · Work Location: Los Angeles, CA, United States · Salary Offered: $120000 - $180000 yearly · ...
-
Maintenance Reliability Engineer
1 week ago
Kelly Science, Engineering, Technology & Telecom Los Angeles, United States · Kelly Engineering is seeking a Maintenance Reliability Engineer in Westlake, LA to join one of our leading clients focused on the design and construction of an ethylene production facility to operate 2 billion pounds per year. · Hit Apply below to send your application for consi ...
-
Site Reliability Engineer
3 weeks ago
Standard Bank of South Africa Limited Los Angeles, United States · Business Segment: Personal & Private Banking · Contribute to the resilience of Group Information Technology by improving availability, reliability, and performance of business-critical customer-facing systems, whilst building sustainable capability. This complex task is delivered i ...
-
Site Reliability Engineer
3 weeks ago
SHEIN Technology LLC Los Angeles, United States · Job Title: Site Reliability Engineer · Reports to: SRE Manager · Job Location: Los Angeles, CA · Job Status: Exempt, FT · About SHEIN · SHEIN is a global fashion and lifestyle e-retailer committed to making the beauty of fashion accessible to all. We use on-demand manufacturing t ...
-
Maintenance Reliability Engineer
2 weeks ago
Kelly Science, Engineering, Technology & Telecom Los Angeles, United States · Kelly Engineering is seeking a Maintenance Reliability Engineer in Westlake, LA to join one of our leading clients focused on the design and construction of an ethylene production facility to operate 2 billion pounds per year. · If you're looking to launch your caree ...
-
Site Reliability Engineer
3 weeks ago
BayOne Solutions Los Angeles, United States · Position: Site Reliability Engineer · Location: Los Angeles, CA · Duration: 6+ Months · Pay Range: $85/hr - 90/hr on W2 · Site Reliability Engineer · It is an exciting time to be part of SIE's CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the i ...
-
Site Reliability Engineer
2 weeks ago
BayOne Solutions Los Angeles, United States · Position: Site Reliability Engineer · Location: Los Angeles, CA · Duration: 6+ Months · Pay Range: $85/hr - 90/hr on W2 · Site Reliability Engineer · It is an exciting time to be part of SIE's CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection ...
-
Site Reliability Engineer
1 week ago
City National Bank Los Angeles, United States · SITE RELIABILITY ENGINEER · WHAT IS THE OPPORTUNITY? · As an SRE, you will utilize your software, systems engineering, and operations background to build and run large-scale, fault-tolerant systems. Your role is to ensure the reliability, scalability and maximum uptime of CNB s ...
-
Site Reliability Engineer
3 weeks ago
BayOne Solutions Los Angeles, United States · Position: Site Reliability Engineer · Location: Los Angeles, CA · Duration: 6+ Months · Pay Range: $85/hr - 90/hr on W2 · If your skills, experience, and qualifications match those in this job overview, do not delay your application. · Site Reliability Engineer · It is an e ...
-
Site Reliability Engineer
3 weeks ago
Adastra replica Los Angeles, United States · Job Description · Our client is looking for an experienced Site Reliability Engineer to design, operate, maintain, and scale mission-critical infrastructure and products. Products include (but are not limited to) automated Hardware-In-The-Loop (HITL) data analysis ...
-
Site Reliability Engineer
3 weeks ago
Optomi Los Angeles, United States · Optomi, in partnership with a global leading entertainment corporation, is in search of a seasoned Site Reliability Engineer with a minimum of 4 years of experience to join their dynamic team. This opportunity would be located in either San Francisco, California, or Seattle, Was ...
Senior Site Reliability Engineer - Los Angeles, United States - NVIDIA
Description
We are now looking for a Sr. Site Reliability Engineer (SRE).
NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years.
It's a unique legacy of innovation that's motivated by outstanding technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world.
NVIDIA is at the forefront of generative AI models, from language to images. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work.
NVIDIA is looking for a Senior Site Reliability Engineer (SRE) to join its cloud service team for supporting, triaging, and building generative AI-powered visual applications.
As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a broad spectrum of problems.
We live SRE practices that are key to product quality, such as limiting time spent on reactive operational work, blameless postmortems, proactive identification of potential outages, and iterative improvements, which all make for interesting and dynamic day-to-day work.
The person in this position will be responsible for Service Response and workflow and will drive tools/service development to maintain and improve service SLOs.
We partner with Service Owners to drive the reliability of the service.
What You Will Be Doing
Support and work on groundbreaking Generative AI inferencing workloads running in a globally-distributed heterogeneous environment spanning 60+ edge locations plus all major cloud service providers.
Collaborate closely with the service owner, architecture, research, and tools teams at NVIDIA to achieve ideal results for AI problems at hand.
Monitor and support critical high-performance, large-scale services running multi-cloud.
Participate in the triage & resolution of complex infra-related issues.
Maintain services once live by measuring and monitoring availability, latency, and overall system health using metrics, logs, and traces.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Practice balanced incident response and blameless postmortems.
Be part of an on-call rotation to support production systems.
Lead significant production improvement around tooling, automation, and process.
Architect, design, and code using your expertise to optimize, deploy and productize services.
What We Need To See
8+ years of experience operating & owning end-to-end availability and performance of mission-critical services in a live-site production environment, either as an SRE or Service Owner.
Solid understanding of containerization and microservices architecture, K8s. Excellent understanding of the Kubernetes ecosystem and best practices with K8s.
Ability to dissect complex problems into simple sub-problems and use available solutions to resolve them.
Technical leadership beyond development that includes scoping, requirements capturing, leading and influencing multiple teams of engineers on broad development initiatives.
Lead significant production activities, including change management, post-mortem reviews, workflow processes, software design, and delivering software automation in various languages (Python or Go) and technologies (CI/CD auto-remediation, alert correlation).
Best-in-class understanding of SLOs/SLIs, error budgeting, KPIs, and their configuration for highly complex services.
Experience with the ELK and Prometheus stacks as a power user and administrator.
Excellent understanding of cloud environments and technologies, especially AWS, Azure, GCP, or OCI.
Proven strengths in identifying, mitigating, and root-causing issues while continuously seeking ways to drive optimization, efficiency, and the bottom line.
Ways To Stand Out From The Crowd
Exposure to containerization and cloud-based deployments for AI models.
Excellent coding: Python, Go (or any similar language).
Understanding of Deep Learning / Machine Learning / AI.
Experience with CUDA, PyTorch, TensorRT, TensorFlow, and/or Triton.
Excellent communication, presentation, social, and analytical skills; the ability to communicate complex concepts clearly and persuasively across different audiences and varying levels of the organization.
NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you.
The base salary range is 164,000 USD - 316,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.
As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.