-
Reliability Engineer
1 week ago
Natron Energy · Santa Clara, United States · Natron is seeking a Reliability Engineer to support the development and test of our high-power battery systems for data center UPS and EV charging applications. The occupant of this position will work with the Product Engineering, Reliability, Technology, and Operations teams to ...
-
Reliability Engineer
4 weeks ago
Comtech Telecom · Santa Clara, United States · Comtech Telecommunications Corp. has an opportunity in Santa Clara, CA for a Reliability/Failure Analysis Engineer. In this important role, you will collaborate with a diverse team of technical professionals and interact with outside customers, providing solutions to a variety of ...
-
Reliability Engineer
3 weeks ago
COMTECH TELECOMMUNICATIONS · Santa Clara, United States · Comtech Telecommunications Corp. has an opportunity in Santa Clara, CA for a Reliability/Failure Analysis Engineer. In this important role, you will collaborate with a diverse team of technical professionals and interact with outside customers, pr ...
-
Reliability Engineer
3 days ago
Advanced Micro Devices, Inc. · San Jose, United States · Overview: · WHAT YOU DO AT AMD CHANGES EVERYTHING · We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences the building bloc ...
-
Lead Reliability Engineer
1 week ago
Celestial AI · Santa Clara, United States · About Celestial AI · As the industry strives to meet the demands of the AI workloads, bottlenecks in data transfers between processors and memory have hindered progress. The Photonic Fabric based Memory Fabric provides an optically scalable solution to the 'Memory Wall' problem, ...
-
Lead Reliability Engineer
2 weeks ago
Celestial AI · Santa Clara, United States · About Celestial AI · As the industry strives to meet the demands of the AI workloads, bottlenecks in data transfers between processors and memory have hindered progress. The Photonic Fabric based Memory Fabric provides an optically scalable solution to the 'Memory Wall' problem, ...
-
Reliability Engineer
7 hours ago
Apple · Cupertino, United States · Summary · Posted: Apr 13, 2024 · Weekly Hours: 40 · Role Number: · Do you ever wonder what goes into making Apple products an amazing user experience? Apple's innovative reliability team is responsible for ensuring that our products exceed our customers' expectations for r ...
-
Reliability Engineer
3 days ago
Apple · Cupertino, United States · Reliability Engineer · Cupertino, California, United States · Hardware · Do you ever wonder what goes into making Apple products an amazing user experience? Apple's innovative reliability team is responsible for ensuring that our products exceed our customers' expectations for rob ...
-
Site Reliability Engineer
1 week ago
TEKsystems · San Jose, United States · Contract · Description: · Adobe is looking for an experienced Site Reliability Engineer to join the internal tooling team to support, configure, integrate, upgrade, and automate the use of enterprise tools used across their large Engineering organization. Role will be focused on user interact ...
-
Service Reliability Engineer
2 weeks ago
Software Technology, Inc · Santa Clara, United States · Position: Service Reliability Engineer / Sr. DevOps Engineer · Location: Santa Clara, CA · Duration: 1 Year+ · OK with any visa; no OPT, please · Local consultants only · Customer will not provide letter for H1B candidates. Please check with t ...
-
Site Reliability Engineer
3 weeks ago
HCLTech · San Jose, United States · About HCLTech: · HCLTech is a global technology company, home to 221,000+ people across 60 countries, delivering industry-leading capabilities centered around digital, engineering and cloud, powered by a broad portfolio of technology services and products. We work with clients ac ...
-
Site Reliability Engineer
3 weeks ago
Cryptoware Technologies Inc · Santa Clara, United States · Responsibility · • Lead the effort of global expansion of Huobi globe spanning infrastructure. · • Work with engineering teams to make sure new features and changes are deployed quickly and safely. · • Constantly improve our system performance and ...
-
Senior Reliability Engineer
2 weeks ago
ServiceNow · Santa Clara, United States · Company Description · At ServiceNow, our technology makes the world work for everyone, and our people make it possible. We move fast because the world can't wait, and we innovate in ways no one else can for our customers and communities. By joining ServiceNow, you are part of an ...
-
Reliability Engineer
2 weeks ago
Comtech Telecom · Santa Clara, United States · Full time · Regular · Comtech Telecommunications Corp. has an opportunity in Santa Clara, CA for a Reliability/Failure Analysis Engineer. In this important role, you will collaborate with a diverse team of technical professionals and interact with outside customers, providing solutions to a variety of ...
-
Principal Site Reliability Engineer
2 weeks ago
Kofi Group · Santa Clara, United States · Direct Hire · Principal Site Reliability Engineer · San Francisco Bay Area, CA · We are partnering with a late-stage Cloud Security company that is looking for a Principal Level SRE · The ideal candidate will have: · Strong sense of architecture and design f ...
-
Principal Site Reliability Engineer
5 days ago
Palo Alto Networks · Santa Clara, United States · Our Mission · At Palo Alto Networks everything starts and ends with our mission: · Being the cybersecurity partner of choice, protecting our digital way of life. · Our vision is a world where each day is safer and more secure than the one before. We are a company built on the fou ...
-
Sr Site Reliability Engineer
5 days ago
Palo Alto Networks · Santa Clara, United States · Our Mission · At Palo Alto Networks everything starts and ends with our mission: · Being the cybersecurity partner of choice, protecting our digital way of life. · Our vision is a world where each day is safer and more secure than the one before. We are a company built on the fou ...
-
Site Reliability Engineer
1 week ago
Lawrence Harvey · Sunnyvale, United States · Site Reliability Engineer · Status: Full Time · Compensation: 120k to 145k · Hybrid Requirements: 3 days in office, 2 days remote · Lawrence Harvey has partnered with a leading Chinese fintech startup that is committed to democratizing payment services and empowering people and ...
-
Site Reliability Engineer
1 week ago
Advantis Global is now INSPYR Solutions · Sunnyvale, United States · ABOUT THIS FEATURED OPPORTUNITY · The QoS Infrastructure Tools Team is responsible for building and maintaining tools that are essential for Site Reliability Engineers (SREs) and engineers across the organization. The team primarily develops applications using Golang for backend ...
-
Electrical Reliability Engineer
1 week ago
MRINetwork Jobs · San Jose, United States · We are working with a company operating in the best of both worlds – an innovative start-up inside of a $6 billion parent company building the next generation of solar. They have developed an industry-leading building-integrated solar technol ...
Service Reliability Engineer - Santa Clara, United States - Software Technology Inc
Description
Position: Service Reliability Engineer / Sr. DevOps Engineer
Location: Santa Clara, CA
Duration: 1 Year+
OK with any visa; no OPT, please
Local consultants only
Customer will not provide a letter for H1B candidates. Please check with the candidate and employers before submitting the resume. Face-to-face is mandatory, so please submit local candidates only.
Responsibilities:
Development and Operations (DevOps) subject matter expert for 24x7 SaaS operation
Work hand-in-hand with microservice software developers, architects, and field integration resources to architect and deliver Ericsson's next-generation TV platforms.
Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention.
Accountable for working upstream with microservice developers on monitoring, tools and architecture to deliver security, reliability, manageability and availability at scale
Point of escalation/decision maker on response level of incidents
Participate in the Core SRE on-call roster and respond with command and control incident management during High Pri Events while maintaining internal and external SLAs
Act as Technical Duty Officer who leads resolution effort of the most complex service problems from network layer to the application at scale
Drive Problem Management/Retrospectives ("post mortems")
Strong contribution and maintenance of our knowledge base
Analyze trends and make recommendations in the areas of monitoring, incident and change management, cloud orchestration and support.
Contribute to the future growth of the team by conducting candidate screenings and assessments
Accountable for deploying services to production environments
Technologies:
Experience with Docker and SaltStack, Kubernetes orchestration tools, etc.
Knowledge of MongoDB, Cassandra databases, Kafka, IIS Servers on Azure/AWS/OpenStack
Azure, OpenStack and AWS concepts and APIs
Experience designing, setting up, maintaining, and refining (noise reduction, auditing) monitoring tools such as Prometheus, Prometheus exporters, Kibana, Grafana, Alertmanager, etc.
Demonstrable experience in one or more languages: PowerShell, Python, Bash, C#, .NET
Strong knowledge of TCP/IP networking, DNS, VPNs, HTTP, load-balancers (such as NGINX), highly available microservice architecture, CDNs
Team Foundation Server/Visual Studio, Atlassian suite (Jira, Confluence), Git
Network analysis, performance and application issues using tcpdump, Fiddler and Wireshark.
Qualifications:
Bachelor's Degree in CS, MIS, or equivalent experience
5+ years of relevant experience with Windows/Unix systems fundamentals, monitoring, cloud services, networking, storage, database, and application knowledge
Solid communication skills, both written and verbal
Able to effectively tailor messaging to different audiences: External Customer, Leadership, technical SME, or Tier-1
Previous experience in customer facing roles during high stress situations
Demonstrated skills as an influencer within a previous organization
In-depth knowledge of IT concepts, strategies, and methodologies; Agile knowledge a plus
In-depth knowledge of business operations, objectives, and strategies.
Familiarity with containers (e.g. Docker, rkt) and IaaS (e.g. AWS, Azure, OpenStack).