Service Reliability Engineer - Santa Clara, United States - Software Technology, Inc

Software Technology, Inc Santa Clara, United States

1 week ago

Job Description

Position: Service Reliability Engineer / Sr. DevOps Engineer

Location: Santa Clara, CA

Duration: 1 Year +

Any visa status is acceptable, except OPT

Local consultants only

The customer will not provide a letter for H-1B candidates. Please check with the candidate and the employer before submitting a resume. Face-to-face is mandatory, so please submit local candidates only.

Responsibilities:

Serve as the Development and Operations (DevOps) subject matter expert for 24x7 SaaS operations

Work hand-in-hand with microservice software developers, architects, and field integration resources to architect and deliver Ericsson's next generation TV platforms.

Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention.

Accountable for working upstream with microservice developers on monitoring, tools, and architecture to deliver security, reliability, manageability, and availability at scale

Serve as the point of escalation and decision maker on incident response levels

Participate in the Core SRE on-call roster and respond with command-and-control incident management during high-priority events while maintaining internal and external SLAs

Act as Technical Duty Officer, leading the resolution of the most complex service problems, from the network layer to the application, at scale

Drive Problem Management/Retrospectives ("postmortems")

Contribute actively to, and maintain, our knowledge base

Analyze trends and make recommendations in the areas of monitoring, incident and change management, cloud orchestration and support.

Contribute to the future growth of the team by conducting candidate screenings and assessments

Accountable for deploying services to production environments

Technologies:

Experience with Docker, SaltStack, Kubernetes, and similar orchestration tools

Knowledge of MongoDB and Cassandra databases, Kafka, and IIS servers on Azure/AWS/OpenStack

Azure, OpenStack, and AWS concepts and APIs

Experience designing, setting up, maintaining, and refining (noise reduction, auditing) monitoring tools such as Prometheus, Prometheus exporters, Kibana, Grafana, and Alertmanager

Demonstrable experience in one or more languages: PowerShell, Python, Bash, C#, .NET

Strong knowledge of TCP/IP networking, DNS, VPNs, HTTP, load balancers (such as NGINX), highly available microservice architecture, and CDNs

Team Foundation Server/Visual Studio, Atlassian suite (Jira, Confluence), Git

Analysis of network, performance, and application issues using tcpdump, Fiddler, and Wireshark

Qualifications:

Bachelor's Degree in CS, MIS, or equivalent experience

5+ years of relevant experience with Windows/Unix systems fundamentals, monitoring, cloud services, networking, storage, databases, and applications

Solid written and verbal communication skills; able to tailor messaging effectively to different audiences: external customers, leadership, technical SMEs, or Tier-1

Previous experience in customer-facing roles during high-stress situations

Demonstrated skills as an influencer within a previous organization

In-depth knowledge of IT concepts, strategies, and methodologies; Agile knowledge a plus

In-depth knowledge of business operations, objectives, and strategies

Familiarity with containers (e.g. Docker, rkt) and IaaS (e.g. AWS, Azure, OpenStack)
