    Service Reliability Engineer - Santa Clara, United States - Software Technology Inc

    Position: Service Reliability Engineer / Sr. DevOps Engineer

    Location: Santa Clara, CA

    Duration: 1+ year

    OK with any visa; no OPT, please.
    Local consultants only.

    The customer will not provide a letter for H-1B candidates. Please check with the candidate and employer before submitting the resume. Face-to-face is mandatory, so please submit local candidates only.


    Responsibilities:
    Serve as the Development and Operations (DevOps) subject matter expert for a 24x7 SaaS operation

    Work hand-in-hand with microservice software developers, architects, and field integration resources to architect and deliver Ericsson's next-generation TV platforms.

    Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention.

    Accountable for working upstream with microservice developers on monitoring, tools, and architecture to deliver security, reliability, manageability, and availability at scale
    Point of escalation/decision maker on response level of incidents
    Participate in the Core SRE on-call roster and respond with command and control incident management during High Pri Events while maintaining internal and external SLAs
    Act as Technical Duty Officer who leads resolution effort of the most complex service problems from network layer to the application at scale
    Drive Problem Management/Retrospectives ("post mortems")
    Contribute actively to and maintain our knowledge base
    Analyze trends and make recommendations in the areas of monitoring, incident and change management, cloud orchestration and support.
    Contribute to the future growth of the team by conducting candidate screenings and assessments
    Accountable for deploying services to production environments

    Technologies:
    Experience with Docker, SaltStack, and orchestration tools such as Kubernetes
    Knowledge of MongoDB and Cassandra databases, Kafka, and IIS servers on Azure/AWS/OpenStack
    Azure, OpenStack, and AWS concepts and APIs
    Experience designing, setting up, maintaining, and refining (noise reduction, auditing) monitoring tools such as Prometheus, Prometheus exporters, Kibana, Grafana, and Alertmanager
    Demonstrable experience in one or more languages: PowerShell, Python, Bash, C#, .NET
    Strong knowledge of TCP/IP networking, DNS, VPNs, HTTP, load balancers (such as NGINX), highly available microservice architecture, and CDNs
    Team Foundation Server/Visual Studio, Atlassian suite (Jira, Confluence), Git
    Analysis of network, performance, and application issues using tcpdump, Fiddler, and Wireshark

    Qualifications:
    Bachelor's Degree in CS, MIS, or equivalent experience
    5+ years of relevant experience with Windows/Unix systems fundamentals, monitoring, cloud services, networking, storage, database, and application knowledge
    Solid written and verbal communication skills
    Able to tailor messaging effectively to different audiences: external customers, leadership, technical SMEs, or Tier-1
    Previous experience in customer facing roles during high stress situations
    Demonstrated skills as an influencer within a previous organization
    In-depth knowledge of IT concepts, strategies, and methodologies; Agile knowledge a plus
    In-depth knowledge of business operations, objectives, and strategies.
    Familiarity with containers (e.g. Docker, rkt) and IaaS (e.g. AWS, Azure, OpenStack).


