
    Site Reliability Engineer - San Francisco, United States - Cypress HCM

    Job Description

    Site Reliability Engineer (Grafana)
    Responsibilities:
    • Collaborate with Service Owners and Observability Leaders to develop a strategy for monitoring the technology stack using Grafana.
    • Initiate data ingestion by deploying Telegraf and exporters (if necessary), utilizing discovery to feed data into Grafana Mimir.
    • Establish initial alerting by creating alert rules and enabling self-service alerting in Grafana.
    • Create initial dashboards to monitor the health and capacity of services.
    • Transition responsibilities to the service owner by providing comprehensive documentation and necessary training.
    Required Skills:
    • 3+ years of experience as a Site Reliability Engineer with proficiency in the Grafana platform.
    • Expertise in Grafana, particularly in dashboard best practices and writing PromQL to create widgets and alert rules.
    • Proficiency in Grafana Mimir or equivalent systems (e.g., Thanos, Cortex).
    • Extensive knowledge of Prometheus.
    • Advanced skills in Telegraf.
    • Expertise in Ansible, including writing playbooks and deploying/configuring services.
    • Proficient in using Git (GitLab) for self-service as code.
    • Broad expertise in various technology stacks and transitioning their monitoring to the Grafana ecosystem.
    • Experience with modern operating systems, specifically CentOS and Ubuntu.
    Pay Rate: $60-$70/hour


  • Austin Allen Company San Francisco, United States Permanent

    Reliability Engineer - Electrical Engineer - Kentucky · Salary $80,000 - $95,000 + Benefits + Paid Relocation to Kentucky, a wonderful place to raise a family. City amenities with a small-town feel. History, fun music & food festivals with a charming downtown. Love the ...

  • OpenAI

    Reliability Engineer

    3 weeks ago


    OpenAI San Francisco, United States

    Join the engineering teams that bring OpenAI's ideas safely to the world · The Applied Engineering team works across research, engineering, product, and design to bring OpenAI's technology to consumers and businesses. We seek to learn from deployment and distribute the benefits ...


  • Jones Lange Lasalle, Inc. West Valley City, United States

    The Junior Reliability Engineer is responsible for performing data validation around assets (HVAC, Electrical, Plumbing, etc.) that are managed by both Mobile and Static Facilities Management Technicians at all managed facilities within our West Caro Reliability Engineer, Liabili ...


  • WEX San Francisco, United States

    (*) This is a remote position; however, the candidate must reside within 30 miles of one of the following locations: Boston, MA; Dallas, TX; San Francisco Bay Area, CA; Portland, ME; and Washington, D.C. · About the Team/Role · The WEX Site Reliability Engineering (SRE) team is ...


  • Cloudflare Inc San Francisco, United States

    Available Locations: · Remote Australia, Singapore · Production Engineering is responsible for the world's most reliable, observable, performant, and safe network ecosystem. Our customers rely on our products and systems to safely modify, troubleshoot, and release products with ...


  • Best Secret San Francisco, United States

    About BestSecretGroup · We are a leading European members-only online destination for premium and luxury off-price fashion. Partnering with over 3,000 international brands, our tech-focused mindset and strong commitment to sustainability drives a truly unique experience for our m ...


  • StarTree San Francisco, United States

    At StarTree we're a group of passionate individuals that desire to improve the lives of many by developing tools and technologies that support availability and speed in the world of real-time analytics. · Our aim is to make it simple for every company to delight their users - ex ...


  • Vertisystem San Francisco, United States

    Duration: 6 months contract · Pay rate: $90/hr on W2 · Job Summary: · It is an exciting time to be part of the organization's CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. The ...


  • AEG San Francisco, United States

    In order to be considered for this role, after clicking "Apply Now" above and being redirected, you must fully complete the application process on the follow-up screen. · Swish Analytics is a sports analytics, betting and fantasy startup building the next generation of predictiv ...


  • Instabase San Francisco, United States

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. · With customers representing some of the largest and most complex organizations in the ...


  • CAPTIVATEIQ INC San Francisco, United States

    [Full Time] Site Reliability Engineer - Remote at CaptivateIQ (United States) | BEAMSTART Jobs · Site Reliability Engineer - Remote · CaptivateIQ United States · Date Posted · 31 Jan, 2023 · Work Location · San Francisco, United States · Salary Offered · $139000 — $186000 yearl ...


  • Wasmer San Francisco, United States

    [Full Time] Site Reliability Engineer at Wasmer (United States) | BEAMSTART Jobs · Site Reliability Engineer · Wasmer United States · Date Posted · 25 Mar, 2023 · Work Location · San Francisco, United States · Salary Offered · Not Specified · Job Type · Full Time · Experience R ...


  • Vertisystem San Francisco, United States

    Duration: 6 months contract · Pay rate: $90/hr on W2 · Job Summary: · It is an exciting time to be part of the organization's CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. T ...


  • PostHog Enterprise San Francisco, United States

    PostHog helps engineers build better products. We are a single platform to analyze, test, observe, and deploy new features. We give engineers product analytics, session recording, feature flags, A/B testing, event pipelines, SQL access, and a data warehouse... and there's plenty ...


  • Appspace San Francisco, United States

    At Appspace, we're passionate about creating better work experiences for people everywhere, and we're looking for people that feel the same way. Our global office locations and flexible work culture help you work wherever and however you're at your best. Plus, we take the time to ...


  • Together AI San Francisco, United States

    As a Site Reliability Engineer (SRE) at Together, you are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of a pragmatic operator and a software engineer that applies sound engineering principles, operational discipline, a ...


  • Telestream San Francisco, United States

    About Us: · Welcome to the forefront of innovation at Telestream, an industry leading digital video delivery company. We are a dynamic and forward-thinking organization committed to leveraging cutting-edge cloud technologies to drive our success. If you're ready to be part of a ...


  • DAOmatch San Francisco, United States

    Aptos is a people-first blockchain on a mission to help billions of people achieve universal and fair access to decentralized assets in a safe and scalable way. Founded by some of the original creators and maintainers that researched, designed, and built the Diem blockchain to ser ...


  • Cypress Human Capital Management, LLC San Francisco, United States

    Site Reliability Engineer (Grafana) · Responsibilities · Collaborate with Service Owners and Observability Leaders to develop a strategy for monitoring the technology stack using Grafana. · Initiate data ingestion by deploying Telegraf and exporters (if necessary), utilizing di ...


  • Vertisystem San Francisco, United States

    Duration: 6 months contract · Pay rate: $90/hr on W2 · Job Summary: · It is an exciting time to be part of the organization's CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. Th ...