- Collaborate with Service Owners and Observability Leaders to develop a strategy for monitoring the technology stack using Grafana.
- Initiate data ingestion by deploying Telegraf and exporters (if necessary), utilizing discovery to feed data into Grafana Mimir.
- Establish initial alerting by creating alert rules and enabling self-service alerting in Grafana.
- Create initial dashboards to monitor the health and capacity of services.
- Transition responsibilities to the service owner by providing comprehensive documentation and necessary training.
- 3+ years of experience as a Site Reliability Engineer with proficiency in the Grafana platform.
- Expertise in Grafana, particularly in dashboard best practices and writing PromQL to create widgets and alert rules.
- Proficiency in Grafana Mimir or equivalent systems (e.g., Thanos, Cortex).
- Extensive knowledge of Prometheus.
- Advanced skills in Telegraf.
- Expertise in Ansible, including writing playbooks and deploying/configuring services.
- Proficiency with Git (GitLab) for self-service as code.
- Broad expertise in various technology stacks and transitioning their monitoring to the Grafana ecosystem.
- Experience with modern operating systems, specifically CentOS and Ubuntu.
Site Reliability Engineer - San Francisco, United States - Cypress HCM
Job Description
Site Reliability Engineer (Grafana)
Responsibilities: